Technology Spotlight: Who Is Cloudera?
Cloudera, Inc. is a US-based software company that provides a platform for data engineering, data warehousing, machine learning and analytics that runs in the cloud or on-premises. The company started out with a hybrid open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), aimed at enterprise-class deployments of that technology. Cloudera states that more than 50% of its engineering output is donated upstream to the various Apache-licensed open source projects (Apache Spark, Apache Hive, Apache Avro, Apache HBase and so on) that together form the Apache Hadoop platform. Cloudera is also a sponsor of the Apache Software Foundation. The company was founded in 2008 by three engineers from Google, Yahoo! and Facebook (Christophe Bisciglia, Amr Awadallah and Jeff Hammerbacher, respectively), who were joined by a former Oracle executive, Mike Olson. Its main products include the following:
- DataFlow - Cloudera DataFlow (CDF) is a scalable, real-time streaming analytics platform that ingests, curates and analyzes data to deliver key insights and immediate, actionable intelligence. CDF offers a simple visual UI for building sophisticated data flows that handle large-scale ingestion, transformation and enrichment from a variety of streaming sources. Powered by Apache NiFi, CDF ingests real-time streaming data from devices, enterprise applications, partner systems and edge applications. It also supports high-volume data collection at the edge, including from edge devices running MiNiFi, so widely distributed IoT deployments for regional data collection can be set up by using NiFi with MiNiFi to stream data from the edge. Tight integration with Apache Ranger provides consistent security across data in motion and data at rest. Using the Apache Kafka streaming platform, CDF can process several million transactions per second, identify key patterns, compare data against machine learning models, and offer predictive or prescriptive analytics to help business leaders make key decisions and seize opportunities. Finally, Cloudera describes CDF as the only product in the industry to offer data provenance and edge-to-enterprise data governance out of the box. In the age of GDPR and other regulatory requirements, it is important to track data lineage, even for streaming data. NiFi within CDF tracks data provenance without any extra configuration or setup, and tight integration with Apache Atlas provides complete governance of data from the edge to the enterprise.
- Data Science Workbench - With Python, R and Scala available directly in the web browser, Cloudera Data Science Workbench (CDSW) delivers a self-service experience that data scientists will love. Users can download and experiment with the latest libraries and frameworks in customizable project environments that work just like a laptop. CDSW can access data wherever it lives, from cloud object storage to data warehouses: it connects not only to CDH and HDP but also to the other systems data science teams rely on for analysis. Data scientists can manage their own analytics pipelines, with built-in scheduling, monitoring and email alerting, and can quickly develop and prototype new machine learning projects and deploy them to production.
- Enterprise Data Hub - From autonomous vehicles and surgical robots to churn prevention and fraud detection, enterprises rely on data to uncover new insights and power world-changing solutions, and it all starts with a foundational data management platform. Cloudera delivers an integrated suite of analytic engines ranging from stream and batch data processing to data warehousing, operational databases and machine learning. Tackling complex data-driven problems requires analytics engines working in concert, not in isolation. Cloudera SDX (Shared Data Experience) applies consistent security and governance across workloads, enabling users to share and discover data wherever it is used. It combines enterprise-grade centralized security, governance and management capabilities with shared metadata and a data catalogue, eliminating costly data silos, preventing lock-in to proprietary formats and eradicating resource contention, so that all users and administrators benefit from a shared data experience.
- Fast Forward Labs Research - Despite its promise, machine learning can be downright daunting. Best efforts can quickly be undermined by uncertainty about a rapidly changing technical landscape, confusion about how best to build and organize teams, and difficulty separating hype from reality. The service aims to free up executives and data science teams to focus on the future of the business by acting as a dedicated virtual research staff that continually monitors the latest techniques and industry best practices and determines how best to apply them to difficult business problems. Cloudera Fast Forward Labs Research focuses on emerging trends that are still evolving because of algorithmic breakthroughs, hardware breakthroughs, technological commoditization and data availability. The reports are accompanied by working prototypes that demonstrate the capabilities of each algorithm, along with detailed technical advice on its practical application.
- Hortonworks Data Platform - Hortonworks Data Platform (HDP) is an open-source framework for distributed storage and processing of large, multi-source data sets. HDP modernizes IT infrastructure and keeps data secure, in the cloud or on-premises, while helping organizations drive new revenue streams, improve customer experience and control costs. HDP supports agile application deployment, machine learning and deep learning workloads, real-time data warehousing, and security and governance. It is a key component of a modern data architecture for data at rest.
Case Study: Cloudera and Banco Santander
Santander Group is a leading retail and commercial bank founded and based in Spain. Ranked as the largest bank in the eurozone by market capitalization, the organization runs a variety of diversified businesses around the world, with a clear focus on digital transformation. With an array of new digital solutions, Santander Group aims to set the benchmark for technology innovation and customer experience. However, the bank could no longer process the vast amount of data it was generating reliably and within reasonable timelines, so the Group went looking for a scalable big data infrastructure to fuel its growth. It deployed Cloudbreak, part of HDP, along with open source tools such as Spark, NiFi and Kafka to power its hybrid data architecture. As a result, the Group has achieved a consistent overview of data across the organization, a 20x reduction in infrastructure cost and 10x faster time to market for applications.
Cloudera is a strategic partner of Bupa, a company we featured in this month's magazine. A spokesman for Bupa told us: “We have been expanding to provide a variety of services to our customers, and with that growth, digital transformation has provided a lot of opportunities for us. Across the Technology team, we manage various digital estates such as data platforms, and have worked with companies along the way including Cloudera to explore our options for continued development.”
Confluent announces new platform for building private cloud services
Confluent, whose platform is designed to set data in motion, has announced Confluent for Kubernetes, which it describes as the first platform purpose-built to bring cloud-native capabilities to data streams in private infrastructure.
Confluent for Kubernetes allows platform teams to bring much of the cloud-native experience of Confluent Cloud to their self-managed environments while letting operations teams retain control of their data and infrastructure. According to Confluent, its fully elastic and scalable design helps organisations achieve faster time-to-value and reduces the operational burden of running data streams in private infrastructure.
“To compete in the digital realm, organisations need to quickly deliver personalised customer experiences and real-time operations, which are only possible with access to data from all environments and cloud-native advantages,” said Ganesh Srinivasan, Chief Product and Engineering Officer, Confluent.
“For organisations that need to operate on-premises, we’re bringing the benefits of cloud computing to their private infrastructure with Confluent for Kubernetes. Now, any company can build a private cloud service to move data across their business regardless of its environment.”
How can Confluent for Kubernetes help?
Organisations that are transitioning to the cloud, or that need to keep workloads on-premises, can use Confluent for Kubernetes’ cloud-native capabilities, including a declarative API for deploying and operating Confluent. According to the company, the platform also makes it easier to move applications to the public cloud by ‘seamlessly migrating workloads to wherever your business needs them with the ability to connect and share data with Confluent Cloud’.
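A declarative API means the user describes the end state (for example, how many brokers a cluster should have) and software works out the steps needed to reach it. The following minimal Python sketch illustrates that general reconcile-loop pattern, as used by Kubernetes controllers. `ClusterSpec`, `ClusterStatus` and `reconcile` are hypothetical names, and the code is not Confluent's API or implementation; it only shows how one comparison of desired and observed state covers both restarting a failed broker and scaling up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClusterSpec:
    """Desired state declared by the user (hypothetical, for illustration)."""
    name: str
    replicas: int  # how many brokers should be running


@dataclass
class ClusterStatus:
    """Observed state (hypothetical): IDs of the brokers currently running."""
    running: set[int] = field(default_factory=set)


def reconcile(spec: ClusterSpec, status: ClusterStatus) -> list[str]:
    """Return the low-level actions that move the observed state to the spec.

    Brokers are numbered 0..replicas-1. A missing ID is a broker that has
    failed or has not been created yet; an ID >= replicas is surplus after
    a scale-down.
    """
    desired = set(range(spec.replicas))
    actions = []
    # Start (or restart) every broker that should exist but is not running.
    for broker_id in sorted(desired - status.running):
        actions.append(f"start {spec.name}-{broker_id}")
    # Stop every running broker that the spec no longer asks for.
    for broker_id in sorted(status.running - desired):
        actions.append(f"stop {spec.name}-{broker_id}")
    return actions


# Broker 1 has crashed and the user has just raised replicas from 3 to 4.
print(reconcile(ClusterSpec("kafka", 4), ClusterStatus({0, 2})))
# ['start kafka-1', 'start kafka-3']
```

Running the loop repeatedly, rather than once, is what lets a controller react to failures: each pass recomputes the gap from scratch, so a broker that crashes after one pass is simply restarted on the next.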
Enhanced reliability – As a cloud-native system, Confluent for Kubernetes detects when a process fails and automatically restarts or reschedules it as necessary. Automated rack awareness spreads the replicas of each partition across different racks, improving the availability of your brokers and limiting the risk of data loss.
Automated elasticity – Meet changing business demands by scaling up through API-driven operations. The platform automatically generates configurations, schedules and runs new broker processes, and ensures data is balanced across brokers so that clusters are used efficiently.
Simplified infrastructure management – Confluent for Kubernetes extends the Kubernetes API, enabling organisations to define the desired high-level state of clusters rather than manage all the low-level details. This infrastructure-as-code approach reduces the operational burden and achieves a faster time to value, while enhancing security with standards that can be easily and consistently deployed across an organisation.
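The rack-awareness point can be illustrated with a short Python sketch of rack-aware replica placement. It shows the general technique only; it is not the assignment algorithm that Apache Kafka or Confluent for Kubernetes actually uses, and the function name `assign_replicas`, the rack labels and the broker IDs are hypothetical. Each partition rotates through the racks, so when there are at least as many racks as replicas, no two copies of a partition share a rack and the loss of one rack still leaves the other copies available.

```python
from __future__ import annotations


def assign_replicas(brokers_by_rack: dict[str, list[int]],
                    partitions: int,
                    replication_factor: int) -> dict[int, list[int]]:
    """Place each partition's replicas on brokers in distinct racks.

    Simplified illustration only: partition p starts at rack p (mod the
    number of racks) and takes one broker from each following rack, and
    brokers within a rack are used round-robin.
    """
    racks = sorted(rack for rack, brokers in brokers_by_rack.items() if brokers)
    if replication_factor > len(racks):
        raise ValueError("need at least as many non-empty racks as replicas")
    next_index = {rack: 0 for rack in racks}  # round-robin cursor per rack
    assignment = {}
    for p in range(partitions):
        replicas = []
        for r in range(replication_factor):
            # (p + r) % len(racks) is distinct for each r < len(racks),
            # so no two replicas of partition p land on the same rack.
            rack = racks[(p + r) % len(racks)]
            brokers = brokers_by_rack[rack]
            replicas.append(brokers[next_index[rack] % len(brokers)])
            next_index[rack] += 1
        assignment[p] = replicas
    return assignment


layout = {"rack-a": [1, 2], "rack-b": [3, 4], "rack-c": [5, 6]}
print(assign_replicas(layout, partitions=3, replication_factor=3))
# {0: [1, 3, 5], 1: [4, 6, 2], 2: [5, 1, 3]}
```

With three racks of two brokers each and a replication factor of 3, every partition gets exactly one copy per rack, and the first replica in each list (the preferred leader in Kafka) rotates across racks, so leadership load is spread as well.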