Technology Spotlight: Who Is Cloudera?

By Kayleigh Shooter

June 10, 2020

We take a closer look at Cloudera, a software company and strategic partner of Bupa, which was featured in this month’s magazine...

Business Overview:

Cloudera, Inc. is a US-based software company that provides a software platform for data engineering, data warehousing, machine learning and analytics that runs in the cloud or on-premises. The company started as a hybrid open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), that targeted enterprise-class deployments of that technology. Cloudera states that more than 50% of its engineering output is donated upstream to the various Apache-licensed open source projects (Apache Spark, Apache Hive, Apache Avro, Apache HBase, and so on) that combine to form the Apache Hadoop platform. Cloudera is also a sponsor of the Apache Software Foundation. Cloudera was founded in 2008 by three engineers from Google, Yahoo! and Facebook (Christophe Bisciglia, Amr Awadallah and Jeff Hammerbacher, respectively) joined by a former Oracle executive (Mike Olson).

Its products:

DataFlow - Cloudera DataFlow (CDF) is a scalable, real-time streaming analytics platform that ingests, curates, and analyzes data for key insights and immediate actionable intelligence. CDF offers a simple visual UI for building sophisticated data flows to accomplish major data ingestions, transformations, and enrichment from a variety of streaming sources. Powered by Apache NiFi, CDF ingests data from devices, enterprise applications, partner systems, and edge applications generating real-time streaming data. It also enables high-volume data collection at the edge, even from edge devices, using MiNiFi. Now you can set up widely distributed IoT deployment models for regional data collection with ease, using NiFi with MiNiFi to stream data from the edge. Tight integration with Apache Ranger gives CDF the unique advantage of seamless security across all your data-in-motion and data-at-rest. Using the powerful streaming platform Apache Kafka, CDF can process several million transactions per second, identify key patterns, compare against machine learning models, and offer predictive or prescriptive analytics to help business leadership make key decisions and seize opportunities. Finally, CDF is the only product in the industry offering data provenance and edge-to-enterprise data governance out of the box. In the age of GDPR and other regulatory compliance, it’s important to track data lineage, even for streaming data. NiFi within CDF offers data provenance tracking without any extra configuration or setup. With tight integration of Apache Atlas, you have complete governance of data from the edge to the enterprise.
Data Science Workbench - With Python, R, and Scala directly in the web browser, Cloudera Data Science Workbench (CDSW) delivers a self-service experience data scientists will love. Download and experiment with the latest libraries and frameworks in customizable project environments that work just like your laptop. Access any data, anywhere, from cloud object storage to data warehouses. Cloudera Data Science Workbench provides connectivity not only to CDH and HDP but also to the systems your data science teams rely on for analysis. It also lets data scientists manage their own analytics pipelines, including built-in scheduling, monitoring, and email alerting. Quickly develop and prototype new machine learning projects and easily deploy them to production.
Enterprise Data Hub - From autonomous vehicles and surgical robots to churn prevention and fraud detection, enterprises rely on data to uncover new insights and power world-changing solutions. And it all starts with a foundational data management platform. Cloudera delivers an integrated suite of analytic engines ranging from stream and batch data processing to data warehousing, operational database, and machine learning. Cloudera SDX applies consistent security and governance, enabling users to share and discover data for use across workloads. Tackling complex data-driven problems requires analytics working in concert, not isolation. Cloudera SDX combines enterprise-grade centralized security, governance, and management capabilities with shared metadata and a data catalogue, eliminating costly data silos, preventing lock-in to proprietary formats, and eradicating resource contention. Now all users and administrators can enjoy the advantages of a shared data experience.
Fast Forward Labs Research - Despite its promise, machine learning can be downright daunting. Best efforts can be quickly undermined by uncertainty about a rapidly changing technical landscape, bewilderment on how best to build and organize teams, and difficulty separating hype from reality. Free up executives and data science teams to focus on the future of the business with a virtual dedicated research staff that continually monitors the latest techniques and industry best practices, determining how best to apply them to your difficult business problems. Cloudera Fast Forward Labs Research focuses on emerging trends that are still changing due to algorithmic breakthrough, hardware breakthrough, technological commoditization, and data availability. Accompanying the reports are working prototypes that exhibit the capabilities of the algorithm and offer detailed technical advice on its practical application.
Hortonworks Data Platform - Hortonworks Data Platform (HDP) is an open-source framework for distributed storage and processing of large, multi-source data sets. HDP modernizes your IT infrastructure and keeps your data secure—in the cloud or on-premises—while helping you drive new revenue streams, improve customer experience, and control costs. HDP enables agile application deployment, machine learning and deep learning workloads, real-time data warehousing, and security and governance. It is a key component of modern data architecture for data at rest.

Case Study: Cloudera and Banco Santander:

Santander Group is a leading retail and commercial bank founded and based in Spain. Ranked as the largest bank in the eurozone by market capitalization, the organization runs a variety of diversified businesses around the world, with a clear focus on digital transformation. With an array of new digital solutions, Santander Group is setting the benchmark for technology innovation for the best customer experience. However, the bank could no longer process the vast amount of data being generated reliably and within reasonable timelines, and the Group was on the lookout for a scalable big data infrastructure to fuel growth. The Group deployed Cloudbreak, part of HDP, along with open source tools such as Spark, NiFi, and Kafka to power its hybrid data architecture. The Group has achieved a consistent overview of data across the organization, a 20x reduction in infrastructure cost, and a 10x faster time to market for applications.

Check out the full case study here.

Cloudera is a strategic partner of Bupa, a company that we featured in this month's magazine. A spokesman for Bupa told us: “We have been expanding to provide a variety of services to our customers, and with that growth, digital transformation has provided a lot of opportunities for us. Across the Technology team, we manage various digital estates such as data platforms, and have worked with companies along the way including Cloudera to explore our options for continued development.” You can find the full brochure here.