The Ceph Foundation: Democratising Data Storage
It should come as no surprise to anyone who hasn’t spent the past decade living under a rock that data has become the backbone, the body, the soul (the metaphor of your choice) of the modern world. Data storage, analysis, recovery, and management are mission-critical capabilities for any enterprise - and the core value proposition for more than a few.
The data centre industry is experiencing an explosion of capacity throughout both mature and emerging markets, datasets are growing exponentially like some 1950s sci-fi special effect, and emerging tech trends like 5G, the IoT, artificial intelligence, machine learning, HPC, cold storage, and edge computing all conspire to pour gasoline on an already raging fire. Yet the more critical an effective data storage solution becomes to organisations and enterprises of all shapes and sizes, the more apparent it is that the solutions dominating the market today aren’t necessarily the right tools for the job.
“In retrospect especially, but even at the time there was a glaring hole in the market. There was a clear need: everybody needed storage, it needed to be scalable, and there was no open source option; you had to buy expensive proprietary solutions,” reflects Sage Weil, Principal Ceph architect at Red Hat, and the founder and chief architect of Ceph. “There needed to be an open source alternative that was good, and that's the niche we've tried to fill.”
The Ceph Foundation
Since the first prototype of Ceph was launched back in 2007, the community of enterprises, organisations, and individuals that use it has grown to span a wide range of sectors, from small businesses to large-scale enterprises and from the scientific community to regional telecom carriers.
In November 2018, a cluster of organisations actively involved in the development, support, and commercialisation of Ceph formed the Ceph Foundation, launching the new initiative under the umbrella of the Linux Foundation. The founding members included Amihan, Canonical, China Mobile, DigitalOcean, Intel, OVH, ProphetStor Data Services, Red Hat, SoftIron, SUSE, Western Digital, XSKY Data Technology, and ZTE.
“I was pretty naive back then. I thought you just built something, open sourced it, and people would just start appearing to develop it, fix bugs, etc. and that's not how it works,” laughs Weil. “We had spent several years trying to add all the features that we thought Ceph had to have before people would be willing to use it,” he says of the period before the Foundation launched in 2018. “There are a lot of industry stakeholders that are selling or using Ceph as part of their business. So the Ceph Foundation became a way for those organisations to contribute funds that could be managed and spent to further Ceph's development and the community. Prior to the Foundation, it felt a lot more like begging - going around asking 'who wants to pay for X or Y',” he adds.
“The Ceph Foundation is essential to the Ceph community and our customers because its members are all invested in the development and progression of Ceph,” says Aaron Joue, founder and CEO of Ambedded Technology - which combines Ceph technology with its own line of decentralised Arm servers.
The power of the Foundation, continues Kyle Bader, a Data Foundation Architect at Red Hat, lies in its ability to drive the industry to “deliver on the promise of democratising software defined storage through open source in a way that’s very similar in the way that Linux led to the democratisation of the operating system.”
That democratisation, adds Craig Chadwell, VP of Product at SoftIron, creates the necessary competition (centred around the Foundation itself) to push the Ceph commercial ecosystem to even greater heights. “The Ceph community is very large and robust. The Ceph Foundation helps to enliven and support that community, which in turn ensures that there will always be other options out there so that we can maintain that no vendor lock-in value proposition,” he explains. “It really forces us to continually challenge ourselves to deliver solutions that are uniquely solving customer problems, because the reality is, if a customer can move away and there's something providing more value out there, they will. It keeps us honest and on our toes.”
Philip Williams, Product Lead at Canonical, reflects that “a funny thing about the open source world is that essentially we’re all competitors, but we're also all working together to make something that is available for free even better.”
Meet Ceph: Reliable, Scalable, Affordable, Open Source
Developed by Weil - in collaboration with data storage researchers at the University of California, Santa Cruz, as well as researchers from leading US laboratories in Los Alamos and beyond - Ceph is a distributed, open source data storage solution that grew to fill the glaring hole in the market that Weil and his colleagues saw back in the 2000s.
“Ceph is designed to provide a reliable storage service out of unreliable components. You take a bunch of individual hard drives that can fail, a bunch of networks that can fail, switches, servers that all individually are very fallible, you put them all together with Ceph and the net result is something that's highly reliable that tolerates any single point of failure - or in many cases many points of failure. It's highly available and highly scalable as well,” Weil explains, adding that Ceph is also capable of providing object, block, and file storage all in one system on the same hardware.
Ceph’s distributed approach to data storage is highly fault tolerant. Like a commercial airliner that can continue to fly with all but one engine out of commission, Ceph is robust enough to handle all but the most catastrophic of outages.
As a storage solution, Ceph’s appeal also lies in its open source, software-defined design which, in addition to delivering reliability and flexibility at scale, adds up to far more than the sum of whatever meagre parts you might happen to have lying around. “Ceph is open source, software defined, and meant to be run on any commodity hardware you want to buy or already have,” Weil says. “It doesn't matter which vendor you're buying your hardware from, whether you're using hard drives or SSDs, what kind of switches are in your network; it's fully software defined.” That makes it a legitimate and long-awaited answer to market demand for alternatives to restrictive, proprietary storage solutions.
“Storage is quite an interesting industry. It's kind of hidden; people don't really think about storage until it's either too expensive or it's not available and, worst case, all your data has been lost,” says Canonical’s Williams. “So, it's this funny little world that's dominated by a number of very large players. The whole aim of the Ceph Foundation is not just to shepherd the upstream projects and this collaborative development work on Ceph itself, but also to demonstrate to enterprise users that there is this viable alternative to the big players, and that their organisations don't have to be developer centric to make use of Ceph.”
Ceph’s open source, software-defined nature gives organisations looking to deploy it a free hand: “Choose any hardware you like, choose any vendor you like - or even no vendor at all - but if you build a Ceph system and you want to switch vendors or run things on your own, you can do that very easily.”
In addition to offering the freedom to start from scratch, move freely within its ecosystem, and avoid both the vendor lock-in agreements and the cumbersome, expensive upgrade cycles that define managed, proprietary storage solutions, Ceph demands little long-range planning. “Because it's so flexible and built to scale, Ceph doesn't require a lot of foreknowledge about where your organisation's going to be in a couple of years' time,” Weil says. “You can just expand your hardware footprint in whatever direction you end up growing.”
Large storage systems - the kinds that are increasingly coming to define the cloud and data centre industries - are fundamentally dynamic. They grow and change in new and unexpected directions in response to the market and, with Ceph, organisations can grow and change with as little friction as possible. “You might start out with 10 servers from one vendor, and then five years later you're storing 12 times as much data and you've been through three different hardware revisions all from different vendors, you've had to migrate data, change policy, and now you're storing a different type of data than you were before - it's all a total mess,” Weil laughs. “Often, your net system is going to be a mixture of all sorts of different stuff, and open source lends itself to solving those problems really well because you have the neutrality to be flexible and adaptable. If you're buying a proprietary solution from a particular vendor, you're going to have to buy more of the X solution that they allow you to interoperate with. You're locked into a particular path.” Ceph, he adds, not only frees organisations from those restrictive, vendor-defined upgrade paths, but opens up a huge, mature ecosystem of enterprises and community members to its user base.
Harnessing the Ceph community
When it comes to harnessing the true value of Ceph, its commercial ecosystem and user community are pivotal. Red Hat delivers Ceph solutions to Fortune 500 companies; SoftIron simplifies Ceph adoption with curated hardware, designed in-house and tailor-made to support its deployment. Together with Ambedded, Canonical, and others, these companies provide support and services that allow organisations of any scale, maturity, or specialisation to deploy and benefit from distributed storage - all built on Ceph.
“When it comes to getting started with Ceph, it can be an issue knowing which servers to buy, which hard drives and how many,” Weil acknowledges. “That's where companies in the commercial ecosystem really add a lot of value, not to mention the open source community at large.”
Ceph for everyone
Since the dawn of the open-source approach to software design, open source solutions have often garnered “a reputation for being really complicated to use”, Weil admits - adding that he and the Ceph team have spent the past few years painfully aware of that fact. Now, however, “a lot of the stigma surrounding open source in general has gone away in recent years,” he explains, a shift that coincides with the latest evolution of Ceph’s graphical user interface (GUI).
“These days, if you're a small business and you need 100 terabytes of storage, you're going to want something with a nice GUI that just works,” Weil notes. “So, over the last three to four years, there's been a huge investment of time and resources in the Ceph community on the usability front. We've created a whole new, integrated GUI dashboard for Ceph for management. We've also developed an orchestrator layer for Ceph that can call out to whatever tools you use to deploy it, so that you can do just about anything you need to do from the new GUI. I think we've made huge progress.”
Enterprise storage is full of challenges. Apart from the obvious spiralling quantity of data being generated, the applications that create and use that data are also increasingly diverse and changing almost daily. Storage is also not immune to the broader IT skills crisis that enterprises find themselves dealing with every day. Add to that the constant revolving door of mergers and acquisitions in the storage industry, and it's hard to find a storage manager who hasn't been burned by proprietary solutions that were obsoleted or sidelined after falling out of favour. It's little wonder, then, that a platform like Ceph - able to flex and grow to meet ever-changing demands across a huge variety of use cases, and to do so from within a vibrant open source community that eliminates the lock-in problem - becomes deeply compelling.
The Ceph decade
Looking to the future, the intersection of market trends with Ceph’s constantly developing capabilities, together with an ever-expanding ecosystem of vendors, users, and developers, positions it ideally for a decade of meteoric growth. Weil stresses that a sizable portion of the Foundation’s role is keeping up with cutting-edge hardware developments to ensure Ceph continues to run smoothly, no matter what you plug it into. “Ceph is a pretty mature piece of software at this point,” he reflects. “All of the important stuff is there and, in addition to building it out further, we’re starting to add a lot of polish.”
SoftIron’s Chadwell reflects that “open-source infrastructure has rapidly evolved and matured over the last decade and is in all likelihood going to be the way that most organisations deploy their IT footprint going forward.”
“People like to call Ceph the Linux of storage, which I think is appropriate,” adds Weil. “Nobody thinks about which Unix they should buy because the open source one is the best, everyone's using it, and everyone is constantly improving it. Ceph is moving into that position in the storage space.”