IBM Research has unveiled CodeFlare, a new framework for integrating and scaling big data and AI workflows in a hybrid cloud environment. The open-source framework aims to help developers cut down the time they spend creating pipelines to train and optimise machine learning models.
CodeFlare is built on top of Ray, an emerging open-source distributed computing framework for machine learning applications. CodeFlare extends the capabilities of Ray by adding specific elements to make scaling workflows easier, according to IBM.
Today, to create a machine learning model, researchers and developers first have to train and optimise it. CodeFlare simplifies this process with a Python-based interface for what is called a pipeline, making it simpler to integrate, parallelise and share data. The new framework aims to unify pipeline workflows across multiple platforms without requiring data scientists to learn a new workflow language.
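To make the idea of a parallelised pipeline concrete, here is a minimal Python sketch. It does not use CodeFlare or Ray and is not their API: the function names (normalise, score, run_pipeline, run_all) are hypothetical, and a standard-library thread pool stands in for a distributed runtime. It only illustrates the general pattern of chaining stages into a pipeline and running many pipeline variants side by side.

```python
from concurrent.futures import ThreadPoolExecutor


def normalise(values):
    """Stage 1: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def score(values, threshold):
    """Stage 2: stand-in for model training; counts values above a threshold."""
    return sum(1 for v in values if v > threshold)


def run_pipeline(data, threshold):
    """Chain the two stages into a single pipeline run."""
    return score(normalise(data), threshold)


def run_all(data, thresholds):
    """Run one pipeline per threshold in parallel, preserving input order."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: run_pipeline(data, t), thresholds))


if __name__ == "__main__":
    # Three pipeline variants over the same data, one per threshold.
    print(run_all([2, 4, 6, 8, 10], [0.2, 0.5, 0.8]))
```

In a framework built on a distributed runtime, the pool in this sketch would presumably be replaced by workers spread across machines, while the pipeline definition itself stays in ordinary Python.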
A simpler way to integrate and scale full pipelines
CodeFlare pipelines run with ease on IBM’s new serverless platform, IBM Cloud Code Engine, and on Red Hat OpenShift, the company explained. This allows users to deploy pipelines just about anywhere, extending the benefits of serverless to data scientists and AI researchers. CodeFlare also makes it easier to integrate and bridge with other cloud-native ecosystems by providing adapters to event triggers (such as the arrival of a new file) and by loading and partitioning data from a wide range of sources, such as cloud object storage, data lakes, and distributed filesystems.
CodeFlare "goes beyond isolated tasks to seamlessly integrate and scale end-to-end pipelines with a data-scientist-friendly interface, like Python, instead of using containers," said Priya Nagpurkar, director, hybrid cloud platform at IBM Research. "CodeFlare can provide a simpler way to integrate and scale full pipelines, while offering a unified runtime and programming interface."
The company has already seen CodeFlare cut execution times in practice. For example, when one user applied the framework to analyse and optimise approximately 100,000 pipelines for training machine learning models, CodeFlare cut the time it took to execute each pipeline from 4 hours to 15 minutes.