Data warehousing: why the need for flexibility is an inflexible truth
As the global business landscape becomes increasingly digitalised, and new technologies like 5G drive the rapid expansion of the Internet of Things (IoT), the amount of data created each day is growing exponentially. Business intelligence and research firm Raconteur found this year that, on an average day, 500mn tweets, 65bn WhatsApp messages and 294bn emails are sent, while four petabytes of data are created on Facebook and 5bn searches are made online. By 2025, it’s estimated that 463 exabytes of data will be created each day globally. The widely quoted equivalent of 212,765,957 DVDs per day in fact matches a single exabyte at 4.7GB per disc; 463 exabytes would fill roughly 98.5bn DVDs.
To keep pace and stay afloat, modern businesses need to gather, store, analyse and draw insights from a mind-bending amount of raw data. Determining what information is valuable, how to extract it and where to keep it are challenges that every business in the current landscape must overcome. This landscape, however, is changing so fast that today’s solutions are outdated within as little as six months. To keep up, enterprises are increasingly moving towards third-party data management and storage solutions hosted in the cloud, for the flexibility and access to leading-edge technology that they provide.
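The DVD comparison is easy to check with a few lines of arithmetic. The sketch below assumes decimal units (one exabyte is 10^18 bytes) and single-layer DVDs of 4.7GB; neither assumption is stated in the figures quoted above, and the function name is purely illustrative.

```python
# Assumptions (not from the article): decimal units and 4.7 GB single-layer DVDs.
EXABYTE_BYTES = 10**18
DVD_BYTES = 4_700_000_000


def dvds_for_exabytes(exabytes: int) -> int:
    """Number of DVDs completely filled by the given number of exabytes."""
    # Integer arithmetic avoids floating-point rounding on numbers this large.
    return (exabytes * EXABYTE_BYTES) // DVD_BYTES


print(dvds_for_exabytes(1))    # one exabyte
print(dvds_for_exabytes(463))  # the 2025 daily estimate
```

Under these assumptions, 212,765,957 discs hold one exabyte, so the daily 2025 estimate comes out in the tens of billions of discs.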
Gigabit magazine spoke with experts in the data warehousing space to gauge the state of the evolving data warehousing industry, and why flexibility is at the heart of leading modern solutions. But first…
What is a data warehouse?
The differences between a database and a data warehouse aren’t immediately obvious. Both contain data. Both databases and data warehouses are what’s called ‘relational’ data systems; they each store data in a structured format, using rows and columns. Where they differ is the purposes they serve. Also affecting the market are data lakes, which are newer, and solve different problems in a slightly different way.
A database stores current transactions and enables quick, easy access to specific transactions for ongoing business processes, known as Online Transaction Processing (OLTP).
Data warehouses, on the other hand, present a consolidated view of data collected from various systems, held in either a physical or a logical repository, according to Panoply. They are best at correlating data from existing systems (for example, product inventory stored in one system and purchase orders for a specific customer stored in another), and are mostly used for online analytical processing (OLAP), which uses complex queries to analyse transactions rather than process them.
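The difference between the two workloads can be sketched in a few lines. The example below is illustrative only: it uses Python’s built-in sqlite3 module with a single in-memory database standing in for a warehouse that has already consolidated inventory and purchase orders from two source systems. The table names, columns and sample rows are all hypothetical.

```python
import sqlite3

# One in-memory database stands in for the consolidated warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE inventory (product TEXT PRIMARY KEY, in_stock INTEGER);
CREATE TABLE purchase_orders (
    order_id INTEGER PRIMARY KEY, customer TEXT, product TEXT, quantity INTEGER
);
INSERT INTO inventory VALUES ('widget', 100), ('gadget', 15);
INSERT INTO purchase_orders VALUES
    (1, 'acme', 'widget', 30),
    (2, 'acme', 'gadget', 10),
    (3, 'globex', 'gadget', 10);
""")


def lookup_order(order_id):
    """OLTP-style access: fetch one specific transaction by its key."""
    return conn.execute(
        "SELECT customer, product, quantity FROM purchase_orders WHERE order_id = ?",
        (order_id,),
    ).fetchone()


def stock_versus_demand():
    """OLAP-style access: correlate and aggregate across both data sets."""
    return conn.execute("""
        SELECT i.product, i.in_stock, COALESCE(SUM(p.quantity), 0) AS ordered
        FROM inventory AS i
        LEFT JOIN purchase_orders AS p ON p.product = i.product
        GROUP BY i.product, i.in_stock
        ORDER BY i.product
    """).fetchall()


print(lookup_order(2))
print(stock_versus_demand())
```

The first query touches a single row, as a day-to-day business process would. The second scans and joins both tables to show that, in this made-up data, more gadgets have been ordered than are in stock.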
Lastly, a data lake is a newer, highly scalable storage system that holds structured and unstructured data in its original form and format, rather than organising it into rows and columns like a database or warehouse. A data lake does not require planning or prior knowledge of the data analysis needed; it assumes that analysis will happen later, on demand.
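This "store now, analyse later" approach is often described as schema-on-read. As a rough sketch, a plain Python list can stand in for the lake: records land exactly as they arrive, and structure is only imposed when a question is asked. The record shapes and the function name are hypothetical.

```python
import json

# Raw records landed as-is: some are JSON, one is free text (all hypothetical).
raw_lake = [
    '{"type": "order", "customer": "acme", "total": 120.5}',
    '{"type": "sensor", "device": "pump-7", "temp_c": 81.2}',
    "free-text maintenance note: pump-7 vibrating",
    '{"type": "order", "customer": "globex", "total": 80}',
]


def read_orders(lake):
    """Apply a schema at read time: keep only records that parse as orders."""
    orders = []
    for blob in lake:
        try:
            record = json.loads(blob)
        except json.JSONDecodeError:
            continue  # unstructured data stays in the lake, untouched
        if isinstance(record, dict) and record.get("type") == "order":
            orders.append(record)
    return orders


orders = read_orders(raw_lake)
order_total = sum(order["total"] for order in orders)
print(len(orders), order_total)
```

Nothing about the order analysis had to be known when the data was stored; the sensor reading and the free-text note remain available for other questions later.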
Jean-Michel Franco, Senior Director of Data Governance, Talend
“According to a TDWI and Talend survey, the top reasons companies migrate to a cloud data warehouse are: a flexible cost model, to take advantage of cloud features, faster performance and to migrate existing products to cloud. The on-premises data warehouse business is shrinking inexorably. Most new customer data warehouses under construction today are being built in the cloud (most commonly Snowflake, AWS Redshift, Azure SQL Data Warehouse, or Google BigQuery).
“Putting your data repository in the cloud is simply better. It’s faster, more scalable, with zero install time, you can go live in minutes, and it’s always up-to-date. Nearly every single company looking for a new data warehouse or a new data lake will choose a cloud-based data repository.”
Rob Lamb, Chief Technology Officer, Dell Technologies, UK
“There are fundamental differences between data lakes and data warehousing, and some challenges arise from confusion over terminology and usage. Data lakes and data warehouses are both used for storing data, but they are not the same. A data lake is a large pool of raw data set for future extraction and analysis – it needs to be searchable, but that may be the extent of tooling provided.
“The oil and gas industry was an early adopter of data lakes to land data for use cases such as minimising unplanned downtime and improving safety. A data warehouse is a repository for structured data supported by a combination of processes and tools to prepare data for a specific purpose. For example, warehousing is essential for the healthcare industry as it utilises it to strategise and predict outcomes, generate patients’ treatments and share data with medical aid services.”
Lamb has worked for Dell for almost a decade, watching the global explosion of data and working to support the expansion of cloud infrastructure from cutting-edge, niche technology to the foundation of modern digital society.
“There has been a shift towards the use of cloud for data warehouse architecture in recent years as the services and capabilities have matured,” he continues. “There are three primary drivers for organisations looking at cloud for data warehousing:
The inability to handle the speed and volume of multi-source data, especially IoT data;
The inability to find a single technological solution to collect, store, and organise data from disparate sources;
The inability to handle Big Data projects with a single database.
“The challenge is managing these data sources and only integrating the valuable data into the data warehouse.”
Walter Heck, CTO, HeleCloud, Netherlands
“The more data businesses gather, the more information they have at their disposal. In a digital world, this is a great asset. But, with more data comes more responsibility. Businesses process and store thousands, millions, sometimes even billions of transactions each day, all of which need to be managed securely and effectively. The ability to store large quantities of data is being made increasingly possible by creating data warehouses,” says Heck, who took on his current role at the Amazon Web Services (AWS) Advanced Consulting Partner in August.
Heck has seen data warehouses grow dramatically in both size and complexity over the past year. He notes that the trend is spurring a large number of enterprises to closely investigate the possibilities of new generations of cloud and data management infrastructure, particularly those backed by machine learning and AI, which allow companies to manipulate and understand their data more accurately.
The change, Heck believes, could not have come at a better time.
“Despite widespread talk of digital transformation, many companies across the globe still do not optimally use the data available to them. This is because data tends to sit undiscovered in silos across these businesses,” he explains. “That said, businesses are starting to wake up to this reality. As such, we are likely to see organisations start organising their approach to managing data. This is a good thing. With the introduction of 5G and the evolution of edge computing, data volumes are likely to explode to unprecedented levels in the next few years. This means that data warehousing needs to be flexible enough to scale based on volume as well as integrate the many different data types for analysis.”
A flexible future in the cloud
The mass migration of the modern enterprise to the cloud may even see CTOs and digital executives move their organisations beyond the concept of the data centre altogether. Rather than warehouses for storing data, solutions that provide even more immediate access as a flexible service are increasingly in demand among industry leaders. Regardless, the days of on-premises legacy systems are ending, and companies need to look ahead if they expect to survive and thrive as the accumulated digital universe expands, on one widely cited prediction, from 4.4 zettabytes in 2013 to more than 44 zettabytes in 2020. Data is the future, and in the future only the flexible will survive.
IT Employees Predict 90% Increase in Cloud Security Spending
As companies get back on their feet post-pandemic, they’re going all-in on cloud applications. In a recent report by Devo Technology titled “Beyond Cloud Adoption: How to Embrace the Cloud for Security and Business Benefits”, 81% of the 500 IT and security team members surveyed said that COVID accelerated their cloud timelines. More than half of the top-performing businesses reported gains in visibility. In fact, cloud solutions now outnumber on-premises solutions by a ratio of 3:1.
But the benefits are accompanied by significant cybersecurity risks, as cloud infrastructure is more complex than legacy systems. Let’s dive in.
Why Are Cloud Platforms Taking Over?
According to Forrester, the public cloud infrastructure market could grow 28% over the next year, up to US$113.1bn. Companies shifting to remote work and decentralised workplaces find it easy to store and access information, especially as networks start to share more and more supply chain and enterprise information—think risk mitigation platforms and ESG ratings.
Here’s the catch: when you shift to the cloud, you choose a more complex system, which often requires cloud-native platforms for network security. In other words, you can’t stop halfway. ‘Only cloud-native platforms can keep up with [the cloud’s] speed and complexity and ultimately increase visibility and control’, said Douglas Murray, CEO at cloud security provider Valtix.
Here’s a quick list of the top cloud security companies, as ranked by Software Testing Help:
What are the Security Issues?
Here’s the bad news. According to Accenture, fewer than 40% of companies have achieved the full value they expected from their cloud investments. All in all, greater complexity has forced companies to spend more to hire skilled tech workers, analyse security data, and manage new cybersecurity threats.
The two main issues are (1) a lack of familiarity with cloud systems and (2) challenges with shifting legacy security systems to new platforms. Out of the 500 IT employees from Devo Technology’s cloud report, for example, 80% said they’d sorted 40% more security data, suffered from a lack of cloud security training, and experienced a 60% increase in cybersecurity threats.
How Will Companies React?
They certainly won’t stop investing in cloud platforms. Out of the 500 enterprise-level companies that Devo Technology talked to throughout North America and Western Europe, 90% anticipated a jump in cloud security spending in 2021. They’ll throw money at automating security processes and investing in security upskilling programmes.
After all, company executives will find it incredibly difficult to stick with legacy systems when some cloud-centred companies have found success. Since moving from Security Information and Event Management (SIEM) offerings to the cloud, Accenture has saved up to 70% on its processes; recently, the company announced that it would invest US$3bn to help its clients ‘realise the cloud’s business value, speed, cost, talent, and innovation benefits’.
The company stated: ‘Security is often seen as the biggest inhibitor to a cloud-first journey—but in reality, it can be its greatest accelerator’.