Let’s go fishing in the pond
Fishing is a very special time to me, especially when I go with my son, Connor. It is father and son time: a time to check in, see how life is going, get insights into his life and see if I can add any counsel to lessen his burdens. It is my job as his father to offer advice and help drive real-life results, to identify the data points that mean the most and hold the most value to enhance and engage his life experiences.
When I go fishing with my son, sometimes we fish in the ocean, sometimes we fish in the lake and sometimes we fish in a local pond. When we fish in the ocean, we need to charter a boat and a captain, and we are never sure what we will catch… if anything. We also must deal with waves, currents and white caps, and the Atlantic Ocean is massive. When we go fishing in the lake, it is not as expensive as fishing in the ocean and we have a greater likelihood of knowing what we will catch. When my son and I fish in the pond, we know what we will catch, it takes less time to catch a fish and there is no need to rent a boat. Fishing in the pond is simple, easy and fun compared to fishing in the ocean or a lake.
Big Data is like fishing in the ocean: massive volumes of both structured and unstructured data, so large that they are difficult to process through traditional database and software techniques. In most organisations, the volume of data is too big to move quickly through system processing, or it exceeds current processing capacity. Big Data is high volume and high variety. It requires new technologies and techniques to capture, store and analyse it, and it is used to enhance decision making, provide insight and discovery, and support and optimise processes. It is always challenging and costly to collect, manage and use, and it is not necessarily relevant to any specific problem or issue to resolve. Gartner defines Big Data as “high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making”. We must be aware that the data we have is not necessarily the data we really need to drive value.
Data lakes are like fishing in a lake – not as large as an ocean, and with a more concentrated type of data. The data lake storage repository holds a vast amount of raw data (in its native format) until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data whose purpose is not yet defined. You can store your data as is, without having to structure it first, and then run different types of analytics on it, from dashboards and visualisations to Big Data processing, real-time analytics and machine learning, to guide better decisions. Gartner refers to Data Lakes in broad terms as “enterprise-wide data management platforms for analysing disparate sources of data in its native format”. Even so, the data we capture in a lake is missing the context and framework to drive insights.
Data pond is a term I crafted many years ago during my undergraduate studies at St John’s University in New York City. A well-realised data pond can provide critical insights and vital clarity that are almost impossible to find with larger volumes of data. You can have data without information, but you cannot have information without data. That being said, there is zero value in information if it doesn’t drive actionable insights. Why do we think bigger is better and more is better than less? I think less is better, more is waste and bigger is not better. Bigger is just bigger: more costly, harder to deal with, and it makes it extremely difficult to drive the real actionable insights that will help lead an organisation to success.
Initiated in 1958 and completed in 1963, Project Mercury was the United States' first man-in-space program. The objectives of the program, which made six manned flights from 1961 to 1963, were highly specific: orbit a manned spacecraft around Earth, investigate man's ability to function in space, and recover both man and spacecraft safely. The computers used on that project utilised 300 kilobytes of memory. If you can operate a spacecraft on less memory than it takes to store a snapshot of my kids, we can certainly do more with less and drive real actionable insights through data ponds. Small enough for human comprehension, data ponds offer an accessible volume and format that is informative and, most importantly, actionable. It is not about the data, it is about the insights that will drive value. This is the end game, nothing more, nothing less. Why fish in the ocean when you have all you can eat in the pond right next door?
Fish in the pond with me and my son, not in the ocean with Captain Ahab or in the lakes with the Loch Ness Monster. You will find the fish you’re looking for faster and easier at a lower cost, and you can tell all your friends about the insights you learned about life and business while fishing. Data ponds are the place to fish, drive actionable insights and not get lost in the sea of data.
By Paul Bailo, Global Head of Digital Strategy and Innovation, Infosys