Dispelling misconceptions about synthetic data sets
In November last year, Amazon released AWS Data Exchange, which made it easier for AWS customers to use and share data. More recently, Google Dataset Search came out of beta, giving researchers easy access to more than 25 million publicly available data sets.
What can we make of these moves by big tech to open up the sharing of data? They confirm a trend I've observed over the last few years (and one that makes me especially optimistic about the future growth of the world's economies): data is becoming a service.
To put data and its importance into context, McKinsey estimates that about 1.7 megabytes of new data will be created every second for every person globally this year. Considering how central banking and financial services are to daily life, with 65% of customers now interacting with their banks via digital channels, it is fair to assume that the sector will play a pivotal role in this trend. In fact, just under half (47%) of people are estimated to check their online banking every single day.
This puts incredible pressure on these organisations in critical ways: they must comply with data and privacy regulations (such as the GDPR, introduced in the EU in 2018) while also catering to consumers in an “always-on” digital environment.
When it comes to data provisioning for development and testing purposes, financial organisations have historically relied on two categories of data: original and anonymised. Original data contains all personally identifiable information (PII), such as customers’ names and transaction details. Anonymised data, generally speaking, has such PII removed but retains the transactional data. With both types, financial organisations face significant challenges in protecting this information while remaining compliant with ever more complex data protection regulations.
Yet there is an alternative, AI-driven approach for financial organisations to consider for development and testing: so-called synthetic, or synthesized, data.
What Is Synthetic Data?
In essence, synthetic data is data generated by a computer (powered by cutting-edge machine learning technology) to mimic original data.
When implemented accurately, this approach offers full data privacy compliance and reduces the time needed for product development and testing; synthesizing data can take as little as 10 minutes. As a result, extremely sensitive data can be unlocked to turbocharge product or service development without the risk of a data breach exposing real customer information.
Still, to truly understand the impact synthetic data could have on the financial industry, it is important to consider what I believe are two common misconceptions related to it.
1. Synthetic data refers to anonymised data
When data anonymisation is discussed, “synthetic” is frequently used interchangeably with “modified”, meaning original data that has been altered in some systematic way to make the original data points more difficult to identify.
In fact, there are three main approaches to data provisioning currently available on the market (a minimal code sketch follows the list):
- anonymised data, produced by a 1-to-1 transformation of the original data
- artificial data, produced by a probabilistic model fitted to a sample of data
- fully synthetic data, produced by a generative model of the original data which “understands” what the original data should look like
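To make the distinction concrete, here is a minimal sketch in Python of the three approaches. The dataset, column names and distributions are invented for illustration, and the generative-model step is indicated only in a comment; production engines are far more sophisticated.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Toy "original" dataset: hypothetical customer IDs and transaction amounts.
original = pd.DataFrame({
    "customer_id": [f"C{i:04d}" for i in range(1000)],
    "amount": rng.lognormal(mean=3.0, sigma=1.0, size=1000),
})

# 1. Anonymised data: a 1-to-1 transformation of the original rows.
#    Each record still corresponds to a real customer, so re-identification
#    remains a risk.
anonymised = original.assign(
    customer_id=[f"X{i:04d}" for i in rng.permutation(1000)]
)

# 2. Artificial data: new values drawn from a simple probabilistic model
#    fitted to the sample (here, a log-normal fit to the amounts).
log_amount = np.log(original["amount"])
artificial = pd.DataFrame({
    "amount": rng.lognormal(log_amount.mean(), log_amount.std(), size=1000),
})

# 3. Fully synthetic data: produced by a generative model trained on the
#    original data to learn the joint distribution of all columns, so no
#    synthetic row maps back to a real customer. (A real engine would train
#    a model such as a GAN or variational autoencoder here.)
```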
So powerful is synthetic data that recent research from MIT found that high-quality data synthesized by an advanced machine learning engine can deliver the same results for a data-driven project as the original data.
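One common way to test such a claim in practice is “train on synthetic, test on real” (TSTR): fit a model on the synthetic sample and evaluate it on held-out original data. The sketch below assumes a binary classification task and uses scikit-learn; the function and argument names are placeholders rather than any particular vendor's API.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def tstr_auc(X_synth, y_synth, X_real_test, y_real_test):
    """AUC of a model trained on synthetic data, evaluated on real data."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_synth, y_synth)
    return roc_auc_score(y_real_test, model.predict_proba(X_real_test)[:, 1])

# If the synthetic data faithfully captures the real distribution, this
# score should be close to the AUC of the same model trained on real data.
```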
2. Synthetic data can only ever be worse than real data
There can at times be an assumption that data which mimics real data is a poor copy of the original. In fact, the aim of synthetic data is to provide the same key insights as the original data. Used correctly, it can be just as insightful; what is really critical is that the way the data is created is optimised for what financial organisations are looking for.
By using agile data synthesized by an algorithm, i.e. created and examined in minutes thanks to advanced technologies, financial organisations can free up staff for other tasks. Even collecting original data is time-consuming, taking up an estimated 12.5% of development time.
The future of data, increasingly available on demand, is actually already here. The financial organisations that face the data challenge directly, and harness the power and efficiency of synthetic data, will win digital customers both now and into the future.
By Dr Nicolai Baldin, CEO & Founder, Synthesized
Logi Analytics Webinar: Meet the speaker
Data allows business owners to leverage digital insights and embrace data-driven business intelligence, making more informed decisions that support business growth and evolution. By using data to drive its actions, an organisation can contextualise and personalise its messaging to prospects and customers for a more customer-centric approach.
BizClik Media Group and Logi Analytics invite you to explore next-gen embedded analytics in our live webinar. There’s still time to sign up for the event entitled ‘Application Imperative: How Next-Gen Embedded Analytics Power Data-Driven Action’, which is taking place on 10 June at 4 pm BST.
The webinar will be led by Constellation Research’s Principal Analyst, Doug Henschen, who focuses on data-driven decision-making. Henschen’s Data-to-Decisions research examines how organisations employ data analysis to reimagine their business models and gain a deeper understanding of their customers.
Henschen's research acknowledges that innovative data analysis applications require a multi-disciplinary approach starting with information and orchestration technologies, continuing through business intelligence, data visualisation, and analytics, and moving into NoSQL and big-data analysis, third-party data enrichment, and decision-management technologies.
Constellation Research is a technology research and advisory firm based in Silicon Valley. Prior to joining Constellation, Doug Henschen led research and news coverage of analytics, big data, business intelligence, optimisation, and smart applications at InformationWeek.
What will the webinar cover?
This exclusive webinar will explain next-gen embedding capabilities that will enable your company to:
- Eliminate unproductive toggling between transactional interfaces and purely analytic dashboards
- Drive two-way interactions between app features and embedded analytics to enable data-driven action
- Understand the compounding impact of embedded analytics on your overall ROI
- Harness analytics as triggers for automated workflows and suggested next-best actions
- Enable developers to build quickly without coding while customising self-service options for end users
Logi Analytics is the only developer-grade analytics solutions provider focused exclusively on embedding analytics in commercial and enterprise applications. It empowers the world’s software teams with the most intuitive data analytics solutions, backed by a team of dedicated professionals invested in your company’s success.
Why not sign up today to find out exactly how Logi Analytics can revolutionise your data analytics game?
We look forward to seeing you there!