May 17, 2020

Dispelling misconceptions about synthetic data sets

Big Data
Dr Nicolai Baldin, CEO & Found...
4 min
The future of data, which is increasingly available on-demand, is actually already here

In November last year, Amazon released AWS Data Exchange, which made it easier for AWS customers to use and share data. Meanwhile, Google Dataset Search recently came out of beta, making it easy for researchers to access more than 25 million publicly available data sets.

What can we make of these moves by big tech to open up the sharing of data? They reflect a trend I've observed over the last several years (and one that makes me especially optimistic about the future growth of the world's economies): data is becoming a service.

To put data and its importance into context, McKinsey estimates that about 1.7 megabytes of new data will be created every second for every person globally this year. Given how central to daily life banking and financial services are, with 65% of customers now interacting with their banks via digital channels, it is fair to assume that the sector will play a pivotal role in the development of this trend. In fact, it is estimated that just under half (47%) of people will check their online banking every single day.

This puts incredible pressure on these organisations in critical ways: they are accountable for complying with data and privacy regulations (such as the GDPR, which took effect in the EU in 2018) while also catering to consumers in an “always-on” digital environment.

When it comes to data provisioning for development and testing purposes, financial organisations have historically relied on two categories of data - original and anonymous. Original data keeps all personally identifiable information (PII), such as a customer’s name, available alongside the details of each transaction. Anonymous data, generally speaking, removes such PII but retains the transactional data. Clearly, both types of data pose significant challenges for financial organisations, which must protect this information while remaining compliant with ever more complex data protection regulations.

Yet there is an alternative AI-driven approach for financial organisations to consider in development - so-called synthetic or synthesized data. 


What Is Synthetic Data?

In essence, synthetic data is computer-synthesized data (powered by cutting-edge machine learning technology) that mimics original data.

When implemented accurately, the benefits of this approach include full data privacy compliance and less time spent on product development and testing; synthesizing data can take as little as 10 minutes. As a result, extremely sensitive data can be unlocked to turbocharge product or service development with no actual risk of a data breach.

Even so, to truly understand synthetic data and the impact it could have on the financial industry, it is important to consider what I believe are two common misconceptions about it.

1. Synthetic data refers to anonymised data

When data anonymisation is discussed, “synthetic” is frequently used interchangeably with “modified”, meaning that original data is altered in some systematic way to make it more difficult to identify the original data points.

In fact, there are three main approaches to data provisioning currently available on the market: 

  1. anonymised data - produced by a 1-to-1 transformation from original data,

  2. artificial data - produced by a probabilistic model, based on a sample of data,

  3. fully synthetic data - produced by a generative model of original data which “understands” what original data should look like.
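
To make the contrast between a 1-to-1 transformation and model-based generation concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than any vendor's method: the toy records, the function names, the use of a truncated hash to replace names, and the single normal distribution standing in for the far richer generative models described above.

```python
import hashlib
import random
import statistics

# Hypothetical toy records: (customer name, transaction amount).
original = [
    ("Alice Smith", 120.0),
    ("Bob Jones", 80.0),
    ("Carol White", 100.0),
    ("Dan Brown", 140.0),
]


def anonymise(records):
    """1-to-1 transformation: every output row still maps to one input row.

    Replacing the name with a truncated hash is only an illustrative
    choice; on its own it is closer to pseudonymisation.
    """
    return [
        (hashlib.sha256(name.encode()).hexdigest()[:8], amount)
        for name, amount in records
    ]


def fit_model(records):
    """Fit a deliberately simple model (mean and standard deviation)."""
    amounts = [amount for _, amount in records]
    return statistics.mean(amounts), statistics.stdev(amounts)


def generate(model, n, seed=0):
    """Sample brand-new rows from the model; none maps back to a customer."""
    mean, stdev = model
    rng = random.Random(seed)
    return [(f"generated-{i}", round(rng.gauss(mean, stdev), 2)) for i in range(n)]


anonymised = anonymise(original)
model = fit_model(original)
generated = generate(model, n=1000)

# Compare a summary statistic of the generated rows with the original.
print("original mean: ", statistics.mean(a for _, a in original))
print("generated mean:", round(statistics.mean(a for _, a in generated), 2))
```

The anonymised rows keep each original amount intact, so the link to a real transaction survives. The generated rows share only the overall statistical shape of the original, which is the property the synthetic approach relies on.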

So powerful is synthetic data that recent research from MIT found that by using high-quality data synthesized by an advanced machine learning engine, it is possible to get the same results for a data-driven project as using original data. 

2. Synthetic data can only ever be worse than real data

There can at times be an assumption that data which mimics real data is a poor copy of the original. The overall aim of synthetic data is to provide key insights that are just as good as those from original data. When used right, synthetic data can be just as insightful as needed; what is really critical is that the way the data is created is optimised for what financial organisations are looking for.

Because data synthesized by an algorithm can be created and examined in minutes, the potential to free up a financial organisation’s staff for other tasks is obvious: even collecting original data is time-consuming, taking an estimated 12.5% of development time.

The future of data, which is increasingly available on-demand, is actually already here. The financial organisations that face the data challenge directly, and utilise the power and efficiencies of synthetic data, will be the winners for digital customers, both now and into the future.  

By Dr Nicolai Baldin, CEO & Founder, Synthesized


Jun 16, 2021

SAS: Improving the British Army’s decision making with data

British Army
3 min
Roderick Crawford, VP and Country GM, explains the important role that SAS is playing in the British Army’s digital transformation

SAS’ long-standing relationship with the British Army is built on mutual respect and grounded in a reciprocal understanding of each other’s capabilities, strengths, and weaknesses. Roderick Crawford, VP and Country GM for SAS UKI, states that the company’s thorough grasp of the defence sector makes it an ideal partner for the Army as it undergoes its own digital transformation.

“Major General Jon Cole told us that he wanted to enable better, faster decision-making in order to improve operational efficiency,” he explains. Therefore, SAS’ task was to help the British Army realise the “significant potential” of data through the use of artificial intelligence (AI) to automate tasks and conduct complex analysis.

In 2020, the Army invested in the SAS ‘Viya platform’ as an overture to embarking on its new digital roadmap. The goal was to deliver a new way of working that enabled agility, flexibility, faster deployment, and reduced risk and cost: “SAS put a commercial framework in place to free the Army of limits in terms of their access to our tech capabilities.”

Doing so was important not just in terms of facilitating faster innovation but also, in Crawford’s words, to “connect the unconnected.” This means structuring data in a simultaneously secure and accessible manner for all skill levels, from analysts to data engineers and military commanders. The result is analytics and decision-making that drive innovation and increase collaboration.

Crawford also highlights the importance of the SAS platform’s open nature: “General Cole was very clear that the Army wanted a way to work with other data and analytics tools such as Python. We allow them to do that, but with improved governance and faster delivery capabilities.”

SAS realises that collaboration is at the heart of a strong partnership and has been closely developing a long-term roadmap with the Army. “Although we're separate organisations, we come together to work effectively as one,” says Crawford. “Companies usually find it very easy to partner with SAS because we're a very open, honest, and people-based business by nature.”

With digital technology itself changing with great regularity, it’s safe to imagine that SAS’ own relationship with the Army will become even closer and more diverse. As SAS assists it in enhancing its operational readiness and providing its commanders with a secure view of key data points, Crawford is certain that the company will have a continually valuable role to play.

“As warfare moves into what we might call ‘the grey-zone’, the need to understand, decide, and act on complex information streams and diverse sources has never been more important. AI, computer vision and natural language processing are technologies that we hope to exploit over the next three to five years in conjunction with the Army.”

Fundamentally, data analytics is a tool for gaining valuable insights and expediting the delivery of outcomes. The goal of the two parties’ partnership, concludes Crawford, will be to reach the point where both access to data and decision-making can be performed qualitatively and in real time.

“SAS is absolutely delighted to have this relationship with the British Army, and across the MOD. It’s a great privilege to be part of the armed forces covenant.”

