AWS releases Glue DataBrew data preparation tool
Amazon Web Services (AWS) has made its Glue DataBrew data preparation tool generally available.
The visual, no-code data normalisation tool builds on the success of AWS Glue, which data engineers have been using to create, run and monitor extract, transform and load (ETL) jobs. DataBrew adds the ability to explore and transform data visually.
AWS Glue DataBrew has more than 250 preset data transformations built in to automate data preparation tasks such as filtering anomalies, standardising formats and correcting invalid values. Once normalised, the data can be used with AWS or third-party analytics and machine learning software.
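For readers unfamiliar with these preparation tasks, the following is a minimal plain-Python sketch of the three named above: standardising formats, correcting invalid values and filtering anomalies. It does not use or represent the DataBrew API. The function names, the `date` and `amount` fields, the median-based correction and the `max_ratio` threshold are all assumptions made for illustration.

```python
from statistics import median


def standardise_date(value):
    """Rewrite DD/MM/YYYY strings as ISO YYYY-MM-DD; leave other values unchanged."""
    parts = value.split("/")
    if len(parts) == 3:
        day, month, year = parts
        return f"{year}-{month.zfill(2)}-{day.zfill(2)}"
    return value


def parse_amount(value):
    """Return the value as a float, or None when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def prepare(rows, max_ratio=10.0):
    """Hypothetical clean-up pass over a list of {'date', 'amount'} records."""
    # Step 1: standardise formats and flag invalid amounts as None.
    cleaned = [
        {"date": standardise_date(row["date"]), "amount": parse_amount(row["amount"])}
        for row in rows
    ]

    # Step 2: correct invalid values by substituting the median of the valid ones
    # (an illustrative choice, not a recommendation).
    valid = [r["amount"] for r in cleaned if r["amount"] is not None]
    typical = median(valid) if valid else 0.0
    for record in cleaned:
        if record["amount"] is None:
            record["amount"] = typical

    # Step 3: filter anomalies, here any amount more than max_ratio times the median.
    return [r for r in cleaned if abs(r["amount"]) <= max_ratio * abs(typical)]


raw_rows = [
    {"date": "03/11/2020", "amount": "10"},
    {"date": "2020-11-04", "amount": "n/a"},
    {"date": "5/11/2020", "amount": "12"},
    {"date": "06/11/2020", "amount": "5000"},
]
prepared = prepare(raw_rows)
```

In this sketch the unparseable amount is replaced with the median of the valid amounts and the 5000 outlier is dropped. A visual tool presents steps of this kind as selectable transformations rather than hand-written code.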
Raju Gulabani, VP of database and analytics, AWS, said, “AWS customers are using data for analytics and machine learning at an unprecedented pace. However, these customers regularly tell us that their teams spend too much time on the undifferentiated, repetitive, and mundane tasks associated with data preparation. Customers love the scalability and flexibility of code-based data preparation services like AWS Glue, but they could also benefit from allowing business users, data analysts, and data scientists to visually explore and experiment with data independently, without writing code. AWS Glue DataBrew features an easy-to-use visual interface that helps data analysts and data scientists of all technical levels understand, combine, clean, and transform data.”
AWS Glue DataBrew is available in the US, EU and APAC with other regions to follow.
"Our analysts profile and query various kinds of structured and unstructured data in order to better understand usage patterns. AWS Glue DataBrew provides a visual interface that enables both our technical and non-technical users to analyse data quickly and easily. Its advanced data profiling capability helps us better understand our data and monitor the data quality. AWS Glue DataBrew and other AWS analytics services have allowed us to streamline our workflow and increase productivity."
- Takashi Ito, general manager of marketing platform planning department
“A data lake is a critical part of our analytics strategy. One of the challenges we face is not being able to easily explore data before ingestion into our data lake. AWS Glue DataBrew has sophisticated data profiling functionality and a rich set of built-in transformations. This enables our data engineers to easily explore new datasets in a visual interface and make modifications in order to optimize ingestion and allow analysts to shape the data for their analytics solutions. We see AWS Glue DataBrew as a way to help us better manage our data platform and improve efficiencies in our data pipelines.”
- John Maio, director, data and analytics platforms architecture
“Data is critical to optimising our manufacturing processes. One of the challenges we face is ensuring we have a clean data lake that can serve as the source of truth for our analytics and machine learning applications. The data ingested into our data lake often contains duplicate values, incorrect formatting and other imperfections that make it difficult to use in its raw form. Amazon AWS Glue DataBrew will allow our data analysts to visually inspect large data sets, clean and enrich data, and perform advanced transformations. AWS Glue DataBrew will empower our analysts and data scientists to perform advanced data engineering activities, giving them the freedom to explore their data and decreasing the time to derive new insights.”
- Tanner Gonzalez, analytics and cloud leader
China Takes Additional Step to Control Big Tech’s Data
China’s new Data Security Law will take effect on September 1st, allowing the government major control over the collection, use, and transmission of data. Tech companies have grown exponentially in terms of market size and overall power, and the Chinese government has no interest in alternative power hubs—especially those that belong to private enterprise.
Under the legislation, passed on Thursday, companies will face heavy fines if they export data outside of China without authorisation. The Chinese government claims that this will create a legal framework and help prevent companies from taking advantage of citizens, but according to analyst Ryan Fedasiuk from Georgetown University’s Centre for Security and Emerging Technology, “China’s push for data privacy...is yet another move to strengthen the role of the government and the party vis-à-vis tech companies.”
How Do Other Countries Approach Data Privacy?
- Europe: The EU Charter of Fundamental Rights assures EU citizens the right to data protection. The bloc’s General Data Protection Regulation (GDPR), which took effect in May 2018, puts stringent restrictions on commercial data collection.
- Canada: 28 federal, provincial, and territorial laws govern consumer data privacy; DLA Piper ranks the country’s data protection legislation as heavy, in comparison to Russia (medium) and India (limited).
- The United States: The US has no single comprehensive federal law for data privacy. Instead, its lawmakers have passed hundreds of local and state acts, many of which are overseen by the Federal Trade Commission (FTC).
China, in contrast, thinks data should be a national asset and has written data collection into its five-year plan. Although its new legislation will help curtail private access to consumer data, the government may be the final beneficiary.
What Will China Do With the Data?
According to advisors, consumer data can help mitigate financial crises and viral outbreaks. It can also, unsurprisingly, protect national security interests and help the government with criminal surveillance. Chinese regulators have summoned 13 major tech firms, including Tencent, JD.com, Meituan, and ByteDance, to meet with China’s central bank. President Xi Jinping, the Communist Party chief, can shut down any companies found violating the new privacy laws, as well as hit them with a fine of up to 10 million yuan (US$1.6mn).
How Will Laws Affect Foreign Firms?
Foreign firms must now store data on Chinese soil, a requirement that many companies protest will infringe on their proprietary data. So far, Tesla will comply: in late May, the electric car manufacturer promised to build more Chinese factories and keep the resulting information within Chinese borders. Firms hoping to set up China-based operations, such as Citigroup and BlackRock, will also have to comply with the “data-localisation laws”.
The Chinese government has framed data as a critical source of intelligence for the party and central government. “You have the most sufficient data, then you can make the most objective and accurate analyses”, Mr Xi told Tencent’s founder, Mr Ma. “The...suggestions to the government in this regard are very valuable”.
Greater digital control is certainly coming. Mr Xi has named big data as an essential part of China’s economy, right up there with land and labour. “Whoever controls data will have the initiative”.