Public sector and synthetic data: Scotland's strategy, a NATO report, and public health datasets from the UK
Welcome to the first edition of the year. In this issue, we focus on how the public sector is getting involved with synthetic data and how this might influence technology adoption.
We tend to focus on vendors and users a lot, but public actors play an important role in the adoption and growth of synthetic data technology. They provide funding and resources for development, set standards and regulations to encourage (or frame) its use, and can partner with private companies or research institutions to promote synthetic data. Plus, by using the technology itself, the public sector helps raise awareness of the potential of synthetic data, ultimately leading to wider adoption.
Below are 5 recent examples that illustrate these efforts, plus additional resources and news from the industry. Happy reading!
📰 Recent initiatives of the public sector with synthetic data
Synthetic and public sector data for research purposes in Scotland: Research Data Scotland (RDS) has proposed a strategy to move forward with the production and research use of synthetic data in Scotland in collaboration with external partners and other data organizations. It includes the development of a framework for validating synthetic data utility, creating an information governance support system, developing training and accreditation requirements for users, and establishing a network for sharing knowledge and expertise about synthetic data (link)
NATO report on synthetic data: The NATO Strategic Communications Centre of Excellence published a report on the opportunities and potential risks of using synthetic data and investigated how OpenAI could support sentiment analysis. In the risk section, the authors explain how synthetic data may be used to support disinformation actors in their activities and note: "Most of the deepfakes over 2018–22 have imitated human faces and voices, but AI is now being used to alter maps, imagery, and X-rays and to generate text." (link)
Synthetic data challenge for defense: The National Security Innovation Network (NSIN), a program of the U.S. Department of Defense, is running a challenge with several partners to identify solutions for synthetic training data for computer vision algorithms in defense applications, with a $75,000 prize for the top performer. (link)
Release of synthetic public datasets for research: CPRD, the UK research service for public health and clinical studies, has made synthetic health datasets available for training purposes. It currently offers two: a cardiovascular disease synthetic dataset and a COVID-19 symptoms and risk factors synthetic dataset. (link)
Regulating synthetic data: The EU Data Governance Act, applicable from September 2023, might drive the adoption of synthetic data technology. The Act aims to encourage the sharing of public sector data and supports the use of anonymization techniques. (link)
⚙ New synthetic data companies and tools
📣 From the community
Researchers from the University of Chile conducted a study on the use of synthetic data to train credit scoring models. They found that models trained on synthetic data showed a 3% reduction in AUC and a 6% reduction in KS compared with models trained on real data. They also noted that increasing the number of features negatively impacted the quality of the synthesized data. (link)
This article discusses the use of synthetic patient data in in silico trials to generate virtual population cohorts and make those virtual populations more diverse. (gated research - link)
IBM and the University of California San Diego have published research showing that pre-training machine learning models using synthetic data could lead to improved machine translation, enhancing the translation quality for low-resource languages (link)
Learn about privacy, differential privacy, and how to use OpenDP to generate differentially-private synthetic data with this 7-minute video from David Zagardo. (link)
Introduction to Synthetic Data for Researchers: a live workshop is taking place on February 21 if you want to be introduced to the topic. (link)
A comprehensive review paper on the use of synthetic data in healthcare, identifying 7 use cases such as simulation and prediction research, hypothesis testing, health IT development, and education. (link)
I post this content every two weeks. You can subscribe below to receive it by email. Have a great week. ✌