Data DNA to secure generative modeling

November 24, 2021

What are the consequences if machines are smart but have no identity? How can machines, and the people who create them, be held accountable for their actions toward human society?

These are among the questions being explored by Yi “Max” Ren, assistant professor of aerospace and mechanical engineering in the Ira A. Fulton Schools of Engineering at Arizona State University. Ren and a team of ASU researchers recently received a grant from the National Science Foundation to study the attribution and secure training of generative models.

A team of ASU researchers led by Assistant Professor Yi “Max” Ren is working on creating digital DNA for generative models, which can be used to create content such as deepfake photos and videos. Their work will attribute generated content to the creators of the models and, in the event of malicious activity, enable accountability. Image courtesy of Shutterstock

Generative models learn the distributions of a wide range of high-dimensional, real-world data, such as the cat images that dominate the web, human speech, driving behavior and material microstructures, among many other types of content. In doing so, the models gain the ability to synthesize new content that resembles the genuine content.

This ability of generative models has inspired many creative ideas that are materializing in the real world, from super-resolution photography to autonomous vehicles and computational materials discovery. However, advances in generative models have also created two major sociotechnical challenges.

First, generative models have been used to create deepfakes. As a result, it is no longer possible to reliably determine whether an image, video, audio recording or chat message was created by a human or by artificial intelligence. In fact, generative models have reportedly been used for espionage and malicious impersonation operations.

Second, training such models may require collaboration among multiple data providers to improve the models’ performance and reduce bias. Such collaboration, however, can expose proprietary data sets (e.g., medical records) and raise privacy concerns.

With support from the NSF grant, Fulton Schools researchers are working to make generative models easier to regulate when they are released and more secure when they are trained.

Ren and co-principal investigators Ni Trieu and Yezhou “YZ” Yang, both assistant professors of computer science and engineering, will address two open challenges that accompany recent advances in generative models and their use to synthesize media and scientific content.

The three researchers form a strong interdisciplinary collaboration spanning artificial intelligence (Yang), security (Trieu) and optimization (Ren). All three have worked in research areas related to this project for years.

“We hope that the results of our study will help assess the technical feasibility of legislative instruments being developed to address the emerging deepfake crisis and concerns about data privacy,” said Ren.

In each row above, the creator’s “key” on the left is embedded in each of the four images on the right. The imperceptible key, which works much the way DNA can identify an individual, helps attribute an image to its creator in the event that it is used maliciously. Image created by Changhoon Kim / ASU

The project will develop new mathematical theories and computational tools to assess the feasibility of two solutions to these challenges. If successful, the project will provide technical guidance for designing future regulations that ensure the safe development and dissemination of generative models.

The team’s solution to the first challenge is “model attribution.” When a generative model is developed, an independent registry embeds unique keys that act as a “watermark” on every copy of the model released to end users. This allows the registry to attribute generated content to its owner.
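To make the registry idea concrete, here is a toy sketch in Python. It assumes that each generated output can be treated as a vector, that the registry issues one random unit-vector key per model copy, that a copy "watermarks" its output by shifting it along its key, and that attribution picks the key with the largest projection onto the output. The names `Registry`, `issue_key`, `watermark` and `attribute` are hypothetical, and this linear scheme is an illustrative simplification, not the team's method, which the article does not detail.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def random_unit_key(dim, rng):
    """Draw a random direction and normalize it to unit length."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

class Registry:
    """Hypothetical registry that issues one key per released model copy."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.keys = {}  # owner name -> unit key vector

    def issue_key(self, owner):
        key = random_unit_key(self.dim, self.rng)
        self.keys[owner] = key
        return key

    def attribute(self, sample):
        """Return the owner whose key has the largest projection onto the sample."""
        return max(self.keys, key=lambda owner: dot(self.keys[owner], sample))

def watermark(sample, key, strength):
    """Shift a generated sample along its owner's key direction (toy assumption)."""
    return [x + strength * k for x, k in zip(sample, key)]

# Usage: three model copies, one watermarked output from "bob".
registry = Registry(dim=256, seed=42)
for owner in ["alice", "bob", "carol"]:
    registry.issue_key(owner)

rng = random.Random(7)
clean = [rng.gauss(0.0, 0.1) for _ in range(256)]  # stand-in for a generated sample
marked = watermark(clean, registry.keys["bob"], strength=1.0)
print(registry.attribute(marked))
```

Random keys in a high-dimensional space are nearly orthogonal, so a shift along one key barely registers on the others; that is what lets the projection test single out the right owner in this toy setting.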

“Watermarking is a system design problem where we have to strike a good balance among three metrics that trade off against one another: attribution accuracy, generation quality and model capacity,” Ren said.

He illustrates the challenge of striking this balance by picturing each copy of the model as a dumpling; these dumplings all surround the original, authentic data set.

“For greater attribution accuracy, we would like these dumplings to be separated from each other, which results either in a drop in generation quality as they move away from the authentic data set, or in a decrease in model capacity if we keep all the models within a certain distance of it,” Ren said. “We are trying to improve on this compromise by using the fact that semantic differences are not captured by Euclidean distances.”
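The three-way trade-off can be simulated in a few lines. The sketch below is a toy model under stated assumptions: keys are random unit vectors, a watermark shifts a sample by `strength` along its owner's key, the Euclidean distance that shift adds stands in for lost generation quality, and the number of keys packed into the same space stands in for model capacity. The function `simulate` and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random

def unit(v):
    """Normalize a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def simulate(num_keys, strength, dim=64, noise=0.5, trials=200, seed=0):
    """Toy trade-off: return (attribution accuracy, mean Euclidean distortion)."""
    rng = random.Random(seed)
    keys = [unit([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(num_keys)]
    correct = 0
    total_distortion = 0.0
    for _ in range(trials):
        owner = rng.randrange(num_keys)
        # The unmarked sample itself acts as interference for the key detector.
        base = [rng.gauss(0.0, noise) for _ in range(dim)]
        marked = [b + strength * k for b, k in zip(base, keys[owner])]
        # Proxy for quality loss: how far the watermark moves the sample.
        total_distortion += math.dist(marked, base)
        guess = max(range(num_keys), key=lambda i: dot(keys[i], marked))
        correct += guess == owner
    return correct / trials, total_distortion / trials

for num_keys in (2, 32):
    for strength in (0.5, 1.0, 3.0):
        acc, distortion = simulate(num_keys, strength)
        print(f"keys={num_keys:2d} strength={strength}: "
              f"accuracy={acc:.2f}, distortion={distortion:.2f}")
```

In this toy, a stronger watermark raises attribution accuracy but moves samples farther from the unmarked data, and packing more keys into the same space lowers accuracy at a fixed strength, mirroring the accuracy, quality and capacity tension Ren describes.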

For example, if an AI generates a set of faces, adding a strand of red hair to every generated face as an identifying watermark could mean a significant departure from the authentic data in Euclidean space. Semantically, however, the content remains high quality, because the faces would still look nearly identical to those without the watermark.

“This idea can allow us to learn a way to pack a lot of ‘dumplings’ so that they don’t overlap, which enables high attribution accuracy while keeping the content semantically valid and the generation quality high,” Ren said.
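The packing intuition can be illustrated with a toy count: how many key directions can be kept so that no two are closer than a chosen angle? The sketch below greedily accepts random unit vectors whose cosine similarity to every already-kept vector is below 0.5 (an angle greater than 60 degrees). Treating "more dimensions" as a stand-in for "more semantically harmless directions to move in" is our assumption for illustration; `greedy_pack` is a hypothetical helper and this is not the team's algorithm.

```python
import math
import random

def unit(v):
    """Normalize a vector to unit Euclidean length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def greedy_pack(dim, max_cos=0.5, candidates=200, seed=0):
    """Keep random unit keys whose cosine similarity to every kept key is below max_cos."""
    rng = random.Random(seed)
    kept = []
    for _ in range(candidates):
        v = unit([rng.gauss(0.0, 1.0) for _ in range(dim)])
        # Reject the candidate if it would "overlap" an existing dumpling.
        if all(sum(a * b for a, b in zip(v, k)) < max_cos for k in kept):
            kept.append(v)
    return kept

for dim in (4, 16, 64):
    print(f"dim={dim:2d}: {len(greedy_pack(dim))} non-overlapping keys")
```

In four dimensions at most 24 unit vectors can be pairwise more than 60 degrees apart (the kissing number in that dimension), so the count saturates quickly; with more room to move, far more keys fit without overlapping.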

The second challenge concerns data privacy: how to keep private data secure while generative models are being trained.

The research team is studying secure multiparty training of generative models. Data privacy and training scalability will be balanced through the design of secure model architectures and loss functions.

“We propose secure key generation and secure training of generative models in a setting where data providers split and encrypt their data across multiple servers, on which the computation is performed,” said Yang. “To this end, we are studying scalable secure computation algorithms suited to generative models.”
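One standard building block for computing on data split across servers is additive secret sharing. The sketch below shows it under stated assumptions: each provider encodes a real number in fixed point, splits it into random shares modulo a prime, and sends one share to each server; each server adds up only the shares it holds, and combining the servers' totals reveals the aggregate but no individual value. The modulus, scale and helper names are hypothetical, and this is not necessarily the protocol the team uses. A real training system would also need secure multiplication for nonlinear layers and protection against misbehaving servers.

```python
import random

PRIME = 2**61 - 1  # field modulus for share arithmetic (a Mersenne prime)
SCALE = 10**6      # fixed-point scale for encoding real numbers

def encode(x):
    """Map a real number to a field element using fixed-point encoding."""
    return round(x * SCALE) % PRIME

def decode(v):
    """Map a field element back to a real number; the upper half encodes negatives."""
    if v > PRIME // 2:
        v -= PRIME
    return v / SCALE

def share(x, num_servers, rng):
    """Split x into additive shares that sum to encode(x) modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(num_servers - 1)]
    last = (encode(x) - sum(shares)) % PRIME
    return shares + [last]

def reconstruct(shares):
    """Recombine shares; any subset smaller than all of them looks uniformly random."""
    return decode(sum(shares) % PRIME)

# Usage: three providers, each with one private value, and three servers.
rng = random.SystemRandom()  # OS-backed randomness for generating shares
provider_values = [0.25, -1.5, 3.125]
num_servers = 3

server_inbox = [[] for _ in range(num_servers)]
for value in provider_values:
    for server, s in enumerate(share(value, num_servers, rng)):
        server_inbox[server].append(s)

# Each server sums only the shares it holds; no server sees a raw value.
server_totals = [sum(inbox) % PRIME for inbox in server_inbox]
total = reconstruct(server_totals)
print(total)  # 1.875
```

Because addition commutes with sharing, the servers can aggregate quantities such as gradients without decrypting them; the fixed-point encoding is what lets real-valued model data live in modular arithmetic.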

According to Ren, this research could extend into other areas that may currently be considered science fiction but could quickly become reality as watermarks evolve into data “DNA,” functioning as a genetic key that reveals where data comes from.

“There will be a need for artificial DNA for androids in the future,” said Ren, who believes that androids will form future societies alongside humans. “This subject has been covered in many science fiction novels, movies and games. Often, these fictional stories involve adversarial androids that remove their DNA so they can attack maliciously without humans knowing their origin, and how society changes as it responds to these attacks. Our study on deepfakes is among the first to address this problem.”
