We need more control over how our data is used.
The volume of “free” data on the internet has been critical to the current success of deep learning. However, it also raises privacy concerns about the unauthorized exploitation of personal data for training commercial models. We believe it’s crucial to develop methods that allow individuals to take active steps to prevent their data from being exploited without authorization.
In this research, we present a type of error-minimizing (unlearnable) noise that makes training examples unlearnable to deep learning models. An individual can use the unlearnable noise to tag their data so that it cannot easily be used by others to train machine learning systems. This gives the owner more control over how their data is used.
What are Unlearnable Examples?
AI is supposed to learn from its own mistakes (errors). But what happens if there are no mistakes, or if everything is too easy to learn? Will learning stop?
Unlearnable examples exploit this aspect of AI and trick the model into believing there is nothing to learn. Deep Neural Networks (DNNs) trained on unlearnable examples perform no better than random guessing on normal test examples.
Effectiveness of different types of noise on the CIFAR-10 dataset: random, adversarial (error-maximizing), and our proposed error-minimizing noise. The lower the clean test accuracy, the more effective the noise.
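For readers who want a little more detail: roughly speaking, the error-minimizing noise is found by solving a “min-min” problem, where the perturbation δ is optimized to minimize the training loss (so each example appears already learned) while the model parameters θ are trained as usual. A sketch of the objective is shown below, with ε a small L∞ imperceptibility budget.

$$
\min_{\theta} \; \mathbb{E}_{(x,\, y) \sim \mathcal{D}} \left[ \min_{\|\delta\|_{\infty} \le \epsilon} \mathcal{L}\big(f_{\theta}(x + \delta),\, y\big) \right]
$$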
How do unlearnable examples differ from adversarial examples?
Adversarial examples can protect you from being recognized by an already-trained model (the model learned from version A of your data, while the adversarial example is version B), whereas unlearnable examples prevent your data from contributing to any model training in the first place.
How to use unlearnable noise to protect your data?
Before you release your data into the wild, add an imperceptible noise to it to create unlearnable data. The noise is pre-generated on a public dataset, one pattern per image category (class); you then pick the noise that matches your image’s category and add it to the image, as in the sketch below. The noise can also be generated for each image individually (sample-wise noise).
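As a rough illustration only (not the official tool), applying pre-generated class-wise noise could look like the following sketch; the noise file name, its tensor layout, and the 8/255 budget are assumptions made for the example.

```python
import torch

EPSILON = 8 / 255  # assumed imperceptibility budget (L-infinity)

# Hypothetical file of pre-generated class-wise noise:
# one pattern per class, shape [num_classes, C, H, W].
classwise_noise = torch.load("classwise_noise.pt")  # e.g. [10, 3, 32, 32] for CIFAR-10

def make_unlearnable(image: torch.Tensor, label: int) -> torch.Tensor:
    """Add the noise pattern of the image's class and keep pixels in [0, 1].

    image: float tensor in [0, 1] with shape [C, H, W]
    label: integer class index of the image
    """
    noise = classwise_noise[label].clamp(-EPSILON, EPSILON)
    return (image + noise).clamp(0.0, 1.0)
```

You would run this over every image you intend to publish and release only the noised copies; the originals stay private.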
We plan to develop an app for you to use in the future.
Examples on CIFAR-10
In our code repository, we provide a QuickStart notebook with a minimal implementation of sample-wise error-minimizing noise.
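The notebook is the reference implementation; the sketch below only conveys the core idea of sample-wise error-minimizing noise, using a PGD-style inner loop that minimizes (rather than maximizes) the training loss. The step size, number of steps, and ε value here are assumptions.

```python
import torch
import torch.nn.functional as F

def error_minimizing_noise(model, images, labels,
                           epsilon=8 / 255, step_size=0.8 / 255, steps=20):
    """Find per-sample noise delta (||delta||_inf <= epsilon) that minimizes
    the model's loss, so the noised examples appear 'already learned'."""
    delta = torch.zeros_like(images, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(images + delta), labels)
        grad, = torch.autograd.grad(loss, delta)
        with torch.no_grad():
            # Descend (not ascend) the loss: the opposite of adversarial noise.
            delta -= step_size * grad.sign()
            delta.clamp_(-epsilon, epsilon)
            # Keep the perturbed images inside the valid pixel range.
            delta.copy_((images + delta).clamp(0, 1) - images)
    return delta.detach()
```

In the full method, this inner step alternates with ordinary training of the model, following the min-min objective sketched earlier.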
An example of unlearnable examples on CIFAR-10. From left to right: original images, visualization of the error-minimizing noise, and unlearnable images.
ICLR-2021 Poster
Researchers
- Hanxun Huang, PhD student, The University of Melbourne
- Xingjun Ma, Lecturer, Deakin University
- Sarah Erfani, Senior Lecturer, The University of Melbourne
- James Bailey, Professor, The University of Melbourne
- Yisen Wang, Assistant Professor, Peking University
- Top row: original photos.
- Bottom row: unlearnable photos generated using our technology.
Media Coverage
- Pursuit: Blocking AI to keep your personal data your own
- Gadgets 360: Worried About Privacy for Your Selfies? These Tools Can Help Spoof Facial Recognition AI
- MIT Technology Review: How to stop AI from recognizing your face in selfies
Cite Our Work
@inproceedings{huang2021unlearnable,
  title={Unlearnable Examples: Making Personal Data Unexploitable},
  author={Hanxun Huang and Xingjun Ma and Sarah Monazam Erfani and James Bailey and Yisen Wang},
  booktitle={ICLR},
  year={2021}
}