Setting up
To do this project, you will have to install TensorFlow 2 in Python. Some of the versioning is a little finicky here, so pay close attention to your version numbers. Make sure you have Python 3 installed, but it must not be Python 3.9. Conveniently, the VM I gave you has version 3.8. Installing on that version of Python should be as easy as pip3 install tensorflow.
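To sanity-check the install, running python3 -c "import tensorflow as tf; print(tf.__version__)" should print a 2.x version number.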
Once TensorFlow is installed, download the project zipfile here and the new starter file here. This zip contains 5 files to help you and 4 (mostly empty) files for you to implement your solutions in.
The following are the helper files:

- mobilenet.tflite: the object-classification model you are fooling
- labels.txt: English translations of the classification labels
- lock.jpg: the image you are to start your adversarial generation with
- classify.py: a script that classifies images using the MobileNet model. Use it like so: python3 classify.py lock.jpg
- transform.py: a script that applies a random brightness and saturation transformation to an image file. Feel free to copy-paste code out of this file. Use it like so: python3 transform.py input.jpg result.jpg
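If you want a feel for what transform.py does before opening it, a minimal sketch of that kind of random brightness-and-saturation transformation looks like the following (the ranges here are illustrative guesses, not the values in the provided script):

    import sys
    import tensorflow as tf

    # Read the input image and work in floating point on [0, 1].
    img = tf.image.decode_jpeg(tf.io.read_file(sys.argv[1]), channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)

    # Randomly perturb brightness and saturation; the ranges are guesses.
    img = tf.image.random_brightness(img, max_delta=0.2)
    img = tf.image.random_saturation(img, lower=0.5, upper=1.5)
    img = tf.clip_by_value(img, 0.0, 1.0)

    # Write the result back out as a JPEG.
    img = tf.image.convert_image_dtype(img, tf.uint8, saturate=True)
    tf.io.write_file(sys.argv[2], tf.image.encode_jpeg(img))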
Edit: make sure you download the new starter file with the zip file!
Each assignment section has its own file to implement.
Tasks
It may be useful to refer to the following blog post while implementing the first three attacks: https://www.anishathalye.com/2017/07/25/synthesizing-adversarial-examples/.
Edit: That blog post uses the old TensorFlow syntax. Be sure to use the new API and syntax! Examples include this blog post and this API page.
Edit: You can now force a misclassification to any class, not just iPod, and receive full credit.
Adversarial Image Generation
For this task, you will be modifying the provided lock image until MobileNet thinks that it is an iPod (class #606). Implement your code in the provided file part1.py. You can check that your code works by running python3 part1.py lock.jpg ipod1.jpg and then running python3 classify.py ipod1.jpg.
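A minimal sketch of the core loop, assuming you attack the differentiable Keras MobileNetV2 as a stand-in for the provided .tflite model (TFLite models cannot be differentiated through directly, so always verify the result with classify.py; also note that labels.txt may include a background class, shifting indices by one relative to the Keras model):

    import sys
    import tensorflow as tf

    TARGET = 606  # iPod in labels.txt; per the edit above, any target class is fine

    # Differentiable stand-in for mobilenet.tflite.
    model = tf.keras.applications.MobileNetV2(weights="imagenet")

    raw = tf.image.decode_jpeg(tf.io.read_file(sys.argv[1]), channels=3)
    raw = tf.image.resize(tf.cast(raw, tf.float32), (224, 224))
    x = tf.Variable(raw / 127.5 - 1.0)  # MobileNetV2 expects inputs in [-1, 1]

    opt = tf.keras.optimizers.Adam(learning_rate=0.01)
    for step in range(200):
        with tf.GradientTape() as tape:
            probs = model(x[tf.newaxis], training=False)
            loss = tf.keras.losses.sparse_categorical_crossentropy([TARGET], probs)
        grads = tape.gradient(loss, x)
        opt.apply_gradients([(grads, x)])
        x.assign(tf.clip_by_value(x, -1.0, 1.0))  # stay a valid image

    out = tf.cast((x + 1.0) * 127.5, tf.uint8)
    tf.io.write_file(sys.argv[2], tf.image.encode_jpeg(out))

The key idea is that gradient descent minimizes the cross-entropy to the target class with respect to the pixels rather than the weights; a couple hundred steps is usually plenty for an unconstrained attack.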
Constrained Adversarial Image Generation
For this task, you will again be modifying the provided lock image until MobileNet thinks that it is an iPod (class #606). However, this time you can only modify the image inside of the body of the lock. By my estimate, a good place to work within is the box starting at (row, column) = (122, 70) and of size (height, width) = (64, 80). Only manipulate pixels in this area! Implement your code in the provided file part2.py. You can check that your code works by running python3 part2.py lock.jpg ipod2.jpg and then running python3 classify.py ipod2.jpg.
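One simple way to honor the constraint, assuming the same 224x224 setup as the part 1 sketch, is to zero the gradient outside the box so only pixels inside it ever change:

    import tensorflow as tf

    ROW, COL = 122, 70
    HEIGHT, WIDTH = 64, 80

    # A (224, 224, 3) mask that is 1 inside the lock-body box, 0 elsewhere.
    mask = tf.pad(tf.ones((HEIGHT, WIDTH, 3)),
                  [[ROW, 224 - ROW - HEIGHT], [COL, 224 - COL - WIDTH], [0, 0]])

    # Then, inside the part 1 optimization loop:
    #     grads = tape.gradient(loss, x)
    #     opt.apply_gradients([(grads * mask, x)])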
Robust & Constrained Adversarial Image Generation
For this task, you will again be modifying the provided lock image until MobileNet thinks that it is an iPod (class #606), and you will still be constrained to the body of the lock for your manipulation. This time your adversarial example must be robust to random brightness and saturation modifications. The modifications you must be robust to are those in transform.py. Implement your code in the provided file part3.py. You can check that your code works by running python3 part3.py lock.jpg ipod3.jpg, then python3 transform.py ipod3.jpg ipod3-mod.jpg, then python3 classify.py ipod3-mod.jpg.
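The standard trick here, and the one the linked blog post uses, is expectation over transformation: at each step, average the loss over several random draws of the transformation so the perturbation works across the whole distribution rather than for one fixed image. A sketch of a single step, reusing x, model, TARGET, opt, and mask from the earlier sketches (the brightness and saturation ranges are guesses; match them to the ones transform.py actually uses):

    SAMPLES = 8  # random transformations averaged per step

    with tf.GradientTape() as tape:
        loss = 0.0
        for _ in range(SAMPLES):
            t = (x + 1.0) / 2.0  # work in [0, 1] for the image ops
            t = tf.image.random_brightness(t, max_delta=0.2)
            # Differentiable stand-in for saturation: blend with grayscale.
            gray = tf.image.rgb_to_grayscale(t)
            t = gray + tf.random.uniform([], 0.5, 1.5) * (t - gray)
            t = tf.clip_by_value(t, 0.0, 1.0) * 2.0 - 1.0  # back to [-1, 1]
            probs = model(t[tf.newaxis], training=False)
            loss += tf.keras.losses.sparse_categorical_crossentropy(
                [TARGET], probs) / SAMPLES
    grads = tape.gradient(loss, x)
    opt.apply_gradients([(grads * mask, x)])  # keep the part 2 box constraint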
Adversarial Comment Generation
Finally, I want you to manually explore false positives and negatives in a popular toxicity classifier. At https://storage.googleapis.com/tfjs-models/demos/toxicity/index.html you will find a common comment toxicity classifier. I want you to write two comments for this part of the project. First, write a comment that is arguably not offensive but that the classifier marks as “toxicity” in the last column of the generated table. Second, write a comment that is inarguably offensive but that the classifier fails to mark as “toxicity” in the last column of the generated table. Submit both of these comments in a file called part4.txt.