Authors
Congzheng Song, Thomas Ristenpart, Vitaly Shmatikov
Publication date
2017/10/30
Book
Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
Pages
587-601
Description
Machine learning (ML) is becoming a commodity. Numerous ML frameworks and services are available to data holders who are not ML experts but want to train predictive models on their data. It is important that ML models trained on sensitive inputs (e.g., personal images or documents) not leak too much information about the training data.
We consider a malicious ML provider who supplies model-training code to the data holder, does not observe the training, but then obtains white- or black-box access to the resulting model. In this setting, we design and implement practical algorithms, some of them very similar to standard ML techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model, yet the model is as accurate and predictive as a conventionally trained model. We then explain how the adversary can extract memorized …
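A minimal sketch of what such a malicious training pipeline could look like, assuming PyTorch and written in the spirit of the paper's sign-encoding idea: an extra penalty term nudges the sign of each chosen parameter toward one bit of a secret, while the ordinary task loss keeps the model accurate, and a white-box attacker later reads the bits back from the parameter signs. The toy dataset, model size, penalty weight, and variable names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): a sign-encoding style
# malicious regularizer. The provider's training code adds a
# penalty that is zero once sign(w_i) == secret_i, so the model
# memorizes the secret bits without hurting task accuracy much.
import torch

torch.manual_seed(0)

# Toy task: logistic regression on random data (illustrative).
X, y = torch.randn(256, 32), torch.randint(0, 2, (256,)).float()
model = torch.nn.Linear(32, 1)

# Hypothetical secret: 32 bits mapped to {-1, +1}.
secret = torch.randint(0, 2, (32,)) * 2 - 1

opt = torch.optim.SGD(model.parameters(), lr=0.1)
bce = torch.nn.BCEWithLogitsLoss()

for _ in range(500):
    opt.zero_grad()
    task_loss = bce(model(X).squeeze(1), y)
    w = model.weight.flatten()[: secret.numel()]
    # relu(-w * s) is positive only when a parameter's sign
    # disagrees with the bit it should encode.
    penalty = torch.relu(-w * secret.float()).mean()
    (task_loss + 10.0 * penalty).backward()
    opt.step()

# White-box extraction: recover the bits from parameter signs.
recovered = torch.sign(model.weight.flatten()[: secret.numel()]).long()
print("fraction of bits recovered:", (recovered == secret).float().mean().item())
```

The penalty weight trades off stealth against reliability: a larger weight makes bit recovery more robust but risks a visible drop in test accuracy, which is exactly the tension the paper's evaluation measures.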
Total citations
2017: 3, 2018: 24, 2019: 49, 2020: 88, 2021: 109, 2022: 124, 2023: 132, 2024: 37
Scholar articles
C Song, T Ristenpart, V Shmatikov - Proceedings of the 2017 ACM SIGSAC Conference on …, 2017