Authors
Congzheng Song, Thomas Ristenpart, Vitaly Shmatikov
Publication date
2017/10/30
Book
Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security
Pages
587-601
Description
Machine learning (ML) is becoming a commodity. Numerous ML frameworks and services are available to data holders who are not ML experts but want to train predictive models on their data. It is important that ML models trained on sensitive inputs (e.g., personal images or documents) not leak too much information about the training data.
We consider a malicious ML provider who supplies model-training code to the data holder, does not observe the training, but then obtains white- or black-box access to the resulting model. In this setting, we design and implement practical algorithms, some of them very similar to standard ML techniques such as regularization and data augmentation, that "memorize" information about the training dataset in the model, yet the model is as accurate and predictive as a conventionally trained model. We then explain how the adversary can extract memorized …
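A minimal sketch of what such a malicious training pipeline could look like, assuming PyTorch and written in the spirit of the paper's sign-encoding idea: an extra penalty term nudges the sign of each chosen parameter toward one bit of a secret, while the ordinary task loss keeps the model accurate, and a white-box attacker later reads the bits back from the parameter signs. The toy dataset, model size, penalty weight, and variable names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch (not the authors' code): a sign-encoding style
# malicious regularizer. The provider's training code adds a
# penalty that is zero once sign(w_i) == secret_i, so the model
# memorizes the secret bits without hurting task accuracy much.
import torch

torch.manual_seed(0)

# Toy task: logistic regression on random data (illustrative).
X, y = torch.randn(256, 32), torch.randint(0, 2, (256,)).float()
model = torch.nn.Linear(32, 1)

# Hypothetical secret: 32 bits mapped to {-1, +1}.
secret = torch.randint(0, 2, (32,)) * 2 - 1

opt = torch.optim.SGD(model.parameters(), lr=0.1)
bce = torch.nn.BCEWithLogitsLoss()

for _ in range(500):
    opt.zero_grad()
    task_loss = bce(model(X).squeeze(1), y)
    w = model.weight.flatten()[: secret.numel()]
    # relu(-w * s) is positive only when a parameter's sign
    # disagrees with the bit it should encode.
    penalty = torch.relu(-w * secret.float()).mean()
    (task_loss + 10.0 * penalty).backward()
    opt.step()

# White-box extraction: recover the bits from parameter signs.
recovered = torch.sign(model.weight.flatten()[: secret.numel()]).long()
print("fraction of bits recovered:", (recovered == secret).float().mean().item())
```

The penalty weight trades off stealth against reliability: a larger weight makes bit recovery more robust but risks a visible drop in test accuracy, which is exactly the tension the paper's evaluation measures.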
Total citations
2017: 3, 2018: 24, 2019: 49, 2020: 88, 2021: 109, 2022: 124, 2023: 132, 2024: 37
Scholar articles
C Song, T Ristenpart, V Shmatikov - Proceedings of the 2017 ACM SIGSAC Conference on …, 2017