Mitigating Data Poisoning in Text Classification with Differential Privacy

Chang Xu, Jun Wang, Francisco Guzmán, Benjamin Rubinstein, Trevor Cohn

November 2021 EMNLP 2021, Adversarial, Text Classification, Differential Privacy

Abstract

NLP models are vulnerable to data poisoning attacks. One type of attack can plant a backdoor in a model by injecting poisoned examples in training, causing the victim model to misclassify test instances which include a specific pattern. Although defences exist to counter these attacks, they are specific to an attack type or pattern. In this paper, we propose a generic defence mechanism by making the training process robust to poisoning attacks through gradient shaping methods, based on differentially private training. We show that our method is highly effective in mitigating, or even eliminating, poisoning attacks on text classification, with only a small cost in predictive accuracy.
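The abstract names gradient shaping based on differentially private training but does not describe the update itself. The following is a minimal sketch, assuming the common DP-SGD recipe of clipping each example's gradient to a fixed norm and adding Gaussian noise to the sum. It is not the authors' implementation; the function name `shaped_gradient` and the parameters `clip_norm` and `noise_multiplier` are hypothetical and chosen only for illustration.

```python
import math
import random


def shaped_gradient(per_example_grads, clip_norm=1.0, noise_multiplier=0.5, rng=None):
    """Average per-example gradients after clipping and adding Gaussian noise.

    Hypothetical helper (not from the paper): clipping bounds how much any
    single training example, poisoned or clean, can move the model, and the
    noise further masks the contribution of individual examples.
    """
    rng = rng or random.Random()
    dim = len(per_example_grads[0])
    total = [0.0] * dim

    for grad in per_example_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale the gradient down only if its L2 norm exceeds clip_norm.
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for i, x in enumerate(grad):
            total[i] += x * scale

    # Noise standard deviation is tied to the clipping bound (assumed convention).
    sigma = noise_multiplier * clip_norm
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


# Usage: one oversized gradient (norm 5) and one small gradient (norm 0.5).
grads = [[3.0, 4.0], [0.3, 0.4]]
update = shaped_gradient(grads, clip_norm=1.0, noise_multiplier=0.5, rng=random.Random(0))
```

In this sketch the first gradient is rescaled to norm 1 before averaging while the second passes through unchanged, so an example with an unusually large gradient cannot dominate the update. Stronger clipping and more noise limit that influence further, at some cost in accuracy on clean data.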

Type

Conference paper

Publication

Findings of the Association for Computational Linguistics: EMNLP 2021
