Please enable javascript for this website.

Feasibility of Backdoor Poisoning Attacks during Pre-Training & Fine-Tuning

Authors

Abstract

Looks at the risks of malicious manipulation in the training of large language models, where attackers could introduce harmful behavior by altering a small portion of the training data. It focuses on how such attacks could occur during the early training phase, even when most of the data is clean and unaffected.

Notes

Work in progress