We are now the AI Security Institute
Please enable javascript for this website.

Feasibility of Backdoor Poisoning Attacks during Pre-Training & Fine-Tuning

Read the Full Paper

Authors

No items found.

Abstract

Looks at the risks of malicious manipulation in the training of large language models, where attackers could introduce harmful behavior by altering a small portion of the training data. It focuses on how such attacks could occur during the early training phase, even when most of the data is clean and unaffected.

Notes

Work in progress