Looks at the risks of malicious manipulation in the training of large language models, where attackers could introduce harmful behavior by altering a small portion of the training data. It focuses on how such attacks could occur during the early training phase, even when most of the data is clean and unaffected.
Work in progress