Can Large Language Models Resist Transfer Attacks on Coding Tasks?

Original title: Transfer Attacks and Defenses for Large Language Models on Coding Tasks

Authors: Chi Zhang, Zifan Wang, Ravi Mangal, Matt Fredrikson, Limin Jia, Corina Pasareanu

The article examines how modern large language models (LLMs), such as ChatGPT, handle coding tasks compared with earlier, smaller code models like code2seq and seq2seq. Although LLMs perform well on code-related tasks such as summarization and vulnerability detection, they remain susceptible to adversarial examples: small syntactic changes, such as inserting dead code or irrelevant print statements, that fool a model without altering the code’s meaning. The study investigates whether adversarial examples crafted against smaller code models transfer to LLMs, and proposes prompt-based defenses that augment the prompt with examples of adversarially perturbed code and instructions for reversing those perturbations. The results show that adversarial examples generated on smaller models do degrade LLM performance, while the proposed defenses noticeably improve robustness, pointing toward more resilient language models for coding tasks. A small illustrative sketch of both ideas follows.
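The sketch below is only meant to convey the flavor of the two ideas described above, not the paper's actual attack or defense pipeline: the authors craft adversarial examples with attacks on smaller code models and transfer them to LLMs, whereas this snippet just applies a hand-written semantics-preserving perturbation and builds a hypothetical defensive prompt. The helper names (`insert_dead_code`, `build_defensive_prompt`) are invented for illustration.

```python
def insert_dead_code(source: str) -> str:
    """Semantics-preserving perturbation: prepend a statement that never
    executes, so the program's behavior is unchanged but its surface form differs."""
    dead_stmt = "if False:\n    print('unreachable')\n"
    return dead_stmt + source


def build_defensive_prompt(perturbed_code: str) -> str:
    """Prompt-based defense (hypothetical template): show the model an example
    of a meaning-preserving adversarial edit and ask it to undo such edits
    before performing the task."""
    example = (
        "Example of an adversarial edit: adding `if False: print('x')` or an "
        "unused variable. Such edits do not change what the code computes."
    )
    return (
        "You will be given code that may contain small, meaning-preserving "
        "adversarial edits (dead code, irrelevant print statements).\n"
        f"{example}\n"
        "First remove any such edits, then summarize what the cleaned code does.\n\n"
        f"Code:\n{perturbed_code}\n"
    )


if __name__ == "__main__":
    clean = "def add(a, b):\n    return a + b\n"
    adversarial = insert_dead_code(clean)
    print(build_defensive_prompt(adversarial))
```

The resulting prompt string would then be sent to the LLM in place of the raw (possibly perturbed) code, which is the essence of the prompt-based defense the paper evaluates.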

Original article: https://arxiv.org/abs/2311.13445