Stealing Part of a Production Language Model
2024-03-13 23:25:20 · packetstormsecurity.com

In this paper, the authors introduce the first model-stealing attack that extracts precise, nontrivial information from black-box production language models such as OpenAI's ChatGPT or Google's PaLM-2. Specifically, the attack recovers the embedding projection layer (up to symmetries) of a transformer model, given only typical API access. For under $20 USD, the attack extracts the entire projection matrix of OpenAI's ada and babbage language models, confirming for the first time that these black-box models have hidden dimensions of 1024 and 2048, respectively. The authors also recover the exact hidden dimension of the gpt-3.5-turbo model and estimate that it would cost under $2,000 in queries to recover that model's entire projection matrix. The paper concludes with potential defenses and mitigations, and discusses the implications of future work that could extend the attack.
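The key observation behind the attack is that a transformer's output logits are a linear function of its final hidden state: logits = W · g(p), where W is the vocab_size × hidden_dim embedding projection matrix and g(p) is the hidden state for prompt p. Logit vectors collected across many prompts therefore lie in a hidden_dim-dimensional subspace, so the singular values of a matrix of logit vectors expose the hidden dimension, and the leading left singular vectors recover W up to an unknown invertible hidden_dim × hidden_dim transformation. Below is a minimal sketch of that idea against a simulated black box; the query_logits helper, the toy dimensions, and the assumption that the API returns full logit vectors are illustrative simplifications, not the authors' code.

```python
import numpy as np

# Toy stand-in for a production model: logits are a linear map of the
# hidden state, logits = W @ g(prompt). W is secret to the "attacker".
rng = np.random.default_rng(0)
vocab_size, hidden_dim = 5000, 256  # illustrative sizes, not a real model's

W = rng.normal(size=(vocab_size, hidden_dim))  # secret projection matrix

def query_logits() -> np.ndarray:
    """Simulated black-box API call: one full logit vector per 'prompt'."""
    g = rng.normal(size=hidden_dim)  # hidden state varies with the prompt
    return W @ g

# Step 1: collect more logit vectors than the suspected hidden dimension.
n_queries = hidden_dim + 200
Q = np.stack([query_logits() for _ in range(n_queries)], axis=1)

# Step 2: the numerical rank of Q equals hidden_dim; locate the sharp
# drop in the singular-value spectrum.
U, s, _ = np.linalg.svd(Q, full_matrices=False)
ratios = s[:-1] / np.maximum(s[1:], 1e-12)
est_hidden_dim = int(np.argmax(ratios)) + 1
print("estimated hidden dimension:", est_hidden_dim)  # -> 256

# Step 3: the leading left singular vectors recover W "up to symmetries",
# i.e. W_recovered == W @ M for some unknown invertible hidden_dim x hidden_dim M.
W_recovered = U[:, :est_hidden_dim] * s[:est_hidden_dim]

# Sanity check: W_recovered's columns lie in the column space of W.
M, *_ = np.linalg.lstsq(W, W_recovered, rcond=None)
print("spans match:", np.allclose(W @ M, W_recovered))
```

The ratio-of-consecutive-singular-values heuristic for finding the rank cliff is just one simple choice; with exact logits, the spectrum drops to floating-point noise past index hidden_dim, so any numerical-rank estimate works. Against real APIs that expose only top-k log probabilities, the paper additionally reconstructs full logit vectors (e.g., via the logit-bias parameter), which is where most of the quoted query cost goes.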


Source: https://packetstormsecurity.com/files/177567/2403.06634.pdf