The Generation and Serving Procedures of Typical LLMs: A Quick Explanation

2024-12-14 03:31:25 Author: hackernoon.com

In this section, we describe the generation and serving procedures of typical LLMs and the iteration-level scheduling used in LLM serving.

The task of language modeling is to model the probability of a list of tokens $(x_1, \ldots, x_n)$. Since language has a natural sequential ordering, it is common to factorize the joint probability over the whole sequence as the product of conditional probabilities (a.k.a. autoregressive decomposition [3]):

$$P(x) = P(x_1) \cdot P(x_2 \mid x_1) \cdots P(x_n \mid x_1, \ldots, x_{n-1})$$
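As a rough illustration of the autoregressive decomposition described above, the following sketch scores a token sequence by multiplying per-token conditional probabilities (summed in log space for numerical stability). The names `toy_next_token_dist`, `sequence_log_prob`, and `sequence_prob`, as well as the probability table, are hypothetical and chosen only for this example. A real LLM would compute each conditional distribution with a neural network over the full prefix.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Assumption: a "model" is any function that maps a prefix of tokens to a
# probability distribution over the next token.
NextTokenDist = Callable[[Tuple[str, ...]], Dict[str, float]]


def toy_next_token_dist(prefix: Tuple[str, ...]) -> Dict[str, float]:
    """Hypothetical stand-in for an LLM; it looks only at the last token."""
    table = {
        None: {"the": 0.5, "a": 0.5},       # distribution for the first token
        "the": {"cat": 0.5, "dog": 0.5},
        "a": {"cat": 0.25, "dog": 0.75},
        "cat": {"sat": 1.0},
        "dog": {"sat": 1.0},
    }
    last = prefix[-1] if prefix else None
    return table.get(last, {})


def sequence_log_prob(tokens: Sequence[str], next_token_dist: NextTokenDist) -> float:
    """Sum log P(x_i | x_1, ..., x_{i-1}) over all positions i."""
    log_p = 0.0
    for i, token in enumerate(tokens):
        # Condition on everything generated before position i.
        p = next_token_dist(tuple(tokens[:i])).get(token, 0.0)
        if p == 0.0:
            return float("-inf")  # an impossible token makes the sequence impossible
        log_p += math.log(p)
    return log_p


def sequence_prob(tokens: Sequence[str], next_token_dist: NextTokenDist) -> float:
    """Joint probability P(x) as the product of the conditionals."""
    return math.exp(sequence_log_prob(tokens, next_token_dist))


if __name__ == "__main__":
    print(sequence_prob(("the", "cat", "sat"), toy_next_token_dist))
```

Under the toy table, the sequence ("the", "cat", "sat") is scored as the product 0.5 × 0.5 × 1.0, one factor per token, mirroring the term-by-term structure of the decomposition.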

Authors:

(1) Woosuk Kwon, UC Berkeley (equal contribution);

(2) Zhuohan Li, UC Berkeley (equal contribution);

(3) Siyuan Zhuang, UC Berkeley;

(4) Ying Sheng, UC Berkeley and Stanford University;

(5) Lianmin Zheng, UC Berkeley;

(6) Cody Hao Yu, Independent Researcher;

(7) Joseph E. Gonzalez, UC Berkeley;

(8) Hao Zhang, UC San Diego;

(9) Ion Stoica, UC Berkeley.

Article source: https://hackernoon.com/the-generation-and-serving-procedures-of-typical-llms-a-quick-explanation?source=rss