[Deterministic RL] Where the Determinism Problem Comes From & Reproducible RL

Understanding where the determinism problem in LLM inference comes from

Wikipedia defines a deterministic algorithm as follows: "a deterministic algorithm is an algorithm that, given a particular input, will always produce the same output." In this post I look at the determinism problem in the LLM context. I start from the inference side (when the exact same input is given repeatedly, why can the model fail to produce the same output?), then move on to the mismatch between training and inference in RL engineering, which can cause RL training to collapse, and finally survey the solutions the community has already shipped as well as work that is still in progress.

Non-associativity of floating-point arithmetic

The Thinking Machines Lab post on batch invariance explains in detail where the nondeterminism in LLM inference comes from: because precision is finite, associativity generally fails for floating-point arithmetic on GPUs:

$$(a+b)+c \neq a+(b+c)$$

This arXiv paper digs deeper into the issue:

> Floating-point arithmetic in GPUs exhibits non-associativity, meaning (a+b)+c≠a+(b+c) due to finite precision and rounding errors. This property directly impacts the computation of attention scores and logits in the transformer architecture, where parallel operations across multiple threads can yield different results based on execution order. ...
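To make the non-associativity concrete, here is a minimal Python sketch of my own (not code from either reference): grouping the same three numbers differently changes the result, and the same effect carries over to the order of a reduction.

```python
# A tiny demonstration (illustrative sketch, not from the cited posts)
# that floating-point addition is not associative.
a, b, c = 0.1, 1e16, -1e16

print((a + b) + c)  # 0.0  -- 0.1 is below half the float spacing near
                    #        1e16, so it is rounded away before c cancels b
print(a + (b + c))  # 0.1  -- b and c cancel exactly first, 0.1 survives

# The same effect in a reduction: summing identical values in a
# different order yields a different float.
vals = [0.1] * 10 + [1e16, -1e16]
print(sum(vals))            # 0.1s absorbed into 1e16 first -> 0.0
print(sum(reversed(vals)))  # 1e16 and -1e16 cancel first   -> ~1.0
```

This is precisely the mechanism behind the nondeterminism the arXiv quote describes: on a GPU, the thread and block schedule determines the accumulation order, so the same attention or logit reduction can round differently depending on execution order.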
