Aiinfra on CctoctoFX

Recent content in Aiinfra on CctoctoFX: https://pillumina.github.io/categories/aiinfra/
Ascend Profiling Analysis Skill: A Design Deep Dive https://pillumina.github.io/posts/aiinfra/ascend-profiling-analysis-skill/ Thu, 28 May 2026 00:00:00 +0000
<h1 id="ascend-profiling-analysis-skill-设计深度解析">Ascend Profiling Analysis Skill: A Design Deep Dive</h1>
<blockquote> <p>This article takes a close look at a skill for analyzing torch profiler output from Ascend NPUs, covering its design philosophy, pipeline architecture, core Ascend knowledge base, and prior-knowledge base.</p></blockquote>
<h2 id="一背景与动机">1. Background and Motivation</h2>
<h3 id="为什么需要-profiling-分析">Why is profiling analysis needed?</h3>
<p>When running LLM inference on Ascend NPUs, performance tuning has to answer a few key questions:</p>
<ul> <li><strong>Where does the step time go?</strong> How much do attention, FFN, and MoE each account for?</li> <li><strong>Where is the bottleneck?</strong> Cube compute, or Vector memory movement?</li> <li><strong>Is the EP/TP load balanced?</strong> Is any rank lagging behind?</li> <li><strong>Is communication holding things back?</strong> Are HCCL collectives slower than expected?</li> </ul>
<p>Each of the traditional analysis tools has a limitation:</p>
<table> <thead> <tr> <th>Tool</th> <th>Limitation</th> </tr> </thead> <tbody> <tr> <td>CANN Studio Timeline</td> <td>Shows only the timeline; cannot aggregate statistics</td> </tr> <tr> <td><code>trace_view.json</code></td> <td>Sparse data; hard to relate to kernel semantics</td> </tr> <tr> <td><code>kernel_details.csv</code></td> <td>Gigabyte-scale data; requires dedicated parsing logic</td> </tr> </tbody> </table>
<h3 id="设计目标">Design goals</h3>
<p>The core goal of this skill: <strong>starting from raw profiling data, produce a traceable report backed by an evidence chain</strong>.</p>
<ul> <li>Every diagnostic conclusion must be traceable to a row number in the original CSV</li> <li>Support cross-rank alignment and anomaly detection</li> <li>Output in three formats: Markdown, Excel, and HTML</li> </ul>
<h2 id="二设计哲学证据链优先">2. Design Philosophy: Evidence Chain First</h2>
<h3 id="核心理念">Core idea</h3>
<blockquote> <p><strong>Every claim must be traceable to an original row.</strong></p>