Merged
40 changes: 23 additions & 17 deletions docs/01-why-tokens-matter.md
@@ -51,23 +51,29 @@ Every token you send or receive has a cost. Here's how:

Understanding what Copilot does behind the scenes helps you optimize:

```text
┌─────────────────────────────────────────────────┐
│                 Context Window                  │
│                                                 │
│  ┌──────────────────┐  ┌─────────────────────┐  │
│  │   INPUT TOKENS   │  │    OUTPUT TOKENS    │  │
│  │                  │  │                     │  │
│  │  System prompt   │  │  The response       │  │
│  │  + copilot-      │  │  you receive        │  │
│  │    instructions  │  │                     │  │
│  │  + file context  │  │                     │  │
│  │  + conversation  │  │                     │  │
│  │    history       │  │                     │  │
│  │  + YOUR prompt   │  │                     │  │
│  └──────────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────┘
```
<div class="token-context-diagram" role="img" aria-label="Context window containing input tokens and output tokens">
<div class="token-context-diagram__frame">
<p class="token-context-diagram__title">Context Window</p>
<div class="token-context-diagram__columns">
<section class="token-context-diagram__panel">
<h3>Input tokens</h3>
<ul>
<li>System prompt</li>
<li><code>copilot-instructions.md</code></li>
<li>File context</li>
<li>Conversation history</li>
<li>Your prompt</li>
</ul>
</section>
<section class="token-context-diagram__panel token-context-diagram__panel--output">
<h3>Output tokens</h3>
<div class="token-context-diagram__panel-body">
<p>The response you receive</p>
</div>
</section>
</div>
</div>
</div>

- **System prompt:** Copilot's own instructions (you can't control this)
- **`copilot-instructions.md`:** Your project-level instructions — loaded on **every** interaction
40 changes: 23 additions & 17 deletions docs/01-why-tokens-matter.zh-TW.md
@@ -51,23 +51,29 @@ A token is the basic unit a large language model uses to read and write. It is not a word,

Understanding what Copilot actually does behind the scenes is how you learn what to optimize:

```text
┌─────────────────────────────────────────────────┐
│                 Context Window                  │
│                                                 │
│  ┌──────────────────┐  ┌─────────────────────┐  │
│  │   INPUT TOKENS   │  │    OUTPUT TOKENS    │  │
│  │                  │  │                     │  │
│  │  System prompt   │  │  The response       │  │
│  │  + copilot-      │  │  you receive        │  │
│  │    instructions  │  │                     │  │
│  │  + file context  │  │                     │  │
│  │  + conversation  │  │                     │  │
│  │    history       │  │                     │  │
│  │  + your prompt   │  │                     │  │
│  └──────────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────┘
```
<div class="token-context-diagram" role="img" aria-label="Context window containing input tokens and output tokens">
<div class="token-context-diagram__frame">
<p class="token-context-diagram__title">Context Window</p>
<div class="token-context-diagram__columns">
<section class="token-context-diagram__panel">
<h3>Input tokens</h3>
<ul>
<li>System prompt</li>
<li><code>copilot-instructions.md</code></li>
<li>File context</li>
<li>Conversation history</li>
<li>Your prompt</li>
</ul>
</section>
<section class="token-context-diagram__panel token-context-diagram__panel--output">
<h3>Output tokens</h3>
<div class="token-context-diagram__panel-body">
<p>The response you receive</p>
</div>
</section>
</div>
</div>
</div>

- **System prompt:** Copilot's own built-in instructions (you can't control this)
- **`copilot-instructions.md`:** Project-level instructions, **loaded on every interaction**
29 changes: 18 additions & 11 deletions docs/08-mcp-tool-costs.md
@@ -55,17 +55,24 @@ This isn't free. Each tool definition costs approximately:

Here's where it gets expensive:

```text
Tools loaded = servers × tools_per_server × tokens_per_tool

Example (heavy setup):
10 MCP servers × 5 tools each × 200 tokens avg = 10,000 tokens

Agent mode runs 5-25 steps per task.
Tool definitions reload EVERY step.

10,000 tokens × 15 steps = 150,000 tokens just for tool definitions.
```
<div class="guide-visual" role="img" aria-label="Tool definition cost multiplies across servers, tools, and agent steps">
<p class="guide-visual__title">Reloaded Tool Cost</p>
<div class="guide-visual__grid guide-visual__grid--2">
<section class="guide-visual__card">
<h4>Formula</h4>
<p class="guide-visual__math">Tools loaded = servers x tools_per_server x tokens_per_tool</p>
<p class="guide-visual__note">That whole bundle reloads on every agent step.</p>
</section>
<section class="guide-visual__card">
<h4>Heavy setup example</h4>
<p class="guide-visual__math">10 MCP servers x 5 tools x 200 tokens = 10,000 tokens</p>
<div class="guide-visual__flow">
<p class="guide-visual__math">10,000 tokens x 15 steps</p>
</div>
<p class="guide-visual__metric">150,000 tokens</p>
</section>
</div>
</div>

That's 150K tokens doing nothing but telling the agent what tools exist. Before any actual work happens.
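
The arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a measurement tool: the function name `tool_definition_tokens` and its parameters are hypothetical, the inputs are the heavy-setup figures from this section, and it assumes every tool definition is resent in full on each agent step.

```python
def tool_definition_tokens(servers: int, tools_per_server: int,
                           tokens_per_tool: int, steps: int = 1) -> int:
    """Estimate tokens spent on tool definitions alone.

    Assumes (as this guide does) that the full set of tool
    definitions is reloaded on every agent step.
    """
    per_step = servers * tools_per_server * tokens_per_tool
    return per_step * steps


# Heavy setup from the example: 10 servers, 5 tools each, ~200 tokens per tool.
per_step = tool_definition_tokens(10, 5, 200)
total = tool_definition_tokens(10, 5, 200, steps=15)
print(f"{per_step:,} tokens per step, {total:,} tokens across 15 steps")
```

Plugging in your own server and tool counts gives a rough sense of which factor is cheapest to reduce; halving the number of servers halves the total, just as halving the step count does.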

27 changes: 18 additions & 9 deletions docs/08-mcp-tool-costs.zh-TW.md
@@ -47,15 +47,24 @@ Buffer: 40.4k (20%)

What makes it truly expensive is that it gets reloaded over and over:

```text
Tools loaded = servers × tools_per_server × tokens_per_tool

Example:
10 MCP servers × 5 tools × 200 tokens = 10,000 tokens

Agent mode runs 15 steps:
10,000 × 15 = 150,000 tokens
```
<div class="guide-visual" role="img" aria-label="Tool definition cost multiplies across servers, tools, and agent steps">
<p class="guide-visual__title">Tool Costs Keep Reloading</p>
<div class="guide-visual__grid guide-visual__grid--2">
<section class="guide-visual__card">
<h4>Formula</h4>
<p class="guide-visual__math">Tools loaded = servers x tools_per_server x tokens_per_tool</p>
<p class="guide-visual__note">The whole bundle of tool definitions reloads on every agent step.</p>
</section>
<section class="guide-visual__card">
<h4>Heavy setup example</h4>
<p class="guide-visual__math">10 MCP servers x 5 tools x 200 tokens = 10,000 tokens</p>
<div class="guide-visual__flow">
<p class="guide-visual__math">10,000 tokens x 15 steps</p>
</div>
<p class="guide-visual__metric">150,000 tokens</p>
</section>
</div>
</div>

In other words, before any real work happens, 150,000 tokens have already been spent just telling the agent which tools are available.

39 changes: 30 additions & 9 deletions docs/09-comparisons-data.md
@@ -151,15 +151,36 @@ Does compression hurt output quality? The research says: **rarely, and only at e

The savings curve is not linear. The first 30% of compression (dropping filler) is free. The next 20% (fragments, abbreviations) is nearly free. Beyond that, each additional compression point risks quality.

```text
Savings vs. Quality Risk:

Quality ████████████████████████████████████░░░░░░░░░
Risk ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░███████████████
0% 20% 40% 60% 80%
Token Savings →
lite full ultra extreme
```
<div class="guide-visual" role="img" aria-label="Compression savings versus quality risk curve">
<p class="guide-visual__title">Savings vs. Quality Risk</p>
<div class="guide-visual__curve">
<div class="guide-visual__curve-row">
<span class="guide-visual__curve-label">Quality</span>
<div class="guide-visual__curve-bar">
<div class="guide-visual__curve-fill" style="width: 82%;"></div>
</div>
</div>
<div class="guide-visual__curve-row">
<span class="guide-visual__curve-label">Risk</span>
<div class="guide-visual__curve-bar">
<div class="guide-visual__curve-fill guide-visual__curve-fill--risk" style="width: 62%;"></div>
</div>
</div>
</div>
<div class="guide-visual__scale">
<span>0%</span>
<span>20%</span>
<span>40%</span>
<span>60%</span>
<span>80%</span>
</div>
<div class="guide-visual__ticks">
<span>lite</span>
<span>full</span>
<span>ultra</span>
<span>extreme</span>
</div>
</div>

**Sweet spot: full caveman (30-50% input token savings; 40-55% output savings with terse system instructions).** Maximum return, negligible risk.
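
As a rough illustration of what those percentage ranges mean in absolute terms, here is a small sketch. The helper `savings_range` and the session sizes (10,000 input tokens, 2,000 output tokens) are hypothetical values chosen for the example; only the percentage ranges come from the text above.

```python
def savings_range(tokens: int, low_pct: int, high_pct: int) -> tuple[int, int]:
    """Return (min_saved, max_saved) tokens for a percentage savings range."""
    return tokens * low_pct // 100, tokens * high_pct // 100


# Hypothetical session: 10,000 input tokens and 2,000 output tokens.
input_saved = savings_range(10_000, 30, 50)   # full caveman, input side
output_saved = savings_range(2_000, 40, 55)   # with terse system instructions
print(f"input saved:  {input_saved[0]:,}-{input_saved[1]:,} tokens")
print(f"output saved: {output_saved[0]:,}-{output_saved[1]:,} tokens")
```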

31 changes: 31 additions & 0 deletions docs/09-comparisons-data.zh-TW.md
@@ -99,6 +99,37 @@
The next 20% is usually still a good deal.
Past that point, every additional bit of compression makes misunderstanding more likely.

<div class="guide-visual" role="img" aria-label="Relationship between compression savings and quality risk">
<p class="guide-visual__title">Benefit vs. Risk Curve</p>
<div class="guide-visual__curve">
<div class="guide-visual__curve-row">
<span class="guide-visual__curve-label">Quality</span>
<div class="guide-visual__curve-bar">
<div class="guide-visual__curve-fill" style="width: 82%;"></div>
</div>
</div>
<div class="guide-visual__curve-row">
<span class="guide-visual__curve-label">Risk</span>
<div class="guide-visual__curve-bar">
<div class="guide-visual__curve-fill guide-visual__curve-fill--risk" style="width: 62%;"></div>
</div>
</div>
</div>
<div class="guide-visual__scale">
<span>0%</span>
<span>20%</span>
<span>40%</span>
<span>60%</span>
<span>80%</span>
</div>
<div class="guide-visual__ticks">
<span>lite</span>
<span>full</span>
<span>ultra</span>
<span>extreme</span>
</div>
</div>

**Recommended sweet spot:** Full caveman.

---
55 changes: 33 additions & 22 deletions docs/10-practical-setup.md
@@ -371,28 +371,39 @@ Each mode has a fundamentally different token cost profile:

Understanding the loop helps you minimize steps:

```text
Step 1: Load context
  ├── System prompt (~500 tokens)
  ├── copilot-instructions.md (~50-1500 tokens)
  ├── Tool definitions (~2,000-20,000 tokens)
  ├── Conversation history (growing)
  └── YOUR prompt
  → Send to LLM → Get response

Step 2: LLM decides to call a tool
  ├── Tool call (function + params) → output tokens
  ├── Tool result → input tokens (next step)
  └── Reasoning about result → output tokens

Step 3: Another tool call (or generate response)
  ├── ALL of Step 1's context reloaded
  ├── + Step 2's tool call and result
  └── + growing conversation
  → Send to LLM again

... repeat 5-25 times
```
<div class="guide-visual" role="img" aria-label="Agent mode loop reloading context and tool calls across repeated steps">
<p class="guide-visual__title">Agent Mode Loop</p>
<div class="guide-visual__grid guide-visual__grid--3">
<section class="guide-visual__card guide-visual__card--step">
<h4><span class="guide-visual__step-label">Step 1:</span><span class="guide-visual__step-copy">Load context</span></h4>
<ul class="guide-visual__list">
<li>System prompt (~500 tokens)</li>
<li><code>copilot-instructions.md</code> (~50-1500 tokens)</li>
<li>Tool definitions (~2,000-20,000 tokens)</li>
<li>Conversation history (growing)</li>
<li>Your prompt</li>
</ul>
<p class="guide-visual__note">Send to LLM → get response</p>
</section>
<section class="guide-visual__card guide-visual__card--step">
<h4><span class="guide-visual__step-label">Step 2:</span><span class="guide-visual__step-copy">Call tool</span></h4>
<ul class="guide-visual__list">
<li>Tool call (function + params) → output tokens</li>
<li>Tool result → input tokens</li>
<li>Reasoning about result → output tokens</li>
</ul>
</section>
<section class="guide-visual__card guide-visual__card--step">
<h4><span class="guide-visual__step-label">Step 3:</span><span class="guide-visual__step-copy">Repeat</span></h4>
<ul class="guide-visual__list">
<li>All of Step 1 context reloads</li>
<li>+ prior tool call and result</li>
<li>+ growing conversation</li>
</ul>
<p class="guide-visual__metric">Repeat 5-25 times</p>
</section>
</div>
</div>

**Key insight:** Context grows with every step. Step 15 carries all the context from steps 1-14 plus the original prompt. This is why long agent sessions get expensive fast.
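
To make the growth concrete, here is a minimal sketch of how input tokens accumulate across steps. It rests on simplifying assumptions: the base context (system prompt, instructions, tool definitions, your prompt) is resent unchanged each step, and every step adds a fixed amount of history. The function name `agent_input_tokens` and the figures 5,000 and 1,000 are hypothetical, not measurements.

```python
def agent_input_tokens(base_context: int, per_step_growth: int, steps: int) -> list[int]:
    """Input tokens sent to the model at each agent step.

    base_context: tokens resent every step (system prompt, instructions,
        tool definitions, original prompt).
    per_step_growth: hypothetical tokens each completed step adds to the
        history (tool call plus tool result).
    """
    return [base_context + per_step_growth * i for i in range(steps)]


per_step = agent_input_tokens(base_context=5_000, per_step_growth=1_000, steps=15)
print(f"first step: {per_step[0]:,} input tokens")
print(f"last step:  {per_step[-1]:,} input tokens")
print(f"total:      {sum(per_step):,} input tokens")
```

Under these assumptions the last step sends 19,000 input tokens against 5,000 for the first, and the session total is 180,000. Because each step's cost grows linearly, the total grows roughly with the square of the step count, which is why trimming steps tends to pay off more than trimming any single prompt.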

34 changes: 34 additions & 0 deletions docs/10-practical-setup.zh-TW.md
@@ -234,6 +234,40 @@ Agent Mode is often many times more expensive than Ask Mode.

With each additional step, the full context may be resent, along with the previous step's results, so later steps get progressively more expensive.

<div class="guide-visual" role="img" aria-label="Agent Mode repeatedly resends context and tool results across multiple steps">
<p class="guide-visual__title">Agent Mode Loop</p>
<div class="guide-visual__grid guide-visual__grid--3">
<section class="guide-visual__card guide-visual__card--step">
<h4><span class="guide-visual__step-label">Step 1:</span><span class="guide-visual__step-copy">Load context</span></h4>
<ul class="guide-visual__list">
<li>System prompt (~500 tokens)</li>
<li><code>copilot-instructions.md</code> (~50-1500 tokens)</li>
<li>Tool definitions (~2,000-20,000 tokens)</li>
<li>Conversation history (growing)</li>
<li>Your prompt</li>
</ul>
<p class="guide-visual__note">Send to LLM → get response</p>
</section>
<section class="guide-visual__card guide-visual__card--step">
<h4><span class="guide-visual__step-label">Step 2:</span><span class="guide-visual__step-copy">Call tool</span></h4>
<ul class="guide-visual__list">
<li>Tool call (function + params) → output tokens</li>
<li>Tool result → input tokens for the next step</li>
<li>Reasoning about the result → output tokens</li>
</ul>
</section>
<section class="guide-visual__card guide-visual__card--step">
<h4><span class="guide-visual__step-label">Step 3:</span><span class="guide-visual__step-copy">Resend</span></h4>
<ul class="guide-visual__list">
<li>All of Step 1's context reloads</li>
<li>+ Step 2's tool call and result</li>
<li>+ the growing conversation</li>
</ul>
<p class="guide-visual__metric">Repeat 5-25 times</p>
</section>
</div>
</div>

### 4.5.3 How to reduce agent steps

- **Make the prompt precise and include acceptance criteria**