* Builtin local LLM provider (llama.cpp) **features**:
  * By default, it automatically detects memory and the GPU and uses the best available computing layer. It allocates GPU layers and the context window size automatically (adopting the largest possible values) to get the best performance from the hardware without any manual configuration.
  * It is nevertheless recommended to configure the context window yourself.
  * System security: supports system-template anti-injection (to prevent jailbreaking).
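As a sketch of what pinning the context window yourself might look like: the `max_tokens` name comes from later in this guide, but the front-matter placement shown here is an assumption, not the engine's documented schema.

```yaml
---
# Hypothetical front-matter sketch: pin the context window rather than
# relying on auto-detection. Key placement is assumed.
parameters:
  max_tokens: 4096
---
```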
* Support for general tool invocation (Tool Funcs) with any LLM (only for the **builtin local LLM provider**):
  * Works without model-specific training, provided the LLM can follow instructions accurately.
  * Minimum adaptation for 3B models; 7B and above recommended.
  * Dual permission control:
    1. Scripts set the list of tools the AI can use.
    2. Users set the list of tools scripts can use.
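A minimal sketch of the script side of this dual control, assuming a hypothetical `tools` list in the script's front matter (the real key names and tool identifiers may differ):

```yaml
---
# Hypothetical sketch: the script declares which tools the AI may call;
# the user separately allow-lists which tools the script may expose.
tools:
  - get_weather
  - search_files
---
```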
* Support for General Thinking Mode (`shouldThink`) with any LLM (only for the **builtin local LLM provider**):
  * Works without model-specific training, provided the LLM can follow instructions accurately.
  * Answer first, then think (`last`).
  * Think first, then answer (`first`).
  * Think deeply, then answer (`deep`): 7B and above.
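For example, a script might request think-first behavior like this; `shouldThink` and its three values come from the list above, but placing it in the script's front matter is an assumption:

```yaml
---
# Assumed placement: ask the model to think before answering.
shouldThink: first   # alternatives: last, deep (deep needs 7B+)
---
```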
* Package support.
* PPE supports direct invocation of wasm.
* Support for multiple structured response output format types (`response_format.type`):
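A hedged sketch of requesting structured output: `response_format.type` is named above, but the `json` value and the front-matter placement are assumptions.

```yaml
---
# Assumed sketch: ask for a structured JSON response.
parameters:
  response_format:
    type: json
---
```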
Developing an intelligent application with AI Agent Script Engine involves just a few steps:
* Select a parameter size based on your application's requirements; larger sizes offer better quality but consume more resources and increase response time...
* Choose the model's expertise: different models are trained with distinct methods and datasets, resulting in unique capabilities...
* Optimize quantization: higher levels of quantization (compression) result in faster speed and smaller size, but potentially lower accuracy...
* Decide on the optimal context window size (`max_tokens`): Typically, 2048 is sufficient; this parameter also influences model performance...
* Use the client (`@offline-ai/cli`) directly to download the AI brain: `ai brain download`
* Create the AI application's agent script file and debug prompts using the client (`@offline-ai/cli`): `ai run your_script.ai.yaml --interactive --loglevel info`.
```yaml
user: |-
  ...
```
This statement represents what the user (the role) says (the message); the message content can use [jinja2](https://wsgzao.github.io/post/jinja/) template syntax.
`|-` is YAML syntax, indicating a multi-line string with line breaks preserved.
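For instance, a user message can interpolate a template variable; the `content` variable name here is purely illustrative:

```yaml
user: |-
  Please summarize the following text:
  {{content}}
```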
Let's give it a try. First, confirm that the background `llama.cpp` brain server is already running: