Releases: cubist38/mlx-openai-server
v1.6.3
v1.6.2
What's Changed
- Sync/mflux by @cubist38 in #212
- feat: add Moonshot's partial mode extension by @blightbow in #213
- fix(parsers): resolve test failures after parser refactor by @lyonsno in #214
- fix: handle split reasoning/tool markers in streaming parsers by @lyonsno in #215
- fix(stream): preserve tool_call id stability across deltas by @lyonsno in #216
- feat: add DEFAULT_MIN_P environment variable support for chat completions by @lyonsno in #223
- fix: persist prompt cache when streaming is cancelled by @lyonsno in #225
- fix(parsers): harden function-parameter extraction and streaming opener-tail buffering by @lyonsno in #218
- feat(parsers): add mixed-think reasoning handoff parser and stream re-entry wiring by @lyonsno in #219
- Fix: Prevent reasoning re-injection across APIs by @lyonsno in #224
- fix(parsers): restore step_35 implicit-open compatibility and harden non-stream tool fallback by @lyonsno in #220
- refactor: enhance message handling in MLXLM and MLXVLM handlers by @cubist38 in #227
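Among the entries above, #223 adds a `DEFAULT_MIN_P` environment variable for chat completions. As a hedged sketch of how such a fallback typically works (the function name `resolve_min_p` and the exact precedence are assumptions, not the project's actual code): an explicit per-request value wins, otherwise the environment variable applies, otherwise a built-in default.

```python
import os

def resolve_min_p(request_min_p=None, default=0.0):
    """Resolve the effective min_p: an explicit request value wins,
    otherwise fall back to the DEFAULT_MIN_P environment variable,
    otherwise use the built-in default. (Hypothetical helper.)"""
    if request_min_p is not None:
        return request_min_p
    env_value = os.environ.get("DEFAULT_MIN_P")
    if env_value is not None:
        return float(env_value)
    return default

os.environ["DEFAULT_MIN_P"] = "0.05"
print(resolve_min_p())     # env var applies -> 0.05
print(resolve_min_p(0.1))  # explicit request value wins -> 0.1
```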
Full Changelog: v1.6.1...v1.6.2
v1.6.1
v1.6.0
v1.5.3
v1.5.2
What's Changed
- fix(cache): deterministic random seeds and cache leaks by @blightbow in #162
- Fix MiniMax M2 parser failing to parse multi-line HTML content in parameters by @jverkoey in #164
- Remove deprecated parser files for various models including Harmony, … by @cubist38 in #165
- feat(api): add XTC sampling and logit_bias parameters by @blightbow in #167
- fix(handler): enhance unified parser handling in MLXLMHandler by @cubist38 in #170
- Hotfix for embedding models by @icelaglace in #172
- Feat/long cat flash lite by @cubist38 in #175
- Log chat template loading results by @jverkoey in #182
- Feat/kimi k2 by @cubist38 in #184
- feat: make LRU prompt cache size configurable via CLI by @jverkoey in #187
- Refactor/function calling by @cubist38 in #193
New Contributors
- @blightbow made their first contribution in #162
- @jverkoey made their first contribution in #164
Full Changelog: v1.5.1...v1.5.2
v1.5.1
What's Changed
- Feat/glm47 flash by @cubist38 in #151
- Feat/mflux by @cubist38 in #154
- Refactor MLXVLMHandler to remove prompt caching and update parser han… by @cubist38 in #155
- Fix Harmony parser for Open WebUI tool calls by @icelaglace in #156
- Enhance MLXLMHandler to include detailed prompt token usage tracking by @cubist38 in #157
- Hotfix/gpt oss by @cubist38 in #158
- Refactor README.md for clarity and conciseness by @cubist38 in #160
New Contributors
- @icelaglace made their first contribution in #156
Full Changelog: v1.5.0...v1.5.1
v1.5.0
What's Changed
- Linting/Formatting of schemas folder by @Snuffy2 in #106
- Linting/Formatting of middleware folder by @Snuffy2 in #107
- Linting/Formatting of tests folder by @Snuffy2 in #110
- Linting/Formatting of scripts folder by @Snuffy2 in #109
- Add Nemotron3 Nano parsers and update parser registry by @cubist38 in #121
- Refactor MLXFluxHandler to enhance image generation and editing funct… by @cubist38 in #123
- (fix) parser: minimax - tool arguments json by @mialso in #124
- Refactor/server by @cubist38 in #126
- Refactor/server by @cubist38 in #128
- Implement Nemotron3 Nano parsers for reasoning and tool calls by @cubist38 in #129
- Refactor/server by @cubist38 in #130
- Refactor MLX_LM and MLX_VLM to use prompt_cache directly by @cubist38 in #131
- Refactor/server by @cubist38 in #133
- (fix) Qwen3 Coder: tool parser and message converter by @mialso in #136
- Server/enhancement by @cubist38 in #138
- Server/mlx vlm cache by @cubist38 in #139
- Server/enhance parsers by @cubist38 in #142
- Server/enhance parsers by @cubist38 in #143
- Refactor context length handling in model initialization by @cubist38 in #146
Full Changelog: v1.4.2...v1.5.0
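Several entries in this release (#131, #139) rework prompt caching, and v1.5.2 later makes the cache size configurable via the CLI. As a minimal, hypothetical sketch of an LRU prompt cache keyed by a prompt's token prefix (not the project's implementation, and the cached "state" here is just a placeholder for real KV state):

```python
from collections import OrderedDict

class LRUPromptCache:
    """Minimal LRU cache mapping a prompt's token prefix to a cached
    state; the least recently used entry is evicted past max_size."""
    def __init__(self, max_size=4):
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, token_ids):
        key = tuple(token_ids)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        return None

    def put(self, token_ids, state):
        key = tuple(token_ids)
        self._entries[key] = state
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used

cache = LRUPromptCache(max_size=2)
cache.put([1, 2, 3], "state-a")
cache.put([4, 5], "state-b")
cache.get([1, 2, 3])       # touch: [4, 5] is now least recently used
cache.put([6], "state-c")  # evicts [4, 5]
print(cache.get([4, 5]))   # -> None
```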
v1.4.2
v1.4.1
🚀 Introducing mlx-openai-server v1.4.1 — Our Biggest Upgrade Yet!
After a month of intensive refactoring and codebase improvements, we’re thrilled to roll out v1.4.1, packed with powerful new capabilities and expanded model support.
🔧 1. Customizable Parsers for Tool Calls & Reasoning
You can now define custom parsers for both tool-call handling and reasoning-content extraction. If no parser is specified, the setting defaults to None and the server performs no parsing.
Example: Launching Qwen3-VL locally
```shell
mlx-openai-server launch \
  --model-path /path/to/model \
  --model-type multimodal \
  --tool-call-parser qwen3_vl \
  --reasoning-parser qwen3_vl
```
🖼️ 2. Expanded Support for Image Generation & Editing Models
We’ve significantly broadened support for image models, including:
- Tongyi-MAI/Z-Image-Turbo
- briaai/FIBO
- Qwen/Qwen-Image-Edit-2509
- Qwen/Qwen-Image-Edit
- Qwen/Qwen-Image
- black-forest-labs/FLUX.1-dev
- black-forest-labs/FLUX.1-schnell
- black-forest-labs/FLUX.1-Kontext-dev
Example: Running z-image-turbo locally on your MacBook
```shell
mlx-openai-server launch \
  --model-path /path/to/model \
  --config-name z-image-turbo \
  --model-type image-generation
```
With only 9 steps and the prompt:
"Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."
Full Changelog: v1.3.12...v1.4.0
Full Changelog: v1.4.0...v1.4.1