
Releases: cubist38/mlx-openai-server

v1.6.3

08 Mar 10:21
f551524


What's Changed

Full Changelog: v1.6.2...v1.6.3

v1.6.2

06 Mar 17:10


What's Changed

  • Sync/mflux by @cubist38 in #212
  • feat: add Moonshot's partial mode extension by @blightbow in #213
  • fix(parsers): resolve test failures after parser refactor by @lyonsno in #214
  • fix: handle split reasoning/tool markers in streaming parsers by @lyonsno in #215
  • fix(stream): preserve tool_call id stability across deltas by @lyonsno in #216
  • feat: add DEFAULT_MIN_P environment variable support for chat completions by @lyonsno in #223
  • fix: persist prompt cache when streaming is cancelled by @lyonsno in #225
  • fix(parsers): harden function-parameter extraction and streaming opener-tail buffering by @lyonsno in #218
  • feat(parsers): add mixed-think reasoning handoff parser and stream re-entry wiring by @lyonsno in #219
  • Fix: Prevent reasoning re-injection across APIs by @lyonsno in #224
  • fix(parsers): restore step_35 implicit-open compatibility and harden non-stream tool fallback by @lyonsno in #220
  • refactor: enhance message handling in MLXLM and MLXVLM handlers by @cubist38 in #227
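The DEFAULT_MIN_P change (#223) can be exercised from the shell; a hedged sketch, assuming the variable sets the server-wide default min-p sampling floor for chat completions (the value shown is illustrative, not a documented default):

```shell
# Hedged sketch based on PR #223: DEFAULT_MIN_P presumably sets the
# server-wide default min-p value for chat completions when a request
# does not supply one. The value 0.05 is illustrative only.
export DEFAULT_MIN_P=0.05
echo "DEFAULT_MIN_P is $DEFAULT_MIN_P"
```

Set before launching the server so the process inherits the variable.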

New Contributors

Full Changelog: v1.6.1...v1.6.2

v1.6.1

23 Feb 00:52


What's Changed

Full Changelog: v1.6.0...v1.6.1

v1.6.0

22 Feb 10:08


What's Changed

  • fix: preserve assistant messages with tool_calls when content is null by @loveqoo in #205
  • Server/OpenAI compatible api by @cubist38 in #207

New Contributors

Full Changelog: v1.5.3...v1.6.0

v1.5.3

12 Feb 02:29


What's Changed

Full Changelog: v1.5.2...v1.5.3

v1.5.2

07 Feb 14:02


What's Changed

  • fix(cache): deterministic random seeds and cache leaks by @blightbow in #162
  • Fix MiniMax M2 parser failing to parse multi-line HTML content in parameters by @jverkoey in #164
  • Remove deprecated parser files for various models including Harmony, … by @cubist38 in #165
  • feat(api): add XTC sampling and logit_bias parameters by @blightbow in #167
  • fix(handler): enhance unified parser handling in MLXLMHandler by @cubist38 in #170
  • Hotfix for embedding models by @icelaglace in #172
  • Feat/long cat flash lite by @cubist38 in #175
  • Log chat template loading results by @jverkoey in #182
  • Feat/kimi k2 by @cubist38 in #184
  • feat: make LRU prompt cache size configurable via CLI by @jverkoey in #187
  • Refactor/function calling by @cubist38 in #193

New Contributors

Full Changelog: v1.5.1...v1.5.2

v1.5.1

27 Jan 03:32


What's Changed

New Contributors

Full Changelog: v1.5.0...v1.5.1

v1.5.0

15 Jan 03:43


What's Changed

New Contributors

Full Changelog: v1.4.2...v1.5.0

v1.4.2

09 Dec 04:25


What's Changed

Full Changelog: v1.4.1...v1.4.2

v1.4.1

05 Dec 08:15


🚀 Introducing mlx-openai-server v1.4.1 — Our Biggest Upgrade Yet!

After a month of intensive refactoring and codebase improvements, we’re thrilled to roll out v1.4.1, packed with powerful new capabilities and expanded model support.

🔧 1. Customizable Parsers for Tool Calls & Reasoning

You can now define custom parsers for both tool-call handling and reasoning-content extraction. If no parser is specified, the setting defaults to None and no parsing is performed.

Example: Launching Qwen3-VL locally

mlx-openai-server launch \
  --model-path /path/to/model \
  --model-type multimodal \
  --tool-call-parser qwen3_vl \
  --reasoning-parser qwen3_vl
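Once launched, the server accepts OpenAI-style requests; a minimal sketch of a chat-completions request body follows (the model identifier and the exact response shape are assumptions, not confirmed by these notes):

```python
# Build the JSON body an OpenAI-compatible client would POST to the
# server's /v1/chat/completions endpoint. With a reasoning parser such
# as qwen3_vl active, the server is expected to split reasoning content
# from the final answer; "qwen3-vl" below is an assumed model name.
import json

payload = {
    "model": "qwen3-vl",  # assumed served-model identifier
    "messages": [
        {"role": "user", "content": "What is shown in this image?"}
    ],
    "stream": False,
}

body = json.dumps(payload)
print(body)
```

Any OpenAI-compatible client library can send this body unchanged, which is the point of the compatible API surface.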

🖼️ 2. Expanded Support for Image Generation & Editing Models

We’ve significantly broadened support for image models, including:

  • Tongyi-MAI/Z-Image-Turbo
  • briaai/FIBO
  • Qwen/Qwen-Image-Edit-2509
  • Qwen/Qwen-Image-Edit
  • Qwen/Qwen-Image
  • black-forest-labs/FLUX.1-dev
  • black-forest-labs/FLUX.1-schnell
  • black-forest-labs/FLUX.1-Kontext-dev

Example: Running z-image-turbo locally on your MacBook

mlx-openai-server launch \
  --model-path /path/to/model \
  --config-name z-image-turbo \
  --model-type image-generation

With only 9 sampling steps and the following prompt:

"Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."

Output: (generated image attached to the release)
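For programmatic use, the server should also expose an OpenAI-style images endpoint; a hedged sketch of the request body, assuming the OpenAI Images API conventions (`/v1/images/generations`) and a model identifier mirroring the `--config-name` used at launch (neither is confirmed by these notes):

```python
# Request body for an OpenAI-style POST to /v1/images/generations.
# The model identifier and size are illustrative assumptions.
import json

payload = {
    "model": "z-image-turbo",  # assumed: mirrors --config-name
    "prompt": "Young Chinese woman in red Hanfu, golden phoenix headdress",
    "n": 1,
    "size": "1024x1024",
}

body = json.dumps(payload)
print(body)
```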

Full Changelog: v1.3.12...v1.4.0
Full Changelog: v1.4.0...v1.4.1