Releases: cubist38/mlx-openai-server
v1.6.3
v1.6.2
What's Changed
- Sync/mflux by @cubist38 in #212
- feat: add Moonshot's partial mode extension by @blightbow in #213
- fix(parsers): resolve test failures after parser refactor by @lyonsno in #214
- fix: handle split reasoning/tool markers in streaming parsers by @lyonsno in #215
- fix(stream): preserve tool_call id stability across deltas by @lyonsno in #216
- feat: add DEFAULT_MIN_P environment variable support for chat completions by @lyonsno in #223
- fix: persist prompt cache when streaming is cancelled by @lyonsno in #225
- fix(parsers): harden function-parameter extraction and streaming opener-tail buffering by @lyonsno in #218
- feat(parsers): add mixed-think reasoning handoff parser and stream re-entry wiring by @lyonsno in #219
- Fix: Prevent reasoning re-injection across APIs by @lyonsno in #224
- fix(parsers): restore step_35 implicit-open compatibility and harden non-stream tool fallback by @lyonsno in #220
- refactor: enhance message handling in MLXLM and MLXVLM handlers by @cubist38 in #227
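Among the entries above, #223 adds a `DEFAULT_MIN_P` environment variable for chat completions. As a hedged sketch of how such a fallback typically works (the function name `resolve_min_p` and the exact precedence are assumptions, not the project's actual code): an explicit per-request value wins, otherwise the environment variable applies, otherwise a built-in default.

```python
import os

def resolve_min_p(request_min_p=None, default=0.0):
    """Resolve the effective min_p: an explicit request value wins,
    otherwise fall back to the DEFAULT_MIN_P environment variable,
    otherwise use the built-in default. (Hypothetical helper.)"""
    if request_min_p is not None:
        return request_min_p
    env_value = os.environ.get("DEFAULT_MIN_P")
    if env_value is not None:
        return float(env_value)
    return default

os.environ["DEFAULT_MIN_P"] = "0.05"
print(resolve_min_p())     # env var applies -> 0.05
print(resolve_min_p(0.1))  # explicit request value wins -> 0.1
```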
Full Changelog: v1.6.1...v1.6.2
v1.6.1
v1.6.0
v1.5.3
v1.5.2
What's Changed
- fix(cache): deterministic random seeds and cache leaks by @blightbow in #162
- Fix MiniMax M2 parser failing to parse multi-line HTML content in parameters by @jverkoey in #164
- Remove deprecated parser files for various models including Harmony, … by @cubist38 in #165
- feat(api): add XTC sampling and logit_bias parameters by @blightbow in #167
- fix(handler): enhance unified parser handling in MLXLMHandler by @cubist38 in #170
- Hotfix for embedding models by @icelaglace in #172
- Feat/long cat flash lite by @cubist38 in #175
- Log chat template loading results by @jverkoey in #182
- Feat/kimi k2 by @cubist38 in #184
- feat: make LRU prompt cache size configurable via CLI by @jverkoey in #187
- Refactor/function calling by @cubist38 in #193
New Contributors
- @blightbow made their first contribution in #162
- @jverkoey made their first contribution in #164
Full Changelog: v1.5.1...v1.5.2
v1.5.1
What's Changed
- Feat/glm47 flash by @cubist38 in #151
- Feat/mflux by @cubist38 in #154
- Refactor MLXVLMHandler to remove prompt caching and update parser han… by @cubist38 in #155
- Fix Harmony parser for Open WebUI tool calls by @icelaglace in #156
- Enhance MLXLMHandler to include detailed prompt token usage tracking by @cubist38 in #157
- Hotfix/gpt oss by @cubist38 in #158
- Refactor README.md for clarity and conciseness by @cubist38 in #160
New Contributors
- @icelaglace made their first contribution in #156
Full Changelog: v1.5.0...v1.5.1
v1.5.0
What's Changed
- Linting/Formatting of schemas folder by @Snuffy2 in #106
- Linting/Formatting of middleware folder by @Snuffy2 in #107
- Linting/Formatting of tests folder by @Snuffy2 in #110
- Linting/Formatting of scripts folder by @Snuffy2 in #109
- Add Nemotron3 Nano parsers and update parser registry by @cubist38 in #121
- Refactor MLXFluxHandler to enhance image generation and editing funct… by @cubist38 in #123
- (fix) parser: minimax - tool arguments json by @mialso in #124
- Refactor/server by @cubist38 in #126
- Refactor/server by @cubist38 in #128
- Implement Nemotron3 Nano parsers for reasoning and tool calls by @cubist38 in #129
- Refactor/server by @cubist38 in #130
- Refactor MLX_LM and MLX_VLM to use prompt_cache directly by @cubist38 in #131
- Refactor/server by @cubist38 in #133
- (fix) Qwen3 Coder: tool parser and message converter by @mialso in #136
- Server/enhancement by @cubist38 in #138
- Server/mlx vlm cache by @cubist38 in #139
- Server/enhance parsers by @cubist38 in #142
- Server/enhance parsers by @cubist38 in #143
- Refactor context length handling in model initialization by @cubist38 in #146
Full Changelog: v1.4.2...v1.5.0
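Several entries in this release (#131, #139) rework prompt caching, and v1.5.2 later makes the cache size configurable via the CLI. As a minimal, hypothetical sketch of an LRU prompt cache keyed by a prompt's token prefix (not the project's implementation, and the cached "state" here is just a placeholder for real KV state):

```python
from collections import OrderedDict

class LRUPromptCache:
    """Minimal LRU cache mapping a prompt's token prefix to a cached
    state; the least recently used entry is evicted past max_size."""
    def __init__(self, max_size=4):
        self.max_size = max_size
        self._entries = OrderedDict()

    def get(self, token_ids):
        key = tuple(token_ids)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        return None

    def put(self, token_ids, state):
        key = tuple(token_ids)
        self._entries[key] = state
        self._entries.move_to_end(key)
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used

cache = LRUPromptCache(max_size=2)
cache.put([1, 2, 3], "state-a")
cache.put([4, 5], "state-b")
cache.get([1, 2, 3])       # touch: [4, 5] is now least recently used
cache.put([6], "state-c")  # evicts [4, 5]
print(cache.get([4, 5]))   # -> None
```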
v1.4.2
v1.4.1
🚀 Introducing mlx-openai-server v1.4.1 — Our Biggest Upgrade Yet!
After a month of intensive refactoring and codebase improvements, we’re thrilled to roll out v1.4.1, packed with powerful new capabilities and expanded model support.
🔧 1. Customizable Parsers for Tool Calls & Reasoning
You can now define custom parsers for both tool-call handling and reasoning-content extraction. If no parser is specified, the setting defaults to None and the server performs no parsing.
Example: Launching Qwen3-VL locally
```shell
mlx-openai-server launch \
  --model-path /path/to/model \
  --model-type multimodal \
  --tool-call-parser qwen3_vl \
  --reasoning-parser qwen3_vl
```
🖼️ 2. Expanded Support for Image Generation & Editing Models
We’ve significantly broadened support for image models, including:
- Tongyi-MAI/Z-Image-Turbo
- briaai/FIBO
- Qwen/Qwen-Image-Edit-2509
- Qwen/Qwen-Image-Edit
- Qwen/Qwen-Image
- black-forest-labs/FLUX.1-dev
- black-forest-labs/FLUX.1-schnell
- black-forest-labs/FLUX.1-Kontext-dev
Example: Running z-image-turbo locally on your MacBook
```shell
mlx-openai-server launch \
  --model-path /path/to/model \
  --config-name z-image-turbo \
  --model-type image-generation
```
With only 9 steps and the prompt:
"Young Chinese woman in red Hanfu, intricate embroidery. Impeccable makeup, red floral forehead pattern. Elaborate high bun, golden phoenix headdress, red flowers, beads. Holds round folding fan with lady, trees, bird. Neon lightning-bolt lamp (⚡️), bright yellow glow, above extended left palm. Soft-lit outdoor night background, silhouetted tiered pagoda (西安大雁塔), blurred colorful distant lights."
Full Changelog: v1.3.12...v1.4.0
Full Changelog: v1.4.0...v1.4.1