Description:
When attempting to run the Qwen3.5-4B-MLX-4bit model from the MLX community using the docker model run command, the model fails to preload. The underlying vllm-metal runner crashes with a ValueError indicating that the qwen3_5 model type is not supported by mlx_lm.
Steps to Reproduce:
- Run the following command in the terminal:
docker model run hf.co/mlx-community/Qwen3.5-4B-MLX-4bit
- Enter a prompt (e.g., > Hola).
- Observe the crash and the traceback.
Expected Behavior:
The model should load successfully and generate a response to the prompt.
Actual Behavior / Error Logs:
The background model preload fails with a status 500. Here is the traceback:
background model preload failed: preload failed: status=500 body=unable to load runner: error waiting for runner to be ready: vllm-metal terminated unexpectedly: vllm-metal failed: model, config = load_model(model_path, lazy, model_config=model_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/emi/.docker/model-runner/vllm-metal/lib/python3.12/site-packages/mlx_lm/utils.py", line 201, in load_model
model_class, model_args_class = get_model_classes(config=config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/emi/.docker/model-runner/vllm-metal/lib/python3.12/site-packages/mlx_lm/utils.py", line 75, in _get_classes
raise ValueError(msg)
ValueError: Model type qwen3_5 not supported.
packages/vllm_metal/server.py", line 288, in create_engine
runner.load_model()
File "/Users/emi/.docker/model-runner/vllm-metal/lib/python3.12/site-packages/vllm_metal/model_runner.py", line 51, in load_model
self.model, self.tokenizer = mlx_load(
^^^^^^^^^
File "/Users/emi/.docker/model-runner/vllm-metal/lib/python3.12/site-packages/mlx_lm/utils.py", line 322, in load
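For anyone triaging this: as far as I understand, the qwen3_5 string in the ValueError is the model_type value from the model's config.json. A minimal sketch for confirming what a locally downloaded copy declares is below. The helper name read_model_type and the idea of passing the model directory as an argument are my own assumptions for illustration; nothing here is part of mlx_lm or the docker model CLI.

```python
import json
import sys
from pathlib import Path


def read_model_type(model_dir):
    """Return the "model_type" value from <model_dir>/config.json, or None if absent.

    Hypothetical helper for illustration; it only reads a JSON file.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return config.get("model_type")


if __name__ == "__main__":
    # Usage: python read_model_type.py /path/to/downloaded/model
    if len(sys.argv) > 1:
        print(read_model_type(sys.argv[1]))
```

If this prints the same value as the error message, the failure is in resolving that model type rather than in the download itself.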
Environment details:
- OS: macOS (Apple Silicon) M3
- Command line tool: Docker Desktop local AI features (docker model CLI)
Additional Context:
It appears that the version of mlx_lm bundled with the Docker model-runner (vllm-metal) is outdated and lacks support for the qwen3_5 architecture, which was added in more recent mlx_lm releases.
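One way to check this hypothesis, sketched under assumptions: my understanding is that mlx_lm resolves a model type by importing a module of the same name under mlx_lm.models (apart from a few remapped names), so a missing mlx_lm.models.qwen3_5 module would explain the ValueError. The helper names module_available and installed_version are hypothetical, and the script has to be run with the Python interpreter of the environment that hosts vllm-metal to be meaningful.

```python
import importlib.util
from importlib import metadata


def module_available(dotted_name):
    """Return True if the dotted module name can be located, False otherwise."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ImportError:
        # Raised when a parent package (e.g. mlx_lm itself) cannot be imported.
        return False


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution is named "mlx-lm" and model modules
    # live under the mlx_lm.models package.
    print("mlx-lm version:", installed_version("mlx-lm"))
    print("qwen3_5 module found:", module_available("mlx_lm.models.qwen3_5"))
```

If the second line reports False while a newer mlx_lm release reports True, that would support the idea that the bundled copy just needs an update.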