VLLM Metal Engine Error (ValueError: Model type qwen3_5 not supported when running Qwen3.5-4B-MLX-4bit via docker model run) #767

@emi-dm

Description:
When attempting to run the Qwen3.5-4B-MLX-4bit model from the MLX community using the docker model run command, the model fails to preload. The underlying vllm-metal runner crashes with a ValueError indicating that the qwen3_5 model type is not supported by mlx_lm.

Steps to Reproduce:

  1. Run the following command in the terminal:
    docker model run hf.co/mlx-community/Qwen3.5-4B-MLX-4bit
  2. Enter a prompt (e.g., > Hola).
  3. Observe the crash and the traceback.

Expected Behavior:
The model should load successfully and generate a response to the prompt.

Actual Behavior / Error Logs:
The background model preload fails with a status 500. Here is the traceback:

background model preload failed: preload failed: status=500 body=unable to load runner: error waiting for runner to be ready: vllm-metal terminated unexpectedly: vllm-metal failed:
packages/vllm_metal/server.py", line 288, in create_engine
    runner.load_model()
  File "/Users/emi/.docker/model-runner/vllm-metal/lib/python3.12/site-packages/vllm_metal/model_runner.py", line 51, in load_model
    self.model, self.tokenizer = mlx_load(
                                 ^^^^^^^^^
  File "/Users/emi/.docker/model-runner/vllm-metal/lib/python3.12/site-packages/mlx_lm/utils.py", line 322, in load
    model, config = load_model(model_path, lazy, model_config=model_config)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/emi/.docker/model-runner/vllm-metal/lib/python3.12/site-packages/mlx_lm/utils.py", line 201, in load_model
    model_class, model_args_class = get_model_classes(config=config)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/emi/.docker/model-runner/vllm-metal/lib/python3.12/site-packages/mlx_lm/utils.py", line 75, in _get_classes
    raise ValueError(msg)
ValueError: Model type qwen3_5 not supported.

Environment details:

  • OS: macOS on Apple Silicon (M3)
  • Command line tool: Docker Desktop local AI features (docker model CLI)

Additional Context:
It appears the version of mlx_lm bundled with the Docker model-runner (vllm-metal) is outdated and lacks support for the qwen3_5 architecture, which was added in a more recent release of mlx_lm.
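For context on why this surfaces as "Model type qwen3_5 not supported": the traceback points at mlx_lm's _get_classes, which appears to resolve each model architecture by dynamically importing a module named after the config's model_type (e.g. mlx_lm.models.qwen3_5), so an mlx_lm build that predates the architecture simply has no such module. The sketch below is a hypothetical illustration of that dispatch pattern, not mlx_lm's actual code; the function name get_model_module and the default package path are assumptions.

```python
import importlib


def get_model_module(model_type: str, package: str = "mlx_lm.models"):
    """Resolve a model architecture module the way mlx_lm seems to:
    import <package>.<model_type>; a missing module means the installed
    version does not support that architecture (hypothetical sketch)."""
    try:
        return importlib.import_module(f"{package}.{model_type}")
    except ImportError:
        # Mirrors the error reported in the traceback above.
        raise ValueError(f"Model type {model_type} not supported.")
```

Under this assumption, upgrading the bundled mlx_lm (so that a qwen3_5 module exists in its models package) should be enough to fix the preload, with no change needed in vllm-metal itself.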
