InferX Catalog | Qwen2.5-14B-Instruct

Qwen2.5-14B-Instruct

A 14B instruction-tuned model for strong reasoning, coding, and chat tasks.

Tags: Qwen, text, text2text, coding, reasoning, chat

Qwen/Qwen2.5-14B-Instruct is a 14B-parameter instruction-tuned model from the Qwen2.5 family. It targets reasoning, coding, and conversational tasks, and its default deployment fits on a single GPU with 40000 MB of vRAM, which makes it a practical choice for agents, assistants, and production deployments.

Metadata

Provider: Qwen
Modality: text
API type: text2text
Source: huggingface / Qwen/Qwen2.5-14B-Instruct
Created: 2026-04-06 05:40:30 UTC
Updated: 2026-04-13 03:13:42 UTC
Catalog version: 5
Visibility: Published

Specifications

Parameters: 14.00B
MoE: No
Max model length: 32768 tokens
Image: vllm/vllm-openai:v0.16.0

Default Deploy Config

GPU count: 1
vRAM: 40000 MB
Summary: 1 x GPU, 40000 MB
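The default config can be sanity-checked with some rough arithmetic. The sketch below assumes 16-bit (bf16/fp16) weights at 2 bytes per parameter and decimal megabytes; neither is stated on this page. It takes the parameter count, the vRAM figure, and the `--gpu-memory-utilization` value from the listing and estimates how much memory remains for the KV cache and activations once the weights are loaded.

```python
# Rough memory budget for the default deploy config (a sketch, not a measurement).
PARAMS = 14.00e9               # "Parameters" from the catalog page
BYTES_PER_PARAM = 2            # ASSUMPTION: bf16/fp16 weights, no quantization
VRAM_MB = 40000                # "vRAM" from the default deploy config
GPU_MEMORY_UTILIZATION = 0.95  # --gpu-memory-utilization from the model spec


def memory_budget_mb(params, bytes_per_param, vram_mb, utilization):
    """Return (weights, usable, remaining) in MB, with 1 MB = 1e6 bytes (assumed)."""
    weights_mb = params * bytes_per_param / 1e6
    usable_mb = vram_mb * utilization
    return weights_mb, usable_mb, usable_mb - weights_mb


weights, usable, remaining = memory_budget_mb(
    PARAMS, BYTES_PER_PARAM, VRAM_MB, GPU_MEMORY_UTILIZATION
)
print(f"weights:   {weights:.0f} MB")
print(f"usable:    {usable:.0f} MB")
print(f"remaining: {remaining:.0f} MB for KV cache and activations")
```

Under these assumptions the weights take about 28000 MB of the 38000 MB that the server is allowed to use, leaving roughly 10000 MB. The real figure depends on the exact parameter count, the weight dtype, and runtime overhead.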

Recommended Use Cases

—

Model Spec

{
    "image": "vllm/vllm-openai:v0.16.0",
    "commands": [
        "--model",
        "Qwen/Qwen2.5-14B-Instruct",
        "--trust-remote-code",
        "--gpu-memory-utilization",
        "0.95",
        "--max-model-len",
        "32768"
    ],
    "resources": {
        "GPU": {
            "Count": 1,
            "vRam": 40000
        }
    },
    "envs": [],
    "policy": {
        "Obj": {
            "min_replica": 0,
            "max_replica": 1,
            "standby_per_node": 1,
            "parallel": 50,
            "queue_len": 100,
            "queue_timeout": 30.0,
            "scalein_timeout": 1.0,
            "scaleout_policy": {
                "WaitQueueRatio": {
                    "wait_ratio": 0.1
                }
            },
            "runtime_config": {
                "graph_sync": false
            }
        }
    },
    "sample_query": {
        "body": {
            "stream": "true",
            "max_tokens": "1000",
            "temperature": "0"
        },
        "path": "v1/completions",
        "prompt": "write a quick sort algorithm.",
        "apiType": "text2text",
        "dataUrl": "",
        "prompts": [],
        "loadingTimeout": 90
    }
}
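The `sample_query` in the spec above stores its body values as strings ("true", "1000", "0"). The following is a minimal sketch of how a client might turn it into a typed request for an OpenAI-compatible `v1/completions` endpoint such as the one the vLLM image serves. The base URL is a placeholder, the `coerce` and `build_request` helpers are hypothetical names, and whether the catalog itself performs this conversion is an assumption.

```python
import json

# Only the relevant part of "sample_query" from the model spec is reproduced here.
SAMPLE_QUERY = {
    "body": {"stream": "true", "max_tokens": "1000", "temperature": "0"},
    "path": "v1/completions",
    "prompt": "write a quick sort algorithm.",
}
MODEL = "Qwen/Qwen2.5-14B-Instruct"


def coerce(value):
    """Parse string-encoded JSON scalars ("true", "1000"); keep other values as-is."""
    if not isinstance(value, str):
        return value
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value


def build_request(base_url, sample_query, model):
    """Return (url, payload) for an OpenAI-style completions call."""
    payload = {"model": model, "prompt": sample_query["prompt"]}
    payload.update({k: coerce(v) for k, v in sample_query["body"].items()})
    url = base_url.rstrip("/") + "/" + sample_query["path"].lstrip("/")
    return url, payload


# ASSUMPTION: placeholder base URL; substitute the endpoint of your deployment.
url, payload = build_request("http://localhost:8000", SAMPLE_QUERY, MODEL)
print(url)
print(json.dumps(payload, indent=2))
```

The resulting payload can be sent with any HTTP client. Because `stream` is true, the response arrives as a stream of server-sent events rather than a single JSON document.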