InferX Catalog | Qwen2.5-7B-Instruct

Qwen2.5-7B-Instruct

Qwen2.5-7B-Instruct is the instruction-tuned 7B-parameter model in Alibaba Cloud’s Qwen2.5 series of large language models, the successor to the Qwen2 series.

Metadata

Provider

Qwen

Modality

text

API type

text2text

Source

huggingface / Qwen/Qwen2.5-7B-Instruct

Created

2026-04-12 00:35:24 UTC

Updated

2026-04-13 03:12:48 UTC

Catalog version

3

Visibility

Published

Specifications

Parameters

7.00B

MoE

No

Max model length

2000

Image

vllm/vllm-openai:v0.16.0
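
The maximum model length above caps the combined size of the prompt and the generated completion, which is how vLLM treats its `--max-model-len` setting. The sketch below works out how many prompt tokens remain for a given completion budget, for example the 1000 tokens requested by the sample query in the Model Spec. The function name `max_prompt_tokens` is made up for illustration and is not part of InferX or vLLM.

```python
MAX_MODEL_LEN = 2000  # "Max model length" from the specifications above


def max_prompt_tokens(max_tokens: int, max_model_len: int = MAX_MODEL_LEN) -> int:
    """Return how many prompt tokens fit next to a completion of max_tokens.

    Hypothetical helper: it assumes prompt and completion share a single
    context window of max_model_len tokens.
    """
    if not 0 < max_tokens < max_model_len:
        raise ValueError("max_tokens must be between 1 and max_model_len - 1")
    return max_model_len - max_tokens


if __name__ == "__main__":
    # With a 1000-token completion budget, the prompt gets the other half.
    print(max_prompt_tokens(1000))
```

A request whose prompt exceeds this remainder would typically be rejected by the server rather than truncated, so it is worth checking on the client side.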

Default Deploy Config

GPU count

1

vRAM

70000 MB

Summary

1xGPU 70000 MB

Recommended Use Cases

—

Model Spec

{
    "image": "vllm/vllm-openai:v0.16.0",
    "commands": [
        "--model",
        "Qwen/Qwen2.5-7B-Instruct",
        "--trust-remote-code",
        "--gpu-memory-utilization",
        "0.85",
        "--max-model-len",
        "2000"
    ],
    "resources": {
        "GPU": {
            "Count": 1,
            "vRam": 70000
        }
    },
    "envs": [],
    "policy": {
        "Obj": {
            "min_replica": 0,
            "max_replica": 1,
            "standby_per_node": 1,
            "parallel": 50,
            "queue_len": 100,
            "queue_timeout": 30.0,
            "scalein_timeout": 1.0,
            "scaleout_policy": {
                "WaitQueueRatio": {
                    "wait_ratio": 0.1
                }
            },
            "runtime_config": {
                "graph_sync": false
            }
        }
    },
    "sample_query": {
        "body": {
            "stream": "true",
            "max_tokens": "1000",
            "temperature": "0"
        },
        "path": "v1/completions",
        "prompt": "write a quick sort algorithm.",
        "apiType": "text2text",
        "dataUrl": "",
        "prompts": [],
        "loadingTimeout": 90
    }
}
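
The `sample_query` object in the spec above stores `stream`, `max_tokens`, and `temperature` as JSON strings, while an OpenAI-style completions endpoint generally expects a boolean, an integer, and a number. As a rough sketch, the Python below turns the sample query into a typed request payload. The helper `build_completion_request` and the base URL `http://localhost:8000` are assumptions for illustration; the real endpoint address depends on the deployment, and whether InferX performs this conversion itself is not stated on this page. No request is sent.

```python
import json

# Copied from the "sample_query" object in the Model Spec.
SAMPLE_QUERY = {
    "body": {"stream": "true", "max_tokens": "1000", "temperature": "0"},
    "path": "v1/completions",
    "prompt": "write a quick sort algorithm.",
}

MODEL = "Qwen/Qwen2.5-7B-Instruct"  # value passed to --model in the spec


def build_completion_request(sample_query: dict, model: str, base_url: str):
    """Return (url, payload) for an OpenAI-style completions call.

    Hypothetical helper, not an InferX API.
    """
    body = sample_query["body"]
    payload = {
        "model": model,
        "prompt": sample_query["prompt"],
        # The catalog stores these three values as strings; convert them
        # to the types an OpenAI-style schema normally uses.
        "stream": body["stream"].lower() == "true",
        "max_tokens": int(body["max_tokens"]),
        "temperature": float(body["temperature"]),
    }
    url = base_url.rstrip("/") + "/" + sample_query["path"].lstrip("/")
    return url, payload


if __name__ == "__main__":
    # Placeholder base URL: replace with the address of the actual deployment.
    url, payload = build_completion_request(
        SAMPLE_QUERY, MODEL, "http://localhost:8000"
    )
    print(url)
    print(json.dumps(payload, indent=2))
```

The resulting payload can be posted to the URL with any HTTP client. Because `stream` is true, the response would arrive as a sequence of server-sent events rather than a single JSON body.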