InferX Catalog | Qwen3-Coder-30B-A3B-Instruct-FP8

Qwen3-Coder-30B-A3B-Instruct-FP8 is an FP8-quantized, instruction-tuned coding model from Qwen, released in 2025. It is designed for agentic coding: tasks where the model acts as an autonomous developer, interacting with environments and tools.

Metadata

Provider

Qwen

Modality

text

API type

text2text

Source

huggingface / Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8

Created

2026-04-10 01:00:19 UTC

Updated

2026-04-13 03:07:35 UTC

Catalog version

2

Visibility

Published

Specifications

Parameters

30.00B

MoE

Yes (A3B: about 3B active parameters per token)

Max model length

2000 tokens

Image

vllm/vllm-openai:v0.16.0

Default Deploy Config

GPU count

1

vRAM

50000 MB

Summary

1xGPU 50000 MB
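
As a rough sanity check on this default, the arithmetic below estimates how the 50000 MB budget might divide between model weights and everything else. It assumes about one byte per parameter for FP8 weights and uses the 0.95 GPU memory utilization value from the Model Spec. The function name is hypothetical, and real usage also depends on layers kept in higher precision, activations, and runtime overhead, so the figures are an estimate only.

```python
def estimate_memory_budget_mb(params_billion, bytes_per_param, vram_mb, utilization):
    """Rough split of a GPU memory budget into weights and remaining headroom (MB)."""
    # Assumption: weight size ~ parameter count x bytes per parameter (decimal MB).
    weights_mb = params_billion * 1e9 * bytes_per_param / 1e6
    # --gpu-memory-utilization caps the fraction of GPU memory the server may use.
    usable_mb = vram_mb * utilization
    # Whatever is left must hold the KV cache, activations, and runtime overhead.
    headroom_mb = usable_mb - weights_mb
    return {"weights_mb": weights_mb, "usable_mb": usable_mb, "headroom_mb": headroom_mb}


budget = estimate_memory_budget_mb(
    params_billion=30.0,   # "Parameters" field of this catalog entry
    bytes_per_param=1.0,   # assumption: FP8 stores roughly one byte per weight
    vram_mb=50000,         # "vRAM" field of the default deploy config
    utilization=0.95,      # value passed to --gpu-memory-utilization in the Model Spec
)
print(budget)
```

Under these assumptions the weights take about 30000 MB of a 47500 MB usable budget, leaving roughly 17500 MB for the KV cache and other runtime needs.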

Recommended Use Cases

—

Model Spec

{
    "image": "vllm/vllm-openai:v0.16.0",
    "commands": [
        "--model",
        "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8",
        "--trust-remote-code",
        "--gpu-memory-utilization",
        "0.95",
        "--max-model-len",
        "2000",
        "--tensor-parallel-size=1"
    ],
    "resources": {
        "GPU": {
            "Count": 1,
            "vRam": 50000
        }
    },
    "envs": [
        [
            "VLLM_USE_DEEP_GEMM",
            "0"
        ]
    ],
    "sample_query": {
        "body": {
            "stream": "true",
            "max_tokens": "1000",
            "temperature": "0"
        },
        "path": "v1/completions",
        "prompt": "write a quick sort algorithm.",
        "apiType": "text2text",
        "dataUrl": "",
        "prompts": [],
        "loadingTimeout": 90
    }
}
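
The sample_query in the spec stores stream, max_tokens, and temperature as strings and keeps the prompt outside body. The source does not say how InferX assembles the final request, so the following is only a minimal sketch of how a client might turn those fields into a payload for an OpenAI-style completions endpoint such as the one the vLLM image serves. The helper name build_completion_request, the string-to-native type coercion, and the choice to reuse the --model value as the served model name are all assumptions.

```python
import json


def build_completion_request(spec):
    """Return (path, payload) for an OpenAI-style completions call from a model spec."""
    query = spec["sample_query"]
    body = query["body"]
    commands = spec["commands"]
    payload = {
        # Assumption: the served model name matches the value passed to --model.
        "model": commands[commands.index("--model") + 1],
        "prompt": query["prompt"],
        # The spec stores these as strings; convert them to native JSON types.
        "stream": str(body["stream"]).lower() == "true",
        "max_tokens": int(body["max_tokens"]),
        "temperature": float(body["temperature"]),
    }
    # Normalize the path so it always starts with a single slash.
    return "/" + query["path"].lstrip("/"), payload


# Subset of the Model Spec, copied from the fields shown in this entry.
spec = {
    "commands": ["--model", "Qwen/Qwen3-Coder-30B-A3B-Instruct-FP8"],
    "sample_query": {
        "body": {"stream": "true", "max_tokens": "1000", "temperature": "0"},
        "path": "v1/completions",
        "prompt": "write a quick sort algorithm.",
    },
}

path, payload = build_completion_request(spec)
print(path)
print(json.dumps(payload, indent=2))
```

The resulting payload could be sent with any HTTP client as a POST to the deployment's base URL followed by the path. The base URL depends on the deployment and is not given in this entry.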