InferX Catalog | Qwen3-TTS-12Hz-1.7B-CustomVoice

Qwen3-TTS-12Hz-1.7B-CustomVoice

Qwen3-TTS-12Hz-1.7B-CustomVoice is a neural text-to-speech (TTS) model in Alibaba's Qwen (Tongyi Qianwen) family. At 1.7 billion parameters it falls in the "Edge-AI" size class: large enough to capture human-like prosody and emotion, yet small enough to target low-latency inference on local hardware or mobile devices.

Qwen audio text2audio

Metadata

Provider: Qwen
Modality: audio
API type: text2audio
Source: huggingface / Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
Created: 2026-04-06 15:38:34 UTC
Updated: 2026-05-04 20:45:18 UTC
Catalog version: 4
Visibility: Published

Specifications

Parameters: 1.70B
MoE: No
Max model length: not specified
Image: vllm/vllm-omni:v0.14.0
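
The image is started with the "commands" array given in the Model Spec further down. As a rough illustration, that array can be joined into the equivalent shell command line. The variable name COMMANDS is only for this sketch; the values are copied from the spec, and how InferX actually passes them to the container is not documented here.

```python
import shlex

# Copied verbatim from the "commands" array in the Model Spec.
COMMANDS = [
    "vllm",
    "serve",
    "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "--trust-remote-code",
    "--gpu-memory-utilization",
    "0.90",
    "--omni",
]

# shlex.join quotes each argument only where a POSIX shell would need it.
command_line = shlex.join(COMMANDS)
print(command_line)
```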

Default Deploy Config

GPU count: 1
vRAM: 30000 MB
Summary: 1xGPU 30000 MB
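
For a sense of scale, the vRAM figure can be combined with the --gpu-memory-utilization value of 0.90 from the Model Spec. The sketch below is back-of-the-envelope arithmetic only. It assumes the serving process sees the full 30000 MB as its device memory and that weights are stored as 16-bit floats; neither is stated in this catalog entry. The function names are made up for the example.

```python
# Values taken from this catalog entry.
VRAM_MB = 30000                # Default Deploy Config: vRAM
GPU_MEMORY_UTILIZATION = 0.90  # --gpu-memory-utilization in the Model Spec
PARAMS = 1.70e9                # Specifications: Parameters

# Assumption (not stated in the catalog): 2 bytes per parameter (16-bit weights).
BYTES_PER_PARAM = 2


def vllm_budget_mb(vram_mb, utilization):
    """Memory the serving process may claim, as a fraction of device memory."""
    return vram_mb * utilization


def weight_footprint_mb(params, bytes_per_param):
    """Rough size of the weights alone, using 1 MB = 10**6 bytes."""
    return params * bytes_per_param / 1e6


budget = vllm_budget_mb(VRAM_MB, GPU_MEMORY_UTILIZATION)
weights = weight_footprint_mb(PARAMS, BYTES_PER_PARAM)
print(f"memory budget: {budget:.0f} MB")
print(f"approximate weight footprint: {weights:.0f} MB")
print(f"remaining for caches and activations: {budget - weights:.0f} MB")
```

Under these assumptions the weights take up only a small part of the budget, which suggests the default allocation leaves generous headroom rather than reflecting a hard minimum.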

Recommended Use Cases

—

Model Spec

{
    "image": "vllm/vllm-omni:v0.14.0",
    "commands": [
        "vllm",
        "serve",
        "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
        "--trust-remote-code",
        "--gpu-memory-utilization",
        "0.90",
        "--omni"
    ],
    "resources": {
        "GPU": {
            "Count": 1,
            "vRam": 30000
        }
    },
    "envs": [],
    "policy": {
        "Obj": {
            "min_replica": 0,
            "max_replica": 1,
            "standby_per_node": 1,
            "parallel": 2,
            "queue_len": 100,
            "queue_timeout": 30.0,
            "scalein_timeout": 1.0,
            "scaleout_policy": {
                "WaitQueueRatio": {
                    "wait_ratio": 0.1
                }
            },
            "runtime_config": {
                "graph_sync": false
            }
        }
    },
    "sample_query": {
        "body": {
            "input": "Hello! The vLLM-Omni server is now running successfully and I am generating speech.",
            "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
            "voice": "Vivian",
            "response_format": "wav"
        },
        "path": "v1/audio/speech",
        "prompt": "Can you provide ways to eat combinations of bananas and dragonfruits?",
        "apiType": "text2audio",
        "dataUrl": "",
        "prompts": [],
        "loadingTimeout": 90
    }
}
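
The sample_query above can be turned into a small client. The following is a minimal sketch using only the Python standard library. BASE_URL is an assumption (the catalog does not state where a deployed endpoint is exposed), the function names are hypothetical, and the 60-second timeout is an arbitrary choice. The request path and body fields are copied from sample_query.

```python
import json
import urllib.request

# Assumption: address of a running deployment; replace with the real endpoint.
BASE_URL = "http://localhost:8000"

# Copied from sample_query.body in the Model Spec.
SAMPLE_BODY = {
    "input": "Hello! The vLLM-Omni server is now running successfully and I am generating speech.",
    "model": "Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "voice": "Vivian",
    "response_format": "wav",
}


def build_speech_request(base_url, body, path="v1/audio/speech"):
    """Build (but do not send) a JSON POST request for the speech endpoint."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(base_url, body, out_path, timeout=60):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(base_url, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

Calling synthesize(BASE_URL, SAMPLE_BODY, "speech.wav") against a live deployment should write the response to speech.wav and return the number of bytes received. An authenticated deployment would also need whatever credentials the platform requires, which this sketch leaves out.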