Qwen3-TTS-12Hz-1.7B-CustomVoice
Qwen3-TTS-12Hz-1.7B-CustomVoice is a neural text-to-speech (TTS) model in Alibaba's Qwen (Tongyi Qianwen) family. At 1.7 billion parameters it is a compact, dense (non-MoE) model that this listing places in the "Edge-AI" category: large enough to capture human-like prosody and emotion, yet small enough to be a candidate for low-latency inference on local hardware or mobile devices. The default deploy config on this page is more conservative and reserves one GPU with 30000 MB of vRAM.
Metadata
Provider: Qwen
Modality: audio
API type: text2audio
Source: huggingface / Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice
Created: 2026-04-06 15:38:34 UTC
Updated: 2026-05-04 20:45:18 UTC
Catalog version: 4
Visibility: Published
Specifications
Parameters: 1.70B
MoE: No
Max model length: not specified
Image: vllm/vllm-omni:v0.14.0
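As a rough sanity check on the parameter count, the following sketch estimates how much memory the weights alone would occupy at a few common numeric precisions. The catalog does not state which dtype the deployment uses, so the dtype table and the helper name `weight_memory_mb` are illustrative assumptions, and "MB" is taken here to mean 10^6 bytes.

```python
# Rough weight-memory estimate for a dense (non-MoE) model.
# Assumption: the dtype sizes below are the standard ones; the catalog entry
# does not say which precision the serving image actually loads.

PARAMS = 1.70e9  # "Parameters: 1.70B" from the catalog entry

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_mb(params: float, dtype: str) -> float:
    """Return the weight size in MB (10**6 bytes) for the given dtype."""
    return params * BYTES_PER_PARAM[dtype] / 1e6


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_mb(PARAMS, dtype):.0f} MB")
```

Under these assumptions the weights account for only part of the 30000 MB in the default deploy config; the catalog does not break down how the rest is used.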
Default Deploy Config
GPU count: 1
vRAM: 30000 MB
Summary: 1xGPU 30000 MB
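The deploy config above can be captured as a small data structure, which makes it easy to check whether a given machine meets it. This is a minimal sketch, not the catalog's own schema: the class `DeployConfig`, its field names, and the `fits` helper are hypothetical, and it assumes the vRAM figure applies per GPU, which the listing does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployConfig:
    """Hypothetical container for the catalog's default deploy config."""
    image: str
    gpu_count: int
    vram_mb: int  # assumed to be a per-GPU requirement

    def summary(self) -> str:
        # Mirrors the "Summary" field format shown in the catalog.
        return f"{self.gpu_count}xGPU {self.vram_mb} MB"

    def fits(self, available_gpus_mb: list[int]) -> bool:
        """True if at least gpu_count GPUs each offer vram_mb or more."""
        big_enough = [m for m in available_gpus_mb if m >= self.vram_mb]
        return len(big_enough) >= self.gpu_count


DEFAULT = DeployConfig(image="vllm/vllm-omni:v0.14.0", gpu_count=1, vram_mb=30000)

print(DEFAULT.summary())
print(DEFAULT.fits([24000]))  # a single 24000 MB GPU
print(DEFAULT.fits([40000]))  # a single 40000 MB GPU
```

A check like this only compares the listed numbers; actual memory use depends on the serving image and request load.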
Recommended Use Cases
—