Qwen3-TTS-12Hz-1.7B-VoiceDesign
Lightweight text-to-speech model designed for customizable voice generation
Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign is a 1.7B-parameter text-to-speech model from the Qwen3 family that supports controllable voice and style generation. It is designed to produce natural-sounding speech with low latency and modest resource usage, which makes it a good fit for voice assistants, agents, and personalized speech applications.
Metadata
Provider
Qwen
Modality
audio
API type
text2audio
Source
huggingface / Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign
Created
2026-03-30 00:46:59 UTC
Updated
2026-03-30 16:49:24 UTC
Catalog version
1
Visibility
Published
Specifications
Parameters
1.70B
MoE
No
Max model length
—
Image
vllm/vllm-omni:v0.14.0
Default Deploy Config
GPU count
1
vRAM
20000 MB
Summary
1xGPU 20000 MB
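As a rough sanity check on the default deploy config, the sketch below compares the listed vRAM against an estimate of the weight footprint. It assumes 2 bytes per parameter (16-bit weights) and treats 1 MB as 10^6 bytes; neither is stated on this card. The names DeployConfig and weight_footprint_mb are hypothetical helpers for illustration, not part of any catalog or vLLM API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployConfig:
    """Hypothetical container for the default deploy config listed above."""
    gpu_count: int
    vram_mb_per_gpu: int

    def summary(self) -> str:
        # Mirrors the "Summary" line format used on this card.
        return f"{self.gpu_count}xGPU {self.vram_mb_per_gpu} MB"


def weight_footprint_mb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate weight memory in MB (1 MB = 10^6 bytes, an assumption).

    bytes_per_param=2 assumes 16-bit weights; the card does not state the dtype.
    """
    return num_params * bytes_per_param / 1_000_000


config = DeployConfig(gpu_count=1, vram_mb_per_gpu=20000)
weights_mb = weight_footprint_mb(1.70e9)  # 1.70B parameters, from the card
headroom_mb = config.gpu_count * config.vram_mb_per_gpu - weights_mb

print(config.summary())
print(f"weights ~{weights_mb:.0f} MB, headroom ~{headroom_mb:.0f} MB")
```

Under these assumptions the weights account for only a fraction of the allocation. The remainder is available for activations, caches, and runtime overhead, none of which this estimate models.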
Recommended Use Cases
- Text-to-speech