InferX (Beta): Serverless GPU Inference Platform, Built for Agent-Native Workloads

Qwen3-TTS-12Hz-1.7B-VoiceDesign

Lightweight text-to-speech model designed for customizable voice generation
Tags: Qwen, audio, text2audio, speech

Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign is a 1.7B-parameter text-to-speech model from the Qwen3 family. It supports controllable voice and speaking-style generation and delivers natural speech with low latency and efficient resource usage. This makes it well suited to voice assistants, agents, and personalized speech applications.


Metadata

Provider: Qwen
Modality: audio
API type: text2audio
Source:
Created: 2026-03-30 00:46:59 UTC
Updated: 2026-03-30 16:49:24 UTC
Catalog version: 1
Visibility: Published

Specifications

Parameters: 1.70B
MoE: No
Max model length:
Image: vllm/vllm-omni:v0.14.0

Default Deploy Config

GPU count: 1
VRAM: 20000 MB
Summary: 1x GPU, 20000 MB
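As a rough sanity check on the default allocation, the 1.70B parameter count can be turned into a weight-memory estimate. This is a minimal sketch under stated assumptions, not a measured figure: it assumes 16-bit (bf16/fp16) weights at 2 bytes per parameter and reads "20000 MB" as decimal megabytes. It ignores activations, caches, and any separate audio decoding stage, so the weight figure is a lower bound on actual usage.

```python
# Rough VRAM budget check for the default deploy config (a sketch, not a measurement).
# ASSUMPTION: weights are stored in 16-bit precision (bf16/fp16), i.e. 2 bytes per parameter.
# ASSUMPTION: "20000 MB" means decimal megabytes (1 MB = 1e6 bytes).
# Activations, caches, and runtime buffers are not counted, so this is a lower bound.

PARAMS = 1.70e9        # "Parameters: 1.70B" from the Specifications section
BYTES_PER_PARAM = 2    # assumed 16-bit weights
VRAM_MB = 20_000       # "VRAM: 20000 MB" from the Default Deploy Config section


def weight_footprint_mb(params: float, bytes_per_param: int) -> float:
    """Approximate size of the model weights in decimal megabytes."""
    return params * bytes_per_param / 1e6


weights_mb = weight_footprint_mb(PARAMS, BYTES_PER_PARAM)
headroom_mb = VRAM_MB - weights_mb  # memory left for everything other than weights

print(f"weights ~ {weights_mb:,.0f} MB, headroom ~ {headroom_mb:,.0f} MB")
```

Under these assumptions the weights take roughly 3,400 MB, leaving about 16,600 MB of the default allocation for the serving runtime and intermediate state.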

Recommended Use Cases

  • Text-to-speech

Model Spec