InferX (Beta): Serverless GPU Inference Platform, Built for Agent-Native Workloads

Qwen3-TTS-12Hz-1.7B-CustomVoice

Qwen3-TTS-12Hz-1.7B-CustomVoice is a compact, efficiency-focused model from Alibaba's Qwen (Tongyi Qianwen) family, tuned specifically for neural text-to-speech (TTS). At 1.7 billion parameters, it falls into the edge-AI size class: large enough to capture human-like prosody and emotion, yet small enough to target low-latency inference on local hardware or mobile devices.
Deployment and customization require logging in; this public page shows only the catalog model details.

Metadata

Provider: Qwen
Modality: audio
API type: text2audio
Source: (not specified)
Created: 2026-04-06 15:38:34 UTC
Updated: 2026-05-04 20:45:18 UTC
Catalog version: 4
Visibility: Published

Specifications

Parameters: 1.70B
MoE: No
Max model length: (not specified)
Image: vllm/vllm-omni:v0.14.0

Default Deploy Config

GPU count: 1
VRAM: 30,000 MB
Summary: 1 GPU, 30,000 MB VRAM
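The default allocation is much larger than the raw weights of a dense 1.70B-parameter model. The sketch below shows the arithmetic under stated assumptions: weights are held in one uniform precision (the page does not state a dtype), and "MB" is read as 10^6 bytes. It estimates the weight footprint for a few common precisions and the VRAM left over. Activations, caches, and audio decoding are not modeled; they are what the headroom would be spent on.

```python
# Rough VRAM budgeting for the default deploy config (1 GPU, 30,000 MB).
# Assumptions (not stated on the catalog page):
#   - all weights use a single uniform precision
#   - "MB" means 10**6 bytes
# Activations, caches, and audio-decoder working memory are NOT modeled.

PARAMS = 1.70e9     # "Parameters: 1.70B" (dense, MoE: No)
VRAM_MB = 30_000    # default VRAM allocation on 1 GPU

# Bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {
    "fp32": 4,
    "bf16": 2,
    "int8": 1,
}


def weight_mb(params: float, bytes_per_param: int) -> float:
    """Approximate weight footprint in MB (10**6 bytes)."""
    return params * bytes_per_param / 1e6


def headroom_mb(vram_mb: float, weights_mb: float) -> float:
    """VRAM remaining after the weights are loaded."""
    return vram_mb - weights_mb


if __name__ == "__main__":
    for dtype, nbytes in BYTES_PER_PARAM.items():
        w = weight_mb(PARAMS, nbytes)
        h = headroom_mb(VRAM_MB, w)
        print(f"{dtype}: weights ~{w:,.0f} MB, headroom ~{h:,.0f} MB")
```

Under the bf16 assumption, the weights take about 3,400 MB, roughly 11% of the 30,000 MB default, leaving about 26,600 MB for runtime memory.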
