Qwen2.5-32B-Instruct-AWQ
High-quality instruction-tuned model quantized with AWQ for efficient, lower-memory inference.
Qwen2.5-32B-Instruct-AWQ is a 32B instruction-tuned Qwen model optimized with AWQ quantization to reduce GPU memory usage while maintaining strong reasoning, coding, and general chat performance, making it ideal for cost-efficient production deployments.
Metadata
Provider
Qwen
Modality
text
API type
text2text
Source
huggingface /
Qwen/Qwen2.5-32B-Instruct-AWQ
Created
2026-03-30 00:42:15 UTC
Updated
2026-04-13 03:13:10 UTC
Catalog version
2
Visibility
Published
Specifications
Parameters
32.00B
MoE
No
Max model length
32768
Image
vllm/vllm-openai:v0.16.0
Default Deploy Config
GPU count
1
vRAM
30000 MB
Summary
1xGPU 30000 MB
Recommended Use Cases
- Chatbot
- Code completion
- Reasoning assistant