InferX (Beta): Serverless GPU Inference Platform, Built for Agent-Native Workloads

Moonlight-16B-A3B

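Moonlight-16B-A3B is a high-efficiency Mixture-of-Experts (MoE) language model released in February 2025 by Moonshot AI (the creators of Kimi). As the name indicates, it has about 16B total parameters, of which about 3B are activated per token. It was designed to push the "Pareto frontier" of quality versus cost, aiming for performance competitive with much larger models. Because only a few experts run for each token, per-token compute and decoding speed are closer to those of a small dense model. The VRAM footprint, however, is set by all 16B parameters, since every expert must stay resident in GPU memory.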
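To illustrate why an MoE model's per-token compute stays small, here is a minimal sketch of generic top-k expert gating in plain Python. It is not Moonlight's actual router: the function names, the toy experts, the expert count, and k are all hypothetical, and the router logits are passed in directly rather than produced by a learned layer.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and renormalize their gate weights."""
    chosen = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_forward(x: Vector, experts: Sequence[Expert],
                router_logits: Sequence[float], k: int) -> Vector:
    """Gate-weighted sum of the k selected experts' outputs.

    Every expert lives in `experts` (i.e. in memory), but only k are executed.
    In a real model the router logits come from a learned layer applied to x.
    """
    out = [0.0] * len(x)
    for idx, gate in top_k_route(router_logits, k):
        y = experts[idx](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out
```

The design point the sketch makes concrete: compute scales with k, while memory scales with the full `experts` list.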

Metadata

Provider: moonshotai
Modality: text
API type: text2text
Source: (not listed)
Created: 2026-04-07 21:23:20 UTC
Updated: 2026-04-13 03:13:35 UTC
Catalog version: 2
Visibility: Published

Specifications

Parameters: about 16B total, about 3B activated per token
MoE: Yes
Max model length: 2000 tokens
Image: vllm/vllm-openai:v0.16.0
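For readers who want to reproduce a comparable setup outside InferX, the following Python sketch assembles a `docker run` command for the stock `vllm/vllm-openai:v0.16.0` image with the catalog's 2000-token max model length on one GPU. It is an assumption that InferX launches the image this way. The Hugging Face repo id `moonshotai/Moonlight-16B-A3B`, the port, the function name `build_docker_command`, and the use of `--trust-remote-code` (on the assumption that the repository ships custom tokenizer code) are illustrative choices, not taken from this page.

```python
import shlex
from typing import List

IMAGE = "vllm/vllm-openai:v0.16.0"          # from the catalog entry
MODEL_ID = "moonshotai/Moonlight-16B-A3B"   # assumption: Hugging Face repo id
MAX_MODEL_LEN = 2000                        # catalog "Max model length", in tokens


def build_docker_command(port: int = 8000, gpus: int = 1) -> List[str]:
    """Hypothetical helper: build a `docker run` argv for the vLLM OpenAI server.

    Arguments placed after the image name are forwarded to the vLLM server.
    """
    return [
        "docker", "run", "--rm",
        "--gpus", str(gpus),          # expose this many GPUs to the container
        "--ipc=host",                 # shared memory for PyTorch workers
        "-p", f"{port}:8000",         # vLLM listens on 8000 inside the container
        IMAGE,
        "--model", MODEL_ID,
        "--max-model-len", str(MAX_MODEL_LEN),
        "--tensor-parallel-size", str(gpus),
        "--trust-remote-code",        # assumption: repo ships custom tokenizer code
    ]


if __name__ == "__main__":
    print(shlex.join(build_docker_command()))
```

Running the script prints a shell-ready command. The server then exposes OpenAI-compatible routes such as `/v1/chat/completions` on the chosen port, and each request's prompt plus completion must fit within the 2000-token limit.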

Default Deploy Config

GPU count: 1
vRAM: 50000 MB
Summary: 1 GPU, 50000 MB vRAM
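A back-of-the-envelope check of why one GPU with a 50000 MB allocation is a plausible default. This sketch assumes about 16B total and 3B active parameters (read off the model name), unquantized BF16 weights at 2 bytes per parameter, and "MB" meaning 10^6 bytes; it does not model KV-cache size or framework overhead, only the headroom left for them.

```python
TOTAL_PARAMS = 16e9        # assumption: ~16B total parameters ("16B" in the name)
ACTIVE_PARAMS = 3e9        # assumption: ~3B activated per token ("A3B" in the name)
BYTES_PER_PARAM = 2        # assumption: unquantized BF16 weights
VRAM_BUDGET_MB = 50_000    # catalog default; MB taken as 10**6 bytes


def weights_mb(params: float, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Memory needed just to hold the weights, in MB (10**6 bytes)."""
    return params * bytes_per_param / 1e6


# Every expert must be resident, so the footprint uses TOTAL_PARAMS, not ACTIVE_PARAMS.
resident_mb = weights_mb(TOTAL_PARAMS)           # 32000.0
headroom_mb = VRAM_BUDGET_MB - resident_mb       # 18000.0 left for KV cache, activations, runtime
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS   # 0.1875 of the weights are used per token

if __name__ == "__main__":
    print(f"weights: {resident_mb:.0f} MB, headroom: {headroom_mb:.0f} MB, "
          f"active fraction: {active_fraction:.4f}")
```

Under these assumptions the weights alone occupy about 32000 MB, which is why the model needs a 16B-class memory budget despite activating only about 3B parameters per token; a quantized checkpoint would lower `BYTES_PER_PARAM` accordingly.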
