InferX Catalog | Serverless Model Catalog For Agent-Native Workloads

Model	Intro	Tags	Action
chatterbox-tts	Chatterbox TTS is a state-of-the-art, open-source text-to-speech system developed by Resemble AI. It supports zero-shot voice cloning, emotion control, and high-quality audio generation, all MIT-licensed and fully production-ready. View on Hugging Face		View Details Log in to deploy
cogito-v1-preview-llama-3B	Cogito-v1-preview-llama-3B is a high-performance "hybrid reasoning" model released by Deep Cogito. Built on the Llama 3.2 3B architecture View on Hugging Face		View Details Log in to deploy
cogito-v1-preview-llama-8B	Cogito-v1-preview-llama-8B is an 8-billion parameter hybrid reasoning model released in April 2025 by San Francisco-based startup Deep Cogito. View on Hugging Face		View Details Log in to deploy
cogito-v1-preview-qwen-14B	Cogito-v1-preview-qwen-14B is a hybrid reasoning model developed by Deep Cogito, a San Francisco-based startup that emerged from stealth in April 2025. It is built on the Qwen 2.5 architecture but heavily modified to include self-reflection and "deep thinking" capabilities similar to OpenAI’s o1 or DeepSeek-R1. View on Hugging Face		View Details Log in to deploy
cogito-v1-preview-qwen-32B	Cogito-v1-preview-qwen-32B (often referred to as Cogito v1 Preview) is a high-performance hybrid reasoning model developed by Deep Cogito. View on Hugging Face		View Details Log in to deploy
context-1	Context-1 is a 20B parameter agentic search model trained to retrieve supporting documents for complex, multi-hop queries. It is designed to be used as a retrieval subagent alongside a frontier reasoning model View on Hugging Face		View Details Log in to deploy
Cydonia-24B-v4.3	A general-purpose model optimized for strong reasoning, coding, and chat performance. View on Hugging Face	coding reasoning	View Details Log in to deploy
DeepCoder-14B-Preview	DeepCoder-14B-Preview is a high-performance, open-source code reasoning model View on Hugging Face		View Details Log in to deploy
DeepCoder-1.5B-Preview	DeepCoder-1.5B-Preview is a lightweight yet powerful code-reasoning model released in April 2025 as part of the DeepCoder series by the Agentica team and Together AI. View on Hugging Face		View Details Log in to deploy
DeepSeek-Coder-V2-Lite-Instruct	A lightweight coding model designed for efficient code generation and reasoning. View on Hugging Face	reasoning coding	View Details Log in to deploy
DeepSeek-OCR	DeepSeek-OCR (released in late 2025, with v2 arriving in January 2026) is a specialized multimodal model designed to solve the "token explosion" problem in traditional Document AI. While standard Vision-Language Models (VLMs) often convert a single page into thousands of tokens, DeepSeek-OCR treats OCR as a multimodal compression task, achieving high accuracy with a fraction of the computational cost. View on Hugging Face		View Details Log in to deploy
DeepSeek-Prover-V2-7B	DeepSeek-Prover-V2-7B is a specialized, open-source language model released in 2025 that focuses on formal theorem proving in Lean 4. View on Hugging Face		View Details Log in to deploy
DeepSeek-R1-Distill-Qwen-14B	DeepSeek-R1-Distill-Qwen-14B is a premier reasoning model from the DeepSeek-R1 family, specifically engineered to deliver frontier-level logic within a compact 14.7-billion parameter frame. View on Hugging Face		View Details Log in to deploy
Devstral-Small-2-24B-Instruct-2512	Devstral is an agentic LLM for software engineering tasks. Devstral Small 2 excels at using tools to explore codebases, editing multiple files and power software engineering agents. The model achieves remarkable performance on SWE-bench. View on Hugging Face		View Details Log in to deploy
Devstral-Small-2505	Devstral-Small-2505 is a specialized, agentic large language model released in May 2025 through a collaboration between Mistral AI and All Hands AI. View on Hugging Face		View Details Log in to deploy
ERNIE-4.5-21B-A3B-PT	ERNIE-4.5-21B-A3B-PT is a high-efficiency Large Language Model (LLM) developed by Baidu, released as part of their ERNIE 4.5 family in late 2025. It is specifically designed to balance high-level reasoning capabilities with low computational costs. View on Hugging Face		View Details Log in to deploy
FLUX.1-dev	FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions. View on Hugging Face		View Details Log in to deploy
FLUX.2-klein-9B	FLUX.2-klein-9B is a high-performance, mid-sized text-to-image model that belongs to the next generation of the FLUX family (developed by Black Forest Labs). View on Hugging Face		View Details Log in to deploy
gemma-3-12b-it	Gemma-3-12B-IT is a mid-sized, instruction-tuned multimodal model from Google’s Gemma 3 family. View on Hugging Face		View Details Log in to deploy
gemma-3-1b-it	A compact instruction-tuned model designed for fast and efficient general-purpose tasks. View on Hugging Face	low-latency	View Details Log in to deploy
gemma-3-27b-it	An instruction-tuned model designed for strong reasoning, coding, and chat performance. View on Hugging Face	reasoning coding chat	View Details Log in to deploy
gemma-3-4b-it	The Gemma-3-4B-IT (Instruction Tuned) is a mid-sized, multimodal model from Google’s latest open-weights family, released in March 2025. It represents a significant architectural shift from the Gemma 2 series, moving from a text-only focus to a native vision-language (multimodal) design. View on Hugging Face		View Details Log in to deploy
gemma-3n-E4B-it	Gemma-3n-E4B-it is part of the experimental "N" (Native) series from Google, released in early 2026. This model represents a pivot toward native multimodal reasoning. View on Hugging Face		View Details Log in to deploy
gemma-4-31B-it	gemma-4-31B-it View on Hugging Face		View Details Log in to deploy
gemma-4-31B-it-uncensored-heretic	A 31B Gemma-4 model modified for reduced safety restrictions and more open responses View on Hugging Face	reasoning coding chat	View Details Log in to deploy
gemma-4-31B-it-uncensored-heretic-2GPU	gemma-4-31B-it-uncensored-heretic-2GPU View on Hugging Face		View Details Log in to deploy
gemma-4-E2B-it	An efficient Gemma 4 model optimized for strong performance with lower resource usage View on Hugging Face	low-latency	View Details Log in to deploy
gemma-4-E2B-it_sound	gemma-4-E2B-it with sound support View on Hugging Face		View Details Log in to deploy
gemma-4-E4B-it	gemma-4-E4B-it View on Hugging Face		View Details Log in to deploy
Glistening-Gem-31B-v1.0	Glistening-Gem-31B-v1.0 View on Hugging Face		View Details Log in to deploy
GLM-OCR	GLM-OCR is a compact, high-performance multimodal model released in February 2026 by Zhipu AI (Z.ai). It is specifically designed to bridge the gap between traditional OCR (character recognition) and full "Document Understanding" (layout, tables, and reasoning). View on Hugging Face		View Details Log in to deploy
GLM-Z1-32B-0414	GLM-Z1-32B-0414 is a high-performance, open-source reasoning model with 32 billion parameters, released by the zai-org group View on Hugging Face		View Details Log in to deploy
gpt-oss-20b	The GPT-OSS-20B (Generative Pre-trained Transformer - Open Source Software) is a significant milestone in the move toward high-performance, transparent large language models. It is part of a broader family of models designed to provide a powerful, open-source alternative to proprietary models like GPT-3 or early GPT-4 iterations. View on Hugging Face		View Details Log in to deploy
gpt-oss-safeguard-20b	GPT-OSS-Safeguard-20B is an open-weight, safety-focused reasoning model View on Hugging Face		View Details Log in to deploy
Holo3-35B-A3B	Holo3 is our latest generation of large-scale Vision-Language Models (VLMs) specifically optimized for GUI Agents. View on Hugging Face		View Details Log in to deploy
Huihui-Qwen3.5-27B-Claude-4.6-Opus-abliterated	Model tuned for Claude-style reasoning with reduced safety restrictions View on Hugging Face	reasoning	View Details Log in to deploy
Huihui-Qwen3.5-35B-A3B-abliterated	This is an uncensored version of Qwen/Qwen3.5-35B-A3B created with abliteration View on Hugging Face		View Details Log in to deploy
Huihui-Qwen3.6-35B-A3B-abliterated	Huihui-Qwen3.6-35B-A3B-abliterated (released April 19, 2026) is a specialized variant of Alibaba's latest Qwen3.6 MoE model. View on Hugging Face		View Details Log in to deploy
Huihui-ThinkingCap-Qwen3.6-27B-abliterated	Huihui-ThinkingCap-Qwen3.6-27B-abliterated View on Hugging Face		View Details Log in to deploy
Inferx-bundle-Qwen3.6-35B-A3B-FP8-Qwen3-Embedding-0.6B-Qwen3-Reranker-0.6B	A bundle of Qwen3.6-35B-A3B-FP8, Qwen3-Embedding-0.6B, and Qwen3-Reranker-0.6B. View on Hugging Face		View Details Log in to deploy
InnerVerse-GLM47Flash-v1	A fast, reasoning-focused model optimized for efficient inference and strong instruction following View on Hugging Face	coding reasoning	View Details Log in to deploy
IntelliAsk-Qwen3-32B-450-Merged	IntelliAsk-Qwen3-32B-450-Merged View on Hugging Face		View Details Log in to deploy
InternVL3_5-38B-FP8-Dynamic	InternVL3.5-38B-FP8-Dynamic is a state-of-the-art multimodal large language model (MLLM) optimized for high-efficiency inference View on Hugging Face		View Details Log in to deploy
InternVL3_5-38B-Instruct	InternVL3.5-38B-Instruct is an advanced multimodal large language model (MLLM) released in late 2025 by Shanghai AI Laboratory. View on Hugging Face		View Details Log in to deploy
InternVL3_5-8B	InternVL3.5-8B-Instruct is the latest state-of-the-art multimodal large language model (MLLM) from OpenGVLab (Shanghai AI Lab). View on Hugging Face		View Details Log in to deploy
Kimi-Linear-48B-A3B-Instruct-AWQ-8bit	Kimi-Linear-48B-A3B-Instruct-AWQ-8bit is a high-efficiency, long-context model released by Moonshot AI in late 2025. It represents a significant departure from standard Transformer architectures, specifically designed to eliminate the "quadratic bottleneck" that usually slows down long-context processing. View on Hugging Face		View Details Log in to deploy
Kimi-VL-A3B-Thinking-2506	Kimi-VL-A3B-Thinking-2506 is a state-of-the-art vision-language model (VLM) released by Moonshot AI in mid-2025. View on Hugging Face		View Details Log in to deploy
L3.3-70B-Loki-V2.0	Llama-based model tuned for immersive roleplay, storytelling, and strong narrative consistency View on Hugging Face	long-context	View Details Log in to deploy
Llama-3.1-8B-Instruct	Llama-3.1-8B-Instruct is the lightweight, instruction-tuned variant of Meta’s Llama 3.1 family. View on Hugging Face		View Details Log in to deploy
Llama-3.3-70B-Instruct-AWQ	Llama-3.3-70B-Instruct-AWQ is the 4-bit quantized version of Meta's December 2024 flagship "efficiency" model. View on Hugging Face		View Details Log in to deploy
Magistral-Small-2509-AWQ-4bit	Magistral-Small-2509-AWQ-4bit is the 4-bit quantized version of Mistral AI's Magistral Small 1.2 View on Hugging Face		View Details Log in to deploy
medgemma-27b-text-it-FP8-Dynamic	MedGemma-27B-Text-IT-FP8-Dynamic is an FP8 Dynamic–quantized derivative of Google’s MedGemma-27B-Text-IT model, optimized for high-throughput inference while preserving strong performance on medical and biomedical instruction-tuned text-only tasks. View on Hugging Face		View Details Log in to deploy
Midnight-Miqu-70B-v1.5-FP8-Dynamic	Midnight-Miqu-70B-v1.5 is a high-performance 70B parameter model specifically engineered for creative writing, long-form roleplay, and complex character interactions. It is a "DARE Linear" merge of Midnight-Miqu-v1.0 and Tess-v1.6, designed to retain the legendary prose quality of the original "Miqu" (the leaked Mistral-70B weights) while improving instruction following and world-state tracking. View on Hugging Face		View Details Log in to deploy
Ministral-3-14B-Reasoning-2512	The Ministral-3-14B-Reasoning-2512 (often referred to as part of the "Les Ministraux" family) is one of Mistral AI's most sophisticated "mid-weight" models. It is specifically engineered to bridge the gap between low-latency edge computing and the deep reasoning capabilities typically reserved for massive 70B+ parameter models. View on Hugging Face		View Details Log in to deploy
Ministral-3-8B-Instruct-2512-BF16	Ministral-3-8B-Instruct-2512-BF16 (released in December 2025/January 2026) is the newest "edge-sovereign" multimodal model from Mistral AI. View on Hugging Face		View Details Log in to deploy
Mistral-Small-24B-Instruct-2501	Mistral-Small-24B-Instruct-2501 (often referred to as Mistral Small 3) is a high-efficiency language model released in late January 2025. View on Hugging Face		View Details Log in to deploy
Mixtral-8x7B-Instruct-v0.1	Mixtral-8x7B-Instruct-v0.1 is a high-performance Sparse Mixture-of-Experts (SMoE) model released by Mistral AI. View on Hugging Face		View Details Log in to deploy
Molmo2-4B	Molmo2-4B is a highly efficient, small-scale Vision-Language Model (VLM) View on Hugging Face		View Details Log in to deploy
Molmo2-8B	A multimodal model optimized for image and video understanding with strong grounding and reasoning capabilities. View on Hugging Face	multimodal	View Details Log in to deploy
Moonlight-16B-A3B	Moonlight-16B-A3B is a high-efficiency Mixture-of-Experts (MoE) language model released in February 2025 by Moonshot AI (the creators of Kimi). It was designed to push the "Pareto frontier"—delivering the reasoning power of much larger models while maintaining the inference speed and VRAM footprint of a small model View on Hugging Face		View Details Log in to deploy
NextCoder-14B	NextCoder-14B is a specialized large language model designed for code editing and modification View on Hugging Face		View Details Log in to deploy
NextCoder-7B	NextCoder-7B is a specialized, open-weights large language model (LLM) developed by Microsoft Foundry View on Hugging Face		View Details Log in to deploy
notux-8x7b-v1-AWQ	Notux-8x7b-v1-AWQ is a high-performance, 4-bit quantized version of the Notux-8x7b-v1 model. It combines a state-of-the-art Mixture-of-Experts (MoE) architecture with Activation-aware Weight Quantization (AWQ) for efficient deployment on NVIDIA GPUs. View on Hugging Face		View Details Log in to deploy
NVIDIA-Nemotron-3-Nano-30B-A3B-FP8	Optimized with FP8 for fast, efficient reasoning and chat workloads. View on Hugging Face	reasoning high-throughput agentic	View Details Log in to deploy
NVIDIA-Nemotron-3-Nano-30B-A3B-FP8-temp	test View on Hugging Face		View Details Log in to deploy
NVIDIA-Nemotron-Nano-12B-v2-VL-FP8	NVIDIA-Nemotron-Nano-12B-v2-VL-FP8 is a cutting-edge multimodal model released by NVIDIA in late 2025. It is specifically engineered for high-throughput, low-latency applications like document intelligence and long-form video understanding. View on Hugging Face		View Details Log in to deploy
NVIDIA-Nemotron-Nano-9B-v2	NVIDIA-Nemotron-Nano-9B-v2 is a 9-billion-parameter hybrid language model designed for high-efficiency reasoning and agentic workflows. View on Hugging Face		View Details Log in to deploy
Olmo-3.1-32B-Think-AWQ-4bit	OLMo-3.1-32B-Think-AWQ-4bit is a high-efficiency, reasoning-optimized version of the OLMo 3.1 family View on Hugging Face		View Details Log in to deploy
OpenEuroLLM-Czech-vLLM-GGUF	A Czech-language model optimized for local inference using the GGUF format. View on Hugging Face		View Details Log in to deploy
OpenThinker3-7B	OpenThinker3-7B is a state-of-the-art open-source reasoning model View on Hugging Face		View Details Log in to deploy
OpenThinker-Agent-v1	OpenThinker-Agent-v1 is a state-of-the-art, 8-billion parameter open-source model specifically engineered for terminal automation and software engineering tasks. View on Hugging Face		View Details Log in to deploy
Phi-3.5-vision-instruct	Phi-3.5-vision-instruct is a lightweight, multimodal small language model (SLM) released by Microsoft. View on Hugging Face		View Details Log in to deploy
Phi-4-mini-reasoning	Phi-4-mini-reasoning is a compact, open-weight reasoning model from Microsoft, designed to bring high-level logical and mathematical "thinking" to small-scale hardware. View on Hugging Face		View Details Log in to deploy
Qianfan-OCR	Qianfan-OCR is a 4B-parameter end-to-end document intelligence model developed by the Baidu Qianfan Team. It unifies document parsing, layout analysis, and document understanding within a single vision-language architecture. View on Hugging Face		View Details Log in to deploy
Qwen2.5-14B-Instruct	A 14B instruction-tuned model for strong reasoning, coding, and chat tasks. View on Hugging Face	coding reasoning chat	View Details Log in to deploy
Qwen2.5-32B-Instruct-AWQ	High-quality instruction-tuned model quantized with AWQ for efficient, lower-memory inference. View on Hugging Face	reasoning coding chat	View Details Log in to deploy
Qwen2.5-7B-Instruct	Qwen2.5-7B-Instruct is part of Alibaba Cloud’s latest generation of large language models, released as an evolution of the Qwen2 series. View on Hugging Face		View Details Log in to deploy
Qwen2.5-7B-Instruct-Test	A test model for a special image; not for public use. View on Hugging Face		View Details Log in to deploy
Qwen2.5-Coder-0.5B	A lightweight coding model for fast, efficient code generation and debugging View on Hugging Face	coding	View Details Log in to deploy
Qwen2.5-Coder-1.5B-Instruct	A lightweight coding model for fast, low-cost code generation and editing. View on Hugging Face	coding agentic	View Details Log in to deploy
Qwen2.5-VL-32B-Instruct-AWQ	The Qwen2.5-VL-32B-Instruct-AWQ is a high-performance, vision-language model optimized for efficient inference. It represents a significant step up in complexity and reasoning from the 7B models, sitting in the "heavyweight" class that typically requires multi-GPU setups or advanced quantization to run at interactive speeds. View on Hugging Face		View Details Log in to deploy
Qwen2.5-VL-7B-Instruct	Qwen2.5-VL-7B-Instruct is the latest iteration of Alibaba Cloud’s vision-language models, released in early 2025. View on Hugging Face		View Details Log in to deploy
Qwen2-VL-7B-Instruct	The Qwen2-VL-7B-Instruct is a cornerstone model in the second generation of Alibaba's Vision-Language (VL) series. It was released as a major upgrade to the original Qwen-VL View on Hugging Face		View Details Log in to deploy
Qwen3-14B	A 14B general-purpose model designed for strong reasoning, coding, and chat performance. View on Hugging Face	reasoning coding chat	View Details Log in to deploy
Qwen3-32B	A general-purpose model designed for strong reasoning, coding, and conversational performance. View on Hugging Face	reasoning chat coding	View Details Log in to deploy
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled	Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled is a specialized reasoning model released in March 2026. It is built on Alibaba's Qwen3.5-27B architecture and fine-tuned using high-density Chain-of-Thought (CoT) distillation from Anthropic’s Claude 4.6 Opus. View on Hugging Face		View Details Log in to deploy
Qwen3.5-27B-FP8	Optimized with FP8 quantization for efficient, high-performance inference View on Hugging Face	reasoning coding chat	View Details Log in to deploy
Qwen3.5-2B	Qwen3.5-2B View on Hugging Face		View Details Log in to deploy
Qwen3.5-35B-A3B	A model designed for strong reasoning, coding, and agent workloads with improved efficiency. View on Hugging Face	reasoning coding chat	View Details Log in to deploy
Qwen3.5-35B-A3B-FP8	Optimized with FP8 for efficient, high-performance reasoning and chat workloads View on Hugging Face	high-throughput reasoning coding agentic	View Details Log in to deploy
Qwen3.5-35B-A3B-GPTQ-Int4	The Qwen3.5-35B-A3B-GPTQ-Int4 is a specialized, highly optimized version of the Qwen3.5 model family. View on Hugging Face		View Details Log in to deploy
Qwen3.5-4B	Qwen3.5-4B is a compact, natively multimodal model released by Alibaba Cloud in February 2026. View on Hugging Face		View Details Log in to deploy
Qwen3.5-9B	A 9B general-purpose model designed for strong reasoning, coding, and chat performance View on Hugging Face	reasoning coding chat	View Details Log in to deploy
Qwen3.5-9B-AWQ	A 9B Qwen3.5 model quantized with AWQ for efficient, low-memory inference. View on Hugging Face	reasoning coding chat	View Details Log in to deploy
Qwen3.5-9B-NVFP4	A 9B Qwen3.5 model quantized to NVFP4 for ultra-efficient, low-memory inference. View on Hugging Face	reasoning coding chat	View Details Log in to deploy
Qwen3.6-27B	Qwen3.6-27B View on Hugging Face		View Details Log in to deploy
Qwen3.6-27B-FP8	Qwen3.6-27B-FP8 View on Hugging Face		View Details Log in to deploy
Qwen3.6-35B-A3B	Qwen3.6-35B-A3B (released April 14, 2026) is the first open-weight model of the Qwen3.6 series. View on Hugging Face		View Details Log in to deploy
Qwen3.6-35B-A3B-AWQ	The QuantTrio/Qwen3.6-35B-A3B-AWQ is a high-performance, 4-bit quantized version of the Qwen3.6-35B-A3B model View on Hugging Face		View Details Log in to deploy
Qwen3.6-35B-A3B-FP8	Qwen3.6-35B-A3B-FP8 (officially released on April 16, 2026) is the first natively quantized FP8 variant of the Qwen3.6 series. View on Hugging Face		View Details Log in to deploy
Qwen3-ASR-1.7B	The Qwen3-ASR family includes Qwen3-ASR-1.7B and Qwen3-ASR-0.6B, which support language identification and ASR for 52 languages and dialects. Both leverage large-scale speech training data and the strong audio understanding capability of their foundation model, Qwen3-Omni. View on Hugging Face		View Details Log in to deploy
Qwen3-Coder-30B-A3B-Instruct-1M-GGUF	Qwen3-Coder is available in multiple sizes. Today, we're excited to introduce Qwen3-Coder-30B-A3B-Instruct. This streamlined model maintains impressive performance and efficiency. View on Hugging Face		View Details Log in to deploy
Qwen3-Coder-30B-A3B-Instruct-FP8	Qwen3-Coder-30B-A3B-Instruct-FP8 is a specialized, high-efficiency model released in late 2025/early 2026. It is designed for agentic coding—tasks where the AI acts as an autonomous developer, interacting with environments and tools View on Hugging Face		View Details Log in to deploy
Qwen3-Coder-Next-AWQ-4bit	This is a 4-bit AWQ quantized version of Qwen3-Coder-Next, an open-weight language model designed specifically for coding agents and local development. View on Hugging Face		View Details Log in to deploy
Qwen3-Coder-Next-FP8	Qwen3-Coder-Next-FP8, an open-weight language model designed specifically for coding agents and local development. View on Hugging Face		View Details Log in to deploy
Qwen3-Omni-30B-A3B-Instruct-iws	test View on Hugging Face		View Details Log in to deploy
Qwen3-TTS-12Hz-1.7B-Base	A lightweight text-to-speech model designed for efficient, high-quality speech synthesis View on Hugging Face	audio speech	View Details Log in to deploy
Qwen3-TTS-12Hz-1.7B-CustomVoice	The Qwen3-TTS-12Hz-1.7B-CustomVoice represents a specialized, highly efficient iteration of Alibaba’s Qwen (Tongyi Qianwen) ecosystem, specifically tuned for Neural Text-to-Speech (TTS). At 1.7 billion parameters, it sits in the "Edge-AI" category—powerful enough to capture human-like prosody and emotion, but small enough to run with extremely low latency on local hardware or mobile devices. View on Hugging Face		View Details Log in to deploy
Qwen3-TTS-12Hz-1.7B-VoiceDesign	A lightweight text-to-speech model designed for customizable voice generation. View on Hugging Face	speech	View Details Log in to deploy
Qwen3-VL-30B-A3B-Instruct	Qwen3-VL-30B-A3B-Instruct is a state-of-the-art multimodal Large Vision-Language Model (LVLM) released by the Qwen team (Alibaba Cloud) in late 2025. View on Hugging Face		View Details Log in to deploy
Qwen3-VL-32B-Instruct-FP8	Qwen3-VL-32B-Instruct-FP8 represents the pinnacle of mid-sized multimodal intelligence from Alibaba Cloud (Qwen Team). View on Hugging Face		View Details Log in to deploy
Qwen-Image-2512	Enhanced Human Realism: Qwen-Image-2512 significantly reduces the “AI-generated” look and substantially enhances overall image realism, especially for human subjects. View on Hugging Face		View Details Log in to deploy
Qwen/Qwen3-32B-AWQ	A 32B Qwen3 model quantized with AWQ for efficient, high-performance inference. View on Hugging Face	reasoning coding chat	View Details Log in to deploy
Qwopus3.5-27B-v3	Jackrong/Qwopus3.5-27B-v3 is a highly specialized, reasoning-distilled version of the Qwen3.5-27B base model. View on Hugging Face		View Details Log in to deploy
rnj-1-instruct-AWQ-8bit	rnj-1-instruct-AWQ-8bit is the 8-bit quantized version of Rnj-1 Instruct, an elite 8-billion parameter agentic coding model released by Essential AI in late 2025. View on Hugging Face		View Details Log in to deploy
RolmOCR	RolmOCR is an open-source, high-performance document OCR model developed by Reducto AI as a lighter and faster alternative to Allen Institute for AI's olmOCR. View on Hugging Face		View Details Log in to deploy
Seed-OSS-36B-Instruct-AWQ	Seed-OSS-36B-Instruct-AWQ is a 4-bit quantized version of ByteDance’s Seed-OSS-36B, a mid-sized but extremely powerful open-source model released in August 2025. View on Hugging Face		View Details Log in to deploy
stable-diffusion-3.5-medium	Stable Diffusion 3.5 Medium (SD 3.5 Medium) is a state-of-the-art text-to-image model released by Stability AI View on Hugging Face		View Details Log in to deploy
Step3-VL-10B	Step3-VL-10B is an open-source multimodal large language model (MLLM) released in January 2026 by StepFun (Stepwise Star). View on Hugging Face		View Details Log in to deploy
Strand-Rust-Coder-14B-v1	The model fine-tunes Qwen2.5-Coder-14B for Rust-specific programming tasks using a 191K-example synthetic dataset built via multi-model generation and peer-reviewed validation. View on Hugging Face		View Details Log in to deploy
translategemma-27b-it-FP8-Dynamic	Multilingual translation model optimized with FP8 for fast, memory-efficient inference View on Hugging Face	low-latency	View Details Log in to deploy
VibeVoice-ASR	A speech recognition model designed for accurate and efficient audio-to-text transcription View on Hugging Face	speech audio low-latency agentic	View Details Log in to deploy
VibeVoice-ASR-iws	test View on Hugging Face		View Details Log in to deploy
Z-Image-Turbo	Z-Image-Turbo is a 6-billion parameter text-to-image model released by Alibaba's Tongyi Lab (the team behind Qwen) in late 2025. It was specifically engineered to challenge the dominance of larger models like FLUX.1 by prioritizing extreme inference speed and bilingual text rendering without sacrificing photorealism. View on Hugging Face		View Details Log in to deploy

InferX Beta — Serverless GPU Inference Platform, Built for Agent-Native Workloads