InferX Beta Serverless GPU Inference Platform, Built for Agent-Native Workloads

Endpoint Qwen3.6-27B-FP8

Usage Complex Coding Agents: Handling massive contexts in tools like OpenCode. Enterprise RAG: Searching across thousands of documents.

Metadata

Name
Qwen3.6-27B-FP8
Provider
Qwen
Parameter Size
27.00B
GPU Count
1
Context Length
262144
Concurrency
2.29x
Cold Start TTFT
1 sec
Recommended Use Cases
Coding assistant, Agent / tool use
Detailed Intro
Qwen3.6-27B-FP8 is a dense, high-performance LLM optimized for efficiency and long-context reasoning. Strengths High Density: Superior reasoning and coding logic compared to smaller models. Context Mastery: Native support for up to 1M tokens, ideal for full-repo analysis. Efficiency: FP8 quantization halves memory usage with minimal accuracy loss, enabling 300K+ KV cache tokens on a single GPU. Usage Complex Coding Agents: Handling massive contexts in tools like OpenCode. Enterprise RAG: Searching across thousands of documents. Cost-Effective Scaling: Higher concurrency for serverless platforms like InferX.

Log In To Use This Endpoint

This public page shows the published endpoint metadata and integration shape. Log in to get a tenant-scoped endpoint URL, inference API key, and the interactive playground. Log in

Integration

Use these values in Dify, OpenWebUI, Continue, OpenCode, or any OpenAI-compatible client that asks for a base URL, API key, and model name.

  1. Copy the API base URL into your client endpoint field.
  2. Copy the model name exactly as shown.
  3. Copy the inference API key.
https://model.inferx.net/funccall/<tenant>/endpoints/Qwen3.6-27B-FP8/v1
Qwen/Qwen3.6-27B-FP8
<INFERENCE_API_KEY>

An inference API key is required for this endpoint. Until one is available, the sample request below keeps the correct request shape and uses a placeholder token.

Sample REST Call

Model Spec