InferX Beta Serverless GPU Inference Platform, Built for Agent-Native Workloads

translategemma-27b-it-FP8-Dynamic

Multilingual translation model optimized with FP8 for fast, memory-efficient inference
Tags: kaitchup, text, text2text, low-latency

kaitchup/translategemma-27b-it-FP8-Dynamic is a 27B-parameter, instruction-tuned translation model based on Gemma. FP8 dynamic quantization reduces its memory usage and improves throughput while aiming to preserve multilingual translation quality, which suits it to production translation services and latency-sensitive deployments.

Metadata

Provider: kaitchup
Modality: text
API type: text2text
Source:
Created: 2026-03-31 23:37:25 UTC
Updated: 2026-03-31 23:56:57 UTC
Catalog version: 1
Visibility: Published

Specifications

Parameters: 27.00B
MoE: No
Max model length: 2048 tokens
Image: vllm/vllm-openai:v0.9.0
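
The max model length caps the prompt and the generated output together, so a longer source text leaves less room for its translation. The sketch below shows that budget arithmetic. The helper name max_new_tokens is hypothetical, and the prompt token count is assumed to come from the model's own tokenizer, which is not shown here.

```python
# Context budget for a model served with a 2048-token max model length.
# Assumption: prompt tokens and generated tokens share the same window.
MAX_MODEL_LEN = 2048


def max_new_tokens(prompt_tokens: int, max_model_len: int = MAX_MODEL_LEN) -> int:
    """Return how many tokens can still be generated after the prompt.

    Hypothetical helper for illustration; it is not part of any platform API.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt that already fills the window leaves no room for output.
    return max(0, max_model_len - prompt_tokens)


if __name__ == "__main__":
    for prompt_len in (200, 1024, 1500, 3000):
        print(prompt_len, "->", max_new_tokens(prompt_len))
```

Because a translation is often roughly as long as its source, keeping each request's source text to around half the window or less is a reasonable starting rule of thumb; longer documents would need to be split into chunks.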

Default Deploy Config

GPU count: 1
vRAM: 32000 MB
Summary: 1 x GPU, 32000 MB
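
A rough back-of-the-envelope check shows why a single 32000 MB GPU is plausible for this model. The sketch assumes one byte per parameter for FP8 weights, two bytes for BF16, and decimal megabytes. It ignores layers that may stay in higher precision, activations, and runtime overhead, so the real headroom is likely smaller than the figure computed here.

```python
# Rough weight-memory estimate for a 27.00B-parameter model.
# Assumptions (not from the catalog page): 1 byte/param for FP8,
# 2 bytes/param for BF16, and 1 MB = 1e6 bytes.
PARAMS = 27.00e9
VRAM_MB = 32_000


def weight_mb(params: float, bytes_per_param: int) -> float:
    """Approximate weight footprint in decimal megabytes."""
    return params * bytes_per_param / 1e6


fp8_mb = weight_mb(PARAMS, 1)    # FP8 weights
bf16_mb = weight_mb(PARAMS, 2)   # unquantized BF16 weights, for comparison
headroom_mb = VRAM_MB - fp8_mb   # left for KV cache, activations, overhead

if __name__ == "__main__":
    print(f"FP8 weights:  {fp8_mb:.0f} MB")
    print(f"BF16 weights: {bf16_mb:.0f} MB (fits: {bf16_mb <= VRAM_MB})")
    print(f"Headroom with FP8: {headroom_mb:.0f} MB")
```

Under these assumptions the FP8 weights take about 27000 MB, leaving on the order of 5000 MB for the KV cache and everything else, which is consistent with the short 2048-token max model length. The same model in BF16 would not fit on this configuration.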

Recommended Use Cases

  • Translation

Model Spec