InferX Catalog | GLM-OCR

GLM-OCR

GLM-OCR is a compact, high-performance multimodal model released in February 2026 by Zhipu AI (Z.ai). It is specifically designed to bridge the gap between traditional OCR (character recognition) and full "Document Understanding" (layout, tables, and reasoning).

zai-org multimodal image2text

Log in to deploy

Metadata

Provider

zai-org

Modality

multimodal

API type

image2text

Source

huggingface / zai-org/GLM-OCR

Created

2026-04-06 23:48:30 UTC

Updated

2026-04-20 11:52:08 UTC

Catalog version

2

Visibility

Published

Specifications

Parameters

1.50B

MoE

No

Max model length

2000

Image

inferx/vllm-openai:v0.19.1

Default Deploy Config

GPU count

1

vRAM

24000 MB

Summary

1xGPU 24000 MB

Recommended Use Cases

—

Model Spec

{
    "image": "inferx/vllm-openai:v0.19.1",
    "commands": [
        "--model",
        "zai-org/GLM-OCR",
        "--trust-remote-code",
        "--gpu-memory-utilization",
        " 0.90",
        "--max-model-len",
        "2000"
    ],
    "resources": {
        "GPU": {
            "Count": 1,
            "vRam": 24000
        }
    },
    "envs": [],
    "policy": {
        "Obj": {
            "min_replica": 0,
            "max_replica": 1,
            "standby_per_node": 1,
            "parallel": 20,
            "queue_len": 100,
            "queue_timeout": 30.0,
            "scalein_timeout": 1.0,
            "scaleout_policy": {
                "WaitQueueRatio": {
                    "wait_ratio": 0.1
                }
            },
            "runtime_config": {
                "graph_sync": false
            }
        }
    },
    "sample_query": {
        "body": {
            "max_tokens": "200",
            "temperature": "0"
        },
        "path": "v1/chat/completions",
        "prompt": "What is in this image?",
        "apiType": "image2text",
        "dataUrl": "https://www.ilankelman.org/stopsigns/australia.jpg",
        "prompts": [],
        "loadingTimeout": 90
    }
}