When you send a prompt to a large language model, not every token carries
equal weight. Some words are highly predictable from context: articles,
prepositions, and filler phrases the model can already infer on its own.
SemanticZip uses a lightweight logprob-based signal to score each token and
determine whether it contributes real meaning or is just structural padding.
Tokens with high predictability, the ones the model would guess anyway, are
safely removed without changing the downstream behavior of the model.
The result: a compressed prompt that preserves intent, keeps semantic anchors,
and delivers the same output quality at 30-75% fewer tokens, saving real
money on every API call you make.
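The core idea can be sketched in a few lines. This is a hypothetical illustration, not the actual SemanticZip implementation: it assumes you already have a per-token log-probability for each prompt token (for example, from an API that echoes prompt logprobs) and simply drops tokens whose logprob is close to zero, i.e. tokens the model finds highly predictable. The `compress` function, the threshold value, and the toy logprobs are all made up for demonstration.

```python
def compress(tokens, logprobs, threshold=-1.0):
    """Keep tokens the model found surprising (logprob below `threshold`),
    since those carry the most information; drop highly predictable tokens
    whose logprob is near 0. Threshold is an illustrative choice, not a
    value from SemanticZip."""
    return [tok for tok, lp in zip(tokens, logprobs) if lp < threshold]

# Toy example with invented logprobs: function words like "the" and "of"
# are predictable (logprob near 0), content words are surprising.
tokens   = ["Summarize", "the", "quarterly", "report", "of", "ACME"]
logprobs = [-5.2, -0.1, -4.8, -2.3, -0.2, -6.1]

print(compress(tokens, logprobs))
# → ['Summarize', 'quarterly', 'report', 'ACME']
```

In practice the threshold would be tuned (or scores normalized) so that semantic anchors like entity names and instructions always survive, while structural padding is what gets pruned.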