When you send a prompt to a large language model, not every token carries
equal weight. Some words are highly predictable from context: articles,
prepositions, and filler phrases the model can already infer on its own.
SemanticZip uses a lightweight logprob-based signal to score each token and
determine whether it contributes real meaning or is just structural padding.
Tokens with high predictability, the ones the model would guess anyway, are
safely removed without changing the downstream behavior of the model.
The result: a compressed prompt that preserves intent, keeps semantic anchors,
and delivers the same output quality at 30-75% fewer tokens, saving real
money on every API call you make.
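The core idea can be sketched in a few lines. This is a hypothetical illustration, not the actual SemanticZip implementation: it assumes you already have a per-token log-probability for each prompt token (for example, from an API that echoes prompt logprobs) and simply drops tokens whose logprob is close to zero, i.e. tokens the model finds highly predictable. The `compress` function, the threshold value, and the toy logprobs are all made up for demonstration.

```python
def compress(tokens, logprobs, threshold=-1.0):
    """Keep tokens the model found surprising (logprob below `threshold`),
    since those carry the most information; drop highly predictable tokens
    whose logprob is near 0. Threshold is an illustrative choice, not a
    value from SemanticZip."""
    return [tok for tok, lp in zip(tokens, logprobs) if lp < threshold]

# Toy example with invented logprobs: function words like "the" and "of"
# are predictable (logprob near 0), content words are surprising.
tokens   = ["Summarize", "the", "quarterly", "report", "of", "ACME"]
logprobs = [-5.2, -0.1, -4.8, -2.3, -0.2, -6.1]

print(compress(tokens, logprobs))
# → ['Summarize', 'quarterly', 'report', 'ACME']
```

In practice the threshold would be tuned (or scores normalized) so that semantic anchors like entity names and instructions always survive, while structural padding is what gets pruned.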