How Spaxiom Reduces LLM Token Usage by 100-1500×
Joe Scanlin
November 2025
This section demonstrates how Spaxiom acts as a context compressor for AI agents, turning raw sensor deluges into compact intent streams that dramatically reduce token and energy usage.
You'll see the simple token model behind the 100-1500× compression ratios, how token savings translate into energy savings (kWh), and a visual comparison of raw sensor streams versus Spaxiom events over increasing time horizons. The analysis shows how Spaxiom enables long-horizon reasoning for agents without exploding token budgets.
A central claim of this paper is that a Spaxiom + INTENT stack can be drastically more token- and energy-efficient than sending raw sensor logs into LLMs.
Consider a deployment with S sensors, each sampled at f readings per second, observed over a horizon of T seconds.
If you naively serialize each reading as text for an LLM, at roughly k_r tokens per reading, the token count over horizon T is approximately:

tokens_raw(T) ≈ S · f · T · k_r
For example, take S = 100 sensors, f = 1 Hz, k_r = 10 tokens per reading, and T = 1 hour = 3600 s.
Then:

tokens_raw ≈ 100 · 1 · 3600 · 10 = 3.6 × 10⁶ tokens
Even if you aggressively compress and downsample, you're still in the millions of tokens for a modest time window.
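To make the arithmetic concrete, here is a minimal Python sketch of the raw-token model. The helper name `raw_tokens` is hypothetical, and the inputs are the illustrative example values above, not measurements:

```python
# A minimal sketch of the raw-token model: tokens_raw(T) ≈ S · f · T · k_r.
# All numeric values are the illustrative assumptions from the text.

def raw_tokens(num_sensors: int, sample_hz: float, horizon_s: float,
               tokens_per_reading: int) -> float:
    """Tokens needed to serialize every reading over the horizon."""
    return num_sensors * sample_hz * horizon_s * tokens_per_reading

# 100 sensors at 1 Hz for one hour, ~10 tokens per serialized reading
print(raw_tokens(100, 1.0, 3600, 10))  # 3600000.0 tokens
```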
With Spaxiom, the goal is to produce a small set of semantically dense events over the same horizon T: E events in total, each serializing to roughly k_e tokens.
Now the token cost becomes:

tokens_intent(T) ≈ E · k_e

with E ≪ S · f · T by design.
If we take E = 120 events per hour (roughly two salient events per minute) and k_e = 20 tokens per event, then over the same one-hour horizon:

tokens_intent ≈ 120 · 20 = 2.4 × 10³ tokens

That is a reduction factor of:

R = tokens_raw / tokens_intent ≈ (3.6 × 10⁶) / (2.4 × 10³) = 1500×
Even if our assumptions are off by an order of magnitude, 100× reductions are very plausible in realistic deployments.
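A companion sketch for the Spaxiom side of the ledger, reusing the same assumed numbers; `intent_tokens` is a hypothetical helper and the event rate is a modeling assumption, not a measured figure:

```python
# Sketch of the Spaxiom-side token cost, tokens_intent(T) ≈ E · k_e,
# and the resulting reduction factor R. Values are assumptions from the text.

def intent_tokens(num_events: int, tokens_per_event: int) -> int:
    """Tokens needed to serialize E semantically dense events."""
    return num_events * tokens_per_event

raw = 100 * 1.0 * 3600 * 10      # tokens_raw from the example above
intent = intent_tokens(120, 20)  # 120 events/hour, ~20 tokens each
print(intent)                    # 2400
print(raw / intent)              # 1500.0 — the headline reduction factor
```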
Recent work has begun to measure energy per token for LLM inference, with values on the order of a few joules per token for large models, depending on hardware and optimizations.
Let e be the marginal inference energy per token (J/token).
Then the energy cost of feeding a horizon T to a model is:

energy(T) ≈ e · tokens(T)
Using the numeric example above with e = 3 J/token:

energy_raw ≈ 3.6 × 10⁶ tokens × 3 J/token ≈ 10.8 MJ ≈ 3.0 kWh
energy_intent ≈ 2.4 × 10³ tokens × 3 J/token ≈ 7.2 kJ ≈ 0.002 kWh
Again, this is a back-of-the-envelope illustration, but it supports the claim that:
Spaxiom can act as a context compressor for agents, turning raw sensor deluges into compact intent streams that dramatically reduce token (and therefore energy) usage.
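The energy conversion is easy to sanity-check in code. This sketch assumes the ~3 J/token figure above; `inference_energy_kwh` is a hypothetical helper, and the constant is just the joules-to-kWh conversion:

```python
# Back-of-the-envelope energy conversion, assuming ~3 J/token
# (an order-of-magnitude figure from the text, not a benchmarked number).

JOULES_PER_KWH = 3.6e6  # 1 kWh = 3.6 MJ

def inference_energy_kwh(tokens: float, joules_per_token: float = 3.0) -> float:
    """energy(T) ≈ e · tokens(T), converted to kWh."""
    return tokens * joules_per_token / JOULES_PER_KWH

print(inference_energy_kwh(3.6e6))  # ≈ 3.0 kWh for the raw stream
print(inference_energy_kwh(2.4e3))  # ≈ 0.002 kWh for Spaxiom events
```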
Figure 1 (Context Compression Curves): Plot tokens vs. time horizon T on a log–log scale. Curve 1 (Raw): tokens_raw(T) ∝ T. Curve 2 (Spaxiom): tokens_intent(T) grows sublinearly or saturates as the number of salient events per unit time plateaus. The gap between the curves widens as T increases, showing how Spaxiom enables long-horizon reasoning for agents without exploding token budgets.
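The shape of Figure 1 can be reproduced with a few lines of matplotlib. The logarithmic event-growth model below is purely an assumption chosen to illustrate a saturating Spaxiom curve, not a measured property of Spaxiom:

```python
# Sketch of the Figure 1 compression curves on a log–log scale.
# Raw tokens grow linearly in T; intent tokens are modeled here as
# growing logarithmically (an illustrative assumption).
import numpy as np
import matplotlib.pyplot as plt

hours = np.logspace(0, 3, 100)                   # horizons from 1 h to 1000 h
raw = 100 * 1.0 * 3600 * 10 * hours              # tokens_raw(T) ∝ T
intent = 120 * 20 * np.log1p(hours) / np.log(2)  # sublinear event growth (assumed)

plt.loglog(hours, raw, label="Raw sensor stream")
plt.loglog(hours, intent, label="Spaxiom intent events")
plt.xlabel("Horizon T (hours)")
plt.ylabel("Tokens fed to the LLM")
plt.legend()
plt.show()
```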