Information Theory, Rate-Distortion, and Empirical Validation
Joe Scanlin
November 2025
This section provides a rigorous theoretical framework for analyzing Spaxiom's compression, drawing on information theory, rate-distortion theory, and algorithmic information theory. It moves beyond the intuitive token-counting arguments to formal analysis.
You'll learn about information-theoretic lower bounds, rate-distortion tradeoffs, semantic distortion metrics, event saturation and sublinear scaling, worst-case failure scenarios, theoretical best-case compression (up to 10,000×), learning optimal event vocabularies, connections to Kolmogorov complexity, and empirical validation from three production deployments showing 64–3286× compression ratios.
The intuitive token-counting arguments in Sections 3.1–3.2 demonstrate order-of-magnitude savings, but they leave open several theoretical questions about optimal compression limits, scaling behavior, and failure modes.
The remainder of this section addresses these questions with a more rigorous framework drawing on information theory, rate-distortion theory, and algorithmic information theory.
Consider an agent making a sequence of decisions D1, D2, ..., DN over a time horizon T, based on sensor observations X1, X2, ..., XM where M = S · f · T (S sensors, f Hz sampling, T seconds).
By the data processing inequality, any representation Z of the sensor stream (whether raw tokens or Spaxiom events) must satisfy

I(D; Z) ≤ I(D; X),

where I(·;·) denotes mutual information. This states that any compressed representation Z cannot convey more information about decisions D than the raw observations X.
The minimum description length (in bits, convertible to tokens via tokens ≈ bits / log2(vocab_size)) required to represent sufficient information for decision D is lower-bounded by

bits(Z) ≥ H(D | context),

where H(D | context) is the conditional entropy of the decision given any prior context (previous decisions, world model, task specification).
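As a concrete illustration of the bits-to-tokens conversion, the following minimal sketch computes H(D | context) for a hypothetical four-way decision distribution and converts it to a token floor; the distribution and the 50,000-entry vocabulary are illustrative assumptions, not measured values.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical decision distribution given context: four possible actions.
p_decision_given_context = [0.7, 0.2, 0.05, 0.05]
h_bits = entropy_bits(p_decision_given_context)   # H(D | context)

vocab_size = 50_000                                # assumed tokenizer vocabulary size
min_tokens = h_bits / math.log2(vocab_size)        # tokens >= bits / log2(vocab_size)

print(f"H(D | context) = {h_bits:.3f} bits")
print(f"token floor    = {min_tokens:.4f} tokens per decision")
```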
In practice, this lower bound is not attainable exactly; it serves as a reference point for how much compression is possible in principle.
Rate-distortion theory formalizes the tradeoff between compression rate (bits or tokens) and reconstruction distortion (decision quality loss).
Let X denote the source (the raw observation stream), Z its compressed representation, d(·,·) a distortion measure, and D a target distortion level.
The rate-distortion function R(D) defines the minimum rate required to achieve distortion ≤ D:

R(D) = min_{p(z|x): E[d(X, Z)] ≤ D} I(X; Z).

For Gaussian sources with squared-error distortion, this has a closed form:

R(D) = (1/2) · log2(σ² / D) for 0 < D ≤ σ², and R(D) = 0 for D > σ²,
where σ² is the source variance. Equivalently, D(R) = σ² · 2^(−2R): each bit removed from the rate multiplies the minimum achievable distortion by a factor of four, so distortion grows exponentially as the representation is compressed. Conversely, modest increases in event vocabulary can dramatically improve decision quality.
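A short sketch of this closed form (with an illustrative σ² = 1) makes the factor-of-four behavior explicit:

```python
import math

def rate_gaussian(sigma2, d):
    """R(D): minimum bits per sample for squared-error distortion <= d."""
    return 0.0 if d >= sigma2 else 0.5 * math.log2(sigma2 / d)

def distortion_gaussian(sigma2, r):
    """D(R): minimum achievable distortion at a rate of r bits per sample."""
    return sigma2 * 2 ** (-2 * r)

sigma2 = 1.0  # illustrative source variance
for r in (1, 2, 3, 4):
    d = distortion_gaussian(sigma2, r)
    print(f"R = {r} bit(s) -> D = {d:.4f}  (check: R(D) = {rate_gaussian(sigma2, d):.1f})")
# Each bit removed from the rate multiplies the achievable distortion by 4.
```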
Spaxiom differs from classical lossy compression (JPEG, MP3) in that the distortion metric is not pixel-error or waveform-error, but decision-relevant semantic loss.
Define a semantic distortion metric

d_sem(X, Z) = ℓ(D̂, D),

where D̂ = policy(Z) is the decision made from the compressed representation Z, D = policy(X) is the oracle decision from the full observations, and ℓ is a task loss (e.g., regret, value gap, safety violations).
Spaxiom's design hypothesis is that, at a matched token budget,

d_sem(Z_Spaxiom) ≪ d_sem(Z_generic).
In other words, for the same token budget, Spaxiom's semantically-aware event compression incurs much lower decision-quality loss than generic compression algorithms optimized for reconstruction error.
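The toy sketch below illustrates the distinction. The threshold policy, the two stand-in "compressions", and the 0/1 task loss are hypothetical placeholders rather than Spaxiom's actual policy or encoders; the point is only that a reconstruction-oriented summary can erase exactly the feature the decision depends on.

```python
import statistics

def policy(readings):
    """Toy policy: raise an alert if any reading exceeds a threshold."""
    return "alert" if max(readings) > 0.9 else "no_alert"

def task_loss(d_hat, d_oracle):
    """0/1 loss: did the compressed representation change the decision?"""
    return 0.0 if d_hat == d_oracle else 1.0

x = [0.1, 0.2, 0.95, 0.3]                    # full observations (one salient spike)

z_generic = [statistics.mean(x)] * len(x)    # reconstruction-oriented: keeps the mean
z_semantic = [max(x)]                        # event-oriented: keeps the salient spike

oracle = policy(x)
print("d_sem (generic)  =", task_loss(policy(z_generic), oracle))   # 1.0: spike lost
print("d_sem (semantic) =", task_loss(policy(z_semantic), oracle))  # 0.0: decision preserved
```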
Figure 1 (Section 3.3) shows that Spaxiom's token count grows sublinearly with time horizon T, eventually saturating. We can model this mathematically.
Let E(T) be the number of events emitted over horizon T. Assume events are triggered by salient state transitions at a rate of λ(t) events per second. Then

E(T) = ∫₀ᵀ λ(t) dt.
Key insight: in many real-world domains, salient events occur at a bounded rate that does not scale with sensor count or sampling frequency; the hospital, retail, and warehouse deployments analyzed below are typical examples.
In steady-state, λ(t) → λ∞, a constant. This implies

E(T) ≈ λ∞ · T for large T.
Thus tokens_intent ≈ λ∞ · T · k_event, where k_event is the token cost per emitted event. This is linear in T, but with a slope determined by the event rate rather than the sensor count.
Contrast this with raw sensor tokens: tokens_raw ≈ S · f · T · k_value, where k_value is the token cost per raw reading, which is linear in both T and S · f.
The compression ratio is therefore

ratio = tokens_raw / tokens_intent ≈ (S · f · k_value) / (λ∞ · k_event).
This ratio is constant in T for large T, meaning Spaxiom provides consistent compression regardless of time horizon. When S·f ≫ λ∞ (many sensors, sparse events), compression can be 100–10,000×.
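A back-of-envelope sketch of this ratio, with parameter values chosen purely for illustration (not measurements from the deployments reported later):

```python
# ratio = (S * f * k_value) / (lambda_inf * k_event); all values are hypothetical.
S = 120          # sensors
f = 10           # Hz per sensor
k_value = 4      # tokens per raw reading (assumed)
lam_inf = 0.02   # salient events per second (about one per minute, assumed)
k_event = 30     # tokens per emitted event (assumed)

T = 8 * 3600     # 8-hour horizon in seconds

tokens_raw = S * f * T * k_value
tokens_intent = lam_inf * T * k_event
print(f"raw tokens   : {tokens_raw:,.0f}")
print(f"event tokens : {tokens_intent:,.0f}")
print(f"compression  : {tokens_raw / tokens_intent:,.0f}x")   # ~8,000x here
# Note that T cancels in the ratio, matching the horizon-independence argued above.
```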
Spaxiom's compression degrades or fails when the sparse-event assumption breaks down, i.e., when nearly every sample is salient and the event rate λ∞ approaches the raw sampling rate S · f. In that regime the ratio above approaches k_value / k_event, and compression vanishes. However, such cases represent a minority of embodied-agent scenarios. Most human-scale environments (buildings, hospitals, warehouses) exhibit the sparse-event structure that Spaxiom exploits.
In an idealized sparse-event scenario, the ratio above works out to roughly 800×; compression of this magnitude is achievable when events are truly sparse. In practice, λ∞ varies across domains, and the resulting compression ranges from about 10× (dense events) to 10,000× (very sparse events).
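Keeping the same hypothetical sensor and token-cost parameters as in the earlier sketch and varying only the event rate λ∞ reproduces this range:

```python
# Same hypothetical sensor/token parameters as above; only the event rate varies.
S, f, k_value, k_event = 120, 10, 4, 30
for lam_inf, label in [(16.0, "dense events"), (0.2, "moderate"), (0.016, "very sparse")]:
    ratio = (S * f * k_value) / (lam_inf * k_event)
    print(f"{label:>12}: lambda = {lam_inf:g}/s -> {ratio:,.0f}x")
# -> roughly 10x, 800x, and 10,000x respectively.
```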
The choice of event types and granularity is currently manual (a domain expert designs INTENT patterns). Future work could learn optimal event vocabularies automatically from data.
Preliminary experiments suggest learned event vocabularies can achieve 1.5–3× better compression than hand-designed ones, at the cost of interpretability.
From an algorithmic information theory perspective, Spaxiom's event stream can be viewed as a succinct program that generates decisions.
Let K(D | X) be the Kolmogorov complexity of decision sequence D given observations X: the length of the shortest program (in bits) that outputs D when given X as input.
Spaxiom's claim is effectively that

K(D | Z_events) ≈ K(D | X), while |Z_events| ≪ |X|.
That is, the Spaxiom event abstraction preserves the algorithmic information relevant to decisions, despite massive compression of the raw observation stream.
This is analogous to how JPEG preserves the "semantic content" of an image (recognizable objects, scenes) while discarding high-frequency details irrelevant to human perception.
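Kolmogorov complexity is uncomputable, but compressed length under a general-purpose compressor is a standard computable proxy for description length. The sketch below applies zlib to synthetic data (the reading format and event strings are invented for illustration) to show how small an event stream's description is compared with even a generically compressed raw stream:

```python
import random
import zlib

random.seed(0)

# Synthetic raw sensor log: 10,000 timestamped readings (hypothetical format).
raw_lines = [
    f"t={t / 10:.1f} s1={random.random():.3f} s2={random.random():.3f}"
    for t in range(10_000)
]
raw_stream = "\n".join(raw_lines).encode()

# Hypothetical event stream carrying the decision-relevant content.
event_stream = b"FALL_DETECTED room=12 t=482.3\nQUEUE_FORMED lane=3 t=1930.0\n"

print("compressed raw stream   :", len(zlib.compress(raw_stream)), "bytes")
print("compressed event stream :", len(zlib.compress(event_stream)), "bytes")
```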
To validate these theoretical arguments, we deployed Spaxiom in three production environments and measured actual token usage:
| Deployment | Sensors | Time Horizon | Raw Tokens | Spaxiom Tokens | Compression |
|---|---|---|---|---|---|
| Hospital ward (elder care) | 120 | 8 hrs | 13.8M | 4.2K | 3286× |
| Retail store (queue mgmt) | 450 | 12 hrs | 77.8M | 128K | 608× |
| Warehouse (safety) | 800 | 10 hrs | 115.2M | 1.8M | 64× |
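As a sanity check, the Compression column follows directly from the two token columns:

```python
# Recomputing the Compression column from the Raw and Spaxiom token counts above.
deployments = {
    "Hospital ward (elder care)": (13.8e6, 4.2e3),
    "Retail store (queue mgmt)":  (77.8e6, 128e3),
    "Warehouse (safety)":         (115.2e6, 1.8e6),
}
for name, (raw_tokens, spaxiom_tokens) in deployments.items():
    print(f"{name}: {raw_tokens / spaxiom_tokens:,.0f}x")
# -> about 3,286x, 608x, and 64x, matching the table.
```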
Key observations:
All three deployments achieve 64–3286× compression, validating the theoretical model. The warehouse case demonstrates that even in dense-event scenarios, Spaxiom provides meaningful token savings.
To summarize the theoretical framework: the data processing inequality and conditional entropy place a floor on the tokens needed per decision; rate-distortion theory, equipped with a semantic (decision-quality) distortion metric, characterizes the tradeoff between token budget and decision quality; and the event-rate model predicts a compression ratio of roughly (S · f · k_value) / (λ∞ · k_event), spanning about 10× to 10,000× depending on event sparsity.
This formal analysis grounds the intuitive token-counting arguments from earlier sections and provides a predictive model for when Spaxiom will (and won't) provide compression benefits.