In machine learning, context has a physical form. It lives in vector spaces, gradients, activations, and attention maps. Whenever we embed text, images, time series, or sensor data, we take something rich and situated and press it into a finite geometry so that a model can compute with it.
That compression is powerful. It is also dangerous.
Representations that look clean to a model can hide fractures that still matter to humans. Two situations that feel worlds apart can land as near neighbors in an embedding space. Two groups that ought to be distinguishable can blur into one cluster. A subtle warning signal can disappear under a projection that was tuned for something else.
A more general framing is this: high dimensional reality becomes distorted when it is forced through a narrow representational channel.
The familiar fact is that embeddings are compressions. The deeper question is how much structure they discard, and whether that loss can be understood in a principled way.
⸻
1. Context as a High Dimensional Object
Take a simple example: a single sentence in a building operations log.
"Resident almost slipped near east stair after rain, but caught rail."
Semantically, that sentence carries many dimensions of context:
- physical layout: which stair, which side, which entrance
- conditions: weather, time of day, surface material
- human factors: age, mobility, prior incidents, staff presence
- uncertainty: "almost," "near," "after"
To a human, this represents a slice through a huge latent space of architecture, risk, human behavior, and environment.
Now feed it through a standard text encoder. The output is a single vector, perhaps 768 or 4096 dimensions. That vector lives in a space trained to support tasks like next token prediction, similarity search, or classification.
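To make that concrete, here is a minimal sketch using the sentence-transformers library. The model name is just one common choice, and this particular encoder happens to produce 384 dimensions rather than 768 or 4096; the point is the compression, not the specific model.

```python
# A minimal sketch: the log entry pressed into a single fixed-size vector.
from sentence_transformers import SentenceTransformer

sentence = "Resident almost slipped near east stair after rain, but caught rail."

model = SentenceTransformer("all-MiniLM-L6-v2")  # an example general-purpose encoder
vector = model.encode(sentence)                  # a single numpy array

print(vector.shape)  # (384,) -- every dimension of context listed above now lives here
```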
The model does not create a neutral map of meaning. It creates a map optimized for its training objectives.
Context collapse begins the moment one geometry replaces all others.
⸻
2. Embeddings as Projections
Mathematically, an embedding is a projection from an extremely high dimensional object to a lower dimensional manifold with some constraints:
- nearby points correspond to "similar" items under a chosen metric
- linear relationships approximate certain compositional behaviors
- distances support retrieval or clustering
This is useful because raw context is unruly. You cannot run gradient descent directly on entire histories of human interaction, building telemetry, or cultural background. You need some compressed intermediate.
Compression itself is not the problem. The trouble starts when we forget which dimensions we chose to keep.
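Here is a toy numpy sketch of that failure mode. A random linear map stands in for a learned encoder, and the zeroed coordinates play the role of dimensions the training objective implicitly decided were noise; everything here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two situations that differ only along a handful of raw dimensions.
near_miss = rng.normal(size=1000)
actual_fall = near_miss.copy()
actual_fall[:5] += 3.0            # the difference lives in five coordinates

# A projection to 8 dimensions that happens to treat those coordinates as noise.
P = rng.normal(size=(8, 1000))
P[:, :5] = 0.0                    # the "signal vs noise" decision, made implicitly

z1, z2 = P @ near_miss, P @ actual_fall
cosine = z1 @ z2 / (np.linalg.norm(z1) * np.linalg.norm(z2))
print(round(float(cosine), 4))    # 1.0: downstream, the two situations are indistinguishable
```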
Every projection makes a decision:
- which features are treated as signal
- which features are treated as noise
- which symmetries we enforce
- which distinctions we allow to vanish
In practice, those decisions are baked into:
- the model architecture
- the loss function
- the training distribution
- the negative sampling strategy
- the augmentation pipeline
Most systems never expose those choices at the level where downstream users make decisions.
So we treat the embedding as if it were "the" representation, when it is only one slice through a much larger latent object.
⸻
3. Information Bottlenecks and Irreversible Loss
The Information Bottleneck principle frames learning as a tradeoff: compress the input as much as possible while preserving the information relevant to a target. Formally, the encoder p(z | x) is chosen to minimize

I(X; Z) - β · I(Z; Y),

where X is the input, Z the representation, Y the target, and β controls how much predictive information is worth trading for compression.
From this point of view, context collapse is a feature, not a bug: the representation discards, by design, any structure in X that does not help predict Y.
This becomes worrying when:
- Y is narrow (e.g. click prediction, next token, short-term reward)
- downstream decisions care about variables invisible to that objective
You get a representation that is extremely good at serving the training task and potentially blind to axes that matter ethically, operationally, or scientifically.
The physics analogy is useful here. Compression behaves like a lossy transformation. You can never fully reconstruct the original context from Z. At best, you can approximate certain aspects, and those aspects were chosen long before a designer reaches for the embedding in an application.
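A toy illustration of that irreversibility, using a small discrete joint distribution and plain numpy; the numbers are invented for the sketch, and mutual information is computed directly from the joint.

```python
import numpy as np

def mutual_information(p_ab):
    """I(A; B) in bits for a discrete joint distribution p_ab[a, b]."""
    p_a = p_ab.sum(axis=1, keepdims=True)
    p_b = p_ab.sum(axis=0, keepdims=True)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log2(p_ab[mask] / (p_a @ p_b)[mask])))

# Four fine-grained situations X, one binary target Y ("incident reported?").
# Rows: X = {near-miss, slip, trip, no-event}; columns: Y = {0, 1}.
p_xy = np.array([[0.20, 0.05],
                 [0.05, 0.20],
                 [0.05, 0.20],
                 [0.20, 0.05]])

# A compressed code Z that keeps only what predicts Y: rows {0, 3} -> 0, rows {1, 2} -> 1.
z_of_x = np.array([0, 1, 1, 0])
p_zy = np.zeros((2, 2))
for x, z in enumerate(z_of_x):
    p_zy[z] += p_xy[x]

p_x = p_xy.sum(axis=1)
p_xz = np.zeros((4, 2))
p_xz[np.arange(4), z_of_x] = p_x

print("I(X;Y) =", round(mutual_information(p_xy), 3))  # predictive information in the raw input
print("I(Z;Y) =", round(mutual_information(p_zy), 3))  # fully preserved by this particular Z
print("I(X;Z) =", round(mutual_information(p_xz), 3))  # 1 bit kept out of H(X) = 2 bits:
                                                       # the other bit of context is gone for good
```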
⸻
4. Vector Spaces Are Tilted
We often talk about embeddings as if they sat inside an abstract, neutral geometry where similarity has an intuitive meaning.
In reality, these spaces are tilted by:
- frequency effects in the corpus
- cultural and linguistic biases
- skewed negative examples
- incomplete coverage of rare cases
Imagine an office building where almost all training data comes from weekday daytime patterns. Night shift behavior, weekend use, and rare events will appear as statistical outliers. A representation trained on typical behavior will compress these tails more aggressively, folding them into nearby majority patterns.
To the model, that is harmless regularization.
To a safety engineer, that might erase exactly the patterns that predict harm.
The same logic applies to social data, medical data, financial flows, and any domain with skewed participation. Underrepresented groups and edge cases sit on regions of the manifold that receive less modeling capacity and less local structure.
The physics of context collapse here is about curvature. Some regions get smooth, detailed geometry. Others collapse into nearly flat patches where many distinct realities map to nearly identical vectors.
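A small numpy sketch of that curvature effect: fit a projection on majority-pattern data, then watch what it does to a rare warning pattern. The data is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# "Weekday daytime" telemetry: variation lives almost entirely in the first 5 of 50 features.
majority = rng.normal(size=(10_000, 50)) * np.r_[np.full(5, 3.0), np.full(45, 0.1)]

# Two quiet-period readings: an ordinary one, and one carrying a rare warning pattern
# that shows up only in features the majority data barely uses.
ordinary = np.zeros(50)
warning = ordinary.copy()
warning[40:45] = 5.0

# Fit a 5-component projection on the majority data (PCA via SVD).
mean = majority.mean(axis=0)
_, _, vt = np.linalg.svd(majority - mean, full_matrices=False)
components = vt[:5]                      # the directions the majority data cares about

def embed(x):
    return components @ (x - mean)

d_raw = np.linalg.norm(warning - ordinary)
d_emb = np.linalg.norm(embed(warning) - embed(ordinary))
print(f"raw distance: {d_raw:.2f}, embedded distance: {d_emb:.2f}")
# ~11.2 in raw space, close to zero after projection: the warning folds onto the ordinary reading
```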
⸻
5. Context Collapse in Retrieval Systems
Retrieval augmented systems rely heavily on embeddings. Long histories are chunked. Chunks are embedded. At query time, vectors near the query embedding are pulled back into the context window.
Every design choice in that pipeline contributes to context collapse:
- chunking strategy (by tokens, paragraphs, scenes, time windows)
- pooling mechanism (CLS token, mean pooling, learned head)
- similarity metric (cosine, dot product, learned scorer)
- re-ranking and filtering
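A stripped-down version of that pipeline, with a hashed bag-of-words standing in for the encoder and naive fixed-size chunking; every name, parameter, and log entry here is illustrative.

```python
import re
import numpy as np

def embed(text: str, dim: int = 64) -> np.ndarray:
    """Toy encoder: hashed bag of words. A real system would use a learned model."""
    v = np.zeros(dim)
    for word in re.findall(r"[a-z']+", text.lower()):
        v[hash(word) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

def chunk(log: str, size: int = 12) -> list[str]:
    """Fixed-size word chunking; this choice alone decides which phrases stay together."""
    words = log.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Embed everything, then rank by cosine similarity (a dot product on unit vectors)."""
    index = np.stack([embed(c) for c in chunks])
    scores = index @ embed(query)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

log = ("Resident almost slipped near east stair after rain, but caught rail. "
       "Maintenance mopped the lobby at noon. Elevator inspection passed without issues.")
print(retrieve("near falls after weather changes", chunk(log), k=2))
```

Even in this toy version, the chunk boundary lands mid-sentence, which is exactly how a weak signal ends up split across slices that never individually dominate similarity.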
Consider a multimodal research tool in a hospital or a senior living facility. A free text query like:
"Find all incidents where a resident almost fell near a stair after weather changes."
relies on an embedding space that preserves:
- "almost" as distinct from actual falls
- location semantics
- environmental factors such as rain or snow
- near events and weak signals
A space trained for general semantic similarity might cluster "fall," "slip," and "trip" together in ways that distort risk analysis. Weak indicators like "caught themselves," "stumbled," or "grabbed rail" can be washed out by stronger keywords.
Chunking adds another layer of distortion. If near-fall phrases are spread across different slices, none of which dominate similarity, retrieval can miss the pattern entirely.
From the outside, the system uses embeddings and supports semantic search.
Inside, context has collapsed along the axes that matter most for prevention.
⸻
6. Can We Recover Lost Truth?
If compression is lossy, can we ever get the missing context back? Not exactly. But we can design systems that treat representational loss more honestly and sometimes mitigate it.
Some directions that help:
1. Multi view representations
Instead of a single embedding space, maintain several, each tuned to different aspects:
- temporal patterns
- physical layout
- risk labels
- social or organizational structure
Queries operate across views, and disagreements between views become diagnostic signals.
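A minimal sketch of that disagreement signal, assuming per-view query vectors and item indexes already exist; in practice they would come from separate encoders for the temporal, spatial, and risk views, which is an assumption of this sketch rather than a prescription.

```python
import numpy as np

def topk(query_vec: np.ndarray, index: np.ndarray, k: int) -> set[int]:
    """Top-k item ids by cosine similarity (rows of index and the query assumed unit-norm)."""
    return set(np.argsort(index @ query_vec)[::-1][:k].tolist())

def view_disagreement(q_by_view: dict[str, np.ndarray],
                      index_by_view: dict[str, np.ndarray],
                      k: int = 10) -> float:
    """Fraction of top-k results the views do NOT agree on.
    0.0 = every view retrieves the same items; 1.0 = no overlap at all.
    High values are a cue to inspect the query, not to trust any single view."""
    hits = [topk(q_by_view[v], index_by_view[v], k) for v in q_by_view]
    agreed = set.intersection(*hits)
    return 1.0 - len(agreed) / k
```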
2. Preserve raw structure alongside embeddings
Graphs, sequences, and spatial layouts carry structure that a flat vector cannot hold. For building data, that might mean:
- floor plan graphs
- room connectivity
- sensor adjacency
- time series relationships
Use embeddings as a fast index, but keep the graph as a first-class citizen.
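A sketch of that arrangement, using networkx for the floor-plan graph and a plain dict as the vector index; the node names, toy vectors, and two-stage lookup are all invented for illustration.

```python
import networkx as nx
import numpy as np

# Embeddings answer "what looks similar"; the floor-plan graph answers "what is actually
# connected". The two are kept side by side instead of being merged into one vector.
floor_plan = nx.Graph()
floor_plan.add_edge("east_stair", "lobby", surface="tile")
floor_plan.add_edge("lobby", "elevator_bank", surface="carpet")
floor_plan.add_edge("east_stair", "east_entrance", surface="concrete")

embedding_index = {            # fast lookup, populated by whatever encoder is in use
    "east_stair": np.array([0.1, 0.9]),
    "lobby": np.array([0.2, 0.8]),
    "east_entrance": np.array([0.9, 0.1]),
}

def nearby_in_both(node: str, query_vec: np.ndarray, radius: int = 1, k: int = 2) -> list[str]:
    """Candidates from the vector index, then filtered by real physical adjacency."""
    physically_near = nx.ego_graph(floor_plan, node, radius=radius).nodes
    scored = sorted(embedding_index, key=lambda n: -float(embedding_index[n] @ query_vec))
    return [n for n in scored[:k + 1] if n in physically_near and n != node]

print(nearby_in_both("east_stair", np.array([0.1, 0.9])))
```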
3. Explicitly model tails and minorities
Reserve modeling capacity for rare but important cases:
- oversample near-fall events and weak signals
- build representation heads for risk surfaces
- track local density to avoid collapsing sparse regions
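A numpy sketch of the density-tracking idea: score how sparse each point's neighborhood is, then turn that into sampling weights so rare regions are not drowned out. The brute-force distance matrix is fine for a sketch, not for production, and the synthetic data is illustrative.

```python
import numpy as np

def sparsity_score(embeddings: np.ndarray, k: int = 10) -> np.ndarray:
    """Distance to the k-th nearest neighbor: large values mean the point lives in a
    thin region of the manifold that the encoder has little incentive to preserve."""
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    return np.sort(d, axis=1)[:, k]          # column 0 is the point itself (distance 0)

def oversample_weights(embeddings: np.ndarray, k: int = 10) -> np.ndarray:
    """Sampling weights proportional to local sparsity, so rare regions are seen more often."""
    s = sparsity_score(embeddings, k)
    return s / s.sum()

rng = np.random.default_rng(3)
routine = rng.normal(size=(500, 16))             # everyday events
near_misses = rng.normal(loc=6.0, size=(5, 16))  # a handful of weak-signal reports
weights = oversample_weights(np.vstack([routine, near_misses]))
print(weights[-5:].mean() / weights[:500].mean())  # rare points are sampled far more often
```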
4. Calibrate for the intended decision
Evaluate representations not only on training losses but on downstream goals such as safety, fairness, and long range stability.
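As a small example, a representation used for incident retrieval might be scored on recall over the rare near-miss class rather than on an average similarity benchmark; the ids and labels below are made up.

```python
def recall_at_k(retrieved_ids: list[int], relevant_ids: set[int], k: int) -> float:
    """What fraction of the genuinely relevant items made it into the top k."""
    if not relevant_ids:
        return 1.0
    return len(set(retrieved_ids[:k]) & relevant_ids) / len(relevant_ids)

# An embedding can look fine on average and still fail the decision that matters:
# compare overall retrieval quality against recall restricted to near-miss incidents.
overall = recall_at_k(retrieved_ids=[3, 7, 12, 41, 9], relevant_ids={3, 7, 9, 12, 41}, k=5)
near_miss_only = recall_at_k(retrieved_ids=[3, 7, 12, 41, 9], relevant_ids={88, 104}, k=5)
print(overall, near_miss_only)   # 1.0 vs 0.0 -- the collapse is invisible to the first number
```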
5. Expose uncertainty and blind spots to users
Show where the manifold is well supported by data and where it is extrapolating.
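One lightweight way to do that at query time is to report how far a query lands from anything the index has actually seen; the distance threshold below is an arbitrary placeholder.

```python
import numpy as np

def support_note(query_vec: np.ndarray, index: np.ndarray, threshold: float = 1.0) -> str:
    """Distance from the query to its nearest indexed item, phrased for an end user."""
    nearest = float(np.min(np.linalg.norm(index - query_vec, axis=1)))
    status = "well supported by indexed data" if nearest <= threshold else "extrapolating beyond it"
    return f"nearest indexed item at distance {nearest:.2f} ({status})"
```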
⸻
7. Context Collapse as a Design Variable
Context collapse is unavoidable. Any system that compresses reality must discard information. The real question is whether we treat that fact as an afterthought or as a central concern.
For a technical audience, that suggests a shift in mindset:
- From "Does the embedding work on the benchmark"
to "Which dimensions of reality did we choose to forget, and who is affected by that choice" - From "Can we represent everything in one space"
to "Which complementary geometries are needed to preserve important structure" - From "How do we make similarity search fast"
to "How do we protect rare but critical patterns from being submerged"
The physics of context collapse is a reminder that every representation is a theory of relevance. Embeddings are not just numerical artifacts; they encode judgments about which aspects of the world survive compression.
⸻
Closing Thought
Scientists and engineers have become very good at building models that compress. The next step is to become equally good at reasoning about what those compressions erase.
In a world of vectorized text, cities, bodies, and behaviors, context collapse extends beyond social media. It is a geometric phenomenon written into the spaces where our models live.
If we want these systems to support real understanding rather than polished illusions, we will have to take representation geometry as seriously as we take loss curves and benchmarks.
The signal we keep is important.
The context we lose may be even more so.