Modern machine learning has built its identity around a simple idea: if you want better performance, scale the system. Scale the data, scale the parameters, scale the compute, scale the context window, scale the pipeline. Repeat until the curve bends in your favor.
This belief has produced real breakthroughs. But it has also hardened into something closer to mythology. A mythology forms when a pattern holds long enough that people forget the boundary conditions that made it work in the first place.
Scaling laws describe trends that appear within specific regimes. They do not guarantee smooth improvement at arbitrary sizes. Under the hood, there are cliffs, plateaus, phase transitions, and regions where error cannot be reduced no matter how many FLOPs are burned.
Scaling succeeds only in particular regions of the landscape, and we often do not know where those regions end until we cross them.
⸻
1. The Regime Problem
Scaling laws are empirical regularities. They describe how loss shrinks as model size and data size grow. But these curves are conditioned on many hidden variables:
- quality and diversity of the training distribution
- inductive biases in the architecture
- mismatch between pretrain and downstream tasks
- irreducible noise in the data
- domains that require slow reasoning rather than fast correlation
When any of these shift, the slope of the scaling curve changes. Sometimes it flattens. Sometimes it bends upward. Sometimes it snaps into a new regime that behaves nothing like the previous one.
This is the phase transition problem.
You can follow the gradient until it vanishes.
Scale does not warn you when you are approaching a boundary.
Models that grow smoothly in one domain can fail abruptly in another. This is the first mythology: the belief that scaling is an unbroken continuum.
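One way to make the regime-dependence concrete is a hedged sketch of the standard parametric scaling form. The function name and every constant below are illustrative assumptions, loosely in the spirit of published Chinchilla-style fits, not measurements of any particular model family:

```python
# Hedged sketch of a Chinchilla-style parametric scaling form.
# All constants are illustrative assumptions, not a fit to any real model family.
def loss(n_params, n_tokens, E=1.7, A=400.0, alpha=0.34, B=410.0, beta=0.28):
    """Irreducible floor E plus power-law terms in parameters and data."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Hold data fixed and grow the model: the parameter term shrinks,
# but loss flattens against E + B / D**beta instead of heading to zero.
for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> predicted loss {loss(n, 1e11):.3f}")
```

The point is not the numbers; it is that the curve's apparent slope depends on which term currently dominates, and the floor E is invisible until you are close to it.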
⸻
2. Phase Transitions and Behavioral Shifts
In physics, a system can look stable until a threshold is crossed, after which entirely new behavior emerges. Water stays liquid right up to a phase boundary, then abruptly reorganizes into ice or vapor. A magnet loses its ordered alignment suddenly once the Curie temperature is passed. Neural networks do something similar.
Increase the number of parameters and you may find:
- new failure modes
- sharp changes in reasoning behavior
- memorization spikes
- instability in long context windows
- surprising blind spots
- collapse in out-of-domain generalization
None of these effects show up in smooth extrapolations of the scaling curve. They show up as breaks. They resemble the moment in training when a model suddenly begins to reason algebraically or hallucinate more aggressively. These are not linear improvements. They are structural transitions inside the model's internal geometry.
Scaling gets you to these transitions. It does not tell you what they will be.
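One mechanism behind such breaks, shown as a hedged toy rather than a claim about any specific model: a capability that requires many steps to all be correct can jump almost discontinuously even while the underlying per-step quality improves smoothly.

```python
# Toy illustration, with invented numbers: per-step accuracy improves
# smoothly, but a task needing k consecutive correct steps looks abrupt.
k = 20  # assumed number of steps that must all succeed

for per_step in (0.80, 0.85, 0.90, 0.95, 0.98, 0.99):
    whole_task = per_step ** k
    print(f"per-step {per_step:.2f} -> whole-task {whole_task:.3f}")
# A smooth gain from 0.80 to 0.99 per step moves whole-task success
# from roughly 0.01 to roughly 0.82: the break appears in the behavior
# you measure, on top of whatever reorganization happens inside the model.
```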
⸻
3. Limits Imposed by Data
Even infinite compute cannot outrun the limits of data quality.
Some tasks have irreducible error.
Some domains require physical grounding.
Some problems collapse without fine-grained or high-dimensional input.
When the world is noisy, ambiguous, or undersampled, scaling encounters a ceiling. Models excel when the training data captures the variability of the environment. When it does not, the model simply learns to rehearse correlations.
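A minimal sketch of that ceiling, under an assumed label-noise model (the noise rate and setup are invented for illustration): once some fraction of targets is simply wrong or ambiguous, even a perfect predictor of the true signal cannot drive error below that fraction.

```python
import random

# Assumed setup: binary labels, with a fraction `noise` flipped at random.
# Even an oracle that always predicts the true label is "wrong" against
# the observed labels at roughly the noise rate.
random.seed(0)
noise = 0.15        # illustrative corruption rate, not a real dataset statistic
n_examples = 100_000

errors = 0
for _ in range(n_examples):
    true_label = random.randint(0, 1)
    observed = true_label if random.random() > noise else 1 - true_label
    errors += (true_label != observed)   # oracle prediction == true_label
print(f"oracle error rate: {errors / n_examples:.3f}")   # ~= noise, regardless of scale
# No parameter count or compute budget moves this floor; only better data does.
```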
This is one of the reasons I find physical-world sensing important. A building instrumented with a high-resolution floor sensor observes the full distribution of movement patterns directly. Nothing is approximated: the data reflects the environment itself rather than a proxy for it. Physical intelligence relies less on extrapolated correlations and more on grounded evidence.
Physical sensing does not solve the scaling problem so much as reshape it: data regimes matter as much as model regimes.
⸻
4. Biological Scaling vs Synthetic Scaling
Scaling has a persistent metaphorical shadow: the idea that human intelligence also scales. Larger brains, more neurons, deeper histories. But biological systems are subject to constraints that synthetic models do not face:
- metabolic cost
- wiring length limitations
- developmental schedules
- evolutionary pressures
- embodied interactions
Brains scale through specialization rather than raw size. They develop hierarchies, modularity, and sparsity. They prune aggressively. They optimize for energy rather than maximum representational capacity. They embed learning inside sensory loops and environmental feedback.
Synthetic models scale by adding parameters in bulk. This produces emergent behavior, but it is not an analogue of biological intelligence. It is a different species of scaling altogether.
Assuming that one mirrors the other is a conceptual error. It encourages the belief that human-like reasoning is an extrapolation of parameter count, rather than a product of structure, embodiment, and constraint.
⸻
5. Divergent Scaling Strategies
One future for machine intelligence is the familiar one: ever larger models trained on ever larger corpora until we reach diminishing returns. But another future is built on orthogonal axes:
- richer modalities
- grounded sensors
- adaptive memory
- hybrid architectures
- fine-grained world models
- algorithmic innovations rather than brute force
Many scientific domains will not benefit from textual or image scaling alone. They will need models that integrate physical data, sensor-based grounding, simulation feedback, or localized context. They will need architectures that prefer precision over scale, structure over quantity, or reasoning over correlation.
This is where selective sensing can become a force multiplier. A system that understands how humans move inside a building learns something that cannot be scraped, compressed, or hallucinated. It learns through distributional contact with the physical world. Scale in this domain comes from coverage, not from parameter counts.
Physical-world intelligence highlights a different scaling axis that is invisible in purely textual domains.
⸻
6. The Resource Allocation Trap
Belief in universal scaling produces a resource allocation skew:
- compute budgets rise faster than data quality improves
- research prioritizes size over algorithmic insight
- architectures drift toward generality even when niche design would perform better
- organizations treat scaling as inevitability rather than choice
This produces a world where many teams pour effort into larger systems that are only incrementally better, while novel architectures, grounded sensing, or domain-specific models receive less attention. It narrows the imagination.
The mythology is that scale always justifies itself.
⸻
7. Intelligence Beyond Quantity
If scale is not the universal path, then what is intelligence? It might be:
- the ability to choose the right representation for a task
- the ability to compress without losing the wrong structure
- the ability to integrate many modalities coherently
- the ability to reason across sparse and rare cases
- the ability to adapt to shifting distributions
- the ability to act on evidence rather than correlation
- the ability to update efficiently from small signals
These are qualities that do not increase monotonically with size. They depend on architecture, grounding, data geometry, training objectives, and interaction loops.
This is where physical sensors offer a contrasting intuition. When a building tracks millions of micro-events across its surfaces, the intelligence does not come from a massive parameter count. It comes from accurate signals, adaptive loops, and long-term pattern discovery. Scale emerges through depth of contact with the world, not through brute expansion.
⸻
8. The Bigger Question
What happens when the field discovers that scale is only one route to intelligence?
What happens when the most powerful systems are not the largest, but the best structured?
What happens when grounded data or fine-grained sensing becomes the limiting factor rather than FLOPs?
What happens when intelligence becomes a design problem instead of an extrapolation problem?
These are not anti-scaling arguments. They are boundary arguments. They point to the fact that every domain has different scaling axes, and only some of them involve parameter count.
The danger of mythology lies in partial truth mistaken for universal truth.
⸻
Closing Thought
Scaling has advanced machine learning dramatically, but it is not an infinite ladder. It is a region of a larger landscape, and we are only beginning to map the rest. Models will grow. Models will shrink. Some will become multimodal. Some will become grounded. Some will become specialized. Some will refuse to scale at all because the domain demands precision, structure, or real-world contact.
The future of intelligence remains unsettled.
The future of scale remains uncertain.
And the most interesting progress may come from the places where size stops helping and architecture, grounding, and environment start to matter.