Modern machine learning has built its identity around a simple idea: if you want better performance, scale the system. Scale the data, scale the parameters, scale the compute, scale the context window, scale the pipeline. Repeat until the curve bends in your favor.
This belief has produced real breakthroughs. But it has also hardened into something closer to mythology. A mythology forms when a pattern holds long enough that people forget the boundary conditions that made it work in the first place.
Scaling laws describe trends that appear within specific regimes. They do not guarantee smooth improvement at arbitrary sizes. Under the hood, there are cliffs, plateaus, phase transitions, and regions where error cannot be reduced no matter how many FLOPs are burned.
Scaling succeeds only in particular regions of the landscape, and we often do not know where those regions end until we cross them.
⸻
1. The Regime Problem
Scaling laws are empirical regularities. They describe how loss shrinks as model size and data size grow. But these curves are conditioned on many hidden variables:
- quality and diversity of the training distribution
- inductive biases in the architecture
- mismatch between pretrain and downstream tasks
- irreducible noise in the data
- domains that require slow reasoning rather than fast correlation
When any of these shift, the slope of the scaling curve changes. Sometimes it flattens. Sometimes it bends upward. Sometimes it snaps into a new regime that behaves nothing like the previous one.
This is the phase transition problem.
You can follow the gradient until it vanishes.
Scale does not warn you when you are approaching a boundary.
Models that grow smoothly in one domain can fail abruptly in another. This is the first mythology: the belief that scaling is an unbroken continuum.
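One way to make the regime-dependence concrete is a hedged sketch of the standard parametric scaling form. The function name and every constant below are illustrative assumptions, loosely in the spirit of published Chinchilla-style fits, not measurements of any particular model family:

```python
# Hedged sketch of a Chinchilla-style parametric scaling form.
# All constants are illustrative assumptions, not a fit to any real model family.
def loss(n_params, n_tokens, E=1.7, A=400.0, alpha=0.34, B=410.0, beta=0.28):
    """Irreducible floor E plus power-law terms in parameters and data."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Hold data fixed and grow the model: the parameter term shrinks,
# but loss flattens against E + B / D**beta instead of heading to zero.
for n in (1e8, 1e9, 1e10, 1e11, 1e12):
    print(f"{n:.0e} params -> predicted loss {loss(n, 1e11):.3f}")
```

The point is not the numbers; it is that the curve's apparent slope depends on which term currently dominates, and the floor E is invisible until you are close to it.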
⸻
2. Phase Transitions and Behavioral Shifts
In physics, a system can look stable until a threshold is crossed, after which entirely new behavior emerges. Water stays liquid right up to a phase boundary, then abruptly reorganizes into ice or vapor. A magnet loses its ordered alignment suddenly once the Curie temperature is passed. Neural networks do something similar.
Increase the number of parameters and you may find:
- new failure modes
- sharp changes in reasoning behavior
- memorization spikes
- instability in long context windows
- surprising blind spots
- collapse in out-of-domain generalization
None of these effects show up in smooth extrapolations of the scaling curve. They show up as breaks. They resemble the moment in training when a model suddenly begins to reason algebraically or hallucinate more aggressively. These are not linear improvements. They are structural transitions inside the model's internal geometry.
Scaling gets you to these transitions. It does not tell you what they will be.
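One mechanism behind such breaks, shown as a hedged toy rather than a claim about any specific model: a capability that requires many steps to all be correct can jump almost discontinuously even while the underlying per-step quality improves smoothly.

```python
# Toy illustration, with invented numbers: per-step accuracy improves
# smoothly, but a task needing k consecutive correct steps looks abrupt.
k = 20  # assumed number of steps that must all succeed

for per_step in (0.80, 0.85, 0.90, 0.95, 0.98, 0.99):
    whole_task = per_step ** k
    print(f"per-step {per_step:.2f} -> whole-task {whole_task:.3f}")
# A smooth gain from 0.80 to 0.99 per step moves whole-task success
# from roughly 0.01 to roughly 0.82: the break appears in the behavior
# you measure, on top of whatever reorganization happens inside the model.
```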
⸻
3. Limits Imposed by Data
Even infinite compute cannot outrun the limits of data quality.
Some tasks have irreducible error.
Some domains require physical grounding.
Some problems collapse without fine-grained or high-dimensional input.
When the world is noisy, ambiguous, or undersampled, scaling encounters a ceiling. Models excel when the training data captures the variability of the environment. When it does not, the model simply learns to rehearse correlations.
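A minimal sketch of that ceiling, under an assumed label-noise model (the noise rate and setup are invented for illustration): once some fraction of targets is simply wrong or ambiguous, even a perfect predictor of the true signal cannot drive error below that fraction.

```python
import random

# Assumed setup: binary labels, with a fraction `noise` flipped at random.
# Even an oracle that always predicts the true label is "wrong" against
# the observed labels at roughly the noise rate.
random.seed(0)
noise = 0.15        # illustrative corruption rate, not a real dataset statistic
n_examples = 100_000

errors = 0
for _ in range(n_examples):
    true_label = random.randint(0, 1)
    observed = true_label if random.random() > noise else 1 - true_label
    errors += (true_label != observed)   # oracle prediction == true_label
print(f"oracle error rate: {errors / n_examples:.3f}")   # ~= noise, regardless of scale
# No parameter count or compute budget moves this floor; only better data does.
```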
This is one of the reasons I find physical-world sensing important. A building instrumented with a high-resolution floor sensor observes the full distribution of movement patterns directly. Nothing is approximated: the data reflects the environment itself rather than a proxy for it. Physical intelligence relies less on extrapolated correlations and more on grounded evidence.
Physical sensing does not solve the scaling problem so much as reshape it: data regimes matter as much as model regimes.
⸻
4. Biological Scaling vs Synthetic Scaling
Scaling has a persistent metaphorical shadow: the idea that human intelligence also scales. Larger brains, more neurons, deeper histories. But biological systems are subject to constraints that synthetic models do not face:
- metabolic cost
- wiring length limitations
- developmental schedules
- evolutionary pressures
- embodied interactions
Brains scale through specialization rather than raw size. They develop hierarchies, modularity, and sparsity. They prune aggressively. They optimize for energy rather than maximum representational capacity. They embed learning inside sensory loops and environmental feedback.
Synthetic models scale by adding parameters in bulk. This produces emergent behavior, but it is not an analogue of biological intelligence. It is a different species of scaling altogether.
Assuming that one mirrors the other is a conceptual error. It encourages the belief that human-like reasoning is an extrapolation of parameter count, rather than a product of structure, embodiment, and constraint.
⸻
5. Divergent Scaling Strategies
One future for machine intelligence is the familiar one: ever larger models trained on ever larger corpora until we reach diminishing returns. But another future is built on orthogonal axes:
- richer modalities
- grounded sensors
- adaptive memory
- hybrid architectures
- fine-grained world models
- algorithmic innovations rather than brute force
Many scientific domains will not benefit from textual or image scaling alone. They will need models that integrate physical data, sensor-based grounding, simulation feedback, or localized context. They will need architectures that prefer precision over scale, structure over quantity, or reasoning over correlation.
This is where selective sensing can become a force multiplier. A system that understands how humans move inside a building learns something that cannot be scraped, compressed, or hallucinated. It learns through distributional contact with the physical world. Scale in this domain comes from coverage, not from parameter counts.
Physical-world intelligence highlights a different scaling axis that is invisible in purely textual domains.
⸻
6. The Resource Allocation Trap
Belief in universal scaling produces a resource allocation skew:
- compute budgets rise faster than data quality improves
- research prioritizes size over algorithmic insight
- architectures drift toward generality even when niche design would perform better
- organizations treat scaling as inevitability rather than choice
This produces a world where many teams pour effort into larger systems that are only incrementally better, while novel architectures, grounded sensing, or domain-specific models receive less attention. It narrows the imagination.
The mythology is that scale always justifies itself.
⸻
7. Intelligence Beyond Quantity
If scale is not the universal path, then what is intelligence? It might be:
- the ability to choose the right representation for a task
- the ability to compress without losing the wrong structure
- the ability to integrate many modalities coherently
- the ability to reason across sparse and rare cases
- the ability to adapt to shifting distributions
- the ability to act on evidence rather than correlation
- the ability to update efficiently from small signals
These are qualities that do not increase monotonically with size. They depend on architecture, grounding, data geometry, training objectives, and interaction loops.
This is where physical sensors offer a contrasting intuition. When a building tracks millions of micro-events across its surfaces, the intelligence does not come from a massive parameter count. It comes from accurate signals, adaptive loops, and long-term pattern discovery. Scale emerges through depth of contact with the world, not through brute expansion.
⸻
8. The Bigger Question
What happens when the field discovers that scale is only one route to intelligence?
What happens when the most powerful systems are not the largest, but the best structured?
What happens when grounded data or fine-grained sensing becomes the limiting factor rather than FLOPs?
What happens when intelligence becomes a design problem instead of an extrapolation problem?
These are not anti-scaling arguments. They are boundary arguments. They point to the fact that every domain has different scaling axes, and only some of them involve parameter count.
The danger of mythology lies in partial truth mistaken for universal truth.
⸻
Closing Thought
Scaling has advanced machine learning dramatically, but it is not an infinite ladder. It is a region of a larger landscape, and we are only beginning to map the rest. Models will grow. Models will shrink. Some will become multimodal. Some will become grounded. Some will become specialized. Some will refuse to scale at all because the domain demands precision, structure, or real-world contact.
The future of intelligence remains unsettled.
The future of scale remains uncertain.
And the most interesting progress may come from the places where size stops helping and architecture, grounding, and environment start to matter.