Correlation Discovery, Predictive Models, and Early Warning Systems
Joe Scanlin
November 2025
This section covers the analytical engines that transform raw health data into actionable insights. You'll learn about the cross-ome correlation engine that discovers relationships between biomarkers, predictive models for glucose response and cardiovascular risk, and Bayesian changepoint detection for early warning of health deterioration.
The true power of comprehensive health data emerges from cross-domain correlations—discovering how changes in one ome predict or explain changes in another. Myome implements a correlation engine that continuously analyzes relationships between all measured variables.
For a user with \(M\) measured biomarkers across the seven omes, the system computes an \(M \times M\) correlation matrix:
\[ R_{ij} = \frac{\operatorname{cov}(X_i, X_j)}{\sigma_{X_i}\,\sigma_{X_j}} \]
Where \(X_i\) and \(X_j\) are time-aligned measurements of biomarkers \(i\) and \(j\). To account for temporal lags (e.g., poor sleep followed by elevated glucose the next day), the system computes lagged correlations:
\[ R_{ij}(\tau) = \operatorname{corr}\big(X_i(t),\, X_j(t + \tau)\big) \]
Lags from \(\tau = -7\) to \(+7\) days are tested to discover lead-lag relationships; a positive lag means biomarker \(i\) leads biomarker \(j\).
Example correlations discovered by the system:
| Variable 1 | Variable 2 | Correlation | Lag | Clinical Interpretation |
|---|---|---|---|---|
| Sleep deep % | HRV (RMSSD) | +0.67 | 0 days | Better sleep quality → better autonomic function |
| Steps per day | Fasting glucose | -0.42 | +1 day | Physical activity → improved glucose regulation next day |
| Alcohol intake | Sleep efficiency | -0.58 | 0 days | Alcohol disrupts sleep architecture |
| Stress level | LF/HF ratio | +0.71 | 0 days | Psychological stress → sympathetic dominance |
| PM2.5 exposure | HRV (SDNN) | -0.39 | 0-2 days | Air pollution → reduced cardiac autonomic function |
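Statistical significance is assessed at every lag, with a Bonferroni correction to control for multiple comparisons; a permutation test is available to confirm individual findings without distributional assumptions: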
import numpy as np
from scipy.stats import pearsonr


class CorrelationEngine:
    """Discover and validate correlations across health domains."""

    def __init__(self, significance_level=0.01, n_permutations=10000, seed=None):
        self.alpha = significance_level
        self.n_perm = n_permutations
        self.rng = np.random.default_rng(seed)

    def lagged_correlation(self, x, y, max_lag=7):
        """Compute correlation at each lag; lag > 0 means x leads y."""
        # Inputs are assumed to be equal-length, time-aligned daily series
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        correlations = {}
        for lag in range(-max_lag, max_lag + 1):
            if lag < 0:
                # y leads x: pair x[t + |lag|] with y[t]
                x_aligned = x[-lag:]
                y_aligned = y[:lag]
            elif lag > 0:
                # x leads y: pair x[t] with y[t + lag]
                x_aligned = x[:-lag]
                y_aligned = y[lag:]
            else:
                # No lag
                x_aligned = x
                y_aligned = y

            # Remove missing values
            valid = ~(np.isnan(x_aligned) | np.isnan(y_aligned))
            n_valid = int(np.sum(valid))
            if n_valid < 10:
                continue  # Insufficient data
            x_valid, y_valid = x_aligned[valid], y_aligned[valid]
            if np.std(x_valid) == 0 or np.std(y_valid) == 0:
                continue  # Correlation is undefined for a constant series

            r, p = pearsonr(x_valid, y_valid)
            correlations[lag] = {'r': r, 'p': p, 'n': n_valid}
        return correlations

    def permutation_test(self, x, y):
        """Test correlation significance via permutation (inputs must be NaN-free)."""
        # Observed correlation
        r_obs, _ = pearsonr(x, y)

        # Null distribution from permutations
        r_null = np.zeros(self.n_perm)
        for i in range(self.n_perm):
            y_perm = self.rng.permutation(y)
            r_null[i], _ = pearsonr(x, y_perm)

        # Two-tailed p-value; the +1 terms keep the estimate from being exactly zero
        n_extreme = np.sum(np.abs(r_null) >= np.abs(r_obs))
        p_value = (n_extreme + 1) / (self.n_perm + 1)
        return r_obs, p_value

    def discover_correlations(self, biomarker_data, bonferroni_correct=True, max_lag=7):
        """Find all significant correlations in dataset."""
        biomarkers = list(biomarker_data.keys())
        n_pairs = len(biomarkers) * (len(biomarkers) - 1) // 2

        # Bonferroni correction: every pair is tested at every lag,
        # so each (pair, lag) combination counts as one comparison
        n_comparisons = max(n_pairs * (2 * max_lag + 1), 1)
        alpha = self.alpha / n_comparisons if bonferroni_correct else self.alpha

        significant_correlations = []
        for i, marker1 in enumerate(biomarkers):
            for marker2 in biomarkers[i + 1:]:
                x = biomarker_data[marker1]
                y = biomarker_data[marker2]

                # Test all lags
                lagged_corrs = self.lagged_correlation(x, y, max_lag=max_lag)
                for lag, stats in lagged_corrs.items():
                    if stats['p'] < alpha:
                        significant_correlations.append({
                            'marker1': marker1,
                            'marker2': marker2,
                            'r': stats['r'],
                            'p': stats['p'],
                            'lag_days': lag,
                            'n_observations': stats['n']
                        })

        # Strongest relationships first
        return sorted(significant_correlations,
                      key=lambda c: abs(c['r']),
                      reverse=True)
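The lag sign convention is easy to get backwards, so a small check on synthetic data can help. The sketch below is dependency-free and separate from `CorrelationEngine`; the helper names `pearson` and `best_lag` and the synthetic series are assumptions made for illustration. It builds a series `y` that follows `x` two days later and confirms that the strongest correlation shows up at a positive lag, meaning `x` leads `y`.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def best_lag(x, y, max_lag=7):
    """Return (lag, r) with the largest |r|; lag > 0 means x leads y."""
    results = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag > 0:
            xa, ya = x[:-lag], y[lag:]   # pair x[t] with y[t + lag]
        elif lag < 0:
            xa, ya = x[-lag:], y[:lag]   # pair x[t + |lag|] with y[t]
        else:
            xa, ya = x, y
        results[lag] = pearson(xa, ya)
    lag = max(results, key=lambda k: abs(results[k]))
    return lag, results[lag]


# Synthetic data (hypothetical): y repeats x two days later, plus noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(120)]
y = [0.0, 0.0] + [xi + rng.gauss(0, 0.3) for xi in x[:-2]]

lag, r = best_lag(x, y)
print(lag, round(r, 2))  # the lag should come out as +2 (x leads y by two days)
```

Running the same check with the arguments swapped should report a lag of -2, which is a quick way to verify that a table such as the one above is being read in the intended direction.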
Beyond correlations, Myome builds predictive models that forecast future health states based on current measurements and trends. These models enable proactive interventions before pathology manifests.
Postprandial glucose response varies dramatically between individuals eating identical meals, a phenomenon attributed to genetic factors, microbiome composition, recent activity, sleep quality, and circadian timing. Myome learns a personalized glucose response model, implemented as a gradient-boosted decision tree ensemble (XGBoost) trained on historical CGM data paired with meal logs:
import numpy as np
import xgboost as xgb


class GlucosePredictor:
    """Predict postprandial glucose response."""

    def __init__(self, n_bootstrap=20, seed=0):
        self.model = None
        self.ensemble = []  # Bootstrap models used for uncertainty estimates
        self.n_bootstrap = n_bootstrap
        self.rng = np.random.default_rng(seed)

    @staticmethod
    def _new_model():
        return xgb.XGBRegressor(
            n_estimators=200,
            max_depth=6,
            learning_rate=0.05,
            objective='reg:squarederror'
        )

    def extract_features(self, meal, context):
        """Convert meal and context into a feature vector (fixed key order)."""
        features = {
            # Meal macronutrients
            'carbs_g': meal['carbohydrates'],
            'fiber_g': meal['fiber'],
            'protein_g': meal['protein'],
            'fat_g': meal['fat'],
            'glycemic_load': meal['glycemic_load'],
            # Recent activity
            'steps_last_2h': context['steps_last_2h'],
            'vigorous_min_last_6h': context['vigorous_min_last_6h'],
            # Sleep quality (last night)
            'sleep_duration_h': context['sleep_duration'],
            'sleep_efficiency': context['sleep_efficiency'],
            'deep_sleep_pct': context['deep_sleep_pct'],
            # Circadian timing
            'hour_of_day': context['meal_time'].hour,
            'time_since_wake_h': context['hours_since_wake'],
            # Current physiological state
            'baseline_glucose': context['glucose_pre_meal'],
            'hrv_morning': context['hrv_morning'],
            # Genetic factors (static)
            'tcf7l2_risk_alleles': context['genetics']['tcf7l2'],
            # Microbiome (updated quarterly)
            'prevotella_abundance': context['microbiome']['prevotella'],
            'firmicutes_bacteroidetes_ratio': context['microbiome']['fb_ratio']
        }
        return np.array(list(features.values()), dtype=float)

    def train(self, historical_meals, historical_responses):
        """Train model on historical meal → glucose data."""
        X = np.vstack([
            self.extract_features(meal, context)
            for meal, context in historical_meals
        ])
        # Target: peak glucose in 2h post-meal window
        y = np.array([
            np.max(response['glucose'][0:24])  # 2h at 5-min sampling
            for response in historical_responses
        ])

        # Train the main XGBoost model
        self.model = self._new_model()
        self.model.fit(X, y)

        # Bootstrap ensemble: refit on meals resampled with replacement
        self.ensemble = []
        for _ in range(self.n_bootstrap):
            idx = self.rng.integers(0, len(y), size=len(y))
            member = self._new_model()
            member.fit(X[idx], y[idx])
            self.ensemble.append(member)

    def predict(self, meal, context):
        """Predict glucose response to proposed meal."""
        if self.model is None:
            raise RuntimeError("call train() before predict()")
        features = self.extract_features(meal, context)
        predicted_peak = float(self.model.predict(features.reshape(1, -1))[0])
        return {
            'predicted_peak_mg_dl': predicted_peak,
            'confidence_interval_95': self.predict_interval(features)
        }

    def predict_interval(self, features):
        """Estimate prediction uncertainty from the bootstrap ensemble spread.

        This reflects model uncertainty only, so it is narrower than a full
        predictive interval that also covers meal-to-meal noise.
        """
        row = np.asarray(features, dtype=float).reshape(1, -1)
        predictions = [float(m.predict(row)[0]) for m in self.ensemble]
        return (np.percentile(predictions, 2.5),
                np.percentile(predictions, 97.5))
This enables users to preview glucose impact before eating—informing meal choices to maintain stable glucose levels.
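The training target above is the peak glucose within two hours of a meal, which at 5-minute CGM sampling corresponds to the first 24 readings. The helper below is a hedged sketch of that arithmetic for other sampling intervals; the function name `peak_response`, its parameters, and the example trace are illustrative assumptions, not part of Myome.

```python
def peak_response(glucose_mg_dl, sample_interval_min=5, window_min=120):
    """Summarize the post-meal window of a CGM trace.

    glucose_mg_dl[0] is assumed to be the reading taken at meal start.
    """
    n_samples = window_min // sample_interval_min  # 24 readings at 5-min sampling
    window = list(glucose_mg_dl[:n_samples])
    if not window:
        raise ValueError("empty glucose trace")
    peak = max(window)
    return {
        "peak_mg_dl": peak,
        "rise_mg_dl": peak - window[0],
        "minutes_to_peak": window.index(peak) * sample_interval_min,
    }


# Hypothetical one-hour trace sampled every 5 minutes.
trace = [95, 98, 110, 128, 141, 150, 146, 138, 127, 118, 110, 104]
print(peak_response(trace))
# {'peak_mg_dl': 150, 'rise_mg_dl': 55, 'minutes_to_peak': 25}
```

Passing `sample_interval_min=15` shortens the window to 8 readings, so sensors with different sampling rates yield targets over the same two-hour span.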
Long-term cardiovascular risk can be estimated from biomarker trends. Traditional risk calculators (Framingham, ASCVD) use static snapshots; Myome incorporates temporal trends and novel biomarkers. The traditional calculators take the Cox form:
\[ \text{Risk}_{10} = 1 - S_0(10)^{\exp\left(\sum_i \beta_i X_i\right)} \]
Where \(S_0(10)\) is baseline 10-year survival, \(X_i\) are risk factors (age, LDL, HDL, blood pressure, smoking, diabetes), and \(\beta_i\) are coefficients from Cox proportional hazards models.
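As a rough illustration of how the baseline survival \(S_0(10)\), the risk factors \(X_i\), and the Cox coefficients \(\beta_i\) combine into an absolute risk, consider the sketch below. The coefficient values, the factor names, and the optional `mean_linear_predictor` centering term are assumptions made for this example; they are not published Framingham or ASCVD values and must not be used clinically.

```python
import math


def ten_year_risk(risk_factors, coefficients, baseline_survival,
                  mean_linear_predictor=0.0):
    """Cox-style absolute risk: 1 - S0(10) ** exp(sum(beta_i * x_i) - mean_lp)."""
    missing = set(coefficients) - set(risk_factors)
    if missing:
        raise KeyError(f"missing risk factors: {sorted(missing)}")
    lp = sum(coefficients[name] * risk_factors[name] for name in coefficients)
    return 1.0 - baseline_survival ** math.exp(lp - mean_linear_predictor)


# Illustrative (made-up) coefficients and population means.
coefficients = {"age_decades": 0.60, "ldl_mmol_l": 0.25, "smoker": 0.55}
population_mean = {"age_decades": 5.0, "ldl_mmol_l": 3.0, "smoker": 0.2}
mean_lp = sum(coefficients[k] * population_mean[k] for k in coefficients)

person = {"age_decades": 5.5, "ldl_mmol_l": 3.4, "smoker": 0}
risk = ten_year_risk(person, coefficients, baseline_survival=0.95,
                     mean_linear_predictor=mean_lp)
print(f"{risk:.1%}")
```

When a person's linear predictor equals the population mean, the exponent is 1 and the risk reduces to \(1 - S_0(10)\), which makes a convenient sanity check for any coefficient set.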
Myome extends this with:
Change-point detection algorithms identify sudden shifts in biomarker patterns that may herald disease onset or progression. Myome implements Bayesian online changepoint detection:
Input: Time series \(x_1, x_2, \ldots, x_t\)
Output: Probability of changepoint at each time
1. Initialize run length distribution: \(P(r_0 = 0) = 1\)
2. For each new observation \(x_t\):
   a. Compute predictive probability under each run length:
      \(\pi_t(r) = P(x_t \mid r, x_{1:t-1})\)
   b. Update growth probabilities:
      \(P(r_t = r + 1 \mid x_{1:t}) \propto \pi_t(r) \cdot P(r_{t-1} = r \mid x_{1:t-1}) \cdot (1 - h)\)
   c. Update changepoint probability:
      \(P(r_t = 0 \mid x_{1:t}) \propto \sum_r \pi_t(r) \cdot P(r_{t-1} = r \mid x_{1:t-1}) \cdot h\)
   d. Normalize: \(\sum_r P(r_t = r \mid x_{1:t}) = 1\)
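3. Alert if the posterior mass on short run lengths, \(\sum_{r \le k} P(r_t = r \mid x_{1:t})\) for a small window \(k\), exceeds a threshold (e.g., 0.5). With a constant hazard rate, \(P(r_t = 0 \mid x_{1:t})\) always normalizes to exactly \(h\), so it cannot serve as the alert signal on its own.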
Where \(h\) is the hazard rate (prior probability of changepoint) and \(r\) is the run length since last changepoint.
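The update steps above can be sketched in a few dozen lines. The version below is a minimal, dependency-free illustration under stated assumptions: observations are Gaussian with known variance `obs_var` and an unknown mean with a Normal prior (`mu0`, `var0`), the hazard is constant, and the alert is raised on posterior mass at short run lengths rather than on \(P(r_t = 0)\) alone, because with a constant hazard that quantity always normalizes to \(h\). The function names and the example series are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def bocpd(series, hazard=0.02, mu0=0.0, var0=100.0, obs_var=1.0):
    """Return the run-length posterior P(r_t | x_{1:t}) after each observation."""
    probs = [1.0]                      # step 1: P(r_0 = 0) = 1
    means, variances = [mu0], [var0]   # posterior over the mean, per run length
    history = []
    for x in series:
        # (a) predictive probability of x under each run length
        pred = [normal_pdf(x, m, v + obs_var) for m, v in zip(means, variances)]
        # (b) growth: run length r becomes r + 1
        growth = [p * pi * (1 - hazard) for p, pi in zip(probs, pred)]
        # (c) changepoint: run length resets to 0
        cp = hazard * sum(p * pi for p, pi in zip(probs, pred))
        # (d) normalize
        total = cp + sum(growth)
        if total == 0.0:
            raise ValueError("predictive probabilities underflowed; widen var0")
        probs = [cp / total] + [g / total for g in growth]
        # Conjugate Normal update for each grown run; run length 0 restarts at the prior
        new_means, new_vars = [mu0], [var0]
        for m, v in zip(means, variances):
            post_var = 1.0 / (1.0 / v + 1.0 / obs_var)
            new_means.append(post_var * (m / v + x / obs_var))
            new_vars.append(post_var)
        means, variances = new_means, new_vars
        history.append(probs)
    return history


def short_run_alerts(history, window=5, threshold=0.5):
    """Indices where P(r_t <= window) exceeds threshold, skipping the warm-up."""
    alerts = []
    for t, probs in enumerate(history):
        if t < window:
            continue  # every run is necessarily short at the start
        if sum(probs[: window + 1]) > threshold:
            alerts.append(t)
    return alerts


# Hypothetical standardized biomarker: stable for 30 days, then shifted upward.
baseline = [0.3, -0.4, 0.1, -0.2, 0.5, -0.3] * 5
shifted = [5.3, 4.6, 5.1, 4.8, 5.5, 4.7] * 3
history = bocpd(baseline + shifted)
alerts = short_run_alerts(history)
print(alerts[0])  # first alert index; should be 30, the first shifted observation
```

This sketch keeps the full run-length distribution, so its cost grows quadratically with the series length; a practical implementation would typically prune run lengths with negligible probability.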
Example applications: