Myome: Health Analytics and Prediction - Part 5 of the Technical Series

5. Health Analytics and Prediction

5.1 Cross-Ome Correlation Engine

The true power of comprehensive health data emerges from cross-domain correlations—discovering how changes in one ome predict or explain changes in another. Myome implements a correlation engine that continuously analyzes relationships between all measured variables.

For a user with \(M\) measured biomarkers across the seven omes, the system computes a correlation matrix:

\[ C_{ij} = \text{corr}(X_i, X_j) \quad \text{for all pairs } i \neq j \] (13)

Where \(X_i\) and \(X_j\) are time-aligned measurements of biomarkers \(i\) and \(j\). To account for temporal lags (e.g., poor sleep causing elevated glucose the next day), the system computes lagged correlations:

\[ C_{ij}(\tau) = \text{corr}(X_i(t), X_j(t + \tau)) \] (14)

The system tests lags from \(\tau = -7\) to \(+7\) days to discover lead-lag relationships.
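
As a rough illustration of equations (13) and (14), the sketch below builds a zero-lag correlation matrix and then scans lags to find the strongest lead-lag relationship. The data are synthetic and the names (`lagged_corr`, `sleep`, `glucose`) are assumptions made for this example, not part of Myome; glucose is constructed to respond to the previous day's sleep, so the scan should recover a lag of +1 day.

```python
import numpy as np

def lagged_corr(x_i, x_j, tau):
    """Pearson corr(X_i(t), X_j(t + tau)) for equal-length daily series."""
    if tau > 0:
        a, b = x_i[:-tau], x_j[tau:]      # X_i leads X_j
    elif tau < 0:
        a, b = x_i[-tau:], x_j[:tau]      # X_j leads X_i
    else:
        a, b = x_i, x_j
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic data (illustrative only): glucose depends on the previous day's sleep
rng = np.random.default_rng(0)
n_days = 200
sleep = rng.normal(size=n_days)
glucose = rng.normal(scale=0.5, size=n_days)
glucose[1:] += -0.8 * sleep[:-1]

# Equation (13): zero-lag correlation matrix over all biomarkers (rows = biomarkers)
C = np.corrcoef(np.vstack([sleep, glucose]))

# Equation (14): scan tau = -7..+7 and keep the lag with the largest |r|
by_lag = {tau: lagged_corr(sleep, glucose, tau) for tau in range(-7, 8)}
best_lag = max(by_lag, key=lambda tau: abs(by_lag[tau]))
print(best_lag, round(by_lag[best_lag], 2))
```

A positive `best_lag` means the first series leads the second, matching the sign convention of equation (14).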

Example correlations discovered by the system:

Variable 1	Variable 2	Correlation	Lag	Clinical Interpretation
Sleep deep %	HRV (RMSSD)	+0.67	0 days	Better sleep quality → better autonomic function
Steps per day	Fasting glucose	-0.42	+1 day	Physical activity → improved glucose regulation next day
Alcohol intake	Sleep efficiency	-0.58	0 days	Alcohol disrupts sleep architecture
Stress level	LF/HF ratio	+0.71	0 days	Psychological stress → sympathetic dominance
PM2.5 exposure	HRV (SDNN)	-0.39	0-2 days	Air pollution → reduced cardiac autonomic function

Statistical significance is assessed with a permutation test or the parametric Pearson p-value, and a Bonferroni correction controls for multiple comparisons:

            import numpy as np
            from scipy.stats import pearsonr

            class CorrelationEngine:
                """Discover and validate correlations across health domains"""

                def __init__(self, significance_level=0.01, n_permutations=10000):
                    self.alpha = significance_level
                    self.n_perm = n_permutations

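                def lagged_correlation(self, x, y, max_lag=7):
                    """Compute correlation at different time lags"""
                    # Boolean masking below requires NumPy arrays
                    x = np.asarray(x, dtype=float)
                    y = np.asarray(y, dtype=float)
                    correlations = {}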

                    for lag in range(-max_lag, max_lag + 1):
                        if lag < 0:
                            # y leads x
                            x_aligned = x[-lag:]
                            y_aligned = y[:lag]
                        elif lag > 0:
                            # x leads y
                            x_aligned = x[:-lag]
                            y_aligned = y[lag:]
                        else:
                            # No lag
                            x_aligned = x
                            y_aligned = y

                        # Remove missing values
                        valid = ~(np.isnan(x_aligned) | np.isnan(y_aligned))

                        if np.sum(valid) < 10:
                            continue  # Insufficient data

                        r, p = pearsonr(x_aligned[valid], y_aligned[valid])
                        correlations[lag] = {'r': r, 'p': p, 'n': np.sum(valid)}

                    return correlations

                def permutation_test(self, x, y):
                    """Test correlation significance via permutation"""
                    # Observed correlation
                    r_obs, _ = pearsonr(x, y)

                    # Null distribution from permutations
                    r_null = np.zeros(self.n_perm)
                    for i in range(self.n_perm):
                        y_perm = np.random.permutation(y)
                        r_null[i], _ = pearsonr(x, y_perm)

                    # Two-tailed p-value (add-one estimate, so it is never exactly zero)
                    p_value = (np.sum(np.abs(r_null) >= np.abs(r_obs)) + 1) / (self.n_perm + 1)

                    return r_obs, p_value

                def discover_correlations(self, biomarker_data, bonferroni_correct=True):
                    """Find all significant correlations in dataset"""
                    biomarkers = list(biomarker_data.keys())
                    n_pairs = len(biomarkers) * (len(biomarkers) - 1) // 2
                    # Each pair is tested at every lag (-7..+7 by default in
                    # lagged_correlation), so lags count as comparisons too
                    n_lags = 2 * 7 + 1
                    n_comparisons = max(n_pairs * n_lags, 1)

                    # Bonferroni correction for multiple comparisons
                    alpha = self.alpha / n_comparisons if bonferroni_correct else self.alpha

                    significant_correlations = []

                    for i, marker1 in enumerate(biomarkers):
                        for marker2 in biomarkers[i+1:]:
                            x = biomarker_data[marker1]
                            y = biomarker_data[marker2]

                            # Test all lags
                            lagged_corrs = self.lagged_correlation(x, y)

                            for lag, stats in lagged_corrs.items():
                                if stats['p'] < alpha:
                                    significant_correlations.append({
                                        'marker1': marker1,
                                        'marker2': marker2,
                                        'r': stats['r'],
                                        'p': stats['p'],
                                        'lag_days': lag,
                                        'n_observations': stats['n']
                                    })

                    return sorted(significant_correlations,
                                 key=lambda x: abs(x['r']),
                                 reverse=True)
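
The permutation loop above calls `pearsonr` once per shuffle, which can be slow across many biomarker pairs. The following is a minimal vectorized sketch of the same two-tailed test, assuming complete (NaN-free) series; the function name `permutation_pvalue`, the seeds, and the synthetic data are illustrative assumptions. It also works out a Bonferroni threshold for a hypothetical scan of 20 biomarkers at 15 lags each.

```python
import numpy as np

def permutation_pvalue(x, y, n_perm=10000, seed=0):
    """Two-tailed permutation p-value for Pearson r (vectorized sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Standardize so that r is the mean of the elementwise product
    xz = (x - x.mean()) / x.std()
    yz = (y - y.mean()) / y.std()
    r_obs = float(xz @ yz / n)
    # Each row is an independent shuffle of y, breaking any pairing with x
    shuffled = rng.permuted(np.tile(yz, (n_perm, 1)), axis=1)
    r_null = shuffled @ xz / n
    # Add-one estimator: the p-value can never be exactly zero
    exceed = np.sum(np.abs(r_null) >= abs(r_obs))
    return r_obs, float((exceed + 1) / (n_perm + 1))

rng = np.random.default_rng(1)
x = rng.normal(size=90)                   # e.g. 90 days of one biomarker
y = x + 0.5 * rng.normal(size=90)         # correlated by construction
r, p = permutation_pvalue(x, y)

# Bonferroni threshold for a hypothetical scan: 20 biomarkers, 15 lags per pair
n_tests = (20 * 19 // 2) * 15
alpha_corrected = 0.01 / n_tests
print(r, p, alpha_corrected)
```

One design consequence is worth noting: with 10,000 permutations the smallest attainable p-value is about 1e-4, which is larger than the corrected threshold in this example (0.01 / 2850, roughly 3.5e-6). A permutation test can only reach a Bonferroni-corrected level if `n_perm` exceeds roughly `1 / alpha_corrected`.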

5.2 Predictive Health Models

Beyond correlations, Myome builds predictive models that forecast future health states based on current measurements and trends. These models enable proactive interventions before pathology manifests.

Glucose Response Prediction

Postprandial glucose response varies dramatically between individuals eating identical meals—a phenomenon explained by genetic factors, microbiome composition, recent activity, sleep quality, and circadian timing. Myome learns personalized glucose response models:

\[ \text{Glucose Peak} = f(\text{meal}, \text{microbiome}, \text{activity}, \text{sleep}, \text{time of day}, \text{genetics}) \] (15)

The model is implemented as gradient-boosted decision trees (XGBoost) trained on historical CGM data paired with meal logs:

            import xgboost as xgb
            import numpy as np

            class GlucosePredictor:
                """Predict postprandial glucose response"""

                def __init__(self):
                    self.model = None

                def extract_features(self, meal, context):
                    """Convert meal and context into feature vector"""
                    features = {
                        # Meal macronutrients
                        'carbs_g': meal['carbohydrates'],
                        'fiber_g': meal['fiber'],
                        'protein_g': meal['protein'],
                        'fat_g': meal['fat'],
                        'glycemic_load': meal['glycemic_load'],

                        # Recent activity
                        'steps_last_2h': context['steps_last_2h'],
                        'vigorous_min_last_6h': context['vigorous_min_last_6h'],

                        # Sleep quality (last night)
                        'sleep_duration_h': context['sleep_duration'],
                        'sleep_efficiency': context['sleep_efficiency'],
                        'deep_sleep_pct': context['deep_sleep_pct'],

                        # Circadian timing
                        'hour_of_day': context['meal_time'].hour,
                        'time_since_wake_h': context['hours_since_wake'],

                        # Current physiological state
                        'baseline_glucose': context['glucose_pre_meal'],
                        'hrv_morning': context['hrv_morning'],

                        # Genetic factors (static)
                        'tcf7l2_risk_alleles': context['genetics']['tcf7l2'],

                        # Microbiome (updated quarterly)
                        'prevotella_abundance': context['microbiome']['prevotella'],
                        'firmicutes_bacteroidetes_ratio': context['microbiome']['fb_ratio']
                    }

                    return np.array(list(features.values()))

                def train(self, historical_meals, historical_responses):
                    """Train model on historical meal → glucose data"""
                    X = np.vstack([
                        self.extract_features(meal, context)
                        for meal, context in historical_meals
                    ])

                    # Target: peak glucose in 2h post-meal window
                    y = np.array([
                        np.max(response['glucose'][0:24])  # 2h at 5-min sampling
                        for response in historical_responses
                    ])

                    # Train XGBoost model
                    params = dict(
                        n_estimators=200,
                        max_depth=6,
                        learning_rate=0.05,
                        objective='reg:squarederror'
                    )
                    self.model = xgb.XGBRegressor(**params)
                    self.model.fit(X, y)

                    # Bootstrap ensemble for uncertainty: refit on resampled meals
                    # so that predictions actually differ between ensemble members
                    n_bootstrap = 20
                    rng = np.random.default_rng(0)
                    self.ensemble = []
                    for _ in range(n_bootstrap):
                        idx = rng.integers(0, len(y), size=len(y))
                        member = xgb.XGBRegressor(**params)
                        member.fit(X[idx], y[idx])
                        self.ensemble.append(member)

                def predict(self, meal, context):
                    """Predict glucose response to proposed meal"""
                    features = self.extract_features(meal, context)
                    predicted_peak = float(self.model.predict(features.reshape(1, -1))[0])

                    # Return prediction with an interval from the bootstrap ensemble
                    return {
                        'predicted_peak_mg_dl': predicted_peak,
                        'confidence_interval_95': self.predict_interval(features)
                    }

                def predict_interval(self, features):
                    """Estimate prediction uncertainty from the bootstrap ensemble"""
                    # The spread across members reflects model uncertainty only; it
                    # excludes meal-to-meal noise, so treat it as a lower bound
                    predictions = [
                        member.predict(features.reshape(1, -1))[0]
                        for member in self.ensemble
                    ]

                    return (float(np.percentile(predictions, 2.5)),
                            float(np.percentile(predictions, 97.5)))

This enables users to preview glucose impact before eating—informing meal choices to maintain stable glucose levels.
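
The training target in equation (15) is the peak of a continuous CGM trace within the 2-hour window after a meal. The sketch below shows one way that target could be extracted, assuming 5-minute sampling as in the code above; `postprandial_peak` and the synthetic trace are hypothetical names and data used only for illustration.

```python
import numpy as np

SAMPLE_INTERVAL_MIN = 5      # CGM sampling period assumed in the text
WINDOW_MIN = 120             # 2 h postprandial window

def postprandial_peak(cgm_mg_dl, meal_idx):
    """Return (peak, rise over the pre-meal reading, minutes to peak)."""
    n = WINDOW_MIN // SAMPLE_INTERVAL_MIN            # 24 samples
    window = np.asarray(cgm_mg_dl[meal_idx:meal_idx + n], dtype=float)
    peak_pos = int(np.nanargmax(window))             # skips NaN sensor dropouts
    peak = float(window[peak_pos])
    return peak, peak - float(window[0]), peak_pos * SAMPLE_INTERVAL_MIN

# Illustrative trace: flat at 90 mg/dL, meal logged at index 12,
# followed by a smooth excursion that tops out 60 mg/dL above baseline
trace = np.full(60, 90.0)
trace[12:36] += 60.0 * np.sin(np.pi * np.arange(24) / 24)
peak, rise, minutes = postprandial_peak(trace, meal_idx=12)
print(peak, rise, minutes)
```

Returning the rise over the pre-meal reading alongside the absolute peak is a design choice for this sketch: it separates the meal's effect from the baseline glucose level, which the model above already receives as a feature.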

Cardiovascular Event Risk Prediction

Long-term cardiovascular risk can be estimated from biomarker trends. Traditional risk calculators (Framingham, ASCVD) use static snapshots; Myome incorporates temporal trends and novel biomarkers:

10-Year CVD Risk (Cox Proportional Hazards):

\[ P(\text{CVD}) = 1 - S_0(10)^{\exp(\sum \beta_i X_i - \sum \beta_i \bar{X}_i)} \] (16)

Where \(S_0(10)\) is the baseline 10-year survival, \(X_i\) are risk factors (age, LDL, HDL, blood pressure, smoking, diabetes), \(\bar{X}_i\) are their population means, and \(\beta_i\) are coefficients from Cox proportional hazards models.
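
To make equation (16) concrete, here is a small worked sketch. The coefficients, cohort means, and baseline survival are hypothetical values chosen only to show the arithmetic; they are not taken from Framingham, ASCVD, or Myome.

```python
import math

def ten_year_risk(x, beta, x_mean, s0_10):
    """Equation (16): 1 - S0(10) ** exp(sum(beta * x) - sum(beta * x_mean))."""
    lp = sum(beta[k] * x[k] for k in beta)             # individual's linear predictor
    lp_mean = sum(beta[k] * x_mean[k] for k in beta)   # linear predictor at cohort means
    return 1.0 - s0_10 ** math.exp(lp - lp_mean)

# Hypothetical coefficients and cohort means, for illustration only
beta = {'age': 0.05, 'sbp': 0.02, 'smoker': 0.6}
x_mean = {'age': 55.0, 'sbp': 125.0, 'smoker': 0.2}
s0_10 = 0.95

average_person = ten_year_risk(x_mean, beta, x_mean, s0_10)
higher_risk = ten_year_risk(
    {'age': 65.0, 'sbp': 145.0, 'smoker': 1.0}, beta, x_mean, s0_10
)
print(average_person, higher_risk)
```

A person whose risk factors all sit at the cohort means has an exponent of \(\exp(0) = 1\), so their risk reduces to \(1 - S_0(10)\). Centering on the means is what lets \(S_0(10)\) be read as the survival of an average individual.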

Myome extends this with:

Apolipoprotein B - Better predictor than LDL cholesterol (HR per SD: 1.38 vs 1.25)
Lipoprotein(a) - Independent genetic risk factor (HR > 50 mg/dL: 1.47)
HRV trends - Declining SDNN indicates autonomic dysfunction
Coronary calcium score - Direct measure of atherosclerotic burden
Epigenetic age acceleration - GrimAge > chronological age predicts mortality

5.3 Early Warning Systems

Change-point detection algorithms identify sudden shifts in biomarker patterns that may herald disease onset or progression. Myome implements Bayesian online changepoint detection:

Algorithm 2: Bayesian Online Changepoint Detection

Input: Time series \(x_1, x_2, \ldots, x_t\)

Output: Probability of changepoint at each time

1. Initialize run length distribution: \(P(r_0 = 0) = 1\)

2. For each new observation \(x_t\):

a. Compute predictive probability under each run length:

\(\pi_t(r) = P(x_t \mid r, x_{1:t-1})\)

b. Update growth probabilities:

\(P(r_t = r + 1 \mid x_{1:t}) \propto \pi_t(r) \cdot P(r_{t-1} = r \mid x_{1:t-1}) \cdot (1 - h)\)

c. Update changepoint probability:

\(P(r_t = 0 \mid x_{1:t}) \propto \sum_r \pi_t(r) \cdot P(r_{t-1} = r \mid x_{1:t-1}) \cdot h\)

d. Normalize: \(\sum P(r_t = r \mid x_{1:t}) = 1\)

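3. Once \(t > k\), alert if the posterior mass on short run lengths, \(P(r_t \le k \mid x_{1:t})\) for a small \(k\), exceeds a threshold (e.g., 0.5). With a constant hazard rate, the normalized \(P(r_t = 0 \mid x_{1:t})\) equals \(h\) at every step, so it cannot be thresholded directly.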

Where \(h\) is the hazard rate (prior probability of changepoint) and \(r\) is the run length since last changepoint.
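
The steps of Algorithm 2 can be sketched as follows. This is a minimal version under stated assumptions, not Myome's implementation: observations are Gaussian with known variance `obs_var`, the unknown mean has a Normal prior (`mu0`, `var0`), and the hazard is constant. The function name `bocpd_gaussian`, the parameter values, and the synthetic series are all illustrative. Because a constant hazard makes the normalized \(P(r_t = 0 \mid x_{1:t})\) equal to \(h\) at every step, the sketch monitors the mass on short run lengths, \(P(r_t \le k \mid x_{1:t})\), with an assumed \(k = 10\).

```python
import numpy as np

def bocpd_gaussian(x, hazard=0.01, mu0=0.0, var0=10.0, obs_var=1.0):
    """Run-length posterior R[t, r] = P(r_t = r | x_{1:t}); row 0 is the prior."""
    T = len(x)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0                                  # step 1
    mu = np.array([mu0])                           # posterior mean per run length
    var = np.array([var0])                         # posterior variance per run length
    for t, xt in enumerate(x, start=1):
        # Step 2a: predictive density of x_t under each run length
        pred_var = var + obs_var
        pi = np.exp(-0.5 * (xt - mu) ** 2 / pred_var) / np.sqrt(2 * np.pi * pred_var)
        joint = pi * R[t - 1, :t]
        R[t, 1:t + 1] = joint * (1 - hazard)       # step 2b: growth
        R[t, 0] = joint.sum() * hazard             # step 2c: changepoint
        R[t] /= R[t].sum()                         # step 2d: normalize
        # Conjugate update of the mean for every run length;
        # run length 0 restarts from the prior
        new_var = 1.0 / (1.0 / var + 1.0 / obs_var)
        new_mu = new_var * (mu / var + xt / obs_var)
        mu = np.concatenate(([mu0], new_mu))
        var = np.concatenate(([var0], new_var))
    return R

# Synthetic biomarker: mean shifts from 0 to 4 after 100 observations
rng = np.random.default_rng(0)
series = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(4.0, 1.0, 100)])
R = bocpd_gaussian(series)

k = 10
short_run_mass = R[:, :k + 1].sum(axis=1)          # P(r_t <= k | x_{1:t})
alerts = np.flatnonzero(short_run_mass[k + 1:] > 0.5) + k + 1
print(alerts[:5])
```

The first `k` rows are skipped when alerting because every run is trivially short at the start of the series. In practice the inputs would likely be standardized per biomarker so that a single choice of `var0` and `obs_var` is reasonable across measurements.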

Example applications: