Abstract
NPJ Syst Biol Appl. 2026 Jan 10. doi: 10.1038/s41540-025-00644-5. Online ahead of print.
Metabolomics data are often generated on different platforms with different quantification methods, which makes their synthesis and large-scale replication challenging. This study developed an ensemble of importance-weighted autoencoders to perform cross-platform imputation between two metabolomics platforms, Metabolon and the National Phenome Centre (NPC) at Imperial College, using 979 samples from the Airwave Health Monitoring Study. The generated samples were highly correlated with real values across all metabolites (µρ = 0.61 (0.55-0.67)). The well-imputed subset contained 199 metabolites (22%), each capturing ≥ 55% of the variance (R² ≥ 0.55) with minimal uncertainty (R² variance ≤ 0.025), including 43 metabolites unique to Metabolon. In 2,971 validation samples, the associations of real and imputed metabolites with two clinical outcomes, body mass index (BMI) and C-reactive protein (CRP), were highly concordant (ρBMI = 0.93; ρCRP = 0.89), with minimal mean difference (BMI µΔ = 0.005 (0.04); CRP µΔ = 0.005 (0.04)). Similar concordance was observed with the equivalent UK Biobank (BMI µΔ = -0.007 (0.05); CRP µΔ = 0.01 (0.04)) and NPC (BMI µΔ = -0.013 (0.04); CRP µΔ = -0.019 (0.04)) metabolites. This approach offers a scalable and accurate method for cross-platform imputation, enabling the aggregation of metabolomics data from different epidemiological studies for replication and meta-analyses.
PMID:41513683 | DOI:10.1038/s41540-025-00644-5
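The well-imputed criterion stated in the abstract (R² ≥ 0.55 with R² variance ≤ 0.025) can be illustrated with a minimal sketch. The abstract does not say how R² or its variance was computed, so this example assumes R² is the coefficient of determination per metabolite and that the variance is taken across ensemble members. The function names, array shapes, and synthetic data are hypothetical and are not the authors' implementation.

```python
import numpy as np


def r2_per_metabolite(y_true, y_pred):
    """Coefficient of determination for each column (one column per metabolite)."""
    ss_res = ((y_true - y_pred) ** 2).sum(axis=0)
    ss_tot = ((y_true - y_true.mean(axis=0)) ** 2).sum(axis=0)
    return 1.0 - ss_res / ss_tot


def well_imputed_mask(y_true, ensemble_preds, r2_min=0.55, r2_var_max=0.025):
    """Flag metabolites meeting both thresholds quoted in the abstract.

    ensemble_preds has shape (n_models, n_samples, n_metabolites).
    Assumption: mean and variance of R² are taken across ensemble members.
    """
    r2 = np.stack([r2_per_metabolite(y_true, pred) for pred in ensemble_preds])
    return (r2.mean(axis=0) >= r2_min) & (r2.var(axis=0) <= r2_var_max)


# Synthetic illustration only: two metabolites imputed with little noise,
# one imputed with heavy noise.
rng = np.random.default_rng(0)
y_true = rng.normal(size=(200, 3))
noise_sd = np.array([0.3, 0.3, 2.0])
ensemble_preds = np.stack(
    [y_true + rng.normal(size=y_true.shape) * noise_sd for _ in range(5)]
)

mask = well_imputed_mask(y_true, ensemble_preds)
print(mask)
```

With this synthetic data, the two low-noise metabolites pass both thresholds and the high-noise one does not; real data would replace `y_true` and `ensemble_preds` with measured and imputed metabolite matrices.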