Using MR-SPI and AlphaFold3, scientists are unraveling the molecular underpinnings of Alzheimer’s disease and identifying key protein changes that could reshape future treatments.
Study: Deciphering proteins in Alzheimer’s disease: a novel Mendelian randomization method integrated with AlphaFold3 for 3D structure prediction.
In a recent study published in the journal Cell Genomics, a group of researchers developed Mendelian Randomization with Selection and Post-selection Inference (MR-SPI), a method integrated with AlphaFold3, to identify causal protein biomarkers and structural changes in Alzheimer’s disease.
Background
Alzheimer’s disease (AD), the leading cause of dementia worldwide, poses a significant healthcare challenge, with its etiology and pathogenesis remaining unclear. Current therapies that target the production or aggregation of amyloid beta (Aβ) provide only symptomatic relief and fail to halt disease progression.
Mendelian randomization (MR) provides an approach to identify causal protein biomarkers by using genetic variants as instrumental variables. However, conventional MR methods are vulnerable to invalid instruments and horizontal pleiotropy, both of which can bias the results.
Advanced MR techniques that address these limitations are critical for uncovering causal proteins and understanding their structural impact. The MR-SPI method uniquely applies the ‘Anna Karenina principle’, assuming that valid instrumental variables (IVs) behave similarly, while invalid IVs deviate in different ways. Further research is urgently needed to enable effective therapeutic development.
About the study
In two-sample Mendelian Randomization (MR) studies, genetic associations between protein quantitative trait loci (pQTLs) and phenotypic outcomes are analyzed using summary statistics from genome-wide association studies (GWAS).
This process involves identifying independent pQTLs by linkage disequilibrium (LD) clumping, retaining only one representative pQTL per LD region. These pQTLs are modeled to estimate causal relationships between proteins and health outcomes while addressing potential violations of instrumental variable (IV) assumptions.
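The clumping step described above can be sketched as a greedy procedure: take the most significant variant, discard everything in LD with it, and repeat. This is a minimal illustration, not the study's actual code. The function name `ld_clump`, the pairwise `r2` matrix input, and the 0.01 threshold are assumptions made for the example, and real clumping tools also restrict comparisons to a physical distance window, which is omitted here.

```python
import numpy as np

def ld_clump(p_values, r2, r2_threshold=0.01):
    """Greedy LD clumping sketch (hypothetical helper, not from the study).

    p_values: association p-value of each pQTL with the protein.
    r2: square matrix of pairwise LD (squared correlation) between pQTLs.
    Returns the indices of the retained, approximately independent pQTLs.
    """
    p_values = np.asarray(p_values, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    removed = np.zeros(len(p_values), dtype=bool)
    kept = []
    # Visit variants from most to least significant.
    for i in np.argsort(p_values):
        if removed[i]:
            continue
        kept.append(int(i))
        # Drop every variant in LD with the retained one
        # (this also marks i itself, since r2[i, i] == 1).
        removed |= r2[i] > r2_threshold
    return kept

# Toy data: variants 0 and 1 are in LD, as are variants 2 and 3.
toy_p = [1e-8, 1e-10, 1e-6, 1e-9]
toy_r2 = np.array([
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
])
representatives = ld_clump(toy_p, toy_r2)
```

With the toy data, one representative survives per LD block: the variant with the smallest p-value in each.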
MR-SPI is a novel method designed to overcome challenges in selecting valid pQTL IVs. It uses the ‘plurality rule’, which assumes that valid IVs yield comparable estimates of causal effects, distinguishing them from invalid instruments.
Through a voting procedure, MR-SPI identifies the largest subset of pQTLs with consistent ratio estimates as valid IVs, supporting causal inference even when few pQTLs are available or some violate the IV assumptions. Unlike methods that rely on the “majority rule” or on strict assumptions such as InSIDE (Instrument Strength Independent of Direct Effect), this approach is particularly robust for proteomics data with small pQTL sets.
MR-SPI estimates the causal effects using ordinary least squares regression and constructs confidence intervals that are robust to IV selection errors in finite samples. By addressing the limitations of conventional MR methods, MR-SPI provides a framework for identifying causal protein biomarkers and promotes the integration of large-scale proteomics and phenotypic outcome data into causal inference studies.
Study results
The proposed pipeline for identifying causal protein biomarkers and predicting their 3D structural changes consists of three primary steps.
First, for each protein, the MR-SPI method is used to select valid pQTLs as IVs. This is achieved by integrating summary GWAS data for proteomics and disease outcomes, allowing the estimation of the causal effect of each protein on the disease.
Second, Bonferroni correction is applied to the estimated causal effects to identify statistically significant protein biomarkers. Third, AlphaFold3 is used to predict and compare the 3D structures of the wild-type and mutated versions of these proteins arising from missense pQTLs.
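As a rough illustration of the second step, Bonferroni correction divides the significance level by the number of tests. The sketch below assumes one p-value per tested protein. The function name, the 0.05 level, and the toy protein labels are hypothetical, while 912 is the number of plasma proteins the study reports testing.

```python
def bonferroni_significant(p_values, alpha=0.05, n_tests=None):
    """Return the proteins whose p-value passes a Bonferroni threshold.

    p_values: dict mapping protein name -> p-value of its causal effect.
    n_tests: number of tests corrected for (defaults to len(p_values)).
    """
    if n_tests is None:
        n_tests = len(p_values)
    threshold = alpha / n_tests
    return {name: p for name, p in p_values.items() if p < threshold}

# Toy p-values with made-up protein labels; 912 proteins gives a
# per-test threshold of 0.05 / 912, roughly 5.5e-5.
toy = {"protein_A": 1e-6, "protein_B": 0.01, "protein_C": 4e-5, "protein_D": 6e-5}
hits = bonferroni_significant(toy, alpha=0.05, n_tests=912)
```

Only the proteins whose p-values fall below the corrected threshold are carried forward to structure prediction.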
MR-SPI works through a multi-step process. It first identifies relevant pQTLs with strong protein associations. Each relevant pQTL provides a ratio estimate of the causal effect, and other pQTLs “vote” for its validity if their estimated degree of violation of the IV assumptions (independence and exclusion restriction) falls below a threshold. A voting matrix is constructed to summarize this cross-validation between pQTLs, and valid IVs are identified via majority or plurality voting or via the maximum clique method.
The causal effect is then estimated using ordinary least squares regression with zero intercept, and confidence intervals are constructed to account for potential IV selection errors in finite samples.
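The voting and estimation steps can be illustrated with a simplified sketch. It is not the authors' implementation: the function names, the fixed `z_threshold`, the approximate standard error of the pleiotropy estimate, and the "most-voted candidate" shortcut for picking the valid set (in place of the maximum clique search) are all assumptions made for illustration. Robust confidence intervals are not covered.

```python
import numpy as np

def select_valid_ivs(beta_x, beta_y, se_x, se_y, z_threshold=2.0):
    """Simplified plurality-style voting over pQTLs (illustrative only).

    beta_x / se_x: pQTL-protein effect estimates and standard errors.
    beta_y / se_y: pQTL-outcome effect estimates and standard errors.
    """
    beta_x, beta_y, se_x, se_y = (
        np.asarray(a, dtype=float) for a in (beta_x, beta_y, se_x, se_y)
    )
    n = len(beta_x)
    ratio = beta_y / beta_x  # one causal-effect estimate per pQTL
    votes = np.zeros((n, n), dtype=bool)
    for j in range(n):
        # If pQTL j were valid, this is the implied pleiotropic effect of each pQTL k.
        residual = beta_y - ratio[j] * beta_x
        # Crude standard error that ignores the uncertainty in ratio[j].
        residual_se = np.sqrt(se_y**2 + ratio[j] ** 2 * se_x**2)
        # pQTL k votes for j when its implied violation is small.
        votes[:, j] = np.abs(residual) <= z_threshold * residual_se
    mutual = votes & votes.T  # keep only votes that go both ways
    leader = int(np.argmax(mutual.sum(axis=0)))  # candidate with most support
    return np.flatnonzero(mutual[:, leader])

def zero_intercept_ols(beta_x, beta_y, valid):
    """Slope of beta_y on beta_x through the origin, using the valid IVs only."""
    bx = np.asarray(beta_x, dtype=float)[valid]
    by = np.asarray(beta_y, dtype=float)[valid]
    return float(bx @ by / (bx @ bx))

# Toy data: four valid pQTLs share a true effect of 0.5; two are pleiotropic.
bx = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10]
by = [0.05, 0.06, 0.04, 0.055, 0.145, -0.03]
se = [0.005] * 6
valid = select_valid_ivs(bx, by, se, se)
effect = zero_intercept_ols(bx, by, valid)
```

In the toy data, the two pleiotropic pQTLs imply different causal effects from each other and from the valid group, so they attract no mutual votes and drop out before the regression.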
This approach was compared to several established MR methods, including inverse-variance weighting (IVW), MR-Robust Adjusted Profile Score (MR-RAPS), MR-Pleiotropy Residual Sum and Outlier (MR-PRESSO), weighted median estimation, and mode-based estimation. MR-SPI outperformed these methods in simulation studies under conditions with locally invalid IVs, demonstrating superior accuracy and robustness.
Applying MR-SPI to United Kingdom (UK) Biobank proteomics data and AD GWAS data identified seven significant protein biomarkers: cluster of differentiation (CD)33, CD55, erythropoietin-producing hepatocellular carcinoma receptor A1 (EPHA1), paired immunoglobulin-like type 2 receptor alpha (PILRA), PILRB, rearranged during transfection (RET), and triggering receptor expressed on myeloid cells 2 (TREM2).
AlphaFold3 predicted structural changes in these proteins caused by missense mutations at the associated pQTLs. For example, CD33 was predicted to undergo structural changes that may affect microglial function and amyloid plaque accumulation, highlighting its potential role in AD pathology. This finding underlines the potential of the method to link genetic variations to disease mechanisms.
Gene Ontology (GO) analysis linked these proteins to critical biological processes, including phosphorus metabolism and immune regulation. Notably, some of the identified proteins, such as CD33 and TREM2, have existing FDA-approved drugs that target them, suggesting that drugs could be repurposed to treat Alzheimer’s.
Conclusions
This study introduces a novel pipeline that integrates MR-SPI and AlphaFold3 to identify causal protein biomarkers and predict 3D structural changes induced by missense pQTLs.
MR-SPI uses a voting-based approach under the plurality rule to select valid pQTLs and constructs confidence intervals that are robust to IV selection errors in finite samples. Applied to 912 plasma proteins, MR-SPI identified seven proteins linked to AD, with structural insights provided by AlphaFold3.
The findings also open avenues for drug development, including repurposing FDA-approved drugs that target identified proteins, such as gemtuzumab ozogamicin for CD33 and RET inhibitors such as pralsetinib for potential AD treatment.
Journal reference:
- Yao, M., Miller, G. W., Vardarajan, B. N., Baccarelli, A. A., Guo, Z., and Liu, Z. (2024). Deciphering proteins in Alzheimer’s disease: a novel Mendelian randomization method integrated with AlphaFold3 for 3D structure prediction. Cell Genomics, 100700. DOI: 10.1016/j.xgen.2024.100700, https://www.sciencedirect.com/science/article/pii/S2666979X2400329X