Abstract
Mendelian randomization (MR) is a statistical method for investigating causal pathways from modifiable exposures to disease outcomes, using genetic variants as instrumental variables. Thanks to the increasing availability of summary genetic data from large meta-analyses, two-sample MR studies are widely performed to test causal hypotheses. Multiple data sources on the same exposure may be available, differing in many aspects of study design (e.g. the measurement procedure, the type of blood sample, the study ancestry, the data transformation). The purpose of this work is to combine different data sources within an MR framework. We empirically explore a new workflow based on a discovery-replication-validation design and compare it with the most common one, in which a single MR analysis is performed on the largest available dataset. The robustness of the resulting causal evidence is evaluated. Following a hypothesis-free approach, we carry out a metabolome-wide MR that includes different genetic datasets on approximately 186 targeted metabolites, all measured with the same analytical technique (LC-MS Biocrates kit). After a literature review, we use genetic association studies with at least 5,000 participants: three genome-wide association studies (GWAS) of European ancestry, namely Draisma[1] and CHRIS (not published) on serum samples from the general population, and Lotta[2] on plasma samples from blood donors; and one whole-exome sequencing (WES) study, CHRIS[3]. For the Parkinson's disease outcome, the largest available dataset is used[4]. In the classical design, the instruments are selected once, on the meta-analysis of Draisma and CHRIS. In the proposed design, the selection is repeated on each dataset independently. After instrument selection, different metabolites are analysed.
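To make the two designs concrete, the following is a minimal sketch in Python of two-sample MR from summary statistics, assuming variants are already independent (e.g. LD-clumped) and using the fixed-effect inverse-variance weighted (IVW) estimator. The classical design corresponds to calling `ivw` once on the meta-analysed dataset; the proposed design repeats instrument selection and estimation on each dataset and keeps a signal only if the estimates agree. The names `Variant`, `select_instruments` and `consistent`, the genome-wide threshold of 5e-8, the replication rule (same sign and p < 0.05 in every dataset) and the toy numbers are all illustrative assumptions, not the authors' pipeline or data.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    snp: str
    beta_x: float  # variant-exposure effect (e.g. on a metabolite level)
    p_x: float     # p-value of the variant-exposure association
    beta_y: float  # variant-outcome effect (e.g. log-odds of Parkinson's disease)
    se_y: float    # standard error of beta_y


def select_instruments(variants, p_threshold=5e-8):
    """Keep variants strongly associated with the exposure.
    Assumption: the variants are already independent (LD-clumped)."""
    return [v for v in variants if v.p_x < p_threshold]


def ivw(instruments):
    """Fixed-effect IVW causal estimate: (beta, se, two-sided p), or None."""
    if not instruments:
        return None
    num = sum(v.beta_x * v.beta_y / v.se_y ** 2 for v in instruments)
    den = sum(v.beta_x ** 2 / v.se_y ** 2 for v in instruments)
    beta = num / den
    se = math.sqrt(1.0 / den)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # normal approximation
    return beta, se, p


def consistent(results, alpha=0.05):
    """Hypothetical replication rule: every dataset yields an estimate,
    all estimates share the same sign, and all have p < alpha."""
    if any(r is None for r in results):
        return False
    signs = {math.copysign(1.0, beta) for beta, _, _ in results}
    return len(signs) == 1 and all(p < alpha for _, _, p in results)


# Toy summary statistics for one metabolite in three exposure datasets
# (made-up numbers for illustration only).
datasets = {
    "discovery": [Variant("rs1", 0.20, 1e-10, 0.04, 0.01),
                  Variant("rs2", 0.15, 1e-9, 0.03, 0.01),
                  Variant("rs3", 0.05, 0.2, 0.10, 0.01)],  # weak, dropped
    "replication": [Variant("rs1", 0.18, 1e-12, 0.04, 0.01),
                    Variant("rs4", 0.12, 1e-8, 0.02, 0.01)],
    "validation": [Variant("rs1", 0.22, 1e-15, 0.05, 0.012)],
}

# Proposed design: select instruments and estimate separately per dataset.
results = {name: ivw(select_instruments(vs)) for name, vs in datasets.items()}
for name, (beta, se, p) in results.items():
    print(f"{name}: beta={beta:.3f} se={se:.3f} p={p:.2e}")
print("consistent across datasets:", consistent(list(results.values())))
```

Because each dataset contributes its own instruments, a variant that is strong in one study but absent or weak in another simply drops out there, which is what lets the consistency check flag exposures whose signal depends on a single data source.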
The results show that repeating the analysis on different data allows the exposure to be optimized by checking the consistency of findings, leading to the identification of the strongest causal pathway and pointing towards possible new biological mechanisms. When multiple datasets are available, our proposed design identified reliable causal associations. Selecting instruments from WES makes it possible to explore the pros and cons of using them instead of those obtained from GWAS, an innovative contribution to the choice of instruments in MR.

[1] Draisma HHM, Pool R, Kobl M, et al. Genome-wide association study identifies novel genetic variants contributing to variation in blood metabolite levels. Nat Commun. 2015;6:7208.
[2] Lotta LA, Pietzner M, Stewart ID, et al. A cross-platform approach identifies genetic regulators of human metabolism and health. Nat Genet. 2021;53(1):54-64.
[3] König E, Rainer J, Hernandes VV, et al. Whole Exome Sequencing Enhanced Imputation Identifies 85 Metabolite Associations in the Alpine CHRIS Cohort. Metabolites. 2022;12(7):604.
[4] Nalls MA, Blauwendraat C, Vallerga CL, et al. Identification of novel risk loci, causal insights, and heritable risk for Parkinson's disease: a meta-analysis of genome-wide association studies. Lancet Neurol. 2019;18(12):1091-1102.