Interactions Between Genetic Variants and Environmental Factors Affect Risk of Esophageal Adenocarcinoma and Barrett’s Esophagus

Background & Aims Genome-wide association studies (GWAS) have identified more than 20 susceptibility loci for esophageal adenocarcinoma (EA) and Barrett’s esophagus (BE). However, variants in these loci account for a small fraction of cases of EA and BE. Genetic factors might interact with environmental factors to affect risk of EA and BE. We aimed to identify single nucleotide polymorphisms (SNPs) that may modify the associations of body mass index (BMI), smoking, and gastroesophageal reflux disease (GERD), with risks of EA and BE. Methods We collected data on single BMI measurements, smoking status, and symptoms of GERD from 2284 patients with EA, 3104 patients with BE, and 2182 healthy individuals (controls) participating in the Barrett’s and Esophageal Adenocarcinoma Consortium GWAS, the UK Barrett’s Esophagus Gene Study, and the UK Stomach and Oesophageal Cancer Study. We analyzed 993,501 SNPs in DNA samples of all study subjects. We used standard case–control logistic regression to test for gene-environment interactions. Results For EA, rs13429103 at chromosome 2p25.1, near the RNF144A-LOC339788 gene, showed a borderline significant interaction with smoking status (P = 2.18×10-7). Ever smoking was associated with an almost 12-fold increase in risk of EA among individuals with rs13429103-AA genotype (odds ratio=11.82; 95% CI, 4.03–34.67). Three SNPs (rs12465911, rs2341926, rs13396805) at chromosome 2q23.3, near the RND3-RBM43 gene, interacted with GERD symptoms (P = 1.70×10-7, P = 1.83×10-7, and P = 3.58×10-7, respectively) to affect risk of EA. For BE, rs491603 at chromosome 1p34.3, near the EIF2C3 gene, and rs11631094 at chromosome 15q14, at the SLC12A6 gene, interacted with BMI (P = 4.44×10-7) and pack-years of smoking history (P = 2.82×10-7), respectively. Conclusion The associations of BMI, smoking, and GERD symptoms with risks of EA and BE appear to vary with SNPs at chromosomes 1, 2, and 15. Validation of these suggestive interactions is warranted.

Barrett's Esophagus Gene Study, and the UK Stomach and Oesophageal Cancer Study. We analyzed 993,501 SNPs in DNA samples of all study subjects. We used standard case-control logistic regression to test for gene-environment interactions.

CONCLUSION:
The associations of BMI, smoking, and GERD symptoms with risks of EA and BE appear to vary with SNPs at chromosomes 1, 2, and 15. Validation of these suggestive interactions is warranted. O ver the past 4 decades, the incidence of esophageal adenocarcinoma (EA) has increased markedly in many Western populations. Among white men in the United States the incidence has increased almost 10-fold, 1 and rates continue to rise by 2% per year. 2 EA is a highly fatal cancer with a median overall survival of <1 year following diagnosis. 3 EAs typically arise on a background of a premalignant change in the lining of the esophagus known as Barrett's esophagus (BE). Thus, proposals to prevent EA-associated morbidity and mortality have suggested focusing on identifying patients with BE and enrolling them in endoscopic surveillance programs, or on identifying and modifying risk factors for neoplastic progression. [4][5][6] Epidemiologic studies have identified frequent or persistent symptoms of gastroesophageal reflux disease (GERD), 7,8 obesity, 9 and smoking 10,11 as the principal factors associated with increased risks of EA and BE. These 3 factors together comprise almost 80% of the attributable burden of EA. 12,13 Genetic factors also influence risk of EA and BE. Recent genome-wide association studies (GWAS) and post-GWAS studies have identified more than 20 loci significantly associated with risks of EA and BE 14 ; however, these variants seem to explain only a limited proportion of the heritability of these diseases (estimated to be 25% for EA and 35% for BE). 15 It is possible that environmental risk factors for EA and BE may interact with multiple genes through various biological pathways to contribute to disease susceptibility. Given the strength of associations with known risk factors for EA and BE (especially when compared with most other cancers), and potentially shared biological pathways (eg, inflammation) underlying these risk factors, 16 identifying gene-environment interactions may be more plausible in the setting of EA and BE. These gene-environment interactions may account for some of the missing heritability of EA and BE. 15 However, previous efforts to identify geneenvironment interactions for EA and BE have predominantly been candidate based and have involved only small numbers of single nucleotide polymorphisms (SNPs). [17][18][19] With the aim of identifying SNPs that may modify the associations of body mass index (BMI), smoking, and GERD symptoms with risks of EA and BE, we used pooled questionnaire and genetic data from several studies to conduct a large scale genome-wide gene-environment interaction study of EA and BE.

Study Population
We obtained data from 1512 EA patients, 2413 BE patients, and 2185 control subjects of European ancestry from 14 epidemiologic studies conducted in Western Europe, Australia, and North America participating in the International Barrett's and Esophageal Adenocarcinoma Consortium (http://beacon.tlvnet.net/) GWAS. The design of the Barrett's and Esophageal Adenocarcinoma Consortium GWAS has been described in detail previously. 20

SNP Genotyping
Genotyping of buffy coat or whole blood DNA from all participants was conducted using the Illumina Omni1M Quad platform (San Diego, CA), in accordance with standard quality-control procedures. 21 For quality control, genotyped SNPs were excluded based on call rate <95%, Hardy-Weinberg Equilibrium P value over controls of <10 -4 , or minor allele frequency (MAF) 2%. After quality assurance and quality control, 993,501 SNPs were used for the current analysis. The analysis was restricted to the subset of ethnically homogenous individuals of European ancestry (confirmed in GWAS samples using principal components analysis). 20

Environmental ("Exposure") Variables
Individual-level exposure data for each study participant were harmonized and merged into a single deidentified dataset. The data were checked for consistency and completeness and any apparent inconsistencies were followed up with individual study investigators. Depending on the study, data from self-reported written questionnaires or in-person interviews were obtained at or near the time of cancer diagnosis for EA patients, at or near the time of BE diagnosis for BE patients, and at the time of recruitment for control subjects. BMI was calculated as weight divided by square of height (kg/m 2 ). For the analysis we selected the weight from each participant that likely reflected usual adult weight (before, for example, any disease-related weight loss). For tobacco smoking, the exposure variables were smoking status (ever vs never) and total cigarette smoking exposure among ever smokers (pack-years of smoking exposure). Ever cigarette smoking was defined as either low threshold exposure (!100 cigarettes over their whole life) or by asking whether they had ever smoked regularly. Packyears of smoking exposure was derived by dividing the average number of cigarettes smoked daily by 20 and multiplying by the total number of years smoked. GERD symptoms were defined as the presence of heartburn (ie, a burning or aching pain behind the sternum) or acid reflux (ie, a sour taste from acid, bile, or other stomach contents rising up into the mouth). For analysis, we used the highest reported frequency for either GERD symptom. Participants were then categorized as recurrent vs not recurrent based on a frequency of weekly or greater GERD symptoms for "recurrent." 7 A total of 425 participants with missing values for all 3 covariates (BMI, smoking history, and history of GERD symptoms) were excluded from the analysis.

Statistical Analysis
We used standard case-control logistic regression to test for gene-environment interactions. SNP genotypes were treated as continuous variables and coded as 0, 1, or 2 copies of the minor allele. Exposure variables were either continuous (BMI and pack-years of smoking exposure) or dichotomous (smoking status and GERD symptoms). We modeled the gene-environment interaction by the product of the SNP genotype and the exposure variable, adjusting for age, sex, the first 4 principal components to control for possible population stratification, and the main terms of the SNP and the exposure variable. We used model-robust standard errors as suggested in Voorman et al 22 to avoid inflated test statistics that can arise due to underestimation of variability in gene-environment GWAS. For SNPs from each of the top gene-environment interaction hits (ie, main text, P value for interaction <5.0 Â 10 -7 ) (Supplemental Material, P value for interaction <1.0 Â 10 -6 ) we also performed stratified analyses by genotype to examine the modified association of the known risk factor for EA or BE within the specific genotypes. Analyses were conducted using R software version 3.4.3. (R Foundation for Statistical Computing, Vienna, Austria), the GWASTools package, 23 and Stata 13.0 (StataCorp LP, College Station, TX).

Results
The final study sample included 2284 EA patients, 3104 BE patients, and 2182 control subjects. Characteristics of the study sample are shown in Table 1. On average, BMI was higher among EA (mean, 28.4 kg/m 2 ) and BE (mean, 28.7 kg/m 2 ) patients than among control subjects (mean, 27.0 kg/m 2 ). Similarly, EA and BE patients were more likely than control subjects to be ever smokers (74.8%, 64.8%, and 59.1%, respectively) and to report history of recurrent GERD symptoms (46.9%, 52.9%, and 19.4%, respectively).

Cross-Examination of Discovered Gene-Environment Interactions
For each SNP in Table 2 and Supplemental Table 1 that had a borderline significant genome-wide interaction in either EA or BE, we examined the equivalent gene-environment interaction in BE and EA, respectively (Supplemental Table 3). For all SNPs discovered in EA, we observed nominal levels of significance (P value for interaction <.05) and ORs in the same direction but somewhat attenuated in BE. For SNPs discovered in BE, only half had P value for interaction <.05 in EA, although all had similar ORs to those in BE. Although obesity and GERD are correlated, none of the SNPs with P value for interaction <1.0 Â 10 -6 with GERD had comparable ORs or P values when testing for interaction  Table 2 are shown as a solid purple diamond, except in panel B where rs2341926 and rs13396805 are shown as circles near rs12465911. The color scheme indicates linkage disequilibrium between the SNP shown with a solid purple diamond and other SNPs in the region using the r 2 value calculated from the 1000 Genomes Project. The y axis is the Àlog10 interaction P value computed from 5388 cases (3104 Barrett's esophagus, 2284 esophageal adenocarcinoma) and 2182 control subjects. The recombination rate from CEU HapMap data (right-side y axis) is shown in light blue. with obesity and similarly for the 1 obesity SNP when tested for GERD.

Discussion
To our knowledge, this is the first genome-wide gene-environment interaction study of EA and its precursor, BE. Although no gene-environment interactions reached genome-wide significance (ie, P < 5.0 Â 10 -8 for interaction), several borderline significant interactions were indicated between SNPs and known risk factors for EA and BE -BMI, smoking, and GERD symptoms.
A number of studies have pursued candidate-based gene-environment analyses of EA, and reported interactions between BMI, smoking or GERD symptoms and selected SNPs in genes related to detoxification, angiogenesis, DNA repair, apoptosis, and extracellular matrix degradation. [24][25][26][27][28][29][30][31] This body of work helped to establish the notion that the level of disease risk associated with GERD symptoms, in particular, may vary according to inherited genetic variation. All of these studies, however, were conducted in small samples (<350 cases) and were not replicated in independent populations. While direct comparison of our own results and these past findings is complicated by less-thancomplete overlap of genotyped SNPs between studies, we did not find evidence in support of interactions among BMI, smoking, or GERD symptoms and any assessed variants in previously-implicated genes: GSTM1, GSTT1, VEGF, MGMT, EGF, IL1B, PERP, PIK3CA, TNFRSF1A, CASP7, TP53BP1, BCL2, HIF1AN, PDGRFA, VEGFR1, or MMP1 (Supplemental Table 4). It remains possible that nominal evidence for some of these associations may not have survived stringent correction for multiple comparisons, and larger samples are needed for true signals to reach significance. Alternatively, previously reported interactions may simply reflect chance findings in small samples because they did not validate in our large study population.
This study has several strengths. First, the pooled dataset including relatively large numbers of cases and control subjects provided us with a rare opportunity to perform, in parallel, genome-wide gene-environment interaction analyses for EA and its precursor lesion, BE. Past candidate-based gene-environment interaction studies of EA have focused on small numbers of genes selected according to biological plausibility, and collectively these reports sampled only a small fraction of the total SNPs presently analyzed (N ¼ 993,501). Such preconceived "gene-centric" SNP selection methods fail to capture the large fraction of noncoding intergenic variations that have been linked to altered risk for these 2 conditions, and also artificially restricts the "genic" search space based on limited mechanistic knowledge, a limitation that is overcome by an unbiased comprehensive genome-wide gene-environment interaction assessment. Second, our study draws on genetic and epidemiologic data from a recent consortium-based GWAS of EA/BE, 20 which is the largest of its kind. This sizable study sample afforded greater power to detect gene-environment interactions than in any previous study. Third, all genotyping from this GWAS was conducted on a single platform and in a single laboratory, and subjected to stringent quality-control procedures. Most GWAS analyses test only an additive model because an additive model has reasonable power to detect both additive and dominant effects and the 2 models yield similar results and many GWAS analyses, including ours, are underpowered to detect recessive effects. Nevertheless, for completeness we also tested a dominant model for the 16 SNPs with a P value for interaction <1.0 Â 10 -6 ( Table 2 and Supplemental Table 1), and found slightly attenuated results of the ORs for some gene-environment interactions (data not shown).
Our study also has some limitations. First, our ability to detect true gene-environment interactions might have been limited by the manner in which the environmental (exposure) variables were measured and harmonized. For example, recall bias is a possibility during retrospective reporting of the exposures in the parent casecontrol studies. However, respondents were unaware of their genotype status at the time of the interviews, mitigating the impact of any possible recall bias in our interaction analyses. Similarly, while considerable care was taken during data harmonization, as described in a series of recent pooled analyses, 10,11 some potential for measurement error of the exposures examined is possible. However, given that case-control status was not considered during this process, any errors from harmonization would be nondifferential, resulting in attenuation of the resulting ORs. Second, central obesity (eg, waist-to-hip ratio) has been found to be more strongly associated with the risk of BE than BMI; however, as waist and hip measurements were not collected in the majority of the included studies, we were unable to examine for interactions with central obesity. Third, despite the comprehensive nature of the genomewide analysis, we were nonetheless limited to examining common genetic variation (MAF >2%) represented on the Illumina Omni1M Quad GWAS platform employed. Further large-scale studies based on whole-exome or whole-genome sequencing would be required to identify additional gene-environment interactions with rare variants, and more precisely map the reported associations. Finally, our study results should be considered as discovery findings, worthy of independent replication. None of the interactions studied reached genome-wide significance (ie, P < 5.0 Â 10 -8 for interaction). This may be because there are truly no gene-environment interactions or it may be that power was still limited to detect modest or weak interactions despite our large sample size. In our analyses of 2284 EA patients, 3104 BE patients, and 2182 control subjects, we were adequately powered to detect interactions with an interaction OR in the range of 1.98-2.52 for MAF in the observed range (0.11-0.43), assuming a main effect of 1.08 for log-additive SNPs, a main effect of 1.90 for binary risk factors, and an a of 5.0 Â 10 -8 . Given the large worldwide consortia sample of patients participating in this work, few additional studies of EA and BE patients are currently available and have data for replication; thus, such work may require additional time for study patients to accrue.
In conclusion, our report describes the first genomewide gene-environment interaction analysis for EA and BE. These findings provide evidence that the magnitude of disease risk associated with BMI, smoking, and GERD symptoms may differ according to germline genetics, and suggest the potential utility of combing epidemiologic exposure data with selected genotyping for comprehensive risk assessment in patients susceptible to EA or BE. Pending validation of the observed interactions in independent study populations, further analyses will be required to investigate the biological basis for differential disease risk associated with the risk factors investigated in the presence of these variants.  a Two-way interaction. b On array but quality control failure. We were unable to validate all SNPs as some were biallelic or we failed to identify a high LD SNP (r 2 < 0.70).