Peruvians are among the shortest people in the world. To understand the genetic basis of short stature in Peru, we examined an ethnically diverse group of Peruvians and identified a novel, population-specific, missense variant in FBN1 (E1297G) that is significantly associated with lower height in the Peruvian population. Each copy of the minor allele (frequency = 4.7%) reduces height by 2.2 cm (4.4 cm in homozygous individuals). This is the largest effect size known for a common height-associated variant. This variant shows strong evidence of positive selection within the Peruvian population and is significantly more frequent in Native American populations from coastal regions of Peru compared to populations from the Andes or the Amazon, suggesting that short stature in Peruvians is the result of adaptation to the coastal environment. One Sentence Summary: A mutation found in Peruvians has the largest known effect on height for a common variant. This variant is specific to Native American ancestry.
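The per-allele arithmetic in this abstract can be illustrated with a short sketch. The effect size (2.2 cm per copy) and allele frequency (4.7%) come from the abstract; the Hardy-Weinberg genotype frequencies are an added assumption used only for illustration, and the function names are hypothetical.

```python
# Illustrative sketch only: additive effect of the FBN1 E1297G minor allele.
# MAF and EFFECT_CM are taken from the abstract; Hardy-Weinberg equilibrium
# is an assumption made here, not a claim from the study.
MAF = 0.047        # minor allele frequency reported in the abstract
EFFECT_CM = -2.2   # height change per copy of the minor allele (cm)


def expected_height_shift(copies):
    """Height shift in cm under a purely additive model."""
    return EFFECT_CM * copies


def hardy_weinberg(maf):
    """Expected genotype frequencies keyed by number of minor-allele copies."""
    q = maf
    p = 1.0 - q
    return {0: p * p, 1: 2 * p * q, 2: q * q}


freqs = hardy_weinberg(MAF)
for copies, freq in freqs.items():
    shift = expected_height_shift(copies)
    print(f"{copies} copies: shift {shift:+.1f} cm, expected frequency {freq:.4f}")
```

Under these assumptions, homozygous carriers (a shift of 4.4 cm) would be expected in roughly 0.2% of the population.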
Ishigaki K, Akiyama M, Kanai M, Takahashi A, Kawakami E, Sugishita H, Sakaue S, Matoba N, Low S-K, Okada Y, Terao C, Amariuta T, Gazal S, Kochi Y, Horikoshi M, Suzuki K, Ito K, Momozawa Y, Hirata M, Matsuda K, Ikeda M, Iwata N, Ikegawa S, Kou I, Tanaka T, Nakagawa H, Suzuki A, Hirota T, Tamari M, Chayama K, Miki D, Mori M, Nagayama S, Daigo Y, Miki Y, Katagiri T, Ogawa O, Obara W, Ito H, Yoshida T, Imoto I, Takahashi T, Tanikawa C, Suzuki T, Sinozaki N, Minami S, Yamaguchi H, Asai S, Takahashi Y, Yamaji K, Takahashi K, Fujioka T, Takata R, Yanai H, Masumoto A, Koretsune Y, Kutsumi H, Higashiyama M, Murayama S, Minegishi N, Suzuki K, Tanno K, Shimizu A, Yamaji T, Iwasaki M, Sawada N, Uemura H, Tanaka K, Naito M, Sasaki M, Wakai K, Tsugane S, Yamamoto M, Yamamoto K, Murakami Y, Nakamura Y, Raychaudhuri S, Inazawa J, Yamauchi T, Kadowaki T, Kubo M, Kamatani Y. Large scale genome-wide association study in a Japanese population identified 45 novel susceptibility loci for 22 diseases [Internet]. Nature Genetics Forthcoming; Preprint.
The overwhelming majority of participants in current genetic studies are of European ancestry [1–3], limiting our genetic understanding of complex disease in non-European populations. To address this, we aimed to elucidate polygenic disease biology in the East Asian population by conducting a genome-wide association study (GWAS) with 212,453 Japanese individuals across 42 diseases. We detected 383 independent signals in 331 loci for 30 diseases, among which 45 loci were novel (P < 5 × 10^-8). Compared with known variants, novel variants have lower frequency in European populations but comparable frequency in East Asian populations, suggesting the advantage of this study in discovering these novel variants. Three novel signals were in linkage disequilibrium (r^2 > 0.6) with missense variants which are monomorphic in European populations (1000 Genomes Project), including rs11235604 (p.R220W of ATG16L2, an autophagy-related gene) associated with coronary artery disease. We further investigated enrichment of heritability within 2,868 annotations of genome-wide transcription factor occupancy, and identified 378 significant enrichments across nine diseases (FDR < 0.05) (e.g. NF-κB for immune-related diseases). This large-scale GWAS in a Japanese population provides insights into the etiology of common complex diseases and highlights the importance of performing GWAS in non-European populations.
Multiple slowly progressing diseases initially present with inflammatory arthritis, and it can be difficult to clinically differentiate these conditions. Knevel et al. show that genetic data could be used to triage inflammatory arthritis–causing diagnoses at a patient’s first visit, improving the likelihood of a correct initial diagnosis and potentially expediting appropriate treatment. Their genetic diagnostic tool, here optimized for rheumatic disease diagnosis, could, in principle, be calibrated for other phenotypically similar diseases with different underlying genetics. It is challenging to quickly diagnose slowly progressing diseases. To prioritize multiple related diagnoses, we developed G-PROB (Genetic Probability tool) to calculate the probability of different diseases for a patient using genetic risk scores. We tested G-PROB for inflammatory arthritis–causing diseases (rheumatoid arthritis, systemic lupus erythematosus, spondyloarthropathy, psoriatic arthritis, and gout). After validating on simulated data, we tested G-PROB in three cohorts: 1211 patients identified by International Classification of Diseases (ICD) codes within the eMERGE database, 245 patients identified through ICD codes and medical record review within the Partners Biobank, and 243 patients first presenting with unexplained inflammatory arthritis and with final diagnoses by record review within the Partners Biobank. Calibration of G-probabilities with disease status was high, with regression coefficients from 0.90 to 1.08 (1.00 is ideal). G-probabilities discriminated true diagnoses across the three cohorts with pooled areas under the curve (95% CI) of 0.69 (0.67 to 0.71), 0.81 (0.76 to 0.84), and 0.84 (0.81 to 0.86), respectively. For all patients, at least one disease could be ruled out, and in 45% of patients, a likely diagnosis was identified with a 64% positive predictive value. In 35% of cases, the clinician’s initial diagnosis was incorrect.
Initial clinical diagnosis explained 39% of the variance in final disease, which improved to 51% (P < 0.0001) after adding G-probabilities. Converting genotype information before a clinical visit into an interpretable probability value for five different inflammatory arthritides could potentially be used to improve the diagnostic efficiency of rheumatic diseases in clinical practice.
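The idea of turning several genetic risk scores into per-disease probabilities can be sketched as follows. This is a minimal illustration, not the published G-PROB implementation: it assumes each patient has exactly one of the candidate diseases, that each risk score is a log-odds value relative to the population, and that prior probabilities are supplied by the user. The function name and all numeric values are hypothetical.

```python
import math


def disease_probabilities(log_odds_scores, priors):
    """Convert per-disease genetic log-odds scores into probabilities.

    Assumption (ours, for illustration): the patient has exactly one of the
    listed diseases, so the weighted priors are normalised to sum to 1.
    """
    weights = {
        disease: priors[disease] * math.exp(log_odds_scores[disease])
        for disease in priors
    }
    total = sum(weights.values())
    return {disease: w / total for disease, w in weights.items()}


# Hypothetical patient: priors and scores are made-up illustration values.
priors = {"RA": 0.40, "SLE": 0.10, "SpA": 0.20, "PsA": 0.15, "gout": 0.15}
scores = {"RA": 1.2, "SLE": -0.5, "SpA": 0.1, "PsA": -0.3, "gout": -1.0}

probs = disease_probabilities(scores, priors)
for disease, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{disease}: {p:.3f}")
```

With all scores set to zero the output reduces to the priors, which is a convenient sanity check for this kind of normalisation.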
Rheumatoid arthritis (RA) is the most common immune-mediated arthritis. Anti-citrullinated peptide antibodies (ACPA) are highly specific to RA and assayed with the commercial CCP2 assay. Genetic drivers of RA within the MHC are different for CCP2-positive and -negative subsets of RA, particularly at HLA-DRB1. However, aspartic acid at amino acid position 9 in HLA-B (B) increases risk to both RA subsets. Here we explore how individual serologies associated with RA drive associations within the MHC. To define MHC differences for specific ACPA serologies, we quantified a total of 19 separate ACPAs in RA-affected case subjects from four cohorts (n = 6,805). We found a cluster of tightly co-occurring antibodies (canonical serologies, containing CCP2), along with several independently expressed antibodies (non-canonical serologies). After imputing HLA variants into 6,805 case subjects and 13,467 control subjects, we tested associations between the HLA region and RA subgroups based on the presence of canonical and/or non-canonical serologies. We examined CCP2(+) and CCP2(-) RA-affected case subjects separately. In CCP2(-) RA, we observed that the association between CCP2(-) RA and B was derived from individuals who were positive for non-canonical serologies (omnibus_p = 9.2 × 10). Similarly, we observed in CCP2(+) RA that associations between subsets of CCP2(+) RA and B were negatively correlated with the number of positive canonical serologies (p = 0.0096). These findings suggest unique genetic characteristics underlying fine-specific ACPAs, suggesting that RA may be further subdivided beyond simply seropositive and seronegative.
The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate the superior performance of Harmony to previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10^6 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.
Of the 1.8 billion people worldwide infected with Mycobacterium tuberculosis, 5–15% will develop active tuberculosis (TB). Approximately half will progress to active TB within the first 18 months after infection, presumably because they fail to mount an effective initial immune response. Here, in a genome-wide genetic study of early TB progression, we genotype 4002 active TB cases and their household contacts in Peru. We quantify genetic heritability (h_g^2) of early TB progression to be 21.2% (standard error 0.08). This suggests TB progression has a strong genetic basis, and is comparable to traits with well-established genetic bases. We identify a novel association between early TB progression and variants located in a putative enhancer region on chromosome 3q23 (rs73226617, OR = 1.18; P = 3.93 × 10^-8). With in silico and in vitro analyses we identify rs73226617 or rs148722713 as the likely functional variant and ATP1B3 as a potential causal target gene with monocyte-specific function.
To define the cell populations that drive joint inflammation in rheumatoid arthritis (RA), we applied single-cell RNA sequencing (scRNA-seq), mass cytometry, bulk RNA sequencing (RNA-seq) and flow cytometry to T cells, B cells, monocytes, and fibroblasts from 51 samples of synovial tissue from patients with RA or osteoarthritis (OA). Utilizing an integrated strategy based on canonical correlation analysis of 5,265 scRNA-seq profiles, we identified 18 unique cell populations. Combining mass cytometry and transcriptomics revealed cell states expanded in RA synovia: THY1(CD90)+HLA-DRAhi sublining fibroblasts, IL1B+ pro-inflammatory monocytes, ITGAX+TBX21+ autoimmune-associated B cells, and PDCD1+ peripheral helper T (TPH) cells and follicular helper T (TFH) cells. We defined distinct subsets of CD8+ T cells characterized by GZMK+, GZMB+, and GNLY+ phenotypes. We mapped inflammatory mediators to their source cell populations; for example, we attributed IL6 expression to THY1+HLA-DRAhi fibroblasts and IL1B production to pro-inflammatory monocytes. These populations are potentially key mediators of RA pathogenesis.
Despite significant progress in annotating the genome with experimental methods, much of the regulatory noncoding genome remains poorly defined. Here we assert that regulatory elements may be characterized by leveraging local epigenomic signatures at sites where specific transcription factors (TFs) are bound. To link these two identifying features, we introduce IMPACT, a genome annotation strategy which identifies regulatory elements defined by cell-state-specific TF binding profiles, learned from 515 chromatin and sequence annotations. We validate IMPACT using multiple compelling applications. First, IMPACT predicts TF motif binding with high accuracy (average AUC 0.92, s.e. 0.03; across 8 TFs), a significant improvement (all p<6.9e-15) over intersecting motifs with open chromatin (average AUC 0.66, s.e. 0.11). Second, an IMPACT annotation trained on RNA polymerase II is more enriched for peripheral blood cis-eQTL variation (N=3,754) than sequence based annotations, such as promoters and regions around the TSS, (permutation p<1e-3, 25% average increase in enrichment). Third, integration with rheumatoid arthritis (RA) summary statistics from European (N=38,242) and East Asian (N=22,515) populations revealed that the top 5% of CD4+ Treg IMPACT regulatory elements capture 85.7% (s.e. 19.4%) of RA h2 (p<1.6e-5) and that the top 9.8% of Treg IMPACT regulatory elements, consisting of all SNPs with a non-zero annotation value, capture 97.3% (s.e. 18.2%) of RA h2 (p<7.6e-7), the most comprehensive explanation for RA h2 to date. In comparison, the average RA h2 captured by compared CD4+ T histone marks is 42.3% and by CD4+ T specifically expressed gene sets is 36.4%. Finally, integration with RA fine-mapping data (N=27,345) revealed a significant enrichment (2.87, p<8.6e-3) of putatively causal variants across 20 RA associated loci in the top 1% of CD4+ Treg IMPACT regulatory regions. 
Overall, we find that IMPACT generalizes well to other cell types in identifying complex-trait-associated regulatory elements.
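The heritability figures quoted for IMPACT can be read as enrichment ratios. The sketch below assumes the common stratified LD score regression convention, where enrichment is the proportion of heritability captured divided by the proportion of SNPs in the annotation; the abstract reports the proportions but does not itself state these ratios, so the function and the derived values are illustrative.

```python
def h2_enrichment(prop_h2, prop_snps):
    """Heritability enrichment: share of h2 captured / share of SNPs annotated.

    Assumes the usual stratified LD score regression definition; this is an
    illustration, not a reproduction of the authors' analysis.
    """
    return prop_h2 / prop_snps


# Proportions taken from the abstract (CD4+ Treg IMPACT annotation, RA).
top_5_percent = h2_enrichment(0.857, 0.05)     # top 5% of regulatory elements
non_zero_snps = h2_enrichment(0.973, 0.098)    # top 9.8%, non-zero annotation

print(f"Top 5% of SNPs:   {top_5_percent:.1f}x enrichment")
print(f"Top 9.8% of SNPs: {non_zero_snps:.1f}x enrichment")
```

A ratio of 1 would mean the annotation captures heritability in proportion to its size, so larger values indicate that RA heritability is concentrated in a small fraction of the genome.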
OBJECTIVES: We sought to investigate whether genetic effects on response to TNF inhibitors (TNFi) in rheumatoid arthritis (RA) could be localised by considering known genetic susceptibility loci for relevant traits and to evaluate the usefulness of these genetic loci for stratifying drug response. METHODS: We studied the relation of TNFi response, quantified by change in swollen joint counts (ΔSJC) and erythrocyte sedimentation rate (ΔESR), with locus-specific scores constructed from genome-wide association study summary statistics in 2938 genotyped individuals: 37 scores for RA; scores for 19 immune cell traits; scores for expression or methylation of 93 genes with previously reported associations between transcript level and drug response. Multivariate associations were evaluated in penalised regression models by cross-validation. RESULTS: We detected a statistically significant association between ΔSJC and the RA score at the locus (p=0.0004) and an inverse association between ΔSJC and the score for expression of CD39 on CD4 T cells (p=0.00005). A previously reported association between CD39 expression on regulatory T cells and response to methotrexate was in the opposite direction. In stratified analysis by concomitant methotrexate treatment, the inverse association was stronger in the combination therapy group and dissipated in the TNFi monotherapy group. Overall, ability to predict TNFi response from genotypic scores was limited, with models explaining less than 1% of phenotypic variance. CONCLUSIONS: The association with the CD39 trait is difficult to interpret because patients with RA are often prescribed TNFi after failing to respond to methotrexate. The CD39 and pathways could be relevant for targeting drug therapy.
The advancement of precision medicine requires new methods to coordinate and deliver genetic data from heterogeneous sources to physicians and patients. The eMERGE III Network enrolled >25,000 participants from biobank and prospective cohorts of predominantly healthy individuals for clinical genetic testing to determine clinically actionable findings. The network developed protocols linking together the 11 participant collection sites and 2 clinical genetic testing laboratories. DNA capture panels targeting 109 genes were used for testing of DNA, and sample collection, data generation, interpretation, reporting, delivery, and storage were each harmonized. A compliant and secure network enabled ongoing review and reconciliation of clinical interpretations, while maintaining communication and data sharing between clinicians and investigators. A total of 202 individuals had positive diagnostic findings relevant to the indication for testing and 1,294 had additional/secondary findings of medical significance deemed to be returnable, establishing data return rates for other testing endeavors. This study accomplished integration of structured genomic results into multiple electronic health record (EHR) systems, setting the stage for clinical decision support to enable genomic medicine. Further, the established processes enable different sequencing sites to harmonize technical and interpretive aspects of sequencing tests, a critical achievement toward global standardization of genomic testing. The eMERGE protocols and tools are available for widespread dissemination.
Rheumatoid arthritis (RA) risk has a large genetic component (~60%) that is still not fully understood. This has hampered the design of effective treatments that could promise lifelong remission. RA is a polygenic disease with 106 known genome-wide significant associated loci and thousands of small effect causal variants. Our current understanding of RA risk has suggested cell-type-specific contexts for causal variants, implicating CD4+ effector memory T cells, as well as monocytes, B cells and stromal fibroblasts. While these cellular states and categories are still mechanistically broad, future studies may identify causal cell subpopulations. These efforts are propelled by advances in single cell profiling. Identification of causal cell subpopulations may accelerate therapeutic intervention to achieve lifelong remission.
Many immune diseases occur at different rates among people with schizophrenia compared to the general population. Here, we evaluated whether this phenomenon might be explained by shared genetic risk factors. We used data from large genome-wide association studies to compare the genetic architecture of schizophrenia to 19 immune diseases. First, we evaluated the association with schizophrenia of 581 variants previously reported to be associated with immune diseases at genome-wide significance. We identified five variants with potentially pleiotropic effects. While colocalization analyses were inconclusive, functional characterization of these variants provided the strongest evidence for a model in which genetic variation at rs1734907 modulates risk of schizophrenia and Crohn's disease via altered methylation and expression of EPHB4, a gene whose protein product guides the migration of neuronal axons in the brain and the migration of lymphocytes towards infected cells in the immune system. Next, we investigated genome-wide sharing of common variants between schizophrenia and immune diseases using cross-trait LD score regression. Of the 11 immune diseases with available genome-wide summary statistics, we observed genetic correlation between six immune diseases and schizophrenia: inflammatory bowel disease (rg = 0.12 ± 0.03, P = 2.49 × 10^-4), Crohn's disease (rg = 0.097 ± 0.06, P = 3.27 × 10^-3), ulcerative colitis (rg = 0.11 ± 0.04, P = 4.05 × 10^-3), primary biliary cirrhosis (rg = 0.13 ± 0.05, P = 3.98 × 10^-3), psoriasis (rg = 0.18 ± 0.07, P = 7.78 × 10^-3) and systemic lupus erythematosus (rg = 0.13 ± 0.05, P = 3.76 × 10^-3). With the exception of ulcerative colitis, the degree and direction of these genetic correlations were consistent with the expected phenotypic correlation based on epidemiological data. Our findings suggest shared genetic risk factors contribute to the epidemiological association of certain immune diseases and schizophrenia.
Single-cell methods have revolutionized the study of T cell biology by enabling the identification and characterization of individual cells. This has led to a deeper understanding of T cell heterogeneity by generating functionally relevant measurements, such as gene expression, surface markers, chromatin accessibility, and T cell receptor sequences, in individual cells. While these methods are independently valuable, they can be augmented when applied jointly, either on separate cells from the same sample or on the same cells. Multimodal approaches are already being deployed to characterize T cells in diverse disease contexts and demonstrate the value of having multiple insights into a cell's function. However, these datasets pose new statistical challenges for integration and joint analysis.
Arazi A, Rao DA, Berthier CC, Davidson A, Liu Y, Hoover PJ, Chicoine A, Eisenhaure TM, Jonsson AH, Li S, Lieb DJ, Zhang F, Slowikowski K, Browne EP, Norma A, Sutherby D, Steelman S, Smilek DE, Tosta P, Apruzzese W, Massarotti E, Dall'Era M, Park M, Kamen DL, Furie RA, Payan-Schober F, Pendergraft WF, McInnes EA, Buyon JP, Petri MA, Putterman C, Kalunian KC, Woodle ES, Lederer JA, Hildeman DA, Nusbaum C, Raychaudhuri S, Kretzler M, Anolik JH, Brenner MB, Wofsy D, Hacohen N, Diamond B, in network AMPSLE. The immune cell landscape in kidneys of patients with lupus nephritis [Internet]. Nature Immunology 2019;20(7):902–914. Publisher's Version.
Lupus nephritis is a potentially fatal autoimmune disease for which the current treatment is ineffective and often toxic. To develop mechanistic hypotheses of disease, we analyzed kidney samples from patients with lupus nephritis and from healthy control subjects using single-cell RNA sequencing. Our analysis revealed 21 subsets of leukocytes active in disease, including multiple populations of myeloid cells, T cells, natural killer cells and B cells that demonstrated both pro-inflammatory responses and inflammation-resolving responses. We found evidence of local activation of B cells correlated with an age-associated B-cell signature and evidence of progressive stages of monocyte differentiation within the kidney. A clear interferon response was observed in most cells. Two chemokine receptors, CXCR4 and CX3CR1, were broadly expressed, implying a potentially central role in cell trafficking. Gene expression of immune cells in urine and kidney was highly correlated, which would suggest that urine might serve as a surrogate for kidney biopsies.
The molecular and cellular processes that lead to renal damage and to the heterogeneity of lupus nephritis (LN) are not well understood. We applied single-cell RNA sequencing (scRNA-seq) to renal biopsies from patients with LN and evaluated skin biopsies as a potential source of diagnostic and prognostic markers of renal disease. Type I interferon (IFN)-response signatures in tubular cells and keratinocytes distinguished patients with LN from healthy control subjects. Moreover, a high IFN-response signature and fibrotic signature in tubular cells were each associated with failure to respond to treatment. Analysis of tubular cells from patients with proliferative, membranous and mixed LN indicated pathways relevant to inflammation and fibrosis, which offer insight into their histologic differences. In summary, we applied scRNA-seq to LN to deconstruct its heterogeneity and identify novel targets for personalized approaches to therapy.