Recent advances in single-cell technologies and integration algorithms make it possible to construct large, comprehensive reference atlases from multiple datasets encompassing many donors, studies, disease states, and sequencing platforms. Much like mapping sequencing reads to a reference genome, it is essential to be able to map new query cells onto complex, multimillion-cell reference atlases to rapidly identify relevant cell states and phenotypes. We present Symphony, a novel algorithm for building compressed, integrated reference atlases of ≥10^6 cells and enabling efficient query mapping within seconds. Based on a linear mixture model framework, Symphony precisely localizes query cells within a low-dimensional reference embedding without the need to reintegrate the reference cells, facilitating the downstream transfer of many types of reference-defined annotations to the query cells. We demonstrate the power of Symphony by (1) mapping a query containing multiple levels of experimental design to predict pancreatic cell types in human and mouse, (2) localizing query cells along a smooth developmental trajectory of human fetal liver hematopoiesis, and (3) harnessing a multimodal CITE-seq reference atlas to infer query surface protein expression in memory T cells. Symphony will enable the sharing of comprehensive integrated reference atlases in a convenient, portable format that powers fast, reproducible querying and downstream analyses.
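To make the annotation-transfer step concrete, here is a minimal sketch of how reference-defined labels could be passed to query cells once those cells already sit in the reference embedding. The abstract does not say how Symphony transfers annotations, so the k-nearest-neighbor majority vote below is an assumption chosen for illustration; the mapping step itself is not shown. The function name `knn_transfer`, the toy coordinates, and the labels are all hypothetical.

```python
import math
from collections import Counter


def knn_transfer(reference, labels, query, k=3):
    """Give each query point the majority label of its k nearest reference points.

    reference: list of coordinate tuples in the shared low-dimensional embedding
    labels:    one annotation per reference point (e.g. a cell type)
    query:     list of coordinate tuples already mapped into the same embedding
    """
    predictions = []
    for q in query:
        # Rank reference cells by Euclidean distance to this query cell.
        nearest = sorted(range(len(reference)),
                         key=lambda i: math.dist(reference[i], q))[:k]
        # Majority vote among the k nearest reference annotations.
        votes = Counter(labels[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy, made-up embedding with two well-separated groups (hypothetical data).
reference = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
             (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]
query = [(0.1, 0.0), (5.1, 5.0)]
predicted = knn_transfer(reference, labels, query)
```

The same pattern would apply to other annotation types, for example averaging a continuous value such as a trajectory position over the neighbors instead of taking a vote.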
Polymorphisms in the human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus strongly influence autoimmune disease risk1–5. Two non-exclusive hypotheses exist about the pathogenic role of HLA alleles: (i) the central hypothesis, where HLA risk alleles influence thymic selection so that the probability of T cell receptors (TCRs) reactive to pathogenic antigens is increased6–8; and (ii) the peripheral hypothesis, where HLA risk alleles increase the affinity for pathogenic antigens9–11. The peripheral hypothesis has been the main research focus in autoimmunity, while human data on the central hypothesis are lacking. Here, we investigated the influence of HLA alleles on TCR composition at the highly diverse complementarity determining region 3 (CDR3), where the TCR recognizes antigens. We demonstrated unexpectedly powerful HLA-CDR3 associations. The strongest association was found at HLA-DRB1 amino acid position 13 (n = 628 subjects, explained variance = 9.4%; P = 4.1 × 10^−138). This HLA position mediates genetic risk for multiple autoimmune diseases. In structural analysis of TCR-peptide-MHC complexes, we observed that HLA-DRB1 position 13 does not interact directly with CDR3, but is proximate to antigenic peptide residues that are also close to CDR3. We identified multiple CDR3 amino acid features enriched by HLA risk alleles; for example, the risk alleles of rheumatoid arthritis, type 1 diabetes, and celiac disease all increase the hydrophobicity of CDR3 position 109 (P < 2.1 × 10^−5). In the setting of celiac disease, the CDR3 features favored by HLA risk alleles are more enriched among candidate pathogenic TCRs than control TCRs (P = 2.4 × 10^−6 for gliadin-specific TCRs). Together, these results provide novel genetic evidence supporting the central hypothesis.
As advances in single-cell technologies enable the unbiased assay of thousands of cells simultaneously, human disease studies are able to identify clinically associated cell states using case-control study designs. These studies require precious clinical samples and costly technologies; therefore, it is critical to employ study design principles that maximize power to detect cell state frequency shifts between conditions, such as disease versus healthy. Here, we present single-cell Power Simulation Tool (scPOST), a method that enables users to estimate power under different study designs. To approximate the specific experimental and clinical scenarios being investigated, scPOST takes prototype (public or pilot) single-cell data as input and generates large numbers of single-cell datasets in silico. We use scPOST to perform power analyses on three independent single-cell datasets that span diverse experimental conditions: a batch-corrected 21-sample rheumatoid arthritis dataset (5,265 cells) from synovial tissue, a 259-sample tuberculosis progression dataset (496,517 memory T cells) from peripheral blood mononuclear cells (PBMCs), and a 30-sample ulcerative colitis dataset (235,229 cells) from intestinal biopsies. Over thousands of simulations, we consistently observe that power to detect frequency shifts in cell states is maximized by larger numbers of independent clinical samples, reduced batch effects, and smaller variation in a cell state’s frequency across samples.
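The relationship between power, sample number, and between-sample variation can be illustrated with a small Monte Carlo sketch. This is not scPOST: it skips the prototype-data input, batch effects, and cell-level sampling, and simply draws one cell-state frequency per sample on the logit scale, then tests cases against controls with a two-sample z approximation (which is somewhat anti-conservative for small groups). The function `simulate_power` and all parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_power(n_per_group, base_freq, fold_change, sample_sd,
                   n_sims=500, alpha=0.05, seed=0):
    """Estimate power to detect a cell-state frequency shift by simulation.

    n_per_group: independent clinical samples per condition (at least 2)
    base_freq:   cell-state frequency in controls (between 0 and 1)
    fold_change: frequency in cases = base_freq * fold_change (must stay below 1)
    sample_sd:   between-sample standard deviation on the logit scale
    """
    rng = random.Random(seed)
    z_crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)

    def draw(freq):
        # One logit-scale frequency per sample, centred on the group value.
        center = math.log(freq / (1 - freq))
        return [center + rng.gauss(0, sample_sd) for _ in range(n_per_group)]

    hits = 0
    for _ in range(n_sims):
        controls = draw(base_freq)
        cases = draw(base_freq * fold_change)
        # Unequal-variance standard error of the difference in group means.
        se = math.sqrt(statistics.variance(controls) / n_per_group
                       + statistics.variance(cases) / n_per_group)
        z = (statistics.mean(cases) - statistics.mean(controls)) / se
        hits += abs(z) > z_crit
    # Fraction of simulated studies in which the shift was detected.
    return hits / n_sims


# Illustrative settings only: a 10% cell state that is 1.5-fold expanded in cases.
power_10_samples = simulate_power(10, 0.10, 1.5, sample_sd=0.5)
power_40_samples = simulate_power(40, 0.10, 1.5, sample_sd=0.5)
power_10_noisy = simulate_power(10, 0.10, 1.5, sample_sd=1.0)
```

Under these assumptions the estimate rises with more samples per group and falls when the frequency varies more across samples, which is the qualitative pattern the abstract reports.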
Poor trans-ancestry portability of polygenic risk scores is a consequence of Eurocentric genetic studies and limited knowledge of shared causal variants. Leveraging regulatory annotations may improve portability by prioritizing functional over tagging variants. We constructed a resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 epigenetic datasets to predict binding patterns of 142 transcription factors across 245 cell types. We then partitioned the common SNP heritability of 111 genome-wide association study summary statistics of European (average n ≈ 189,000) and East Asian (average n ≈ 157,000) origin. IMPACT annotations captured consistent SNP heritability between populations, suggesting prioritization of shared functional variants. Variant prioritization using IMPACT resulted in increased trans-ancestry portability of polygenic risk scores from Europeans to East Asians across all 21 phenotypes analyzed (49.9% mean relative increase in R^2). Our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ancestry portability of genetic data.
Rheumatoid arthritis (RA) risk has a large genetic component (~60%) that is still not fully understood. This has hampered the design of effective treatments that could promise lifelong remission. RA is a polygenic disease with 106 known genome-wide significant associated loci and thousands of small-effect causal variants. Our current understanding of RA risk has suggested cell-type-specific contexts for causal variants, implicating CD4+ effector memory T cells, as well as monocytes, B cells and stromal fibroblasts. While these cellular states and categories are still mechanistically broad, future studies may identify causal cell subpopulations. These efforts are propelled by advances in single-cell profiling. Identification of causal cell subpopulations may accelerate therapeutic intervention to achieve lifelong remission.
Fibroblast-like synoviocytes (FLS) are joint-lining cells that promote rheumatoid arthritis (RA) pathology. Current disease-modifying antirheumatic agents (DMARDs) operate through systemic immunosuppression. FLS-targeted approaches could potentially be combined with DMARDs to improve control of RA without increasing immunosuppression. Here, we assessed the potential of immunoglobulin-like domains 1 and 2 (Ig1&2), a decoy protein that activates the receptor tyrosine phosphatase sigma (PTPRS) on FLS, for RA therapy. We report that PTPRS expression is enriched in synovial lining RA FLS and that Ig1&2 reduces migration of RA but not osteoarthritis FLS. Administration of an Fc-fusion Ig1&2 attenuated arthritis in mice without affecting innate or adaptive immunity. Furthermore, PTPRS was down-regulated in FLS by tumor necrosis factor (TNF) via a phosphatidylinositol 3-kinase–mediated pathway, and TNF inhibition enhanced PTPRS expression in arthritic joints. Combination of ineffective doses of TNF inhibitor and Fc-Ig1&2 reversed arthritis in mice, providing an example of synergy between FLS-targeted and immunosuppressive DMARD therapies.
Hypersensitivity reactions to drugs are often unpredictable and can be life threatening, underscoring a need for understanding their underlying mechanisms and risk factors. The extent to which germline genetic variation influences the risk of commonly reported drug allergies such as penicillin allergy remains largely unknown. We extracted data from the electronic health records of more than 600,000 participants from the UK, Estonian, and Vanderbilt University Medical Center's BioVU biobanks to study the role of genetic variation in the occurrence of self-reported penicillin hypersensitivity reactions. We used imputed SNP-to-HLA typing data from these cohorts to further fine map the human leukocyte antigen (HLA) association and replicated our results in 23andMe's research cohort involving a total of 1.12 million individuals. Genome-wide meta-analysis of penicillin allergy revealed two loci, including one located in the HLA region on chromosome 6. This signal was further fine-mapped to the HLA-B*55:01 allele (OR 1.41, 95% CI 1.33-1.49, p value 2.04 × 10^−31) and confirmed by independent replication in 23andMe's research cohort (OR 1.30, 95% CI 1.25-1.34, p value 1.00 × 10^−47). The lead SNP was also associated with lower lymphocyte counts, and in silico follow-up suggests a potential effect on T-lymphocytes at HLA-B*55:01. We also observed a significant hit in PTPN22, and the GWAS results correlated with the genetics of rheumatoid arthritis and psoriasis. We present robust evidence for the role of an allele of the major histocompatibility complex (MHC) I gene HLA-B in the occurrence of penicillin allergy.
The emergence of COVID-19 in early 2020 led to unprecedented changes to rheumatology clinical practice worldwide, including the closure of research laboratories, the restructuring of hospitals and the rapid transition to virtual care. As governments sought to slow and contain the spread of the disease, rheumatologists were presented with the difficult task of managing risks, to their patients as well as to themselves, while learning and implementing new systems for remote health care. Consequently, the COVID-19 pandemic led to a transformation in health infrastructures and telemedicine that could become powerful tools for rheumatologists, despite having some limitations. In this Viewpoint, five experts from different regions discuss their experiences of the pandemic, including the most challenging aspects of this unexpected transition, the advantages and limitations of virtual visits, and potential opportunities going forward.
Fine-mapping human leukocyte antigen (HLA) genes involved in disease susceptibility to individual alleles or amino acid residues has been challenging. Using information regarding HLA alleles obtained from HLA typing, HLA imputation or HLA inference, our software expands the alleles to amino acid sequences using the most recent IMGT/HLA database and prepares a dataset suitable for fine-mapping analysis. Our software also provides useful functionalities, such as various association tests, visualization tools and nomenclature conversion. Availability: https://github.com/WansonChoi/HATK
Background: Rheumatoid arthritis, like many inflammatory diseases, is characterized by episodes of quiescence and exacerbation (flares). The molecular events leading to flares are unknown.
Methods: We established a clinical and technical protocol for repeated home collection of blood in patients with rheumatoid arthritis to allow for longitudinal RNA sequencing (RNA-seq). Specimens were obtained from 364 time points during eight flares over a period of 4 years in our index patient, as well as from 235 time points during flares in three additional patients. We identified transcripts that were differentially expressed before flares and compared these with data from synovial single-cell RNA-seq. Flow cytometry and sorted-blood-cell RNA-seq in additional patients were used to validate the findings.
Results: Consistent changes were observed in blood transcriptional profiles 1 to 2 weeks before a rheumatoid arthritis flare. B-cell activation was followed by expansion of circulating CD45-CD31-PDPN+ preinflammatory mesenchymal, or PRIME, cells in the blood from patients with rheumatoid arthritis; these cells shared features of inflammatory synovial fibroblasts. Levels of circulating PRIME cells decreased during flares in all 4 patients, and flow cytometry and sorted-cell RNA-seq confirmed the presence of PRIME cells in 19 additional patients with rheumatoid arthritis.
Conclusions: Longitudinal genomic analysis of rheumatoid arthritis flares revealed PRIME cells in the blood during the period before a flare and suggested a model in which these cells become activated by B cells in the weeks before a flare and subsequently migrate out of the blood into the synovium. (Funded by the National Institutes of Health and others.)
Multiple slowly progressing diseases initially present with inflammatory arthritis, and it can be difficult to clinically differentiate these conditions. Knevel et al. show that genetic data could be used to triage inflammatory arthritis–causing diagnoses at a patient’s first visit, improving the likelihood of a correct initial diagnosis and potentially expediting appropriate treatment. Their genetic diagnostic tool, here optimized for rheumatic disease diagnosis, could, in principle, be calibrated for other phenotypically similar diseases with different underlying genetics.
It is challenging to quickly diagnose slowly progressing diseases. To prioritize multiple related diagnoses, we developed G-PROB (Genetic Probability tool) to calculate the probability of different diseases for a patient using genetic risk scores. We tested G-PROB for inflammatory arthritis–causing diseases (rheumatoid arthritis, systemic lupus erythematosus, spondyloarthropathy, psoriatic arthritis, and gout). After validating on simulated data, we tested G-PROB in three cohorts: 1211 patients identified by International Classification of Diseases (ICD) codes within the eMERGE database, 245 patients identified through ICD codes and medical record review within the Partners Biobank, and 243 patients first presenting with unexplained inflammatory arthritis and with final diagnoses by record review within the Partners Biobank. Calibration of G-probabilities with disease status was high, with regression coefficients from 0.90 to 1.08 (1.00 is ideal). G-probabilities discriminated true diagnoses across the three cohorts with pooled areas under the curve (95% CI) of 0.69 (0.67 to 0.71), 0.81 (0.76 to 0.84), and 0.84 (0.81 to 0.86), respectively. For all patients, at least one disease could be ruled out, and in 45% of patients, a likely diagnosis was identified with a 64% positive predictive value. In 35% of cases, the clinician’s initial diagnosis was incorrect. Initial clinical diagnosis explained 39% of the variance in final disease, which improved to 51% (P < 0.0001) after adding G-probabilities. Converting genotype information before a clinical visit into an interpretable probability value for five different inflammatory arthritides could potentially be used to improve the diagnostic efficiency of rheumatic diseases in clinical practice.
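The idea of turning several genetic risk scores into one probability per disease can be sketched as a simple normalization. The abstract does not give G-PROB's formula, so the version below is an assumption: each disease gets a prior weight multiplied by the exponential of a log-odds-scale genetic score, and the weights are rescaled to sum to 1 on the premise that the patient has exactly one of the listed diseases. The function name `g_probabilities`, the priors, and the scores are hypothetical, made-up values.

```python
import math


def g_probabilities(log_odds, priors):
    """Convert per-disease genetic log-odds into probabilities that sum to 1.

    log_odds: dict disease -> hypothetical genetic risk score on the log-odds
              scale, relative to the population average
    priors:   dict disease -> assumed pre-test probability among patients who
              have one of the listed diseases
    """
    # Unnormalized weight: prior shifted by the genetic evidence.
    weights = {d: priors[d] * math.exp(log_odds[d]) for d in priors}
    total = sum(weights.values())
    # Rescale, assuming exactly one of the listed diseases is present.
    return {d: w / total for d, w in weights.items()}


# Illustrative inputs only; not taken from the study.
priors = {"RA": 0.40, "SLE": 0.10, "SpA": 0.15, "PsA": 0.15, "gout": 0.20}
scores = {"RA": 1.2, "SLE": -0.5, "SpA": 0.0, "PsA": -0.3, "gout": -1.0}
probs = g_probabilities(scores, priors)
most_likely = max(probs, key=probs.get)
```

With all scores set to zero the output equals the priors, which is a quick sanity check that the genetic term only shifts probabilities when it carries information.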
Peruvians are among the shortest people in the world. To understand the genetic basis of short stature in Peru, we examined an ethnically diverse group of Peruvians and identified a novel, population-specific, missense variant in FBN1 (E1297G) that is significantly associated with lower height in the Peruvian population. Each copy of the minor allele (frequency = 4.7%) reduces height by 2.2 cm (4.4 cm in homozygous individuals). This is the largest effect size known for a common height-associated variant. This variant shows strong evidence of positive selection within the Peruvian population and is significantly more frequent in Native American populations from coastal regions of Peru compared to populations from the Andes or the Amazon, suggesting that short stature in Peruvians is the result of adaptation to the coastal environment. One Sentence Summary: A mutation found in Peruvians has the largest known effect on height for a common variant. This variant is specific to Native American ancestry.
Ishigaki K, Akiyama M, Kanai M, Takahashi A, Kawakami E, Sugishita H, Sakaue S, Matoba N, Low S-K, Okada Y, Terao C, Amariuta T, Gazal S, Kochi Y, Horikoshi M, Suzuki K, Ito K, Momozawa Y, Hirata M, Matsuda K, Ikeda M, Iwata N, Ikegawa S, Kou I, Tanaka T, Nakagawa H, Suzuki A, Hirota T, Tamari M, Chayama K, Miki D, Mori M, Nagayama S, Daigo Y, Miki Y, Katagiri T, Ogawa O, Obara W, Ito H, Yoshida T, Imoto I, Takahashi T, Tanikawa C, Suzuki T, Sinozaki N, Minami S, Yamaguchi H, Asai S, Takahashi Y, Yamaji K, Takahashi K, Fujioka T, Takata R, Yanai H, Masumoto A, Koretsune Y, Kutsumi H, Higashiyama M, Murayama S, Minegishi N, Suzuki K, Tanno K, Shimizu A, Yamaji T, Iwasaki M, Sawada N, Uemura H, Tanaka K, Naito M, Sasaki M, Wakai K, Tsugane S, Yamamoto M, Yamamoto K, Murakami Y, Nakamura Y, Raychaudhuri S*, Inazawa J*, Yamauchi T*, Kadowaki T*, Kubo M*, Kamatani Y*. Large-scale genome-wide association study in a Japanese population identifies novel susceptibility loci across different diseases. Nature Genetics 2020;52(7):669-679.
The overwhelming majority of participants in current genetic studies are of European ancestry1–3, limiting our genetic understanding of complex disease in non-European populations. To address this, we aimed to elucidate polygenic disease biology in the East Asian population by conducting a genome-wide association study (GWAS) with 212,453 Japanese individuals across 42 diseases. We detected 383 independent signals in 331 loci for 30 diseases, among which 45 loci were novel (P < 5 × 10^−8). Compared with known variants, novel variants have lower frequency in European populations but comparable frequency in East Asian populations, suggesting the advantage of this study in discovering these novel variants. Three novel signals were in linkage disequilibrium (r^2 > 0.6) with missense variants that are monomorphic in European populations (1000 Genomes Project), including rs11235604 (p.R220W of ATG16L2, an autophagy-related gene) associated with coronary artery disease. We further investigated enrichment of heritability within 2,868 annotations of genome-wide transcription factor occupancy, and identified 378 significant enrichments across nine diseases (FDR < 0.05) (e.g. NF-κB for immune-related diseases). This large-scale GWAS in a Japanese population provides insights into the etiology of common complex diseases and highlights the importance of performing GWAS in non-European populations.
Genetic studies have revealed that autoimmune susceptibility variants are over-represented in memory CD4+ T cell regulatory elements1-3. Understanding how genetic variation affects gene expression in different T cell physiological states is essential for deciphering genetic mechanisms of autoimmunity4,5. Here, we characterized the dynamics of genetic regulatory effects at eight time points during memory CD4+ T cell activation with high-depth RNA-seq in healthy individuals. We discovered widespread, dynamic allele-specific expression across the genome, where the balance of alleles changes over time. These genes were enriched fourfold within autoimmune loci. We found pervasive dynamic regulatory effects within six HLA genes. HLA-DQB1 alleles had one of three distinct transcriptional regulatory programs. Using CRISPR-Cas9 genomic editing we demonstrated that a promoter variant is causal for T cell-specific control of HLA-DQB1 expression. Our study shows that genetic variation in cis-regulatory elements affects gene expression in a manner dependent on lymphocyte activation status, contributing to the interindividual complexity of immune responses.
The role of stromal fibroblasts in chronic inflammation is unfolding. In rheumatoid arthritis, leukocyte-derived cytokines TNF and IL-17A work together, activating fibroblasts to become a dominant source of the hallmark cytokine IL-6. However, IL-17A alone has minimal effect on fibroblasts. To identify key mediators of the synergistic response to TNF and IL-17A in human synovial fibroblasts, we performed time series, dose-response, and gene-silencing transcriptomics experiments. Here we show that in combination with TNF, IL-17A selectively induces a specific set of genes mediated by factors including cut-like homeobox 1 (CUX1) and IκBζ (NFKBIZ). In the promoters of CXCL1, CXCL2, and CXCL3, we found a putative CUX1-NF-κB binding motif not found elsewhere in the genome. CUX1 and NF-κB p65 mediate transcription of these genes independent of LIFR, STAT3, STAT4, and ELF3. Transcription of NFKBIZ, encoding the atypical IκB factor IκBζ, is IL-17A dose-dependent, and IκBζ only mediates the transcriptional response to TNF and IL-17A, but not to TNF alone. In fibroblasts, IL-17A response depends on CUX1 and IκBζ to engage the NF-κB complex to produce chemoattractants for neutrophil and monocyte recruitment.
Rheumatoid arthritis (RA) is the most common immune-mediated arthritis. Anti-citrullinated peptide antibodies (ACPA) are highly specific to RA and assayed with the commercial CCP2 assay. Genetic drivers of RA within the MHC are different for CCP2-positive and -negative subsets of RA, particularly at HLA-DRB1. However, aspartic acid at amino acid position 9 in HLA-B (B) increases risk to both RA subsets. Here we explore how individual serologies associated with RA drive associations within the MHC. To define MHC differences for specific ACPA serologies, we quantified a total of 19 separate ACPAs in RA-affected case subjects from four cohorts (n = 6,805). We found a cluster of tightly co-occurring antibodies (canonical serologies, containing CCP2), along with several independently expressed antibodies (non-canonical serologies). After imputing HLA variants into 6,805 case subjects and 13,467 control subjects, we tested associations between the HLA region and RA subgroups based on the presence of canonical and/or non-canonical serologies. We examined CCP2(+) and CCP2(-) RA-affected case subjects separately. In CCP2(-) RA, we observed that the association between CCP2(-) RA and B was derived from individuals who were positive for non-canonical serologies (omnibus_p = 9.2 × 10). Similarly, we observed in CCP2(+) RA that associations between subsets of CCP2(+) RA and B were negatively correlated with the number of positive canonical serologies (p = 0.0096). These findings suggest unique genetic characteristics underlying fine-specific ACPAs, suggesting that RA may be further subdivided beyond simply seropositive and seronegative.
The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate the superior performance of Harmony to previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10^6 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.
Of the 1.8 billion people worldwide infected with Mycobacterium tuberculosis, 5–15% will develop active tuberculosis (TB). Approximately half will progress to active TB within the first 18 months after infection, presumably because they fail to mount an effective initial immune response. Here, in a genome-wide genetic study of early TB progression, we genotype 4002 active TB cases and their household contacts in Peru. We quantify genetic heritability (h_g^2) of early TB progression to be 21.2% (standard error 0.08). This suggests TB progression has a strong genetic basis, and is comparable to traits with well-established genetic bases. We identify a novel association between early TB progression and variants located in a putative enhancer region on chromosome 3q23 (rs73226617, OR = 1.18; P = 3.93 × 10^−8). With in silico and in vitro analyses we identify rs73226617 or rs148722713 as the likely functional variant and ATP1B3 as a potential causal target gene with monocyte-specific function.