The rapidly emerging diversity of single-cell RNA-seq datasets allows us to characterize the transcriptional behavior of cell types across a wide variety of biological and clinical conditions. With this comprehensive breadth comes a major analytical challenge. The same cell type across tissues, from different donors, or in different disease states, may appear to express different genes. A joint analysis of multiple datasets requires the integration of cells across diverse conditions. This is particularly challenging when datasets are assayed with different technologies in which real biological differences are interspersed with technical differences. We present Harmony, an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Unlike available single-cell integration methods, Harmony can simultaneously account for multiple experimental and biological factors. We develop objective metrics to evaluate the quality of data integration. In four separate analyses, we demonstrate the superior performance of Harmony to four single-cell-specific integration algorithms. Moreover, we show that Harmony requires dramatically fewer computational resources. It is the only available algorithm that makes the integration of ∼10^6 cells feasible on a personal computer. We demonstrate that Harmony identifies both broad populations and fine-grained subpopulations of PBMCs from datasets with large experimental differences. In a meta-analysis of 14,746 cells from 5 studies of human pancreatic islet cells, Harmony accounts for variation among technologies and donors to successfully align several rare subpopulations. In the resulting integrated embedding, we identify a previously unidentified population of potentially dysfunctional alpha islet cells, enriched for genes active in the Endoplasmic Reticulum (ER) stress response.
The abundance of these alpha cells correlates across donors with the proportion of dysfunctional beta cells also enriched in ER stress response genes. Harmony is a fast and flexible general purpose integration algorithm that enables the identification of shared fine-grained subpopulations across a variety of experimental and biological conditions.
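One way to make "cells group by cell type rather than dataset-specific conditions" concrete is to measure how well datasets are mixed in each cell's local neighborhood. The sketch below computes an inverse Simpson's index over the dataset labels of a cell's k nearest neighbors. It is an illustrative assumption, not necessarily the metric developed in the paper, and the function names and toy coordinates are hypothetical.

```python
import math
from collections import Counter


def inverse_simpson(labels):
    """Effective number of distinct labels: 1 / sum(p_i^2)."""
    counts = Counter(labels)
    total = len(labels)
    return 1.0 / sum((c / total) ** 2 for c in counts.values())


def neighborhood_mixing(embedding, datasets, k):
    """Per-cell inverse Simpson's index of dataset labels among the k nearest neighbors."""
    scores = []
    for i, point in enumerate(embedding):
        # Rank every other cell by Euclidean distance to cell i.
        others = sorted(
            (j for j in range(len(embedding)) if j != i),
            key=lambda j: math.dist(point, embedding[j]),
        )
        scores.append(inverse_simpson([datasets[j] for j in others[:k]]))
    return scores


# Toy 2-D embedding (hypothetical): one mixed cluster and one single-dataset cluster.
embedding = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
datasets = ["A", "B", "A", "B", "B", "B"]
mixing = neighborhood_mixing(embedding, datasets, k=2)
```

A score near the number of datasets suggests a well-mixed neighborhood, while a score of 1 means all neighbors come from a single dataset.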
We sought to investigate whether genetic effects on response to TNF inhibitors (TNFi) in rheumatoid arthritis (RA) could be localised by considering known genetic susceptibility loci for relevant traits and to evaluate the usefulness of these genetic loci for stratifying drug response.
We studied the relation of TNFi response, quantified by change in swollen joint counts (ΔSJC) and erythrocyte sedimentation rate (ΔESR), with locus-specific scores constructed from genome-wide association study summary statistics in 2938 genotyped individuals: 37 scores for RA; scores for 19 immune cell traits; scores for expression or methylation of 93 genes with previously reported associations between transcript level and drug response. Multivariate associations were evaluated in penalised regression models by cross-validation.
We detected a statistically significant association between ΔSJC and the RA score at the CD40 locus (p=0.0004) and an inverse association between ΔSJC and the score for expression of CD39 on CD4 T cells (p=0.00005). A previously reported association between CD39 expression on regulatory T cells and response to methotrexate was in the opposite direction. In stratified analysis by concomitant methotrexate treatment, the inverse association was stronger in the combination therapy group and dissipated in the TNFi monotherapy group. Overall, ability to predict TNFi response from genotypic scores was limited, with models explaining less than 1% of phenotypic variance.
The association with the CD39 trait is difficult to interpret because patients with RA are often prescribed TNFi after failing to respond to methotrexate. The CD39 and CD40 pathways could be relevant for targeting drug therapy.
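The locus-specific scores described above can be thought of as weighted sums of allele dosages, with weights taken from GWAS summary statistics. The following is a minimal sketch, assuming additive effects and using hypothetical SNP identifiers, effect sizes, and dosages. The study's actual score construction may differ, for example in how it handles linkage disequilibrium.

```python
def locus_score(dosages, effect_sizes):
    """Sum of effect size times allele dosage over the SNPs at one locus.

    dosages: dict mapping SNP id -> effect-allele dosage (0 to 2) for one person
    effect_sizes: dict mapping SNP id -> per-allele effect from summary statistics
    SNPs missing from the person's genotypes are skipped.
    """
    return sum(
        beta * dosages[snp]
        for snp, beta in effect_sizes.items()
        if snp in dosages
    )


# Hypothetical weights for three SNPs at a single locus.
weights = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 0.125}
person = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
score = locus_score(person, weights)
```

Computing one such score per locus and per individual would give the predictors that a penalised regression model could then relate to ΔSJC or ΔESR.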
Many immune diseases occur at different rates among people with schizophrenia compared to the general population. Here, we evaluated whether this phenomenon might be explained by shared genetic risk factors. We used data from large genome-wide association studies to compare the genetic architecture of schizophrenia to 19 immune diseases. First, we evaluated the association with schizophrenia of 581 variants previously reported to be associated with immune diseases at genome-wide significance. We identified five variants with potentially pleiotropic effects. While colocalization analyses were inconclusive, functional characterization of these variants provided the strongest evidence for a model in which genetic variation at rs1734907 modulates risk of schizophrenia and Crohn's disease via altered methylation and expression of EPHB4, a gene whose protein product guides the migration of neuronal axons in the brain and the migration of lymphocytes towards infected cells in the immune system. Next, we investigated genome-wide sharing of common variants between schizophrenia and immune diseases using cross-trait LD Score regression. Of the 11 immune diseases with available genome-wide summary statistics, we observed genetic correlation between six immune diseases and schizophrenia: inflammatory bowel disease (rg=0.12±0.03, p=2.49×10^-4), Crohn's disease (rg=0.097±0.06, p=3.27×10^-3), ulcerative colitis (rg=0.11±0.04, p=4.05×10^-3), primary biliary cirrhosis (rg=0.13±0.05, p=3.98×10^-3), psoriasis (rg=0.18±0.07, p=7.78×10^-3), and systemic lupus erythematosus (rg=0.13±0.05, p=3.76×10^-3). With the exception of ulcerative colitis, the degree and direction of these genetic correlations were consistent with the expected phenotypic correlation based on epidemiological data. Our findings suggest shared genetic risk factors contribute to the epidemiological association of certain immune diseases and schizophrenia.
Of the 1.8 billion people worldwide infected with Mycobacterium tuberculosis, 5–15% will develop active tuberculosis (TB). Approximately half will progress to active TB within the first 18 months after infection, presumably because they fail to mount an effective initial immune response. Here, in a genome-wide genetic study of early TB progression, we genotype 4002 active TB cases and their household contacts in Peru. We quantify genetic heritability (h_g^2) of early TB progression to be 21.2% (standard error 0.08). This suggests TB progression has a strong genetic basis, and is comparable to traits with well-established genetic bases. We identify a novel association between early TB progression and variants located in a putative enhancer region on chromosome 3q23 (rs73226617, OR = 1.18; P = 3.93 × 10^-8). With in silico and in vitro analyses we identify rs73226617 or rs148722713 as the likely functional variant and ATP1B3 as a potential causal target gene with monocyte-specific function.
Arazi A, Rao DA, Berthier CC, Davidson A, Liu Y, Hoover PJ, Chicoine A, Eisenhaure TM, Jonsson AH, Li S, Lieb DJ, Zhang F, Slowikowski K, Browne EP, Norma A, Sutherby D, Steelman S, Smilek DE, Tosta P, Apruzzese W, Massarotti E, Dall'Era M, Park M, Kamen DL, Furie RA, Payan-Schober F, Pendergraft WF, McInnes EA, Buyon JP, Petri MA, Putterman C, Kalunian KC, Woodle ES, Lederer JA, Hildeman DA, Nusbaum C, Raychaudhuri S, Kretzler M, Anolik JH, Brenner MB, Wofsy D, Hacohen N, Diamond B, in network AMPSLE. The immune cell landscape in kidneys of patients with lupus nephritis [Internet]. Nature Immunology 2019;20(7):902–914.
Lupus nephritis is a potentially fatal autoimmune disease for which the current treatment is ineffective and often toxic. To develop mechanistic hypotheses of disease, we analyzed kidney samples from patients with lupus nephritis and from healthy control subjects using single-cell RNA sequencing. Our analysis revealed 21 subsets of leukocytes active in disease, including multiple populations of myeloid cells, T cells, natural killer cells and B cells that demonstrated both pro-inflammatory responses and inflammation-resolving responses. We found evidence of local activation of B cells correlated with an age-associated B-cell signature and evidence of progressive stages of monocyte differentiation within the kidney. A clear interferon response was observed in most cells. Two chemokine receptors, CXCR4 and CX3CR1, were broadly expressed, implying a potentially central role in cell trafficking. Gene expression of immune cells in urine and kidney was highly correlated, which would suggest that urine might serve as a surrogate for kidney biopsies.
The molecular and cellular processes that lead to renal damage and to the heterogeneity of lupus nephritis (LN) are not well understood. We applied single-cell RNA sequencing (scRNA-seq) to renal biopsies from patients with LN and evaluated skin biopsies as a potential source of diagnostic and prognostic markers of renal disease. Type I interferon (IFN)-response signatures in tubular cells and keratinocytes distinguished patients with LN from healthy control subjects. Moreover, a high IFN-response signature and fibrotic signature in tubular cells were each associated with failure to respond to treatment. Analysis of tubular cells from patients with proliferative, membranous and mixed LN indicated pathways relevant to inflammation and fibrosis, which offer insight into their histologic differences. In summary, we applied scRNA-seq to LN to deconstruct its heterogeneity and identify novel targets for personalized approaches to therapy.
To define the cell populations that drive joint inflammation in rheumatoid arthritis (RA), we applied single-cell RNA sequencing (scRNA-seq), mass cytometry, bulk RNA sequencing (RNA-seq) and flow cytometry to T cells, B cells, monocytes, and fibroblasts from 51 samples of synovial tissue from patients with RA or osteoarthritis (OA). Utilizing an integrated strategy based on canonical correlation analysis of 5,265 scRNA-seq profiles, we identified 18 unique cell populations. Combining mass cytometry and transcriptomics revealed cell states expanded in RA synovia: THY1(CD90)+HLA-DRAhi sublining fibroblasts, IL1B+ pro-inflammatory monocytes, ITGAX+TBX21+ autoimmune-associated B cells and PDCD1+ peripheral helper T (TPH) cells and follicular helper T (TFH) cells. We defined distinct subsets of CD8+ T cells characterized by GZMK+, GZMB+, and GNLY+ phenotypes. We mapped inflammatory mediators to their source cell populations; for example, we attributed IL6 expression to THY1+HLA-DRAhi fibroblasts and IL1B production to pro-inflammatory monocytes. These populations are potentially key mediators of RA pathogenesis.
The identification of lymphocyte subsets with non-overlapping effector functions has been pivotal to the development of targeted therapies in immune-mediated inflammatory diseases (IMIDs). However, it remains unclear whether fibroblast subclasses with non-overlapping functions also exist and are responsible for the wide variety of tissue-driven processes observed in IMIDs such as inflammation and damage. Here we identify and describe the biology of distinct subsets of fibroblasts responsible for mediating either inflammation or tissue damage in arthritis. We show that deletion of FAPα+ synovial cells suppressed both inflammation and bone erosions in murine models of resolving and persistent arthritis. Single-cell transcriptional analysis identified two distinct fibroblast subsets: FAPα+ THY1+ immune effector fibroblasts located in the synovial sub-lining, and FAPα+ THY1- destructive fibroblasts restricted to the synovial lining. When adoptively transferred into the joint, FAPα+ THY1- fibroblasts selectively mediated bone and cartilage damage with little effect on inflammation, whereas transfer of FAPα+ THY1+ fibroblasts resulted in a more severe and persistent inflammatory arthritis, with minimal effect on bone and cartilage. Our findings describing anatomically discrete, functionally distinct fibroblast subsets with non-overlapping functions have important implications for cell-based therapies aimed at modulating inflammation and tissue damage.
Genome-wide association studies (GWASs) are valuable for understanding human biology, but associated loci typically contain multiple associated variants and genes. Thus, algorithms that prioritize likely causal genes and variants for a given phenotype can provide biological interpretations of association data. However, a critical, currently missing capability is to objectively compare performance of such algorithms. Typical comparisons rely on “gold standard” genes harboring causal coding variants, but such gold standards may be biased and incomplete. To address this issue, we developed Benchmarker, an unbiased, data-driven benchmarking method that compares performance of similarity-based prioritization strategies to each other (and to random chance) by leave-one-chromosome-out cross-validation with stratified linkage disequilibrium (LD) score regression. We first applied Benchmarker to 20 well-powered GWASs and compared gene prioritization based on strategies employing three different data sources, including annotated gene sets and gene expression; genes prioritized based on gene sets had higher per-SNP heritability than those prioritized based on gene expression. Additionally, in a direct comparison of three methods, DEPICT and MAGMA outperformed NetWAS. We also evaluated combinations of methods; our results indicated that combining data sources and algorithms can help prioritize higher-quality genes for follow-up. Benchmarker provides an unbiased approach to evaluate any similarity-based method that provides genome-wide prioritization of genes, variants, or gene sets and can determine the best such method for any particular GWAS. Our method addresses an important unmet need for rigorous tool assessment and can assist in mapping genetic associations to causal function.
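The leave-one-chromosome-out scheme mentioned above can be sketched as follows: for each chromosome in turn, a prioritization method sees only genes on the other chromosomes, and its output is evaluated on the held-out chromosome. This shows only the splitting logic, with hypothetical gene names. The prioritization step and the stratified LD score regression evaluation are not shown, and the function name is an assumption rather than part of Benchmarker.

```python
def leave_one_chromosome_out(gene_to_chrom):
    """Yield (held_out_chrom, training_genes, held_out_genes) for each chromosome."""
    chromosomes = sorted(set(gene_to_chrom.values()))
    for held_out in chromosomes:
        # Genes on every other chromosome are available to the method.
        train = sorted(g for g, c in gene_to_chrom.items() if c != held_out)
        # Genes on the held-out chromosome are reserved for evaluation.
        test = sorted(g for g, c in gene_to_chrom.items() if c == held_out)
        yield held_out, train, test


# Hypothetical gene-to-chromosome map.
genes = {"GENE1": "chr1", "GENE2": "chr1", "GENE3": "chr2", "GENE4": "chr3"}
folds = list(leave_one_chromosome_out(genes))
```

Holding out whole chromosomes keeps the evaluation genes from sharing linkage disequilibrium with the genes the method was given.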
Macrophages tailor their function according to the signals found in tissue microenvironments, assuming a wide spectrum of phenotypes. A detailed understanding of macrophage phenotypes in human tissues is limited. Using single-cell RNA sequencing, we defined distinct macrophage subsets in the joints of patients with the autoimmune disease rheumatoid arthritis (RA), which affects ~1% of the population. The subset we refer to as HBEGF+ inflammatory macrophages is enriched in RA tissues and is shaped by resident fibroblasts and the cytokine tumor necrosis factor (TNF). These macrophages promoted fibroblast invasiveness in an epidermal growth factor receptor–dependent manner, indicating that intercellular cross-talk in this inflamed setting reshapes both cell types and contributes to fibroblast-mediated joint destruction. In an ex vivo synovial tissue assay, most medications used to treat RA patients targeted HBEGF+ inflammatory macrophages; however, in some cases, medication redirected them into a state that is not expected to resolve inflammation. These data highlight how advances in our understanding of chronically inflamed human tissues and the effects of medications therein can be achieved by studies on local macrophage phenotypes and intercellular interactions.
Recent studies have highlighted the role of gene networks in disease biology. To formally assess this, we constructed a broad set of pathway, network, and pathway+network annotations and applied stratified LD score regression to 42 diseases and complex traits (average N = 323K) to identify enriched annotations. First, we analyzed 18,119 biological pathways. We identified 156 pathway-trait pairs whose disease enrichment was statistically significant (FDR < 5%) after conditioning on all genes and 75 known functional annotations (from the baseline-LD model), a stringent step that greatly reduced the number of pathways detected; most significant pathway-trait pairs were previously unreported. Next, for each of four published gene networks, we constructed probabilistic annotations based on network connectivity. For each gene network, the network connectivity annotation was strongly significantly enriched. Surprisingly, the enrichments were fully explained by excess overlap between network annotations and regulatory annotations from the baseline-LD model, validating the informativeness of the baseline-LD model and emphasizing the importance of accounting for regulatory annotations in gene network analyses. Finally, for each of the 156 enriched pathway-trait pairs, for each of the four gene networks, we constructed pathway+network annotations by annotating genes with high network connectivity to the input pathway. For each gene network, these pathway+network annotations were strongly significantly enriched for the corresponding traits. Once again, the enrichments were largely explained by the baseline-LD model. In conclusion, gene network connectivity is highly informative for disease architectures, but the information in gene networks may be subsumed by regulatory annotations, emphasizing the importance of accounting for known annotations.
Despite significant progress in annotating the genome with experimental methods, much of the regulatory noncoding genome remains poorly defined. Here we assert that regulatory elements may be characterized by leveraging local epigenomic signatures at sites where specific transcription factors (TFs) are bound. To link these two identifying features, we introduce IMPACT, a genome annotation strategy which identifies regulatory elements defined by cell-state-specific TF binding profiles, learned from 515 chromatin and sequence annotations. We validate IMPACT using multiple compelling applications. First, IMPACT predicts TF motif binding with high accuracy (average AUC 0.92, s.e. 0.03; across 8 TFs), a significant improvement (all p<6.9e-15) over intersecting motifs with open chromatin (average AUC 0.66, s.e. 0.11). Second, an IMPACT annotation trained on RNA polymerase II is more enriched for peripheral blood cis-eQTL variation (N=3,754) than sequence based annotations, such as promoters and regions around the TSS, (permutation p<1e-3, 25% average increase in enrichment). Third, integration with rheumatoid arthritis (RA) summary statistics from European (N=38,242) and East Asian (N=22,515) populations revealed that the top 5% of CD4+ Treg IMPACT regulatory elements capture 85.7% (s.e. 19.4%) of RA h2 (p<1.6e-5) and that the top 9.8% of Treg IMPACT regulatory elements, consisting of all SNPs with a non-zero annotation value, capture 97.3% (s.e. 18.2%) of RA h2 (p<7.6e-7), the most comprehensive explanation for RA h2 to date. In comparison, the average RA h2 captured by compared CD4+ T histone marks is 42.3% and by CD4+ T specifically expressed gene sets is 36.4%. Finally, integration with RA fine-mapping data (N=27,345) revealed a significant enrichment (2.87, p<8.6e-3) of putatively causal variants across 20 RA associated loci in the top 1% of CD4+ Treg IMPACT regulatory regions. 
Overall, we find that IMPACT generalizes well to other cell types in identifying complex trait associated regulatory elements.
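The AUC values quoted above summarize how well a score ranks bound sites above unbound ones. As a minimal sketch with hypothetical scores and labels (not IMPACT's own code), AUC can be computed as the fraction of positive-negative pairs in which the positive example scores higher, counting ties as one half:

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted binding scores and true labels (1 = bound, 0 = unbound).
example_auc = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations; a rank-based computation would be preferable for genome-scale data.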
The Electronic Medical Records and Genomics (eMERGE) network is a network of medical centers with electronic medical records linked to existing biorepository samples for genomic discovery and genomic medicine research. The network sought to unify the genetic results from 78 Illumina and Affymetrix genotype array batches from 12 contributing medical centers for joint association analysis of 83,717 human participants. In this report, we describe the imputation of eMERGE results and methods to create the unified imputed merged set of genome-wide variant genotype data. We imputed the data using the Michigan Imputation Server, which provides a missing single-nucleotide variant genotype imputation service using the minimac3 imputation algorithm with the Haplotype Reference Consortium genotype reference set. We describe the quality control and filtering steps used in the generation of this data set and suggest generalizable quality thresholds for imputation and phenotype association studies. To test the merged imputed genotype set, we replicated a previously reported chromosome 6 HLA-B herpes zoster (shingles) association and discovered a novel zoster-associated locus in an epigenetic binding site near the terminus of chromosome 3 (3p29).
Stroma is a broad term referring to the connective tissue matrix in which other cells reside. It is composed of diverse cell types with functions such as extracellular matrix maintenance, blood and lymph vessel development, and effector cell recruitment. The tissue microenvironment is determined by the molecular characteristics and relative abundances of different stromal cells such as fibroblasts, endothelial cells, pericytes, and mesenchymal precursor cells. Stromal cell heterogeneity is explained by embryonic developmental lineage, stages of differentiation to other cell types, and activation states. Interaction between immune and stromal cell types is critical to wound healing, cancer, and a wide range of inflammatory diseases. Here, we review recent studies of inflammatory diseases that use functional genomics and single-cell technologies to identify and characterize stromal cell types associated with pathogenesis.
High dimensional strategies using mRNA sequencing, mass cytometry, and fluorescence activated cell-sorting with fresh primary tissue samples are producing detailed views of what is happening in diseased tissue in rheumatoid arthritis, inflammatory bowel disease, and cancer. Fibroblasts positive for CD90 (Thy-1) are enriched in the synovium of rheumatoid arthritis patients. Single-cell RNA-seq studies will lead to more discoveries about the stroma in the near future.
Stromal cells form the microenvironment of inflamed and diseased tissues. Functional genomics is producing an increasingly detailed view of subsets of stromal cells with pathogenic functions in rheumatic diseases and cancer. Future genomics studies will discover disease mechanisms by perturbing molecular pathways with chemokines and therapies known to affect patient outcomes. Functional genomics studies with large sample sizes of patient tissues will identify patient subsets with different disease phenotypes or treatment responses.
Haghighi A, Krier JB, Tóth-Petróczy A, Cassa CA, Frank NY, Carmichael N, Fieg E, Bjonnes AC, Mohanty AK, Briere LC, Lincoln SA, Lucia S, Gupta V, Söylemez O, Sutti S, Kooshesh K, Qiu H, Fay CJ, Perroni V, Valerius J, Hanna M, Frank A, Ouahed JD, Snapper SB, Pantazi A, Chopra SS, Leshchiner I, Stitziel NO, Feldweg AM, Mannstadt M, Loscalzo J, Sweetser DA, Liao E, Stoler JOM, Bearce Nowak C, Sanchez-Lara PA, Klein OD, Perry H, Patsopoulos NA, Raychaudhuri S, Goessling W, Green RC, Seidman CE, MacRae CA, Sunyaev S, Maas RL, Vuzman D. An Integrated Clinical Program and Crowdsourcing Strategy for Genomic Sequencing and Mendelian Disease Gene Discovery. npj Genomic Medicine 2018;3:21.
Despite major progress in defining the genetic basis of Mendelian disorders, the molecular etiology of many cases remains unknown. Patients with these undiagnosed disorders often have complex presentations and require treatment by multiple health care specialists. Here, we describe an integrated clinical diagnostic and research program using whole-exome and whole-genome sequencing (WES/WGS) for Mendelian disease gene discovery. This program employs specific case ascertainment parameters, a WES/WGS computational analysis pipeline that is optimized for Mendelian disease gene discovery with variant callers tuned to specific inheritance modes, an interdisciplinary crowdsourcing strategy for genomic sequence analysis, matchmaking for additional cases, and integration of the findings regarding gene causality with the clinical management plan. The interdisciplinary gene discovery team includes clinical, computational, and experimental biomedical specialists who interact to identify the genetic etiology of the disease, and when so warranted, to devise improved or novel treatments for affected patients. This program effectively integrates the clinical and research missions of an academic medical center and affords both diagnostic and therapeutic options for patients suffering from genetic disease. It may therefore be germane to other academic medical institutions engaged in implementing genomic medicine programs.
Current classification of primary inflammatory arthritis begins from the assumption that adults and children are different. No form of juvenile idiopathic arthritis bears the same name as an adult arthritis, a nomenclature gap with implications for both clinical care and research. Recent genetic data have raised questions regarding this adult/pediatric divide, revealing instead broad patterns that span the age spectrum. Combining these genetic patterns with demographic and clinical data, we propose that inflammatory arthritis can be segregated into 4 main clusters, largely irrespective of pediatric or adult onset: seropositive, seronegative (likely including a distinct group that usually begins in early childhood), spondyloarthritis, and systemic. Each of these broad clusters is internally heterogeneous, highlighting the need for further study to resolve etiologically discrete entities. Eliminating divisions based on arbitrary age cutoffs will enhance opportunities for collaboration between adult and pediatric rheumatologists, thereby helping to promote the understanding and treatment of arthritis.