The rapidly emerging diversity of single-cell RNA-seq datasets allows us to characterize the transcriptional behavior of cell types across a wide variety of biological and clinical conditions. With this comprehensive breadth comes a major analytical challenge. The same cell type across tissues, from different donors, or in different disease states, may appear to express different genes. A joint analysis of multiple datasets requires the integration of cells across diverse conditions. This is particularly challenging when datasets are assayed with different technologies in which real biological differences are interspersed with technical differences. We present Harmony, an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Unlike available single-cell integration methods, Harmony can simultaneously account for multiple experimental and biological factors. We develop objective metrics to evaluate the quality of data integration. In four separate analyses, we demonstrate the superior performance of Harmony to four single-cell-specific integration algorithms. Moreover, we show that Harmony requires dramatically fewer computational resources. It is the only available algorithm that makes the integration of ∼10⁶ cells feasible on a personal computer. We demonstrate that Harmony identifies both broad populations and fine-grained subpopulations of PBMCs from datasets with large experimental differences. In a meta-analysis of 14,746 cells from 5 studies of human pancreatic islet cells, Harmony accounts for variation among technologies and donors to successfully align several rare subpopulations. In the resulting integrated embedding, we identify a previously unidentified population of potentially dysfunctional alpha islet cells, enriched for genes active in the Endoplasmic Reticulum (ER) stress response.
The abundance of these alpha cells correlates across donors with the proportion of dysfunctional beta cells also enriched in ER stress response genes. Harmony is a fast and flexible general purpose integration algorithm that enables the identification of shared fine-grained subpopulations across a variety of experimental and biological conditions.
We sought to investigate whether genetic effects on response to TNF inhibitors (TNFi) in rheumatoid arthritis (RA) could be localised by considering known genetic susceptibility loci for relevant traits and to evaluate the usefulness of these genetic loci for stratifying drug response.
We studied the relation of TNFi response, quantified by change in swollen joint counts (ΔSJC) and erythrocyte sedimentation rate (ΔESR), with locus-specific scores constructed from genome-wide association study summary statistics in 2938 genotyped individuals: 37 scores for RA; scores for 19 immune cell traits; scores for expression or methylation of 93 genes with previously reported associations between transcript level and drug response. Multivariate associations were evaluated in penalised regression models by cross-validation.
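As an illustration of evaluating a penalised regression model by cross-validation, the following is a minimal sketch only. It assumes a ridge (L2) penalty, a 5-fold split, and purely synthetic data; the type of penalty, the fold scheme, and the data used in the study are not stated here, and the names `ridge_fit` and `cv_r2` are hypothetical helpers introduced for this example.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge regression on centred data; returns (coef, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_features = X.shape[1]
    # Solve (Xc'Xc + lam * I) coef = Xc'yc; the intercept is left unpenalised.
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def cv_r2(X, y, lam, k=5, seed=0):
    """Out-of-fold R^2 from k-fold cross-validation (can be negative)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    pred = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        coef, intercept = ridge_fit(X[train_mask], y[train_mask], lam)
        pred[test_idx] = X[test_idx] @ coef + intercept
    ss_res = np.sum((y - pred) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic stand-in for genotypic scores (columns) and a response phenotype.
# One weak true effect among 20 scores; none of this is data from the study.
rng = np.random.default_rng(42)
n_samples, n_scores = 500, 20
X = rng.standard_normal((n_samples, n_scores))
beta = np.zeros(n_scores)
beta[0] = 0.1
y = X @ beta + rng.standard_normal(n_samples)

# Pick the penalty strength with the best out-of-fold R^2.
lambdas = [0.1, 1.0, 10.0, 100.0, 1000.0]
scores = {lam: cv_r2(X, y, lam) for lam in lambdas}
best_lam = max(scores, key=scores.get)
print(best_lam, scores[best_lam])
```

The out-of-fold R² is the quantity of interest in such a sketch: it estimates the share of phenotypic variance the scores predict in unseen individuals, which is the sense in which a model can be said to explain only a small fraction of variance.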
We detected a statistically significant association between ΔSJC and the RA score at the CD40 locus (p=0.0004) and an inverse association between ΔSJC and the score for expression of CD39 on CD4 T cells (p=0.00005). A previously reported association between CD39 expression on regulatory T cells and response to methotrexate was in the opposite direction. In stratified analysis by concomitant methotrexate treatment, the inverse association was stronger in the combination therapy group and dissipated in the TNFi monotherapy group. Overall, ability to predict TNFi response from genotypic scores was limited, with models explaining less than 1% of phenotypic variance.
The association with the CD39 trait is difficult to interpret because patients with RA are often prescribed TNFi after failing to respond to methotrexate. The CD39 and CD40 pathways could be relevant for targeting drug therapy.
Many immune diseases occur at different rates among people with schizophrenia compared to the general population. Here, we evaluated whether this phenomenon might be explained by shared genetic risk factors. We used data from large genome-wide association studies to compare the genetic architecture of schizophrenia to 19 immune diseases. First, we evaluated the association with schizophrenia of 581 variants previously reported to be associated with immune diseases at genome-wide significance. We identified five variants with potentially pleiotropic effects. While colocalization analyses were inconclusive, functional characterization of these variants provided the strongest evidence for a model in which genetic variation at rs1734907 modulates risk of schizophrenia and Crohn's disease via altered methylation and expression of EPHB4, a gene whose protein product guides the migration of neuronal axons in the brain and the migration of lymphocytes towards infected cells in the immune system. Next, we investigated genome-wide sharing of common variants between schizophrenia and immune diseases using cross-trait LD Score regression. Of the 11 immune diseases with available genome-wide summary statistics, we observed genetic correlation between six immune diseases and schizophrenia: inflammatory bowel disease (rg=0.12±0.03, p=2.49×10⁻⁴), Crohn's disease (rg=0.097±0.06, p=3.27×10⁻³), ulcerative colitis (rg=0.11±0.04, p=4.05×10⁻³), primary biliary cirrhosis (rg=0.13±0.05, p=3.98×10⁻³), psoriasis (rg=0.18±0.07, p=7.78×10⁻³), and systemic lupus erythematosus (rg=0.13±0.05, p=3.76×10⁻³). With the exception of ulcerative colitis, the degree and direction of these genetic correlations were consistent with the expected phenotypic correlation based on epidemiological data. Our findings suggest shared genetic risk factors contribute to the epidemiological association of certain immune diseases and schizophrenia.