At the Raychaudhuri Lab, we study the basis of immune-mediated disease using techniques in human genetics, bioinformatics, and systems biology.

Below, please click each figure to visit the associated publication.

Recent Publications

Reshef Y, Rumker L, Kang JB, Nathan A, Korsunsky I, Asgari S, Murray MB, Moody DB, Raychaudhuri S. Axes of inter-sample variability among transcriptional neighborhoods reveal disease-associated cell states in single-cell data [Internet]. Nature Biotech In Press; bioRxiv
As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes like clinical phenotypes. Current statistical approaches typically map cells to cell-type clusters and examine sample differences through that lens alone. Here we present covarying neighborhood analysis (CNA), an unbiased method to identify cell populations of interest with greater flexibility and granularity. CNA characterizes dominant axes of variation across samples by identifying groups of very small regions in transcriptional space, termed neighborhoods, that covary in abundance across samples, suggesting shared function or regulation. CNA can then rigorously test for associations between any sample-level attribute and the abundances of these covarying neighborhood groups. We show in simulation that CNA enables more powerful and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, redefines monocyte populations expanded in sepsis, and identifies a previously undiscovered T-cell population associated with progression to active tuberculosis. Competing Interest Statement: The authors have declared no competing interest.
Luo Y, Kanai M, Choi W, Li X, Yamamoto K, Ogawa K, Gutierrez-Arcelus M, Gregersen PK, Stuart PE, Elder JT, Fellay J, Carrington M, Haas DW, Guo X, Palmer ND, Chen Y-DI, Rotter JI, Taylor KD, Rich SS, Correa A, Wilson JG, Kathiresan S, Cho MH, Metspalu A, Esko T, Okada Y, Han B, for the NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, McLaren PJ, Raychaudhuri S. A high-resolution HLA reference panel capturing global population diversity enables multi-ethnic fine-mapping in HIV host response [Internet]. Nature Genetics In Press; Preprint
Defining causal variation by fine-mapping can be more effective in multi-ethnic genetic studies, particularly in regions such as the MHC with highly population-specific structure. To enable such studies, we constructed a large (N=21,546) high resolution HLA reference panel spanning five global populations based on whole-genome sequencing data. Expectedly, we observed unique long-range HLA haplotypes within each population group. Despite this, we demonstrated consistently accurate imputation at G-group resolution (94.2%, 93.7%, 97.8% and 93.7% in Admixed African (AA), East Asian (EAS), European (EUR) and Latino (LAT)). We jointly analyzed genome-wide association studies (GWAS) of HIV-1 viral load from EUR, AA and LAT populations. Our analysis pinpointed the MHC association to three amino acid positions (97, 67 and 156) marking three consecutive pockets (C, B and D) within the HLA-B peptide binding groove, explaining 12.9% of trait variance, and obviating effects of previously reported associations from population-specific HIV studies. Competing Interest Statement: M.H.C. has received consulting or speaking fees from Illumina and AstraZeneca, and grant support from GSK and Bayer.
Zhang F, Mears J, Shakib L, Beynor J, Shanaj S, Korsunsky I, Nathan A, Consortium AMP, Donlin L, Raychaudhuri S. IFN-γ and TNF-α drive a CXCL10+ CCL2+ macrophage phenotype expanded in severe COVID-19 lungs and inflammatory diseases with tissue inflammation [Internet]. Genome Med 2021;13(1):64. Publisher's Version

Background: Immunosuppressive and anti-cytokine treatment may have a protective effect for patients with COVID-19. Understanding the immune cell states shared between COVID-19 and other inflammatory diseases with established therapies may help nominate immunomodulatory therapies.

Methods: To identify cellular phenotypes that may be shared across tissues affected by disparate inflammatory diseases, we developed a meta-analysis and integration pipeline that models and removes the effects of technology, tissue of origin, and donor that confound cell-type identification. Using this approach, we integrated > 300,000 single-cell transcriptomic profiles from COVID-19-affected lungs and tissues from healthy subjects and patients with five inflammatory diseases: rheumatoid arthritis (RA), Crohn's disease (CD), ulcerative colitis (UC), systemic lupus erythematosus (SLE), and interstitial lung disease. We tested the association of shared immune states with severe/inflamed status compared to healthy control using mixed-effects modeling. To define environmental factors within these tissues that shape shared macrophage phenotypes, we stimulated human blood-derived macrophages with defined combinations of inflammatory factors, emphasizing in particular antiviral interferons IFN-beta (IFN-β) and IFN-gamma (IFN-γ), and pro-inflammatory cytokines such as TNF.

Results: We built an immune cell reference consisting of > 300,000 single-cell profiles from 125 healthy or disease-affected donors from COVID-19 and five inflammatory diseases. We observed a CXCL10+ CCL2+ inflammatory macrophage state that is shared and strikingly abundant in severe COVID-19 bronchoalveolar lavage samples, inflamed RA synovium, inflamed CD ileum, and UC colon. These cells exhibited a distinct arrangement of pro-inflammatory and interferon response genes, including elevated levels of CXCL10, CXCL9, CCL2, CCL3, GBP1, STAT1, and IL1B. Further, we found this macrophage phenotype is induced upon co-stimulation by IFN-γ and TNF-α.

Conclusions: Our integrative analysis identified immune cell states shared across inflamed tissues affected by inflammatory diseases and COVID-19. Our study supports a key role for IFN-γ together with TNF-α in driving an abundant inflammatory macrophage phenotype in severe COVID-19-affected lungs, as well as inflamed RA synovium, CD ileum, and UC colon, which may be targeted by existing immunomodulatory therapies.

Keywords: COVID-19; Inflammatory diseases; Macrophage heterogeneity; Macrophage stimulation; Single-cell multi-disease tissue integration; Single-cell transcriptomics.

Nathan A, Beynor JI, Baglaenko Y, Suliman S, Ishigaki K, Asgari S, Huang CC, Luo Y, Zhang Z, Lopez K, Lindestam Arlehamn CS, Ernst JD, Jimenez J, Calderon RI, Lecca L, Van Rhijn I, Moody DB, Murray MB, Raychaudhuri S. Multimodally profiling memory T cells from a tuberculosis cohort identifies cell state associations with demographics, environment and disease [Internet]. Nat Immunol 2021;22(6):781-793. Publisher's Version
Multimodal T cell profiling can enable more precise characterization of elusive cell states underlying disease. Here, we integrated single-cell RNA and surface protein data from 500,089 memory T cells to define 31 cell states from 259 individuals in a Peruvian tuberculosis (TB) progression cohort. At immune steady state >4 years after infection and disease resolution, we found that, after accounting for significant effects of age, sex, season and genetic ancestry on T cell composition, a polyfunctional type 17 helper T (TH17) cell-like effector state was reduced in abundance and function in individuals who previously progressed from Mycobacterium tuberculosis (M.tb) infection to active TB disease. These cells are capable of responding to M.tb peptides. Deconvoluting this state (uniquely identifiable with multimodal analysis) from public data demonstrated that its depletion may precede and persist beyond active disease. Our study demonstrates the power of integrative multimodal single-cell profiling to define cell states relevant to disease and other traits.
Kang J, Nathan A, Millard N, Rumker L, Moody DB, Korsunsky I, Raychaudhuri S. Efficient and precise single-cell reference atlas mapping with Symphony [Internet]. Nature Communications In Press; bioRxiv
Recent advances in single-cell technologies and integration algorithms make it possible to construct large, comprehensive reference atlases from multiple datasets encompassing many donors, studies, disease states, and sequencing platforms. Much like mapping sequencing reads to a reference genome, it is essential to be able to map new query cells onto complex, multimillion-cell reference atlases to rapidly identify relevant cell states and phenotypes. We present Symphony, a novel algorithm for building compressed, integrated reference atlases of ≥10^6 cells and enabling efficient query mapping within seconds. Based on a linear mixture model framework, Symphony precisely localizes query cells within a low-dimensional reference embedding without the need to reintegrate the reference cells, facilitating the downstream transfer of many types of reference-defined annotations to the query cells. We demonstrate the power of Symphony by (1) mapping a query containing multiple levels of experimental design to predict pancreatic cell types in human and mouse, (2) localizing query cells along a smooth developmental trajectory of human fetal liver hematopoiesis, and (3) harnessing a multimodal CITE-seq reference atlas to infer query surface protein expression in memory T cells. Symphony will enable the sharing of comprehensive integrated reference atlases in a convenient, portable format that powers fast, reproducible querying and downstream analyses.
Amariuta T, Ishigaki K, Sugishita H, Ohta T, Koido M, Dey KK, Matsuda K, Murakami Y, Price AL, Kawakami E, Terao C, Raychaudhuri S. Improving the trans-ancestry portability of polygenic risk scores by prioritizing variants in predicted cell-type-specific regulatory elements [Internet]. Nature Genetics 2020;52:1346-1354. Publisher's Version
Poor trans-ancestry portability of polygenic risk scores is a consequence of Eurocentric genetic studies and limited knowledge of shared causal variants. Leveraging regulatory annotations may improve portability by prioritizing functional over tagging variants. We constructed a resource of 707 cell-type-specific IMPACT regulatory annotations by aggregating 5,345 epigenetic datasets to predict binding patterns of 142 transcription factors across 245 cell types. We then partitioned the common SNP heritability of 111 genome-wide association study summary statistics of European (average n ≈ 189,000) and East Asian (average n ≈ 157,000) origin. IMPACT annotations captured consistent SNP heritability between populations, suggesting prioritization of shared functional variants. Variant prioritization using IMPACT resulted in increased trans-ancestry portability of polygenic risk scores from Europeans to East Asians across all 21 phenotypes analyzed (49.9% mean relative increase in R²). Our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ancestry portability of genetic data.
Asgari S, Luo Y, Belbin GM, Bartell E, Calderon R, Slowikowski K, Contreras C, Yataco R, Galea JT, Jimenez J, Coit JM, Farroñay C, Nazarian RM, O’Connor TD, Dietz HC, Hirschhorn J, Guio H, Lecca L, Kenny EE, Freeman E, Murray MB, Raychaudhuri S. A positively selected, common, missense variant in FBN1 confers a 2.2 centimeter reduction of height in the Peruvian population [Internet]. Nature 2020;582(7811):234-239. NCBI Link
Peruvians are among the shortest people in the world. To understand the genetic basis of short stature in Peru, we examined an ethnically diverse group of Peruvians and identified a novel, population-specific, missense variant in FBN1 (E1297G) that is significantly associated with lower height in the Peruvian population. Each copy of the minor allele (frequency = 4.7%) reduces height by 2.2 cm (4.4 cm in homozygous individuals). This is the largest effect size known for a common height-associated variant. This variant shows strong evidence of positive selection within the Peruvian population and is significantly more frequent in Native American populations from coastal regions of Peru compared to populations from the Andes or the Amazon, suggesting that short stature in Peruvians is the result of adaptation to the coastal environment. One Sentence Summary: A mutation found in Peruvians has the largest known effect on height for a common variant. This variant is specific to Native American ancestry.
Gutierrez-Arcelus M#, Baglaenko Y#, Arora J, Hannes S, Luo Y, Amariuta T, Teslovich N, Rao DA, Ermann J, Jonsson AH, for the NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium, Navarrete C, Rich SS, Taylor KD, Rotter JI, Gregersen PK, Esko T, Brenner MB, Raychaudhuri S. Allele-specific expression changes dynamically during T cell activation in HLA and other autoimmune loci [Internet]. Nature Genetics 2020;52:247-253. Publisher's Version
Genetic studies have revealed that autoimmune susceptibility variants are over-represented in memory CD4+ T cell regulatory elements. Understanding how genetic variation affects gene expression in different T cell physiological states is essential for deciphering genetic mechanisms of autoimmunity. Here, we characterized the dynamics of genetic regulatory effects at eight time points during memory CD4+ T cell activation with high-depth RNA-seq in healthy individuals. We discovered widespread, dynamic allele-specific expression across the genome, where the balance of alleles changes over time. These genes were enriched fourfold within autoimmune loci. We found pervasive dynamic regulatory effects within six HLA genes. HLA-DQB1 alleles had one of three distinct transcriptional regulatory programs. Using CRISPR-Cas9 genomic editing we demonstrated that a promoter variant is causal for T cell-specific control of HLA-DQB1 expression. Our study shows that genetic variation in cis-regulatory elements affects gene expression in a manner dependent on lymphocyte activation status, contributing to the interindividual complexity of immune responses.
Korsunsky I, Millard N, Fan J, Slowikowski K, Zhang F, Wei K, Baglaenko Y, Brenner M, Loh P-R, Raychaudhuri S. Fast, sensitive and accurate integration of single-cell data with Harmony [Internet]. Nat Methods 2019;16(12):1289-1296. Publisher's Version
The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony, an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate the superior performance of Harmony to previously published algorithms while requiring fewer computational resources. Harmony enables the integration of ~10^6 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.
Zhang F#, Wei K#, Slowikowski K#, Fonseka CY#, Rao DA#, Kelly S, Goodman SM, Tabechian D, Hughes LB, Salomon-Escoto K, Watts GFM, Jonsson AH, Rangel-Moreno J, Meednu N, Rozo C, Apruzzese W, Eisenhaure TM, Lieb DJ, Boyle DL, Mandelin AM, and Consortium AMPRASLE (AMPRA/SLE), Boyce BF, DiCarlo E, Gravallese EM, Gregersen PK, Moreland L, Firestein GS, Hacohen N, Nusbaum C, Lederer JA, Perlman H, Pitzalis C, Filer A, Holers VM, Bykerk VP, Donlin LT*, Anolik JH*, Brenner MB*, Raychaudhuri S*. Defining Inflammatory Cell States in Rheumatoid Arthritis Joint Synovial Tissues by Integrating Single-cell Transcriptomics and Mass Cytometry [Internet]. Nature Immunology 2019;20:928-942. Publisher's Version

To define the cell populations that drive joint inflammation in rheumatoid arthritis (RA), we applied single-cell RNA sequencing (scRNA-seq), mass cytometry, bulk RNA sequencing (RNA-seq) and flow cytometry to T cells, B cells, monocytes, and fibroblasts from 51 samples of synovial tissue from patients with RA or osteoarthritis (OA). Utilizing an integrated strategy based on canonical correlation analysis of 5,265 scRNA-seq profiles, we identified 18 unique cell populations. Combining mass cytometry and transcriptomics revealed cell states expanded in RA synovia: THY1(CD90)+HLA-DRAhi sublining fibroblasts, IL1B+ pro-inflammatory monocytes, ITGAX+TBX21+autoimmune-associated B cells and PDCD1+ peripheral helper T (TPH) cells and follicular helper T (TFH) cells. We defined distinct subsets of CD8+ T cells characterized by GZMK+, GZMB+, and GNLY+ phenotypes. We mapped inflammatory mediators to their source cell populations; for example, we attributed IL6 expression to THY1+HLA-DRAhi fibroblasts and IL1B production to pro-inflammatory monocytes. These populations are potentially key mediators of RA pathogenesis.

