As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes like clinical phenotypes. Current statistical approaches typically map cells to cell-type clusters and examine sample differences through that lens alone. Here we present covarying neighborhood analysis (CNA), an unbiased method to identify cell populations of interest with greater flexibility and granularity. CNA characterizes dominant axes of variation across samples by identifying groups of very small regions in transcriptional space—termed neighborhoods—that covary in abundance across samples, suggesting shared function or regulation. CNA can then rigorously test for associations between any sample-level attribute and the abundances of these covarying neighborhood groups. We show in simulation that CNA enables more powerful and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, redefines monocyte populations expanded in sepsis, and identifies a previously undiscovered T-cell population associated with progression to active tuberculosis.Competing Interest StatementThe authors have declared no competing interest.
Defining causal variation by fine-mapping can be more effective in multi-ethnic genetic studies, particularly in regions such as the MHC with highly population-specific structure. To enable such studies, we constructed a large (N=21,546) high resolution HLA reference panel spanning five global populations based on whole-genome sequencing data. Expectedly, we observed unique long-range HLA haplotypes within each population group. Despite this, we demonstrated consistently accurate imputation at G-group resolution (94.2%, 93.7%, 97.8% and 93.7% in Admixed African (AA), East Asian (EAS), European (EUR) and Latino (LAT)). We jointly analyzed genome-wide association studies (GWAS) of HIV-1 viral load from EUR, AA and LAT populations. Our analysis pinpointed the MHC association to three amino acid positions (97, 67 and 156) marking three consecutive pockets (C, B and D) within the HLA-B peptide binding groove, explaining 12.9% of trait variance, and obviating effects of previously reported associations from population-specific HIV studies.Competing Interest StatementM.H.C. has received consulting or speaking fees from Illumina and AstraZeneca, and grant support from GSK and Bayer.Funding StatementThe study was supported by the National Institutes of Health (NIH) TB Research Unit Network, Grant U19 AI111224-01. The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the U.S. Department of Health and Human Services. The Genotype and Phenotype (GaP) Registry at The Feinstein Institute for Medical Research provided fresh, de-identified human plasma; blood was collected from control subjects under an IRB-approved protocol (IRB# 09-081) and processed to isolate plasma. The GaP is a sub-protocol of the Tissue Donation Program (TDP) at Northwell Health and a national resource for genotype-phenotype studies. https://www.feinsteininstitute.org/robert-s-boas-center-for-genomics-and... A.M. is supported by Gentransmed grant 2014-2020.4.01.15-0012.; D.W.H. is supported by NIH grants AI110527, AI077505, TR000445, AI069439, and AI110527. D.H.S. was supported by R01 HL92301, R01 HL67348, R01 NS058700, R01 AR48797, R01 DK071891, R01 AG058921, the General Clinical Research Center of the Wake Forest University School of Medicine (M01 RR07122, F32 HL085989), the American Diabetes Association, and a pilot grant from the Claude Pepper Older Americans Independence Center of Wake Forest University Health Sciences (P60 AG10484). J.T.E. and P.E.S. were supported by NIH/NIAMS R01 AR042742, R01 AR050511, and R01 AR063611. For some HIV cohort participants, DNA and data collection was supported by NIH/NIAID AIDS Clinical Trial Group (ACTG) grants UM1 AI068634, UM1 AI068636 and UM1 AI106701, and ACTG clinical research site grants A1069412, A1069423, A1069424, A1069503, AI025859, AI025868, AI027658, AI027661, AI027666, AI027675, AI032782, AI034853, AI038858, AI045008, AI046370, AI046376, AI050409, AI050410, AI050410, AI058740, AI060354, AI068636, AI069412, AI069415, AI069418, AI069419, AI069423, AI069424, AI069428, AI069432, AI069432, AI069434, AI069439, AI069447, AI069450, AI069452, AI069465, AI069467, AI069470, AI069471, AI069472, AI069474, AI069477, AI069481, AI069484, AI069494, AI069495, AI069496, AI069501, AI069501, AI069502, AI069503, AI069511, AI069513, AI069532, AI069534, AI069556, AI072626, AI073961, RR000046, RR000425, RR023561, RR024156, RR024160, RR024996, RR025008, RR025747, RR025777, RR025780, TR000004, TR000058, TR000124, TR000170, TR000439, TR000445, TR000457, TR001079, TR001082, TR001111, and TR024160. Molecular data for the Trans-Omics in Precision Medicine (TOPMed) program was supported by the National Heart, Lung and Blood Institute (NHLBI). See the TOPMed Omics Support Table (Supplementary Table 16) for study specific omics support information. Core support including centralized genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonization, data management, sample-identity QC, and general program coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The COPDGene project was supported by Award Number U01 HL089897 and Award Number U01 HL089856 from the National Heart, Lung, and Blood Institute. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Heart, Lung, and Blood Institute or the National Institutes of Health. The COPDGene project is also supported by the COPD Foundation through contributions made to an Industry Advisory Board comprised of AstraZeneca, Boehringer Ingelheim, GlaxoSmithKline, Novartis, Pfizer, Siemens and Sunovion. A full listing of COPDGene investigators can be found at: http://www.copdgene.org/directory The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute on Minority Health and Health Disparities (NIMHD). The authors also wish to thank the staffs and participants of the JHS. MESA and the MESA SHARe project are conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420. MESA Family is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support is provided by grants and contracts R01HL071051, R01HL071205, R01HL071250, R01HL071251, R01HL071258, R01HL071259, by the National Center for Research Resources, Grant UL1RR033176. The provision of genotyping data was supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR001881, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. This project has been funded in whole or in part with federal funds from the Frederick National Laboratory for Cancer Research, under Contract No. HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. This Research was supported in part by the Intramural Research Program of the NIH, Frederick National Lab, Center for Cancer Research.Author DeclarationsI confirm all relevant ethical guidelines have been followed, and any necessary IRB and/or ethics committee approvals have been obtained.YesThe details of the IRB/oversight body that provided approval or exemption for the research described are given below:The Genotype and Phenotype (GaP) Registry at The Feinstein Institute for Medical Research provided fresh, de-identified human plasma; blood was collected from control subjects under an IRB-approved protocol (IRB# 09-081) and processed to isolate plasma. The GaP is a sub-protocol of the Tissue Donation Program (TDP) at Northwell Health and a national resource for genotype-phenotype studies. Each study was previously approved by respective institutional review boards (IRBs), including for the generation of WGS data and association with phenotypes. All participants provided written consent.All necessary patient/participant consent has been obtained and the appropriate institutional forms have been archived.YesI understand that all clinical trials and any other prospective interventional studies must be registered with an ICMJE-approved registry, such as ClinicalTrials.gov. I confirm that any such study reported in the manuscript has been registered and the trial registration ID is provided (note: if posting a prospective study registered retrospectively, please provide a statement in the trial ID field explaining why the study was not registered in advance).YesI have followed all appropriate research reporting guidelines and uploaded the relevant EQUATOR Network research reporting checklist(s) and other pertinent material as supplementary files, if applicable.YesThe source code is available for download at https://github.com/immunogenomics
Recent advances in single-cell technologies and integration algorithms make it possible to construct large, comprehensive reference atlases from multiple datasets encompassing many donors, studies, disease states, and sequencing platforms. Much like mapping sequencing reads to a reference genome, it is essential to be able to map new query cells onto complex, multimillion-cell reference atlases to rapidly identify relevant cell states and phenotypes. We present Symphony, a novel algorithm for building compressed, integrated reference atlases of ≥106 cells and enabling efficient query mapping within seconds. Based on a linear mixture model framework, Symphony precisely localizes query cells within a low-dimensional reference embedding without the need to reintegrate the reference cells, facilitating the downstream transfer of many types of reference-defined annotations to the query cells. We demonstrate the power of Symphony by (1) mapping a query containing multiple levels of experimental design to predict pancreatic cell types in human and mouse, (2) localizing query cells along a smooth developmental trajectory of human fetal liver hematopoiesis, and (3) harnessing a multimodal CITE-seq reference atlas to infer query surface protein expression in memory T cells. Symphony will enable the sharing of comprehensive integrated reference atlases in a convenient, portable format that powers fast, reproducible querying and downstream analyses.