Recent advances in single-cell technologies and integration algorithms make it possible to construct comprehensive reference atlases encompassing many donors, studies, disease states, and sequencing platforms. Much like mapping sequencing reads to a reference genome, it is essential to be able to map query cells onto complex, multimillion-cell reference atlases to rapidly identify relevant cell states and phenotypes. We present Symphony (https://github.com/immunogenomics/symphony), an algorithm for building large-scale, integrated reference atlases in a convenient, portable format that enables efficient query mapping within seconds. Symphony localizes query cells within a stable low-dimensional reference embedding, facilitating reproducible downstream transfer of reference-defined annotations to the query. We demonstrate the power of Symphony in multiple real-world datasets, including (1) mapping a multi-donor, multi-species query to predict pancreatic cell types, (2) localizing query cells along a developmental trajectory of fetal liver hematopoiesis, and (3) inferring surface protein expression with a multimodal CITE-seq atlas of memory T cells.
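As a rough illustration of the annotation-transfer step mentioned above, the sketch below assigns each query cell the majority label of its k nearest reference cells in a shared low-dimensional embedding. It is not Symphony's implementation. The function name `transfer_labels`, the toy coordinates, and the use of k-nearest-neighbor voting are assumptions made for illustration, and the query cells are assumed to have already been mapped into the reference embedding.

```python
from collections import Counter
from math import dist


def transfer_labels(ref_coords, ref_labels, query_coords, k=3):
    """Assign each query cell the majority label among its k nearest reference cells."""
    predictions = []
    for q in query_coords:
        # Rank reference cells by Euclidean distance to this query cell.
        nearest = sorted(range(len(ref_coords)),
                         key=lambda i: dist(q, ref_coords[i]))[:k]
        # Majority vote over the labels of the k closest reference cells.
        votes = Counter(ref_labels[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Toy 2-D embedding with hypothetical coordinates and labels.
ref_coords = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
ref_labels = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]
query_coords = [(0.1, 0.1), (5.1, 5.0)]

print(transfer_labels(ref_coords, ref_labels, query_coords))
```

Because the reference coordinates stay fixed, repeating the vote on the same query gives the same labels, which is the sense in which a stable embedding supports reproducible transfer.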
BACKGROUND: Response to targeted therapy varies between patients for largely unknown reasons. Here, we developed and applied an integrative platform using mass spectrometry imaging (MSI), phosphoproteomics, and multiplexed tissue imaging for mapping drug distribution, target engagement, and adaptive response to gain insights into heterogeneous response to therapy.
METHODS: Patient-derived xenograft (PDX) lines of glioblastoma were treated with adavosertib, a Wee1 inhibitor, and tissue drug distribution was measured with MALDI-MSI. Phosphoproteomics was measured in the same tumors to identify biomarkers of drug target engagement and cellular adaptive response. Multiplexed tissue imaging was performed on sister sections to evaluate spatial co-localization of drug and cellular response. The integrated platform was then applied on clinical specimens from glioblastoma patients enrolled in the phase 1 clinical trial.
RESULTS: PDX tumors exposed to different doses of adavosertib revealed intra- and inter-tumoral heterogeneity of drug distribution, and integration of the heterogeneous drug distribution with phosphoproteomics and multiplexed tissue imaging revealed new markers of molecular response to adavosertib. Analysis of paired clinical specimens from patients enrolled in the phase 1 clinical trial informed the translational potential of the identified biomarkers in studying patients' response to adavosertib.
CONCLUSIONS: The multimodal platform identified a signature of drug efficacy and patient-specific adaptive responses applicable to preclinical and clinical drug development. The information generated by the approach may inform mechanisms of success and failure in future early phase clinical trials, providing information for optimizing clinical trial design and guiding future application into clinical practice.
T cells acquire a regulatory phenotype when their T cell antigen receptors (TCRs) experience an intermediate- to high-affinity interaction with a self-peptide presented via the major histocompatibility complex (MHC). Using TCRβ sequences from flow-sorted human cells, we identified TCR features that promote regulatory T cell (Treg) fate. From these results, we developed a scoring system to quantify TCR-intrinsic regulatory potential (TiRP). When applied to the tumor microenvironment, TiRP scoring helped to explain why only some T cell clones maintained the conventional T cell (Tconv) phenotype through expansion. To elucidate drivers of these predictive TCR features, we then examined the two elements of the Treg TCR ligand separately: the self-peptide and the human MHC class II molecule. These analyses revealed that hydrophobicity in the third complementarity-determining region (CDR3β) of the TCR promotes reactivity to self-peptides, while TCR variable gene (TRBV gene) usage shapes the TCR's general propensity for human MHC class II-restricted activation.
The T cell receptor (TCR) endows T cells with antigen specificity and is central to nearly all aspects of T cell function. Each naïve T cell has a unique TCR sequence that is stably maintained during cell division. In this way, the TCR serves as a molecular barcode that tracks processes such as migration, differentiation, and proliferation of T cells. Recent technological advances have enabled sequencing of the TCR from single cells alongside deep molecular phenotypes on an unprecedented scale. In this review, we discuss strengths and limitations of TCR sequences as molecular barcodes and their application to study immune responses following Programmed Death-1 (PD-1) blockade in cancer. Additionally, we consider applications of TCR data beyond use as a barcode.
As single-cell datasets grow in sample size, there is a critical need to characterize cell states that vary across samples and associate with sample attributes, such as clinical phenotypes. Current statistical approaches typically map cells to clusters and then assess differences in cluster abundance. Here we present co-varying neighborhood analysis (CNA), an unbiased method to identify associated cell populations with greater flexibility than cluster-based approaches. CNA characterizes dominant axes of variation across samples by identifying groups of small regions in transcriptional space, termed neighborhoods, that co-vary in abundance across samples, suggesting shared function or regulation. CNA performs statistical testing for associations between any sample-level attribute and the abundances of these co-varying neighborhood groups. Simulations show that CNA enables more sensitive and accurate identification of disease-associated cell states than a cluster-based approach. When applied to published datasets, CNA captures a Notch activation signature in rheumatoid arthritis, identifies monocyte populations expanded in sepsis and identifies a novel T cell population associated with progression to active tuberculosis.
Polymorphisms in the human leukocyte antigen (HLA) genes strongly influence autoimmune disease risk. HLA risk alleles may influence thymic selection to increase the frequency of T cell receptors (TCRs) reactive to autoantigens (central hypothesis). However, research in human autoimmunity has provided little evidence supporting the central hypothesis. Here we investigated the influence of HLA alleles on TCR composition at the highly diverse complementarity determining region 3 (CDR3), which confers antigen recognition. We observed unexpectedly strong HLA-CDR3 associations. The strongest association was found at HLA-DRB1 amino acid position 13, the position that mediates genetic risk for multiple autoimmune diseases. We identified multiple CDR3 amino acid features enriched by HLA risk alleles. Moreover, the CDR3 features promoted by the HLA risk alleles are more enriched in candidate pathogenic TCRs than control TCRs (for example, citrullinated epitope-specific TCRs in patients with rheumatoid arthritis). Together, these results provide genetic evidence supporting the central hypothesis.
OBJECTIVE: We quantified inflammatory burden in rheumatoid arthritis (RA) synovial tissue by using computer vision to automate the process of counting individual nuclei in hematoxylin and eosin images.
METHODS: We adapted and applied computer vision algorithms to quantify nuclei density (count of nuclei per unit area of tissue) on synovial tissue from arthroplasty samples. A pathologist validated algorithm results by labeling nuclei in synovial images that were mislabeled or missed by the algorithm. Nuclei density was compared with other measures of RA inflammation such as semiquantitative histology scores, gene-expression data, and clinical measures of disease activity.
RESULTS: The algorithm detected a median of 112,657 (range 8,160-821,717) nuclei per synovial sample. Based on pathologist-validated results, the sensitivity and specificity of the algorithm were 97% and 100%, respectively. The mean nuclei density calculated by the algorithm was significantly higher (P < 0.05) in synovium with increased histology scores for lymphocytic inflammation, plasma cells, and lining hyperplasia. Analysis of RNA sequencing identified 915 genes whose expression was significantly correlated with nuclei density (false discovery rate < 0.05). Mean nuclei density was significantly higher (P < 0.05) in patients with elevated levels of C-reactive protein, erythrocyte sedimentation rate, rheumatoid factor, and cyclized citrullinated protein antibody.
CONCLUSION: Nuclei density is a robust measurement of inflammatory burden in RA and correlates with multiple orthogonal measurements of inflammation.
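The quantities reported in this abstract follow standard definitions, sketched below. The function names, the square-millimeter unit, and all counts are hypothetical; the abstract gives neither the unit of area nor how true negatives were defined for the specificity estimate.

```python
def nuclei_density(nuclei_count, tissue_area_mm2):
    """Nuclei per unit area of tissue (unit assumed here to be mm^2)."""
    if tissue_area_mm2 <= 0:
        raise ValueError("tissue area must be positive")
    return nuclei_count / tissue_area_mm2


def sensitivity(true_pos, false_neg):
    """Fraction of real nuclei that the algorithm detected."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-nuclei that the algorithm correctly left unlabeled."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation counts, chosen only to illustrate the formulas.
print(nuclei_density(1000, 4.0))   # nuclei per mm^2
print(sensitivity(970, 30))        # missed nuclei count as false negatives
print(specificity(500, 0))         # mislabeled objects count as false positives
```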
OBJECTIVE: To facilitate patient disease subset and risk factor identification by constructing a pipeline that is generalizable, provides easily interpretable results, and allows replication by overcoming electronic health record (EHR) batch effects.
MATERIAL AND METHODS: We used 1872 billing codes in EHRs of 102 880 patients from 12 healthcare systems. Using tools borrowed from single-cell omics, we mitigated center-specific batch effects and performed clustering to identify patients with highly similar medical history patterns across the various centers. Our visualization method (PheSpec) depicts the phenotypic profile of clusters, applies a novel filtering of noninformative codes (Ranked Scope Pervasion), and indicates the most distinguishing features.
RESULTS: We observed 114 clinically meaningful profiles, for example, linking prostate hyperplasia with cancer and diabetes with cardiovascular problems and grouping pediatric developmental disorders. Our framework identified disease subsets, exemplified by 6 "other headache" clusters, where phenotypic profiles suggested different underlying mechanisms: migraine, convulsion, injury, eye problems, joint pain, and pituitary gland disorders. Phenotypic patterns replicated well, with high correlations of ≥0.75 to an average of 6 (2-8) of the 12 different cohorts, demonstrating the consistency with which our method discovers disease history profiles.
DISCUSSION: Costly clinical research ventures should be based on solid hypotheses. We repurpose methods from single-cell omics to build these hypotheses from observational EHR data, distilling useful information from complex data.
CONCLUSION: We establish a generalizable pipeline for the identification and replication of clinically meaningful (sub)phenotypes from widely available high-dimensional billing codes. This approach overcomes datatype problems and produces comprehensive visualizations of validation-ready phenotypes.
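The replication criterion described in the results (a correlation of at least 0.75 between a phenotypic profile and its counterpart in other cohorts) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the use of Pearson correlation, the representation of a profile as a vector of per-code frequencies, the function names, and the toy values are all assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def count_replications(profile, cohort_profiles, threshold=0.75):
    """Count cohorts whose matching profile correlates at or above the threshold."""
    return sum(1 for other in cohort_profiles
               if pearson(profile, other) >= threshold)


# Hypothetical per-code frequencies for one cluster in a discovery cohort
# and for its candidate counterparts in three other cohorts.
discovery = [0.9, 0.1, 0.4, 0.0]
others = [
    [0.80, 0.20, 0.50, 0.10],
    [0.10, 0.90, 0.20, 0.70],
    [0.85, 0.05, 0.35, 0.00],
]
print(count_replications(discovery, others))
```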
OBJECTIVE: Current lupus nephritis (LN) treatments are effective in only 30% of patients, emphasizing the need for novel therapeutic strategies. We undertook this study to develop mechanistic hypotheses and explore novel biomarkers by analyzing the longitudinal urinary proteomic profiles in LN patients undergoing treatment.
METHODS: We quantified 1,000 urinary proteins in 30 patients with LN at the time of the diagnostic renal biopsy and after 3, 6, and 12 months. The proteins and molecular pathways detected in the urine proteome were then analyzed with respect to baseline clinical features and longitudinal trajectories. The intrarenal expression of candidate biomarkers was evaluated using single-cell transcriptomics of renal biopsy sections from LN patients.
RESULTS: Our analysis revealed multiple biologic pathways, including chemotaxis, neutrophil activation, platelet degranulation, and extracellular matrix organization, which could be noninvasively quantified and monitored in the urine. We identified 237 urinary biomarkers associated with LN, as compared to controls without systemic lupus erythematosus. Interleukin-16 (IL-16), CD163, and transforming growth factor β mirrored intrarenal nephritis activity. Response to treatment was paralleled by a reduction in urinary IL-16, a CD4 ligand with proinflammatory and chemotactic properties. Single-cell RNA sequencing independently demonstrated that IL16 is the second most expressed cytokine by most infiltrating immune cells in LN kidneys. IL-16-producing cells were found at key sites of kidney injury.
CONCLUSION: Urine proteomics may profoundly change the diagnosis and management of LN by noninvasively monitoring active intrarenal biologic pathways. These findings implicate IL-16 in LN pathogenesis, designating it as a potentially treatable target and biomarker.
Monocytes undergo phenotypic and functional changes in response to inflammatory cues, but the molecular signals that drive different monocyte states remain largely undefined. We show that monocytes acquire macrophage markers upon glomerulonephritis and may be derived from CCR2+CX3CR1+ double-positive monocytes, which are preferentially recruited, dwell within glomerular capillaries, and acquire proinflammatory characteristics in the nephritic kidney. Mechanistically, the transition to immature macrophages begins within the vasculature and relies on CCR2 in circulating cells and TNFR2 in parenchymal cells, findings that are recapitulated in vitro with monocytes cocultured with TNF-TNFR2-activated endothelial cells generating CCR2 ligands. Single-cell RNA sequencing of cocultures defines a CCR2-dependent monocyte differentiation path associated with the acquisition of immune effector functions and generation of CCR2 ligands. Immature macrophages are detected in the urine of lupus nephritis patients, and their frequency correlates with clinical disease. In conclusion, CCR2-dependent functional specialization of monocytes into macrophages begins within the TNF-TNFR2-activated vasculature and may establish a CCR2-based autocrine, feed-forward loop that amplifies renal inflammation.
Non-coding genetic variants may cause disease by modulating gene expression. However, identifying these expression quantitative trait loci (eQTLs) is complicated by differences in gene regulation across fluid functional cell states within cell types. These states, for example neurotransmitter-driven programs in astrocytes or perivascular fibroblast differentiation, are obscured in eQTL studies that aggregate cells [1,2]. Here we modelled eQTLs at single-cell resolution in one complex cell type: memory T cells. Using more than 500,000 unstimulated memory T cells from 259 Peruvian individuals, we show that around one-third of 6,511 cis-eQTLs had effects that were mediated by continuous multimodally defined cell states, such as cytotoxicity and regulatory capacity. In some loci, independent eQTL variants had opposing cell-state relationships. Autoimmune variants were enriched in cell-state-dependent eQTLs, including risk variants for rheumatoid arthritis near ORMDL3 and CTLA4; this indicates that cell-state context is crucial to understanding potential eQTL pathogenicity. Moreover, continuous cell states explained more variation in eQTLs than did conventional discrete categories, such as CD4+ versus CD8+, suggesting that modelling eQTLs and cell states at single-cell resolution can expand insight into gene regulation in functionally heterogeneous cell types.
BACKGROUND: Pro-inflammatory fibroblasts are critical for pathogenesis in rheumatoid arthritis, inflammatory bowel disease, interstitial lung disease, and Sjögren's syndrome and represent a novel therapeutic target for chronic inflammatory disease. However, the heterogeneity of fibroblast phenotypes, exacerbated by the lack of a common cross-tissue taxonomy, has limited our understanding of which pathways are shared by multiple diseases.
METHODS: We profiled fibroblasts derived from inflamed and non-inflamed synovium, intestine, lungs, and salivary glands from affected individuals with single-cell RNA sequencing. We integrated all fibroblasts into a multi-tissue atlas to characterize shared and tissue-specific phenotypes.
FINDINGS: Two shared clusters, CXCL10+CCL19+ immune-interacting and SPARC+COL3A1+ vascular-interacting fibroblasts, were expanded in all inflamed tissues and mapped to dermal analogs in a public atopic dermatitis atlas. We confirmed these human pro-inflammatory fibroblasts in animal models of lung, joint, and intestinal inflammation.
CONCLUSIONS: This work represents a thorough investigation into fibroblasts across organ systems, individual donors, and disease states that reveals shared pathogenic activation states across four chronic inflammatory diseases.
FUNDING: Grant from F. Hoffmann-La Roche (Roche) AG.
We investigated whether ancestry-specific genetic factors affect tuberculosis (TB) progression risk in a cohort of admixed Peruvians. We genotyped 2,105 patients with TB and 1,320 household contacts (HHCs) who were infected with Mycobacterium tuberculosis (M. tb) but did not develop TB and inferred each individual's proportion of native Peruvian genetic ancestry. Our HHC study design and our data on potential confounders allowed us to demonstrate increased risk independent of socioeconomic factors. A 10% increase in individual-level native Peruvian genetic ancestry proportion corresponded to a 25% increased TB progression risk. This corresponds to a 3-fold increased risk for individuals in the highest decile of native Peruvian genetic ancestry versus the lowest decile, making native Peruvian genetic ancestry comparable in effect to clinical factors such as diabetes. Our results suggest that genetic ancestry is a major contributor to TB progression risk and highlight the value of including diverse populations in host genetic studies.
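The two risk figures in this abstract can be related with a short calculation, assuming risk scales multiplicatively with ancestry proportion (a log-linear model, which the abstract does not state explicitly). Under that assumption, a 25% increase per 10 percentage points reaches a 3-fold increase at a difference of roughly 49 percentage points, which would then be the implied gap between the highest and lowest deciles. The function names below are hypothetical.

```python
from math import log


def relative_risk(delta_ancestry, rr_per_step=1.25, step=0.10):
    """Relative risk for a given difference in ancestry proportion,
    assuming risk multiplies by rr_per_step for every `step` increase."""
    return rr_per_step ** (delta_ancestry / step)


def ancestry_gap_for(target_rr, rr_per_step=1.25, step=0.10):
    """Difference in ancestry proportion that yields target_rr under the same model."""
    return step * log(target_rr) / log(rr_per_step)


gap = ancestry_gap_for(3.0)
print(f"implied ancestry difference: {gap:.2f}")
print(f"relative risk at that difference: {relative_risk(gap):.2f}")
```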
Juvenile dermatomyositis (JDM) is a rare, severe autoimmune disease and the most common idiopathic inflammatory myopathy of children. JDM and adult-onset dermatomyositis (DM) have similar clinical, biological and serological features, although these features differ in prevalence between childhood-onset and adult-onset disease, suggesting that age of disease onset may influence pathogenesis. Therefore, a JDM-focused genetic analysis was performed using the largest collection of JDM samples to date. Caucasian JDM samples (n = 952) obtained via international collaboration were genotyped using the Illumina HumanCoreExome chip. Additional non-assayed human leukocyte antigen (HLA) loci and genome-wide single-nucleotide polymorphisms (SNPs) were imputed. HLA-DRB1*03:01 was confirmed as the classical HLA allele most strongly associated with JDM [odds ratio (OR) 1.66; 95% confidence interval (CI) 1.46, 1.89; P = 1.4 × 10⁻¹⁴], with an independent association at HLA-C*02:02 (OR = 1.74; 95% CI 1.42, 2.13; P = 7.13 × 10⁻⁸). Analyses of amino acid positions within HLA-DRB1 indicated that the strongest association was at position 37 (omnibus P = 3.3 × 10⁻¹⁹), with suggestive evidence that this association was independent of position 74 (omnibus P = 5.1 × 10⁻⁵), the position most strongly associated with adult-onset DM. Conditional analyses also suggested that the association at position 37 of HLA-DRB1 was independent of some alleles of the Caucasian HLA 8.1 ancestral haplotype (AH8.1) such as HLA-DQB1*02:01 (OR = 1.62; 95% CI 1.36, 1.93; P = 8.70 × 10⁻⁸), but not HLA-DRB1*03:01 (OR = 1.49; 95% CI 1.24, 1.80; P = 2.24 × 10⁻⁵). No associations outside the HLA region were identified. Our findings confirm previous associations with AH8.1, HLA-DRB1*03:01 and HLA-C*02:02, and identify a novel association with amino acid position 37 within HLA-DRB1, which may distinguish JDM from adult DM.
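As a consistency check on the reported statistics, an odds ratio and its 95% confidence interval can be converted back to an approximate P value. The sketch below assumes a two-sided Wald test with a confidence interval that is symmetric on the log-odds scale; the abstract does not state which test was used, and the function name is hypothetical. For the HLA-DRB1*03:01 estimate (OR 1.66; 95% CI 1.46, 1.89), this gives a value on the order of 10⁻¹⁴, in line with the reported P value, with some slack expected from rounding of the published inputs.

```python
from math import erfc, log, sqrt


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided Wald P value from an odds ratio and its 95% CI.

    Assumes the interval is log(OR) +/- z_crit * SE on the log-odds scale.
    """
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)
    z = log(odds_ratio) / se
    # Two-sided normal tail probability: P(|Z| > z) = erfc(|z| / sqrt(2)).
    p = erfc(abs(z) / sqrt(2))
    return z, p


z, p = p_from_or_ci(1.66, 1.46, 1.89)
print(f"z = {z:.2f}, two-sided P = {p:.1e}")
```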
Background: Immunosuppressive and anti-cytokine treatment may have a protective effect for patients with COVID-19. Understanding the immune cell states shared between COVID-19 and other inflammatory diseases with established therapies may help nominate immunomodulatory therapies.
Methods: To identify cellular phenotypes that may be shared across tissues affected by disparate inflammatory diseases, we developed a meta-analysis and integration pipeline that models and removes the effects of technology, tissue of origin, and donor that confound cell-type identification. Using this approach, we integrated > 300,000 single-cell transcriptomic profiles from COVID-19-affected lungs and tissues from healthy subjects and patients with five inflammatory diseases: rheumatoid arthritis (RA), Crohn's disease (CD), ulcerative colitis (UC), systemic lupus erythematosus (SLE), and interstitial lung disease. We tested the association of shared immune states with severe/inflamed status compared to healthy control using mixed-effects modeling. To define environmental factors within these tissues that shape shared macrophage phenotypes, we stimulated human blood-derived macrophages with defined combinations of inflammatory factors, emphasizing in particular antiviral interferons IFN-beta (IFN-β) and IFN-gamma (IFN-γ), and pro-inflammatory cytokines such as TNF.
Results: We built an immune cell reference consisting of > 300,000 single-cell profiles from 125 healthy or disease-affected donors from COVID-19 and five inflammatory diseases. We observed a CXCL10+ CCL2+ inflammatory macrophage state that is shared and strikingly abundant in severe COVID-19 bronchoalveolar lavage samples, inflamed RA synovium, inflamed CD ileum, and UC colon. These cells exhibited a distinct arrangement of pro-inflammatory and interferon response genes, including elevated levels of CXCL10, CXCL9, CCL2, CCL3, GBP1, STAT1, and IL1B. Further, we found this macrophage phenotype is induced upon co-stimulation by IFN-γ and TNF-α.
Conclusions: Our integrative analysis identified immune cell states shared across inflamed tissues affected by inflammatory diseases and COVID-19. Our study supports a key role for IFN-γ together with TNF-α in driving an abundant inflammatory macrophage phenotype in severe COVID-19-affected lungs, as well as inflamed RA synovium, CD ileum, and UC colon, which may be targeted by existing immunomodulatory therapies.
Multimodal T cell profiling can enable more precise characterization of elusive cell states underlying disease. Here, we integrated single-cell RNA and surface protein data from 500,089 memory T cells to define 31 cell states from 259 individuals in a Peruvian tuberculosis (TB) progression cohort. At immune steady state >4 years after infection and disease resolution, we found that, after accounting for significant effects of age, sex, season and genetic ancestry on T cell composition, a polyfunctional type 17 helper T (TH17) cell-like effector state was reduced in abundance and function in individuals who previously progressed from Mycobacterium tuberculosis (M.tb) infection to active TB disease. These cells are capable of responding to M.tb peptides. Deconvoluting this state, which is uniquely identifiable with multimodal analysis, from public data demonstrated that its depletion may precede and persist beyond active disease. Our study demonstrates the power of integrative multimodal single-cell profiling to define cell states relevant to disease and other traits.
Background: Juvenile idiopathic arthritis (JIA) is a heterogeneous disease, the signs and symptoms of which can be summarised with use of composite disease activity measures, including the clinical Juvenile Arthritis Disease Activity Score (cJADAS). However, clusters of children and young people might experience different global patterns in their signs and symptoms of disease, which might run in parallel or diverge over time. We aimed to identify such clusters in the 3 years after a diagnosis of JIA. The identification of these clusters would allow for a greater understanding of disease progression in JIA, including how physician-reported and patient-reported outcomes relate to each other over the JIA disease course.
Methods: In this multicentre prospective longitudinal study, we included children and young people recruited before Jan 1, 2015, to the Childhood Arthritis Prospective Study (CAPS), a UK multicentre inception cohort. Participants without a cJADAS score were excluded. To assess groups of children and young people with similar disease patterns in active joint count, physician's global assessment, and patient or parental global evaluation, we used latent profile analysis at initial presentation to paediatric rheumatology and multivariate group-based trajectory models for the following 3 years. Optimal models were selected on the basis of a combination of model fit, clinical plausibility, and model parsimony.
Findings: Between Jan 1, 2001, and Dec 31, 2014, 1423 children and young people with JIA were recruited to CAPS, 239 of whom were excluded, resulting in a final study population of 1184 children and young people. We identified five clusters at baseline and six trajectory groups using longitudinal follow-up data.
Disease course was not well predicted from clusters at baseline; however, in both cross-sectional and longitudinal analyses, substantial proportions of children and young people had high patient or parent global scores despite low or improving joint counts and physician global scores. Participants in these groups were older, and a higher proportion of them had enthesitis-related JIA and lower socioeconomic status, compared with those in other groups.
Interpretation: Almost one in four children and young people with JIA in our study reported persistent, high patient or parent global scores despite having low or improving active joint counts and physician's global scores. Distinct patient subgroups defined by disease manifestation or trajectories of progression could help to better personalise health-care services and treatment plans for individuals with JIA.
Funding: Medical Research Council, Versus Arthritis, Great Ormond Street Hospital Children's Charity, Olivia's Vision, and National Institute for Health Research.
Khan A, Shang N, Petukhova L, Zhang J, Shen Y, Hebbring SJ, Moncrieffe H, Kottyan LC, Namjou-Khales B, Knevel R, Raychaudhuri S, Karlson EW, Harley JB, Stanaway IB, Crosslin D, Denny JC, Elkind MSV, Gharavi AG, Hripcsak G, Weng C, Kiryluk K. Medical Records-Based Genetic Studies of the Complement System. Journal of the American Society of Nephrology 2021;32(8):2031-2047.
The complement pathway represents one of the critical arms of the innate immune system. We combined genome-wide and phenome-wide association studies using medical records data for C3 and C4 levels to discover common genetic variants controlling systemic complement activation. Three genome-wide significant loci had large effects on complement levels. These loci encode three critical complement genes: CFH, C3, and C4. We performed detailed functional annotations of the significant loci, including multiallelic copy number variant analysis of the C4 locus to define two structural genomic variants with large effects on C4 levels. Blood C4 levels were strongly correlated with the copy number of C4A and C4B genes. Lastly, using genome-wide genetic correlations and electronic health records–based phenome-wide association studies in 102,138 participants, we catalogued a spectrum of human diseases genetically related to systemic complement activation, including inflammatory, autoimmune, cardiometabolic, and kidney diseases.
Background: Genetic variants in complement genes have been associated with a wide range of human disease states, but well-powered genetic association studies of complement activation have not been performed in large multiethnic cohorts.
Methods: We performed medical records–based genome-wide and phenome-wide association studies for plasma C3 and C4 levels among participants of the Electronic Medical Records and Genomics (eMERGE) network.
Results: In a GWAS for C3 levels in 3949 individuals, we detected two genome-wide significant loci: chr.1q31.3 (CFH locus; rs3753396-A; β=0.20; 95% CI, 0.14 to 0.25; P=1.52×10⁻¹¹) and chr.19p13.3 (C3 locus; rs11569470-G; β=0.19; 95% CI, 0.13 to 0.24; P=1.29×10⁻⁸). These two loci explained approximately 2% of variance in C3 levels. GWAS for C4 levels involved 3998 individuals and revealed a genome-wide significant locus at chr.6p21.32 (C4 locus; rs3135353-C; β=0.40; 95% CI, 0.34 to 0.45; P=4.58×10⁻³⁵).
This locus explained approximately 13% of variance in C4 levels. The multiallelic copy number variant analysis defined two structural genomic C4 variants with large effect on blood C4 levels: C4-BS (β=-0.36; 95% CI, -0.42 to -0.30; P=2.98×10⁻²²) and C4-AL-BS (β=0.25; 95% CI, 0.21 to 0.29; P=8.11×10⁻²³). Overall, C4 levels were strongly correlated with copy numbers of C4A and C4B genes. In comprehensive phenome-wide association studies involving 102,138 eMERGE participants, we cataloged a full spectrum of autoimmune, cardiometabolic, and kidney diseases genetically related to systemic complement activation.
Conclusions: We discovered genetic determinants of plasma C3 and C4 levels using eMERGE genomic data linked to electronic medical records. Genetic variants regulating C3 and C4 levels have large effects and multiple clinical correlations across the spectrum of complement-related diseases in humans.
Many diseases exhibit population-specific causal effect sizes with trans-ethnic genetic correlations significantly less than 1, limiting trans-ethnic polygenic risk prediction. We develop a new method, S-LDXR, for stratifying squared trans-ethnic genetic correlation across genomic annotations, and apply S-LDXR to genome-wide summary statistics for 31 diseases and complex traits in East Asians (average N = 90K) and Europeans (average N = 267K) with an average trans-ethnic genetic correlation of 0.85. We determine that squared trans-ethnic genetic correlation is 0.82× (s.e. 0.01) depleted in the top quintile of background selection statistic, implying more population-specific causal effect sizes. Accordingly, causal effect sizes are more population-specific in functionally important regions, including conserved and regulatory regions. In regions surrounding specifically expressed genes, causal effect sizes are most population-specific for skin and immune genes, and least population-specific for brain genes. Our results could potentially be explained by stronger gene-environment interaction at loci impacted by selection, particularly positive selection.
The recent development of imputation methods enabled the prediction of human leukocyte antigen (HLA) alleles from intergenic SNP data, allowing studies to fine-map HLA for immune phenotypes. Here we report an accurate HLA imputation method, CookHLA, which has superior imputation accuracy compared to previous methods. CookHLA differs from other approaches in that it locally embeds prediction markers into highly polymorphic exons to account for exonic variability, and in that it adaptively learns the genetic map within MHC from the data to facilitate imputation. Our benchmarking with real datasets shows that our method achieves high imputation accuracy in a wide range of scenarios, including situations where the reference panel is small or ethnically unmatched.