Polymorphisms in the human leukocyte antigen (HLA) genes within the major histocompatibility complex (MHC) locus strongly influence autoimmune disease risk [1–5]. Two non-exclusive hypotheses exist about the pathogenic role of HLA alleles: (i) the central hypothesis, where HLA risk alleles influence thymic selection so that the probability of T cell receptors (TCRs) reactive to pathogenic antigens is increased [6–8]; and (ii) the peripheral hypothesis, where HLA risk alleles increase the affinity for pathogenic antigens [9–11]. The peripheral hypothesis has been the main research focus in autoimmunity, while human data on the central hypothesis are lacking. Here, we investigated the influence of HLA alleles on TCR composition at the highly diverse complementarity-determining region 3 (CDR3), where the TCR recognizes antigens. We demonstrated unexpectedly strong HLA-CDR3 associations. The strongest association was found at HLA-DRB1 amino acid position 13 (n = 628 subjects, explained variance = 9.4%; P = 4.1 × 10^−138). This HLA position mediates genetic risk for multiple autoimmune diseases. In structural analysis of TCR-peptide-MHC complexes, we observed that HLA-DRB1 position 13 does not interact directly with CDR3, but is proximate to antigenic peptide residues that are also close to CDR3. We identified multiple CDR3 amino acid features enriched by HLA risk alleles; for example, the risk alleles of rheumatoid arthritis, type 1 diabetes, and celiac disease all increase the hydrophobicity of CDR3 position 109 (P < 2.1 × 10^−5). In the setting of celiac disease, the CDR3 features favored by HLA risk alleles are more enriched among candidate pathogenic TCRs than among control TCRs (P = 2.4 × 10^−6 for gliadin-specific TCRs). Together, these results provide novel genetic evidence supporting the central hypothesis.
As advances in single-cell technologies enable the unbiased assay of thousands of cells simultaneously, human disease studies can identify clinically associated cell states using case-control study designs. These studies require precious clinical samples and costly technologies; therefore, it is critical to employ study design principles that maximize power to detect cell state frequency shifts between conditions, such as disease versus healthy. Here, we present the single-cell Power Simulation Tool (scPOST), a method that enables users to estimate power under different study designs. To approximate the specific experimental and clinical scenarios being investigated, scPOST takes prototype (public or pilot) single-cell data as input and generates large numbers of single-cell datasets in silico. We use scPOST to perform power analyses on three independent single-cell datasets that span diverse experimental conditions: a batch-corrected 21-sample rheumatoid arthritis dataset (5,265 cells) from synovial tissue, a 259-sample tuberculosis progression dataset (496,517 memory T cells) from peripheral blood mononuclear cells (PBMCs), and a 30-sample ulcerative colitis dataset (235,229 cells) from intestinal biopsies. Over thousands of simulations, we consistently observe that power to detect frequency shifts in cell states is maximized by larger numbers of independent clinical samples, reduced batch effects, and smaller variation in a cell state's frequency across samples.
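The general idea of simulation-based power estimation can be illustrated with a minimal Python sketch. This is not scPOST's implementation: it does not take prototype data as input and does not model batch effects. It assumes a single cell state whose per-sample frequency varies on the logit scale, and it uses a permutation test on mean frequency as a stand-in for whatever association test a real study would use. All function names and default parameter values (`estimate_power`, `shift`, `sample_sd`, and so on) are hypothetical choices made for illustration.

```python
import math
import random


def simulate_frequencies(n_samples, base_logit, shift, sample_sd, n_cells, rng):
    """Return the observed frequency of one cell state in each simulated sample."""
    freqs = []
    for _ in range(n_samples):
        # Assumption: sample-to-sample variation is Gaussian on the logit scale.
        logit = base_logit + shift + rng.gauss(0.0, sample_sd)
        p = 1.0 / (1.0 + math.exp(-logit))
        # Binomial sampling of cells, drawn as a sum of Bernoulli trials.
        hits = sum(rng.random() < p for _ in range(n_cells))
        freqs.append(hits / n_cells)
    return freqs


def permutation_pvalue(cases, controls, n_perm, rng):
    """Two-sided permutation test on the difference in mean frequency."""
    observed = abs(sum(cases) / len(cases) - sum(controls) / len(controls))
    pooled = cases + controls
    k = len(cases)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)


def estimate_power(n_per_group, shift=0.8, sample_sd=0.8, base_freq=0.1,
                   n_cells=200, n_sims=100, n_perm=99, alpha=0.05, seed=0):
    """Fraction of simulated case-control studies in which the shift is detected."""
    rng = random.Random(seed)
    base_logit = math.log(base_freq / (1.0 - base_freq))
    detected = 0
    for _ in range(n_sims):
        controls = simulate_frequencies(
            n_per_group, base_logit, 0.0, sample_sd, n_cells, rng)
        cases = simulate_frequencies(
            n_per_group, base_logit, shift, sample_sd, n_cells, rng)
        if permutation_pvalue(cases, controls, n_perm, rng) < alpha:
            detected += 1
    return detected / n_sims


if __name__ == "__main__":
    # Compare estimated power across hypothetical study sizes.
    for n in (5, 10, 30):
        print(n, estimate_power(n))
```

Under these assumptions, raising `n_per_group` or lowering `sample_sd` should increase the estimated power, in line with the qualitative conclusions stated above. The estimates are noisy at `n_sims=100`, so a real analysis would use many more simulations.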