John Todd, Linda Wicker, and colleagues (1) published their first genome-wide linkage study of type 1 diabetes (T1D) using the diabetes-prone NOD mouse in 1991. This study first appeared as a featured article in Nature and is reprinted in this month’s issue of The Journal of Immunology. The genetic basis of susceptibility to T1D was poorly understood in 1991, although previous studies had identified the NOD-derived H-2g7 MHC as a crucial genetic component of disease (2, 3). However, homozygosity for H-2g7 alone was insufficient for T1D development, leading to the postulate that additional recessive loci in the NOD genome were also required (2, 3). Todd et al. (1) focused on identifying these additional recessive loci using a novel backcross strategy in which H-2g7 was eliminated from the screen through the analysis of (B10.H-2g7 × NOD)F1 × NOD backcross progeny, in which all progeny were H-2g7 homozygotes. They produced a cohort of 920 progeny in this backcross and identified 96 diabetic progeny (11%) for detailed studies. Segregation analysis with 53 informative genetic markers distributed throughout the mouse genome identified two recessive T1D risk loci, which they designated Idd3 and Idd4. Several cutting edge technologies were pioneered in mouse genetics in this study, including the use of polymorphic microsatellite markers to perform genome-wide linkage analysis in mouse crosses and the use of interval mapping for the localization of autoimmune disease loci (4, 5). Most importantly, this study described an approach that could identify disease risk loci anywhere in the genome, thus fostering the hope that this genetic strategy could ultimately identify all genes that cause autoimmunity and other complex genetic diseases.
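The logic of segregation analysis in such a backcross can be illustrated with a minimal sketch. At a marker unlinked to any susceptibility locus, diabetic backcross progeny are expected to be NOD homozygotes or heterozygotes in a 1:1 ratio, whereas a marker linked to a recessive risk locus should show an excess of NOD homozygotes. The function name and the genotype counts below are hypothetical, chosen only for illustration; they are not data from Todd et al. (1), and the simple chi-square test shown here is a stand-in for, not a reproduction of, the interval mapping used in the study.

```python
import math


def backcross_chi_square(n_homozygous, n_heterozygous):
    """Chi-square test (1 df) for deviation from the 1:1 genotype ratio
    expected at an unlinked marker among affected backcross progeny.

    Returns (chi_square, p_value).
    """
    total = n_homozygous + n_heterozygous
    if total == 0:
        raise ValueError("at least one genotyped animal is required")
    expected = total / 2.0
    chi2 = ((n_homozygous - expected) ** 2
            + (n_heterozygous - expected) ** 2) / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts for one marker typed in 96 diabetic progeny.
chi2, p = backcross_chi_square(n_homozygous=66, n_heterozygous=30)
print(f"chi-square = {chi2:.2f}, P = {p:.2g}")
```

With these illustrative counts the chi-square value is 13.5 and the P value is well below 0.001, the kind of skew that would flag a marker as linked to a recessive risk locus. In a genome scan, the same test is repeated at every marker, so a more stringent significance threshold is needed than for a single test.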
The findings of this study were substantial and important; however, the stimulatory impact of this publication on the field of genetics and autoimmunity was much greater and long-lasting. The study launched an era of linkage studies of autoimmune phenotypes in animal models and human patient populations and undoubtedly led to a significant influx of young investigators into the field of immunogenetics (6–14). Todd, Wicker, and colleagues were in the forefront of these studies, and their efforts, together with those of several other investigators, ultimately identified >32 risk loci for diabetes in the NOD mouse (15–21). A multitude of research groups moved into genetic analyses of complex genetic diseases during the next 10 years. As the ease and accuracy of such studies increased through the development of new technologies (4, 22), >2000 risk loci (23), commonly termed quantitative trait loci (QTLs), were identified in a diverse array of diseases and medically relevant complex phenotypes. A wealth of linkage data were assembled during this era, and the consensus opinion was that a bountiful “harvest” of disease genes would soon occur and provide important new insights into disease processes (24, 25). The successful sequencing of the human and mouse genomes provided a wealth of information about the genes that were located within the QTLs, thus providing an array of intriguing candidate disease genes in such loci. Sadly, the precise identification of causative disease alleles within QTLs proved much more formidable than initially envisaged.
The standard approach for disease gene identification following linkage analysis in a single gene system is to fine map the locus into an extremely small genomic interval, sequence and compare the parental genomes, and identify the causal variant among an abbreviated list of candidates. However, linkage studies of diabetes and other autoimmune diseases identified a multitude of risk loci additively contributing to disease susceptibility, none of which was essential for disease development (8, 26). As a result, the standard approach was untenable. Fortunately, an alternative strategy, termed congenic dissection, had been developed in the 1940s by George Snell (27) and used to identify the H-2 complex among a wealth of other loci that impact histocompatibility. Basically, congenic dissection isolates individual QTL genomic segments by breeding the locus from one strain onto another to create a congenic strain, thus dissecting a polygenic disease into a series of “single gene” disease models. The identification of a component disease phenotype controlled by the isolated QTL is an essential element in this strategy. Once the congenic strains are produced, then a series of recombinants across the congenic interval are produced, and the location of the causative allele is “fine mapped” to a small interval (generally <1 Mb), usually by in vivo analysis of the component phenotype that it controls among the recombinants. The causative disease variant can then be identified by a combination of DNA sequencing and genotype/phenotype analysis, followed by confirmation with genetic knockins or other molecular strategies (24).
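The fine-mapping step can be sketched in a few lines, under the simplifying assumption that a single causative variant underlies the component phenotype. Each recombinant congenic strain is described by the boundaries of its donor-derived segment and by whether it retains the phenotype. The causative variant must lie within every segment that retains the phenotype and outside every segment that has lost it. The function name, coordinates, and phenotype calls below are hypothetical and serve only to illustrate the reasoning.

```python
def narrow_interval(recombinants):
    """Narrow a congenic interval from recombinant strains.

    recombinants: list of (donor_start_mb, donor_end_mb, has_phenotype).
    Returns (start_mb, end_mb) of the smallest interval consistent with
    a single causative variant.
    """
    positives = [(s, e) for s, e, p in recombinants if p]
    negatives = [(s, e) for s, e, p in recombinants if not p]
    if not positives:
        raise ValueError("need at least one strain that retains the phenotype")

    # The variant lies in the overlap of all phenotype-positive segments.
    lo = max(s for s, _ in positives)
    hi = min(e for _, e in positives)
    if lo >= hi:
        raise ValueError("positive segments do not overlap")

    # Phenotype-negative segments exclude the portion they cover at
    # either edge of the candidate interval. Repeat until stable.
    changed = True
    while changed:
        changed = False
        for s, e in negatives:
            if s <= lo and e >= hi:
                raise ValueError("data inconsistent with a single variant")
            if s <= lo < e:
                lo = e
                changed = True
            elif s < hi <= e:
                hi = s
                changed = True
    return lo, hi


# Hypothetical strains: (donor start in Mb, donor end in Mb, phenotype retained)
strains = [
    (0.0, 10.0, True),   # original congenic strain
    (0.0, 6.0, True),
    (3.0, 10.0, True),
    (0.0, 3.5, False),
    (4.2, 10.0, False),
]
print(narrow_interval(strains))
```

In this illustration the candidate interval shrinks from 10 Mb to the segment between 3.5 and 4.2 Mb. The sketch raises an error when a phenotype-negative strain spans the whole candidate interval, which is one signature of multiple contributing genes under a single QTL peak.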
The application of Snell’s congenic dissection strategy to QTLs proved difficult owing to several characteristics of the disease genes and genomic intervals that underlie QTLs for complex diseases. First and foremost, virtually all of the risk loci identified in complex disease models have small effects on disease susceptibility. Disease develops as a consequence of their combined actions. As a result, when risk loci are studied individually, their contribution to the disease phenotype is difficult to detect, often requiring the analysis of large cohorts of aged mice. Second, in many instances, multiple disease genes are located under a single QTL peak, which leads to complications in fine mapping and further weakens individual effect sizes (28–30). In some cases, reasonably robust QTLs slowly seem to evaporate during the production and analysis of congenic recombinants (30). Finally, the genomic intervals identified by many QTLs have highly divergent sequences between the parental strains. Laboratory inbred strains were created from stocks of domesticated mice that were originally bred by “mouse fanciers,” using breeding stock obtained from locations all over the world (31). As a result, standard laboratory strains have genomic segments derived from multiple mouse subspecies. Although Mus musculus domesticus contributed most to the genomes in laboratory strains, a small component of each was contributed by subspecies such as Mus musculus musculus or Mus musculus castaneus. Because Mus subspecies genomes often differ by 1% of their sequence (32), genome sequence comparisons between laboratory strains reveal a mosaic of diversity levels in various segments (33), with some regions in a specific strain comparison exhibiting extensive sequence divergence and others much less. Not surprisingly, many of the QTLs detected among laboratory strains mapped into highly divergent genomic segments that differed in subspecies origin between the parental strains. 
As a result, sequence analysis often detected a plethora of potentially functional variations in several genes within the candidate genomic segment defined by congenic dissection. In summary, as teams of investigators attempted to define the causative variants located in QTLs, they were commonly working with weak phenotypes, mediated by multiple genes, with several potential candidate variations in each fine-mapped locus. For these reasons, the identification of causative variants in murine QTLs was not a simple undertaking.
Todd, Wicker, and colleagues were in the vanguard of the movement into congenic dissection and causal gene identification in NOD mice (28, 34–38). Their most successful efforts came in the characterization of Idd3 and Idd5, two loci that were revealed by their early linkage studies and found to have relatively potent effects on disease. For Idd3, their congenic dissection analysis localized the causal variants to an interval of ∼700 kb that was found to contain at least five functional genes, including Il2 and Il21, both valid candidates for a role in diabetes (37, 39). Detailed functional analyses ultimately demonstrated that the NOD allele encodes an IL-2 molecule with decreased functional properties and differential expression levels in specific cell lineages (15). Deficiencies in IL-2 production are correlated with accelerated susceptibility to diabetes in the NOD mouse, presumably via decreased development of regulatory T cells (40, 41). Thus, a solid case has been made for Il2 as the causative disease allele for Idd3. In contrast, other investigators have reported that increased expression of IL-21 increases diabetes pathogenesis and that the NOD allele of Il21 exhibits increased expression levels (42). Thus, analyses of both cytokine genes have yielded data supporting their candidacy as the causal disease allele, and as a result, both NOD-derived cytokine alleles in Idd3 are thought to contribute to the disease phenotype (43). Analogous studies of Idd5 on murine chromosome 1 identified two subintervals with phenotypic effects on diabetes incidence in NOD mice, and one of these (Idd5.1) contained CTLA4 (44). The NOD allele of CTLA4 produces an aberrant splice variant lacking the CD80/CD86 ligand binding domain found in intact CTLA-4, thereby inhibiting the downregulation of T cell effector responses that are normally mediated by intact CTLA-4 (45).
Characterizations of the remaining 30 identified risk loci in NOD are still ongoing in many laboratories, including the laboratories of Todd, Wicker, and colleagues. Congenic dissections have identified relevant candidate genes with functional variants at 18 of these loci thus far, and in some instances analyses have established a link between the NOD allele and disease-relevant processes in autoimmune diabetes. Idd3 and Idd5, for instance, both have been shown to impact the functional properties of multiple cell lineages in NOD, including tolerance induction by dendritic cells and CD4+ T cells (46, 47). B cell tolerance is also impaired in NOD, and this phenotype is impacted in part by Idd5 and Idd9/11 (48). Although most of these disease alleles impact the immune system, some loci, such as Idd9, also affect the susceptibility of pancreatic β islet cells to autoimmune targeting and destruction (49). Overall, analyses of Idd risk alleles of NOD have provided important insights into diabetes pathogenesis, although progress has slowed recently as characterizations of the genetic basis for human diabetes have gained momentum.
Todd and Wicker initiated genome-wide linkage studies in human diabetic patients soon after their publication on the NOD mouse and have had leadership roles in the characterization of risk alleles for human diabetes (50–56). During this 20-year period, human genetics and genomics have made stunning advances. The successful completion of the human genome project not only provided an invaluable database for human genetics, but also spawned the creation of a variety of new technologies. The development of high-throughput single nucleotide polymorphism genotyping, next generation sequencing, and genome-wide association analytics, coupled with the assembly of massive cohorts of clinically well-characterized patients and controls, has allowed rapid advances in the genetics of human autoimmunity (57). Genome-wide association analyses, which involve genotyping thousands of patients and controls with massive panels of single nucleotide polymorphisms, typically lead to the localization of risk loci to small genomic segments in which all of the variations are in extremely strong linkage disequilibrium and form stable haplotypes within populations. With the exception of the HLA complex, most human risk loci are localized into “linkage disequilibrium blocks” that span <200 kb, thus allowing the rapid identification of short lists of candidate causal genes for each risk locus identified in a genome scan (58, 59). Genome-wide association study analysis has been the key strategy in the discovery of risk loci for common diseases, ultimately making this process much more efficient in humans than was ever possible in the mouse.
Todd, Wicker, and colleagues have been important contributors to the development and application of all of these technologies to the study of human autoimmunity. They and their collaborators have identified >40 risk loci associated with predisposition to diabetes (60). Several of these human risk loci are syntenic with NOD loci or impact similar molecular pathways via other genetic mechanisms. For example, although Idd3 is not syntenic with a risk locus for human diabetes, the IL-2 receptor (IL-2RA) has been identified as a human diabetes risk locus (61). A total of four other Idd loci in NOD are syntenic with human diabetes loci, and in some instances, the functional lesions are quite similar. For example, the MHC class II genes (Idd1 and IDDM1) are the most potent risk loci in humans and mice, and the associated disease alleles are structurally very similar (62). The CTLA4 risk loci (Idd5.1 and IDDM12) are also functionally analogous in that the risk-associated alleles of both species encode CTLA-4 molecules with decreased functional expression (44). Taken together, these results support the relevance of the NOD mouse model to human diabetes susceptibility.
It has been 23 years since John Todd, Linda Wicker, and colleagues led the charge of the genetics community into the complex genetics of autoimmune diseases. Since this beginning, billions of dollars have been invested in the efforts of an army of investigators with the goal of identifying the causative genes for human autoimmune disease. What have we learned? First, the genetic basis of predisposition to these diseases is much more complex than had been anticipated in 1991, and simple answers are few and far between. Susceptibility to autoimmunity in mice or humans is not mediated by a few potent genes, but instead by dozens of relatively weak ones. However, steady improvements in technology and the recruitment of massive cohorts of human patients have led to the identification of >40 common T1D loci. Recent analyses indicate that this list should account for >80% of T1D genetic heritability in human populations (63–65). Thus, all (or at least most) of the important contributing genes have been assembled, although the manner in which they interact to cause disease remains an enigma. Second, polygenic autoimmune diseases are not the result of a single dysregulated process, but instead represent a collection of related disease processes with a similar endpoint (i.e., a syndrome). The genetic diversity observed in patient populations supports the hypothesis that refined genetic analyses may cluster patients into specific subpopulations that potentially exhibit common disease components and responses to therapy. If so, then a personalized genome analysis may provide clinically valuable information and enhance the efficacy of disease treatment.
In closing, the genetics of autoimmunity is now transitioning from gene discovery to defining the functional mechanisms by which genetic predisposition translates into disease. It is reasonable to predict that most of the causal variants and primary functional effects of disease alleles will be delineated in the next few years and that the quantification of genetic risk for disease will become feasible and potentially of value for diagnosis and treatment. Beyond this, defining the manner in which multiple genes interact to cause autoimmunity will undoubtedly remain challenging for some time. In this regard, the mechanisms by which MHC class II alleles potentiate autoimmunity are still not resolved, almost 30 years after the basic discovery (62)! It is quite likely that such genotype/phenotype investigations will once again use experimental models in the mouse, now informed by discoveries in human genetics. Nevertheless, wherever future investigations into the genetics of T1D take this field, it is safe to predict that John Todd and Linda Wicker will remain in the vanguard, as they have been in T1D genetics since 1991.
Footnotes
Abbreviations used in this article: QTL, quantitative trait locus; T1D, type 1 diabetes.
Disclosures
The author has no financial conflicts of interest.