Abstract
T cell specificity emerges from a myriad of processes, ranging from the biological pathways that control T cell signaling to the structural and physical mechanisms that influence how TCRs bind peptides and MHC proteins. Of these processes, the binding specificity of the TCR is a key component. However, TCR specificity is enigmatic: TCRs are at once specific but also cross-reactive. Although long appreciated, this duality continues to puzzle immunologists and has implications for the development of TCR-based therapeutics. In this review, we discuss TCR specificity, emphasizing results that have emerged from structural and physical studies of TCR binding. We show how the TCR specificity/cross-reactivity duality can be rationalized from structural and biophysical principles. There is excellent agreement between predictions from these principles and classic predictions about the scope of TCR cross-reactivity. We demonstrate how these same principles can also explain amino acid preferences in immunogenic epitopes and highlight opportunities for structural considerations in predictive immunology.
Introduction
T cell specificity is a hallmark of cellular immunity. Specificity results from a myriad of processes, ranging from the biological mechanisms that control the composition of the T cell repertoire and its reactivity, to the physicochemical mechanisms that influence the interactions between TCRs, MHC proteins, and antigenic peptides. In between are numerous other mechanisms that influence T cell responsiveness, Ag presentation and density, and the efficiency and outcome of T cell signaling. Despite this complexity, TCR binding specificity is a foundational component of T cell specificity. In this review, we survey recent progress in understanding TCR specificity to help place it in the context of other processes that make up the equation of T cell specificity.
Binding specificity arises from the structural and physicochemical “fit” between a receptor and its ligand. In theory, structure can be used to rationalize, or even predict, TCR binding specificity. The concept of structural fit, however, is elusive and not easily quantified from structures. Just as tissue microenvironments influence cellular states, structural environments impact interatomic interactions, both attractive and repulsive. Some interactions operate at long ranges, outside of what might traditionally be viewed as a receptor–ligand interface. Motion, which can strongly influence how two molecules interact, is poorly gauged from structures. Structures themselves are the results of experiments with noise and error. These realities explain why predicting protein–ligand affinities from structure remains challenging even after decades of improvements (1–3). However, there is considerable knowledge about the factors that influence binding, and there has been much progress toward interpreting binding data from structures and using this to make qualitative predictions about specificity. These advances can readily be applied to TCRs and are discussed below. Our discussion is largely from the perspective of TCR recognition in class I MHC systems, due largely to available data, but the general themes are easily extendable to TCR recognition in class II and other Ag presentation systems in cellular immunity.
Caution ahead: TCR specificity necessarily invokes binding affinity, but it is only a component of T cell specificity
Specificity is precisely defined in biochemical interactions that involve two molecules, such as enzyme–substrate or Ab–Ag interactions. An affinity-matured Ab that binds its target 1000-fold more tightly than unrelated Ags is considered highly specific. Solution binding affinities are therefore implicit in discussions of specificity. However, as reviewed recently (4), TCRs and MHC proteins are embedded in membranes, which greatly influences the biophysics of protein interactions (5). Thus, the concept of a solution KD (sometimes referred to in immunology as a three-dimensional affinity), on which the biochemical definition of binding specificity is based, cannot truly apply to T cell biology.
Considering TCRs and their ligands in their biological contexts not only brings up the physical influences of membrane confinement, but also calls attention to the fact that TCR binding and specificity are often evaluated with experiments that measure biological outcomes dependent on T cell signaling processes. In addition to TCR binding strength, T cell signaling incorporates a large variety of physical and biological complexities, including everything from peptide binding to MHC proteins, T cell membrane composition, kinase and coreceptor expression levels, and more (4, 6–16). One complexity receiving current attention is the supramolecular architecture of the TCR signaling complex. Unusual TCR binding topologies have been associated with altered immunological outcomes (17, 18), potentially by hindering coreceptor or CD3 engagement and possibly the formation of higher order clusters (6, 19, 20). Supramolecular architectural differences can, in principle, occur independently of TCR affinity for peptide/MHC (pMHC). T cell mechanics and the biology of the CD4/CD8 coreceptors are two other complexities of notable interest. The former is an enigmatic process where different peptides alter the force dependence of membrane-bound TCR–pMHC interactions (4, 14, 15). The latter relates T cell responses to the levels of coreceptor directly associated with the Lck kinase (13).
Supramolecular architectures, force dependencies, and coreceptor influences provide examples of how T cell specificity can be influenced independently of classical biochemical parameters that determine the number of ligated receptors (i.e., affinity and receptor/ligand concentrations). Because these complexities superimpose on receptor binding in determining T cell function, functional outcomes scale imperfectly with TCR binding affinity measured in solution. Indeed, many outliers have been noted over the years, and both high and low thresholds are thought to exist (21–23). Nonetheless, experimental binding affinities and their ratios are the lens through which structural interpretations of binding and specificity are viewed. Binding affinities give access to the binding free energy (ΔG°), or the “glue” within an interface. When we discuss van der Waals interactions, hydrogen bonds, burial of hydrophobic surface area, and such, we implicitly consider their contributions to ΔG°.
Fortunately, when we consider ratios of affinities (or equivalently, differences in ΔG°), we can often discount many of the aspects that distinguish binding affinity from other contributions to T cell function. This has been borne out by numerous experiments in which modifications within a TCR–pMHC interface lead to changes in binding affinity, with corresponding changes in functional readouts (again, sometimes nonlinearly and with occasional outliers) (21–23). This is a good thing; without the ability to rely on relationships between ΔG° and structure, structural immunology would be of questionable value for interpreting specificity. Some healthy skepticism, however, and an awareness of the distinction between what we might refer to as TCR binding specificity (based on solution affinities and interpretable in the context of structural information) and T cell functional specificity (variations in biological T cell responses), are nonetheless important.
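The equivalence between ratios of affinities and differences in ΔG° follows from the standard relation ΔG° = RT ln KD, with KD expressed relative to a 1 M standard state. As a minimal sketch, the 1000-fold difference used earlier to illustrate a highly specific Ab can be converted into a free energy difference as follows. The function names, the example KD values, and the temperature of 25°C are assumptions made for illustration only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g(kd_molar, temp_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    return R_KCAL * temp_k * math.log(kd_molar)


def delta_delta_g(kd_ref, kd_other, temp_k=298.15):
    """Binding free energy of 'other' minus that of 'ref', in kcal/mol."""
    return R_KCAL * temp_k * math.log(kd_other / kd_ref)


# Hypothetical affinities: a 1 uM interaction versus one that is 1000-fold weaker.
ddg = delta_delta_g(1e-6, 1e-3)
print(f"ddG = {ddg:.2f} kcal/mol")  # ddG = 4.09 kcal/mol
```

Under these assumptions, a 1000-fold affinity ratio corresponds to roughly 4 kcal/mol, which is on the order of a few favorable interactions within an interface.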
Rules are made to be broken and roles are not easily defined
Through efforts in structural immunology, we now have several dozen structures of different TCRs bound to various pMHC ligands (and >160 structures when we include the “redundant” structures with altered peptides, mutations, and high-affinity variants). This structural database has been reviewed extensively, recently by Rossjohn and colleagues (24, 25). Although there are common themes and trends within the collection of structures, exceptions exist for nearly every “rule” that emerged in the early days of structural immunology. For example, hypervariable CDR loops contact peptides, but they also frequently contact the MHC protein. Germline loops, although commonly aligned alongside the α helices of MHCs, often contact the peptide. A variety of angles make up the TCR “diagonal binding mode,” and TCRs that bind with reversed binding modes have now been described (17, 26). There are biological implications for the trends and their exceptions; for example, as noted above, TCRs that bind pMHC with outlier geometries seem to signal more weakly or not at all, possibly due to supramolecular architectural limits (17, 18). A key lesson is that many of our simplifying assumptions about the rules and roles in TCR binding have turned out to be limiting. Indeed, partly because of the inadequacies of simplifying assumptions, there is still much to be learned from new structures of TCRs and their complexes.
One of the common assumptions of TCR specificity is that it emerges from hypervariable CDR3α and CDR3β loops. After all, in most TCR–pMHC structures the hypervariable loops most closely align with the peptide. Hypervariable loops defining specificity makes biological sense: T cell repertoires consist of millions of different receptors sharing a few dozen genetically encoded germline loops but possessing (almost) randomly generated hypervariable loops. In fact, hypervariable loop composition has very recently been shown to allow predictions of TCR specificity (27). However, because of their proximity, hypervariable loops cannot at the atomic level act independently of their neighboring germline loops or the MHC protein.
An illustrative example is the A6 TCR, which recognizes the human T cell leukemia virus-1 Tax11–19 peptide presented by the class I MHC protein HLA-A*0201 (HLA-A2). The A6-Tax11–19/HLA-A2 complex was the first TCR–pMHC structure to be solved at high resolution (28). The structure showed that although CDR3α helped accommodate the peptide, the loop also made a series of electrostatic interactions with the HLA-A2 α1 helix (Fig. 1A) (29, 30). A deconstruction of the strengths of individual interactions in the interface showed that the interactions between CDR3α and HLA-A2 were the strongest in the entire TCR–pMHC interface (29). At first glance this was a puzzling finding: if the strongest interactions in the interface are between a hypervariable loop and the MHC protein, how can the A6 TCR show peptide specificity? Indeed, the A6 TCR shows typical specificity and is not a “degenerate” binder as shown with positional scanning and peptide libraries (31).
FIGURE 1. High peptide specificity emerging from how a TCR interfaces with the MHC protein. (A) In the structure of the A6 TCR bound to Tax11–19/HLA-A2, Thr98 and Asp99 of CDR3α form strong electrostatic interactions with Arg65 on the HLA-A2 α1 helix (29). (B) To interact with Arg65, CDR3α must undergo a conformational change upon binding (30). In the absence of the loop conformational change, steric clashes would occur between the CDR3α backbone and Arg65. Upon making the conformational change, the backbone of CDR3α is tightly packed against position 4 of the peptide. If position 4 is anything other than glycine, steric clashes would exist, preventing the loop from adopting its needed conformation, as shown in bottom right for an alanine at position 4 (yellow sphere shows the β carbon of an alanine modeled at p4).
Two other pieces come together to tell the full story. First, in binding Tax11–19/HLA-A2 the A6 CDR3α loop must adjust its position to avoid steric clashes with the HLA-A2 α1 helix and form the key electrostatic interactions (30). In doing so, it aligns against Gly4 of the peptide (Fig. 1B). Second, the A6 TCR shows exquisite specificity for a glycine at position 4 of the peptide—no other residues are tolerated (31). Structural analyses showed that when anything other than glycine is present at P4, the loop cannot adopt the proper conformation to interact with the MHC protein, due to steric clashes. Thus, the way the CDR3α loop interfaces with both peptide and MHC contributes to peptide specificity.
Although the A6-Tax11–19–HLA-A2 interaction is one of the best studied, numerous other TCR complexes provide other examples of how traditional “roles” for the various interface components break down at an atomic level. Some of these examples are summarized in a recent analysis of several structures (32). In some cases, germline-encoded CDR1 loops interface with the N-terminal or C-terminal halves of peptides, contributing to peptide specificity (33–35). Conformations of neighboring CDR loops can influence one another (36–38). In other cases, TCR–peptide perturbations have been compensated by new TCR–MHC interactions without any apparent losses in peptide specificity (30).
More recently, in the analysis of a variant of the A6 TCR whose specificity was switched from the Tax11–19 peptide to the MART-1 26–35 peptide via molecular evolution (39), it was found that specificity determinants were distributed throughout the germline and hypervariable CDR loops, including amino acids distal from the binding site that influenced loop architecture (38, 40). The overall message is that when considering the determinants of TCR binding specificity we need to consider the interface in its entirety, including the structural and physical relationships between the various CDR loops (hypervariable or not) and the composite pMHC surface.
Rationalizing the specificity/cross-reactivity duality of TCRs
As pointed out in Mason’s (41) seminal 1998 review and then later by Sewell (42), the universe of potential Ags is orders of magnitude larger than the number of unique TCRs in an individual, necessitating a highly cross-reactive TCR repertoire. Experimentally, a single TCR has been shown to recognize more than a million peptides (43). But how do we square such high cross-reactivity with observations of high specificity, such as the requirement of a glycine at P4 for binding of the A6 TCR to Tax11–19/HLA-A2? Absolute specificity for an amino acid at one position still permits millions of other peptides. Consider a 9-mer with at least one ideal class I MHC anchor residue at P9. Many peptides are immunogenic with only one ideal anchor (44), so imagine that the second anchor residue can be substituted by one of the nine smaller/uncharged amino acids without inducing a structural change in the peptide (45, 46). Now consider a hypothetical TCR with an absolute requirement for a glycine at P4. With these generous stipulations, there are 580 million matching peptides, as shown in Fig. 2. However, simply replacing Gly4 with an alanine would abolish recognition of every one of them.
FIGURE 2. Structural and physicochemical constraints on which peptides a TCR can recognize permit a high degree of cross-reactivity but also high specificity. (A and B) In the example shown for a class I system, there are 512 billion possible 9-mer peptides. Constraining one primary anchor and limiting the second reduces this to 12 billion. Further constraints designed to mimic principles of TCR specificity progressively reduce the number of possible peptides. This is illustrated for a hypothetical case where compatibility with a TCR is introduced in a stepwise fashion. Compatibility adds a requirement for a glycine at P4, a requirement for a hydrophobic leucine or isoleucine at P5, an aromatic tyrosine or phenylalanine at P7, removal of charges from the center, and exclusion of glycine and proline from all remaining positions. Under all of these constraints, there are still 1.6 million compatible peptides. Two seemingly unrelated peptides are shown in (C). Note that compatibility as defined indicates a peptide that permits a TCR to bind with an affinity strong enough to productively signal, implying that all compatible peptides need not be recognized with the same affinity.
Recent studies of TCR cross-reactivity shed further light on structural aspects of specificity. Using yeast display, Garcia and colleagues (47) screened libraries encoded by genes for single-chain peptide/H-2Ld complexes with randomized peptide sequences. Sequences encoding proteins that bound strongly to the 42F3 TCR were identified by flow cytometry using TCR tetramers and used to generate what can be termed a “sequence fitness landscape” for peptides compatible with the 42F3 TCR. Structural analyses of some of these complexes showed a TCR focus on a “hot spot,” a region of the peptide that was structurally and chemically similar between different agonist ligands, forming similar interactions with the TCR (Fig. 3A). Outside of the hot spot more sequence diversity was permitted.
FIGURE 3. Hot spots demonstrated by sequence landscapes in TCR–pMHC interfaces. (A) Illustration of a structurally conserved hot spot within the interface between the 42F3 TCR and peptides presented by H2-Ld (47). The left panel shows the interface between 42F3 and the QL9 (QLSPFPFDL) mimotope FLSPFWFDI/Ld. The hot spot region is localized to the peptide bulge and engaged primarily by residues of CDR3β. The right panel shows the hot spot as found in eight different agonist 42F3–agonist/Ld structures. Peptide backbones at positions 6 through 9 are colored by contact frequency with the TCR, with red indicating the greatest number of contacts. The side chains of the key amino acids at positions 7 and 8 are shown, as is the conformation of the CDR3β loop. Pro97β remains in position to interact with peptide position 7, whereas Asp95β adjusts its conformation to optimize charge complementarity with peptide position 8. (B) In altering peptide specificity, the molecular evolution process acted upon a hot spot in the interface between A6 TCR and Tax11–19/HLA-A2 (see also Fig. 1) (38). By changing Thr98α to a lysine and Asp99α to a tyrosine, the complex electrostatic interactions between the TCR and the HLA-A2 α1 helix were disrupted, forcing Arg65 of HLA-A2 to adopt a new conformation, permitting Trp104 of CDR3α to sandwich between the arginine and the peptide backbone and forming a new hot spot in the interface the modified TCR forms with the MART-1 26–35/HLA-A2 ligand.
Hot spots are found in almost every protein–protein interaction, and indeed have been described many times in TCR–pMHC complexes (48–56). They can be discerned via sequence landscapes as noted above (47), but have been more traditionally defined as regions where mutations have the greatest impact on binding (57). TCRs where hot spots have been explored through point mutations are listed in a recently developed online database (https://zlab.umassmed.edu/atlas/web/) (58). However, point mutants are almost always to alanine, a rather limited exploration of chemical space (51, 59). Extending beyond point mutations is the recent deployment of deep mutational scanning to TCRs, which can rapidly and exhaustively assess the impact and importance of multiple mutations throughout the CDR loops, generating a sequence fitness landscape for the receptor (38, 40). Using deep mutational scanning, it was recently shown that for a variant of the A6 TCR, although specificity emerged from the action of numerous sites as noted above, molecular evolution altered specificity by modifying the interactions between the CDR3α loop and the charges on the HLA-A2 α1 helix shown in Fig. 1. Essentially, the yeast display process “converted” a hot spot that drove compatibility with Tax11–19/HLA-A2 to another that drove compatibility with MART-1 26–35/HLA-A2 (Fig. 3B).
Although not solely responsible for binding specificity, the occurrence of hot spots within TCR–pMHC interfaces can explain the simultaneous observation of both high and low specificity in TCR binding: subtle perturbations in hot spot regions of a peptide will have profound impacts on binding, whereas changes outside a hot spot can be more easily tolerated. In fact, the discovery and consequences of hot spots in peptides were foreshadowed: returning to the A6 structure published in 1996, Wiley and colleagues noted “… the observation that although substantial contacts are made to peptide residues Y5 and Y8, only a few atoms of peptide residues 1, 2, 4, 6 and 7 are in contact, places physical limits on TCR specificity for peptide” (28).
Owing to structural and chemical variability, not every TCR–pMHC interface will share similar hot spots. This variability is highlighted by comparisons of different TCRs binding the same pMHC (Table I) (60–75): in some instances, very different structural/physical solutions have been seen, indicating different mechanisms of obtaining ΔG° (60, 64, 68, 71, 76). In some interfaces, hot spots will be direct interactions, as is frequently envisioned. In other cases, hot spots may be cryptic, such as alignments to avoid charge repulsion or steric clashes (30, 47). Engineering TCRs can alter the locations and overall contributions of hot spots. Indeed, it has recently been shown that improvements in TCR affinity can be found by changing not only the amino acids that contact peptide or MHC, but also those in the “second shell” away from the contact surface (38, 40).
TCRᵃ | Peptide | MHC | PDB | References |
---|---|---|---|---|
Class I systems | | | | |
DMF5, DMF4 | AAGIGILTV | HLA-A2 | 3QDJ, 3QEQ | (60) |
DMF5, DMF4, Mel5 | ELAGIGILTV | HLA-A2 | 3QDG, 3QDM, 3HG1 | (34, 60) |
LC13, CF34, RL42 | FLRGRAYGL | HLA-B8 | 1MI5, 3FFC, 3SJV | (61, 62, 63) |
LS01, LS10 | GILGFVFTL | HLA-A2 | 5ISZ, 5JHD | (64) |
A6, B7 | LLFGYPVYV | HLA-A2 | 1AO7, 1BD2 | (28, 65) |
NP2-B17, NP1-B17 | ASNENMETM | H-2Db | 5SWS, 5SWZ | (17) |
SB27ᵇ, CA5, SB47 | LPEPLPQGQLTAY | HLA-B35 | 2AK4, 4JRX, 4JRY | (66, 67) |
RA14, C7, C25 | NLVPMVATV | HLA-A2 | 3GSN, 5D2L, 5D2N | (49, 68) |
2Cᶜ, 42F3 | QLSPFPFDL | H-2Ld | 2E7L, 3TF7 | (18, 69) |
C1-28, T36-5 | RFPLTFGWCF | HLA-A24 | 3VXM, 3VXU | (70) |
Class II systems | | | | |
2B4, 226 | ADLIAYLKQATKG | I-Ek | 3QIB, 3QIU | (71) |
JR5.1, D2, S16 | APQPELPYPQPGS | HLA-DQ2 | 4OZF, 4OZG, 4OZH | (72) |
S13, L3-12, T316, Bel502 | APSGEGSFQPSQENPQGS | HLA-DQ8 | 4Z7U, 4Z7V, 4Z7W, 5KS9 | (72, 73) |
B3K506, YAe62, 2W20, 14.C6 | FEAQKAKANKAVD | I-Ab | 3C5Z, 3C60, 3C6L, 4P5T | (37, 74) |
E8, G4 | GELIGILNAAKVPAD | HLA-DR1 | 2IAM, 4E41 | (75) |
T15ᵈ, Bel602 | GPQQSFPEQEA | HLA-DQ8 | 5KSB, 5KSA | (73) |
FS18, FS17 | GSLQPLALEGSLQKRGIV | HLA-DR4 | 4Y19, 4Y1A | (26) |
ᵃThe table includes parental molecules only, excluding mutants and variants, except as noted.
ᵇSB27 is crystallized with HLA-B*3508, whereas CA5 and SB47 are crystallized with HLA-B*3505.
ᶜThe 2C variant crystallized is a high-affinity mutant.
ᵈThe peptides in complex with T15 and Bel602 differ at the P1 residue.
Following the themes above, when peptide hot spots do exist, they should not be expected to be engaged solely by hypervariable loops; again, rules are made to be broken. Returning to the example of the A6 TCR, the germline CDR1β loop forms a very strong hydrogen bond with the C-terminal end of the Tax11–19 peptide, resulting in high specificity for a tyrosine at P8 (29, 31). In another case, the Mel5 TCR forms a specificity-determining salt bridge with Glu1 of the MART-1 26–35 peptide using the germline CDR1α loop (33, 34). The HCV1406 TCR that recognizes the hepatitis C virus (HCV) NS3 epitope requires a lysine at P1, which is also engaged by CDR1α (35). Lastly, the flexibility inherent to some TCRs (77) may permit different hot spots with different ligands, as is thought to occur in the well-studied TCR 2C (78, 79).
Importantly, hot spots are not necessarily amino acid specific: the key feature is that compatible ligands share structural and chemical similarity. Thus, there may be similar specificity for a large hydrophobic residue, a charge, or a hydrogen bond donor, etc., depending on the structural details (31). Return to the imaginary example of the 9-mer in Fig. 2 and add a requirement for a leucine or isoleucine hot spot to make hydrophobic interactions with the TCR. Now the number of compatible peptides that meet the criteria for TCR recognition is reduced from 580 million to 58 million. Add another constraint for a tyrosine or phenylalanine at another position, and the number of compatible peptides is 5.8 million (note that “compatibility” as defined in this example requires TCR binding with sufficient strength to productively signal; compatible affinities will cover a wide range, which might influence signal strength and functional outcome, but a response will still be elicited).
Although instructional, this is admittedly a simplified argument: the implicit assumption that any random amino acid will work in the nonspecified positions and still yield a compatible peptide is certainly wrong. As discussed below, charges are less frequently observed in the centers of immunogenic peptides (80). Take this into account for our 9-mer example and the number of compatible peptides becomes 4.3 million. Again, other constraints could be envisioned—some amino acids will alter peptide conformation. To crudely mimic this, we can exclude proline and then glycine from nonterminal positions; now, the numbers are 3.6 and 3.0 million. Restrict P6 to the smallest seven amino acids to limit peptide–MHC steric clashes and the number is 1.6 million. This is a large number of peptides, but consider that there are 12 billion that only match the anchor residue requirements, and 512 billion random nonamers—1.6 million out of these much larger pools meet the biochemical definition of high specificity. Two hypothetical unrelated but compatible peptides are shown in Fig. 2C. The two peptides differ substantially in sequence, yet they have positions where subtle perturbations would abolish TCR recognition.
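The stepwise arithmetic of this hypothetical 9-mer example can be reproduced with a short script. The sketch below is a bookkeeping aid only, and several details are assumptions that are not stated explicitly in the text: that the "center" of the peptide means P6, that five residues (Asp, Glu, Lys, Arg, and His) are counted as charged, that proline and glycine are removed from the otherwise unconstrained nonterminal positions P3, P6, and P8, and that P1 stays unrestricted. These choices were made because they reproduce the quoted numbers; the variable and function names are likewise illustrative.

```python
from math import prod

# Number of amino acids allowed at each position (P1..P9) of a 9-mer.
allowed = {pos: 20 for pos in range(1, 10)}
counts = {}


def record(label):
    """Store and print the number of peptides permitted by the current constraints."""
    counts[label] = prod(allowed.values())
    print(f"{label:<24}{counts[label]:>18,}")


record("random 9-mers")

allowed[9] = 1   # one ideal primary anchor at P9
allowed[2] = 9   # second anchor: one of nine smaller/uncharged residues
record("anchor requirements")

allowed[4] = 1   # absolute requirement for Gly at P4
record("Gly at P4")

allowed[5] = 2   # Leu or Ile hot spot at P5
record("Leu/Ile at P5")

allowed[7] = 2   # Tyr or Phe at P7
record("Tyr/Phe at P7")

allowed[6] = 15  # ASSUMPTION: no D, E, K, R, or H at the central position P6
record("no charge at P6")

free_nonterminal = (3, 6, 8)  # ASSUMPTION: unconstrained, nonterminal positions
for pos in free_nonterminal:
    allowed[pos] -= 1  # exclude proline
record("no Pro")

for pos in free_nonterminal:
    allowed[pos] -= 1  # exclude glycine
record("no Gly")

allowed[6] = 7   # restrict P6 to seven small residues
record("small residue at P6")
```

Run as written, the script gives 576,000,000 peptides after the Gly4 requirement and 1,632,960 after the final restriction, matching the rounded values of 580 million and 1.6 million used in the text.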
Although illustrative, the examples above do not address all possibilities. Adding more realism, there will be cooperative influences at different positions, leading to positional correlations in amino acid preferences and reducing the number of compatible peptides (81). However, although the example is shown for 9-mers, longer peptides are processed, presented, and recognized, which will increase the number of compatible peptides (82). For our hypothetical TCR, we easily settle on a number of compatible peptides in the millions, with regions of both low and high homology, reconciling how TCRs can be at once specific but also cross-reactive (83). Twenty years later, Mason’s (41) remarkably prescient prediction that a single TCR should be able to recognize at least a million peptides is consistent with and can be fully rationalized by structural and biophysical principles.
Using structural information to help guide the search for cross-reactive epitopes
Based on the previous discussion, any one TCR will productively engage with what at first glance might appear to be unrelated ligands, yet also show high specificity toward subtle peptide changes, a duality that as shown above can be rationalized from structural and physical principles. This duality and our ability to rationalize it are instructional as we enter the age of TCR-based molecular and cellular therapeutics. Recently, an engineered, high-affinity TCR targeting the MAGE-A3 tumor Ag presented by HLA-A1 was used in a clinical trial testing gene-engineered T cells for melanoma. Unbeknownst at the time, and despite substantial preclinical testing, the receptor also recognized a peptide from a protein expressed in cardiac tissue, leading to severe off-target autoimmunity and patient fatalities (84). The cross-reactive peptide was subsequently identified as an epitope from the protein Titin (85). The sequence of the Titin peptide is ESDPIVAQY. The sequence of the MAGE-A3 peptide is EVDPIGHLY. Consider these peptides in light of Fig. 2 and the surrounding discussion above. That the same high-affinity receptor recognized both peptides is, in hindsight, not surprising.
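A position-by-position comparison makes the relationship between the two peptides explicit. The following is a minimal sketch using only the two sequences given above; the function name is illustrative, and simple identity is of course a cruder measure than the structural and chemical similarity that governs compatibility.

```python
def compare_peptides(a, b):
    """Return (position, residue_a, residue_b, identical) for two equal-length peptides."""
    if len(a) != len(b):
        raise ValueError("peptides must have the same length")
    return [(i, x, y, x == y) for i, (x, y) in enumerate(zip(a, b), start=1)]


titin = "ESDPIVAQY"
mage_a3 = "EVDPIGHLY"

rows = compare_peptides(titin, mage_a3)
shared = [f"P{i}" for i, _, _, same in rows if same]
print(shared)  # ['P1', 'P3', 'P4', 'P5', 'P9']
```

The two peptides are identical at five of nine positions, including P1 and the contiguous stretch P3 through P5, and differ at P2, P6, P7, and P8.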
Crucially, however, two viral and bacterial peptides that showed similar degrees of homology with the MAGE-A3 epitope were not recognized by the TCR used in the clinical trial (85). Can the variety of outcomes with Titin, MAGE-A3, and these other epitopes be rationalized? The answer is yes, as the structures of a variant of the same TCR (MAG-IC3) bound to the MAGE-A3 and Titin peptides presented by HLA-A1 were recently published (86). In the MAG-IC3 structures with MAGE-A3 and Titin, the peptides are presented almost identically, which is not unusual given their similarities and the fact that nonamers bound to class I MHC proteins often adopt similar conformations (87, 88). The TCR in turn engages these two peptides very similarly (Fig. 4A). In both structures, pGlu1 is capped by the backbone of CDR3α, with the carbonyl of Ala98α rotated away to optimize charge complementarity. Despite being buried in the HLA-A1 binding groove, pAsp3 hydrogen bonds with Tyr32 of CDR1α. The hydrophobic pPro4 is capped by Phe101 of CDR3α, and Arg56 of CDR2β hydrogen bonds to the pPro4 backbone. The side chain of pIle5 fills a gap between the TCR and HLA-A1. The TCR mostly ignores differences in the C-terminal halves of the two peptides due to its tilt toward the peptide N termini.
FIGURE 4. The MAGE-A3 and Titin epitopes are presented and recognized almost identically by the MAG-IC3 TCR (86). (A) Nearly identical presentation of the two epitopes in the HLA-A1 peptide binding groove. The inset shows the peptide sequences and the color scheme used for all panels. (B) Overview of how the two ligands are engaged by the MAG-IC3 TCR. (C) Details of similar, key interactions in the two interfaces. Charge complementarity with pGlu1 is optimized by the positioning of the carbonyl oxygen of Ala98 of CDR3α away from the glutamate side chain (not shown is a salt bridge from Arg170 of HLA-A1 that helps fix the glutamate). Tyr32 of CDR1α hydrogen bonds with pAsp3. Phe101 of CDR3α “caps” the hydrophobic pPro4, and Arg56 of CDR2β hydrogen bonds with the pPro4 backbone. Not evident in the figure is how pIle5 packs between the TCR and the HLA-A1 α2 helix. (D) The Titin and MAGE-A3 peptides have a valine and a glycine at P6, respectively. Substitution with larger amino acids would result in clashes with Ala69 and Thr73 of the HLA-A1 α1 helix, explaining why pathogen-derived peptides similar to the Titin and MAGE-A3 peptides would not be recognized. Dashed lines show distances between the side chains of Val6 of the Titin peptide and residues of HLA-A1 (note that in generating this figure, the coordinates of the CDR1α loop were optimized, yielding coordinates slightly altered from those deposited in the PDB).
The structures also suggest why the viral and bacterial peptides similar to MAGE-A3 and Titin were not recognized: these peptides contain a tyrosine and lysine, respectively, at position 6, neither of which would fit in the tight constraints between the peptide and MHC (Fig. 4B). Despite being very similar to Titin/MAGE-A3 elsewhere, the bacterial and viral peptides therefore likely have an altered peptide conformation that would change how the receptor sees the peptide. Overall, the structures and related physical considerations can explain TCR specificity in this instance.
Such hindsight, however, is not helpful if the goal is to assess risks of TCR cross-reactivity in advance. Could MAGE-A3 and Titin cross-reactivity have been predicted? Given the complexities surrounding the processing and tissue distribution of the two Ags, the answer to the question posed at the in vivo functional level is almost certainly no (85). However, a more focused question is appropriate: can hot spots and other specificity determinants be predicted in ways that can guide searches for potential cross-reactive epitopes? Structural information could be enormously helpful here, as demonstrated in recent efforts at predicting TCR specificity (27). Starting with the structure of a TCR–pMHC interface, modeling and energetic scoring could be used to predict regions of “focus” (or hot spots), which in turn could narrow down the motifs that drive specific binding. To be effective, advances will need to occur in protein modeling methodologies, particularly to account for flexibility in the TCR, peptide, and MHC proteins. Substantial increases in speed and computational power will also be required, perhaps aided by advances in distributed computing (89). These challenges are surmountable, however, and structure-based prediction methods are doubtless on the horizon as a way to help assess TCR cross-reactivity. New approaches, such as deep mutational scanning, can greatly complement structural information and thus structure-based prediction methods (38, 40). Screening for ligands using yeast display, peptide libraries, or other combinatorial approaches will be similarly helpful (43, 47, 90). Together, structural information, modeling, and screening could prove enormously powerful in this aspect of predictive immunology. Identifying those epitopes that are correctly processed and presented remains an additional task, but advances in appropriate prediction methods can be leveraged here (91–95).
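As a rough sketch of how predicted hot spots might narrow a search for cross-reactive epitopes, the example below scans a protein sequence for 9-mers matching a motif. The motif is a hypothetical one assembled for illustration from the interactions described for the MAG-IC3 structures: fixed residues at P1, P3, P4, P5, and P9, a small residue at P6, and any residue at P2, P7, and P8. Only Gly and Val are observed at P6 in the two epitopes; allowing Ala and Ser is an added assumption. The test protein sequence is made up, and a real search would also need to account for processing, presentation, and peptide conformation.

```python
import re

# Hypothetical motif (illustrative, not a validated specificity profile):
# P1 E, P2 any, P3 D, P4 P, P5 I, P6 small (G/V observed; A/S assumed),
# P7-P8 any, P9 Y. The lookahead lets overlapping 9-mers be reported.
MOTIF = re.compile(r"(?=(E.DPI[GAVS]..Y))")


def scan_for_motif(sequence):
    """Return (1-based start position, 9-mer) for every motif match."""
    return [(m.start(1) + 1, m.group(1)) for m in MOTIF.finditer(sequence)]


# Made-up protein fragment containing one matching 9-mer and one near miss
# that carries a tyrosine at P6.
toy_protein = "MKTESDPIVAQYLLEVDPIYHLYG"
for start, peptide in scan_for_motif(toy_protein):
    print(start, peptide)
```

In this toy sequence, the 9-mer with valine at P6 is reported, whereas the otherwise similar 9-mer with tyrosine at P6 is filtered out, mirroring the structural argument made for the unrecognized pathogen-derived peptides.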
Demonstrating key principles via the paradox of specificity in allorecognition
Alloreactivity emerges when tissues are transplanted across MHC barriers. Owing to the high frequency of alloreactive T cells, alloreactivity has often been presumed to be relatively nonspecific, attributable to TCR focus on mismatched MHC polymorphisms, or degenerate recognition of allopeptides. Yet paradoxically, alloreactivity has been shown in many instances to proceed with specificity for both peptide and MHC (96). Recent work has shown how such specificity can emerge. These findings reinforce the themes noted above: the importance of the composite TCR–pMHC interface, the breakdown of once-traditional rules, and the utility of structural information in rationalizing and predicting TCR specificity and cross-reactivity.
The LC13 TCR was first studied structurally as an example of a “public” antiviral TCR, with its structure solved in complex with an EBV epitope presented by the “self” MHC HLA-B*0801 (61). LC13 was later studied in complex with an unrelated human allopeptide presented by the “foreign” MHC HLA-B*4405 (97). In the structure, the peptide mimics the viral epitope, and despite the sequence differences, it is engaged very similarly by the TCR with no indications of nonspecific, degenerate recognition (Fig. 5A). Moreover, the LC13 TCR was found to discriminate between HLA-B*4405 and HLA-B*4403, which differ by only 2 amino acids, whereas HLA-B*4405 and HLA-B*0801 differ by 25. Discrimination between the B44 subtypes was attributable to a single amino acid difference in the MHC α2 helix, which in the case of B*4403 prevented the movement of the peptide into a compatible conformation. This highly specific engagement of both allopeptide and foreign MHC by LC13 shows that even allorecognition cannot escape the consequences that emerge from the need for TCRs to engage a composite pMHC ligand.
FIGURE 5. Structural studies of alloreactivity illustrate core principles. (A) The LC13 TCR cross-reacts between the syngeneic viral peptide/HLA-B*0801 complex (left panel) and the allogeneic self-peptide/HLA-B*4405 complex (right panel). Despite considerable sequence differences, the peptides adopt similar conformations in the ternary complexes, with key interactions between the TCR and the protruding aromatic P7 side chain maintained (97). The TCR also discriminates between closely related B44 subtypes due to a single amino acid difference that prohibits the peptide from adopting a compatible conformation in B*4403 (right panel, circled detail). In B*4405, the aspartic acid at position 156 forms a hydrogen bond with pTyr3. In B*4403, position 156 is a leucine (yellow) and would clash with pTyr3 as shown. (B) In the structure of the HCV1406-NS3/HLA-A2 complex, the conformation and chemistry of the NS3 peptide are similar to those of the MART-1 26–35 peptide, except for the residue at P1 (left panel). Structurally, the MART-1 26–35 peptide fits within the complex without any steric clashes or chemical incompatibility, save for the P1 residue, which would experience charge repulsion with Glu134 of CDR1α (right panel). Replacing pGlu1 with lysine resulted in a MART-1 26–35 variant that was recognized by the HCV1406 TCR (35).
More recently, the properties of the alloreactive TCR HCV1406 were studied (35). The HCV1406 TCR was identified from T cells that expanded when an HLA-A2+ liver was transplanted into an HCV-infected HLA-A2− host (98). T cells expressing HCV1406 mediate anti-HCV immunity, specifically targeting the HCV NS3 Ag presented by HLA-A2 (35). Again, TCR binding was found to be dependent on features unique to both the peptide and the foreign MHC. In this case, TCR binding to HLA-A2 required polymorphic amino acids on the α1 helix that distinguished HLA-A2 from all class I, class II, and nonclassical MHC proteins in the transplant recipient. Peptide specificity was also dependent in part on a hot spot present at the P1 residue, which was engaged by the CDR1α loop.
Thus, in alloreactivity, LC13 and HCV1406 demonstrate the general principles discussed above. We suggest that in addition to explaining specificity in alloreactivity, these same principles can help explain the high frequencies of alloreactive T cells. The combination of unique peptides, unique modes of presentation, and different features on MHC α helices provides for composite recognition surfaces very distinct from those that TCRs encounter syngeneically (35). In alloreactivity, TCRs indeed see something new; however, it is not only peptide or MHC, but their synergistic combination that results in significant T cell reactivity.
Lastly, the structural and biophysical data with the HCV1406 TCR allowed the identification of a novel cross-reactive peptide (Fig. 5B). Identification of this novel epitope is a straightforward demonstration of how structural information can be of use in predicting TCR cross-reactivity.
Amino acid preferences in immunogenic epitopes are readily explained through physical principles
Recent analyses of immunogenic and nonimmunogenic epitopes have led to the discovery that, at least for class I systems, immunogenic epitopes are enriched in hydrophobic/aromatic amino acids in the peptide centers (80). This has led to the development of immunogenicity prediction tools, such as those found at the Immune Epitope Database (99). A related structural analysis showed that immunogenic epitopes are enriched in hydrophobic TCR contact residues (100). These findings have potential to significantly impact epitope discovery and the development of therapeutics such as those based on cancer neoepitopes.
What is the rationale for preferential use of hydrophobic and aromatic amino acids in immunogenic epitopes? It is widely thought that a prerequisite for immunogenicity is a TCR binding affinity above some minimal threshold, and as described above immunogenicity often scales with binding affinity (21–23). We suggest that differences in how hydrophobic/aromatic surfaces and how charged/polar surfaces can contribute to binding affinity underlie the observed preferences.
Burial of a hydrophobic surface is a key driving force in biomolecular recognition (101). However, electrostatic interactions such as those contributed by polar and charged amino acids are also important. What is the distinction? When charged or polar groups are buried within an interface, they are removed from the bulk water that solvates the unbound protein. Removal of charges from water is energetically expensive, and is referred to as the “desolvation penalty” (102, 103). The desolvation penalty is compensated by whatever new electrostatic interactions (e.g., hydrogen bonds and salt bridges) are formed in the protein–protein interface. However, these new interactions do not always offset the desolvation penalty. This is because the energies of electrostatic interactions are highly dependent on angles and distances, and precise alignment is required to fully offset desolvation. Indeed, studies of protein electrostatics have shown that salt bridges and hydrogen bonds within proteins and their interfaces are often unfavorable, as the desolvation penalty is not always compensated (102–105). The influence of the desolvation penalty effect has been demonstrated in TCR recognition (51) and is thought to underlie the restricted positioning of TCRs over HLA-A2 (106).
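The balance described above can be written compactly. The symbols below are introduced here for illustration and are not notation from the cited work: the net electrostatic contribution of burying a charged or polar group is the sum of an unfavorable desolvation term and a favorable term for the new interactions formed in the interface.

```latex
\Delta G_{\mathrm{elec}} = \Delta G_{\mathrm{desolv}} + \Delta G_{\mathrm{int}},
\qquad \Delta G_{\mathrm{desolv}} > 0, \quad \Delta G_{\mathrm{int}} < 0
```

Burial is favorable overall only when the magnitude of the interaction term exceeds the desolvation term. Because the interaction term depends strongly on angles and distances, imperfect alignment can leave the sum positive even when a hydrogen bond or salt bridge is formed.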
Unlike burial of charged or polar groups, removal of hydrophobic surface from bulk water is always favorable (107). This is simply the hydrophobic effect at work. The major geometrical requirements that influence the magnitude of the hydrophobic effect involve curvature and whether removal of hydrophobic surface creates a cavity (or hole) in the interface (108–110). Barring the latter, there are no special geometries needed to obtain favorable free energy from burial of hydrophobic surface.
Extrapolating to TCR recognition of pMHC, recognition of a peptide with charged or polar amino acids in the center requires higher structural precision in the loops of an engaging TCR to overcome desolvation (engaging charges near the peptide termini is easier, as the cost of desolvation is reduced when the charged group remains solvent exposed). Recognition of a peptide with a more hydrophobic center is correspondingly easier, requiring less structural precision to obtain the same binding affinity. In an individual’s TCR repertoire, there will be fewer TCRs whose CDR loops match the precise geometry needed to engage a polar peptide and bind strongly enough to signal. This is not to say they do not exist, as there are complexes with interfacial salt bridges in the TCR–pMHC structural database (34, 35, 111, 112), and indeed some have been discussed above. However, two predictions from the preceding discussion are that 1) they will be of lower frequency for strongly immunogenic complexes, as supported by amino acid preferences in immunogenic epitopes, and 2) when they do occur, interfacial salt bridges will more likely involve peptide termini and be located at the periphery of the interface, where desolvation penalties will be reduced.
The considerations above suggest how pMHC structural information could be used to improve immunogenicity predictions. Building from models based on amino acid composition, accurate structural modeling of peptides within MHC binding grooves could be used to refine predictions and possibly generate immunogenicity “scores” based on structure and energies, making allowances for conformation and the solvent exposure of hydrophobic and hydrophilic groups. Advances in structural modeling (or even high-throughput crystallography) could again prove advantageous, and a deeper understanding of how best to use structural models to energetically score a pMHC complex will be needed.
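As a minimal sketch of the composition-based starting point that structural modeling would refine, the example below reports the fraction of central peptide positions occupied by hydrophobic or aromatic residues. Several choices here are assumptions made for illustration: the residue set, the definition of the "center" as everything except three residues at each end (P4 to P6 in a 9-mer), and the use of a simple fraction as a score. This is not the algorithm used by the Immune Epitope Database tools, and it ignores conformation and solvent exposure entirely.

```python
# Assumed set of hydrophobic/aromatic residues (an illustrative choice).
HYDROPHOBIC_AROMATIC = set("AVILMFWY")


def central_hydrophobic_fraction(peptide, flank=3):
    """Fraction of central residues that are hydrophobic or aromatic.

    The center is taken as the peptide minus `flank` residues at each
    terminus, i.e. P4-P6 for a 9-mer with the default flank of 3.
    """
    core = peptide[flank:len(peptide) - flank]
    if not core:
        raise ValueError("peptide is too short for the chosen flank")
    return sum(res in HYDROPHOBIC_AROMATIC for res in core) / len(core)


for name, seq in [("Titin", "ESDPIVAQY"), ("MAGE-A3", "EVDPIGHLY")]:
    print(f"{name}: {central_hydrophobic_fraction(seq):.2f}")
```

A structure-based score would replace the fixed positional window with the residues that are actually solvent exposed and available to the TCR in a modeled or experimentally determined pMHC complex.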
Conclusions
The binding specificity of the TCR is one of the key factors that contribute to specificity in T cell–mediated immunity. Structural and biochemical investigations have profoundly influenced our understanding of the determinants of TCR specificity. A key finding is the inability to ascribe specificity to TCR hypervariable loops or the peptide alone: rather, the composite pMHC surface and the juxtaposition of various loops of the TCR force us to consider the interface in its entirety. This includes examining the connectedness between the various components, such as peptide and MHC or hypervariable and germline loops.
Importantly, TCRs are not monospecific: as has been recognized for some time, TCR cross-reactivity is fundamental to the immune system. However, cross-reactivity is not random, but driven by the fact that for any one TCR, many peptides will be compatible and able to achieve an optimal fit with the receptor. Achieving such a fit, or a structural/energetic alignment, will not always be fully obvious or predictable from sequence comparisons, as fit is determined by structural and chemical similarities and influenced by motion. Moreover, small regions of the ligand, that is, hot spots, may dominate the binding determinants. The latter, together with structural and physical considerations, can explain long-standing observations that TCRs can be sensitive to small perturbations in one region of the peptide while tolerating more dramatic changes elsewhere.
Structural biology and our knowledge of the relationships between structure and binding can also explain amino acid preferences in immunogenic peptides. Combining structural biology and modeling with various screening techniques such as yeast display, deep mutational scanning, and combinatorial peptide libraries has the potential to yield new approaches for predicting and productively modulating TCR binding properties. With further advances in understanding the myriad of other physical constraints and biological complexities that contribute to the overall specificity of T cell responses, we will be closer to solving and manipulating the entire equation that describes functional T cell specificity.
Acknowledgements
We thank Dr. David Cole for critical comments on the manuscript.
Footnotes
This work was supported by National Institutes of Health Grants R35GM118166, R01AI129543, and R01GM103773.
Disclosures
B.M.B. and T.P.R. are associated with a nascent university startup centered on modulating TCR binding. The other authors have no financial conflicts of interest.