A total of 111 Ag–Ab x-ray crystal structures of large protein Ag epitopes and paratopes were analyzed to inform the process of eliciting or selecting functional and therapeutic Abs. These analyses illustrate that Ab contact residues (CR) are distributed in three prominent CR regions (CRR) on L and H chains that overlap but do not coincide with Ab CDR. The numbers of Ag and Ab CRs per structure have overlapping distributions centered around 18 and 19, respectively. The CR span (CRS), a novel measure introduced in this article, is defined as the minimum contiguous amino acid sequence containing all CRs of an Ag or Ab and represents the size of a complete structural epitope or paratope, inclusive of CR and the minimum set of supporting residues required for proper conformation. The most frequent size of epitope CRS is 50–79 aa, which is similar in size to L (60–69) and H chain (70–79) CRS. The size distribution of epitope CRS analyzed in this study ranges from ∼20 to 400 aa, similar to the distribution of independent protein domain sizes reported in the literature. Together, the number of CRs and the size of the CRS demonstrate that, on average, complete structural epitopes and paratopes are equal in size to each other and similar in size to intact protein domains. Thus, independent protein domains inclusive of biologically relevant sites represent the fundamental structural unit bound by, and useful for eliciting or selecting, functional and therapeutic Abs.

Monoclonal Abs (mAbs) are important tools, diagnostic reagents, and therapeutic drugs. High-value commercial diagnostic, functional, and therapeutic mAbs must recognize full-length native proteins folded in native conformations as they exist in patients or patient samples. Given their potential value, strategies for developing mAbs to native protein Ags are of intense interest; however, the underlying biology dictating performance makes the task of developing such Abs both complicated and costly (1). For an Ab to exert a desired biological effect, it usually must bind to a specific, predefined site. In such cases, B cell epitope prediction algorithms are of limited value because the Ab must bind to the predefined site regardless of its inherent immunogenicity. Of paramount importance is binding at the target site to the three-dimensional conformation in which the protein exists under the conditions where the Ab must function.

The most common method for developing Abs is to inject purified protein into animals and isolate mAbs that bind to native proteins. A difficulty with this approach is the preparation of protein of suitable purity and functional conformation. For decades, researchers have attempted to use peptides as surrogates of full-length native proteins with little success (2). Anti-peptide Abs react well to their cognate peptides and may be useful for detecting linear epitopes in techniques like Western blot, but Abs raised to unstructured peptides more often than not do not recognize the same sequence in native protein (3, 4).

Purified, full-length native and recombinant proteins are useful for eliciting Abs that bind native proteins, but are often difficult to prepare in quantities and qualities sufficient for immunization and selection of mAbs. In addition, immunization with full-length proteins results in a polyclonal response where the majority of Abs are directed to a limited number of immunodominant sites of the protein (5, 6). In cases where the predefined site of interest is nonimmunodominant, or even nonimmunogenic, it is very difficult to isolate mAbs with the required performance specifications. Thus, there is a critical need for a means of directing Ab development to predefined functional epitopes of native protein Ags regardless of their inherent immunogenicity.

In the case of therapeutics, once candidate mAbs have been isolated with the requisite binding characteristics, the CDR can be taken from the isolated mAb (7) and inserted into engineered Ab constructs having desired effector function and characterized immunogenicity and toxicity profiles. It has been reported for six protein Ags that only a fraction of the CDR residues participate in binding (8), and that Ab contact residues (CRs) in nine crystal structures do not precisely correspond with residues modified by somatic hypermutation in Vκ (9). It is critical to identify specific residues involved in contacting Ag and understand how they are organized within CDR to engineer Ab reactivity. Thus, the native three-dimensional structures of both Ag epitope and Ab paratope are fundamental to the success of developing and engineering high-performance functional and therapeutic mAbs.

With these considerations in mind, we undertook the analysis of a large number of protein–Ab x-ray crystal structures to better understand both epitope and paratope size, structure, and interaction. A number of researchers have reported identifying CR of Abs and Ags by analysis of publicly available crystal structures (10–12). A large body of work in this field is aimed at identifying CR as a means of developing algorithms for predicting B cell epitopes (10, 13–17), whereas others have focused on analyzing or predicting CR in CDR of Abs (8, 18–20). The majority of the existing studies define two residues to be in contact when they are within a specified distance of each other. Many researchers use 4 Å as the definition of CR (10, 14–17), and the Molecular Modeling Database (21), at the National Center for Biotechnology Information, defines protein interactions using a 4-Å cutoff. One group has published several works using 6 Å (13, 20, 22) and reported that their conclusions were unaffected by cutoffs ranging from 4 to 6 Å (20).

Haste Andersen et al. (14) identified epitope CR using a 4-Å cutoff in 76 crystal structures composed of Ags >25 aa (29 of the structures were variants of lysozyme). Using this approach, these researchers were able to determine the number of CRs per epitope, the maximum continuous sequence of CR within each epitope, and the distribution of continuous CR per segment. These results highlighted the discontinuous nature of protein Ag epitopes.

Padlan et al. (8) determined Ab CR in six crystal structures with protein Ags and compared the sequence positions with CDR defined by nucleic acid sequence variability (23). MacCallum et al. (12) determined Ab CR sequence positions for 7 protein–Ab structures (5 of which are lysozyme), and Almagro (19) determined numbers and sequence positions of VL and VH CR for 19 protein Ags. Besides analyzing only a limited number of protein–Ab structures, these studies did not report on CR sequence positions outside established CDRs. Recently, Kunik and coworkers (20) identified Ab CR in 69 Ag–Ab structures with proteins ≥5 aa using a 6-Å cutoff and reported that ∼20% of CRs fall outside CDR. It is not clear how many of these structures contained large protein Ags, and the authors did not report sequence positions of the identified CR.

The existing studies define CR in different ways, and the number of unique structures analyzed is small. In addition, the number of Ag amino acids in the crystals is often not reported or is too small to draw conclusions about the nature of Ab binding to large protein Ags. Furthermore, some studies report only on Ag CR, whereas others focus solely on Ab.

In this article we apply a uniform analysis to both Ag and Ab proteins in a large number of x-ray crystal structures containing relatively large numbers of Ag amino acids (≥85). The number of structures analyzed is significantly greater than previous studies, revealing the extent of variability and enabling generalization of the underlying structural architecture that supports epitopes and paratopes. For each structure, we identify numbers of CR and CR span (CRS) for both Ag and Ab, and calculate CR ratios between Ag and Ab, as well as H and L chains. In addition, we identify the sequence positions of all Ab CRs and map them against CDR revealing CR regions (CRRs) that are similar to, but do not coincide with, CDR.

A significant observation of these analyses is that CR of both Ag and Ab are distributed in long, continuous linear sequences (CRS) that are similar in size to each other, and similar to the size of complete protein domains. For the first time, to our knowledge, we define complete structural epitopes as the surface CR and the minimum set of underlying contiguous amino acids required to form the intact native structure. On average, complete epitopes are the size of protein domains, and intact protein domains inclusive of predefined functional sites represent the basic structural unit useful for immunization, screening, and selection of functional and therapeutic mAbs.

A list of 110 unique Ab–Ag x-ray crystal structures representing 111 unique Abs (structure 3LOH contains 2 different Abs bound to the same Ag at different sites) was manually curated from the Protein Data Bank (PDB) (24). To provide information regarding the full extent of Ag–Ab interaction, only structures containing ≥85 Ag aa were included in the analysis. Each Ab in the data set occurs only once. The resolution of all structures is ≤3.8 Å. The amino acid sequence for each Ab was numbered using Abysis and the Kabat convention (25) (http://www.bioinf.org.uk/abysis). CRs are defined as Ag and Ab amino acid residues within 4 Å of each other (10). CRs were identified from PDB x-ray crystal structures using the Cn3D tool and Molecular Modeling Database (21), found at the National Center for Biotechnology Information Web site (http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml). In some analyses, CR at other distances are also presented. The majority of the Abs analyzed in this article are described by terms denoting specific function including “neutralizing” (e.g., 3GBN, 3SE8), “therapeutic” (e.g., 1N8Z, 1YY9), “inhibitory/antagonistic” (e.g., 3L95, 3HI6), “blocking” (e.g., 2QQN), “stimulating” (e.g., 3G04), “mitogenic” (e.g., 1YJD), and “protective” (e.g., 3ETB).
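
The distance-based CR definition can be illustrated with a short sketch. This is not the Cn3D/Molecular Modeling Database procedure used in this study; it assumes that atoms have already been parsed into (chain, residue number, x, y, z) tuples, that a residue counts as a CR when any of its atoms lies within the cutoff of any atom of the partner molecule, and that "within" is inclusive. The function name `contact_residues` and the toy coordinates are hypothetical.

```python
from math import dist  # Python 3.8+

def contact_residues(ag_atoms, ab_atoms, cutoff=4.0):
    """Return (Ag CRs, Ab CRs) as sets of (chain, residue_number).

    Each atom is a tuple (chain, residue_number, x, y, z). A residue is
    treated as a contact residue if any of its atoms lies within `cutoff`
    angstroms of any atom of the other molecule (an assumption; the source
    only states "within 4 A").
    """
    ag_cr, ab_cr = set(), set()
    for a_chain, a_res, *a_xyz in ag_atoms:
        for b_chain, b_res, *b_xyz in ab_atoms:
            if dist(a_xyz, b_xyz) <= cutoff:
                ag_cr.add((a_chain, a_res))
                ab_cr.add((b_chain, b_res))
    return ag_cr, ab_cr

# Toy coordinates, invented for illustration (not from any PDB entry).
ag = [("A", 10, 0.0, 0.0, 0.0), ("A", 11, 10.0, 0.0, 0.0)]
ab = [("H", 100, 3.0, 0.0, 0.0), ("L", 32, 20.0, 0.0, 0.0)]
ag_cr, ab_cr = contact_residues(ag, ab)
print(sorted(ag_cr), sorted(ab_cr))
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a single interface; a spatial index would be the usual choice for screening many structures.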

The list of 110 unique Ab–Ag x-ray crystal structures analyzed in this work is presented in Table I. A summary of the amino acids analyzed in this study is presented in Table II. The data set is composed of 78,666 aa, of which 5.3% are identified as CR. Ab amino acids represent 61% of the total and are divided almost equally between heavy (IgH) and light (IgL) chains. About half of all CRs belong to Ab (52%) and 65% of these lie within the H chain. Overall, there are ∼7 CRs per L chain, 13 per H chain, and 18 per Ag; however, there is significant variation between structures. The largest number of CRs in a single structure, including both Ag and Ab, is 101 (1BGX), whereas the smallest is 20 (2EXW).
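
The percentages and per-chain averages quoted above follow directly from the totals in Table II. The short check below recomputes them; the variable names are illustrative only.

```python
# Totals taken from Table II (amino acids and CRs at 4 angstroms).
total_aa = {"IgL": 23_470, "IgH": 24_384, "Ag": 30_812}
total_cr = {"IgL": 751, "IgH": 1_397, "Ag": 2_002}
n_interactions = 111

ab_aa = total_aa["IgL"] + total_aa["IgH"]
ab_cr = total_cr["IgL"] + total_cr["IgH"]
all_aa = ab_aa + total_aa["Ag"]
all_cr = ab_cr + total_cr["Ag"]

summary = {
    "CR share of all residues (%)": 100 * all_cr / all_aa,
    "Ab share of all residues (%)": 100 * ab_aa / all_aa,
    "Ab share of all CRs (%)": 100 * ab_cr / all_cr,
    "H-chain share of Ab CRs (%)": 100 * total_cr["IgH"] / ab_cr,
    "Mean CRs per L chain": total_cr["IgL"] / n_interactions,
    "Mean CRs per H chain": total_cr["IgH"] / n_interactions,
    "Mean CRs per Ag": total_cr["Ag"] / n_interactions,
}
for label, value in summary.items():
    print(f"{label}: {value:.1f}")
```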

Table I.
Ag–Ab crystal structures analyzed
PDB Structures Analyzed
1ADQ 1I9R 1ORQ 1YQV 2I9L 2VIS 3CVH 3GRW 3L5X 3N85 3S37 
1AFV 1IQD 1OSP 1YY9 2J88 2VXT 3D85 3H3B 3L95 3NH7 3SE8 
1BGX 1JPS 1QFU 1Z3G 2JEL 2W9E 3D9A 3H42 3LDB 3NID 3SM5 
1CZ8 1JRH 1RJL 1ZTX 2NY1 2XQB 3EOA 3HB3 3LEV 3NPS 3SOB 
1E6J 1LK3 1S78 2ADF 2NZ9 2XWT 3ETB 3HI6 3LIZ 3O2D 3T2N 
1EGJ 1N8Z 1V7M 2ARJ 2Q8B 2ZCH 3FMG 3HMX 3LOH 3OR6 3TT1 
1EZV 1NCD 1VFB 2DD8 2QQL 3B9K 3G04 3I50 3LZF 3R1G 3TT3 
1FJ1 1NSN 1WEJ 2EXW 2QQN 3BSZ 3G6J 3IU3 3MA9 3RAJ 3VG9 
1FNS 1OAZ 1YJD 2FD6 2R29 3C09 3GBN 3K2U 3MJ9 3RKD 4D9R 
1FSK 1OB1 1YNT 2GHW 2R56 3CSY 3GI8 3KJ4 3MXW 3S35 4DAG 
Table II.
Summary of amino acids analyzed
CR at 4 Å
Structure   Total aa   Total CR   Mean   Median   Maximum   Minimum
IgL         23,470     751                        27
IgH         24,384     1,397      13     12       26
All Ab      47,854     2,148      19     19       49
Ag          30,812     2,002      18     18       52
Ag+Ab       78,666     4,150      37     37       101       20

n = 111 Ag–Ab interactions.

The frequency distribution of numbers of CR in all 111 Ag–Ab structures is shown in Fig. 1. CRs are distributed around ∼18–19 for both Ag and Ab, and the distributions exhibit significant overlap. Whereas Fig. 1 illustrates that the number of CRs across all 111 structures is distributed in a similar and overlapping manner for both Ag and Ab, there is only a modest correlation between the number of Ag and Ab CRs within individual structures (Pearson correlation coefficient, r² = 0.5394).

FIGURE 1.

Frequency of CR in Ag–Ab structures. The number of CRs in the entire data set per group is given in parentheses.

The ratio of Ab to Ag CR for each of the 111 structures is shown in Fig. 2 and is represented as the log(Ab CR/Ag CR). Negative values result when there are more CR in the Ag (36 structures), and positive values result when there are greater numbers of Ab CR (59 structures). Sixteen structures have equal numbers of Ag and Ab CRs. The structure having the highest ratio in the data set is 2J88 (21 Ab CRs, 10 Ag CRs), and the lowest is 1ADQ (9 Ab CRs, 16 Ag CRs).
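
The sign convention of the log ratio can be checked with the two extreme structures quoted above. The base of the logarithm is not stated in the text; base 10 is assumed here, and the sign of the result does not depend on that choice. The function name `log_cr_ratio` is hypothetical.

```python
from math import log10

def log_cr_ratio(ab_cr, ag_cr):
    """log10(Ab CR / Ag CR): positive when the Ab contributes more CRs."""
    return log10(ab_cr / ag_cr)

highest = log_cr_ratio(21, 10)  # 2J88: 21 Ab CRs, 10 Ag CRs
lowest = log_cr_ratio(9, 16)    # 1ADQ: 9 Ab CRs, 16 Ag CRs
balanced = log_cr_ratio(18, 18) # equal counts (illustrative values)
print(f"{highest:.2f} {lowest:.2f} {balanced:.2f}")
```

Under the base-10 assumption, the two extremes evaluate to roughly 0.32 and −0.25, and any structure with equal Ag and Ab CR counts maps to zero.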

FIGURE 2.

The distribution of the ratio of Ab to Ag CR for 111 x-ray crystal structures.


In addition to counting the number of CRs within each structure, the amino acid sequence position of each CR was also recorded for both Ag and Ab. From the sequence positions, the CRSs were determined for all 111 structures. Fig. 3 shows the distribution of the size of the CRS across all 111 Ags using 3.5, 4.0, and 4.5 Å as the definition of CR. Although there appears to be a trend toward longer cutoff distances resulting in longer CRS, the effect is small using the distances examined in this study. The large majority of Ag structures (86.5%) have CRS ranging from 20 to 229 aa. Only 2 of the 111 structures analyzed have all of the Ag CRs contained within a CRS of <20 aa (2J88 and 3BSZ). The most common class of CRS in this data set is 50–79 residues. The average and median CRS of all 111 Ag structures are 125 and 95 aa, respectively.
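
Given the CR sequence positions of one chain, the CRS reduces to the distance between the first and last CR, inclusive. The sketch below assumes plain integer numbering within a single chain (no insertion codes); the function name `cr_span` and the example positions are hypothetical.

```python
def cr_span(cr_positions):
    """Length of the shortest contiguous stretch containing every CR.

    `cr_positions` is an iterable of integer sequence positions for one
    chain. Both end positions are counted, hence the +1.
    """
    positions = list(cr_positions)
    if not positions:
        raise ValueError("at least one CR position is required")
    return max(positions) - min(positions) + 1

# Invented positions: 6 CRs scattered across a 61-residue stretch.
example_crs = [102, 105, 106, 131, 160, 162]
print(cr_span(example_crs))
```

Because only the outermost CRs matter, the CRS can be much larger than the CR count, which is the contrast the distributions in Fig. 3 capture.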

FIGURE 3.

The distribution of Ag epitope CRS determined at 3.5, 4.0, and 4.5 Å. The number of structures included in each distribution is in parentheses. Only crystal structures having resolution less than the value used to determine CR are included. Figure annotations are based on the 4.0-Å distribution.


There is a significantly higher average number of CR for large Ag structures ≥230 aa (19.8, n = 51), compared with smaller Ags <230 aa (16.6, n = 60, p < 0.01), and there is a weak but significant correlation between Ag size and CRS (Pearson correlation coefficient, r² = 0.273). The relationship is readily seen in the distribution of average CRS depicted in Supplemental Fig. 1. The epitope CRS increases proportionally to the size of Ag in the crystal. Also, the average CRS of Ags ≥230 aa (179) is significantly greater than that of Ags <230 aa (79).

A small but significant percentage of the structures (11.7%) have Ag CRS >229 aa. The 442-aa CRS in structure 3TT3 is the longest in the data set, whereas the 12-aa CRS in structure 2J88 is the shortest. The 12-aa CRS containing the 2J88 epitope contains 10 Ag CRs. In this respect, the 2J88 epitope is the most like a small, contiguous peptide of all 111 structures. Although 2J88 is the only structure where the number of CR ≈ CRS, the majority of the structures do indeed have epitope CRs that are clustered together in short, contiguous stretches ranging from 2 to 12 aa, with the most frequent longest contiguous sequence of CR being 5 (Fig. 4). These findings are in good agreement with previous work (14). The 1FSK structure is an example of an epitope containing a long, contiguous sequence of 12 CRs (i.e., a linear peptide); however, the complete epitope contains a total of 17 CR spread over a CRS of 56 residues. Thus, Ag epitopes consist of discontinuous individual CRs and short contiguous runs of CRs contained within protein regions ranging from ∼20 to 400 aa.
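
The longest contiguous sequence of CR in an epitope can be found with a single pass over the sorted positions. The sketch below is illustrative only; it assumes integer sequence positions, and the function name and example positions are hypothetical.

```python
def longest_contiguous_run(cr_positions):
    """Length of the longest run of consecutive CR sequence positions."""
    positions = sorted(set(cr_positions))
    if not positions:
        return 0
    best = current = 1
    for prev, cur in zip(positions, positions[1:]):
        # Extend the run if this CR directly follows the previous one.
        current = current + 1 if cur == prev + 1 else 1
        best = max(best, current)
    return best

# Invented epitope: 8 CRs, a 5-residue stretch plus scattered contacts.
epitope = [12, 13, 14, 15, 16, 40, 41, 67]
print(longest_contiguous_run(epitope))             # longest run
print(max(epitope) - min(epitope) + 1)             # CRS of the same epitope
```

In this invented case the longest run is 5 residues while the CRS is 56, the same kind of gap between a peptide-like stretch and the complete epitope described for 1FSK.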

FIGURE 4.

Frequency distribution of the longest contiguous sequence of Ag CR in each of the 111 structures analyzed.


The size and distribution of native mAb neutralization epitopes are illustrated in Fig. 5. The numbers of CRs for 9 HIV-1 gp120 neutralizing Abs range from 13 to 39, and the CRs are contained within long CRS of 194 to 384 aa. Many of these mAbs contact the same amino acid positions, and the SUM line is a “heat map” indicating the number of different neutralizing mAbs contacting residues at each position. HIV-1 gp120 residues 281, 365, 366, 367, 368, 419, 421, and 473 are contacted by five or six of the mAbs. Together, these 8 CRs represent the core of an epitope recognized by multiple broadly neutralizing mAbs spread over a linear amino acid sequence of 193 residues.

FIGURE 5.

CR epitope map of nine HIV-1 gp120 neutralizing mAbs. Red bars indicate gp120 amino acid sequence positions within 3.5 Å of any Ab residue; yellow bars indicate amino acids within 4 Å. The number of CRs and length of CRSs are based on 4 Å data. The SUM line was constructed by counting the number of structures having CRs at the indicated positions and applying a color scale (red > yellow > green) to the distribution.
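
The counting step behind a SUM line of this kind can be sketched as follows. The per-mAb CR sets below are invented for illustration and are not the contacts of any structure in Fig. 5; the names `sum_line` and `shared_positions` are hypothetical.

```python
from collections import Counter

def sum_line(cr_sets):
    """Count, for each Ag position, how many structures have a CR there."""
    counts = Counter()
    for positions in cr_sets:
        counts.update(set(positions))  # each structure counts once per position
    return counts

def shared_positions(counts, min_structures):
    """Positions contacted by at least `min_structures` structures."""
    return sorted(p for p, n in counts.items() if n >= min_structures)

# Invented CR sets for three hypothetical mAbs binding the same Ag.
cr_sets = [
    {281, 365, 366, 430},
    {281, 365, 368, 473},
    {279, 281, 365, 366},
]
counts = sum_line(cr_sets)
print(shared_positions(counts, min_structures=3))
print(shared_positions(counts, min_structures=2))
```

A color scale such as the one in the figure would then be applied to the per-position counts.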


Although each CR epitope is unique, the number of CRs and pattern of sequence positions may be useful for grouping neutralization epitopes into classes. Structures 2NY1 and 3JWD have relatively few CRs located at almost identical positions and contained within a 319-aa CRS. Similarly, structures 3NGB, 3SE9, 3SE8, and NIH45-46 all have large numbers of CRs spread over very long CRS and located at very similar sequence positions. All of the HIV-1 Abs examined in this study bind and neutralize structurally intact infectious virus, illustrating the general phenomenon that functional Abs contact Ag residues in highly conformationally dependent structures consisting of long CRS. The identification of CRs recognized by multiple neutralizing mAbs illustrates the utility of applying a uniform analysis to multiple structures and lends credence to the use of CR defined as a simple distance between amino acids as a means for identifying residues of viral pathogens that may be critical determinants of vaccines.

The distribution of the number of CRs present within the H and L chains of each Ab structure is shown in Fig. 6. Overall, 65% of the Ab CRs identified here are contained within the H chain. The numbers of CRs are distributed around 5–7 per L chain and 11–13 per H chain. The minimum Ig structure retaining both IgH and IgL variable domains (Fv) is ∼230 aa (∼110 aa VL and ∼120 aa VH). The 2,148 Ab CRs identified here represent, on average, ∼8.4% of an intact Fv fragment. The total number of CRs per Ab, however, varies significantly between structures. Structures 1BGX and 3SE8 have 49 and 32 CRs per Ab, respectively, whereas structure 1ADQ has only 9 Ab CRs.

FIGURE 6.

The frequency of CRs in Ab structures. Numbers in parentheses indicate the total number of CRs per group in the entire set of 111 Ag–Ab structures. IgH+L is the summation of IgH and IgL.


The ratio of IgH to IgL CRs for each of the 111 structures is depicted in Fig. 7. Three of the 111 structures have no CRs within the L chain (1Z3G, 2I9L, and 3GBN). The overwhelming majority of the structures (92) have more IgH CR than IgL; however, 11 Ab structures have more IgL CRs than IgH. The structure having the highest ratio in the data set is 3EOA (13 IgH and 1 IgL), and the lowest is 3HB3 (5 IgH and 10 IgL).

FIGURE 7.

The ratio of IgH to IgL CR in 111 Ag–Ab structures. Numbers in parentheses indicate the total CR per group in the entire set of 111 structures.


A significant observation from these studies is the degree of variability in the number and ratio of CRs between individual structures within an Ag–Ab complex. Although all the data taken together suggest a model of Ag–Ab interaction involving equal numbers of Ab and Ag CR, this is the case in only 16 of the 111 structures analyzed. Similarly, the composite data indicate that 65% of Ab CRs reside within H chains, but the IgH to IgL CR ratio within an individual Ab is highly variable ranging from no IgL CRs to twice as many L chain CRs as H chain. It is commonly reported that the majority of Ab CRs reside within IgH, and the analyses reported in this article support these observations; however, there are 11 examples in this data set of Abs having more CRs in IgL than IgH. Although the large number of structures analyzed in this study does indeed enable construction of useful models of Ag–Ab interaction, the significant variability within and between structures limits the ability to predict individual CRs based on these composite data.

As with Ag (Fig. 3), Ab CRs, comprising the Ab paratope, are distributed along the IgH and IgL chains in a noncontiguous manner. The CRS was determined for all 111 IgL and IgH structures, and the frequency was plotted in Fig. 8. The IgL CRs are typically contained within a CRS of 60–69 aa, whereas the IgH CRS is typically 70–79 residues. Abs make contact with Ag at three discrete regions along each IgH and IgL chain known as CDR, and the predominant CRS depicted in Fig. 8 represents Ig chains having CR in both the first and third CDRs. Eleven IgH structures have CRS ≥90 aa (e.g., 3VG9). In all 11 of these structures, there is a CR in amino acid sequence position 1 or 2, which accounts for the extraordinary length of these CRSs. Similarly, there are 3 IgL structures where amino acid sequence position 1 or 2 is a CR resulting in CRS of 90–99 aa (e.g., 3SE8).

FIGURE 8.

The frequency distribution of the CRS of IgH and IgL chains. Numbers in parentheses indicate the total number of CRs per group in the data set of 111 structures.


The sequence positions of the Ab CR are depicted in Fig. 9. All PDB Ab sequences were renumbered according to the Kabat convention using Abysis. Abysis did not number structures 3SE8 and 3T2N, and these were omitted from further analysis. The IgH sequence positions that are most frequently CRs are 100, 52, 31, 97, and 98. The most frequent IgL CR sequence positions are 32, 92, 91, 30, and 50. IgL positions 32, 92, and 91 are in agreement with previous findings (9). In this study, we add as predominant CR the IgL positions 30 and 50 and the IgH positions above.

FIGURE 9.

The distribution of Ab CRs along the IgH and IgL chains. Ab amino acid sequence positions from PDB files were renumbered using Abysis (25).


Further inspection of the data reveals three prominent regions of CR for both IgH and IgL chains. These CRRs, defined by distance between Ag and Ab residues, are organized in a manner similar to established CDR defined by Ab gene sequence variability (23). Although the great majority of the total number of Ab CRs identified in this study lie within these prominent regions, a small but significant percentage of CRs have sequence positions that fall outside of these regions.

A more detailed analysis of Ab CRR is presented in Fig. 10. The upper panel is an analysis of IgL CRR and the lower panel is the same analysis for IgH. No CRs were identified beyond IgL sequence position 96 and IgH sequence position 103. The heat maps highlight sequence positions where multiple structures have CR. Sixty-three IgL and 59 IgH amino acid sequence positions, representing ∼54% of all positions within an Ab V region fragment (Fv ≈230 aa), are CRs in at least 1 of the 111 structures analyzed. Many of these sequence positions, however, are CRs in only a small number of structures.

FIGURE 10.

Analysis of IgL (upper) and IgH (lower) CRRs. Heat maps for IgL and IgH were constructed by counting the number of occurrences of each position in all structures and applying a color scale (red > yellow > green) to the distribution. Sequence positions were numbered using Abysis and the Kabat convention. CRRs are highlighted in gray.


CRRs are defined in this article as contiguous sequences where >10% of the structures have CR. With these boundaries, 91.6% of all IgL and 93.7% of all IgH CRs lie within the indicated CRR. IgH CR sequence positions 1–2 and 73–79 lie outside of defined CDR/CRR. Eleven structures have CRs within IgH region 73–79 (e.g., 1S78) and 11 within region 1–2 (see also Fig. 9). Region 3 contains more CR than regions 1 and 2 for both IgL and IgH. The third region of IgH is the largest, containing 24.6% of all Ab CR. There are many Ab structures in this data set that do not have CR in each of the six CRRs. There are 7 IgH (e.g., 2QQN) and 64 IgL structures that do not make contact in all 3 CRRs. The sequence positions of all CRs in relation to CDR and CRR for all 111 structures are shown in Supplemental Figs. 2 (IgL) and 3 (IgH).
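
The >10% rule can be expressed as a simple thresholding of per-position CR frequencies followed by merging of neighboring positions. The sketch below is an illustration, not the procedure used to draw Fig. 10: the counts are invented, insertion-coded Kabat positions (e.g., 100A) are ignored, and the `max_gap` parameter, which lets a region bridge an isolated below-threshold position flanked by above-threshold ones, is an assumption about how boundaries are drawn.

```python
def contact_regions(counts, n_structures, threshold=0.10, max_gap=1):
    """Group frequently contacted positions into contiguous regions.

    counts: dict mapping sequence position -> number of structures with a
    CR at that position. A position qualifies if its frequency exceeds
    `threshold`; qualifying positions separated by at most `max_gap`
    non-qualifying positions are merged into one region.
    Returns a list of (start, end) tuples, inclusive.
    """
    hot = sorted(p for p, c in counts.items() if c / n_structures > threshold)
    regions = []
    for p in hot:
        if regions and p - regions[-1][1] <= max_gap + 1:
            regions[-1][1] = p          # extend the current region
        else:
            regions.append([p, p])      # start a new region
    return [tuple(r) for r in regions]

# Invented per-position counts over 111 structures (illustrative only).
counts = {26: 15, 27: 20, 28: 30, 29: 1, 30: 40, 31: 60,
          50: 50, 51: 3, 52: 55, 53: 12, 75: 5}
regions = contact_regions(counts, n_structures=111)
print(regions)
```

With `max_gap=0` the same counts split into four shorter regions, which shows how strongly the boundaries depend on whether rarely contacted positions inside a region are bridged.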

In individual Abs, CRs generally do not occur as long, contiguous sequences, but are scattered in a discontinuous manner within and across regions. It is interesting to note that in IgH and IgL CDR2/CRR2, there are large numbers of structures having CRs at positions 50 and 52, but only a small number at position 51. Similarly, IgH position 29 is included within CRR1 even though only 1 Ab of 111 contacts Ag at that position because the positions on either side are significantly above the threshold. These observations illustrate the discontinuous nature of CR within CRR and CDR of individual Abs. The fact that IgH positions 29 and 51 and IgL position 51 are included within CDR/CRR and yet are only infrequently CRs suggests the possibility of a specific role for these positions apart from directly binding Ag, perhaps in properly positioning actual CRs.

A comparison of the boundaries, number of sequence positions, and CR in each CRR with the corresponding CDR is presented in Fig. 11. Although CRR and CDR significantly overlap, the boundaries are different. The differences between CDR and CRR boundaries are greater for IgH than for IgL. There are more total sequence positions within CDR than CRR. IgH region 3 contains the largest number of CRs for both CDR and CRR, followed by IgH region 2. IgL region 2 contains the fewest CRs for both CDR and CRR. Using the Kabat definitions (discussed in Ref. 25), 13.0% of all Ab CRs lie outside of established CDR boundaries and 7.0% outside of CRR. Recently, Kunik et al. (20) reported that ∼20% of the Ab residues that actually bind the Ag fall outside the CDR. Fig. 12 illustrates the overlap between CDR and CRR. The greatest difference between the two is the inclusion of IgH residues 26–30 in CRR1 compared with CDR1. Another major difference between the two is the inclusion of IgL position 49 in CRR2 compared with CDR2. The CRRs defined in this article contain a larger number of CR and are more uniform in number of sequence positions in a region, and yet comprise fewer total sequence positions compared with CDR; that is, they are more densely populated with CR than CDR.

FIGURE 11.

Comparison of established CDR (Kabat) and CRR. The total number of CRs in the data set for each group is in parentheses.

FIGURE 12.

Overlap of CDR and CRR. Amino acid sequence position is numbered across the top using the Kabat convention. The number of CRs in each region (defined in Fig. 11) are indicated. The heat map was constructed by applying a color scale to the number of CRs (red > yellow > green).

The analyses presented in this article demonstrate a significant overlap between CRR and established CDR defined using completely orthogonal methodologies. Indeed, the CRR boundaries reported in this article are even more closely aligned with refinements of CDR reported by others (26). CRRs are defined by measuring physical distance between atoms within crystal structures, whereas CDRs are defined by Ig gene sequence variability. The overlap further confirms the validity of both methodologies. Although the two approaches point to similar structural architecture, there are important differences as well (Fig. 12, Supplemental Figs. 2 and 3). Ab engineers relying solely on definitions of CDR may not give adequate consideration to sequence positions that are actually important CR. Two examples are IgL sequence position 49 and IgH position 30. These positions lie outside established CDR and yet frequently occur as CR. Also, IgL positions 24, 25, 26, 90, and 97 and IgH positions 61 and 63 do not contact Ag in this data set but are included within CDR. These observations point out the advantage of using multiple methodologies to characterize Ab binding residues and shed new light on the organization of regions of Abs harboring important CR.

These analyses were undertaken to better understand the nature of Ab binding to large protein Ags, and in so doing provide insight to strategies for developing high-performance, high-value functional and therapeutic Abs. Abs are used in a variety of ways, and the structure of an Ag is influenced by the assay or application in which the Ab is used (4). In the Western blot, the Ag is typically boiled in detergent and reducing agent, and the Ab must be capable of recognizing the denatured form of the Ag. In sandwich immunoassay methods using minimally processed patient serum or plasma samples, the Ab must be able to bind to the properly folded native Ag structure. Therapeutic Abs must bind to native protein structures as they exist in the patient. Functional Abs must bind to native protein structures at specific sites in a way that influences protein function.

In many applications, it is of greater interest to predict the sites on an Ag that will result in functional modulation or therapeutic effect than it is to predict sites that will elicit the production of Abs in an immunized animal. Indeed, the site of interest may not be immunodominant or even immunogenic, and yet an Ab to the epitope is highly desired. B cell epitope prediction algorithms are of little value in such instances, and thus strategies for developing Abs to specific sites of native proteins regardless of their inherent immunogenicity are critical.

An important aspect of these analyses is that the majority of the Abs are described in functional and therapeutic terms, and the results therefore provide information about this important class of molecules. Another important aspect of this work is that all of the Ags consist of ≥85 aa. Structures with large Ags were selected in an attempt to reveal the full extent of Ab interaction with large, natively folded proteins. In this data set, the average number of CRs is significantly higher for large Ag structures than for smaller Ags, and the epitope CRS increases in proportion to the size of the Ag in the crystal. Given these considerations and the fact that 92% of the 111 Ags analyzed in this study are less than full-length, it seems likely that the values reported in this article for CRS, and to a lesser extent CR, underestimate the actual values, and that previously reported estimates of CR based on even smaller protein fragments are lower still.

In this study, we identify Ag and Ab CRs, defined as amino acids within a distance of 4 Å of the binding partner, in publicly available x-ray crystal structures. Other researchers have performed similar analyses using this methodology (10, 14, 15); however, prior studies have been more limited in scope and have not focused exclusively on structures comprising large protein Ags. In addition, the length of the linear sequence containing all CR, the CRS, has not been previously reported. The large number of structures analyzed in this article highlights the extent of variability associated with the number of CRs and their relative ratios, the length of CRS, and the organization of CR within prominent Ab CRRs. The large number of structures also highlights specific differences between CDR and CRR. The identification of viral Ag sequence positions contacted by multiple neutralizing Abs, and the organization of Ab CR into CRR similar to established CDR, lend credence to distance between CR as a useful means of characterizing Ag–Ab interactions.
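The 4 Å contact criterion and the CRS can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: it assumes that atoms are supplied as (residue identifier, coordinate) pairs already parsed from a structure file, that a contact is any Ag–Ab atom pair at or below the cutoff, and that residue numbers along a chain are sequential integers. The function names and the toy coordinates are hypothetical.

```python
import math

CUTOFF_ANGSTROM = 4.0  # contact distance used in the text


def contact_residues(ag_atoms, ab_atoms, cutoff=CUTOFF_ANGSTROM):
    """Return (ag_cr, ab_cr): residues with at least one atom within `cutoff`
    of an atom on the other molecule.

    Each atom is a pair (residue_id, (x, y, z)); residue_id may be any
    hashable value, for example an int or a (chain, number) tuple.
    """
    ag_cr, ab_cr = set(), set()
    for ag_res, ag_xyz in ag_atoms:
        for ab_res, ab_xyz in ab_atoms:
            if math.dist(ag_xyz, ab_xyz) <= cutoff:
                ag_cr.add(ag_res)
                ab_cr.add(ab_res)
    return ag_cr, ab_cr


def contact_residue_span(cr_positions):
    """Length of the shortest contiguous sequence containing all CRs of one chain."""
    positions = list(cr_positions)
    if not positions:
        return 0
    return max(positions) - min(positions) + 1


# Toy coordinates (hypothetical, not taken from any PDB entry).
ag_atoms = [(10, (0.0, 0.0, 0.0)), (55, (3.0, 0.0, 0.0)), (80, (20.0, 0.0, 0.0))]
ab_atoms = [(("H", 52), (1.0, 0.0, 0.0)), (("L", 32), (5.0, 0.0, 0.0))]

ag_cr, ab_cr = contact_residues(ag_atoms, ab_atoms)
ag_crs = contact_residue_span(ag_cr)
```

In this toy case, Ag residues 10 and 55 are CRs and residue 80 is not, so the epitope CRS runs from position 10 to position 55 and is 46 residues long. The all-against-all loop is chosen for clarity; a spatial index would be preferable for real structures.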

The observation that the distributions of the numbers of Ag and Ab CR significantly overlap and are both centered around ∼18 or 19 CRs per structure suggests that, in general, Abs contact regions on large protein Ags that are similar in size to their own physical dimensions. Thus, one characteristic of an Ag epitope is that the two-dimensional surface area containing all CRs is less than or equal to the physical dimensions of the binding region of an Ab. Davies et al. (11) estimated these dimensions from studies of lysozyme–Ab crystal structures as an oval with the approximate dimensions of 20 × 30 Å. These dimensions represent the “footprint” of Ab paratopes and large protein Ag epitopes.

A unique aspect of this work is the characterization of both the Ag and Ab CRS. The analysis of many Ag–Ab complexes indicates that CRs are contained within relatively long CRSs that come into close spatial proximity because of complex folding inherent within native protein structures. This is true of both Ag and Ab proteins. In the case of Abs, the CRSs are highly uniform in length (60–69 for IgL and 70–79 for IgH), and CRs are organized into prominent CRRs. Ag CR organization and CRS are more variable; however, the most common CRS class, 50–79 residues, is very similar in size to IgL and IgH CRS. Given that the function of all Abs is to bind, it seems reasonable that Ab CRs are organized into uniform CRRs and distributed across CRSs that are highly homogeneous in length. In contrast, epitopes reside on Ags that have a variety of different functions, so it is perhaps not surprising that Ag CR and CRS are more variable.
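Grouping CRS lengths into size classes such as 60–69 or 70–79 and finding the most frequent class can be sketched as follows. The 10-residue class width is an assumption read from the classes quoted above, the function names are hypothetical, and the example lengths are invented for illustration rather than taken from the data set.

```python
from collections import Counter


def size_class(length, width=10):
    """Label of the size class containing `length`, e.g. 64 -> '60-69'."""
    low = (length // width) * width
    return f"{low}-{low + width - 1}"


def most_frequent_class(lengths, width=10):
    """Return (class_label, count) for the most populated size class."""
    counts = Counter(size_class(n, width) for n in lengths)
    return counts.most_common(1)[0]


# Invented CRS lengths, for illustration only.
example_lengths = [61, 64, 68, 72, 75, 58]
top_class = most_frequent_class(example_lengths)
```

With these invented lengths, three values fall in the 60-69 class, which is therefore reported as the most frequent.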

The fact that the most frequent Ag, IgH, and IgL CRSs are very similar in size suggests a common underlying structural organization. In addition, the similarity in size between the large majority of Ag CRSs (20–229 residues) and the minimum Ab structure retaining complete binding (Fv ≈230 aa) further suggests common structural elements.

The basic structural unit of a protein, whether Ag or Ab, is the domain. Domains fold independently, mediate specific biological function, and combine to form larger modular proteins. The smallest domains are ∼35 aa and the largest can be several hundred (27). The size of the most frequently occurring protein domain is ∼100 aa (28), very similar in size to the average (125 aa) and median (95 aa) CRSs of the 111 Ag structures analyzed in this article. In addition, the size distribution of complete protein domains (28, 29) is strikingly similar to the distribution of Ag CRS presented in Fig. 3. Together, these observations suggest that complete structural epitopes, represented by CRS, are the size of intact protein domains.

It is widely recognized that Ab framework regions are required for proper paratope structure and Ab binding, and complete functional paratopes are formed only when CDR and framework regions are structurally organized into VH (∼120 aa) and VL (∼110 aa) domains that together form a functional binding unit (Fv). In a similar fashion, functional protein Ags are made up of “frameworks” of domains (30), and functional epitope conformation is determined by the underlying domain structure. When viewed in composite, the observations that the number of Ag CRs is equal to the number of Ab CRs, Ag CRSs are similar in size to IgH and IgL CRSs, and the size of the majority of Ag CRSs is less than or equal to Fv suggest that Abs recognize structural epitopes that are similar in size to VL, VH, and Fv, that is, one to two protein domains.

Large protein epitopes comprise not only CRs but also many more residues that are necessary to present a native structure, which may be as large as, or larger than, the physical footprint of an Ab. Attempting to elicit or select Abs that bind full-length, native proteins using peptides and polypeptides smaller than an entire epitope risks non-native Ag folding. Moreover, Abs elicited in this way, even if capable of binding native structure, may still fail to bind the full-length protein because of steric hindrance imposed by Ag residues that lie within the footprint of the Ab but were not included in the immunogen.

In the analysis of the functional, neutralizing epitopes of HIV-1 presented in this article, the most broadly neutralizing epitopes are composed of structures of ∼380 aa. Immunization with smaller polypeptides would likely result in Abs with fewer CRs and, arguably, lower affinity or perhaps less broadly neutralizing activity. In instances where Abs bind to CR on multiple domains, the size of an immunogen required to elicit such an Ab would be expected to be larger still, to encompass all CRs and all supporting framework amino acids required for maximum contact.

A great deal of effort is expended attempting to predict protein epitopes. Zhang and coworkers (31) recently reported an improved method of predicting conformational epitopes based on a “thick surface patch” model including an “adjacent residue distance” feature. This concept of an epitope includes the idea of CRs layered on top of supporting amino acids but fails to appreciate that the structure of the immediately underlying amino acids is, in turn, dependent on even more amino acids comprising the complete protein domain. Currently, B cell epitope prediction is largely focused on physicochemical attributes of a protein Ag recognized by Abs such as surface exposure, solvent accessibility, spatial configuration, and so on. It is unlikely, however, that such characteristics will fully account for issues of immunodominance and tolerance that influence Ab development and involve other components of the immune system such as T cell help.

When attempting to develop functional and therapeutic Abs, it is less important to identify the sites on a protein that may elicit large amounts of Ab and more important to direct Ab development to sites where binding would modulate biological function, regardless of the site’s inherent immunogenicity. Given that there are a limited number of immunodominant sites on a protein, immunization with full-length proteins, although ensuring proper conformation and maximizing the potential interaction between Ag and Ab, may not elicit Abs directed to the biologically relevant site. Immunization with peptides and polypeptides representing less than a complete structural epitope has proved time and again to elicit Abs that may be directed to the site of interest but either do not bind to native protein or bind in a way that does not modulate function. To develop functional and therapeutic Abs, it is critical to focus Ab development on potentially weakly immunogenic, biologically relevant sites in such a way that the resulting Abs bind to full-length native proteins and effect a desired function.

The analyses presented in this article demonstrate that functional and therapeutic Abs bind structures composed of linear strings of contiguous amino acids that are the size of protein domains. As defined by the CRS, a structural epitope is significantly larger than previous definitions of epitopes that are focused primarily on CR. A structural epitope includes CRs as well as the minimum number of contiguous amino acids required to yield a compact native structure that is independent of other protein regions or even other molecules. Thus, immunization with independent protein domains representing complete structural epitopes is indicated for development of functional and therapeutic mAbs targeted to biologically relevant and potentially weakly immunogenic sites.

We thank Dr. Ross Chambers for insight and review of the manuscript and Dr. Fenglin Yin for review of the database and Ab sequence numbering.

The online version of this article contains supplemental material.

Abbreviations used in this article: CR, contact residue; CRR, contact residue region; CRS, contact residue span; PDB, Protein Data Bank.

1. Strohl, W. R. 2009. Therapeutic monoclonal antibodies: past, present and future. In Therapeutic Monoclonal Antibodies. Z. An, ed. John Wiley & Sons, Inc., Hoboken, NJ, p. 3–50.
2. van Regenmortel, M. H. V. 2009. What is a B-cell epitope? In Methods in Molecular Biology, Epitope Mapping Protocols, Vol. 524. U. Reineke and M. Schutkowski, eds. Humana Press, New York, p. 3–20.
3. Irving, M. B., Craig, L., Menendez, A., Gangadhar, B. P., Montero, M., van Houten, N. E., Scott, J. K. 2010. Exploring peptide mimics for the production of antibodies against discontinuous protein epitopes. Mol. Immunol. 47: 1137–1148.
4. Brown, M. C., Joaquim, T. R., Chambers, R., Onisk, D. V., Yin, F., Moriango, J. M., Xu, Y., Fancy, D. A., Crowgey, E. L., He, Y., et al. 2011. Impact of immunization technology and assay application on antibody performance – a systematic comparative evaluation. PLoS ONE 6: e28718.
5. Jemmerson, R. 1987. Multiple overlapping epitopes in the three antigenic regions of horse cytochrome c1. J. Immunol. 138: 213–219.
6. Newman, M. A., Mainhart, C. R., Mallett, C. P., Lavoie, T. B., Smith-Gill, S. J. 1992. Patterns of antibody specificity during the BALB/c immune response to hen eggwhite lysozyme. J. Immunol. 149: 3260–3272.
7. Jones, P. T., Dear, P. H., Foote, J., Neuberger, M. S., Winter, G. 1986. Replacing the complementarity-determining regions in a human antibody with those from a mouse. Nature 321: 522–525.
8. Padlan, E. A., Abergel, C., Tipper, J. P. 1995. Identification of specificity-determining residues in antibodies. FASEB J. 9: 133–139.
9. Ramirez-Benitez, M. C., Almagro, J. C. 2001. Analysis of antibodies of known structure suggests a lack of correspondence between the residues in contact with the antigen and those modified by somatic hypermutation. Proteins 45: 199–206.
10. Novotný, J., Handschumacher, M., Haber, E., Bruccoleri, R. E., Carlson, W. B., Fanning, D. W., Smith, J. A., Rose, G. D. 1986. Antigenic determinants in proteins coincide with surface regions accessible to large probes (antibody domains). Proc. Natl. Acad. Sci. USA 83: 226–230.
11. Davies, D. R., Padlan, E. A., Sheriff, S. 1990. Antibody-antigen complexes. Annu. Rev. Biochem. 59: 439–473.
12. MacCallum, R. M., Martin, A. C. R., Thornton, J. M. 1996. Antibody-antigen interactions: contact analysis and binding site topography. J. Mol. Biol. 262: 732–745.
13. Schlessinger, A., Ofran, Y., Yachdav, G., Rost, B. 2006. Epitome: database of structure-inferred antigenic epitopes. Nucleic Acids Res. 34(Database issue): D777–D780.
14. Haste Andersen, P., Nielsen, M., Lund, O. 2006. Prediction of residues in discontinuous B-cell epitopes using protein 3D structures. Protein Sci. 15: 2558–2567.
15. Ponomarenko, J. V., Bourne, P. E. 2007. Antibody-protein interactions: benchmark datasets and prediction tools evaluation. BMC Struct. Biol. 7: 64.
16. Ansari, H. R., Raghava, G. P. 2010. Identification of conformational B-cell epitopes in an antigen from its primary sequence. Immunome Res. 6: 6.
17. Zhao, L., Li, J. 2010. Mining for the antibody-antigen interacting associations that predict the B cell epitopes. BMC Struct. Biol. 10(Suppl. 1): S6.
18. Collis, A. V. J., Brouwer, A. P., Martin, A. C. R. 2003. Analysis of the antigen combining site: correlations between length and sequence composition of the hypervariable loops and the nature of the antigen. J. Mol. Biol. 325: 337–354.
19. Almagro, J. C. 2004. Identification of differences in the specificity-determining residues of antibodies that recognize antigens of different size: implications for the rational design of antibody repertoires. J. Mol. Recognit. 17: 132–143.
20. Kunik, V., Peters, B., Ofran, Y. 2012. Structural consensus among antibodies defines the antigen binding site. PLOS Comput. Biol. 8: e1002388.
21. Madej, T., Addess, K. J., Fong, J. H., Geer, L. Y., Geer, R. C., Lanczycki, C. J., Liu, C., Lu, S., Marchler-Bauer, A., Panchenko, A. R., et al. 2012. MMDB: 3D structures and macromolecular interactions. Nucleic Acids Res. 40(Database issue): D461–D464.
22. Ofran, Y., Schlessinger, A., Rost, B. 2008. Automated identification of complementarity determining regions (CDRs) reveals peculiar characteristics of CDRs and B cell epitopes. J. Immunol. 181: 6230–6235.
23. Wu, T. T., Kabat, E. A. 1970. An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exp. Med. 132: 211–250.
24. Berman, H. M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T. N., Weissig, H., Shindyalov, I. N., Bourne, P. E. 2000. The protein data bank. Nucleic Acids Res. 28: 235–242.
25. Martin, A. C. R. 2010. Protein sequence and structure analysis of antibody variable domains. In Antibody Engineering, Vol. 2. R. Kontermann and S. Dübel, eds. Springer-Verlag, Berlin, p. 33–51.
26. Kirkham, P. M., Schroeder, H. W. Jr. 1994. Antibody structure and the evolution of immunoglobulin V gene segments. Semin. Immunol. 6: 347–360.
27. Boeckmann, B., Blatter, M. C., Famiglietti, L., Hinz, U., Lane, L., Roechert, B., Bairoch, A. 2005. Protein variety and functional diversity: Swiss-Prot annotation in its biological context. C. R. Biol. 328: 882–899.
28. Wheelan, S. J., Marchler-Bauer, A., Bryant, S. H. 2000. Domain size distributions can predict domain boundaries. Bioinformatics 16: 613–618.
29. Jones, S., Stewart, M., Michie, A., Swindells, M. B., Orengo, C., Thornton, J. M. 1998. Domain assignment for protein structures using a consensus approach: characterization and analysis. Protein Sci. 7: 233–242.
30. Savageau, M. A. 1986. Proteins of Escherichia coli come in sizes that are multiples of 14 kDa: domain concepts and evolutionary implications. Proc. Natl. Acad. Sci. USA 83: 1198–1202.
31. Zhang, W., Xiong, Y., Zhao, M., Zou, H., Ye, X., Liu, J. 2011. Prediction of conformational B-cell epitopes from 3D structures by random forests with a distance-based feature. BMC Bioinformatics 12: 341.

The authors have no financial conflicts of interest.