Ag selection has been suggested to play a role in chronic lymphocytic leukemia (CLL) pathogenesis, but no large-scale analysis has been performed so far on the structure of the Ag-binding sites (ABSs) of leukemic cell Igs. We sequenced both H and L chain V(D)J rearrangements from 366 CLL patients and modeled their three-dimensional structures. The resulting ABS structures were clustered into a small number of discrete sets, each containing ABSs with similar shapes and physicochemical properties. This structural classification correlates well with other known prognostic factors such as Ig mutation status and recurrent (stereotyped) receptors, but it shows a better prognostic value, at least in the case of one structural cluster for which clinical data were available. These findings suggest, for the first time, to our knowledge, on the basis of a structural analysis of the Ab-binding sites, that selection by a finite quota of antigenic structures operates on most CLL cases, whether mutated or unmutated.

Chronic lymphocytic leukemia (CLL) is the most frequent form of leukemia in the western world and is characterized by a clonal expansion of neoplastic mature B lymphocytes. CLL pathogenesis is still unclear, but it appears that many factors contribute to the evolution and expansion of the neoplastic clones (1).

Analyses of the sequences of H and L chain variable regions of Igs expressed on the surface of leukemic cells showed that the IGHV and IGL/KV regions undergo somatic hypermutation in ∼50% of leukemic clones (2–4), and that patients with mutated IGHV genes generally have a more indolent clinical course than patients with unmutated IGHV genes (5, 6).

In addition, clones from different CLL patients express Igs that contain remarkably similar IGHV amino acid sequences (7–11). The extent of these recurrent rearrangements, termed “stereotyped Igs,” in the CLL repertoire has been recently appreciated through the analysis of thousands of CLL H chain IGVs: ∼30% of CLL cases fall within 1 of >300 subgroups of stereotyped Igs (stereotyped subsets) (12–14).

Altogether, although CLL Igs might also be able to mediate cell-autonomous signaling dependent on intrinsic motifs, as has been recently reported (15), the earlier findings suggest that Ag–Ig interaction might play a crucial role in CLL pathogenesis as well. It is still unclear, however, whether the role of Ags is crucial in all CLL cases or is restricted to CLLs with stereotyped Igs, which are mostly unmutated (10, 16, 17).

The definition of stereotyped Igs is mainly based on the HCDR3 amino acid composition and length. These play a crucial role in the Ag–Ig interaction, but the shape of the whole Ag-binding site (ABS) obviously also depends on other Ig regions, and it is important to analyze the whole binding site at the structural level.

In their seminal work, Wu and Kabat (18) identified three sequence portions on each Ig VH, and VK or VL domain, the so-called hypervariable regions, with an extremely variable amino acid composition in comparison with the other less variable parts. They correctly predicted such regions to assume a loop conformation and to be responsible for the selective binding of the Ag, and named them “complementarity-determining regions” (CDRs) in contrast with the surrounding “framework” regions. The work of Chothia and Lesk (19) and of some of us (20, 21) extended the analysis and pointed out that five of six CDRs (LCDR1–3 and HCDR1–2) and a portion of the sixth loop (HCDR3), although presenting a very variable sequence repertoire, usually adopt a limited set of backbone conformations, referred to as canonical structures, determined by the nature of relatively few residues that are primarily responsible for their main-chain conformations. These residues are found both within the hypervariable regions and in the conserved β-sheet framework (21).

These studies made it possible to develop ad hoc modeling techniques (22) to build Ab models accurate enough for theoretical and practical studies, such as docking (23), engineering (24), and comparison (25).

Taking advantage of the above, we analyze in this article for the first time, to our knowledge, the structures of Igs from CLL patients, with the aim of evaluating whether information on the ABS structure can provide novel insights into the Ag role in CLL pathogenesis. We modeled the structure of Igs derived from a cohort of 366 CLL patients starting from the amino acid sequences of their paired H and L chains, and studied the structural features of their ABS to highlight possible common patterns potentially correlated with the pathological phenotype.

After informed consent according to the Declaration of Helsinki, PBMCs were isolated from heparinized venous blood of patients with CLL. CLL diagnosis was based on accepted clinical and immunophenotypic features (26). Rearranged IGHV-D-J and IGKV-J or IGLV-J paired segments were sequenced from the cDNA of 218 CLL patients as described previously (2, 3); in addition, 148 Ig sequences (IGHV + IGK/LV) were retrieved from GenBank, bringing the total number of analyzed cases to 366. All of the latter were submitted to the database by the research group of Dr. K. Stamatopoulos (Hematology Department and HCT Unit, G. Papanicolaou Hospital, Thessaloniki, Greece), who verified that allelic exclusion was taken into account (personal communication). Only samples with allelic exclusion of both IGHV and IGK/LV were included in the study. Sequences were analyzed using the ImMunoGeneTics Information System (http://www.imgt.org/) (27). The mutational status of Ig clones was defined based on both IGHV and IGK/LV. Patients with leukemic clones exhibiting <2% mutations in both V segments were labeled as “unmutated CLL cases” (U-CLL), whereas patients with ≥2% IGHV and/or IGK/LV somatic mutations were defined as “mutated CLL cases” (M-CLL).

We also used a finer mutational classification by dividing the Igs into three classes: heavily mutated (HM) Igs (IGHV and/or IGK/LV percentage of mutation ≥3%), scarcely mutated (SM) Igs (IGHV and/or IGK/LV percentage of mutation ≥1% and <3%), and unmutated Igs (IGHV and IGK/LV percentage of mutation <1%). The cutoffs adopted for our three-class partitioning are slightly different from those defined in a previous study on the IGHV gene repertoire in splenic marginal zone lymphoma (28). The classification did not change when the sequences were inspected by running IgBLAST on the National Center for Biotechnology Information human gene database.
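As an illustration only, the two-class and three-class partitions described above can be written as a minimal Python sketch. The function names are hypothetical and not part of the original analysis, and the sketch assumes that the class is decided by the more mutated of the two V segments, which is how the "and/or" wording of the definitions is read here.

```python
def two_class(ighv_pct, igklv_pct, cutoff=2.0):
    """Two-class partition: U-CLL only if BOTH V segments are below the cutoff."""
    # Assumption: percentages are mutation rates versus germline, in percent.
    return "U-CLL" if max(ighv_pct, igklv_pct) < cutoff else "M-CLL"


def three_class(ighv_pct, igklv_pct):
    """Three-class partition: HM (>=3%), SM (>=1% and <3%), U (<1% in both)."""
    top = max(ighv_pct, igklv_pct)  # the more mutated V segment decides the class
    if top >= 3.0:
        return "HM"
    if top >= 1.0:
        return "SM"
    return "U"


if __name__ == "__main__":
    print(two_class(1.2, 0.0), three_class(1.2, 0.0))
```

Under this reading, an Ig with 1.2% IGHV mutations and a germline L chain is unmutated in the two-class scheme but scarcely mutated in the three-class scheme.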

We built a “test” data set by querying the DIGIT database (29) for all human Igs for which the paired sequences of the L and H chains were available. After inspecting the Ig description contained in the DIGIT database and the related PubMed entry, we discarded all Igs for which no reference to a published article could be found or that did not correspond to an entry in Entrez Nucleotide (http://www.ncbi.nlm.nih.gov/nuccore/), as well as all Igs already contained in our initial CLL data set. We ended up with 2441 Igs for which complete information on canonical structures, loop length, and mutation rates could be retrieved using the tools provided by DIGIT (29).

Among the 2441 Igs of the “test” data set, 212 were from CLLs, and we labeled them as the “test CLL” data set. Among the remaining 2229 Igs (“test without [w/o] CLL”), we also defined a “test AI” data set including the 294 sequences for which the associated PubMed entry contained any of the MeSH terms “autoimmunity” (MeSH tree no. G12.450.192), “autoimmune disease” (MeSH tree no. C20.111), or “autoantibody” (MeSH tree no. D12.776.124.486.485.114.323). The remaining 1935 Igs were defined as the “test w/o (CLL-AI)” data set. All considered sequences included the full-length variable domains.

We used the PIGS server, based on the canonical structure method, to derive the sequence alignments of the Ig frameworks and to build the three-dimensional models of all Igs in our data set (22). We could build 342 complete and correctly assembled models. The remaining 24 models were discarded because the modeling procedure returned an incomplete or improperly assembled model: in 14 models, the IGHV-IGL/KV packing was incorrect; in 8 cases, no template was found to model the LCDR2, and in 1 case, the HCDR3 was too long to be properly modeled.

Structural superpositions were performed using the LGA package (30). The loop coordinates as defined by Al-Lazikani et al. (31) plus the two residues flanking the N and C termini of each loop were used for the superposition of the ABSs. Two residues were considered as corresponding to each other if the distance of their Cαs after superposition was <8 Å.
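The distance criterion above can be illustrated with a short Python sketch. It assumes that the two sets of Cα coordinates have already been superposed and paired position by position; the residue pairing performed by LGA itself is not reproduced here, and the function name is hypothetical.

```python
import math


def corresponding_pairs(ca_a, ca_b, cutoff=8.0):
    """Return the indices of paired Cα atoms closer than `cutoff` (in Å).

    ca_a, ca_b: equally long lists of (x, y, z) tuples, already superposed.
    """
    if len(ca_a) != len(ca_b):
        raise ValueError("coordinate lists must have the same length")
    return [i for i, (p, q) in enumerate(zip(ca_a, ca_b))
            if math.dist(p, q) < cutoff]


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 7.9), (8.0, 0.0, 0.0)]
    print(corresponding_pairs(a, b))  # only the first pair is below 8 Å
```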

The next step consisted in clustering the structures of the loops. To select the most appropriate metrics for clustering, we used the silhouette analysis (32), an effective and unbiased method for selecting the parameters leading to the best cluster separations.
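For reference, the average silhouette width used as the selection criterion can be sketched in plain Python as follows. This is a generic implementation of the standard definition, not the R code used in the study; the function name and the toy data are illustrative assumptions.

```python
def average_silhouette(dist, labels):
    """Average silhouette width for a square distance matrix and cluster labels.

    For each item, a = mean distance to its own cluster and b = smallest mean
    distance to any other cluster; the silhouette is (b - a) / max(a, b).
    Items in singleton clusters contribute 0, as in the standard definition.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


if __name__ == "__main__":
    points = [0, 1, 10, 11]
    dist = [[abs(p - q) for q in points] for p in points]
    print(average_silhouette(dist, [0, 0, 1, 1]))
```

Values close to 1 indicate compact, well-separated clusters, whereas values close to 0 indicate overlapping clusters.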

The tested distance metrics were root-mean-square deviation, global distance test, and template modeling (TM) score distance matrices for the superimposed structures (33). For each distance matrix, we performed agglomerative hierarchical clustering (agnes function) and divisive hierarchical clustering (diana function), both from the R package cluster (Maechler & Rousseeuw, http://cran.r-project.org/web/packages/cluster/), with a number of clusters ranging from 10 to 50. The linkage functions used in our analysis were complete, single, average, and Ward’s method (34).
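As a reminder of how a TM score translates into a distance, the following Python sketch applies the standard TM-score formula to the Cα distances of aligned residue pairs and returns 1 minus the score. It is only a sketch: the normalization length and the exact options used with LGA in this study are not stated in the text and are assumptions here, as are the function names.

```python
def tm_score(pair_distances, length):
    """Standard TM score: (1/L) * sum over aligned pairs of 1 / (1 + (d/d0)^2).

    pair_distances: Cα distances (Å) of aligned residue pairs after superposition.
    length: normalization length L (assumed here to be the number of ABS residues).
    """
    if length < 19:
        # d0 = 1.24 * (L - 15)^(1/3) - 1.8 is not positive for shorter chains
        raise ValueError("length must be at least 19 for the standard d0")
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in pair_distances) / length


def tm_distance(pair_distances, length):
    """Distance used for clustering in this sketch: 1 - TM score."""
    return 1.0 - tm_score(pair_distances, length)


if __name__ == "__main__":
    print(tm_distance([0.0] * 23, 23))  # identical structures
```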

The best average silhouette value (0.146) was obtained using TM score as metric and a divisive clustering scheme with 21 clusters. Ig images were generated using the PyMOL software (W. L. DeLano, The PyMOL Molecular Graphics System, 2002, http://www.pymol.org). Solvent-accessible surface electrostatic potentials were calculated using Adaptive Poisson-Boltzmann Solver (35).

Clustering results were compared with the IGHV and IGLV mutation status. As described earlier, we used two different classifications to represent the mutation level of Igs, namely, the two-class partition with a 2% cutoff for defining mutated and unmutated groups, and the three-class partition that divides CLLs in HM, SM, and unmutated samples. For each cluster and for the two classifications, we computed the probability that an equal or higher number of Igs belonging to the same class could be found by chance in a randomly extracted subset of the same size (hypergeometric distribution). The Bonferroni–Holm method (36) was used to correct for multiple testing. We assigned the smaller value between the two/three probability values to each cluster. The graphical representation of the cluster results was generated using the R package tool A2R.
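The enrichment probability and the multiple-testing correction described above can be sketched in Python as follows. The sketch uses the upper tail of the hypergeometric distribution and the Holm step-down adjustment in their textbook form; the function names and the toy numbers are illustrative assumptions and do not reproduce the values in Fig. 1.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when `draws` items are taken without replacement from a
    population containing `successes` items of the class of interest."""
    top = min(successes, draws)
    total = comb(population, draws)
    hits = sum(comb(successes, x) * comb(population - successes, draws - x)
               for x in range(k, top + 1))
    return hits / total


def holm_adjust(pvalues):
    """Holm step-down adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, min(1.0, (m - rank) * pvalues[i]))
        adjusted[i] = running
    return adjusted


if __name__ == "__main__":
    # Toy example: a cluster of 5 Igs, all mutated, from 10 Igs of which 5 are mutated.
    print(hypergeom_upper_tail(5, 10, 5, 5))
    print(holm_adjust([0.01, 0.04, 0.03]))
```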

To test whether the structural clusters were describing specific features of CLL Igs rather than general Ig characteristics, we built, for each cluster containing five or more Igs, a sequence-based hidden Markov model (HMM) (36) including the H and L chains of all Igs in the cluster. To this end, we used the HMMER package with default parameters.

Each HMM was used to score each Ig in the test data set and, for each Ig, the largest score was recorded. The same procedure was applied to the Igs in each of the clusters used to build the HMMs. For the sequence-based clustering, the paired IGHV–IGK/LV sequences of all Igs of our CLL data set were clustered using the cd-hit software (37) with a sequence identity threshold of 80%. The statistical significance of the difference between the scores of the Igs marked as CLL in the test data set and those of all other Igs was computed using the R implementation of the Wilcoxon Mann–Whitney U test.
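The scoring and comparison steps can be outlined with the following Python sketch. It assumes that the per-cluster HMM scores have already been computed (for instance, with HMMER) and are available as plain numbers; the function names are hypothetical. The p value is obtained with a normal approximation without tie or continuity correction, which is a simplification with respect to the R implementation used in the study.

```python
import math


def best_scores(score_table):
    """Keep, for each Ig, the largest score over all cluster HMMs.

    score_table: dict mapping an Ig identifier to a list of scores, one per HMM.
    """
    return {ig: max(scores) for ig, scores in score_table.items()}


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided p value by normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        mid_rank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j for tied values
        rank_sum_x += mid_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - n1 * n2 / 2.0) / sd
    return u, math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    cll = best_scores({"igA": [10.0, 42.0], "igB": [35.0, 12.0]})
    other = best_scores({"igC": [5.0, 8.0], "igD": [20.0, 3.0]})
    print(mann_whitney_u(list(cll.values()), list(other.values())))
```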

We sequenced the VH and VK/VL regions of a cohort of 366 IgM+ CLL patients, 61.7% (226/366) of which expressed IgK and 38.3% (140/366) IgL isotype. According to the two-class classification described in Materials and Methods, the cohort comprised 47.3% (173/366) U-CLL samples, 63.6% (110/173) of which expressed κ-isotype L chains and 36.4% (63/173) the λ-isotype L chains. Of the remaining 52.7% (193/366) M-CLL samples, 60.1% (116/193) expressed the κ-isotype L chains and 39.9% (77/193) the λ-isotype L chains. Of the 366 CLL patients, 13.7% (50/366) expressed a stereotyped BCR: 48 of these belonged to 24 different previously described CLL subsets (12, 14), whereas 2 did not, thus defining a novel stereotyped subset (Supplemental Table I). The most represented CLL subsets were subsets 1, 2, and 6, respectively, representing 19.2% (n = 10), 19.2% (n = 10), and 11.9% (n = 7) of all stereotyped receptors in the cohort.

The definition of stereotyped BCRs is based on sequence information from the H chain only (12, 14). We conjectured that adding information on the L chain and focusing on structural features of the ABS would be more informative.

We used the atomic coordinates of the ABS obtained by structural modeling and quantified their structural similarity. As described in Materials and Methods, we could build reliable models for 342 of the 366 Igs in our CLL data set. Their ABS structures were clustered as described, leading to the definition of 21 well-separated clusters. The most populated cluster (cluster 2) contained 28% of all Igs, followed by cluster 5 (13%), cluster 9 (8.5%), and cluster 1 (7.6%). Altogether, 323 of the 342 modeled Igs (94.5%) fell in clusters containing at least 5 Ig clones. Only 19 Igs (5.5%) were distributed in smaller clusters (Supplemental Table I). Fig. 1 illustrates all 15 clusters containing ≥5 Igs. Among these, seven (clusters 1, 2, 5, 6, 16, and 21) contained Igs with only κ L chains and eight (clusters 3, 4, 7, 8, 9, 10, 19, and 20) only λ L chains. The genetic features of the samples belonging to the 21 clusters are listed in Supplemental Table I. All of the following analyses were performed on the 15 clusters formed by ≥5 Igs.

FIGURE 1.

Structural clustering and mutational status of CLL Igs. (A) Hierarchical divisive clustering of the 342 modeled CLL Igs grouped according to the structural similarity (TM score) of their ABSs. Clusters with five or more samples are shown. On the left, the silhouette value corresponding to different number of clusters is shown. The optimal cut (corresponding to 21 clusters) is reported as a blue dot. (B) Two-class (upper bars, unmutated [U]; mutated [M]) and three-class (lower bars, U, SM, HM) description of the mutational status of all samples. (C) Statistical analysis of the structural clusters. The number of samples, according to their two-class (upper table) and three-class (lower table) description, is reported together with the probability that the enrichment of Igs with the same mutational status observed in a cluster is due to chance alone. The p values <0.05 are reported in bold.


We analyzed the correlation between the structural clusters and the mutational status of the Igs. Interestingly, we found that the 172 M-CLL samples and 151 U-CLL Igs segregated with a significant overrepresentation of either mutated or unmutated Igs in 5 of the clusters (clusters 1, 2, 3, 6, and 9), accounting for 53% of the cases (180/342; Fig. 1). If three different intervals of mutation are used instead of two (HM, SM, unmutated), 9 of the 15 clusters were significantly enriched in samples belonging to 1 of the 3 groups (clusters 1, 2, 3, 4, 6, 9, 18, 19, and 20), accounting for 224 of 342 (65%) modeled Igs (Fig. 1).

We also mapped the hydrophobicity and electrostatic potential of the ABS surface of our models, and these properties also turned out to be very similar in Igs within a cluster (Figs. 2–4). As an example, the structure of all Igs belonging to one cluster (cluster 19) is shown in Fig. 2. As can be appreciated from Fig. 2, remarkable similarities can be observed in terms of conformation, hydrophobicity, and electrostatic potential of the ABS surface. Fig. 3 shows, for each of the clusters, one representative Ig structure. Conserved hydrophobic patches can be identified in some clusters, located either in the center of the ABS (clusters 7 and 9), near H3 (clusters 2, 6, 12, 18), or near the H2 loop (clusters 1 and 3; Fig. 3). Based on the classifications adopted in the literature (38, 39), a summary of the ABS characteristics is reported in Table I. It is apparent that these characteristics, even if very difficult to quantify in an objective way in protein models, can provide an overview of the main differences and similarities between the ABSs belonging to different clusters and to the same cluster, respectively. It is to be expected that they are related to the nature of the respective Ags. For example, Abs with deep pockets, grooves, and flat sites are often specific for small molecules, peptides, and proteins, respectively. Interestingly, Igs belonging to the same ABS cluster, thus sharing high structural similarity, do not necessarily show a high level of sequence similarity, as demonstrated by the examples shown in Fig. 4.

FIGURE 2.

Hydrophobicity and electrostatic potentials of samples in cluster 19. The solvent-accessible surface of all CLL samples of cluster 19 is colored according to the Eisenberg hydrophobicity scale (right, hydrophobic in green, hydrophilic in white) and to the electrostatic potentials (left, red is negatively charged, blue is positively charged). Molecules are shown from the Ag point of view (ABS is visible, L chain is on the left, H chain on the right, H3 loop on top). The region of the hierarchical clustering corresponding to cluster 19 and sample names are reported on the left.

FIGURE 4.

Examples of high structural similarity of Igs with low sequence identity. Samples CLL038 and N1405 (top, cluster 2) have different IGHV, IGHD, and IGKV genes, and their L and H chains share only 73 and 49% sequence identity, respectively. Samples CLL282 and CLLGN24 (cluster 4, middle) use different IGHV, IGHD, and IGHJ genes, and samples CLL048 and CLL270 (cluster 15, bottom) use different IGHD, IGHJ, and IGKV genes. In all cases, the pairs have a nearly identical binding site.

FIGURE 3.

Structures of representative samples for each of the clusters in Fig. 1. For each structural cluster containing at least five Igs, a representative sample has been selected to provide a view of the physicochemical properties of the Igs in the cluster. Solvent-accessible surfaces are colored according to the Eisenberg hydrophobicity scale (right, hydrophobic in green, hydrophilic in white) and to the electrostatic potentials (left, red is negatively charged, blue is positively charged).

Table I. Summary of the predominant structural and physicochemical properties of the ABS surface, and amino acid lengths of the hypervariable loops, for clusters with five or more samples

| ABS Cluster | ABS Shape (a) (Cavity, Groove, Planar), Charge, Hydrophobicity | No. of Samples (%) | L1 Average (±SD) | L3 Average (±SD) | H1 Average (±SD) | H2 Average (±SD) | H3 Average (±SD) |
| 1 | | 26 (7.6) | 12.1 ± 1.5 | 9.2 ± 1.0* | 5.0 ± 0.0* | 17.0 ± 0.0* | 18.9 ± 3.7* |
| 2 | | 95 (27.8) | 11.4 ± 1.3*** | 9.1 ± 0.9*** | 5.2 ± 0.5 | 16.8 ± 0.7 | 14.3 ± 3.3*** |
| 3 | — | 6 (1.8) | 11.5 ± 1.2 | 10.5 ± 1.0 | 5.0 ± 0.0 | 17.0 ± 0.0 | 20.3 ± 3.8 |
| 4 | — | 23 (6.7) | 11.0 ± 0.0*** | 11.4 ± 1.3*** | 5.1 ± 0.5 | 16.7 ± 0.7 | 12.2 ± 4.6*** |
| 5 | Partial | 48 (14.0) | 16.0 ± 1.6*** | 9.0 ± 0.9*** | 5.2 ± 0.6 | 16.9 ± 0.7 | 18.5 ± 3.9** |
| 6 | | 24 (7.0) | 11.0 ± 0.3* | 9.4 ± 0.9 | 6.1 ± 1.0*** | 16.2 ± 0.4*** | 19.4 ± 2.4** |
| 7 | | 14 (4.0) | 13.6 ± 0.8* | 10.6 ± 0.5*** | 5.3 ± 0.7 | 16.9 ± 0.8 | 18.2 ± 4.0 |
| 8 | Partial; — | 5 (1.4) | 11.6 ± 1.3 | 10.6 ± 1.7 | 5.0 ± 0.0 | 17.0 ± 0.0 | 17.2 ± 5.2 |
| 9 | | 29 (8.5) | 13.6 ± 0.7* | 10.2 ± 1.1* | 5.3 ± 0.6 | 16.5 ± 0.5* | 14.6 ± 3.8* |
| 10 | | 12 (3.5) | 11.0 ± 0.0** | 10.3 ± 1.0 | 5.5 ± 0.9* | 16.3 ± 0.5 | 17.5 ± 4.9 |
| 12 | | 8 (2.3) | 12.6 ± 2.7 | 10.1 ± 0.8 | 5.0 ± 0.0 | 16.7 ± 0.5 | 19.2 ± 5.2 |
| 15 | Partial | 12 (3.5) | 11.8 ± 1.9 | 9.4 ± 1.2 | 5.0 ± 0.0 | 17.0 ± 0.0 | 16.9 ± 4.6 |
| 18 | Partial | 8 (2.3) | 11.0 ± 0.0* | 10.9 ± 0.3** | 5.2 ± 0.7 | 16.9 ± 0.6 | 20.5 ± 7.1* |
| 19 | — | 8 (2.3) | 12.0 ± 1.4 | 12.2 ± 1.2** | 5.0 ± 0.0 | 17.0 ± 0.0 | 16.2 ± 3.3 |
| 20 | — | 5 (1.4) | 16.0 ± 0.0** | 10.0 ± 1.0 | 5.0 ± 0.0 | 16.6 ± 0.5 | 19.0 ± 7.6 |
| Whole CLL data set | | 342 (100) | 12.5 ± 2.1 | 9.8 ± 1.27 | 5.25 ± 0.6 | 16.6 ± 0.7 | 16.5 ± 4.7 |

(a) According to criteria described previously (38, 39). The average length of hypervariable loops L1, L3, H1, H2, and H3 for each cluster was compared with the average lengths evaluated over all other Igs of the CLL data set (permutation test).

*p < 0.05, **p < 0.01, ***p < 0.001.

The length of some hypervariable loops is also biased in some clusters (Table I). For instance, cluster 2 includes Igs with short L1, L3, and H3 loops; cluster 1 those with long and hydrophobic H2 and H3 loops; in cluster 4, a long L3 associates with short L1 and H3 loops, whereas in cluster 5, the opposite pattern is observed. In cluster 6, a groove is formed by a short H2 and long H1 and H3 loops. A long H3 is present in clusters 12 and 18 as well. Notably, cluster 2 contains a remarkable fraction (∼25%) of our CLL Igs and, despite its size, it is still rather homogeneous in terms of the included structures, most of which contain a cavity in the ABS surface and have short hypervariable loops (Fig. 1, Table I).
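The permutation test mentioned in the footnote of Table I can be illustrated with the following Python sketch, which compares the mean loop length of one cluster with that of all other Igs. The exact test statistic and number of permutations used in the study are not stated; the two-sided difference of means, the default of 10,000 permutations, and the function name are assumptions made for illustration.

```python
import random


def permutation_pvalue(cluster_lengths, other_lengths, n_perm=10000, seed=0):
    """Two-sided permutation p value for the difference in mean loop length.

    Labels are shuffled n_perm times; the p value is the fraction of shuffles
    whose absolute difference of means is at least the observed one
    (with the usual +1 correction so that p is never exactly 0).
    """
    rng = random.Random(seed)
    k = len(cluster_lengths)
    pooled = list(cluster_lengths) + list(other_lengths)

    def gap(values):
        return abs(sum(values[:k]) / k - sum(values[k:]) / (len(values) - k))

    observed = gap(pooled)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if gap(pooled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: a small cluster with long L1 loops against a background of short ones.
    print(permutation_pvalue([16] * 5, [11] * 50, n_perm=2000))
```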

It has been reported that the IGHV gene mutational status correlates with the overall survival of affected patients. This is also the case, on average, for Igs in our CLL patients for which clinical information is available (Fig. 5A). Remarkably, a different pattern is observed for Igs belonging to cluster 2, for which there are sufficient clinical data and a sufficiently balanced distribution of mutated and unmutated Igs to carry out a statistically meaningful analysis. Mutated and unmutated Igs belonging to this cluster have almost overlapping overall survival curves and very similar median survival times (174 versus 172 mo; Fig. 5B). We compared this result with that obtained by selecting at random, 10,000 times, the same number of Igs as in cluster 2 from our data set, while keeping the same ratio of mutated and unmutated samples. In only 6.6% of the cases was the difference in the median survival time between patients with mutated and unmutated Igs in the random sampling smaller than or equal to that observed in cluster 2. This implies that the survival of CLL patients is not necessarily only correlated to the mutational status of their Igs, but, as in the case of the cluster 2 samples, might be related to the ABS properties and, therefore, most likely to the recognized Ag.
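The random-sampling comparison can be outlined with the following Python sketch. It is a simplification: it uses plain medians of survival times and ignores censoring, whereas the medians reported in Fig. 5 are presumably derived from Kaplan–Meier estimates. The function name, argument names, and toy data are hypothetical.

```python
import random
from statistics import median


def resampling_fraction(mutated, unmutated, n_mut, n_unmut,
                        observed_gap, n_iter=10000, seed=0):
    """Fraction of random draws whose median-survival gap is <= observed_gap.

    mutated, unmutated: survival times (mo) of all M-CLL and U-CLL patients.
    n_mut, n_unmut: numbers of mutated and unmutated Igs in the cluster,
    so that each draw keeps the same ratio as the cluster of interest.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_iter):
        m = rng.sample(mutated, n_mut)
        u = rng.sample(unmutated, n_unmut)
        if abs(median(m) - median(u)) <= observed_gap:
            hits += 1
    return hits / n_iter


if __name__ == "__main__":
    # Toy survival times; the observed gap of 2 mo corresponds to 174 versus 172 mo.
    toy_mut = [150, 180, 196, 200, 210, 230, 170, 190]
    toy_unmut = [90, 100, 120, 125, 140, 110, 130, 160]
    print(resampling_fraction(toy_mut, toy_unmut, 4, 4, observed_gap=2, n_iter=1000))
```

A small fraction indicates that a gap as narrow as the one observed in the cluster is rarely obtained when Igs are drawn irrespective of their structural cluster.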

FIGURE 5.

Overall survival (OS) analysis with respect to somatic hypermutation. Kaplan–Meier curves for OS calculated separately for U-CLL (<2% mutation) and M-CLL (≥2% mutation) samples from the whole CLL data set (A) or cluster 2 CLLs (B). Clinical data are available for only a fraction of patients; therefore, the analysis was performed for n = 101 CLLs distributed across all clusters (A) and n = 26 cluster 2 CLLs (B). Median survival values were 120 and 196 mo for U-CLL and M-CLL samples in the whole data set (p = 0.0011) (A) and 172 and 174 mo for cluster 2 U-CLL and M-CLL samples (B).


The distribution of stereotyped BCRs across the structural clusters is remarkably biased, with cluster 4 containing all the 12 subset 2 BCR samples, cluster 1 all subset 6, cluster 2 most of the subset 1 (7/10), and cluster 7 containing both of the 2 novel stereotyped BCRs identified in this study for the first time, to our knowledge (Supplemental Table I).

To verify whether the structural clustering captures, in whole or in part, features that are specific for CLL Igs, we built HMMs for each of the clusters using the sequences of its members and used them to score the sequences of the independent Ig data sets described in Materials and Methods, namely, the “test CLL” data set, composed of Igs from CLL patients not present in our original data set (n = 212), and the “test w/o CLL” data set, which includes non-CLL Igs (n = 2229).

The resulting similarity score distributions showed that members of the “test CLL” data set had significantly higher scores than the “test w/o CLL” Igs (p = 0.0014). We also repeated the procedure on our “test AI” and “test w/o (CLL-AI)” data sets, and observed that the scores for the “test AI” samples were rather similar to those of CLL Igs. Accordingly, when the method was applied to the “test CLL” and “test w/o (CLL-AI)” Igs (the remaining non-CLL/nonautoreactive Igs, n = 1935), the difference between the scores of the former and latter data sets was even more pronounced (p = 7.1 × 10⁻⁵; Fig. 6A). These results indicate that our structural clustering reflects well the properties of CLL Abs, and that these are somewhat similar to those of autoreactive Ig-binding sites, as has been suggested before by studies showing that CLLs originate from self-reactive B cell precursors (40, 41). However, it was also observed that although U-CLLs expressed autoreactive Abs, most M-CLLs did not (41). In line with this latter observation, when scoring “test AI” Ig sequences using the HMMs generated from each of our structural clusters, the scores with the HMMs generated from clusters 2 and 9, which contain mostly mutated Igs, were significantly lower than with the others (p = 3.9 × 10⁻¹⁰; see also Supplemental Table II, showing the scores of AI Abs with HMMs derived from clusters enriched in mutated and unmutated Igs).

FIGURE 6.

Box plots of scores obtained by scanning the test data sets with sequence-based HMMs derived from the structural clusters. Score distributions on Igs from a “test” data set obtained by assigning to each “test” Ig a score reflecting its similarity with the representative profiles (HMM) of each cluster. The “test” data set is divided in three “test” sets, namely, “Test CLL”, “Test AI,” and “Test w/o (CLL+AI)” (all remaining Igs of the “test” set after subtracting “Test CLL” and “Test AI”). The data for the three-dimensional clusters are shown in (A); those obtained using the paired VH–VK/VL amino acid sequence are shown in (B). Box edges indicate the first and third quartiles; the whiskers extend to the most extreme data points, and the central bar indicates the median. Dashed gray area represents the 5th and 95th percentile of scores obtained on the same Igs used to build the HMMs. (A) Distributions of the scores for HMMs built on the 15 most populated three-dimensional clusters of our CLL data set. (B) Distributions of the scores for HMMs built on the 16 most populated clusters of paired VH–VK/VL amino acid sequences.

FIGURE 6.

Box plots of scores obtained by scanning the test data sets with sequence-based HMMs derived from the structural clusters. Score distributions on Igs from a “test” data set obtained by assigning to each “test” Ig a score reflecting its similarity with the representative profiles (HMM) of each cluster. The “test” data set is divided in three “test” sets, namely, “Test CLL”, “Test AI,” and “Test w/o (CLL+AI)” (all remaining Igs of the “test” set after subtracting “Test CLL” and “Test AI”). The data for the three-dimensional clusters are shown in (A); those obtained using the paired VH–VK/VL amino acid sequence are shown in (B). Box edges indicate the first and third quartiles; the whiskers extend to the most extreme data points, and the central bar indicates the median. Dashed gray area represents the 5th and 95th percentile of scores obtained on the same Igs used to build the HMMs. (A) Distributions of the scores for HMMs built on the 15 most populated three-dimensional clusters of our CLL data set. (B) Distributions of the scores for HMMs built on the 16 most populated clusters of paired VH–VK/VL amino acid sequences.


It is important to stress that the classification ability of the HMMs originates from the structure-based protocol that we used for clustering, as shown by the following experiment. We clustered the Igs using sequence similarity as a metric (see Materials and Methods) and obtained 16 clusters with ≥5 elements, accounting for 44% of the Igs, and 141 very small clusters that included all the remaining ones. HMMs built on the 16 sequence-based clusters were unable to distinguish between the elements of the “test CLL” and “test w/o CLL” data sets and between those of the “test CLL” and “test w/o (CLL+AI)” data sets (Fig. 6B).
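The bookkeeping behind statements such as "16 clusters with ≥5 elements accounting for 44% of the Igs" can be illustrated with a minimal sketch. The function name `summarize_clusters` and the toy labels are hypothetical and are not taken from the study; only the ≥5-member cutoff comes from the text.

```python
from collections import Counter

def summarize_clusters(labels, min_size=5):
    """Return (n_large, n_small, coverage) for a list of cluster labels.

    coverage is the fraction of items that fall in clusters with at
    least `min_size` members (the >=5 cutoff mentioned in the text).
    """
    sizes = Counter(labels)
    large = {c: n for c, n in sizes.items() if n >= min_size}
    n_large = len(large)
    n_small = len(sizes) - n_large
    coverage = sum(large.values()) / len(labels) if labels else 0.0
    return n_large, n_small, coverage

# Toy, made-up assignment (not study data): clusters of size 6, 5, 2, 1, 1.
toy_labels = ["a"] * 6 + ["b"] * 5 + ["c"] * 2 + ["d"] + ["e"]
n_large, n_small, coverage = summarize_clusters(toy_labels)
print(n_large, n_small, round(coverage, 3))
```

Applied to real cluster assignments, the same three numbers allow a structure-based and a sequence-based partition to be compared on equal terms.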

This finding also demonstrates that, although germline usage and structural similarity are correlated, the latter captures additional relevant information that is not, or perhaps only very weakly, correlated with germline usage, such as the HCDR3 structure and the VL/VH packing.

It has been a matter of discussion for several years whether the B lymphocyte clones that accumulate in CLL patients display widely distributed Ag specificity among the billions of self and nonself antigenic epitopes encountered by the immune system or, alternatively, whether they express a restricted set of ABSs. Support for the latter hypothesis is provided by the observation that stereotyped IgV rearrangements exist (2, 7–9), even though they are mostly found in U-CLL cases (10, 16, 17). Furthermore, a classification based on stereotyped receptors accounts for only a fraction of CLL patients, and these are distributed in quite a large number of subsets (>300). This is likely because the definition of stereotyped BCRs is mostly based on the HCDR3 amino acid sequence composition and length, which only partially contributes to the shape of the ABS of an Ig.

In this study, we exploited the availability of our accurate protocol for modeling the structure of Igs and of our collection of paired VL/VH sequences of Igs to investigate whether a better understanding of the CLL Igs could be obtained by taking into account the shape of their complete binding site.

To this end, we built models of all the complete Igs from CLL patients obtained by us and retrieved from public sources, and clustered them on the basis of the structural similarity of their binding sites.
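As a rough illustration of what clustering on pairwise structural similarity involves, the sketch below groups items whose pairwise distance falls under a cutoff (single-linkage, implemented with a union-find). This is a simplified stand-in chosen for brevity, not the clustering protocol used in this study. The distance matrix, the cutoff, and the function name `threshold_clusters` are hypothetical; the distance could be, for instance, one minus a structural similarity score, which is also an assumption.

```python
def threshold_clusters(dist, cutoff):
    """Single-linkage grouping: i and j share a cluster if dist[i][j] <= cutoff,
    directly or through a chain of such pairs. `dist` is a symmetric matrix."""
    n = len(dist)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= cutoff:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first; ties broken by smallest member index.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))

# Toy, made-up distances between five binding-site models (not study data).
toy_dist = [
    [0.0, 0.1, 0.2, 0.9, 0.8],
    [0.1, 0.0, 0.15, 0.85, 0.9],
    [0.2, 0.15, 0.0, 0.95, 0.9],
    [0.9, 0.85, 0.95, 0.0, 0.1],
    [0.8, 0.9, 0.9, 0.1, 0.0],
]
print(threshold_clusters(toy_dist, cutoff=0.3))
```

Single linkage can chain loosely related items together, so a production analysis would more likely use a linkage criterion and a cluster-validity measure suited to the data.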

Interestingly, the samples, both U-CLL and M-CLL, could be partitioned into a limited number of clusters. Notably, this is not the case if the clustering is done on the basis of sequence similarity.

Most members of the clusters shared properties other than their structural similarity, such as the type of L chain (κ or λ), BCR stereotypes, and mutational status. In some instances, members of the same cluster also display remarkable homogeneity in terms of IGHV-IGK/LV usage and H/L chain pairing.

For example, Igs belonging to cluster 1 all carry the IGHV1-69 gene combined with IGKV L chains, and cluster 3 contains all unmutated Igs that use the IGHV1-69 gene combined with L chains that almost exclusively use genes of the IGLV3 family.

Cluster 4 includes almost half of all SM cases of our CLL cohort and all subset 2 stereotyped cases of the data set. These CLLs use the IGHV3-21/IGLV3-21 genes and are known to have an unfavorable clinical outcome regardless of their mutational status (42). Interestingly, cluster 4 includes not only IGHV3-21 but also IGHV3-48 and IGHV3-11 CLL cases. In a previous study (43), a very large data set of HCDR3 amino acid sequences was used to cluster CLL cases, and the IGHV3-21 gene predominated in a cluster that also contained CLLs using the IGHV3-48 and IGHV3-11 genes. This was interpreted as suggestive of some functional constraint, which we can now also relate to the structure of the binding site. The composition of our cluster 4 further indicates that the IGHV3-21, IGHV3-48, and IGHV3-11 CLL genes generate a structurally similar binding site (almost) only when paired with the IGLV3-21 L chain gene.

We used our structural clustering results to generate statistical models of their members in the form of HMMs. The generated HMMs have discriminative power in that they are able to identify CLL Igs in a large data set not including the Igs used for clustering and, remarkably, are also able to separate AI Abs from non-AI ones. Also in this case, no discriminative power could be achieved using a sequence-based classification.
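One simple way to quantify this kind of discriminative power, in the spirit of the 5th to 95th percentile band drawn in Fig. 6, is to ask what fraction of a test set's scores falls inside the band defined by the scores of the Igs used to build a given HMM. The sketch below is only an illustration under that assumption: the function names, the linear-interpolation percentile, and the toy scores are hypothetical and are not taken from the study.

```python
def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    xs = sorted(values)
    if len(xs) == 1:
        return xs[0]
    pos = (len(xs) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

def fraction_in_band(train_scores, test_scores, low_q=5, high_q=95):
    """Fraction of test scores lying within the [low_q, high_q] percentile
    band of the training scores (the Igs used to build the HMM)."""
    low = percentile(train_scores, low_q)
    high = percentile(train_scores, high_q)
    inside = sum(1 for s in test_scores if low <= s <= high)
    return inside / len(test_scores)

# Toy, made-up scores (not study data).
train = list(range(0, 101))          # band is [5, 95]
similar_set = [10, 50, 90, 96]       # mostly inside the band
dissimilar_set = [-5, 0, 2, 97]      # entirely outside the band
print(fraction_in_band(train, similar_set))
print(fraction_in_band(train, dissimilar_set))
```

A test set that resembles the cluster used to train the HMM is expected to yield a high fraction, whereas an unrelated set should yield a low one.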

The ability of the structure-based HMMs to identify common features among AI Abs can, in principle, be due to a bias in the data set because sequenced AI Abs react to a subset of specific Ags. However, it is very likely that this potential bias is not, or is not solely, responsible for our finding given the fact that in several cases auto-Ags have been proved to be reactive with CLL clones (44).

Our results strongly suggest that specific features of the CLL Igs reside in the overall atomic structure of their binding site and therefore support the hypothesis that a finite number of antigenic structures may be involved in CLL pathogenesis.

Our data also show that the features of the ABS are partially captured by the stereotype subset classification, even though the latter relies only on the amino acid sequence of CDR3. For example, our cluster 1 contains all subset 6 stereotyped Igs, cluster 4 all subset 2 stereotyped cases, cluster 2 mostly includes subset 1 stereotyped BCRs (7/10), and cluster 7 contains the 2 novel stereotyped BCRs identified for the first time, to our knowledge, in this study.
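The overlap between a stereotyped-subset classification and a structural clustering can be tabulated as follows. This is a minimal sketch with hypothetical names and made-up toy assignments; it only shows how, for each subset, the dominant structural cluster and the fraction of the subset it contains could be computed.

```python
from collections import Counter, defaultdict

def subset_to_cluster_fractions(assignments):
    """assignments: iterable of (subset_id, cluster_id) pairs, one per Ig.

    Returns {subset_id: (top_cluster, fraction)} where `fraction` is the
    share of that subset's Igs found in its most populated cluster.
    """
    per_subset = defaultdict(Counter)
    for subset_id, cluster_id in assignments:
        per_subset[subset_id][cluster_id] += 1
    result = {}
    for subset_id, counts in per_subset.items():
        top_cluster, n_top = counts.most_common(1)[0]
        result[subset_id] = (top_cluster, n_top / sum(counts.values()))
    return result

# Toy, made-up assignments (not study data).
toy = [("sX", "A"), ("sX", "A"), ("sX", "A"), ("sX", "B"),
       ("sY", "B"), ("sY", "B")]
print(subset_to_cluster_fractions(toy))
```

A fraction of 1.0 corresponds to a subset falling entirely within one structural cluster, as reported in the text for some subsets.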

The possibility of clustering CLL Igs on the basis of their functional properties, as determined by the structure of their binding site and potentially by their recognized Ag, raises the interesting question of whether there is any correlation with the clinical outcome of the patients. This might be expected because some BCR stereotype subsets (14, 17, 45–48), as well as the mutational status (5, 6), are known to correlate with patient prognosis.

Clinical data are available for only a limited set of patients; however, the analysis of our most populated cluster (cluster 2) shows very promising results.

This cluster mostly contains M-CLL cases and a smaller fraction of U-CLL cases. As mentioned earlier, M-CLLs generally have a longer life expectancy, less need for therapy, and a better response to treatment when compared with U-CLLs. However, the U-CLL cases within cluster 2 display a clinical outcome much more favorable than what is generally observed for U-CLL patients. Should this result be confirmed on a larger cohort of patients, it would indicate that clinical outcome is correlated with the structure of the binding site and only indirectly linked to the IGHV mutation status. In this view, cluster subdivision would help to better classify the numerous outliers observed using the simpler U-CLL and M-CLL classification.

This is the first time, to our knowledge, that Igs from CLL patients were analyzed from a structural point of view, and we believe that our results point to the relevance of using this approach on a larger scale, which can now be easily handled by current methodologies for modeling and structural analysis.

This is a prospective study and the clinical data are still rather sparse. We will continue to monitor the clinical outcome of the patients enrolled in this study, as well as repeat the analysis on samples that will become available in the future, because we strongly believe that our strategy has the potential to lead to advances in understanding the nature of CLL and in managing patients.

The correlation between ABS structure and clinical outcome, if confirmed, may provide novel tools for a more robust prognostic stratification of CLL, also because sequencing of the IgVL can easily become a standard laboratory test, as is already the case for IgVH sequencing, and the modeling and clustering protocols are well defined and available.

Currently, it is essentially impossible to identify the Ag given the structure of the cognate Ab-binding site, but this might change in the future, and we might then also gain insight into the nature of the Ags associated with CLL pathogenesis, which would have important applications for therapy.

This work was supported by Associazione Italiana Ricerca sul Cancro (IG-10698 to F.F.; IG-10492 to M.F.); Compagnia di San Paolo (4824 SD/CV, 2007.2880 to F.F.); Fondazione Maria Piaggio Casarsa, Genova, Italy (to F.G.); the National Institutes of Health (Grant RO1 CA81554 to N.C.); and King Abdullah University of Science and Technology (Grant KUK-I1-012-43 to A.T.). M.C. has a fellowship from the Associazione Italiana Ricerca sul Cancro 5 per Mille.

The online version of this article contains supplemental material.

Abbreviations used in this article: ABS, Ag-binding site; CDR, complementarity-determining region; CLL, chronic lymphocytic leukemia; HM, heavily mutated; HMM, hidden Markov model; M-CLL, mutated chronic lymphocytic leukemia case; SM, scarcely mutated; TM, template modeling; U-CLL, unmutated chronic lymphocytic leukemia case; w/o, without.

1. Chiorazzi N., Rai K. R., Ferrarini M. 2005. Chronic lymphocytic leukemia. N. Engl. J. Med. 352: 804–815.
2. Fais F., Ghiotto F., Hashimoto S., Sellars B., Valetto A., Allen S. L., Schulman P., Vinciguerra V. P., Rai K., Rassenti L. Z., et al. 1998. Chronic lymphocytic leukemia B cells express restricted sets of mutated and unmutated antigen receptors. J. Clin. Invest. 102: 1515–1525.
3. Ghiotto F., Fais F., Albesiano E., Sison C., Valetto A., Gaidano G., Reinhardt J., Kolitz J. E., Rai K., Allen S. L., et al. 2006. Similarities and differences between the light and heavy chain Ig variable region gene repertoires in chronic lymphocytic leukemia. Mol. Med. 12: 300–308.
4. Hadzidimitriou A., Darzentas N., Murray F., Smilevska T., Arvaniti E., Tresoldi C., Tsaftaris A., Laoutaris N., Anagnostopoulos A., Davi F., et al. 2009. Evidence for the significant role of immunoglobulin light chains in antigen recognition and selection in chronic lymphocytic leukemia. Blood 113: 403–411.
5. Damle R. N., Wasil T., Fais F., Ghiotto F., Valetto A., Allen S. L., Buchbinder A., Budman D., Dittmar K., Kolitz J., et al. 1999. Ig V gene mutation status and CD38 expression as novel prognostic indicators in chronic lymphocytic leukemia. Blood 94: 1840–1847.
6. Hamblin T. J., Davis Z., Gardiner A., Oscier D. G., Stevenson F. K. 1999. Unmutated Ig V(H) genes are associated with a more aggressive form of chronic lymphocytic leukemia. Blood 94: 1848–1854.
7. Ghiotto F., Fais F., Valetto A., Albesiano E., Hashimoto S., Dono M., Ikematsu H., Allen S. L., Kolitz J., Rai K. R., et al. 2004. Remarkably similar antigen receptors among a subset of patients with chronic lymphocytic leukemia. J. Clin. Invest. 113: 1008–1016.
8. Messmer B. T., Albesiano E., Efremov D. G., Ghiotto F., Allen S. L., Kolitz J., Foa R., Damle R. N., Fais F., Messmer D., et al. 2004. Multiple distinct sets of stereotyped antigen receptors indicate a role for antigen in promoting chronic lymphocytic leukemia. J. Exp. Med. 200: 519–525.
9. Tobin G., Thunberg U., Johnson A., Eriksson I., Söderberg O., Karlsson K., Merup M., Juliusson G., Vilpo J., Enblad G., et al. 2003. Chronic lymphocytic leukemias utilizing the VH3-21 gene display highly restricted Vlambda2-14 gene use and homologous CDR3s: implicating recognition of a common antigen epitope. Blood 101: 4952–4957.
10. Tobin G., Thunberg U., Karlsson K., Murray F., Laurell A., Willander K., Enblad G., Merup M., Vilpo J., Juliusson G., et al. 2004. Subsets with restricted immunoglobulin gene rearrangement features indicate a role for antigen selection in the development of chronic lymphocytic leukemia. Blood 104: 2879–2885.
11. Widhopf G. F. II, Rassenti L. Z., Toy T. L., Gribben J. G., Wierda W. G., Kipps T. J. 2004. Chronic lymphocytic leukemia B cells of more than 1% of patients express virtually identical immunoglobulins. Blood 104: 2499–2504.
12. Agathangelidis A., Darzentas N., Hadzidimitriou A., Brochet X., Murray F., Yan X. J., Davis Z., van Gastel-Mol E. J., Tresoldi C., Chu C. C., et al. 2012. Stereotyped B-cell receptors in one-third of chronic lymphocytic leukemia: a molecular classification with implications for targeted therapies. Blood 119: 4467–4475.
13. Murray F., Darzentas N., Hadzidimitriou A., Tobin G., Boudjogra M., Scielzo C., Laoutaris N., Karlsson K., Baran-Marzsak F., Tsaftaris A., et al. 2008. Stereotyped patterns of somatic hypermutation in subsets of patients with chronic lymphocytic leukemia: implications for the role of antigen selection in leukemogenesis. Blood 111: 1524–1533.
14. Stamatopoulos K., Belessi C., Moreno C., Boudjograh M., Guida G., Smilevska T., Belhoul L., Stella S., Stavroyianni N., Crespo M., et al. 2007. Over 20% of patients with chronic lymphocytic leukemia carry stereotyped receptors: pathogenetic implications and clinical correlations. Blood 109: 259–270.
15. Minden M. D., Ubelhart R., Schneider D., Wossning T., Bach M. P., Buchner M., Hofmann D., Surova E., Follo M., Kohler F., et al. 2012. Chronic lymphocytic leukaemia is driven by antigen-independent cell-autonomous signalling. Nature 489: 309–312.
16. Sutton L. A., Kostareli E., Hadzidimitriou A., Darzentas N., Tsaftaris A., Anagnostopoulos A., Rosenquist R., Stamatopoulos K. 2009. Extensive intraclonal diversification in a subgroup of chronic lymphocytic leukemia patients with stereotyped IGHV4-34 receptors: implications for ongoing interactions with antigen. Blood 114: 4460–4468.
17. Thorsélius M., Kröber A., Murray F., Thunberg U., Tobin G., Bühler A., Kienle D., Albesiano E., Maffei R., Dao-Ung L. P., et al. 2006. Strikingly homologous immunoglobulin gene rearrangements and poor outcome in VH3-21-using chronic lymphocytic leukemia patients independent of geographic origin and mutational status. Blood 107: 2889–2894.
18. Wu T. T., Kabat E. A. 1970. An analysis of the sequences of the variable regions of Bence Jones proteins and myeloma light chains and their implications for antibody complementarity. J. Exp. Med. 132: 211–250.
19. Chothia C., Lesk A. M. 1987. Canonical structures for the hypervariable regions of immunoglobulins. J. Mol. Biol. 196: 901–917.
20. Morea V., Tramontano A., Rustici M., Chothia C., Lesk A. M. 1998. Conformations of the third hypervariable region in the VH domain of immunoglobulins. J. Mol. Biol. 275: 269–294.
21. Tramontano A., Chothia C., Lesk A. M. 1990. Framework residue 71 is a major determinant of the position and conformation of the second hypervariable region in the VH domains of immunoglobulins. J. Mol. Biol. 215: 175–182.
22. Marcatili P., Rosi A., Tramontano A. 2008. PIGS: automatic prediction of antibody structures. Bioinformatics 24: 1953–1954.
23. Pedotti M., Simonelli L., Livoti E., Varani L. 2011. Computational docking of antibody-antigen complexes, opportunities and pitfalls illustrated by influenza hemagglutinin. Int. J. Mol. Sci. 12: 226–251.
24. Morea V., Lesk A. M., Tramontano A. 2000. Antibody modeling: implications for engineering and design. Methods 20: 267–279.
25. Almagro J. C., Beavers M. P., Hernandez-Guzman F., Maier J., Shaulsky J., Butenhof K., Labute P., Thorsteinson N., Kelly K., Teplyakov A., et al. 2011. Antibody modeling assessment. Proteins 79: 3050–3066.
26. Cheson B. D., Bennett J. M., Grever M., Kay N., Keating M. J., O’Brien S., Rai K. R. 1996. National Cancer Institute-sponsored Working Group guidelines for chronic lymphocytic leukemia: revised guidelines for diagnosis and treatment. Blood 87: 4990–4997.
27. Pommié C., Levadoux S., Sabatier R., Lefranc G., Lefranc M. P. 2004. IMGT standardized criteria for statistical analysis of immunoglobulin V-REGION amino acid properties. J. Mol. Recognit. 17: 17–32.
28. Bikos V., Darzentas N., Hadzidimitriou A., Davis Z., Hockley S., Traverse-Glehen A., Algara P., Santoro A., Gonzalez D., Mollejo M., et al. 2012. Over 30% of patients with splenic marginal zone lymphoma express the same immunoglobulin heavy variable gene: ontogenetic implications. Leukemia 26: 1638–1646.
29. Chailyan A., Tramontano A., Marcatili P. 2012. A database of immunoglobulins with integrated tools: DIGIT. Nucleic Acids Res. 40(Database issue): D1230–D1234.
30. Zemla A. 2003. LGA: A method for finding 3D similarities in protein structures. Nucleic Acids Res. 31: 3370–3374.
31. Al-Lazikani B., Lesk A. M., Chothia C. 1997. Standard conformations for the canonical structures of immunoglobulins. J. Mol. Biol. 273: 927–948.
32. Rousseeuw P. J. 1987. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20: 53–65.
33. Zhang Y., Skolnick J. 2004. Scoring function for automated assessment of protein structure template quality. Proteins 57: 702–710.
34. Anderberg M. R. 1973. Cluster Analysis for Applications. Academic Press, New York.
35. Baker N. A., Sept D., Joseph S., Holst M. J., McCammon J. A. 2001. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl. Acad. Sci. USA 98: 10037–10041.
36. Eddy S. R. 1998. Profile hidden Markov models. Bioinformatics 14: 755–763.
37. Li W., Godzik A. 2006. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22: 1658–1659.
38. Lee M., Lloyd P., Zhang X., Schallhorn J. M., Sugimoto K., Leach A. G., Sapiro G., Houk K. N. 2006. Shapes of antibody binding sites: qualitative and quantitative analyses based on a geomorphic classification scheme. J. Org. Chem. 71: 5082–5092.
39. MacCallum R. M., Martin A. C., Thornton J. M. 1996. Antibody-antigen interactions: contact analysis and binding site topography. J. Mol. Biol. 262: 732–745.
40. Sthoeger Z. M., Wakai M., Tse D. B., Vinciguerra V. P., Allen S. L., Budman D. R., Lichtman S. M., Schulman P., Weiselberg L. R., Chiorazzi N. 1989. Production of autoantibodies by CD5-expressing B lymphocytes from patients with chronic lymphocytic leukemia. J. Exp. Med. 169: 255–268.
41. Hervé M., Xu K., Ng Y. S., Wardemann H., Albesiano E., Messmer B. T., Chiorazzi N., Meffre E. 2005. Unmutated and mutated chronic lymphocytic leukemias derive from self-reactive B cell precursors despite expressing different antibody reactivity. J. Clin. Invest. 115: 1636–1643.
42. Bomben R., Dal Bo M., Capello D., Benedetti D., Marconi D., Zucchetto A., Forconi F., Maffei R., Ghia E. M., Laurenti L., et al. 2007. Comprehensive characterization of IGHV3-21-expressing B-cell chronic lymphocytic leukemia: an Italian multicenter study. Blood 109: 2989–2998.
43. Collis A. V., Brouwer A. P., Martin A. C. 2003. Analysis of the antigen combining site: correlations between length and sequence composition of the hypervariable loops and the nature of the antigen. J. Mol. Biol. 325: 337–354.
44. Catera R., Silverman G. J., Hatzi K., Seiler T., Didier S., Zhang L., Hervé M., Meffre E., Oscier D. G., Vlassara H., et al. 2008. Chronic lymphocytic leukemia cells recognize conserved epitopes associated with apoptosis and oxidation. Mol. Med. 14: 665–674.
45. Bomben R., Dal Bo M., Capello D., Forconi F., Maffei R., Laurenti L., Rossi D., Del Principe M. I., Zucchetto A., Bertoni F., et al. 2009. Molecular and clinical features of chronic lymphocytic leukaemia with stereotyped B cell receptors: results from an Italian multicentre study. Br. J. Haematol. 144: 492–506.
46. Chu C. C., Catera R., Hatzi K., Yan X. J., Zhang L., Wang X. B., Fales H. M., Allen S. L., Kolitz J. E., Rai K. R., Chiorazzi N. 2008. Chronic lymphocytic leukemia antibodies with a common stereotypic rearrangement recognize nonmuscle myosin heavy chain IIA. Blood 112: 5122–5129.
47. Rossi D., Gaidano G. 2010. Biological and clinical significance of stereotyped B-cell receptors in chronic lymphocytic leukemia. Haematologica 95: 1992–1995.
48. Tobin G., Thunberg U., Johnson A., Thörn I., Söderberg O., Hultdin M., Botling J., Enblad G., Sällström J., Sundström C., et al. 2002. Somatically mutated Ig V(H)3-21 genes characterize a new subset of chronic lymphocytic leukemia. Blood 99: 2262–2264.

The authors have no financial conflicts of interest.