Current Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) using short-read sequencing strategies resolve expressed Ab transcripts with limited resolution of the C region. In this article, we present the near-full-length AIRR-seq (FLAIRR-seq) method that uses targeted amplification by 5′ RACE, combined with single-molecule, real-time sequencing to generate highly accurate (99.99%) human Ab H chain transcripts. FLAIRR-seq was benchmarked by comparing H chain V (IGHV), D (IGHD), and J (IGHJ) gene usage, complementarity-determining region 3 length, and somatic hypermutation to matched datasets generated with standard 5′ RACE AIRR-seq using short-read sequencing and full-length isoform sequencing. Together, these data demonstrate robust FLAIRR-seq performance using RNA samples derived from PBMCs, purified B cells, and whole blood, which recapitulated results generated by commonly used methods, while additionally resolving H chain gene features not documented in IMGT at the time of submission. FLAIRR-seq data provide, for the first time, to our knowledge, simultaneous single-molecule characterization of IGHV, IGHD, IGHJ, and IGHC region genes and alleles, allele-resolved subisotype definition, and high-resolution identification of class switch recombination within a clonal lineage. In conjunction with genomic sequencing and genotyping of IGHC genes, FLAIRR-seq of the IgM and IgG repertoires from 10 individuals resulted in the identification of 32 unique IGHC alleles, 28 (87%) of which were previously uncharacterized. Together, these data demonstrate the capabilities of FLAIRR-seq to characterize IGHV, IGHD, IGHJ, and IGHC gene diversity for the most comprehensive view of bulk-expressed Ab repertoires to date.

This article is featured in Top Reads, p.1461

Antibodies or Igs are the primary effectors of humoral immunity and are found as both membrane-bound receptors on B cells and circulating, secreted proteins (1). Both membrane-bound BCRs and secreted Abs act to recognize and bind Ag. All Abs and BCRs are composed of two identical heavy (H) and light (L) chains that are posttranslationally associated. The H chain is composed of two distinct domains: (1) the variable domain (Fab), which allows for Ag binding; and (2) the constant domain (Fc), which modulates downstream effector functions (1, 2). The L chain also includes a variable domain that, once posttranslationally associated with the H chain variable domain, interacts with cognate Ag (3). In humans, Abs are grouped into discrete isotypes and subisotypes (i.e., IgM, IgD, IgG1, IgG2, IgG3, IgG4, IgA1, IgA2, and IgE), based on the expression of specific C genes within the genomic locus that encodes the IgH (IGH) locus. Each isotype and subisotype has unique effector properties that together represent the wide diversity of Ab-mediated functions, including binding of Fc receptors, activation of complement, opsonization, Ab-dependent cellular cytotoxicity, and Ab-dependent cellular phagocytosis (4, 5).

To facilitate the development of diverse Ab repertoires capable of recognizing the wide range of pathogens that humans encounter, the Ig genomic loci are highly polymorphic and harbor diverse and complex sets of genes that recombine in each B cell to encode up to 1013 unique specificities (6). B cells create this expansive catalog of specificities through somatic recombination of the V, D, and J genes in IGH, and V and J genes from the corresponding L chain loci, λ (IGL) and κ (IGK) (7). During VDJ recombination in IGH, a single D and J gene are first recombined, whereas the intervening and unselected D and J gene sequences are removed by RAG recombinase (8). After D and J genes are joined, further recombination of a specific V gene to the recombined DJ completes the formation of the full VDJ rearrangement. After transcription of the recombined VDJ, a single C region gene is spliced together with the VDJ cassette to generate the completed H chain transcript (7). Recombination at IGL and IGK occurs similarly, recombining V and J genes only. H and L chain transcripts are independently translated and linked via covalent cysteine bonds resulting in a fully functional protein before B cell cell-surface expression or secretion (Fig. 1A) (9). Naive B cells, which develop in the bone marrow from hematopoietic stem cell progenitors, have undergone VDJ recombination but have not yet encountered Ag and solely express IgM and IgD (10). These naive B cells then migrate to B cell zones in secondary lymphoid tissues, where they encounter Ag, driving further maturation and class switch recombination (CSR) to enable the most effective humoral responses (11). CSR mediates the excision of IGHC genes at the DNA level, which leads to the utilization and linkage of different IGHC genes to the same VDJ, ultimately resulting in class switching to alternate isotypes and subisotypes (11).

The IgG isotype class is represented by four subisotypes: IgG1, IgG2, IgG3, and IgG4. Each IgG subisotype circulates at varied frequencies and facilitates unique immune functions. For example, IgG1 is typically the most abundant circulating IgG and mediates proinflammatory responses; IgG2 targets bacterial polysaccharides, providing protection from bacterial pathogens; IgG3 confers protection against intracellular bacterial infections and enables clearing of parasites; and IgG4 contains exclusive structural and functional characteristics often resulting in anti-inflammatory and tolerance-inducing effects (5). Multiple studies have identified Ab-mediated subisotype-specific pathogenicity in the context of autoimmune diseases and cancer, highlighting the need for further investigation of subisotype-specific repertoires (12–15).

Current Adaptive Immune Receptor Repertoire sequencing (AIRR-seq) methods aim to resolve V and C region transcripts to differing extents. Profiling of the V region, even in part, defines V, D, and J gene usage while also providing characterization of complementarity-determining regions (CDRs) 1, 2, and 3, which are hypervariable and directly interact with target Ag (16). CDR3-targeted profiling approaches, such as those used by Adaptive Biotechnologies (17), allow for V, D, and J gene assignments but do not provide complete resolution of the entire V region (17). The isoform sequencing (Iso-Seq) method captures full-length transcripts expressing a poly(A) tail through oligo dT-based priming. For characterization of AIRR-seq reads, this approach has lower throughput, lower depth, and increased cost, because Iso-Seq generates a complete transcriptome per sample without any enrichment of H chain sequences, which are then filtered out for analyses, resulting in a considerable amount of non-Ab repertoire data being discarded (Fig. 1B). Multiplexed primer-based sr-AIRR-seq strategies generate full V region content but require specific primers to known targets and therefore may miss novel genes and alleles. 5′ RACE sr-AIRR-seq methods capture the full-length VDJ exon without V region–targeted multiplexed primer pools, therefore limiting the impact of primer bias and enabling discovery of novel IGHV, IGHD, and IGHJ genes and alleles (18). 5′ RACE methods also often prime from the first IGHC exon (CH1), allowing for determination of isotype. Additional methods have been developed that shift amplification strategies by capturing additional IGHC region sequence to enable subisotype resolution; however, these methods sacrifice full and contiguous characterization of IGHV genes (19). All commonly used sr-AIRR-seq methods are further technically limited by the length restrictions (≤600 nt) of short-read next generation sequencing. As a result, no current sr-AIRR-seq strategy resolves the complete H chain transcript, including all IGHC exons alongside the recombined IGHV, IGHD, and IGHJ genes. Our team has recently shown that population-based polymorphisms within the IGHV, IGHD, IGHJ, IGK, and IGL loci are far more extensive than previously known; the IGHC region has also been shown to contain genomic diversity, although the extent of this diversity has likely not been fully explored (20–27). Although it is understood that the Fc domain mediates Ab effector functions, there is limited knowledge as to how genetic variation in this region may impact functional capabilities or posttranslational modifications (5, 28, 29). As such, there is a growing need to understand genomic variation across the complete Ab molecule. To address these limitations, we have developed an end-to-end pipeline to target, profile, and characterize the Ab H chain repertoire in the context of isotype (IgG, IgM) and subisotype (IgG1, IgG2, IgG3, IgG4).

In this article, we present full-length AIRR-seq (FLAIRR-seq), a novel targeted 5′ RACE-based amplification of near-full-length IgG and IgM H chain transcripts, paired with single-molecule real-time (SMRT) sequencing, resulting in highly accurate (mean read accuracy ∼Q60, 99.9999%), near-full-length Ab sequences from RNA derived from whole blood, isolated PBMCs, and purified B cells. When analyzed with the Immcantation sr-AIRR-seq tool suite (30, 31), we demonstrate that FLAIRR-seq performs comparably with standard 5′ RACE sr-AIRR-seq methods and single-molecule Iso-Seq strategies for characterizing the expressed Ab repertoire. We further highlight FLAIRR-seq data features, including phased identification of IGHV, IGHD, IGHJ, and IGHC genes, allowing for profiling of subisotype- and IGHC allele-specific repertoires and CSR characterization.

Experiments were conducted using healthy donor PBMCs, purified B cells from healthy donors, or whole blood collected from hospitalized COVID-19 patients (Supplemental Table I). Commercially available healthy donor PBMCs (STEMCELL Technologies) and a subset of matched purified B cells were used to generate sr-AIRR-seq and FLAIRR-seq validation datasets. Full-length Iso-Seq was performed using B cells isolated from the PBMCs of a healthy, consented 57-y-old male donor at the University of Louisville (UofL) School of Medicine. The UofL Institutional Review Board (IRB) approved sample collection (IRB 14.0661). For COVID-19–affected patient samples (n = 5), whole blood was collected from the Mount Sinai COVID-19 biobank cohort of hospitalized COVID-19 patients, approved by the IRB at the Icahn School of Medicine at Mount Sinai as previously described (32).

Frozen healthy donor PBMCs were purchased, thawed, and aliquoted for use in downstream experiments (STEMCELL Technologies). For Iso-Seq analyses, 175 ml of venous blood was collected in a final concentration of 6 mM K3EDTA using standard phlebotomy. PBMCs were isolated using Sepmate PBMC Isolation Tubes (STEMCELL Technologies) as previously described (33), with an additional granulocyte depletion step using the RosetteSep Human Granulocyte Depletion Cocktail (STEMCELL Technologies) as directed by the manufacturer. B cells from the freshly collected and frozen healthy donor PBMCs were isolated using the EasySep Human Pan-B Cell Enrichment Kit, as described by the manufacturer (STEMCELL Technologies). In brief, B cells, including plasma cells, were isolated by negative selection using coated magnetic particles. First, the B cell enrichment mixture was added to the sample and mixed for a 5-min incubation at room temperature, followed by addition of magnetic particles and further incubation for 5 min on the benchtop. The sample tube was then placed on an EasySep magnet (STEMCELL Technologies), and purified B cells were carefully eluted from the magnetic particles and immediately used for RNA extraction.

For the healthy frozen PBMCs and matched purified B cells, genomic DNA (gDNA) and RNA were coextracted using the AllPrep DNA/RNA Mini Kit (Qiagen) according to the manufacturer’s instructions. For the freshly processed UofL healthy donor PBMCs, purified Pan-B cells were lysed in Buffer RLT Plus, and RNA was extracted with the RNeasy Plus Mini Kit (Qiagen) per the manufacturer’s protocol; no gDNA was collected from this sample. COVID-19 whole blood–derived RNA was extracted from samples collected in Tempus Blood RNA tubes using the Tempus Spin RNA Isolation Kit (ThermoFisher) as described by the manufacturer. For all samples, concentrations of RNA and gDNA (when appropriate) were assessed using the Qubit 4.0 fluorometer, with the RNA HS Assay Kit and Qubit DNA HS Assay Kit, respectively (ThermoFisher Scientific). RNA and gDNA integrity were evaluated using the Bioanalyzer RNA Nano Kit and DNA 12000 Kit, respectively (Agilent Technologies). Extracted RNA and gDNA were stored at −80°C and −20°C, respectively, until used.

Extracted RNA was thawed on ice and converted to first-strand cDNA using the SMARTer RACE 5′/3′ Kit (Takara Bio USA), as described by the manufacturer and a custom oligonucleotide that contained the template switch oligo and a unique molecular identifier (UMI; 5′ TSO-UMI) for template switch during first-strand cDNA synthesis. The following reaction conditions were used: (1) a primary master mix was prepared with 4.0 μl 5× First-Strand Buffer, 0.5 μl DTT (100 mM), and 1.0 μl dNTPs (20 mM) per reaction and set aside until needed; (2) in a separate 0.2-ml PCR tube, 10 μl of sample RNA and 1 μl 5′-CDS Primer A were combined and incubated in a thermal cycler at 72°C (lid temperature: 105°C) for 3 min, followed by cooling to 42°C for 2 min; (3) after cooling, tubes were spun briefly to collect contents, and 1 μl (12 μM) of the 5′ TSO-UMI was added to the RNA; and (4) 0.5 μl of RNase inhibitor and 2.0 μl of SMARTScribe Reverse Transcriptase were added to the primary master mix tube per sample, and 8 μl of the combined master mix was then added to each RNA-containing sample tube. First-strand cDNA synthesis reactions were incubated in a thermal cycler at 42°C (lid temperature: 105°C) for 90 min, followed by heat inactivation at 70°C for 10 min. Total first-strand cDNA generated in this reaction was diluted 1:2 with Tricine-EDTA Buffer before moving onto targeted H chain transcript amplification.

To specifically amplify H chain transcripts from total first-strand cDNA, we performed targeted IgG and IgM transcript amplification reactions using barcoded IgG (3′ primer binding in the C region exon 3 [CH3])-specific or IgM (3′ primer binding in the C region exon 4 [CH4])-specific primers (Supplemental Table II) and the following conditions: (1) 5 μl of diluted first-strand cDNA was added to 0.2-ml PCR tubes; (2) a master mix was generated using 10 μl 5× PrimeSTAR GXL Buffer, 4 μl GXL dNTP mixture, 28 μl PCR-grade water, 1 μl PrimeSTAR GXL Polymerase, and 1 μl 10× UPM form the SMARTer RACE 5′/3′ Kit per reaction; and (3) 44 μl master mix was added to each reaction tube followed by 1 μl of the appropriate barcoded IgG (CH3) or IgM (CH4) primer. Different temperatures were used for annealing of IgG- (63.5°C) and IgM-specific primers (60°C) to account for primer-specific melting temperatures and to enhance targeted amplification specificity. Amplification conditions for full-length IgG were 1 min at 95°C, followed by 35 amplification cycles of 95°C for 30 s, 63.5°C for 20 s, and 2 min at 68°C, followed by a final extension for 3 min at 68°C and hold at 4°C. Amplification conditions for full-length IgM were 1 min at 95°C, followed by 35 amplification cycles of 95°C for 30 s, 60°C for 20 s, and 2 min at 68°C, followed by a final extension for 3 min at 68°C and hold at 4°C. Final amplification reactions were purified using a 1.1× (v/v) cleanup with ProNex magnetic beads (Promega). Successfully amplified products were quantified with Qubit dsDNA HS assay (ThermoFisher Scientific), and length was evaluated with the Fragment Analyzer Genomic DNA HS assay (Agilent). Samples were equimolar pooled in 8-plexes for SMRTbell library preparation and sequencing.

Eight-plex pools of targeted IgG or IgM amplicons were prepared into SMRTbell sequencing templates according to the “Procedure and Checklist for Iso-Seq Express Template for Sequel and Sequel II systems” protocol starting at the “DNA Damage Repair” step and using the SMRTbell Express Template Prep Kit 2.0, with some modifications (Pacific Biosciences). In brief, targeted IgG and IgM amplicons underwent enzymatic DNA damage and end repair, followed by ligation with overhang SMRTbell adapters as specified in the protocol. To increase consistency in SMRTbell loading on the Sequel IIe system, we further treated the SMRTbell libraries with a nuclease mixture to remove unligated amplified products, using the SMRTbell Enzyme Cleanup Kit, as recommended by the manufacturer (Pacific Biosciences). In brief, after heat-killing the ligase with an incubation at 65°C, samples were treated with a nuclease mixture at 37°C for 1 h and then purified with a 1.1× Pronex cleanup. Final SMRTbell libraries were evaluated for quantity and quality using the Qubit dsDNA HS assay and Fragment Analyzer Genomic DNA assay, respectively. Sequencing of each 8-plex, barcoded sample pool was performed on one SMRTcell 8M using primer v4 and polymerase v2.1 on the Sequel IIe system with 30-h movies. Demultiplexed, high-fidelity circular consensus sequence reads (“HiFi reads”) were generated on the instrument for downstream analyses.

Matched healthy donor RNA was used to generate targeted IgG and IgM sr-AIRR-seq libraries using the SMARTer Human BCR IgG IgM H/K/L Profiling Kit (Takara Bio USA) according to the manufacturer’s instructions with no modifications. In brief, for each sample, proprietary IgG and IgM primers were used to amplify H chain transcripts following a 5′ RACE reaction. sr-AIRR-seq libraries were then quality controlled using the 2100 Bioanalyzer High Sensitivity DNA Assay Kit (Agilent) and the Qubit 3.0 Fluorometer dsDNA High Sensitivity Assay Kit. Sequencing on the MiSeq platform using 300-bp paired-end reads was performed using the 600-cycle MiSeq Reagent Kit v3 (Illumina) according to the manufacturer’s instructions, and FASTQ reads were generated using the associated DRAGEN software package (Illumina).

RNA extracted from healthy sorted B cells was used to generate Iso-Seq SMRTbell libraries following the “Procedure & Checklist Iso-Seq Express Template Preparation for the Sequel II System” with minor adaptations compared with the manufacturer’s instructions. In brief, Iso-Seq libraries were generated using 500 ng of high-quality (RNA integrity number > 8) RNA as input into oligo-dT primed cDNA synthesis (NEB). Barcoded primers were incorporated into the cDNA during second-strand synthesis. After double-stranded cDNA amplification, transcripts from two samples sourced from purified B cells and NK cells were equimolar pooled as previously described (33). SMRTbells were generated from the pooled cDNA as described earlier for the FLAIRR-seq amplification products, including the addition of a nuclease digestion step. Quantity and quality of the final Iso-Seq libraries were performed with the Qubit dsDNA High Sensitivity Assay Kit and Agilent Fragment Analyzer Genomic DNA assay, respectively. This 2-plex Iso-Seq pool was sequenced using primer v4 and polymerase v2.1 on the Sequel IIe system with a 30-h video. HiFi reads were generated on the instrument before analyses. Demultiplexing of barcoded samples and generation of full-length nonconcatemer predicted transcripts were performed using the Iso-Seq v3 pipeline available through SMRTLink (v.10.2). B cell–derived full-length nonconcatemer reads were mapped to the human genome using the Genomic Mapping and Alignment Program reference database, and reads derived from chromosome 14 were extracted for downstream IgH transcript characterization via Immcantation, as described later.

Analyses of FLAIRR-seq, sr-AIRR-seq, and Iso-Seq datasets were performed using Immcantation tools (30, 31). Demultiplexed barcoded HiFi (for SMRT sequencing data) or FASTQ (for sr-AIRR-seq) reads were first processed using the pRESTO tool for quality control, UMI processing, and error profiling (30). For sr-AIRR-seq, pRESTO analysis data from paired-end reads (“R1” and “R2”) were trimmed to remove bases with <Q20 read quality and/or <125-bp length using the “FilterSeq trimqual” and “FilterSeq length,” respectively. IgG and IgM CH3 or CH4 primer sequences were identified with an error rate of 0.2, and primers identified were then noted in FASTQ headers using “MaskPrimers align.” Next, 12-bp UMIs were located and extracted using “Maskprimers extract.” Sequences found to have the same UMIs were grouped and aligned using “AlignSets muscle,” with a consensus sequence generated for each UMI using “BuildConsensus.” Mate pairing of sr-AIRR-seq reads was conducted using a reference-guided alignment requiring a minimum of a 5-bp overlap via “AssemblePairs sequential.” After collapsing consensus reads with the same UMI (“conscount”) using “CollapseSeq,” reads with fewer than two supporting sequences were removed from downstream analysis. For pRESTO processing of FLAIRR-seq, single HiFi reads (“R1”) did not require trimming because of >Q20 sequence quality across all bases. 5′ TSO-UMI and CH3 or CH4 region primers were identified along with a 22-bp UMI with an error rate of 0.3 using “MaskPrimers align.” Reads were then grouped and aligned using “AlignSets muscle.” Due to the single-molecule nature of FLAIRR-seq reads, no mate pairing was required. Consensus reads were then generated as described earlier, including removal of sequences with fewer than two supporting reads. Read counts following each step of data filtration for AIRR-seq and FLAIRR-seq are represented in Supplemental Table III.

pRESTO-filtered reads for both sr-AIRR-seq and FLAIRR-seq data were then input into the Change-O tool (Table I). Iso-Seq required no initial processing from pRESTO and was input into Change-O for Ig gene reference alignment along with sr-AIRR-seq and FLAIRR-seq data using “igblastn,” clonal clustering using “DefineClones,” and germline reconstruction and conversion using “CreateGermlines.py” and the GRCh38 chromosome 14 germline reference (31). Fully processed and annotated data were then converted into a TSV format for use in downstream analyses. The Alakazam Immcantation tool suite was then used to quantify gene usage, calculate CDR3 length, and assess somatic hypermutation frequencies (SHMs), which were compared by unpaired t tests with Bonferroni multiple testing correction. Alakazam was also used to analyze clonal diversity (31). SCOPer clonal assignment by spectral clustering was conducted for COVID-19 patient time-course samples (34, 35). For clonal lineage tree analysis, the Dowser tool was used to examine clonal diversity and CSR over time (36). After trimming of 3′ primers, FLAIRR-seq data were also processed using the MiXCR generic BCR amplicon pipeline with visualization using the immunarch R package (Supplemental Fig. 1) (37, 38).

FLAIRR-seq validation samples also underwent IGHC targeted enrichment and long-read sequencing as previously described. In brief, gDNA was mechanically sheared and size selected to include 5- to 9-kb fragments using the BluePippin (Sage Science). Samples were then end repaired and A-tailed using the standard KAPA library preparation protocol (Roche). Universal priming sequences and barcodes were then ligated onto samples for multiplexing (Pacific Biosciences). Barcoded gDNA libraries were captured using IGH-specific probes following a SeqCap protocol (Roche). Twenty-six IGH-enriched samples were purified and pooled together for SMRTbell library prep as described earlier, including the final nuclease digestion step. Pooled SMRTbells were annealed to primer v4, bound to polymerase v2.0, and sequenced on the Sequel IIe system with 30-h movies. After sequencing, HiFi reads were generated and analyzed by the IGenotyper pipeline (39). In brief, IGenotyper was used to detect single-nucleotide variants (SNVs) and assemble sequences into haplotype-specific assemblies for downstream IGHC gene genotyping. Alleles were then extracted from assemblies using a bed file containing coordinates for each IGHC gene exon. After sequences were extracted, reads were then aligned to the international ImMunoGeneTics information system (IMGT) database (downloaded on 2/21/22) and assigned as an exact match to IMGT, “novel” if there was no match to the IMGT database, or “novel, extended” if a match was detected to a partial allele found in IMGT but the IMGT allele was a substring of the IGenotyper-identified allele (40). This set of alleles was then used as a ground-truth dataset.

To genotype IGHC genes and alleles from FLAIRR-seq data, we filtered productive reads by IGHC length (900–1100 bp) for IgG and aligned to the chromosome 14 hg38 reference using minimap2 along with SAMtools to generate sorted and indexed bam files (41, 42). WhatsHap was used to identify, genotype, and phase SNVs (M. Martin, M. Patterson, S. Garg, S. O. Fischer, N. Pisanti, G. W. Klau, A. Schöenhuth, and T. Marschall, manuscript posted on bioRxiv, DOI: 10.1101/085050). Phased SNVs were used to assign each read to a haplotype using MsPAC. Reads from each haplotype and gene were clustered using CD-HIT using a 100% identity clustering threshold parameter, and a single representative read from the largest cluster was aligned to the IGenotyper curated alleles using BLAST (43) to determine the closest matching IGHC gene and allele (40, 44). The representative read was selected based on 100% identity to all other sequences in that cluster.

To test the ability of IGHC genes to be used for the inference of IGHV, IGHD, and IGHJ haplotypes from FLAIRR-seq data, we chose one sample that was heterozygous for both IGHM and IGHJ6 (IGHJ6 use is standard for sr-AIRR-seq haplotype inference). For this sample, we used TIgGER (31, 45–47) to infer novel IGHV alleles and generate sample-level IGHV genotypes. Rearranged sequences within the Change-O table were then reannotated, taking into account sample genotype and detected novel alleles. Updated annotations were then used to infer haplotypes using RAbHIT version 0.2.4 (48). Both IGHJ6 and IGHM were used as anchor points for haplotyping, and the resulting haplotypes were compared.

All IGenotyper, FLAIRR-seq, sr-AIRR-seq, and Iso-Seq data generated for this study are available in the Sequence Read Archive using the identifier BioProject ID PRNJA922682 (https://www.ncbi.nlm.nih.gov/bioproject/?term=922682).

Current methods for commercially available 5′ RACE sr-AIRR-seq use targeted amplification of the V region and, in some cases, a small portion of CH1, in conjunction with short-read sequencing to characterize Ig repertoires. However, this minimal examination of the IGHC gene sequence is primarily used to define isotypes. No current method defines both the H chain V and C regions allowing for both subisotype classification and IGHC allele-level resolution. To address these technical limitations, we developed the FLAIRR-seq method (Fig. 1C), a targeted 5′ RACE approach combined with SMRT sequencing, to generate highly accurate, near-full-length IgG (∼1500 bp) and/or IgM (2000 bp) sequences, allowing for direct, simultaneous analysis of the H chain V and C regions (Fig. 1D), including gene/allele identification for IGHV, IGHD, IGHJ, and IGHC segments and isotype- and subisotype-specific repertoire profiling.

To evaluate and validate the capabilities of FLAIRR-seq and sr-AIRR-seq, we performed analyses on 10 healthy donor PBMC samples to compare both library preparation methods. FLAIRR-seq data were filtered from the initial HiFi reads (>Q20) to include only >Q40 reads. The average read quality of these filtered reads was >Q60 (99.9999%), with a pass filter rate ranging from 88 to 93% of total reads. sr-AIRR-seq FASTQ bases were trimmed to retain sequences with an average quality of Q20 (99%). These filtered reads were used as input into the Immcantation suite, specifically the pREST-O and Change-O tools, for IGHV, IGHD, and IGHJ gene assignment, and repertoire feature analyses, including identification of clones, extent of somatic hypermutation, and evaluation of CDR3 lengths. As shown in Table I, fewer overall FLAIRR-seq reads were used as input into the Immcantation analyses, after required filtration and read assembly steps (which were not needed for the high-quality single-molecule FLAIRR-seq reads). FLAIRR-seq resulted in comparable or, in many cases, increased number of unique VDJ sequences, clones, and CDR3 sequences when compared with matched sr-AIRR-seq–derived samples in both the IgG and IgM repertoires. These basic sequencing and initial analysis metrics demonstrated that FLAIRR-seq produced high-quality V region data for detailed Ab repertoire analyses and are amenable to analysis using existing sr-AIRR-seq analysis tools. Comparative costs were calculated based on the cost per “actionable read,” defined as the read number per sample and per method after quality filtration and assembly, but before cluster consensus. This calculation represented the total unique single molecule or assembled templates captured by either method that passed all necessary quality control irrespective of biologic repertoire diversity or clonality, removing technology/platform biases. To remove the impact of pooling differences, we used this “per actionable read” price to calculate the cost for 15,000 “actionable reads” as our standard sample. sr-AIRR-seq costs $25.50 per sample, whereas the FLAIRR-seq cost was $33.57 per sample. These costs reflect reagents and consumables only; instrumentation and labor are not included.

While optimizing FLAIRR-seq sample preparation, we examined whether upstream isolation of B cells before FLAIRR-seq molecular preparation would enhance the ability to detect IGHV, IGHD, and IGHJ gene usage. To do this, we split aliquots of PBMCs (n = 4) into two groups for RNA extraction: (1) RNA derived from bulk PBMCs; and (2) RNA isolated from purified pan B-cells, followed by FLAIRR-seq preparation, SMRT sequencing, and Immcantation analysis of both groups. IGHV, IGHD, and IGHJ gene usage correlations between groups are shown in Fig. 2A and demonstrate a significant association (p values ranging from 0.033 to 4.1e−16) strongly supporting the conclusion that B cell isolation before RNA extraction was not necessary to achieve comparable gene usage metrics. The limited differences that were observed could be explained by template sampling differences between the two experiments. Due to the strong associations observed and the ease of processing PBMCs in bulk, we moved forward with RNA derived directly from PBMC aliquots for the remainder of our analyses.

We established FLAIRR-seq performance by comparing its output with the commonly used 5′ RACE sr-AIRR-seq method. 5′ RACE sr-AIRR-seq was chosen because it provides resolution of the complete V region and a small portion of IGHC, allowing for isotype differentiation. After preparation of both FLAIRR-seq and sr-AIRR-seq libraries from healthy donor PBMC samples (n = 10), we compared multiple repertoire features with benchmark FLAIRR-seq performance. First, we evaluated IGHV, IGHD, and IGHJ gene usage frequencies. We observed significant correlations between FLAIRR-seq and sr-AIRR-seq datasets in IGHV, IGHD and IGHJ gene usage for both IgM (V genes: r = 0.93–0.97, p ≤ 2.2e−16; D genes: r = 0.98–0.99, p ≤ 2.2e−16; J genes: r = 0.94–1.0, p = 0.0028–0.017) and IgG isotypes (V genes: r = 0.90–0.96, p ≤ 2.2e−16; D genes: r = 0.87–0.99, p ≤ 2.2e−16–6.1e−14; J genes: r = 0.89–1.0, p = 0.0028–0.033) (Fig. 2B), indicating that FLAIRR-seq comparably resolves IGHV, IGHD, and IGHJ gene usage profiles. Notably, IGHJ genes showed lower levels of significance (larger p values) across all comparisons because of the relatively few genes that make up the IGHJ gene family compared with the more diverse IGHV and IGHD families. We next investigated the performance of both methods in terms of resolving SHM frequencies (Fig. 2C) and CDR3 lengths (Fig. 2D), which are often used as measures of evaluating B cell affinity maturation. Although we did observe occasional statistically significant differences in the SHM frequency between sr-AIRR-seq and FLAIRR-seq data using the same samples, these differences were not seen across all samples, suggesting that sample-to-sample variation may drive this observation rather than technology-based discrepancies. We found that CDR3 lengths were consistently longer in the FLAIRR-seq datasets for both the IgM and IgG isotypes in most donors. The characterization of unusually long CDR3 regions (>40 nt) in the IgG sequences with FLAIRR-seq is likely due to the higher contiguity and quality afforded by the longer read lengths, which are less likely to be spanned by short-read 2 × 300-bp paired-end sequencing strategies. Together, these data demonstrate that FLAIRR-seq achieves comparable gene usage profiles and improved resolution of long CDR3 sequences.

Others have recognized the power of long-read sequencing to resolve B cell repertoires using bulk Iso-Seq methods, allowing for the examination of full-length transcripts from isolated B cells (49). As mentioned earlier, the Iso-Seq method captures full-length transcripts expressing a poly(A) tail without bias, although a considerable amount of nonrepertoire data are discarded. To investigate whether the untargeted transcriptome-wide Iso-Seq method would resolve a qualitatively different repertoire than FLAIRR-seq, which would have indicated FLAIRR-seq–driven primer bias, we performed matched Iso-Seq and FLAIRR-seq on purified B cell–derived RNA. IGHV, IGHD, and IGHJ gene usage frequencies were compared between Iso-Seq and FLAIRR-seq datasets (Spearman rank correlation), revealing significant correlations between usage profiles (V genes: r = 0.94, p = 2.2e−16; D genes: r = 0.92, p = 1.4e−13; J genes: r = 1.0, p = 0.0028; Fig. 2E). These data strongly suggest that FLAIRR-seq has very limited to no primer-driven bias compared with whole-transcriptome data. Collectively, this benchmarking dataset confirmed that FLAIRR-seq is comparable with other state-of-the-art methods, providing robust characterization of commonly used repertoire metrics, with limited increases in per-sample cost.

FLAIRR-seq data provides improved resolution of IGHC, including estimation of IGHC gene and allele usage, subisotype identification, and phasing of V and C regions for comprehensive repertoire analysis. To evaluate the capabilities and accuracy of IGHC gene and allele identification with FLAIRR-seq, we first established a ground-truth dataset of IGHC alleles for all 10 samples by targeted sequencing of the germline IGH locus (Fig. 3A) using IGenotyper, as previously described (39). IGHG1, IGHG2, IGHG3, IGHG4, and IGHM alleles called by IGenotyper (see Materials and Methods) were assigned to one of three categories, schematized in Fig. 3B: (1) “exact match,” alleles documented in the IMGT database; (2) “novel not in IMGT,” alleles not documented in the IMGT database; or (3) “extended,” alleles that matched partial alleles in the IMGT database (i.e., those spanning only a subset of exons) but were extended by sequences in our dataset. IGenotyper identified a total of 32 unique IGHG1, IGHG2, IGHG3, IGHG4, and IGHM alleles across all individuals, as schematized in Fig. 3C. Among these 32 alleles, only 4 were documented in IMGT; the remaining represented novel alleles (n = 11) or extensions (n = 17) of known alleles. In aggregate, we observed a greater number of IGHG4 alleles than for any of the IGHG genes. Among these alleles were four sequences represented by suspected duplications of IGHG4. In fact, we observed three IGHG4 gene alleles in 4/10 samples, indicating the presence of gene duplications in these donors; in all cases, these alleles were also identified in the FLAIRR-seq data (see later). Given the relatively small size of this proof-of-concept healthy donor cohort, the identification of 28 (87%) novel or extended alleles underscores the extensive polymorphism in this region and reflects the paucity of information regarding this locus in existing immunogenomics databases.

We next used the IGenotyper-derived IGHC gene database as the ground truth for evaluating the capability of FLAIRR-seq to identify and resolve IGHC gene alleles. Based on our analysis workflow for identifying IGHC alleles from FLAIRR-seq data, we resolved 19/32 (59%) IGenotyper alleles at 100% identity; no additional false-positive alleles were identified. Of the alleles that were not unambiguously resolved by our FLAIRR-seq pipeline, eight had allele-defining SNVs 3′ of the FLAIRR-seq primers. The rate of true-positive allele calls using FLAIRR-seq across all 10 samples ranged from 5% for IGHG1 to 90% for IGHM (Fig. 3C, 3D). As a result, the IGHC genotypes inferred by FLAIRR-seq have some limitations, but on the whole allow for much greater resolution of IGHC variation in the expressed repertoire than currently used methods. Future iterations of FLAIRR-seq will include primer optimization to facilitate better sequence coverage in the 3′ regions of the IGHC alleles and improve the direct genotyping capabilities of the FLAIRR-seq method.

Previous studies have demonstrated the use of IGHJ6 heterozygosity to infer haplotypes of V and D genes from sr-AIRR-seq data (47, 50). However, the frequency of IGHJ6 heterozygotes in the population can vary. Therefore, we wanted to assess the utility of leveraging IGHC polymorphism resolved by FLAIRR-seq for haplotyping IGHV alleles with the publicly available tool RAbHIT (47, 50, 51). We selected a single donor (“1013”) from our cohort that was heterozygous for both IGHJ6 (*02 and *03) and IGHM (FL_2 and FL_4). Importantly, we were able to associate each IGHJ6 allele to the respective IGHM allele from the corresponding haplotype (Fig. 3E). After defining germline IGHV alleles using TIgGER (31), we generated and compared IGHV haplotype inferences using either IGHJ6 or IGHM alleles as anchor genes using RAbHIT (48) (Fig. 3E). Although some allele assignments were ambiguous (“unknown”) using both methods, we observed a strong consensus between haplotype inferences using the two anchor genes. For haplotype 1, represented by IGHJ6*03 and IGHM_FL_4, the IGHJ6*03-derived haplotype had 39 IGHV genes for which either an allele or deletion call was made. When using IGHM_FL_4, the same allele/deletion calls were made for 39 of these genes; in addition, using IGHM as the anchor gene, assignments were made for an additional five IGHV genes that had “unknown” designations using IGHJ6. Similarly, of the allele/deletion calls made for 41 IGHV genes on haplotype 2 assigned to IGHJ6*02, 40 of these gene had identical assignments to IGHM_FL_2. Together, these results indicate that IGHC variants can be used for haplotype inference from repertoire data when commonly used IGHJ or IGHD genes are homozygous in individuals of interest.

IGHG and IGHM alleles identified in each sample were used to annotate reads in each respective repertoire. These assignments allowed for partitioning of the repertoire by isotype, subisotype, and IGHC allele (Fig. 4). To demonstrate this, we used the same representative sample (“1013”) that was heterozygous for all IGHC genes. As shown in Fig. 4A, IGHC gene assignments allow for subisotype and allele-level frequencies to be estimated as a proportion of the overall IgG and IgM repertoires. In addition, detailed analyses of the repertoire can be conducted within each of these compartments. For example, Fig. 4B shows the frequencies of IGHV gene subfamilies for each IGHG and IGHM allele identified in this sample. Importantly, the average subisotype distribution of unique sequences resolved with FLAIRR-seq across our 10 subjects were in line with what has been reported for healthy individuals, with the most prevalent being IGHG1 (52%) and IGHG2 (36%), whereas IGHG3 (6%) and IGHG4 (5%) were seen at lower levels (52). Using standard sr-AIRR-seq analyses, we would not be able to identify allele-resolved V gene usage or enrichment within subisotype populations, which is important for linking subisotype functionality to particular Ag-specific VDJ clones.

Through the partitioning of repertoire sequences by subisotype and IGHC allele, we found that FLAIRR-seq also allowed for trends to be assessed in aggregate across donors. To demonstrate this, we further examined V gene family usage partitioned by IgG subisotype across all 10 healthy donors. This analysis revealed expected patterns, in that IGHV1, IGHV3 and IGHV4 subfamily genes were dominant across the four subisotypes (Fig. 4C). However, we did observe significant variation in subfamily proportions between subisotypes, associated with distinct profiles in specific subisotypes (Fig. 4C). Specifically, the estimated frequencies of IGHV1 and IGHV3 were statistically different between subisotypes (p < 0.01, ANOVA); IGHV1 usage was elevated in IGHG1 and IGHG4, whereas IGHV3 was elevated in IGHG2 and IGHG3. These analyses demonstrate the unique capability of FLAIRR-seq to examine variation in the expressed repertoire at the level of isotype, subisotype, and IGHC allele. As samples sizes increase, we expect that a multitude of additional repertoire features will become accessible to this kind of analysis leading to novel discoveries linking VDJ and IGHC genetic signatures.

We wanted to investigate the utility of FLAIRR-seq in clinical samples, particularly to observe changes in immune repertoires over time. Ab responses are highly dynamic, with specific Ab clones expanding upon activation by Ag. We were interested to know whether CSR could be captured by FLAIRR-seq, given the capability to identify clones with the subisotype resolved. We had the opportunity to evaluate FLAIRR-seq resolved repertoires over time in four samples collected from one individual over their >13 d hospitalization for severe COVID-19. Blood draws were taken on days 1, 4, 8, and 13 posthospitalization (Fig. 5A) and analyzed with FLAIRR-seq across all time points. After initial FLAIRR-seq processing and analysis, we defined unique clones using SCOPer, which clusters sequences based on CDR3 similarity and mutations in IGHV and IGHJ genes (34). This analysis allowed for the estimation of subisotype-specific clone counts across the four time points examined. Overall, we observed IgG1 dominated the repertoire at all four time points, but the proportion of subisotypes fluctuated over time (Fig. 5B). Specifically, the IGHG2- and IGHG3-specific repertoires expanded from day 1 to day 4, but then contracted in overall frequency from day 8 to day 13 (Fig. 5B). To assess clonal diversity within each subisotype repertoire across time, we calculated the Simpson’s diversity index (q = 2) using Alakazam (31). All subisotype-specific repertoires became less diverse from day 1 to day 4, suggesting clonal expansion across the IGHG repertoire (Fig. 5C). Notably, IGHG4 was not included in diversity calculations because IGHG4 would have required higher sequencing depth to ascertain diversity, given the overall lower expression of IgG4 transcripts in this individual. Subisotype-specific repertoire polarity was also assessed by calculating the fraction of clones needed to represent 80% of the total repertoire (Fig. 5D), with lower fractions representing more polarized and clonally expended repertoires. Results of this analysis were consistent with the diversity index, demonstrating an increase in clonal expansion (i.e., decreased polarity) at day 4 across IGHG1, IGHG2, and IGHG3 compartments, which returned to baseline at later time points.

CSR mediates the switching of Abs from one subisotype/isotype to another. This occurs through the somatic recombination of IGHC genes, which brings the switched/selected IGHC genes adjacent to the recombined IGHV, IGHD, and IGHJ segments, facilitating transcription. The switching of isotypes and subisotypes can result in changes to associated effector functions of the Ab while maintaining Ag-specific V regions. Given the ability of FLAIRR-seq to resolve clones with subisotype and IGHC allele resolution, as proof-of-concept, we sought to assess whether FLAIRR-seq could allow for more detailed haplotype-level analysis of CSR through the course of infection. To do this, we identified the largest clones in our dataset that were both represented by multiple isotypes and found across time points. In total, using SCOPer (34), we identified 19 unique clonal lineages that met these criteria. We focused our detailed analysis on one of the largest clones, 9900 (Fig. 5E), comprising IGHM, IGHG1, and IGHG2 sequences. On day 1 posthospitalization, clone 9900 sequences were represented by both IGHM and IGHG2. At day 4, IGHG2 was the only isotype observed, whereas again at day 8, both IGHM and IGHG2 were observed, as well as IGHG1. To visualize CSR, we built a phylogeny using Dowser (36). Highlighted in the red box on the phylogenetic tree shown in Fig. 5F, we observe a single subclade that is represented by IGHM, IGHG1, and IGHG2.

We were also able to resolve IGHC alleles from this individual, with the exception of IGHG1 alleles, which were ambiguous. Critically, both IGHG1 and IGHG2 were heterozygous (Fig. 5G). Through the assignment of IGHG alleles to haplotypes within this individual using heterozygous V genes (IGHV3–7 and IGHV3–48), we were able to determine that sequences from clone 9900 (Fig. 5F) used IGHG1 and IGHG2 alleles from the same haplotype, associated with IGHV3–7*07 and IGHV3–48*03 (31). This observation offers direct characterization of CSR events occurring on the same chromosome. When we looked across the remaining clones (n = 8) in this dataset that spanned time points and were represented by IGHG1 and IGHG2 subisotypes, we were able to confirm that all of these clones used IGHG alleles from the same haplotype.

Together, these data provide demonstrative proof-of-concept evidence that FLAIRR-seq profiling performs robustly on clinical samples, including RNA directly extracted from whole blood. In addition, these data provide near full-length repertoire resolution extending what would have been possible with standard sr-AIRR-seq methods, including analysis of subisotype-specific repertories, evaluation of clonal expansion, and characterization of CSR in the IgG and IgM repertoires.

In this article, we present the development, validation, and application of FLAIRR-seq, a method designed to resolve near-full-length Ab transcripts from bulk PBMC-derived, isolated B cell–derived, and whole blood–derived total RNA. FLAIRR-seq enables highly accurate, simultaneous resolution of V and C regions and suggests that IGHC polymorphism is far more extensive than previously assumed. FLAIRR-seq performed equivalent to or with increased resolution compared with existing standard 5′ RACE AIRR-seq methods when resolving V, D, and J gene calls, CDR3 lengths, and SHM signatures, suggesting that our CH3/CH4 targeting strategies did not compromise V region characterization while simultaneously adding the capability to resolve IGHC variation. Little to no primer bias was observed when compared with Ab repertoire profiling from total mRNA Iso-Seq methods. FLAIRR-seq provides the ability to use IGHC gene usage to identify subisotypes and genotype H chain transcripts, linking these data back to evaluate subisotype-specific repertoires, clonal expansion, and CSR. Underscoring the underappreciated extent of IGHC variation, our profiling of a restricted cohort of only 10 individuals from relatively homogenous backgrounds still identified 4 and 7 previously undocumented IGHC alleles in IgM and IgG, respectively, and extended an additional 17 alleles beyond which had been available in the IMGT database.

The unique capabilities of FLAIRR-seq will allow higher resolution examination of Ab repertoires, including the characterization of variable gene usage and clonotype distribution within unique subisotype subsets. This perspective has the potential to provide key insights into dynamic Ab responses in diseases known to be mediated by subisotype-specific processes or have skewed subisotype distribution as predictive markers of disease, including myasthenia gravis (mediated by pathogenic autoantibodies across subisotypes that give rise to varied disease pathologies), acute rheumatic fever (associated with elevated IgG3), and melanoma (skewing toward IgG4 in late-stage disease thought to be indicative of tolerogenic responses and poor prognosis) (12–14). These subisotype-specific repertoire profiling approaches have the potential to enable identification of unique clones that mediate disease pathogenicity or serve as high-resolution biomarkers to disease progression, as well as open the door for potential functional experiments on subisotype clones of interest, including examining the functional impact of the, to our knowledge, novel IGHC alleles identified in this study. We propose that expanded population-based FLAIRR-seq profiling and curation of novel IGHC alleles, particularly in conjunction with IGenotyper targeted genomic assembly efforts in IGH, will be a significant first step in defining the full extent of variation in a region too long assumed to be relatively invariant.

The Fc region is known to be critical for modulating differential Ab effector functions. These differential functionalities are currently understood to be regulated by differential posttranslational modifications, such as variable glycosylation (53–56). Future FLAIRR-seq profiling will be a valuable tool to investigate how genomic variation across IGHC genes impacts residue usage and resultant Fc receptor binding, signaling potential, crosslinking, and potential for posttranslational modification, all of which would be expected to alter downstream effector functions, such as Ab-dependent cellular cytotoxicity, Ab-dependent cellular phagocytosis, and complement fixation.

We further demonstrate that FLAIRR-seq can effectively examine clonal expansion and CSR in longitudinal samples, demonstrating the feasibility of using FLAIRR-seq to resolve Ab repertoire dynamics. This increased resolution will further our understanding of Ab repertoire evolution in the transition of acute to chronic disease states, many of which are associated with overall IgG subisotype distribution changes that are thought to reflect the inflammatory milieu (14, 57). One example is advanced melanoma, where late-stage disease is characterized by elevated IgG4 compared with IgG1, which is believed to reflect a more tolerizing, protumor environment (12). As suggested by the data presented in this article, FLAIRR-seq could also provide insights into longitudinal Ab repertoire dynamics in COVID-19 infection after exposure to different viral variants of concern or assess Ab responses to viral vaccines. FLAIRR-seq examination of these samples may identify specific repertoire distribution patterns that could act as biomarkers of disease progression. Moving forward it is critical to account for all variability within the Ab repertoire for the most comprehensive understanding of repertoire dynamics and the myriad factors impacting Ab effector function. Future efforts will expand FLAIRR-seq methods to target IGHA and IGHE repertoires, as well as implement multiplexed arrays sequencing to considerably increase throughput and lower cost (A. M. Al’Khafaji, J. T. Smith, K. V. Garimella, M. Babadi, M. Sade-Feldman, M. Gatzen, S. Sarkizova, M. A. Schwartz, V. Popic, E. M. Blaum, et al., manuscript posted on bioRxiv, DOI: 2021.10.01.462818). Together, the data presented in this article demonstrate that the FLAIRR-seq method provides a comprehensive characterization of allele-resolved IgG and IgM repertoires, detailing V region gene usage and measurements of maturation, isotype, and subisotype identification, and the unappreciated extent of C region variation, which will be necessary to fully appreciate the impact of Ig genomic variation in health and disease.

The authors have no financial conflicts of interest.

We thank Kamille Rasche, Kaitlyin Shields, Uddalok Jana, and the staff of the UofL Sequencing Technology Center for assistance in the sequencing and analysis of FLAIRR-seq samples.

This work was supported by the National Institutes of Health Grant P20 GM135004-02 (to E.E.F., D.T., C.T.W., and M.L.S.).

The online version of this article contains supplemental material.

AIRR-seq

Adaptive Immune Receptor Repertoire sequencing

CDR

complementarity-determining region

CH1

first IGHC exon

CH3

C region exon 3

CH4

C region exon 4

CSR

class switch recombination

FLAIRR-seq

full-length AIRR-seq

gDNA

genomic DNA

IGH

genomic locus that encodes the IgH

IMGT

international ImMunoGeneTics information system

IRB

Institutional Review Board

Iso-Seq

isoform sequencing

SHM

somatic hypermutation frequency

SMRT

single-molecule real-time

SNV

single-nucleotide variant

sr-AIRR-seq

Adaptive Immune Receptor Repertoire sequencing using short-read sequencing

TSO-UMI

template switch oligo and a unique molecular identifier

UMI

unique molecular identifier

UofL

University of Louisville

1
Schroeder
,
H. W.
Jr.
,
L.
Cavacini
.
2010
.
Structure and function of immunoglobulins
.
J. Allergy Clin. Immunol.
125
(
2
Suppl. 2
):
S41
S52
.
2
Janda
,
A.
,
A.
Bowen
,
N. S.
Greenspan
,
A.
Casadevall
.
2016
.
Ig constant region effects on variable region structure and function
.
Front. Microbiol.
7
:
22
.
3
Nakano
,
T.
,
M.
Matsui
,
I.
Inoue
,
T.
Awata
,
S.
Katayama
,
T.
Murakoshi
.
2011
.
Free immunoglobulin light chain: its biology and implications in diseases
.
Clin. Chim. Acta
412
:
843
849
.
4
Lu
,
L. L.
,
T. J.
Suscovich
,
S. M.
Fortune
,
G.
Alter
.
2018
.
Beyond binding: antibody effector functions in infectious diseases
.
Nat. Rev. Immunol.
18
:
46
61
.
5
Vidarsson
,
G.
,
G.
Dekkers
,
T.
Rispens
.
2014
.
IgG subclasses and allotypes: from structure to effector functions
.
Front. Immunol.
5
:
520
.
6
Greiff
,
V.
,
C. R.
Weber
,
J.
Palme
,
U.
Bodenhofer
,
E.
Miho
,
U.
Menzel
,
S. T.
Reddy
.
2017
.
Learning the high-dimensional immunogenomic features that predict public and private antibody repertoires
.
J. Immunol.
199
:
2985
2997
.
7
Tonegawa
,
S.
1983
.
Somatic generation of antibody diversity
.
Nature
302
:
575
581
.
8
Nishana
,
M.
,
S. C.
Raghavan
.
2012
.
Role of recombination activating genes in the generation of antigen receptor diversity and beyond
.
Immunology
137
:
271
281
.
9
Tong
,
P.
,
A.
Granato
,
T.
Zuo
,
N.
Chaudhary
,
A.
Zuiani
,
S. S.
Han
,
R.
Donthula
,
A.
Shrestha
,
D.
Sen
,
J. M.
Magee
, et al
.
2017
.
IgH isotype-specific B cell receptor expression influences B cell fate. [Published erratum appears in 2017 Proc. Natl. Acad Sci USA 114: E9750–E9751.]
Proc. Natl. Acad. Sci. USA
114
:
E8411
E8420
.
10
Noviski
,
M.
,
J. L.
Mueller
,
A.
Satterthwaite
,
L. A.
Garrett-Sinha
,
F.
Brombacher
,
J.
Zikherman
.
2018
.
IgM and IgD B cell receptors differentially respond to endogenous antigens and control B cell fate
.
eLife
7
:
e35074
.
11
Stavnezer
,
J.
,
J. E.
Guikema
,
C. E.
Schrader
.
2008
.
Mechanism and regulation of class switch recombination
.
Annu. Rev. Immunol.
26
:
261
292
.
12
Karagiannis
,
P.
,
A. E.
Gilbert
,
D. H.
Josephs
,
N.
Ali
,
T.
Dodev
,
L.
Saul
,
I.
Correa
,
L.
Roberts
,
E.
Beddowes
,
A.
Koers
, et al
.
2013
.
IgG4 subclass antibodies impair antitumor immunity in melanoma
.
J. Clin. Invest.
123
:
1457
1474
.
13
Chung
,
A. W.
,
T. K.
Ho
,
P.
Hanson-Manful
,
S.
Tritscheller
,
J. M.
Raynes
,
A. L.
Whitcombe
,
M. L.
Tay
,
R.
McGregor
,
N.
Lorenz
,
J. R.
Oliver
, et al
.
2020
.
Systems immunology reveals a linked IgG3-C4 response in patients with acute rheumatic fever
.
Immunol. Cell Biol.
98
:
12
21
.
14
Vander Heiden
,
J. A.
,
P.
Stathopoulos
,
J. Q.
Zhou
,
L.
Chen
,
T. J.
Gilbert
,
C. R.
Bolen
,
R. J.
Barohn
,
M. M.
Dimachkie
,
E.
Ciafaloni
,
T. J.
Broering
, et al
.
2017
.
Dysregulation of B cell repertoire formation in myasthenia gravis patients revealed through deep sequencing
.
J. Immunol.
198
:
1460
1473
.
15
Huijbers
,
M. G.
,
L. A.
Querol
,
E. H.
Niks
,
J. J.
Plomp
,
S. M.
van der Maarel
,
F.
Graus
,
J.
Dalmau
,
I.
Illa
,
J. J.
Verschuuren
.
2015
.
The expanding field of IgG4-mediated neurological autoimmune disorders
.
Eur. J. Neurol.
22
:
1151
1161
.
16
Polonelli
,
L.
,
J.
Pontón
,
N.
Elguezabal
,
M. D.
Moragues
,
C.
Casoli
,
E.
Pilotti
,
P.
Ronzi
,
A. S.
Dobroff
,
E. G.
Rodrigues
,
M. A.
Juliano
, et al
.
2008
.
Antibody complementarity-determining regions (CDRs) can display differential antimicrobial, antiviral and antitumor activities
.
PLoS One
3
:
e2371
.
17
Liu
,
H.
,
W.
Pan
,
C.
Tang
,
Y.
Tang
,
H.
Wu
,
A.
Yoshimura
,
Y.
Deng
,
N.
He
,
S.
Li
.
2021
.
The methods and advances of adaptive immune receptors repertoire sequencing
.
Theranostics
11
:
8945
8963
.
18
Trück
,
J.
,
A.
Eugster
,
P.
Barennes
,
C. M.
Tipton
,
E. T.
Luning Prak
,
D.
Bagnara
,
C.
Soto
,
J. S.
Sherkow
,
A. S.
Payne
,
M.-P.
Lefranc
, et al
AIRR Community
.
2021
.
Biological controls for standardization and interpretation of adaptive immune receptor repertoire profiling
.
eLife
10
:
e66274
.
19
Horns
,
F.
,
C.
Vollmers
,
D.
Croote
,
S. F.
Mackey
,
G. E.
Swan
,
C. L.
Dekker
,
M. M.
Davis
,
S. R.
Quake
.
2016
.
Lineage tracing of human B cells reveals the in vivo landscape of human antibody class switching. [Published erratum appears in 2016 eLife 5: e23066.]
eLife
5
:
e16578
.
20
Calonga-Solís
,
V.
,
D.
Malheiros
,
M. H.
Beltrame
,
L. B.
Vargas
,
R. M.
Dourado
,
H. C.
Issler
,
R.
Wassem
,
M. L.
Petzl-Erler
,
D. G.
Augusto
.
2019
.
Unveiling the diversity of immunoglobulin heavy constant gamma (IGHG) gene segments in Brazilian populations reveals 28 novel alleles and evidence of gene conversion and natural selection
.
Front. Immunol.
10
:
1161
.
21
Jonsson
,
S.
,
G.
Sveinbjornsson
,
A. L.
de Lapuente Portilla
,
B.
Swaminathan
,
R.
Plomp
,
G.
Dekkers
,
R.
Ajore
,
M.
Ali
,
A. E. H.
Bentlage
,
E.
Elmér
, et al
.
2017
.
Identification of sequence variants influencing immunoglobulin levels
.
Nat. Genet.
49
:
1182
1191
.
22
Buck
,
D.
,
E.
Albrecht
,
M.
Aslam
,
A.
Goris
,
N.
Hauenstein
,
A.
Jochim
,
S.
Cepok
,
V.
Grummel
,
B.
Dubois
,
A.
Berthele
, et al
Wellcome Trust Case Control Consortium
.
2013
.
Genetic variants in the immunoglobulin heavy chain locus are associated with the IgG index in multiple sclerosis
.
Ann. Neurol.
73
:
86
94
.
23
Keyeux
,
G.
,
G.
Lefranc
,
M.-P.
Lefranc
.
1989
.
A multigene deletion in the human IGH constant region locus involves highly homologous hot spots of recombination
.
Genomics
5
:
431
441
.
24
Bashirova
,
A. A.
,
W.
Zheng
,
M.
Akdag
,
D. G.
Augusto
,
N.
Vince
,
K. L.
Dong
,
C.
O’hUigin
,
M.
Carrington
.
2021
.
Population-specific diversity of the immunoglobulin constant heavy G chain (IGHG) genes
.
Genes Immun.
22
:
327
334
.
25
Lefranc
,
M. P.
,
G.
Lefranc
,
G.
de Lange
,
T. A.
Out
,
P. J.
van den Broek
,
J.
van Nieuwkoop
,
J.
Radl
,
A. N.
Helal
,
H.
Chaabani
,
E.
van Loghem
, et al
.
1983
.
Instability of the human immunoglobulin heavy chain constant region locus indicated by different inherited chromosomal deletions
.
Mol. Biol. Med.
1
:
207
217
.
26
Lefranc
,
M.-P.
,
G.
Lefranc
.
2012
.
Human Gm, Km, and Am allotypes and their molecular characterization: a remarkable demonstration of polymorphism
. In
Immunogenetics: Methods and Applications in Clinical Practice.
F. T.
Christiansen
,
B. D.
Tait
, eds.
Humana Press
,
Totowa, NJ
, p.
635
680
.
27
Lefranc
,
M.-P.
,
G.
Lefranc
,
T. H.
Rabbitts
.
1982
.
Inherited deletion of immunoglobulin heavy chain constant region genes in normal human individuals
.
Nature
300
:
760
762
.
28
van Erp
,
E. A.
,
W.
Luytjes
,
G.
Ferwerda
,
P. B.
van Kasteren
.
2019
.
Fc-mediated antibody effector functions during respiratory syncytial virus infection and disease
.
Front. Immunol.
10
:
548
.
29
Jefferis
,
R.
,
J.
Lund
,
J. D.
Pound
.
1998
.
IgG-Fc-mediated effector functions: molecular definition of interaction sites for effector ligands and the role of glycosylation
.
Immunol. Rev.
163
:
59
76
.
30
Vander Heiden
,
J. A.
,
G.
Yaari
,
M.
Uduman
,
J. N. H.
Stern
,
K. C.
O’Connor
,
D. A.
Hafler
,
F.
Vigneault
,
S. H.
Kleinstein
.
2014
.
pRESTO: a toolkit for processing high-throughput sequencing raw reads of lymphocyte receptor repertoires
.
Bioinformatics
30
:
1930
1932
.
31
Gupta
,
N. T.
,
J. A.
Vander Heiden
,
M.
Uduman
,
D.
Gadala-Maria
,
G.
Yaari
,
S. H.
Kleinstein
.
2015
.
Change-O: a toolkit for analyzing large-scale B cell immunoglobulin repertoire sequencing data
.
Bioinformatics
31
:
3356
3358
.
32
Charney
,
A. W.
,
N. W.
Simons
,
K.
Mouskas
,
L.
Lepow
,
E.
Cheng
,
J.
Le Berichel
,
C.
Chang
,
R.
Marvin
,
D. M.
Del Valle
,
S.
Calorossi
, et al
Mount Sinai COVID-19 Biobank Team
.
2020
.
Sampling the host response to SARS-CoV-2 in hospitals under siege. [Published errata appear in 2020 Nat. Med. 26: 1493 and 2021 Nat. Med. 27: 560.]
Nat. Med.
26
:
1157
1158
.
33
Woolley
,
C. R.
,
J. H.
Chariker
,
E. C.
Rouchka
,
E. E.
Ford
,
E. A.
Hudson
,
S. J.
Waigel
,
M. L.
Smith
,
T. C.
Mitchell
.
2022
.
Reference long-read isoform-aware transcriptomes of 4 human peripheral blood lymphocyte subsets
.
G3 (Bethesda)
12
:
jkac253
.
34
Nouri
,
N.
,
S. H.
Kleinstein
.
2018
.
A spectral clustering-based method for identifying clones from high-throughput B cell repertoire sequencing data
.
Bioinformatics
34
:
i341
i349
.
35
Nouri
,
N.
,
S. H.
Kleinstein
.
2020
.
Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data
.
PLOS Comput. Biol.
16
:
e1007977
.
36
Hoehn
,
K. B.
,
O. G.
Pybus
,
S. H.
Kleinstein
.
2022
.
Phylogenetic analysis of migration, differentiation, and class switching in B cells
.
PLOS Comput. Biol.
18
:
e1009885
.
37
Bolotin
,
D. A.
,
S.
Poslavsky
,
I.
Mitrophanov
,
M.
Shugay
,
I. Z.
Mamedov
,
E. V.
Putintseva
,
D. M.
Chudakov
.
2015
.
MiXCR: software for comprehensive adaptive immunity profiling
.
Nat. Methods
12
:
380
381
.
38
Nazarov
,
V.
,
V.
Tsvetkov
,
E.
Rumynskiy
,
A.
Popov
,
I.
Balashov
, and
M.
Samokhina
.
2022
.
immunarch: bioinformatics analysis of T-cell and B-cell immune repertoires
.
ImmunoMind
,
Berkeley, CA
. .
39
Rodriguez
,
O. L.
,
W. S.
Gibson
,
T.
Parks
,
M.
Emery
,
J.
Powell
,
M.
Strahl
,
G.
Deikus
,
K.
Auckland
,
E. E.
Eichler
,
W. A.
Marasco
, et al
.
2020
.
A novel framework for characterizing genomic haplotype diversity in the human immunoglobulin heavy chain locus
.
Front. Immunol.
11
:
2136
.
40
Lefranc
,
M. P.
2001
.
IMGT, the international ImMunoGeneTics database
.
Nucleic Acids Res.
29
:
207
209
.
41
Li
,
H.
2018
.
Minimap2: pairwise alignment for nucleotide sequences
.
Bioinformatics
34
:
3094
3100
.
42
Li
,
H.
,
B.
Handsaker
,
A.
Wysoker
,
T.
Fennell
,
J.
Ruan
,
N.
Homer
,
G.
Marth
,
G.
Abecasis
,
R.
Durbin
;
1000 Genome Project Data Processing Subgroup
.
2009
.
The sequence alignment/map format and SAMtools
.
Bioinformatics
25
:
2078
2079
.
43
Altschul
,
S. F.
,
W.
Gish
,
W.
Miller
,
E. W.
Myers
,
D. J.
Lipman
.
1990
.
Basic local alignment search tool
.
J. Mol. Biol.
215
:
403
410
.
44
Fu
,
L.
,
B.
Niu
,
Z.
Zhu
,
S.
Wu
,
W.
Li
.
2012
.
CD-HIT: accelerated for clustering the next-generation sequencing data
.
Bioinformatics
28
:
3150
3152
.
45
Gadala-Maria
,
D.
,
G.
Yaari
,
M.
Uduman
,
S. H.
Kleinstein
.
2015
.
Automated analysis of high-throughput B-cell sequencing data reveals a high frequency of novel immunoglobulin V gene segment alleles
.
Proc. Natl. Acad. Sci. USA
112
:
E862
E870
.
46
Gadala-Maria
,
D.
,
M.
Gidoni
,
S.
Marquez
,
J. A.
Vander Heiden
,
J. T.
Kos
,
C. T.
Watson
,
K. C.
O’Connor
,
G.
Yaari
,
S. H.
Kleinstein
.
2019
.
Identification of subject-specific immunoglobulin alleles from expressed repertoire sequencing data
.
Front. Immunol.
10
:
129
.
47
Gidoni
,
M.
,
O.
Snir
,
A.
Peres
,
P.
Polak
,
I.
Lindeman
,
I.
Mikocziova
,
V. K.
Sarna
,
K. E. A.
Lundin
,
C.
Clouser
,
F.
Vigneault
, et al
.
2019
.
Mosaic deletion patterns of the human antibody heavy chain gene locus shown by Bayesian haplotyping
.
Nat. Commun.
10
:
628
.
48
Peres
,
A.
,
M.
Gidoni
,
P.
Polak
,
G.
Yaari
.
2019
.
RAbHIT: R antibody haplotype inference tool
.
Bioinformatics
35
:
4840
4842
.
49
Brochu
,
H. N.
,
E.
Tseng
,
E.
Smith
,
M. J.
Thomas
,
A. M.
Jones
,
K. R.
Diveley
,
L.
Law
,
S. G.
Hansen
,
L. J.
Picker
,
M.
Gale
Jr.
,
X.
Peng
.
2020
.
Systematic profiling of full-length Ig and TCR repertoire diversity in rhesus macaque through long read transcriptome sequencing
.
J. Immunol.
204
:
3434
3444
.
50
Kidd
,
M. J.
,
Z.
Chen
,
Y.
Wang
,
K. J.
Jackson
,
L.
Zhang
,
S. D.
Boyd
,
A. Z.
Fire
,
M. M.
Tanaka
,
B. A.
Gaëta
,
A. M.
Collins
.
2012
.
The inference of phased haplotypes for the immunoglobulin H chain V region gene loci by analysis of VDJ gene rearrangements
.
J. Immunol.
188
:
1333
1340
.
51
Kirik
,
U.
,
L.
Greiff
,
F.
Levander
,
M.
Ohlin
.
2017
.
Parallel antibody germline gene and haplotype analyses support the validity of immunoglobulin germline gene inference and discovery
.
Mol. Immunol.
87
:
12
22
.
52
Janeway
,
C. A.
Jr.
,
P.
Travers
,
M.
Walport
, and
M. J.
Shlomchik
.
2001
.
The distribution and functions of immunoglobulin isotypes
. In
Immunobiology: The Immune System in Health and Disease
, 5th Ed.
Garland Science
,
New York
, p.
428
429
.
53
Irvine
,
E. B.
,
G.
Alter
.
2020
.
Understanding the role of antibody glycosylation through the lens of severe viral and bacterial diseases
.
Glycobiology
30
:
241
253
.
54
Alter
,
G.
,
T. H. M.
Ottenhoff
,
S. A.
Joosten
.
2018
.
Antibody glycosylation in inflammation, disease and vaccination
.
Semin. Immunol.
39
:
102
110
.
55
Collin
,
M.
2020
.
Antibody glycosylation as an immunological key in health and disease
.
Glycobiology
30
:
200
201
.
56
Plomp
,
R.
,
L. R.
Ruhaak
,
H.-W.
Uh
,
K. R.
Reiding
,
M.
Selman
,
J. J.
Houwing-Duistermaat
,
P. E.
Slagboom
,
M.
Beekman
,
M.
Wuhrer
.
2017
.
Subclass-specific IgG glycosylation is associated with markers of inflammation and metabolic health
.
Sci. Rep.
7
:
12325
.
57
Trampert
,
D. C.
,
L. M.
Hubers
,
S. F. J.
van de Graaf
,
U.
Beuers
.
2018
.
On the role of IgG4 in inflammatory conditions: lessons for IgG4-related disease
.
Biochim. Biophys. Acta Mol. Basis Dis.
1864
(
4 Pt B
):
1401
1409
.
This article is distributed under The American Association of Immunologists, Inc.,Reuse Terms and Conditions for Author Choice articles.

Supplementary data