Abstract
The 0.8-Mb Ig new Ag receptor (IgNAR) region of the whitespotted bamboo shark (Chiloscyllium plagiosum) is incompletely assembled in Chr_44 of the reference genome. Here we used Cas9-assisted targeting of chromosome segments (CATCH) to enrich the 2 Mb region of the Chr_44 IgNAR loci and sequenced it by PacBio and next-generation sequencing. A fragment >3.13 Mb was isolated intact from the RBCs of sharks. The target was enriched 245.531-fold, and sequences had up to 94% coverage with a 255× mean depth. Compared with the previously published sequences, 20 gaps were filled, with a total length of 3508 bp. In addition, we report five potential germline V alleles of IgNAR1 from six sharks that may belong to two clusters of the IgNAR. Our results provide a new method for studying the germline sequence of large Ig gene segments, as well as an improved bamboo shark IgNAR gene locus with fewer gaps.
Introduction
The Ig new Ag receptor (IgNAR), an H-chain homodimer lacking L chains, was first discovered in the nurse shark in 1995 (1). These molecules are exposed to the blood of sharks, containing 350 mM urea, which imparts them with superior stability (2–5). A single-domain variable domain of IgNAR (vNAR) is only 12 kDa in size (6). Moreover, vNARs have different loop structures compared with classical Abs (2, 6, 7). These characteristics make them exceptional tools for research, diagnostics, and potential therapeutics (8). Moreover, the shark is one of the most evolutionarily ancient extant organisms in which the adaptive immune system, as defined in humans, is found (9). Accordingly, shark Abs can provide unique insights into the evolution of the immune system (9, 10). The current research on IgNAR mainly focuses on the structural features of molecules (6, 7, 11–13), the characteristics of the vNAR repertoire, and the development of the single-domain Ab (8, 14). However, limited studies have investigated the germline sequence of the IgNAR. So far, only five elasmobranch species have undergone complete genome sequencing (15–17), including the brownbanded bamboo shark, the whale shark, the cloudy catshark, the white shark, and the whitespotted bamboo shark. Among them, only the whitespotted bamboo shark genome has chromosome-level assembly, and the IgNAR genes were annotated on the genome for the first time in our previous work (17); however, there remain some deletions and possible assembly errors in its IgNAR region (L.K. Wei, unpublished observations), which may be caused by the high similarity of each cluster. These problems seriously hinder further development of shark single-domain Abs and in-depth understanding of IgNAR.
In the past, depicting the complex genomic structural regions required the construction of the bacterial artificial chromosome library of the whole genome and haplotype-specific iterative mapping and sequencing. These methods are both expensive and labor intensive (18–20). The PacBio HiFi (high-fidelity) sequencing method yields highly accurate long-read sequencing datasets with an average read length of 10–25 kb and accuracies >99.5%, which show excellent results for the assembly of complex genomes (21). However, PacBio HiFi sequencing of the whole genome is still costly. Currently, the probe capture method is the most commonly used to enrich the target region, and commercial probes have been developed for specific regions of some model organisms. However, for new species, it is necessary to repeatedly test the capture effect of the probe and modify the probe sequence to finally capture the complete target region; these works are highly complex and also costly. In recent years, genomic segment enrichment based on Cas9 site-specific restriction digestion and pulsed-field gel electrophoresis (PFGE) separation has been developed. Cas9-assisted targeting of chromosome segments (CATCH) was first used for targeted cloning of large intact genomic fragments in 2015 (22). Moreover, in 2017, Bennett-Baker and Mueller used CRISPR-mediated isolation of specific megabase-sized regions of the genome (CISMR) to successfully enrich 2.3-Mb and 610-kb fragments from the mouse Srsx locus (23). Later, in 2018, Gabrieli and coworkers harnessed CATCH to enrich an ∼200-kb region containing the human BRCA1 gene (24).
Here we employed CATCH technology to release the IgNAR region of Chr_44 of the whitespotted bamboo shark. The released megabase-sized genomic segments were isolated by PFGE and processed by multiple displacement amplification (MDA), followed by PacBio HiFi sequencing and next-generation sequencing (NGS) (Fig. 1). The bamboo shark IgNAR loci were supplemented and corrected with the newly obtained sequence information.
Materials and Methods
Reagents
Please see Supplemental Table I for a list of reagents used.
Animals
In this study, all adult whitespotted bamboo sharks (Chiloscyllium plagiosum), with an average length of 80 cm, were obtained from Fujian, China, and reared at the Beijing Genomics Institution (BGI) Marine, Shenzhen, China. All of the experimental procedures in this study were conducted according to the laboratory animal ethics guidelines and approved by the institutional review board of BGI.
Sampling and RBC isolation
Genetic material was extracted from male shark nucleated RBCs. Before bleeding, the shark was anesthetized with seawater containing 0.1 g/L ethyl 3-aminobenzoate methanesulfonate (MS-222). Next, 2-ml blood samples were harvested from the caudal vein using an 18-gauge needle within 10 min of anesthetization and slowly injected into the blood collection tube (367525; BD) without a needle.
Shark RBCs were isolated with modified Ficoll-Paque Plus (17-1440-02; GE Healthcare) at a density of 1.077 g/ml by adding 2.7 g NaCl per 100 ml to increase the osmolarity from 300 mOsm to 1200 mOsm. Next, 0.5 ml of blood was layered carefully onto 3 ml of shark Ficoll-Paque Plus and centrifuged at 400 × g for 20 min at 8°C, with slow acceleration and reduced braking. Each fraction was collected. RBCs were washed once by centrifugation at 400 × g for 5 min at 8°C in 3.6% NaCl solution, resuspended to 0.5 ml of 3.6% NaCl solution, observed, and counted under a microscope after dilution.
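As a rough plausibility check on the osmolarity adjustment described above, the contribution of the added NaCl can be estimated with a few lines of Python. This is only a sketch: it assumes complete dissociation of NaCl into two ions (an osmotic coefficient of 1), and the function name is illustrative, not part of the original protocol.

```python
# Rough osmolarity check for the NaCl-supplemented Ficoll-Paque Plus.
# Assumption: NaCl dissociates fully into two ions (osmotic coefficient of 1).
NACL_MOLAR_MASS_G_PER_MOL = 58.44  # standard molar mass of NaCl
IONS_PER_NACL = 2                  # Na+ and Cl-


def added_osmolarity_mosm(grams_per_100_ml: float) -> float:
    """Osmolarity (mOsm/L) contributed by NaCl at the given mass per 100 ml."""
    grams_per_litre = grams_per_100_ml * 10
    mol_per_litre = grams_per_litre / NACL_MOLAR_MASS_G_PER_MOL
    return mol_per_litre * IONS_PER_NACL * 1000


base_mosm = 300                     # starting osmolarity stated in the text
added = added_osmolarity_mosm(2.7)  # 2.7 g NaCl per 100 ml, as in the text
total = base_mosm + added
print(f"added ~{added:.0f} mOsm, total ~{total:.0f} mOsm")
```

Under these idealized assumptions the total comes out slightly above 1200 mOsm, in line with the stated target, and the same function gives roughly 1230 mOsm for the 3.6% NaCl wash solution.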
Design, synthesis, and verification of single-guide RNA (sgRNA)
Using the online CRISPOR tool (http://crispor.tefor.net/), sgRNA target sequences were designed from IgNAR regions using the whitespotted bamboo shark reference genome sequence (GCA_004010195.1). Small scaffolds of the reference genome that were <1 kb were concatenated into a longer pseudo-chromosome by separating with 100 unknown nucleotides. The processed genome was added to the CRISPOR database with the help of the software developer, Maximilian Haeussler (25). Using the 1–2000-bp and 2000–2002-kb regions of Chr_44, we designed eight sgRNAs, with 20-bp recognition sequences followed by an NGG protospacer adjacent motif with minimal off-target sites (Supplemental Table II).
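The scaffold preprocessing step can be illustrated with a minimal Python sketch. It assumes the scaffolds are already loaded into a dictionary of name to sequence; the function name, the offset table, and the toy scaffold names are hypothetical and are not taken from the original pipeline.

```python
from typing import Dict, List, Tuple

SPACER = "N" * 100  # 100 unknown nucleotides between scaffolds, as in the text


def build_pseudo_chromosome(
    scaffolds: Dict[str, str], max_len: int = 1000
) -> Tuple[str, List[Tuple[str, int, int]]]:
    """Join scaffolds shorter than max_len into one sequence separated by SPACER.

    Returns the joined sequence and a list of (name, start, end) offsets
    (0-based, end-exclusive) so that hits can be traced back to a scaffold.
    """
    pieces: List[str] = []
    offsets: List[Tuple[str, int, int]] = []
    position = 0
    for name, seq in scaffolds.items():
        if len(seq) >= max_len:
            continue  # longer scaffolds are left as separate records
        if pieces:
            pieces.append(SPACER)
            position += len(SPACER)
        pieces.append(seq)
        offsets.append((name, position, position + len(seq)))
        position += len(seq)
    return "".join(pieces), offsets


# Toy input with hypothetical scaffold names and sequences.
toy = {"scaf_a": "ACGT" * 5, "scaf_b": "G" * 1500, "scaf_c": "TTGACC"}
pseudo, offsets = build_pseudo_chromosome(toy)
print(len(pseudo), offsets)
```

The N spacer keeps a 20-bp guide from spanning two unrelated scaffolds, and the offset table is one simple way to map any reported off-target position back to its scaffold of origin.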
The sgRNA, which consists of CRISPR RNA (crRNA) and a trans-activating CRISPR RNA, was synthesized by GenScript. Both ends of the sequence were modified by GenScript with three sulfo- and methoxy- groups to improve the stability of the sgRNA.
Whole-genome DNA was extracted from shark RBCs using the phenol-chloroform method, and the template DNA of the sgRNA was amplified using LA Taq Hot Start Versions (RR042A; Takara) according to the manufacturer’s instructions. The primers used for PCR and the expected sizes of PCR products are shown in Supplemental Table II. The basic PCR conditions were set according to the manufacturer’s recommended protocol, with extension at 68°C for 80 s for 748-bp DNA, 68°C for 4 min for ∼2-kb DNA, and 35 amplification cycles. All PCR products were isolated in 2% agarose (BY-R0100; Baygene) and recovered with a QIAEX II Gel Extraction Kit (20021; Qiagen). Next, 6 nM 748-bp DNA and 3 nM ∼2-kb DNA were used to verify the function of sgRNAs according to the system and procedures of the manufacturers of Cas9 Nuclease (20 μM stock, M0386T; New England Biolabs).
Genome digestion and target segment isolation
Agarose block preparation and in-gel cell lysis
Agarose blocks of shark genomic DNA were prepared as previously described, with slight modifications (23, 24). For the preparation of agarose plugs, RBCs were diluted 1:1 in 3.6% NaCl solution and incubated at 43°C for 10 min. Next, 1.2% low melting temperature agarose (SeaPlaque Agarose, 50101; Lonza) was melted at 70°C and then incubated at 43°C for 10 min. Diluted RBCs were mixed 1:1 with melted agarose, immediately dispensed into agarose block mold (which can hold 80-μl aliquots), and incubated at 4°C until solidified (∼15 min).
Two methods were used to lyse cells and wash blocks. Method 1 was based on the method of Bennett-Baker and Mueller (23). Briefly, four agarose blocks were incubated with 1.1 ml of N-laurylsarcosine and proteinase K (NDSK) buffer (Supplemental Table I) for 40 h at 50°C in a 2-ml tube. After NDSK lysis, agarose blocks were rinsed twice in 10 ml of Tris-EDTA (TE) buffer (B548106-0500; Sangon Biotech) for 5 min at room temperature and then incubated with 10 ml of TE buffer containing 0.01 mM PMSF solution (Supplemental Table I) for 16 h at room temperature. After incubation, the blocks were rinsed three times in 10 ml of TE buffer with gentle shaking at 4°C for a further 24 h. Agarose blocks were stored for the long term at 4°C in 10 ml of wash buffer (Supplemental Table I). Method 2 was based on the method of Gabrieli and colleagues (24). Briefly, four agarose blocks were incubated for 2 h at 50°C, followed by overnight at 50°C with 1.16 ml of freshly prepared lysis buffer (Supplemental Table I) with occasional shaking. Agarose blocks were washed three times with 10 ml of TE buffer and gentle shaking for 10 min. Agarose blocks were incubated with 24 μl of RNase A (100 mg/ml, 19101; Qiagen) in 1.2 ml of TE buffer for 1 h at 37°C with occasional shaking, followed by four washes with 10 ml of wash buffer and gentle shaking for 15 min. Finally, the agarose blocks were stored at 4°C in 10 ml of wash buffer.
In-gel Cas9 digestion
Agarose blocks stored in wash buffer were washed four times with 10 ml of TE buffer for 15 min at 4°C and then balanced with 1 ml of Cas9 reaction buffer overnight at 4°C. sgRNA and Cas9 were preassembled before digestion, which was performed by incubating 0.8 μl of Cas9 Nuclease (20 μM stock, M0386T; New England Biolabs) with 0.8 μl of 5′_IgNAR_1 sgRNA (10 μM stock), 0.8 μl of 3′_IgNAR_1 sgRNA (10 μM stock), 10 μl of NEB buffer 3.1 (10× stock; New England Biolabs), and 47.6 μl of RNase-free water (9012; Takara). Method 1 of preassembly and digestion was based on the method of Bennett-Baker and Mueller (23). The mix was preannealed in a 10-min incubation at 37°C. Once half of the block was added, the reactions were incubated at 37°C for 1 h. The reactions were terminated by replacing the mix with NDSK and then incubating with gentle shaking at 4°C for 1 h. Method 2 of preassembly and digestion was based on the method of Gabrieli and colleagues (24). The preassembly mix was incubated for 30 min at room temperature. After preassembly, half of the block was digested with the preassembled mix at 37°C for 2 h. Finally, 6 μl of Proteinase K (10 mg/ml; New England Biolabs) was added to each reaction, and the samples were incubated at 43°C for 3 h to remove excess Cas9.
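The final concentrations in the digestion can be worked out from the volumes listed above. The sketch below is an illustration only: it assumes that half of an 80-μl agarose block contributes about 40 μl to the reaction volume, which the text does not state explicitly, and the helper name is hypothetical.

```python
# Final concentrations in the in-gel Cas9 digestion.
# Assumption (not stated in the text): half of an 80-ul agarose block
# contributes about 40 ul to the reaction volume.
HALF_BLOCK_UL = 80.0 / 2

# name: (volume in ul, stock concentration); uM for Cas9/sgRNA, "x" for buffer
components = {
    "Cas9": (0.8, 20.0),
    "5'_IgNAR_1 sgRNA": (0.8, 10.0),
    "3'_IgNAR_1 sgRNA": (0.8, 10.0),
    "NEB buffer 3.1": (10.0, 10.0),
    "RNase-free water": (47.6, 0.0),
}

mix_volume_ul = sum(volume for volume, _ in components.values())
total_volume_ul = mix_volume_ul + HALF_BLOCK_UL


def final_concentration(name: str) -> float:
    """Dilute one component from its stock into the full reaction volume."""
    volume, stock = components[name]
    return volume * stock / total_volume_ul


for name in components:
    print(f"{name}: {final_concentration(name):.3f}")
```

Under that assumption the preassembled mix is 60 μl and the full reaction about 100 μl, which brings the buffer to 1× and gives roughly 160 nM Cas9 and 80 nM of each sgRNA, that is, Cas9 equimolar with the total sgRNA.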
PFGE analysis of cleaved DNA segments
Half of the agarose blocks were lying flat on the bottom edge of the gel comb. Hansenula wingei CHEF DNA Size Markers (1703667; Bio-Rad Laboratories) were loaded to the left of the samples. Next, 150 ml of tempered 0.8% agarose (Certified Megabase Agarose, 1613108; Bio-Rad Laboratories) in 1× Tris-acetate-EDTA was poured into a 14 × 21–cm CHEF Casting Stand (CHEF Wide/Long Combination Casting Stand, 1703704; Bio-Rad Laboratories) after the blocks were pasted on the comb. All PFGE was performed on a Bio-Rad CHEF-DR III system with an external chiller and recirculation pump. Next, ∼2 L of 1× Tris-acetate-EDTA was precooled at 14°C during the run. PFGE was performed at a 106° angle and 3 V/cm for 48 h with a fixed switch time of 500 s. Gels were stained for 30 min in 1× SYBR Safe DNA Gel Stain (S33102; Life Technologies) at room temperature. Images were captured and evaluated on a Bio-Rad GelDoc XR+ system with Image Lab software.
Target region verification
Southern blotting
DNA released on PFGE gels was transferred to a membrane and hybridized with probes specific for the IgNAR region. Gels were washed once with sterile water and bathed in denaturation solution (Supplemental Table I) for 45 min at room temperature with gentle rotation. Gels were washed twice with sterile water and bathed twice in neutralization solution (Supplemental Table I) for 15 min at room temperature with gentle rotation, followed by washing twice with sterile water. The gels and charged nylon membrane (RPN303B; Amersham) were balanced with 2× SSC (Supplemental Table I) for 5 min. The DNA was transferred to a charged nylon membrane for 20 h, followed by washes with 2× SSC for 3 min and baking for 2 h at 80°C. The membrane was prehybridized for 2 h and hybridized overnight at 37°C in 10 ml of hybridization solution sealed in a rotating glass bottle and incubated in a hybridization oven. According to the manufacturer’s recommendations, the PCR-DIG–labeled probe (Supplemental Table II) was prepared with the PCR DIG Probe Synthesis Kit (11636090910; Roche), which was denatured and added to the hybridization. The membrane was washed twice with 20 ml of 2× SSC/0.1% SDS for 5 min at room temperature, twice with 20 ml of 1× SSC/0.1% SDS for 5 min at 65°C, and balanced with 20 ml of washing buffer for 2–5 min. The DIG-labeled probe was detected according to the recommendations of the DIG-High Prime DNA Labeling and Detection Starter Kit II (11585614910; Roche).
TA cloning
To verify the DNA segments resolved by PFGE, DNA segments were excised from the gel and recovered from each gel slice according to the manufacturer’s instructions of the QIAEX II Gel Extraction Kit (20021; Qiagen). DNA from each gel slice was eluted in 30 μl of nuclease-free water (P1195; Promega) and then amplified directly using 2× TransTaq High Fidelity (HiFi) PCR SuperMix I (AS131; TransGen Biotech). The reaction mixture included 25 μl of 2× TransTaq High Fidelity PCR SuperMix I, 0.5 μl of each primer (100 μM), 8 μl of DNA template, and 16 μl of nuclease-free water. The primers for PCR and the expected sizes of PCR products are shown in Supplemental Table II. The basic PCR conditions were set according to the protocol in the manufacturer’s recommended procedures, with extension at 72°C for 50 s and 35 amplification cycles. All PCR products were isolated in 2% agarose and recovered with a QIAEX II Gel Extraction Kit (20021; Qiagen). Next, 0.03 pmol T-Vector pMD19 (6013; Takara) and 0.3 pmol sample were added to 5 μl of solution I and then incubated at 4°C overnight. The ligated mixture was transformed into DH5α cells (DL1001; Shanghai Weidi Biotechnology) according to the product illustration. All sequences were obtained by Sanger sequencing.
Library preparation and sequencing
To construct the sequencing library, one-half of the original concentration of RBCs (1.595 × 10⁸ cells/ml) was used for embedding, cleaved with a pair of sgRNAs, and separated by PFGE. The DNA segments of interest were excised and recovered as described in the TA cloning section. The concentration of the purified DNA was estimated by applying 1 μl of the sample to high-sensitivity DNA Qubit quantitation (Q32854; Invitrogen). Next, 5 μl of recovered DNA was amplified by MDA according to the instructions of the REPLI-g Midi Kit (150043; Qiagen); the amplified DNA was purified with 0.8× AMPure XP beads. DNA (1 μg) was subjected to debranching using T7 endonuclease I as described in the method of Gabrieli and colleagues (24), followed by purification with 0.8× AMPure XP beads. Multiple rounds of amplification and fragmentation were performed to obtain 8 μg of DNA.
Subsequently, 5 μg of purified DNA was used to construct the PacBio HiFi library according to the Procedure & Checklist - Preparing HiFi Libraries from Low DNA Input Using SMRTbell Express Template Prep Kit 2.0. One sample, which did not require a barcode and was handled as described for sample pooling in the Procedure & Checklist, was sequenced on one SMRT Cell using the Sequel II system with a 2-h preextension and a 30-h recording. A total of 3 μg of purified DNA was used to construct the NGS library according to the instructions of the MGI kit (1000006985; MGI Tech Co.) and sequenced in one lane using BGISEQ-500RS.
Sequencing analysis
Polymerase reads were split into subreads and statistically analyzed. HiFi reads were obtained using the circular consensus sequence (CCS) algorithm (version 5.0.0; parameters, --min-passes 1, --min-rq 0.99, --min-length 100). HiFi reads were assembled with hifiasm, and the assembled contigs and HiFi reads were then mapped against their corresponding reference genome (GenBank assembly accession no. GCA_004010195.1) with minimap2 (26) (parameter, -ac map-pb). Contigs that mapped to Chr_44 and those that were unmapped were selected to reduce the number of gaps in the Chr_44 of the reference genome with fgap (27).
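The contig selection step can be sketched as a small filter over SAM records. This is a minimal illustration, not the script used in the study: it assumes the alignments are in SAM format, that the reference sequence is literally named `Chr_44` in the alignment file, and that each contig is judged by its primary alignment only. The toy records are invented.

```python
from typing import Iterable, Set

TARGET_CHROM = "Chr_44"  # assumed reference name; adjust to the FASTA header
UNMAPPED = 0x4           # SAM flag bit: segment unmapped
SECONDARY_OR_SUPPLEMENTARY = 0x100 | 0x800  # SAM flag bits for extra records


def select_contigs(sam_lines: Iterable[str], target: str = TARGET_CHROM) -> Set[str]:
    """Return contig names whose primary record is unmapped or maps to target."""
    selected: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        name, flag, ref_name = fields[0], int(fields[1]), fields[2]
        if flag & SECONDARY_OR_SUPPLEMENTARY:
            continue  # judge each contig by its primary alignment only
        if (flag & UNMAPPED) or ref_name == target:
            selected.add(name)
    return selected


# Toy SAM records with hypothetical contig names.
toy_sam = [
    "@SQ\tSN:Chr_44\tLN:1000",
    "ctg1\t0\tChr_44\t100\t60\t4M\t*\t0\t0\tACGT\t*",
    "ctg2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
    "ctg3\t16\tChr_8\t500\t60\t4M\t*\t0\t0\tACGT\t*",
    "ctg3\t2048\tChr_44\t700\t60\t2M2S\t*\t0\t0\tACGT\t*",
]
print(sorted(select_contigs(toy_sam)))
```

Keeping unmapped contigs alongside those on Chr_44 matters here because sequence missing from the reference, which is exactly what gap filling needs, has nowhere to align.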
IgNAR1 germline sequence verification
Whole-genome DNA was extracted from the muscle tissue from five sharks and the fin from one shark using the phenol-chloroform method. The extracted DNA was amplified with 30 amplification cycles using the primer of the leader sequence to FW3b of IgNAR1 (Supplemental Table II) and was then purified, cloned, and sequenced using the method described in the TA cloning section. All sequences were aligned using the ClustalW tool in MEGA, and the phylogenetic tree was constructed by the neighbor-joining tree method with bootstrap values after 1000 iterations.
Ethics approval
Ethics approval (FT20021 and FT20021-T1) was obtained from the ethics committee (the institutional review board of BGI) and is valid through December 2022.
Results
Evaluation of CRISPR/Cas9 cleavage activity
We evaluated the CRISPR/Cas9 cleavage activity by digesting the corresponding DNA templates. To verify whether there were more IgNAR clusters, we designed sgRNA targeted to the header 2 Mb region of Chr_44, despite the fact that the IgNAR clusters are located at 0.56–1.36 Mb of Chr_44. The DNA fragments where the sgRNAs were located were amplified and purified (Supplemental Fig. 1A). All sgRNAs showed cleavage activity to their template DNA, with no significant difference in their cleavage efficiency (Supplemental Fig. 1B). The sgRNA 5′_IgNAR_1 and 3′_IgNAR_1 with the fewest off-target sites were selected to enrich the IgNAR region.
Targeted enrichment of IgNAR region
To optimize the concentration of RBCs in the pretreatment process, we tested different cell concentrations for lysis. The isolated RBCs were resuspended to the sampling volume at a concentration of ∼1.9–3.19 × 10⁸ cells/ml. RBCs embedded at the original concentration were not fully lysed; thus, to achieve full lysis, the cells were diluted to one-half and one-fourth of the original concentration for embedding.
On the basis of the optimized method of Bennett-Baker and Mueller and the method of Gabrieli and colleagues, the IgNAR region was released using the pair of sgRNAs with the fewest off-target sites or a single 3′_IgNAR_1 sgRNA and separated from the rest of genomic DNA by PFGE. After staining, DNA segments >2 Mb could be visualized (Figs. 2A and 3A). Because the reference sequence of the IgNAR region remains incompletely assembled, the size of the region is only an estimate. This result suggested that the target region is >2 Mb in reality.
To ensure that these DNA segments contain the IgNARs, we hybridized the Southern blots of the pulsed-field gels with probes specific to the V and C5 exons of IgNARs. This produced positive signals for the >2 Mb segment in the lane with the plug processed by a pair of sgRNAs (Fig. 2B), and the signal in the lane with one-half of the original RBC concentration was stronger than that in the lane with one-fourth of the original concentration. (Results of hybridization with the C5 probe are not shown.) There were no positive signals in the lane with the plug processed by a single 3′_IgNAR_1 sgRNA and hybridized with V probes (Fig. 2B), and there were positive signals with C probes in method 1, which may be caused by the nonspecificity of the C probe (results not shown). In addition, the recovered DNA segments were amplified with specific primers of four IgNAR clusters, and every cluster could be amplified (Fig. 2C). The cluster 1 DNA was cloned into the pMD19 T-vector, transformed into DH5α, and sequenced by Sanger sequencing. All sequences were aligned with IgNAR sequences and translated into the amino acid sequence from FW1 to FW3b, which showed high similarity to the genome reference sequence as shown in the multiple sequence alignment (Fig. 2D). The Southern blot and TA cloning results show that the visualized DNA segments contain the IgNAR region and suggest that the IgNAR region itself may be located at the telomere region of Chr_44.
Isolation and characterization of the IgNAR region
The IgNAR region was enriched using optimized conditions (Fig. 3A), and 38–63 ng of target DNA was recovered from each plug (Table I). Before the construction of the library, one-sixth of DNA (8–9 ng) was amplified using MDA to ∼3 μg and was digested with endonuclease fragmentation (Fig. 3B). The average fragment of enriched DNA based on the method of Bennett-Baker and Mueller was 17 kb, and after library construction, it was 18 kb (Fig. 3C). On the basis of the method of Gabrieli and colleagues, the average size of the enriched DNA was 10 kb, and after library construction, it was 16 kb (Fig. 3D).
Method | Sample | Purification Yield (ng) | Polymerase Reads | HiFi Reads | On-Target Reads | On-Target Base (%) | Fold Enrichment
---|---|---|---|---|---|---|---
1 | 1_1 | 38.4 | 5,203,376 | 2,090,896 | 194,383 | 11.51 | 245.531
1 | 1_2 | 51.6 | | | | |
1 | 1_3 | 63 | | | | |
2 | 2_1 | 61.8 | 5,593,558 | 2,558,424 | 52,595 | 2.098 | 40.456
2 | 2_2 | 55.8 | | | | |
2 | 2_3 | 59.7 | | | | |
On the basis of the method of Bennett-Baker and Mueller, a total of 5,203,376 polymerase reads yielding 438.11 Gb were acquired from a single SMRT Cell. Moreover, 2,090,896 CCS reads yielding 22 Gb were obtained from 59,233,332 subreads. Overall, the largest proportion of the data (3.2 Gb) aligned to Chr_44, followed by Chr_8 with 1.3 Gb (Fig. 4A), both of which are related to Ig genes. Approximately 99.25% of reads were aligned to the reference genome; 12.49% of reads were aligned to Chr_44, with an enrichment factor (24) of 42.546; and 9.30% of reads were aligned to the target region (Chr_44: 1889-2001667), with an enrichment factor of 245.531. Visualization of the read coverage shows a heavy enrichment of the front 8.8 Mb area compared with the rear 6.5 Mb area of Chr_44 (Fig. 4C), which was consistent with the results observed in PFGE, and 94% of the rear 2 Mb area of Chr_44 was covered with a 255× mean sequencing depth. The same result was observed in NGS data (Supplemental Fig. 2A), which demonstrated that the actual distance between the two sgRNAs is much larger than the 2 Mb on the reference genome.
Compared with the method of Bennett-Baker and Mueller, the reads were shorter and the enrichment factor was lower with use of the method of Gabrieli and colleagues (Table I). A total of 5,593,558 polymerase reads yielding 460.09 Gb were acquired from a single SMRT Cell. A total of 2,558,424 CCS reads yielding 20 Gb were obtained from 79,890,075 subreads. Overall, the largest proportion of the data (1.5 Gb) aligned to Chr_8, followed by Chr_44 with 1.2 Gb (Fig. 4B). Approximately 99.58% of reads were aligned to the reference genome, 5.6% of reads were aligned to Chr_44 with an enrichment factor of 14.886, and 2.06% of reads were aligned to the target region (Chr_44: 1889-2001667) with an enrichment factor of 40.456. Visualization of read coverage shows a heavy enrichment of the front 8.8 Mb area compared with the rear 6.5 Mb area of Chr_44 (Fig. 4D), and 94% of the rear 2 Mb area of Chr_44 was covered with a 196× mean sequencing depth. The same result also appeared in the NGS data (Supplemental Fig. 2B). Together, these results show that both methods can enrich the target region; the difference in fragment length and enrichment factor may be caused by handling or by the method itself.
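The enrichment factor used here follows reference 24, and its exact formula is not restated in this section. One common definition, assumed in the sketch below, divides the observed on-target fraction of the data by the fraction expected if reads were spread evenly over the genome. The function name and the example values are hypothetical and are not meant to reproduce the figures in Table I.

```python
def fold_enrichment(
    on_target_fraction: float, target_length_bp: int, genome_length_bp: int
) -> float:
    """Observed on-target fraction divided by the fraction expected by chance.

    Assumed definition: with no enrichment, the target would receive a share
    of the data equal to its share of the genome length.
    """
    expected_fraction = target_length_bp / genome_length_bp
    return on_target_fraction / expected_fraction


# Hypothetical round numbers, chosen only to show the calculation.
example = fold_enrichment(
    on_target_fraction=0.02,         # 2% of sequenced bases on target
    target_length_bp=2_000_000,      # a 2-Mb target region
    genome_length_bp=4_000_000_000,  # a 4-Gb genome
)
print(f"fold enrichment ~{example:.1f}")
```

With these made-up inputs the target occupies 0.05% of the genome, so 2% of the data on target corresponds to roughly 40-fold enrichment. The result scales with the assumed genome size, so the same value must be used when comparing methods.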
Using the hifiasm assembler, we obtained a 181.97-Mb assembly comprising 3939 contigs from the HiFi reads of method 1 and a 97.24-Mb assembly comprising 3072 contigs from those of method 2. The contigs that aligned to Chr_44 and those that did not align to the genome were used to fill the gaps in Chr_44:1889-2001667 with fgap. The contigs from method 1 filled 15 gaps, with a total length of 2705 bp, and the contigs from method 2 filled 20 gaps, with a total length of 3508 bp (Table II).
Statistic Type | Scaffold Length | Scaffold Number | Contig Length | Contig Number | Gap Length | Gap Number
---|---|---|---|---|---|---
Reference | 1,999,779 | 1 | 1,918,891 | 83 | 80,888 | 82
Methods 1 and 2 | 1,999,779 | 1 | 1,922,399 | 63 | 77,380 | 62
Method 1 | 1,999,779 | 1 | 1,921,596 | 68 | 78,183 | 67
Method 2 | 1,999,779 | 1 | 1,922,399 | 63 | 77,380 | 62
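The gap-filling totals quoted in the text follow directly from the table. The short check below recomputes them from the tabulated values; the dictionary layout and helper name are illustrative only.

```python
SCAFFOLD_LENGTH = 1_999_779  # scaffold length shared by all rows of the table

# Values copied from the table: contig length, gap length, gap number.
table_ii = {
    "Reference": {"contig_len": 1_918_891, "gap_len": 80_888, "gaps": 82},
    "Method 1": {"contig_len": 1_921_596, "gap_len": 78_183, "gaps": 67},
    "Method 2": {"contig_len": 1_922_399, "gap_len": 77_380, "gaps": 62},
}


def gaps_closed(method: str) -> tuple:
    """Return (number of gaps filled, bases filled) relative to the reference."""
    ref, row = table_ii["Reference"], table_ii[method]
    return ref["gaps"] - row["gaps"], ref["gap_len"] - row["gap_len"]


for name, row in table_ii.items():
    # Contig bases plus gap bases should add up to the scaffold length.
    assert row["contig_len"] + row["gap_len"] == SCAFFOLD_LENGTH, name

print(gaps_closed("Method 1"))  # (15, 2705)
print(gaps_closed("Method 2"))  # (20, 3508)
```

The combined row for methods 1 and 2 is identical to the method 2 row, which indicates that the gaps closed by method 1 added nothing beyond those closed by method 2.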
All genes of the IgNARs in the reference genome of the whitespotted bamboo shark were found in the HiFi reads of both methods 1 and 2 (data not shown). Surprisingly, the V gene of IgNAR1 was only 93% similar to the reference genome (Fig. 5A). In addition, amplification of FW1 to C1 from the genome of another shark, followed by PacBio sequencing, showed that IgNAR1 has two characteristic sequences: one is similar to the reference, and the other is similar to the sequence obtained using methods 1 and 2 (Fig. 5A). To clarify the germline sequence characteristics of IgNAR1, DNA from the muscle tissue of five sharks and the fin of one shark was amplified from the leader sequence to FW3b. Ninety-four sequences aligned to IgNAR1 were classified into five categories, in which F1, F2, and F3 group together and F4 and F5 group together (Fig. 5E). The representative sequence of each category is shown in Fig. 4C, where the intron sequences of F1, F2, and F3 are completely consistent and those of F4 and F5 are completely consistent. The V gene is translated into amino acids as shown in Fig. 4D, and there were 14 aa with changes. According to the statistical analysis, each shark has two characteristic sequences except shark 25 (Fig. 5F). Together, these results suggest that IgNAR1 has multiple alleles or that these genes belong to two clusters, where F1, F2, and F3 are one cluster and F4 and F5 are the other cluster, and each cluster has multiple alleles.
Discussion
In this study, we enriched a continuous segment >3.13 Mb containing the IgNAR region from RBCs of whitespotted bamboo sharks using the CATCH or CISMR method. A few micrograms of DNA were obtained after MDA amplification of tens of nanograms of enriched DNA, and a PacBio HiFi library was constructed for sequencing. Finally, the target region was significantly enriched. Previously, Bennett-Baker and Mueller enriched an ∼2.3-Mb mouse X chromosome fragment, which was 1 Mb smaller than in the reference genome (23). Shin and colleagues enriched a 0.1–0.2-Mb fragment using the SageHLS instrument (28). Here we enriched an IgNAR region that is larger than in the reference genome, indicating that, for species with a reference genome, this method can effectively enrich continuous and complete large fragments. Moreover, compared with SageHLS, this method can enrich megabase-sized fragments and accurately display the actual size of the target region. Because the target region is still a very small proportion of the whole genome, the quantity of genetic material in the target region fragments enriched using this method is limited, usually yielding only a few nanograms to tens of nanograms of DNA. Bennett-Baker and Mueller enriched an ∼2.3-Mb fragment from the mouse Srsx locus and recovered only 3.1–7.0 ng of DNA (23). The IgNAR region enriched in this study yielded only 38–63 ng of DNA, which is far less than the microgram-level starting amount needed to construct the PacBio HiFi library. The proposal of Bennett-Baker and Mueller (23) to use CISMR and long-read sequencing for short regions with low levels of sequence identity, CISMR for long-read sequencing, and Single-Haplotype Iterative Mapping and Sequencing for segmental duplications of large and highly identical regions still suffers from yields that are insufficient to construct a library. In this study, the MDA method tried by Gabrieli and colleagues (24) was used to amplify nanogram-level DNA to microgram-level DNA to satisfy the subsequent requirements for library construction.
In this study, although 92.66% of the target region was covered by the CCS reads with a coverage depth >20×, the region could not be correctly assembled by conventional methods. The assembled region is much larger than the actual region and contains several contigs. The main reason for this is that chimeric reads produced by MDA confuse the assembly. MDA is mainly used for whole-genome amplification of trace DNA, as in metagenomics and single-cell whole-genome sequencing. Chimeras are a common problem with MDA, so the amplified products cannot be used to confirm structural variation or for assembly. Current software, such as ChimeraMiner (29), can only identify MDA-derived chimeras in NGS data. Thus, there is an urgent need to develop new software to deal with chimeras in long-read sequence data.
We used the nonimmune tissues of six individuals to reveal the polymorphism of the V gene of IgNAR1. Mikocziova and colleagues confirmed 10 novel germline IGHV alleles by targeted amplification and Sanger sequencing of non–B cell DNA (30). The discovery of these alleles further complements the reference gene set and lays a foundation for the precise analysis of the repertoire of Ig and the BCR. In addition, it provides more candidate genes for the generation of transgenic animals that produce nanobodies (31, 32).
This method, target enrichment based on CRISPR/Cas9 combined with long-read sequencing, can be used to obtain a continuous and complete target region easily. However, to obtain a complete high-resolution sequence, the issue with the recovery and amplification of trace amounts of large fragments of DNA must be solved. If the MDA method is used, the MDA reads need to be processed to remove chimeras. Relatively speaking, this method is simpler and more effective than the probe capture method. However, using this method to obtain a complete and accurate genomic region requires further exploration. Nevertheless, in this study, we used CATCH or CISMR methods to enrich a visible IgNAR region >3.13 Mb, which, to our knowledge, is the largest fragment ever published. This result further broadens the scope of application of the enrichment method. In addition, we have demonstrated that combining CATCH or CISMR methods with MDA amplification and PacBio HiFi sequencing is a feasible means to enrich and sequence large megabase-sized local genomic loci and, to our knowledge, is the first to correct specific sequence regions by assembly and gap filling.
Acknowledgements
We thank Jin Huang, Biao Huang, and Qian Xu at BGI-Wuhan Clinical Laboratories for providing the experimental platform. We thank Ou Wang at BGI-Shenzhen for providing the method of using PFGE. We thank Wei Zhang at BGI-Shenzhen for advice on processing the chimeras caused by MDA. We thank Qianqian Zhao at BGI-Shenzhen for advice on preparation of the manuscript. We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of the manuscript. We thank the China National GeneBank DataBase for storing data.
Footnotes
This work was supported by Shenzhen-Hong Kong Collaboration Fund JCYJ20170412152916724 (20170331).
N.Y., J.S., and M.W. directed the project. H.D. performed shark sampling, laboratory experiments, data analysis, and figure creation and wrote the manuscript. S.Y., L.L., and J.C. raised the sharks. H.D., S.Y., and L.W. sampled sharks. H.D., Y.Z., J.W., H.X., and T.L. performed data analysis. L.W. provided the samples that were used for IgNAR1 germline sequence verification. B.R., X.L., and X.Z. revised the paper.
The data presented in this article have been submitted to the China National GeneBank Sequence Archive (https://db.cngb.org/search/project/CNP0002146/) under accession number CNP0002146.
The online version of this article contains supplemental material.
Abbreviations used in this article
- BGI: Beijing Genomics Institution
- CATCH: Cas9-assisted targeting of chromosome segments
- CCS: circular consensus sequence
- CISMR: CRISPR-mediated isolation of specific megabase-sized regions of the genome
- IgNAR: Ig new Ag receptor
- MDA: multiple displacement amplification
- NDSK: N-laurylsarcosine and proteinase K
- NGS: next-generation sequencing
- PFGE: pulsed-field gel electrophoresis
- sgRNA: single-guide RNA
- TE: Tris-EDTA
- vNAR: variable domain of Ig new Ag receptor
Disclosures
The authors have no financial conflicts of interest.