Abstract
The tightly linked A and E blood alloantigen systems are 2 of 13 blood systems identified in chickens. Reported herein are studies showing that the genes encoding A and E alloantigens map within or near the chicken regulator of complement activation (RCA) gene cluster, a region syntenic with the human RCA. Genome-wide association studies, sequence analysis, and sequence-derived single-nucleotide polymorphism information for known A and/or E system alleles show that the most likely candidate gene for the A blood system is the C4BPM gene (complement component 4 binding protein, membrane). Cosegregation of single-nucleotide polymorphism–defined C4BPM haplotypes and blood system A alleles defined by alloantisera provides a link between chicken blood system A and C4BPM. The best match for the E blood system is the avian equivalent of FCAMR (Fc fragment of IgA and IgM receptor). C4BPM is located within the chicken RCA on chicken microchromosome 26 and is separated from FCAMR by 89 kbp. The genetic variation observed at C4BPM and FCAMR could affect the chicken complement system and differentially guide immune responses to infectious diseases.
Introduction
The complement cascade is an integral part of both the innate and adaptive immune systems, consisting of a group of proteins that work together to opsonize foreign or damaged cells, facilitate phagocytosis, or form a membrane attack complex that directly lyses foreign cells (1, 2). These proteins are found in blood and tissues and are activated by three distinct mechanisms: 1) the classical pathway, activated by Ag/Ab complexes; 2) the alternative pathway, activated by bacterial and fungal cell walls; and 3) the lectin pathway, activated by the attachment of mannose binding lectin to mannose residues on bacterial surfaces. All three pathways converge on C3 activation. Once initiated, complement activation cascades in a sequence during which an activated component stimulates the next component. Multifaceted regulatory processes prevent injury to host cells after complement activation. Passive regulation occurs through the short half-life of complement components as well as the contrasting cell surface characteristics of bacteria compared with host cells. Complement regulatory proteins provide active regulation at various stages of the cascade by limiting or eliminating the action of complement proteins (3).
Genetic regulation of complement proteins occurs via the regulator of complement activation (RCA) gene cluster that acts on complement protein C3 production. Dysregulation of the alternative complement pathway in humans plays a role in atypical hemolytic uremic syndrome and age-related macular degeneration (4). Variants are known in the genes within the complement pathway, many of which impact human health. For humans, variation in various RCA proteins was initially identified as human blood systems. Proteins encoded by RCA genes possess complement control protein domains, also called SUSHI domains or short consensus repeats. Each domain contains ∼60 aa residues with four conserved cysteines arranged in two conserved disulfide bonds and a conserved tryptophan.
The human Cromer blood system Ags are due to single-nucleotide polymorphism (SNP) changes that impact the SUSHI domains of the CD55 protein (5). Variation in the CR1 gene results in Ag protein changes identified as Knops blood system (6), with all allelic variants reported as changes in SUSHI domain 25 (7). Both of these human blood systems are reported as influencing disease outcomes, with the Cromer blood system (CD55 protein) used as a receptor for bacteria and viruses, and the Knops blood system (CR1 protein) associated with differential responses to multiple pathogens, including those responsible for malaria, tuberculosis, and leishmaniasis (see review in Ref. 8).
The human RCA gene cluster is located on the long arm of chromosome 1. Genes and encoded proteins in the RCA include the α- and β-chains of the C4-binding protein (C4BPA and C4BPB), CD55 (complement decay-accelerating factor [DAF]), the complement receptor genes CR2 (CD21), CR1 (CD35), and CR1L (complement receptor 1-like), and CD46 (membrane cofactor protein [MCP]) (9–11). All of these proteins are membrane bound except for C4BPA and C4BPB, which are found in the serum as a chain complex of seven αs and one β. The C4-binding protein inhibits the complement pathways with greater effects on the classical and lectin pathways compared with the alternative pathway.
The chicken is a well-studied nonmammalian vertebrate model organism that has been particularly valuable for immunity-related studies. Chickens provided the first evidence of the dichotomy between B and T cell development, with B cells defined as maturing within the avian bursa of Fabricius (12, 13). The first tumor virus (Rous sarcoma virus) and oncogene (src) were identified in the chicken (14). Strong associations between disease resistance and chicken MHC variation are exemplified in multiple studies (15).
The chicken karyotype consists of 38 pairs of autosomes and one pair of sex chromosomes. The autosomes are highly variable in size and generally classified into macrochromosomes and microchromosomes. The chicken genome is one-third the size of the human genome, with lower interspersed repeat content and fewer pseudogenes and segmental duplications. The intron/exon structure of genes is generally conserved, but the chicken introns and intergenic spaces are greatly reduced, also contributing to the size reduction. Long blocks of conserved synteny are observed for chicken/human alignments (16). The first chicken genome reference is from an inbred Red Junglefowl (ancestor of domestic fowl), with improved sequences being obtained to develop builds 4, 5, and 6 (GRCg6a). The chicken RCA syntenic region is found on microchromosome 26. The National Center for Biotechnology Information (NCBI) and Ensembl (GRCg6a) gene identifications and annotations within this region are neither complete nor identical. More recently, two additional reference sequences have been obtained, one for a White Leghorn (WL) breed (egg layer; GRCg7w) and one for a broiler (meat production; GRCg7b). The GRCg7w (bGalGal1.pat.whiteleghornlayer) reference annotation of the RCA region and surrounding genes (between PFKFB2 and CD34) shows five genes: C4BPS, CD55, C4BPG, CR1, and C4BPM.
Chicken blood types were first reported by Landsteiner and Miller (17). Currently there are 13 blood groups, or alloantigenic systems, known in the chicken. Variation was detected by hemagglutination using specific polyclonal antisera, which identified variable degrees of polymorphism within each blood system (18). Systems were named alphabetically in their order of discovery. The best studied chicken blood system, B, was subsequently determined to be the chicken MHC (1). The A blood system was identified concurrently with the B system in the early studies (2) as it was also highly allogeneic, with seven serologically defined alleles (identified as A1–A7) (19). The molecular mass for the A allogeneic protein was estimated at 53–54.5 kDa (20), with protein expression detected on erythrocytes, but not lymphocytes (21). The E system has 11 distinct serologically defined alleles, identified numerically as E1–E11 (19). Through classical recombination studies, the A and the E blood systems were shown to be genetically linked, separated by 0.5–1.3 cM (22–24), which corresponds to an estimated 42,000–109,000 bp for chicken chromosome 26 (16). Because of the close linkage of these two blood systems, identification of the candidate gene for one of the systems could also reveal the closely linked candidate gene for the other.
The specific function of the A blood system in chickens is not known. Associations between A system alleles and egg production and body weight have been reported (25). However, the association between A/E system alleles and immunological-related traits has particular interest. Lines divergently selected for the Ab response following sheep RBC immunization from a common base population differed greatly in their A and E allele frequencies after 10 generations of selection (26). Similar lines divergently selected for resistance or susceptibility to an intestinal parasite (Eimeria tenella) also showed very different A and E system allele frequencies (27).
Preliminary work by the authors indicated that the candidate gene for the A blood system was within the chicken RCA syntenic gene cluster, on chicken chromosome 26. This elevated the relevance of the A blood system with potential implications for immune system function. Furthermore, the Chicken QTL (quantitative trait loci) database (28) lists several immune-related traits associated with chromosome 26, including Ab titer to keyhole limpet hemocyanin (29), Ab titer to LPS (30), bursa of Fabricius size (31, 32), thymus mass (33), Campylobacter colonization (34), and resistance to the avian oncogenic herpesvirus, Marek’s disease virus (35, 36).
We hypothesized that the genes encoding the A and E system alloantigens are associated with immune response in the chicken, and thus identification of the candidate genes would be valuable in understanding the impact of these blood system variants on chicken health. The preliminary observation that these blood system genes were within the RCA provided further impetus to the study. Identification of protein alloantigenic differences in these blood systems could provide new insights into the study of complement regulation in both chickens and humans.
Materials and Methods
Genetic material
DNA was available from individuals from multiple, diverse sources that had blood system allele information. Some samples had individual A and/or E system allele identification, whereas others were from lines that had information on allelic segregation. Table I summarizes these resources, including the breed, the A and E serologically defined alleles present (where known), and the number of samples examined. From the Northern Illinois University (NIU) DNA bank we used DNA from pedigree families segregating for A3 and A4 plus a set of non-pedigree samples with known A system serology (A2, A3, A4, A5). These latter samples included a subset (n = 30) for which E system serology (E2, E3, E7) information was also known. The HAS and LAS chicken lines were developed by divergent selection for high (HAS) or low (LAS) Ab levels following sheep RBC injection (37) and were previously shown to be segregating for the A and E blood systems (26). The HYL lines are elite commercial egg-layer lines (from Hy-Line International) for which polyclonal antisera-defined A blood system allele information and DNA were available from past generations. The HYL lines were from three different breeds: WL, Rhode Island Red (RIR), and White Plymouth Rock (WPR). There was no E system information for any of the HYL lines. Genomic sequence information was available for six inbred experimental lines, some of which had A and/or E system allele information.
Hemagglutination assays
Polyclonal sera hemagglutination
Alloantisera production and typing were done previously, as described (38, 39). A 2% suspension of washed RBCs was mixed with appropriately diluted alloantigen-specific antisera and incubated at room temperature for 2 h, then at 4°C overnight. The agglutination reaction of each sample was recorded and genotypes were assigned. The A or E system-specific alloantisera previously used are no longer available, which limited serological testing on fresh samples. Serological information previously recorded at the NIU Briles laboratory using well-defined A blood system polyclonal antisera was available for multiple HYL lines over multiple generations from 2000 to 2012, and DNA from these serotyped individuals was also available.
mAb hemagglutination
Hemagglutination assays with the ISU-cA mAb were performed following the established protocol (40). Fresh blood cells were washed in PBS, after which 4% solutions of RBCs were prepared. The ISU-cA Ab from mouse ascites was serially diluted from 1:100 to 1:256,000. Equal volumes of RBCs and diluted Abs were mixed, shaken, and incubated for 1 h, resuspended, and then agglutination reactions were scored. Negative cells showed no agglutination, and heterozygotes were distinguished from homozygous positives by their weaker agglutination scores at greater Ab dilutions. The ISU-cA mAb is known to identify specific A system alleles, with A3, A4, and A8 showing agglutination whereas A2 and A5 alleles showed no agglutination (40). This Ab was used to test fresh blood samples from recent generations of HYL lines, some of which had previously defined alloantisera A system alleles. No mAb for detection of the E system alleles exists.
Genotyping
Genome sequences
Genome sequence (8–17×) was available for eight HYL lines from three different breeds (five WLs, one RIR, and two WPRs) produced from line-specific DNA pools of 10 individuals (41). Additional genome sequences for six research lines were also available, including the chicken genome reference line UCD001 (Red Junglefowl) and five WL lines: IAH 61, IAH 72, IAH 15I, and IAH RH-C (courtesy of Jacqueline Smith, Roslin Institute) plus UCD003 (courtesy of Hans Cheng, U.S. Department of Agriculture). Many of these lines also had published blood system A and E allele information (42–44). Although no A or E system allele information is described for UCD001 (inbred line source of the GRCg6a reference genome), both systems are reported to have different alleles than those found in line UCD003 (45).
Low-pass (4×) sequence
DNA from 92 WL1 line individuals with known ISU-cA mAb reactivity (54 −/−, 6 +/+, and 32 −/+) and from 10 samples with known E system alleles (E2/E2 = 2, E3/E3 = 3, and E3/E7 = 5) was individually sequenced at low coverage (4×). Libraries were prepared using 0.137 ng of DNA from each sample with the Illumina Nextera XT kit (Illumina, San Diego, CA). Twenty-four individually barcoded libraries were then pooled, purified, and sequenced together on one lane of the HiSeq X with 2 × 150-bp reads. Sequence coverage averaged 4× per sample and was provided by Gencove (New York, NY). All bam files of individuals with the same reaction to ISU-cA Ab were combined and analyzed for SNP frequency differences among the different sample classifications. Case–control association analysis was performed in PLINK (46). Golden Helix Genome Browse software (47) was used to visualize sequences and identify candidate SNPs.
Genotypes from SNP arrays
Individual SNP genotypes were obtained using a proprietary 54K Affymetrix Axiom SNP chip containing a subset of SNPs from the Affymetrix Axiom 600K chicken SNP chip. SNP genotypes were obtained from individuals from the RIR1 line whose RBCs either reacted strongly (20 positive) or did not react (30 negative) to the ISU-cA mAb. Affymetrix Axiom SNP genotyping was performed at GeneSeek (Lincoln, NE). Genotype calling was performed with Affymetrix analysis power tools. A quality control filter was applied with a minor allele frequency of 0.1 and maximum missing genotypes of 0.1 to the 54K SNP chip results, resulting in 43,757 SNPs available for the analysis.
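The quality control filter described above can be illustrated with a minimal Python sketch. It assumes genotypes are coded as the number of alternate alleles (0, 1, or 2) with `None` for a missing call, that the 0.1 minor allele frequency value is a lower bound, and that the 0.1 missing-genotype value is an upper bound. The function name `passes_qc` and the inclusive thresholds are illustrative assumptions, not the Affymetrix software's actual interface.

```python
def passes_qc(genotypes, min_maf=0.1, max_missing=0.1):
    """Return True if one SNP passes the MAF and missing-rate filters.

    genotypes: list of alternate-allele counts (0, 1, 2), None = missing call.
    Thresholds are assumed to be inclusive; the text does not say.
    """
    total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if total == 0 or not called:
        return False
    missing_rate = 1 - len(called) / total
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)  # frequency of the rarer allele
    return missing_rate <= max_missing and maf >= min_maf


# Hypothetical example SNPs (not data from the study)
common_snp = [0, 1, 2, 1, 0, 1, 1, 2, 0, 1]
monomorphic_snp = [0] * 10
poorly_called_snp = [0, 1, None, None, 1, 2, 1, 0, 1, 1]
```

Applying `passes_qc` to every SNP on the chip and keeping those that return `True` is the kind of step that would reduce the 54K panel to the subset used for analysis.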
DNA samples that had E system allele information (E2E2 = 1, E2E3 = 5, E3E3 = 11), sourced from the NIU DNA bank, were also genotyped with the 54K SNP chip. SNP frequency between the E2E3 and E3E3 individuals was compared with the predicted frequencies (i.e., frequency = 0.5 in E2E3 and either 0 or 1.0 in E3E3 individuals).
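The expected-frequency comparison can be sketched as follows. This is an illustration only: the tolerance value, the function names, and the 0/1/2 genotype coding are assumptions introduced here, and the example genotypes are hypothetical rather than study data.

```python
def alt_frequency(genotypes):
    """Alternate-allele frequency from genotypes coded as 0, 1, or 2 copies."""
    return sum(genotypes) / (2 * len(genotypes))


def fits_e2e3_pattern(e2e3_genotypes, e3e3_genotypes, tolerance=0.1):
    """Check whether a SNP matches the pattern expected for an E-linked variant.

    Expected: frequency near 0.5 in E2E3 birds and near 0 or 1.0 in E3E3 birds.
    The tolerance is a hypothetical allowance for genotyping noise.
    """
    het_freq = alt_frequency(e2e3_genotypes)
    hom_freq = alt_frequency(e3e3_genotypes)
    het_ok = abs(het_freq - 0.5) <= tolerance
    hom_ok = min(hom_freq, 1 - hom_freq) <= tolerance
    return het_ok and hom_ok


# Hypothetical SNP: all 5 E2E3 birds heterozygous, all 11 E3E3 birds homozygous
candidate = fits_e2e3_pattern([1] * 5, [0] * 11)
# Hypothetical SNP that does not track the E system
non_candidate = fits_e2e3_pattern([1] * 5, [1] * 11)
```

With only a handful of birds per class, a check like this narrows the list of SNPs but cannot by itself establish linkage.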
Bioinformatics analyses
All genomic analyses including sequence alignments, genome-wide association study (GWAS), and candidate SNP identification were done utilizing the GRCg6a assembly, which was the most current reference build available at the study’s initiation. During the preparation of the manuscript, two new reference builds were released, one of which (GRCg7w) was from the genome of a WL breed. This version has an improved annotation in the region of interest, which increased confidence in candidate gene identification. Therefore, all SNP locations provided in the tables have been updated to those identified from the GRCg7w assembly.
Using the low-pass sequences, birds with genotypes −/+ and +/+ were treated as cases, whereas birds with the −/− genotype were treated as controls in an association analysis. The program SnpEff (48) was used for variant annotation, and SnpSift (49) was used for selection of variants with high and moderate impact on the protein function according to sequence ontology terms.
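To make the case/control logic concrete, the sketch below runs a basic allelic chi-square test (1 degree of freedom) on a single SNP. The text does not state which PLINK test statistic was reported, so this is one common choice shown as an assumption. The genotype data are invented: the group sizes mirror the 32 −/+, 6 +/+, and 54 −/− birds, but the perfectly associated SNP is hypothetical.

```python
import math


def allele_counts(genotypes):
    """Return (alt, ref) allele counts from genotypes coded 0, 1, or 2."""
    alt = sum(genotypes)
    ref = 2 * len(genotypes) - alt
    return alt, ref


def allelic_chi_square(case_genotypes, control_genotypes):
    """Chi-square test on the 2x2 table of allele counts; returns (chi2, p)."""
    a, b = allele_counts(case_genotypes)
    c, d = allele_counts(control_genotypes)
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # no variation, so no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical SNP carried by every mAb-reactive bird and by no non-reactive bird
cases = [1] * 32 + [2] * 6   # -/+ and +/+ birds
controls = [0] * 54          # -/- birds
chi2, p_value = allelic_chi_square(cases, controls)
```

Repeating such a test for every variant and plotting −log10(p) against genomic position gives a Manhattan plot of the kind referred to in the Results.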
Identification of candidate genes
BioMart was used to identify all genes within the preselected regions. Because detection by hemagglutination requires Ag expression on RBC membranes, genes with known Gene Ontology (GO) terms (50) involving membrane (membrane, GO:0016020; integral component of membrane, GO:0016021; integral component of plasma membrane, GO:0005887) were selected. UniProt datasets were used for all genes identified within the selected regions to verify GO terms produced by BioMart for candidate genes. Information on selected genes (low-pass sequence data) was compared with the frequency of missense variants detected within experimental lines.
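The GO-based selection step amounts to a set intersection, sketched below. The three membrane GO identifiers are those given in the text. The gene names and their annotations are placeholders invented for the example and do not represent BioMart or UniProt output.

```python
# Membrane-related GO terms listed in the text
MEMBRANE_GO_TERMS = {"GO:0016020", "GO:0016021", "GO:0005887"}


def membrane_candidates(gene_to_go_terms):
    """Return sorted gene names annotated with at least one membrane GO term."""
    return sorted(
        gene
        for gene, terms in gene_to_go_terms.items()
        if MEMBRANE_GO_TERMS & set(terms)
    )


# Placeholder annotations (hypothetical genes, not real query results)
toy_annotations = {
    "GENE_A": ["GO:0016021", "GO:0005634"],  # membrane plus nucleus
    "GENE_B": ["GO:0005737"],                # cytoplasm only
    "GENE_C": ["GO:0005887"],                # plasma membrane
    "GENE_D": [],                            # no annotation
}
selected = membrane_candidates(toy_annotations)
```

Genes without any GO annotation are dropped by this filter, which is one reason the annotations were cross-checked against a second source.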
The E system locus is estimated to be within 0.5 cM from the A system locus by classical recombination analysis (23, 24). Using the centiMorgan to base pair conversion for chromosome 26 of 11.95 cM/Mb (16), the E system locus is predicted to be within 42 kb of the A system locus. To be as inclusive as possible, the region closely examined for SNP segregation that could fit the expected patterns was extended from 300,000 bp upstream (3.6 cM) to 454,000 bp downstream (5.4 cM) of the A system candidate gene. As was done for the A system, focus was placed on genes encoding proteins reported as being membrane bound with nonsynonymous variants whose segregation patterns fit the known E system alleles identified for specific samples.
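The distance conversions in this paragraph follow directly from the 11.95 cM/Mb rate, as this short Python sketch shows. The function names are illustrative, and a uniform recombination rate across the region is assumed, as it is in the text.

```python
# Recombination rate for chicken chromosome 26 as quoted in the text (cM per Mb)
CM_PER_MB = 11.95


def cm_to_bp(distance_cm, cm_per_mb=CM_PER_MB):
    """Convert a genetic distance in cM to a physical distance in bp."""
    return distance_cm / cm_per_mb * 1_000_000


def bp_to_cm(distance_bp, cm_per_mb=CM_PER_MB):
    """Convert a physical distance in bp to a genetic distance in cM."""
    return distance_bp / 1_000_000 * cm_per_mb


nearest_bp = cm_to_bp(0.5)          # roughly 42 kb
upstream_cm = bp_to_cm(300_000)     # roughly 3.6 cM
downstream_cm = bp_to_cm(454_000)   # roughly 5.4 cM
```

The same functions give about 109 kb for the 1.3 cM upper estimate of the A to E distance.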
SNP identification and genotyping
Detection assays were developed for those SNP alleles within the candidate genes predicted to produce nonsynonymous changes because protein differences are expected to be one cause of serological differences among blood system alleles. SNP alleles were detected via KASP chemistry, which employs one common primer, two allele-specific primers, and fluorescence detection with end-point reads (51). All SNPs for which functional assays were developed and genotypes obtained are listed in Table II for the A system and Table III for the E system.
Haplotype identification
Samples with serologically defined alleles for either the A or E system, or from lines known to contain specific A system alleles, were genotyped for SNPs within the relevant gene. Limited SNP combinations (haplotypes) were observed. Each SNP-defined haplotype found was assigned an identification number (for A system candidate C4BPM-H01 through H14; E system candidate FCAMR-H01 through H08). Where possible this number was matched with a serologically defined allele, either previously defined in those samples, or known to be present within the line. For example, for the A system C4BPM-H01 was assigned to samples with the A1 allele, and C4BPM-H02 was assigned to samples with the A2 allele. Similar logic was applied to assign haplotype names to other A and E system alleles. This nomenclature is relevant for A1, A2, A3, A4, A5, and A8, as these A alleles had multiple DNA sources available for testing (Table I). Furthermore, the positive or negative agglutination reaction with the ISU-cA Ab was determined for nine of the haplotypes.
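The bookkeeping behind haplotype naming can be sketched as a lookup that reuses known identifiers and numbers new allele combinations consecutively. This sketch assumes phase is already known (for example, from homozygous birds). The three-SNP allele strings, the `HAP-` prefix, and the function name are hypothetical, and the consecutive numbering is a simplification of the serology-matched numbering described above.

```python
def assign_haplotype_ids(haplotypes, known=None, prefix="HAP-"):
    """Map each allele string to an ID, numbering unseen strings consecutively.

    haplotypes: iterable of allele strings, one character per SNP.
    known: optional dict of allele string -> existing ID to reuse.
    Returns (list of IDs in input order, updated lookup dict).
    """
    ids = dict(known or {})
    next_number = len(ids) + 1
    assigned = []
    for hap in haplotypes:
        if hap not in ids:
            ids[hap] = f"{prefix}{next_number:02d}"
            next_number += 1
        assigned.append(ids[hap])
    return assigned, ids


# Hypothetical three-SNP haplotypes; two are already named, one is new
known_ids = {"ACC": "HAP-01", "GCC": "HAP-02"}
observed = ["GCC", "ACC", "GAG", "GAG"]
assigned, lookup = assign_haplotype_ids(observed, known=known_ids)
```

Heterozygous samples would need to be phased (for instance, from pedigree data) before a lookup like this could be applied.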
Protein modeling
Gene synteny analysis
The map displaying the syntenic regions between FCAMR and CD34 (see Fig. 5) was based on chicken genome reference GRCg7w and human reference GRCh38.p13. Multiple sequence alignments were done using COBALT with default parameters and visualized using iTOL. Protein sequences were obtained from the National Institutes of Health National Library of Medicine Protein Database (www.ncbi.nlm.nih.gov/protein/) and included the following: Homo sapiens CD46 accession no. NP_002380.3; Gallus gallus (chicken) C4BPS accession no. BAE16761.1; Loxodonta africana (African elephant) C4BPA accession no. XP_023404145.1; Bos taurus C4BPA accession no. NP_776677.1; Homo sapiens C4BPA accession no. NP_000706.1; Alligator sinensis (Chinese alligator) C4BPM accession no. XP_025068738.1; Struthio camelus (ostrich) accession no. XP_009670590.1; Numida meleagris (guinea fowl) accession no. XP_021233258.1; Coturnix japonica (Japanese quail) accession no. XP_015740608.1; Gallus gallus accession no. NP_98005.1.
Animal care statement
Fresh blood samples were obtained from Hy-Line flocks during routine blood sample collections under the care of the Institutional Animal Care and Use Committee.
Results
Identification and confirmation of A system candidate gene
GWAS analysis using low-pass (4×) sequence information obtained from WL1 individuals typed with the ISU-cA mAb showed one strong peak on chromosome 26 (Fig. 1). Closer examination of chromosome 26 showed a single peak between 2,420,000 and 2,890,000 bp (Fig. 2A). Five candidate genes that fit the GO terms involving membrane (membrane, GO:0016020; integral component of membrane, GO:0016021; integral component of plasma membrane, GO:0005887) were identified within this peak. These genes are PIGR, C4BPM, CD55, CD34, and PLXNA2.
Genome-wide Manhattan plot utilizing low-pass (4×) sequence from WL1 samples with differing response to ISU-cA mAb (A blood system specific).
Manhattan plots showing chromosome 26 genomic regions associated with different responses to ISU-cA mAb (A blood system specific). (A) From low-pass sequence data (4×) of HYL WL1. (B) From 54K SNP genotypes of HYL RIR1.
The most significant SNP (p = 7.25E−28; rs313727931) detected by the GWAS was 1214 bp from the start of C4BPM (RefSeq Genes 104, NCBI). Additional confirmation of this small region on chromosome 26 was provided from 100 RIR1 (a different breed) individuals that had both ISU-cA Ab response phenotype and 54K genotypes (Fig. 2B). There were three SNPs with the same p value (1.60E−19) located either within 700 bp of the start of C4BPM (rs314277126) or inside the C4BPM gene itself (rs318134097, rs313904441).
Sequences of these candidate genes were examined for SNPs predicted to change an amino acid. The segregation of these SNPs was then compared with the reported A system allele for multiple lines, including the two HYL lines used for initial identification of the candidate region, and the five unrelated experimental lines with reported A system alleles. This bioinformatics analysis showed that the amino acid–changing SNPs within the C4BPM gene produced the most consistency in predicting the A system allele reported within each line.
Haplotype identification and A system allele definition
With the C4BPM gene identified as a strong candidate by bioinformatics analyses, 24 SNP assays were developed within the gene. The SNP name (rs ID), genomic location (based on build GRCg7w), gene location, codon change, and subsequent amino acid change (where relevant) are provided in Table II for C4BPM. Each SNP was also assigned a short letter name for ease of viewing its location with respect to the intron/exon structure shown in Fig. 3. Amino acid and exon numbering for C4BPM is based on the longest gene sequence within NCBI (XM_015298885.2). Alternate spliced forms of C4BPM have been reported. The reference (based on the reference Junglefowl genome, GRCg6a) and alternate allele for each SNP are listed. The SNPs used in this study were those found within the HYL sequences and are located from exon 3 through the 3′ untranslated region (UTR). The SUSHI domain predicted to contain each amino acid–changing variant is also included. Additional SNPs, as identified by rs numbers in NCBI (one in exon 1, and two in exon 2), also predicted to cause nonsynonymous changes, are known. None of these variants was seen in the HYL genomic sequences, and thus any variation with these SNPs is not likely to be a factor affecting the A system allele haplotype found within those lines. Genotypes for these 24 SNPs were obtained for multiple samples that had serologically defined A system alleles, either for individual samples or for representatives of lines with previously defined A system alleles (Table I). Utilizing these 24 SNPs, a total of 14 C4BPM haplotypes were subsequently found across all samples analyzed.
Intron/exon structure of chicken C4BPM and FCAMR. (A) Chicken C4BPM. (B) Chicken FCAMR. The SNPs used to define haplotypes are identified as A–X (C4BPM) and A–J (FCAMR). Relative location of each SNP is indicated. *Nonsynonymous SNP.
Sources of alloantigen system A and E serologically defined samples used for candidate gene identification
| Source | Breed | Alleles present: A | Alleles present: E | Samples (n): A | Samples (n): E | Reference^a |
|---|---|---|---|---|---|---|
| NIU DNA bank; pedigree | WL, Ancona | A3, A4 | Unk | 162 | 0 | None |
| NIU DNA bank | WL, Ancona | A2, A3, A4, A5 | E2, E3, E7 | 58 | 30 | None |
| HAS/LAS | WL | A1, A2, A4 | E1, E2 | 87 | 40 | (26) |
| UCD001 | RJF | Unk | Unk | 10 | 0 | (45)^b |
| UCD003 | WL | A4 | E7 | 20 | 20 | (43, 44) |
| UCD361 | WL | A2 | E7 | 4 | 4 | (42, 43) |
| ADOL-15I | WL | A4 | Unk | 6 | 6 | (43) |
| HYL-WL1 | WL | A2, A4 | Unk | 100 | 0 | None |
| HYL-WL2 | WL | A1, A2, A4 | Unk | 90 | 0 | None |
| HYL-WL3 | WL | A2 | Unk | 99 | 0 | None |
| HYL-WL4 | WL | A4, A8 | Unk | 96 | 0 | None |
| HYL-WL5 | WL | A4 | Unk | 24 | 0 | None |
| HYL-RIR1 | RIR | Unk | Unk | 100 | 0 | None |
| HYL-WPR1 | WPR | Unk | Unk | 96 | 0 | None |
| HYL-WPR2 | WPR | Unk | Unk | 96 | 0 | None |

| Sequence only | Breed | Alleles present: A | Alleles present: E | Average coverage | Reference^a |
|---|---|---|---|---|---|
| UCD001 | RJF | Unk | Unk | 6.6 | (16) |
| UCD003 | WL | A4 | E7 | 18.7 | H. Cheng, personal communication |
| IAH-61 | WL | A4 | E7 | 15.2 | (41) |
| IAH-72 | WL | A4 | E5 | 17.6 | (41) |
| IAH 15I | WL | A4 | Unk | 9.2 | (41) |
| IAH-RHC | WL | A4 | E7 | 15.9 | (41) |
RIR, Rhode Island Red; RJF, Red Junglefowl; Unk, unknown (no serology data); WL, White Leghorn; WPR, White Plymouth Rock.
a, Reference for A/E system type information.
b, UCD 001 types not identified but described as different from the types in UCD 003 (which are A4, E7).
SNPs within the C4BPM gene used to define haplotypes, the SNP genomic and exon locations and their predicted amino acid changes, the SUSHI domain that contains the amino acid variant, and all SNP combinations (haplotypes) found
| SNP Name | Letter Name | Location (bp)^a | Exon Gene Location | Codon Change | Base Change | aa^b | SUSHI Domain | Ref^c | Alt^d | H01 | H02 | H03 | H04 | H05 | H06 | H07 | H08 | H09 | H10 | H11 | H12^c | H13 | H14 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs740454531 | A | 2,694,350 | 3 | AAT>AAC | T>C | N97N | | T | C | T | T | T | T | T | C | T | T | T | T | T | T | T | T |
| rs317635931 | B | 2,694,386 | 3 | TCG>TCA | G>A | S109S | | G | A | A | G | G | G | G | A | G | G | G | G | A | G | G | G |
| rs739090154 | C | 2,695,038 | 4 | CGT>CAT | G>A | R125H | 2 | G | A | A | G | G | G | G | G | G | G | G | G | G | G | G | G |
| rs733092754 | D | 2,695,058 | 4 | CAA>AAA | C>A | Q132K | 2 | C | A | C | C | C | C | C | A | A | C | C | C | A | C | C | A |
| rs735075947 | E | 2,695,067 | 4 | GTT>CTT | G>C | V135L | 2 | G | C | G | G | G | G | G | C | C | G | G | G | C | G | G | C |
| rs13606128 | F | 2,695,086 | 4 | CCT>CGT | C>G | P141R | 2 | C | G | C | C | G | G | G | G | G | C | G | C | G | C | C | G |
| rs13606129 | G | 2,695,094 | 4 | TTA>ATA | T>A | L144I | 2 | T | A | T | T | T | A | T | A | A | T | A | T | A | T | T | A |
| rs314825792 | H | 2,695,463 | 5 | CCT>TCT | C>T | P149S | 3 | C | T | C | C | C | T | C | C | C | C | T | C | C | C | C | C |
| rs318146258 | I | 2,695,469 | 5 | AAG>GAG | A>G | K151E | 3 | A | G | A | A | G | A | G | A | A | A | A | A | A | A | A | A |
| rs16201764 | J | 2,695,506 | 5 | GAA>GTA | A>T | E163V | 3 | A | T | A | A | T | A | T | A | A | A | A | A | A | A | A | A |
| rs316860464 | K | 2,695,553 | 5 | AAT>GAT | A>G | N179D | 3 | A | G | G | G | G | G | G | G | G | A | G | G | G | A | G | G |
| rs16201765 | L | 2,695,562 | 5 | GCA>ACA | G>A | A182T | 3 | G | A | A | A | A | A | A | A | A | G | A | A | A | G | A | A |
| rs737366667 | M | 2,696,368 | 6 | GTC>ATC | G>A | V217I | Linker | G | A | G | G | G | G | A | G | G | G | G | G | G | G | G | G |
| rs740868946 | N | 2,696,408 | 6 | GCA>GTA | C>T | A230V | 4 | C | T | C | C | C | C | T | C | C | C | C | C | C | C | C | C |
| rs734498555 | O | 2,696,432 | 6 | ATC>ACC | T>C | I238T | 4 | T | C | T | T | T | T | C | T | T | T | T | T | T | T | T | T |
| rs318134097 | P | 2,697,378 | Intron | | T>C | na | | T | C | T | T | T | T | C | T | T | T | T | T | T | T | T | T |
| rs732151663 | Q | 2,697,548 | 7 | TCT>CCT | T>C | S297P | 5 | T | C | T | T | T | T | T | T | T | T | C | T | T | T | T | T |
| rs13904441 | R | 2,698,159 | Intron | | C>T | na | | C | T | C | C | C | C | T | T | C | C | T | T | C | C | C | C |
| rs13606130 | S | 2,698,328 | 8 | ATT>ATC | T>C | I351I | | T | C | C | T | C | C | C | C | C | T | T | C | C | T | T | C |
| rs13606131 | T | 2,698,429 | 8 | CGT>CAT | G>A | R385H | 6 | G | A | G | G | A | A | A | A | G | G | A | A | G | G | G | G |
| rs738859092 | U | 2,698,832 | 9 | ATG>GTG | A>G | M413V | | A | G | A | A | A | A | A | A | A | A | G | A | A | A | A | A |
| rs16201773 | V | 2,701,056 | 11 | TCG>TTG | C>T | S436L | | C | T | C | C | C | C | T | C | C | C | C | C | C | C | C | C |
| rs736407694 | W | 2,701,065 | 11 | GCT>GTT | C>T | A439V | | C | T | C | C | C | C | C | C | C | C | T | C | C | C | C | C |
| rs16201779 | X | 2,702,538 | 3′ UTR | CCA>CCG | A>G | na | | A | G | G | A | G | G | A | G | A | G | A | G | G | A | G | A |
| Blood system A serological allele | | | | | | | | | | A1 | A2 | A3 | A4 | A5 | Unk | Unk | A8 | A4 | Unk | Unk | Unk | A2 | Unk |
| Response to ISU-cA Ab | | | | | | | | | | Unk | Neg | Unk | Pos | Pos | Pos | Unk | Neg | Pos | Neg | Pos | Unk | Neg | Unk |
The A blood group serological allele and the response to the ISU-cA mAb found for each haplotype are given where known. Alphabet identifiers are included that indicate the relative position of each SNP in Fig. 3. aa, amino acid; na, not applicable; Neg, negative; Pos, positive; Unk, unknown.
a. Location based on GRCg7w.
b. Based on XM_015298885.2.
c. Reference allele (H12) is based on build 6 Junglefowl reference genome.
d. Alternate.
Assignment of C4BPM haplotypes to specific A system alleles
Specific C4BPM haplotypes were consistently found in samples with the same serological A blood system allele and are summarized in Table II. Identification of specific C4BPM haplotypes associated with specific A blood system alleles was confirmed within multiple unrelated samples and lines. There was complete agreement between A3 or A4 serological identification and C4BPM haplotype for all 162 NIU pedigree samples. Within the 58 non-pedigree NIU DNA bank samples, two samples identified as being A5 heterozygotes contained a distinctive haplotype, which then defined A5 = C4BPM-H05. Overall, there was 91% (53/58) agreement between A serology and C4BPM haplotype for the NIU non-pedigree samples. The five discrepancies were due to misidentification of one of the two A system alleles present, which were low-frequency alleles. Thus, from multiple sources, the A1, A2, A3, A4, and A5 alleles were all defined by unique C4BPM haplotypes, C4BPM-H01–C4BPM-H05.
Lines that were serologically identified as having A4 were homozygous for one of two haplotypes: C4BPM-H04, which was found in 11 sources, or C4BPM-H09, which was found in three sources. The lone exception was line 15I, which was heterozygous for C4BPM-H04 and C4BPM-H09. These two haplotypes have identical C4BPM SNP constitution from exon 1 through exon 6, suggesting that the variation encoded within exon 7 and beyond does not affect serologically identified epitopes. The A2 allele can also be produced by two haplotypes: C4BPM-H02, as found in most A2 samples, and C4BPM-H13, which differs from C4BPM-H02 by one SNP in the 3′ UTR.
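The haplotype comparisons described above can be reproduced directly from Table II. The following is a minimal sketch rather than the authors' pipeline: the allele strings are transcribed from the H02, H04, H09, and H13 columns of Table II (SNPs A through X, in order), and the names `HAPLOTYPES`, `LOCATION`, and `differing_snps` are illustrative assumptions.

```python
# Alleles at SNPs A..X (Table II order) for four C4BPM haplotypes.
# Transcribed from Table II; variable and function names are hypothetical.
SNP_LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWX"

HAPLOTYPES = {
    "H02": "TGGCGCTCAAGAGCTTTCTGACCA",
    "H04": "TGGCGGATAAGAGCTTTCCAACCG",
    "H09": "TGGCGGATAAGAGCTTCTTAGCTA",
    "H13": "TGGCGCTCAAGAGCTTTCTGACCG",
}

# Gene location of each SNP letter, as listed in Table II.
LOCATION = {
    **dict.fromkeys("AB", "exon 3"),
    **dict.fromkeys("CDEFG", "exon 4"),
    **dict.fromkeys("HIJKL", "exon 5"),
    **dict.fromkeys("MNO", "exon 6"),
    "P": "intron",
    "Q": "exon 7",
    "R": "intron",
    **dict.fromkeys("ST", "exon 8"),
    "U": "exon 9",
    **dict.fromkeys("VW", "exon 11"),
    "X": "3' UTR",
}


def differing_snps(hap_a, hap_b):
    """Return (letter, location) for every SNP where two haplotypes differ."""
    pairs = zip(SNP_LETTERS, HAPLOTYPES[hap_a], HAPLOTYPES[hap_b])
    return [(letter, LOCATION[letter]) for letter, a, b in pairs if a != b]


if __name__ == "__main__":
    print("H04 vs H09:", differing_snps("H04", "H09"))
    print("H02 vs H13:", differing_snps("H02", "H13"))
```

With these inputs, every difference between C4BPM-H04 and C4BPM-H09 lies at or beyond exon 7, and C4BPM-H02 and C4BPM-H13 differ only at SNP X in the 3′ UTR, in line with the text.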
Although the UCD361 line is reported as having the serological A2 allele, the C4BPM haplotype for this line is not the expected C4BPM-H02 that was identified for other A2 alleles. This unique haplotype has been assigned as C4BPM-H07 with no serological moniker. These two haplotypes differ by four nonsynonymous SNPs in exon 4 and one synonymous SNP in exon 8.
The source of the GRCg6a chicken reference genome is the inbred Red Junglefowl line UCD001 (16). Although we have been unable to find a record of the A system serological allele for UCD001, it is reported as being different from that found for UCD003 (45). C4BPM-H12 is the haplotype found in the reference genome sequence, as well as the additional UCD001 samples for which C4BPM haplotypes were obtained. Comparison between C4BPM-H12 (in UCD001) and C4BPM-H04 (in UCD003) shows seven nonsynonymous changes, in exons 4, 5, and 8, which are likely to generate serological differences in the C4BPM protein. Additional C4BPM haplotypes using the same 24 SNPs were found in RIR1, WPR1, and WPR2 lines with either limited or no A system serological information. These C4BPM haplotype numbers were assigned based on the order of haplotype detection.
Identification of A system Ab binding epitopes
For 9 of the 14 C4BPM haplotypes, the ISU-cA Ab binding response was determined (Table II). Five of these haplotypes (C4BPM-H04, -H05, -H06, -H09, and -H11) resulted in Ab binding, whereas four haplotypes (C4BPM-H02, -H08, -H10, and -H13) were not agglutinated by the Ab. Examination of the SNP constitution of these nine haplotypes showed that the positive-binding haplotypes had one common nonsynonymous change, P141R, suggesting that the ISU-cA mAb binds an epitope within SUSHI domain 2 that is impacted by the P141R variant.
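The reasoning in this paragraph amounts to a search for SNPs at which all Ab-positive haplotypes share an allele that no Ab-negative haplotype carries. The sketch below illustrates that search under stated assumptions: the allele strings are transcribed from Table II (columns H01 to H14), the positive and negative sets are those listed above, and the function name `discriminating_snps` is hypothetical. It is not presented as the procedure the authors used.

```python
# Each entry: (SNP letter, amino acid change or site type, alleles for H01..H14).
# Transcribed from Table II; names below are illustrative only.
HAPLOTYPE_NAMES = [f"H{i:02d}" for i in range(1, 15)]

SNPS = [
    ("A", "N97N", "TTTTTCTTTTTTTT"),
    ("B", "S109S", "AGGGGAGGGGAGGG"),
    ("C", "R125H", "AGGGGGGGGGGGGG"),
    ("D", "Q132K", "CCCCCAACCCACCA"),
    ("E", "V135L", "GGGGGCCGGGCGGC"),
    ("F", "P141R", "CCGGGGGCGCGCCG"),
    ("G", "L144I", "TTTATAATATATTA"),
    ("H", "P149S", "CCCTCCCCTCCCCC"),
    ("I", "K151E", "AAGAGAAAAAAAAA"),
    ("J", "E163V", "AATATAAAAAAAAA"),
    ("K", "N179D", "GGGGGGGAGGGAGG"),
    ("L", "A182T", "AAAAAAAGAAAGAA"),
    ("M", "V217I", "GGGGAGGGGGGGGG"),
    ("N", "A230V", "CCCCTCCCCCCCCC"),
    ("O", "I238T", "TTTTCTTTTTTTTT"),
    ("P", "intron", "TTTTCTTTTTTTTT"),
    ("Q", "S297P", "TTTTTTTTCTTTTT"),
    ("R", "intron", "CCCCTTCCTTCCCC"),
    ("S", "I351I", "CTCCCCCTTCCTTC"),
    ("T", "R385H", "GGAAAAGGAAGGGG"),
    ("U", "M413V", "AAAAAAAAGAAAAA"),
    ("V", "S436L", "CCCCTCCCCCCCCC"),
    ("W", "A439V", "CCCCCCCCTCCCCC"),
    ("X", "3' UTR", "GAGGAGAGAGGAGA"),
]

# ISU-cA Ab responses reported for nine haplotypes (Table II).
POSITIVE = {"H04", "H05", "H06", "H09", "H11"}
NEGATIVE = {"H02", "H08", "H10", "H13"}


def discriminating_snps():
    """SNPs where all positives share one allele that no negative carries."""
    hits = []
    for letter, change, alleles in SNPS:
        by_hap = dict(zip(HAPLOTYPE_NAMES, alleles))
        pos_alleles = {by_hap[h] for h in POSITIVE}
        neg_alleles = {by_hap[h] for h in NEGATIVE}
        if len(pos_alleles) == 1 and pos_alleles.isdisjoint(neg_alleles):
            hits.append((letter, change))
    return hits


if __name__ == "__main__":
    print(discriminating_snps())
```

Applied to the Table II data, the only SNP that separates the two response classes is SNP F (P141R), which agrees with the interpretation given above.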
The space-filling model generated in Raptor (52) and visualized in UCSF Chimera (54) is shown in Fig. 4. The amino acid variants identified by the SNP set used are indicated in red, and the P141R variant is shown in magenta. Asterisks have been added to those variants predicted to be within a SUSHI domain to distinguish close variants. SUSHI domain 2 contains five amino acid-changing variants, four of which can be clearly seen while a fifth variant (R125H) is obscured in this orientation. SUSHI domain 3 also contains five amino acid variants, all of which are visible on the model. The V217I variant is within the linker region connecting SUSHI domains 3 and 4 and is obscured on the model. SUSHI domain 4 contains two amino acid variants, and SUSHI domains 5 and 6 each contain one. The SNP with no asterisk is not found within a SUSHI domain and is likely close to or within the membrane-spanning region. All but one of the amino acid variants found within SUSHI domains can be observed in this model, showing that the variants occur primarily along one edge of the protein, with the P141R variant also within that same edge.
Space filling model of chicken C4BPM. Exon-encoded domains are shown alternating in green/gold. Exon 1 (signal peptide) points behind and away from the structure and is obscured. Sulfur-containing amino acids are shown in yellow. Amino acid variants resulting from tested SNP variants are shown in red. Magenta indicates the P141R variant. Asterisks indicate each separate amino acid variant. SUSHI domains are encircled and numbered 1–6.
Identification of E system candidate gene
Candidate gene identification
The low-pass (4×) sequences of the NIU-sourced E2E2 (n = 2) versus E3E3 (n = 3) samples were examined within the range of 301,000 bp upstream and 456,000 bp downstream of the C4BPM gene (3.6 cM upstream to 5.4 cM downstream). Within this range, SNPs that fit the expected pattern (homozygous within each phenotype class and different between the two classes) were found only within the FCAMR gene. The 54K SNP data of E2E3 (n = 5) and E3E3 (n = 11) showed 76 SNPs within this same range that were segregating with frequency differences between the two phenotypic classes, 16 of which fit the predicted frequencies (i.e., frequency = 0.5 in E2E3 and either 0 or 1.0 in E3E3). Comparison of these 16 SNPs derived from the 54K SNP set with sequence information of different samples revealed only one SNP (rs15231284), located in an intron of FCAMR, which had a consistent segregation pattern with phenotypic class in both sample sets. A total of 10 SNP assays were developed within exons 1–5 of FCAMR to define the gene’s variation and its subsequent haplotypes. Five of these SNP variants are predicted to change the amino acid constitution. These SNPs, their rs numbers (where available), genomic and gene location, and any amino acid change in addition to the subsequent haplotype are given in Table III.
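The frequency criterion applied to the 54K SNP data (frequency = 0.5 in E2E3 and either 0 or 1.0 in E3E3) can be sketched as a simple filter. The example below is illustrative only: the SNP names and genotypes are invented, genotypes are assumed to be coded as the number of alternate-allele copies (0, 1, or 2), and only the group sizes (n = 5 and n = 11) come from the text.

```python
def alt_freq(genotypes):
    """Alternate-allele frequency for diploid genotypes coded 0, 1, or 2."""
    return sum(genotypes) / (2 * len(genotypes))


def fits_expected_pattern(e2e3, e3e3):
    """True if frequency is 0.5 in E2E3 and fixed (0 or 1.0) in E3E3."""
    return alt_freq(e2e3) == 0.5 and alt_freq(e3e3) in (0.0, 1.0)


# Hypothetical genotypes: (E2E3 samples, n = 5; E3E3 samples, n = 11).
EXAMPLE_SNPS = {
    "snp_1": ([1, 1, 1, 1, 1], [0] * 11),        # fits: 0.5 and 0
    "snp_2": ([1, 1, 1, 1, 1], [2] * 11),        # fits: 0.5 and 1.0
    "snp_3": ([1, 1, 0, 1, 1], [0] * 11),        # fails: 0.4 in E2E3
    "snp_4": ([1, 1, 1, 1, 1], [0] * 10 + [1]),  # fails: E3E3 not fixed
}

candidates = [
    name
    for name, (e2e3, e3e3) in EXAMPLE_SNPS.items()
    if fits_expected_pattern(e2e3, e3e3)
]

if __name__ == "__main__":
    print(candidates)
```

The filter is frequency-based, as in the text, so it does not by itself require every E2E3 sample to be heterozygous; a stricter per-sample check could be substituted if genotype calls are reliable.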
SNP within the FCAMR gene used to define haplotypes, the genomic and exon locations of SNPs and their predicted amino acid changes, and all SNP combinations (haplotypes) found
. | . | . | . | . | . | . | . | . | Haplotype . | |||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SNP Name . | Letter Name . | Location (bp)a . | Exon Gene Location . | Codon Change . | Nucleotide Change . | aa . | Refb . | Altc . | H01b . | H02 . | H03 . | H04 . | H05 . | H06 . | H07 . | H08 . |
rs736830918 | A | 2,603,595 | 1 | TTA>TTG | A>G | L25L | A | G | A | A | G | G | A | G | A | G |
rs312784778 | B | 2,603,584 | 1 | GTA>GCA | T>C | V29A | T | C | T | T | C | C | T | C | T | T |
rs315049428 | C | 2,603,538 | 1 | CAG>CAA | G>A | Q44Q | G | A | G | G | G | A | G | G | G | G |
rs735283104 | D | 2,603,243 | 1 | GCA>TCA | G>T | A143S | G | T | G | G | G | T | T | T | T | G |
rs1057718335 | E | 2,603,202 | 1 | GGC>GGT | C>T | G156G | C | T | C | C | C | C | C | T | T | C |
SNPF | F | 2,603,201 | 1 | GCA>ACA | G>A | A157T | G | A | G | G | G | G | G | G | G | G |
rs314448189 | G | 2,603,145 | 1 | AAA>AAG | A>G | K175K | A | G | A | G | G | G | G | G | G | G |
rs313141301 | H | 2,602,496 | 2 | GAC>GAT | C>T | D338D | C | T | C | C | T | C | C | T | C | C |
rs731319700 | I | 2,602,324 | 3 | GCC>TCC | G>T | A368S | G | T | G | G | T | G | G | T | G | G |
rs313409564 | J | 2,601,078 | 4 | ACT>AGT | C>G | T599S | C | G | C | C | G | C | C | C | C | C |
Blood system E serological allele | Unk | E2, E5, E7 | E3, E7 | Unk | Unk | Unk | Unk | Unk |
The E blood group serological allele found for each haplotype is given, where known. aa, amino acid; Unk, unknown.
a. Location based on GRCg7w.
b. Haplotype (H01) found in the chicken Junglefowl reference genome.
c. Alternate.
Haplotype identification and E system allele definition
Based on the NIU samples that had both serological and sequence information, haplotypes could be defined for both E2 (FCAMR-H02) and E3 (FCAMR-H03). There were five NIU samples serologically identified as E3E7 heterozygotes. These all fit the FCAMR haplotype pattern of FCAMR-H02/FCAMR-H03 heterozygotes, suggesting that E7 is also defined by FCAMR-H02. Additional haplotypes were found in other lines and breeds that had limited or no serological information, identifying a total of nine FCAMR haplotypes. The Junglefowl build 6 reference genome is assigned to FCAMR-H01, with FCAMR-H02 and FCAMR-H07 found for the E2 serotype and FCAMR-H03 for the E3 serotype (Table III).
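Because SNP assays yield unphased genotypes, assigning a heterozygous sample to a haplotype pair means asking which pairs of known haplotypes could produce the observed genotype. The sketch below illustrates this with the eight haplotypes tabulated in Table III (SNPs A through J, in order). The test genotype is constructed from FCAMR-H02 and FCAMR-H03 rather than taken from an observed sample, and the function names are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

# Alleles at SNPs A..J for the FCAMR haplotypes listed in Table III.
FCAMR_HAPLOTYPES = {
    "H01": "ATGGCGACGC",
    "H02": "ATGGCGGCGC",
    "H03": "GCGGCGGTTG",
    "H04": "GCATCGGCGC",
    "H05": "ATGTCGGCGC",
    "H06": "GCGTTGGTTC",
    "H07": "ATGTTGGCGC",
    "H08": "GTGGCGGCGC",
}


def unphased(hap_a, hap_b):
    """Unphased genotype: the set of alleles seen at each SNP position."""
    return [frozenset(pair) for pair in zip(hap_a, hap_b)]


def compatible_pairs(genotype):
    """All haplotype pairs whose combined alleles match the genotype."""
    names = sorted(FCAMR_HAPLOTYPES)
    return [
        (a, b)
        for a, b in combinations_with_replacement(names, 2)
        if unphased(FCAMR_HAPLOTYPES[a], FCAMR_HAPLOTYPES[b]) == genotype
    ]


# Genotype expected for an H02/H03 heterozygote (constructed, not observed).
e3e7_like = unphased(FCAMR_HAPLOTYPES["H02"], FCAMR_HAPLOTYPES["H03"])

if __name__ == "__main__":
    print(compatible_pairs(e3e7_like))
```

Among the tabulated haplotypes, this genotype is compatible with a single pair, FCAMR-H02/FCAMR-H03, so the assignment would be unambiguous for these ten SNPs.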
E serological allelic information had previously been reported for some of the experimental lines as summarized in Table I. Their FCAMR haplotypes could be determined from the sequence information. Lines UCD003 and UCD361, reported as having E7, both have FCAMR-H02, consistent with the previous FCAMR-H02 identification as defining E7. The FCAMR-H01 haplotype found within the UCD001 Junglefowl build GRCg6a reference genome is different from that found in UCD003, which aligns with previous reports that these two lines have different E system alleles (45). Lines 61 and IAH-RHC are reported as both having E7, and although they do have the same FCAMR haplotype, they have the FCAMR-H03 haplotype rather than the FCAMR-H02 haplotype previously defined for E7. Both the HAS and LAS lines were reported as segregating for E1 and E2 alleles, although with different frequencies (26). We found that HAS contains FCAMR-H03 and FCAMR-H08, and LAS contains FCAMR-H02, -H04, and -H08. These discrepancies could be due to inconsistent serological assignments because of lack of interlab comparisons and lack of well-defined reagents.
Synteny
BLAST, synteny analysis, and multiple sequence alignment allowed further characterization of C4BPM and the RCA region. Attempts to identify a mammalian ortholog of C4BPM by reciprocal BLAST failed. Chicken C4BPM identified human C4BPA as the best BLAST hit, but the reciprocal analysis identified chicken C4BPS as the best hit for human C4BPA. To extend this analysis, we examined the syntenic relationships between the human and chicken RCA syntenic regions (Fig. 5). These relationships are well conserved in the region bound by CR1 and FCAMR to the 5′ side of chicken C4BPM, although we were unable to identify an ortholog of C4BPB. To the 3′ side of C4BPM, synteny is also conserved beyond PLXNA2 (data not shown).
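The reciprocal best-hit test described here can be stated compactly: a gene pair qualifies only if each gene is the other's best hit. The sketch below encodes just the two best hits reported in the text; the data structure and the function name `is_reciprocal_best_hit` are assumptions for illustration, and no BLAST search is run.

```python
# Best hits as reported in the text: query (species, gene) -> best hit.
BEST_HIT = {
    ("chicken", "C4BPM"): ("human", "C4BPA"),
    ("human", "C4BPA"): ("chicken", "C4BPS"),
}


def is_reciprocal_best_hit(query):
    """True only if the best hit of the query's best hit is the query itself."""
    hit = BEST_HIT.get(query)
    return hit is not None and BEST_HIT.get(hit) == query


if __name__ == "__main__":
    # Chicken C4BPM -> human C4BPA -> chicken C4BPS, so the test fails.
    print(is_reciprocal_best_hit(("chicken", "C4BPM")))
```

Here the check returns False for chicken C4BPM because the reverse search from human C4BPA lands on chicken C4BPS rather than on C4BPM.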
Comparative genomic map of the chicken and human RCA region, extended to include the FCAMR gene. The map was derived from chicken bGalGal1.pat.whiteleghornlayer.GRCg7w WZ and human build GRCh38.p13 assembly. The relative location of the conserved nucleotide element (CNE) was determined by BLAST analysis. The size of the map is not to scale.
The region spanning from CR1 to CD34 differs between the human and chicken. Identification of a chicken ortholog of CR1L was complicated by the extensive conservation of SUSHI domains across the complement genes. Also, CD46 appeared to lack a direct ortholog in the chicken. To determine whether there is conservation in the region between CD24 and CD34, the nucleotide sequence between the 3′ end of human CR1 and the 3′ end of human CD34 was compared with the chicken genome by BLAST. Only two conserved nucleotide sequences were identified. The first encoded the miR-29 microRNA gene and the second a conserved nucleotide element (CNE) of ∼300 nt. The miR-29 gene and CNE are located in an intron of the chicken C4BPM gene and in the intergenic region between human CD46 and CD34.
A distance tree was produced to further evaluate the homology between C4BPM and other mammalian and archosaur gene products. The analysis (Fig. 6) indicates that chicken C4BPS is likely an ortholog of mammalian C4BPA. The C4BPM gene appears limited to the archosaur lineage whereas human CD46 is an outgroup to both C4BPA and C4BPM.
Multiple sequence alignment encompassing representative CD46, C4BPA, and C4BPM sequences. Alignment was carried out using COBALT and the tree was built with iTOL. Protein sequences were obtained from the National Institutes of Health National Library of Medicine Protein Database. Homo sapiens accession no. NP_002380.3; Gallus gallus (chicken) C4BPS accession no. BAE16761.1; Loxodonta africana (African elephant) accession no. XP_023404145.1; Bos taurus accession no. NP_776677.1; Homo sapiens C4BPA accession no. NP_000706.1; Alligator sinensis (Chinese alligator) C4BPM accession no. XP_025068738.1; Struthio camelus (ostrich) accession no. XP_009670590.1; Numida meleagris (guinea fowl) accession no. XP_021233258.1; Coturnix japonica (Japanese quail) accession no. XP_015740608.1; Gallus gallus accession no. NP_98005.1.
Discussion
The complement system is an essential pathway within both the innate and acquired immune systems. Appropriate activation and regulation are critical for elimination of pathogens and development of humoral immunity. The clustering of regulators of complement activation into one genomic region and the synteny of this clustering in multiple species are indicative of the significance of precise, coordinated regulation of the complement pathway.
The RCA genomic regions from humans and chickens are distinct between the CD24 and CD34 genes. These regions encompass the C4BPM gene in chickens and CD46 and CR1L in humans, in addition to the CNE and miR-29 genes, which are found in both species. Reciprocal BLAST analysis failed to identify a CD46 or CR1L ortholog in chickens within this genomic region or a C4BPM ortholog in mammals, including humans (data not shown). BLAST analysis of C4BPM identifies human C4BPA as the most similar protein sequence, but multiple sequence alignment suggests that C4BPM and C4BPA are homologous due to the conserved SUSHI domains but are not orthologs. Based on these multiple alignments and syntenic relationships, we hypothesize that the chicken C4BPS gene encodes the true ortholog of human C4BPA. A candidate C4BPM ortholog is found in avian and crocodilian species (Fig. 6), indicating that this gene is conserved in archosaurs. There are at least two possible explanations for this difference between the mammalian and archosaur sequences in this genomic region. It is possible that the genome sequence is inaccurate within this region across multiple species, and future efforts will identify C4BPM in mammals and CD46 in archosaurs. Alternatively, the region between CR1 and CD34 may have undergone different selective pressures following the divergence that gave rise to the mammals and archosaurs.
Multiple approaches using independent sources of information were used to identify the A blood system candidate gene. The low-pass (4×) sequences of WL individuals having known A allelic differences focused scrutiny on a 400,000-bp region (2.4 to 2.8 Mb) on chromosome 26. This same region was confirmed in another GWAS using a 54K SNP chip with a different chicken breed (Rhode Island Red). These two independent sample sets added confidence to the identification of the 2.4- to 2.8-Mb region on chromosome 26 as containing the A system candidate gene. The top SNPs (greatest frequency difference between phenotypic classes) from these two independent studies were either close to or within the C4BPM gene. The examination of genomic sequence information from inbred lines with known serologically identified A system alleles provided additional confirmatory information. The predicted molecular mass of the C4BPM protein based on amino acid constitution is 50 kDa, which is similar to the estimate of 53–54.5 kDa for the A system Ag based on SDS-PAGE (20), thus adding further support to C4BPM as the A system candidate protein. Although there were minor inconsistencies between serological types reported and C4BPM haplotypes, these could be attributed to lack of reagents to detect rare alleles.
The known close linkage between the A and E systems narrowed the genomic region to be investigated as the source of the E system candidate gene. Although we greatly expanded the region of interest to encompass 9 cM surrounding C4BPM, the only SNPs that fit both 4× sequence and 54K SNP data of different samples were near FCAMR. This gene is located 89,000 bp from C4BPM, which is estimated as 1.04 cM, and thus within the range estimated as separating the A and E blood systems (22–24). The gene lies outside the defined RCA but it is within the same syntenic region as in humans. Multiple sources showed consistency in the FCAMR haplotype and E system alleles. However, there were inconsistencies between FCAMR haplotype and E system serological alleles previously reported for some of the experimental lines. It is noteworthy that serological identity of alleles requires the use of the same alloantisera and testing under the same conditions. There is no record of an international collaboration within which fresh blood cells and E blood system specific reagents from multiple laboratories were shared as has been reported for MHC-B system comparisons (38, 55–57). The A system–specific mAb was essential in obtaining additional samples with A system allele variation information, and in combination with past serological information allowed the assignment of specific C4BPM haplotypes to A system serological alleles. The lack of serological reagents for the E system combined with the limited number of samples possessing previously defined E allelic variants hindered further confirmation of the E system candidate gene.
Two different threading models, Raptor and SWISS-MODEL, were used to produce three-dimensional models of C4BPM. Both models produced similar protein structures, suggesting that the modeling is likely a good representation of the C4BPM protein. The mapping of the possible amino acid variants as defined by the detected SNP variants shows that most protein changes occurred on one edge of the molecule, rather than randomly dispersed on all sides. This finding suggests that this area of the protein is either not constrained for functionality or that this area is exposed for Ab elicitation during alloantisera production. The P141R variant, hypothesized to be the epitope responsible for the ISU-cA binding, is also on the same edge of the protein as the other variants. Comparison of structural models differing for the P141R variant does not indicate any change in the protein structure.
Whether the variations identified within the chicken A and E blood systems are associated with disease resistance is unknown, but note that the A and E system alleles reported in the present study are from chicken lines selected for high production (HYL) or from experimental lines, neither of which has been studied extensively for disease. The observation that variants in genes within the RCA can impact pathogen infections in both humans and chickens is intriguing and suggests that more in-depth studies of the role of these genes in the immune system are warranted.
Better understanding of the immune response of chickens is important for multiple reasons. Chickens are excellent animal models in diverse disciplines including embryology, genetics, physiology, immunology, virology, and host–pathogen interactions. Furthermore, they provide high-quality protein for human consumption in the form of meat and eggs, in production systems with decreasing availability of both antibiotics and antiparasitic medications. The results of this study provide new opportunities for understanding the impact of variation in complement-related genes for immune response and disease resistance, with relevance to both humans and chickens.
Acknowledgements
We are indebted to Rene Kopulos and Linda Yates who isolated and cataloged DNA of known blood system reactivities and provided the samples for this study, to Susan Lamont who provided the ISU-cA Ab, and to Fiona McCarthy who updated the nomenclature for both candidate genes. The assistance of Jacqueline Smith and Jeb Owen with manuscript revisions is greatly appreciated. The authors also acknowledge GenCove for providing the 1× and 4× genome sequences. Molecular graphics and analyses were performed with UCSF Chimera, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco.
Footnotes
This work was supported by the National Institute of Food and Agriculture, the U.S. Department of Agriculture National Research Support Program-8 (NRSP-8) coordinators, Hy-Line International, and the West Virginia University Elwood and Ruth Briles Avian Alloantigen Research Fund.
Abbreviations used in this article:
- C4BPA: α-chain of the C4-binding protein
- C4BPB: β-chain of the C4-binding protein
- C4BPM: complement component 4 binding protein, membrane
- CR1L: complement receptor 1–like
- FCAMR: Fc fragment of IgA and IgM receptor
- GO: Gene Ontology
- GWAS: genome-wide association study
- NCBI: National Center for Biotechnology Information
- NIU: Northern Illinois University
- RCA: regulator of complement activation
- RIR: Rhode Island Red
- SNP: single-nucleotide polymorphism
- UTR: untranslated region
- WL: White Leghorn
- WPR: White Plymouth Rock
Disclosures
The authors have no financial conflicts of interest.