A homologue of complement component C3 (SpC3) has been cloned and sequenced from the purple sea urchin, Strongylocentrotus purpuratus. The deduced size of the preprocessed protein is 186 kDa, comprising a short leader and two chains, α and β. Cysteines occur in conserved positions for interchain disulfide bonding, and the α-chain contains a conserved thioester site with an associated histidine. There are five consensus N-linked glycosylation sites, as well as putative cleavage sites for factor I and C3 convertase. On protein gels, partially purified SpC3 has a nonreduced size of 210 kDa and, under reducing conditions, resolves into an α-chain of 130 kDa and a β-chain of 80 kDa. These sizes are larger than the deduced sizes, suggesting that carbohydrates are added to most of the consensus N-linked glycosylation sites. Phylogenetic analysis of SpC3 and other members of the thioester protein family, which includes C3, C4, C5, and α2-macroglobulin, places SpC3 at the base of the complement protein clade as its earliest-diverging member. Transcripts from the SpC3 gene (Sp064) are 9 kb, and the gene is expressed specifically in coelomocytes, the immunocytes of the sea urchin. Genome blots suggest that SpC3 is encoded by a single-copy gene per haploid genome. This is the first identification of a complement component in an invertebrate, and it suggests that the innate immune system is homologous within the deuterostome lineage of animals.
The immune response in higher vertebrates is composed of a complex of inter-regulated systems, including both adaptive and innate reactivities to foreign Ags, that are mediated by cellular and humoral systems. One of the major effector arms of the immune response is complement, which is composed of about 30 distinct humoral and cell surface proteins. The complement system comprises three convergent pathways (classical, alternative, and lectin) that amplify the initiating signal through feedback systems of serine protease activities. The pathways converge to activate the terminal, or lytic, cascade, which destroys foreign cells through formation of the membrane attack complex. The activities of these cascades are controlled by additional regulatory proteins. The classical cascade is initiated through specific Ag-Ab interactions and can be considered an effector system of adaptive immunity. The alternative cascade, part of the innate response, is initiated by complement component C3, which undergoes a constant, low-level spontaneous autoactivation reaction that enables it to bind covalently to nearby molecules (reviewed in Ref. 1). The lectin pathway, also part of the innate system, is initiated by the mannan-binding lectin, which interacts with mannose sugars on cell surfaces. It functions in place of C1q (2, 3), and the mannan-binding lectin-associated protease functions in place of C1s and C1r to activate either C4 of the classical pathway (4, 5, 6) or C3 of the alternative pathway (7).
Complement component C3, the central component of all three cascades, is the most versatile and multifunctional of the complement proteins (8). In addition to its role in the complement cascades, it interacts with cell surface proteins on self and foreign cells to initiate and augment immune responsiveness, and it interacts with an array of proteins that regulate its activity (9, 10, 11). C3 also functions as an opsonin and is important in immune surveillance and host protection against microbial infection, augmenting the removal and destruction of invading pathogens by phagocytic cells bearing receptors for C3b. Furthermore, the C3 breakdown product C3d, when bound to an Ag, is remarkably effective in augmenting the immunogenicity of foreign Ags to activate a specific immune response (12). Clearly, C3 functions effectively in both the adaptive and innate immune responses and acts to link them. Yet the complex and essential activities of C3 in higher vertebrates make it difficult to study each C3 function independently of the others. One means of understanding the complexities and evolution of the higher vertebrate complement system, and of the immune system in general, has been to investigate animals that are phylogenetically related yet have simpler immune responses. The deuterostome lineage of animals is composed of two major phyla: the Chordata, which includes mammals, and the Echinodermata, which includes sea urchins, sea stars, sea cucumbers, and other groups. Because of this relationship, echinoderms are an appropriate choice for studying innate immune responses without the added complexity of interactions between the innate and adaptive systems.
Coelomocytes are cells found in the coelomic cavity of the adult sea urchin that function as mediators of the immune system (for review, see Ref. 13 and references cited therein). Previously, we found that small doses of LPS significantly activate coelomocytes from the purple sea urchin, Strongylocentrotus purpuratus (14). This study was followed by an investigation of genes expressed in LPS-activated coelomocytes (15). Of the 307 expressed sequence tags (ESTs) that were reported, one (EST064) encoded an amino acid sequence with significant similarity to the thioester family of complement proteins. We report in this work the complete sequence of the EST064 cDNA, hereafter called Sp064, which encodes a homologue of complement component C3, called SpC3. The Sp064 gene appears to be single copy per haploid genome, and transcripts are present only in coelomocytes. The encoded protein, SpC3, is found in the coelomic fluid and is composed of two chains. This is the first identification of a complement component homologue expressed in an invertebrate. Because sea urchins and vertebrates are both deuterostomes, these data indicate that a simple complement system was already present in the deuterostome ancestor, rather than arising first in the vertebrate ancestor, and that complement is a far more ancient system than had been previously assumed.
Materials and Methods
Isolation of total RNA from coelomocytes and other adult tissues was done according to Smith et al. (16). Briefly, coelomic fluid (40 ml) was poured through sterile cheesecloth and mixed into 10 ml of cold Ca2+- and Mg2+-free sea water (17) containing 30 mM EDTA, pH 7.4 (CMFSW-E). Coelomocyte pellets and minced solid tissues were vortexed and homogenized with a Dounce homogenizer in guanidinium thiocyanate extraction buffer (5 M guanidinium thiocyanate, 50 mM sodium acetate, 50 mM EDTA, 50 mM Tris, pH 7.4, and 5% β-mercaptoethanol), to which N-lauroyl sarcosine was added to a final concentration of 2%. Total RNA was pelleted through a cushion of 5.7 M CsCl containing 50 mM sodium acetate and 50 mM EDTA at 100,000 × g in either a Ti60 fixed-angle rotor (Beckman Instruments, Fullerton, CA) or a swinging-bucket rotor (Sorvall, Newtown, CT) at 20°C for 20 h. Pellets were washed in 70% ethanol, resuspended in RNase-free water, extracted with phenol/sevag (1:1) (sevag is 24 parts chloroform to 1 part isoamyl alcohol), precipitated, and resuspended in RNase-free water. Poly(A)+ RNA was isolated using oligo(dT) magnetic beads (Dynal, Great Neck, NY).
cDNA library construction
The LPS-activated coelomocyte cDNA library was constructed using the Time Saver cDNA kit (Pharmacia, Piscataway, NJ) according to the kit instructions. The first strand of cDNA was synthesized from poly(A)+ RNA with a random primer containing a NotI site. An EcoRI adapter was ligated to the 5′ end, and the cDNA was directionally cloned into λExCell phage arms (Pharmacia), as previously reported (15). An arrayed cDNA library was directionally constructed from nonactivated coelomocyte poly(A)+ RNA using the same NotI/random primer, but with a SalI adapter at the 5′ end. The cDNA was ligated into the pSPORT vector (Life Technologies, Grand Island, NY) and electroporated into DH10B bacteria. Individual colonies were inoculated into 384-well microtiter plates according to the methods of Maier et al. (18). The arrayed library consisted of approximately 92,160 clones. For screening, PCR products of the clone inserts were spotted in duplicate onto five Hybond N+ nylon filters (22 cm × 22 cm; Amersham, Arlington Heights, IL), with 18,432 clones per filter.
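The library figures are internally consistent: 92,160 clones fill exactly 240 plates of 384 wells, and 18,432 is exactly one-fifth of 92,160. A minimal sketch of that bookkeeping, assuming (our reading, not stated explicitly by the authors) that the clones are divided evenly among the five filters; `library_layout` is a hypothetical helper name.

```python
# Sketch: consistency check of the arrayed-library numbers quoted above.
# Assumption (not stated by the authors): clones are divided evenly among
# the five filters. library_layout is a hypothetical helper.
WELLS_PER_PLATE = 384


def library_layout(total_clones: int, n_filters: int) -> dict:
    """Plates needed, clones per filter, and plates per filter."""
    full_plates, leftover = divmod(total_clones, WELLS_PER_PLATE)
    per_filter, remainder = divmod(total_clones, n_filters)
    return {
        # a partially filled plate still counts as one plate
        "plates": full_plates + (1 if leftover else 0),
        "clones_per_filter": per_filter,
        "even_split": remainder == 0,
        "plates_per_filter": per_filter / WELLS_PER_PLATE,
    }


print(library_layout(92_160, 5))
```

For 92,160 clones and five filters this gives 240 plates and 18,432 clones (48 plates' worth) per filter.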
Screening cDNA libraries
The λExCell library was screened using 32P-labeled RNA probes that were generated according to technical information from Promega Corp. (Madison, WI) and as previously reported (16). Filters were prehybridized for 2 h in hybridization solution (50% formamide, 250 mM phosphate buffer, pH 7.4, 1 mM EDTA, 0.1% BSA, and 7% SDS), and then incubated with the probe at 42°C overnight in a rotating oven (Robbins Scientific, Sunnyvale, CA). Final washes were conducted at 68°C in 1× SSC (0.15 M NaCl and 15 mM sodium citrate, pH 7) with 1% SDS. Filters were exposed overnight to X-OMAT XAR-5 x-ray film (Eastman Kodak, Rochester, NY), and positive plaques were rescreened once to purify the phage clones. Phagemids were released according to the manufacturer’s instructions (Pharmacia), and in some cases, inserts were subcloned into Bluescript (Stratagene, La Jolla, CA).
The arrayed library was screened with a random-primed, 32P-labeled probe generated from a sequencing deletion clone that contained approximately 500 bp of the 5′ end of pExCell14 (Fig. 1). The deletion clone was first used as a PCR template with the M13-40 primer and a primer specific for Sp064 (TCTAAGCAGGTAGACAGC; see Fig. 2, nucleotides 1988 to 2001). After amplification of the insert, the 5′ polylinker was removed by EcoRI digestion, producing a 160-bp fragment, which was isolated by gel electrophoresis and electroelution. The fragment was then labeled with 32P by random priming.
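Basic properties of the Sp064-specific primer quoted above (length, GC content, reverse complement) can be checked with a few lines of code. A minimal sketch; the Wallace rule, Tm = 2(A+T) + 4(G+C) °C, is a rough rule of thumb for short oligonucleotides and is our assumption, not a melting temperature reported by the authors. The function names are illustrative.

```python
# Sketch: simple properties of the Sp064-specific PCR primer.
# The Wallace-rule Tm is an ASSUMED rule of thumb, not a value from the paper.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough Tm in degrees C: 2 per A/T plus 4 per G/C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primer = "TCTAAGCAGGTAGACAGC"
print(len(primer), gc_fraction(primer), wallace_tm(primer), reverse_complement(primer))
```

For this primer the script reports 18 nt, 50% GC, a Wallace-rule estimate of 54°C, and a reverse complement of GCTGTCTACCTGCTTAGA.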
Filters spotted with DNA from the arrayed library were prehybridized for 1 h at 65°C in 5× SSPE (20× SSPE is 3 M NaCl, 0.2 M NaH2PO4, and 20 mM EDTA, pH 7.4), 5× Denhardt’s solution (50× Denhardt’s is 1% BSA (fraction V; Sigma Chemical Co., St. Louis, MO), 1% polyvinylpyrrolidone, and 1% Ficoll type 400), 0.5% SDS, and 20 μg/ml denatured salmon sperm DNA (type VI; Sigma Chemical Co.). Denatured probe was added and allowed to hybridize for 16 h at 65°C. Filters were washed twice for 10 min each at room temperature in 2× SSPE with 0.1% SDS, followed by three washes of 15 min each in 1× SSPE, 0.1% SDS at 65°C. Filters were then mounted in Saran wrap and exposed to film for 4 h. Plasmids were isolated from positive clones for subsequent characterization.
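The prehybridization buffer above is assembled from concentrated stocks, so the volumes follow from C1V1 = C2V2. A minimal sketch, assuming a 50-ml batch, a 10% SDS stock, and a 10 mg/ml salmon sperm DNA stock; these three values are illustrative assumptions, whereas the 20× SSPE and 50× Denhardt's stocks and all final concentrations come from the text.

```python
# Sketch: stock volumes for the arrayed-library prehybridization buffer,
# via C1*V1 = C2*V2. The 50-ml batch, 10% SDS stock, and 10 mg/ml
# salmon sperm DNA stock are ASSUMPTIONS for illustration.
def stock_volume(final_conc: float, stock_conc: float, final_volume: float) -> float:
    """Volume of stock giving final_conc in final_volume (consistent units)."""
    if final_conc > stock_conc:
        raise ValueError("stock is less concentrated than the target")
    return final_conc * final_volume / stock_conc


def prehyb_recipe(final_ml: float = 50.0) -> dict:
    components = {
        # name: (final concentration, stock concentration)
        "20x SSPE": (5, 20),                        # 5x final
        "50x Denhardt's": (5, 50),                  # 5x final
        "10% SDS": (0.5, 10),                       # 0.5% final
        "salmon sperm DNA (ug/ml)": (20, 10_000),   # 20 ug/ml from 10 mg/ml
    }
    recipe = {name: stock_volume(final, stock, final_ml)
              for name, (final, stock) in components.items()}
    recipe["water"] = final_ml - sum(recipe.values())
    return recipe


for name, ml in prehyb_recipe().items():
    print(f"{name}: {ml:.2f} ml")
```

Under these assumptions a 50-ml batch uses 12.5 ml of 20× SSPE, 5 ml of 50× Denhardt's, 2.5 ml of 10% SDS, 0.1 ml of DNA stock, and 29.9 ml of water.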
Sequencing was conducted on plasmid DNA according to the dideoxynucleotide termination protocol (19) using the TaqTrack sequencing kit (Promega Corp.) and incorporating [α-35S]dATP (DuPont NEN, Boston, MA). Sequencing reactions were electrophoresed on a 6% acrylamide gel with 0.6× TBE (10× TBE is 0.9 M Tris, 0.9 M boric acid, and 20 mM EDTA, pH 8.3) running buffer, after which the gel was dried and exposed overnight to BioMax MR-1 x-ray film (Eastman Kodak). For clones longer than several hundred nucleotides, such as the EcoRI to NotI fragment of pExCell139 and all of pExCell14 (see Fig. 1), insert fragments were subcloned into Bluescript (Stratagene), and the Erase-a-Base kit (Promega Corp.) was used to create a nested set of deletion clones. Sequences were analyzed using the MacVector sequence analysis program (Eastman Kodak) and the AssemblyLIGN sequence assembly program (Eastman Kodak) on a Power Macintosh (Apple Computer, Cupertino, CA).
Protein alignment and phylogenetic analysis
The cDNA sequence was used in a BLAST search of GenBank (20), and several of the matched sequences were used to construct protein alignments with the Clustal W program (21). We defined the “best” alignments as those in which the thioester site and the junctions between chains corresponded most closely among the various proteins. We noticed that the order in which the proteins were listed slightly affected the Clustal W alignment; therefore, keeping the default parameters, we changed the input order over several runs to obtain the best alignment. These variations included 1) listing SpC3 first, followed by the other proteins arranged from lower to higher phylogenetic order, 2) keeping multiple proteins from the same species together, and 3) listing the α2-macroglobulin (α2 M) protein family last. The best Clustal W alignment was verified with a second alignment program, DNASIS version 2.1 (Hitachi, San Bruno, CA). To optimize the alignment in DNASIS, various gap penalties (5, 10, 25, 50) were tried and then kept at 5. The fixed gap penalty (10, 25, 50, 100) and floating gap penalty (10, 25, 50, 100) were adjusted separately and in combination. Alignments made with DNASIS could not be improved over those generated by Clustal W, and the results from the two programs agreed best when the DNASIS gap penalties were kept at their default settings.
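The “best alignment” criterion above can be made explicit. A minimal sketch, assuming the alignment is available as equal-length gapped strings keyed by name, that checks whether the thioester motif GCGEQ occupies the same alignment columns in every sequence that carries it; comparing this score across runs with different input orders is one possible way to rank them, not necessarily the authors' procedure. The function names and toy sequences are hypothetical.

```python
from typing import Dict, List, Optional


def motif_columns(gapped: str, motif: str) -> Optional[List[int]]:
    """Alignment columns holding the first occurrence of `motif`, ignoring gaps."""
    residue_cols = [i for i, ch in enumerate(gapped) if ch != "-"]
    start = gapped.replace("-", "").find(motif)
    if start == -1:
        return None  # motif absent from this sequence (e.g., C5 lacks GCGEQ)
    return residue_cols[start:start + len(motif)]


def motif_agreement(alignment: Dict[str, str], motif: str) -> float:
    """Fraction of motif-bearing sequences whose motif sits in the most common columns."""
    placements = []
    for seq in alignment.values():
        cols = motif_columns(seq, motif)
        if cols is not None:
            placements.append(tuple(cols))
    if not placements:
        return 0.0
    consensus = max(set(placements), key=placements.count)
    return placements.count(consensus) / len(placements)


# Hypothetical toy alignment: seq_c places the motif one column earlier.
toy = {"seq_a": "MK-GCGEQTW", "seq_b": "MKAGCGEQ-W", "seq_c": "MKGCGEQ-TW"}
print(motif_agreement(toy, "GCGEQ"))
```

A score of 1.0 means every motif-bearing sequence has the thioester site in identical columns; the toy example scores 2/3 because one sequence is shifted by a column.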
The PAUP program (version 3.1.1) (22) was used with standard parameter settings (character types set as unordered and character weights set to 1) to assemble multiple sequences from the Clustal W alignments into a phylogenetic tree. Slightly different alignments, resulting from different orderings of the proteins in Clustal W, were analyzed in PAUP to identify the shortest tree. The heuristic search method was used with various search options to obtain the shortest tree. The general search options were set to keep minimal trees only and to collapse zero-length branches. Stepwise addition was either simple or random, with seed numbers of 1, 50, and 100. Support for internal branches was assessed by bootstrapping with 1000 replicates.
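The bootstrap step can be illustrated without PAUP. A minimal sketch of Felsenstein's nonparametric bootstrap, assuming the alignment is a dict of equal-length gapped strings: each replicate draws alignment columns with replacement, and support for a branch is the fraction of replicates whose tree contains it. The tree search that PAUP runs on each replicate is represented here by a hypothetical callback, `has_branch`; `bootstrap_replicate` and `branch_support` are illustrative names, not PAUP commands.

```python
import random
from typing import Callable, Dict


def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One pseudo-alignment: columns drawn with replacement, same length as the original."""
    names = list(alignment)
    n_cols = len(alignment[names[0]])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(alignment[name][i] for i in picks) for name in names}


def branch_support(alignment: Dict[str, str],
                   has_branch: Callable[[Dict[str, str]], bool],
                   replicates: int = 1000, seed: int = 1) -> float:
    """Fraction of replicates for which has_branch(replicate) is True.

    has_branch is a hypothetical stand-in for 'search for the shortest tree on
    this replicate and report whether it contains the branch of interest'."""
    rng = random.Random(seed)
    hits = sum(bool(has_branch(bootstrap_replicate(alignment, rng)))
               for _ in range(replicates))
    return hits / replicates
```

Resampling whole columns keeps each site's pattern across taxa intact, which is why the replicate alignments remain valid input for the same tree search.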
Preparation of anti-SpC3 antiserum
A rabbit antiserum was produced against the peptide DNAKVQEEVDVSPSIGR (see Fig. 2), which was chosen from the amino acid sequence deduced from Sp064 because predictions based on the human C3 structure placed it in an exposed region of the α-chain, where it would be likely to elicit a useful anti-peptide antiserum. The peptide was synthesized on a 430A peptide synthesizer (Applied Biosystems, Foster City, CA), conjugated to keyhole limpet hemocyanin with glutaraldehyde (23), and mixed with CFA (first injection) or IFA (second and third boosts). The rabbit received three injections given weekly, after which it was bled weekly for 3 wk. The serum was collected and stored at −70°C.
Specific anti-peptide Abs were obtained by affinity chromatography. The synthetic peptide was coupled to activated CH-Sepharose 4B (Pharmacia) according to the manufacturer’s instructions. The peptide column was equilibrated with PBS containing 10 mM EDTA, and 2 ml of the rabbit antiserum containing 10 mM EDTA was passed over the column twice. Unbound protein was washed from the column with PBS, and the specific Ab was eluted with 0.1 M glycine/HCl, pH 2.5. The pH of the eluted fractions was immediately neutralized by the addition of 1 M Tris-HCl, pH 8. Affinity-purified Ab was stored at −70°C.
Purification of SpC3 protein from coelomic fluid
Sea urchin C3 was partially purified from coelomic fluid using modifications of published methods (24). Coelomic fluid was pooled from several sea urchins, and EDTA (3 to 15 mM, final concentration), PMSF (2 mM, final concentration; Sigma Chemical Co.), and pepstatin A (1 to 100 μM, final concentration; Sigma Chemical Co.) were added. Coelomocytes were pelleted, and the cell-free fluid was stored at −70°C until further purification. Forty milliliters of coelomic fluid were concentrated to 2 ml with an Amicon filter (10-kDa cut-off; Amicon, Bedford, MA), and the sample was passed through a PD-10 gel filtration column (Pharmacia) to exchange the buffer to 10 mM phosphate, pH 7.5. The concentrated coelomic fluid was brought to 4% polyethylene glycol, stirred for 30 min at 4°C, and centrifuged at 15,000 × g for 20 min. The supernatant was brought to 16% polyethylene glycol, stirred at 4°C for 30 min, and centrifuged as before. The pellet was resuspended in 10 mM phosphate buffer, pH 7.5, and applied to a Mono Q HR 5/5 anion exchange chromatography column (Pharmacia) that had been equilibrated with the same buffer. Bound proteins were eluted with a linear salt gradient (0–500 mM NaCl), and fractions containing SpC3 were identified by gel electrophoresis and Western blotting with the affinity-purified anti-SpC3 peptide Ab.
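The two polyethylene glycol cuts (0 to 4%, then 4 to 16%) involve a small mixing calculation if PEG is added as a concentrated solution. A minimal sketch, assuming a 50% (w/v) PEG stock; the text does not say whether PEG was added as a solution or as a solid, so the stock concentration and the helper `peg_stock_to_add` are illustrative assumptions. It solves (V·c1 + x·S)/(V + x) = c2 for the stock volume x.

```python
# Sketch: volume of a concentrated PEG stock needed to raise a sample from one
# PEG concentration to another. The 50% (w/v) stock is an ASSUMPTION; the text
# does not say whether PEG was added as a solution or as a solid.
def peg_stock_to_add(volume_ml: float, current_pct: float, target_pct: float,
                     stock_pct: float = 50.0) -> float:
    """Solve (V*c1 + x*S) / (V + x) = c2 for x, the stock volume to add."""
    if not current_pct <= target_pct < stock_pct:
        raise ValueError("need current <= target < stock concentration")
    return volume_ml * (target_pct - current_pct) / (stock_pct - target_pct)


v0 = 2.0                              # concentrated coelomic fluid, ml
x1 = peg_stock_to_add(v0, 0.0, 4.0)   # first cut: 0% -> 4% PEG
v1 = v0 + x1                          # supernatant volume (pellet volume ignored)
x2 = peg_stock_to_add(v1, 4.0, 16.0)  # second cut: 4% -> 16% PEG
print(f"add {x1:.3f} ml, then {x2:.3f} ml of 50% PEG stock")
```

Ignoring the small pellet volume, the 2-ml sample would need about 0.17 ml of stock for the first cut and about 0.77 ml for the second.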
Determination of NH2-terminal amino acid sequence
The NH2-terminal sequences of the protein chains were obtained by a modification of the method of Matsudaira (25), as previously described (26). Briefly, the individual SpC3 chains of the purified protein were separated by SDS-PAGE under reducing conditions, electroblotted onto ProBlott membranes (Applied Biosystems), cut out of the membrane, and subjected to Edman degradation on a model 473 protein sequencer (Applied Biosystems).
Results
Isolation and sequencing of the SpC3 cDNAs
A cDNA identified as encoding a putative sea urchin complement component was one of 307 clones partially characterized from the LPS-activated coelomocyte λExCell library, and was reported as expressed sequence tag 064 (EST064) (15). In preliminary BLAST (20) searches of GenBank, EST064 matched cobra C3 (27), human and mouse C3 (28, 29), and mouse sex-limited protein (30), as well as a few others. This result suggested that EST064 encoded a new member of the thioester gene family. The 5′ end of pExCell064 (Fig. 1) was used to rescreen the same library for additional clones. The initial screen yielded 144 clones, 43 of which rescreened positive; 12 of these were chosen for sequencing based on differences in the lengths of their 5′ ends. These lengths were determined by PCR using Sp6, a primer that hybridizes to the 5′ polylinker of the phagemid, together with an insert-specific primer that hybridizes to the 5′ end of pExCell064. Because the 5′ ends of these clones differed in length by increments of approximately 100 to 200 bp, the set formed a natural deletion series for sequencing. Three of the 12 clones (pExCell139, 063, and 054) are shown in Figure 1. The amino acid sequence deduced from the 5′ ends of these 12 clones was published previously and shown aligned with other complement protein family members (see Ref. 15, Fig. 1). However, at that time it was not clear whether SpC3 was homologous to C3, C4, or C5, although it did not appear to be α2 M (15).
Because the total sequence length of these overlapping clones (shown in Fig. 1 as pExCell054, pExCell063, pExCell064, and pExCell139) covered only about one-half of the estimated open reading frame and about one-half of the transcript, the λExCell library was screened a second time. The first 450 bp of pExCell054, the 5′-most clone of the set (Fig. 1), was used to make another riboprobe, which identified 84 clones, 18 of which rescreened positive. The clone with the longest 5′ end, pExCell14 (Fig. 1), was sequenced in its entirety using nested deletion clones; however, it did not include the beginning of the open reading frame. The third screen used the arrayed coelomocyte library constructed by Jonathan Rast and Eric Davidson at the California Institute of Technology (Pasadena, CA). The 32 positive clones obtained from this screen were analyzed by PCR, using the Sp6 and T7 primer sites in the vector to determine insert sizes, and the Sp6 primer in combination with an internal primer specific for the 5′ end of pExCell14 to determine the lengths of the 5′ ends. These PCR-amplified fragments were also checked on Southern blots using the same probe as that used to screen the library. Analysis of the 5′ ends of pSPORTA22/137, pSPORTG11/211, and pSPORTJ17/96 showed an additional 2 kb of sequence 5′ of pExCell14. Sequence analysis of two clones, pSPORTA22/137 and pSPORTJ17/96 (Fig. 1), indicated that they included the 5′ end of the open reading frame and a short stretch of the 5′ untranslated (UT) region.
The complete sequence of Sp064 and the deduced protein are shown in Figure 2. The overlapping cDNAs contain 7611 nucleotides (nt); because the transcript is 9 kb by RNA gel blot (see below), they lack much of the 5′ UT region and part of the 3′ UT region. Because a random primer was used to construct the cDNA library, the poly(A) tail was not expected to be present in any clone. However, a consensus polyadenylation signal, GATAAA, is located 93 nt from the 3′ end (Fig. 2), which suggests that most of the 3′ UT region is present. The sequence comprises 129 nt of 5′ UT region (based on the most probable start site), 5097 nt of open reading frame, and 2385 nt of 3′ UT region. There are three possible start sites near the beginning of the sequence shown in Figure 2. The first ATG is followed by two in-frame stop codons, whereas the second and third are in frame and are not followed by stops. Because the third ATG, at nt 130, lies within a Kozak sequence, ACCATGG, it is the most probable translation start site. The start site is followed by a hydrophobic region of 12 amino acids plus a serine, and then by a hydrophilic region that includes serine, proline, and glycine (31). This combination of a short hydrophobic region followed by a hydrophilic region is typical of a leader, or signal, sequence; the predicted length of 13 amino acids is based on the “[-3,-1]-rule” (32). A leader is expected because SpC3 is produced in coelomocytes and appears to be secreted into the coelomic fluid (see Fig. 5; Gross and Smith, unpublished). There are six ATTTA repeats in the 3′ UT region (Fig. 2). These AU-rich repeats are typical of transcripts that encode inducible genes and may function to stabilize the transcript after induction, resulting in increased translation (33).
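The region lengths and sequence features described above lend themselves to simple checks. A minimal sketch: the first two assertions use only numbers from the text, and the helpers scan a sequence for ATG codons, test a simplified Kozak context (A/G at −3 and G at +4 relative to the A of ATG, an assumed simplification of the full consensus), and count ATTTA pentamers. The toy sequence is hypothetical, not Sp064, and whether the authors counted overlapping ATTTA copies is not stated.

```python
import re
from typing import List

# Region lengths reported for the assembled Sp064 cDNA.
UTR5, ORF, UTR3 = 129, 5097, 2385
assert UTR5 + ORF + UTR3 == 7611   # matches the 7611-nt assembly
assert ORF % 3 == 0                # 1699 codons; the start ATG is at nt UTR5 + 1 = 130


def atg_positions(seq: str) -> List[int]:
    """1-based positions of every ATG."""
    return [m.start() + 1 for m in re.finditer(r"(?=ATG)", seq)]


def strong_kozak(seq: str, pos: int) -> bool:
    """Simplified Kozak test: A/G at -3 and G at +4 relative to the A at 1-based pos."""
    i = pos - 1
    if i < 3 or i + 3 >= len(seq):
        return False
    return seq[i - 3] in "AG" and seq[i + 3] == "G"


def count_attta(seq: str) -> int:
    """Number of ATTTA pentamers, counting overlapping copies."""
    return len(re.findall(r"(?=ATTTA)", seq))


toy = "CCATGAACCATGGCAATTTATTTAA"   # hypothetical, not Sp064
print(atg_positions(toy), [strong_kozak(toy, p) for p in atg_positions(toy)], count_attta(toy))
```

On the toy sequence only the second ATG, which sits in an ACCATGG context like the third ATG of Sp064, passes the Kozak test.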
The open reading frame encodes a protein of 1699 amino acids. The cleavage site for processing SpC3 into two chains before secretion begins at Arg686 and consists of RRKR (Fig. 2). The β-chain has 672 amino acids (without the leader), and the α-chain has 1010 amino acids after removal of the RRKR at the βα junction. The deduced molecular mass of preprocessed SpC3 is 186 kDa; after processing, the α-chain is predicted to be 110 kDa and the β-chain 73.5 kDa. These predicted masses do not take into account N-linked glycosylation, which is known to occur on human C3. There are five consensus recognition sequences for N-linked glycosylation in SpC3, four of which are located in the α-chain (Fig. 2); however, none is conserved in position relative to the glycosylation sites in human C3 (34). The deduced isoelectric point of SpC3, as calculated by DNASIS, is 4.76, which is substantially lower than the range of isoelectric points (5.87 to 7.99) calculated for the other C3, C4, and C5 proteins listed in Table I.
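Consensus N-linked glycosylation sites are conventionally identified with the N-X-S/T rule, where X is any residue except proline; the text does not state the exact criterion used, so this is an assumption. A minimal sketch of such a scan, together with the α-chain boundary implied by the residue counts above (13-residue leader, 672-residue β-chain, RRKR at 686 to 689, hence α-chain residues 690 to 1699). The function names and toy peptide are hypothetical, and a consensus sequon marks a potential, not a confirmed, glycosylation site.

```python
import re
from typing import List

# N-X-S/T with X any residue except Pro; the lookahead lets sites overlap.
SEQUON = re.compile(r"(?=N[^P][ST])")

# From the text: 13-residue leader + 672-residue beta-chain = 685 residues,
# RRKR at 686-689, so the alpha-chain spans residues 690-1699 (1010 residues).
ALPHA = range(690, 1700)
assert len(ALPHA) == 1010


def sequon_sites(protein: str) -> List[int]:
    """1-based positions of Asn residues in consensus N-glycosylation sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def sites_in_alpha(sites: List[int]) -> List[int]:
    """Subset of sequon positions falling in the alpha-chain."""
    return [s for s in sites if s in ALPHA]


toy = "MNASNPTQNGTNNST"   # hypothetical peptide, not SpC3
print(sequon_sites(toy))
```

Applied to the full deduced SpC3 sequence, a scan of this kind would be expected to reproduce the five sites reported, with `sites_in_alpha` separating those in the α-chain.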
| Protein, Species | Accession No. | % Identity | % Similarity Plus Identity |
|---|---|---|---|
| C3, guinea pig | sp‖P12387 | 27.5 | 41.1 |
| α2M, guinea pig | pir‖JC5143 | 20.4 | 37.6 |
| α1 inhibitor, rat | sp‖P14046 | 21.4 | 37.3 |
The Clustal W alignment program (21) was used to align SpC3 to each of the thioester proteins listed. The percent identical and percent similar plus identical amino acids were calculated based on the SpC3 length of 1699 amino acids. sp, Swiss Protein database; pir, PIR database; gb, GenBank database.
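The legend normalizes both percentages to the full SpC3 length (1699 residues) rather than to the length of the aligned region. A minimal sketch of that calculation for one pairwise alignment, assuming gapped strings of equal length; the legend does not say how “similar” was defined, so the Clustal-style strong residue groups below are an assumption, and the helper names are hypothetical.

```python
from typing import Tuple

# Clustal-style "strong" residue groups, used here as an ASSUMED definition of
# "similar"; the legend does not say which criterion was applied.
STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW")


def is_similar(a: str, b: str) -> bool:
    return any(a in g and b in g for g in STRONG_GROUPS)


def identity_similarity(query: str, subject: str, query_length: int) -> Tuple[float, float]:
    """Percent identical and percent similar-plus-identical aligned positions,
    normalized to the full query length (1699 for SpC3), not the alignment length."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must be the same length")
    identical = similar = 0
    for a, b in zip(query, subject):
        if a == "-" or b == "-":
            continue  # gap columns count toward neither total
        if a == b:
            identical += 1
        elif is_similar(a, b):
            similar += 1
    return (100 * identical / query_length,
            100 * (identical + similar) / query_length)


print(identity_similarity("MKV-LS", "MRIALT", query_length=5))
```

Because the denominator is fixed at the SpC3 length, SpC3 residues with no counterpart in a shorter subject lower both percentages, which is relevant to the comparison with the shorter α2 M proteins.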
SpC3 protein alignments to other thioester family proteins
Regions of the deduced SpC3 protein that are similar to other complement proteins are best identified through amino acid alignments. The BLAST search provided a list of the proteins that SpC3 matched best, and we chose a subset of these, including several complement proteins, to construct an alignment (Fig. 3). Inspection of Figure 3 reveals a number of regions in SpC3 that are conserved relative to the other complement sequences, and other regions that are not. All proteins in Figure 3, including SpC3, show a βα junction, whereas only the C4 proteins and the cyclostome C3 components show an αγ junction. A conserved thioester site (GCGEQ) is located in the SpC3 α-chain; it is identical to that in vertebrate C3 and C4 proteins but is absent from C5. The histidine involved in thioester binding to hydroxyl groups (35), located about 100 amino acids C-terminal to the thioester site (H1090 in Fig. 2), is conserved in SpC3 (Fig. 3). The hydrophobic region surrounding the thioester site, which is thought to shield the thioester from the aqueous environment and from nucleophilic attack (36), is also conserved in SpC3. In vitro mutagenesis experiments have shown that the two prolines flanking the thioester in human C3 are necessary for stable formation of the activated thioester (37), and these positions are conserved in SpC3 (Fig. 3). This comparison of SpC3 with vertebrate complement components indicates that it is a two-chain protein with a conserved thioester site similar to that of other C3 proteins.
Complement proteins are folded and held together by disulfide bonds. In human C3, there are 27 cysteines that align with cysteines in the other members of the thioester complement protein family. SpC3, however, has 32 cysteines, and we examined how these aligned with the cysteines in the other members of the thioester protein family. Clustal W alignments between SpC3 and the other complement components (Fig. 3) showed that five of the SpC3 cysteines are located in the β-chain, three of which align with the three cysteines conserved in the β-chain of all the other components. In the α-chain, 18 of the 27 cysteines align with conserved cysteines in other components. The two cysteines involved in interchain disulfide bonding in the human sequence (38) align between the sea urchin sequence and all of the other proteins (Fig. 3). To determine whether the nonaligned cysteines in SpC3 reflected genuine differences between the protein sequences or artifacts of the Clustal W algorithm, we performed additional alignments to inspect the positions of the SpC3 cysteines. Both Clustal W and DNASIS were used to align the entire thioester protein family and a subset that included only the complement proteins. The Clustal W alignment of the complement protein subset is shown in Figure 3, and the Clustal W alignment of the entire thioester family can be obtained by E-mail (see legend to Fig. 4; DNASIS alignments are not shown). These four alignments revealed similar, but not identical, positioning of the SpC3 cysteines relative to the other proteins. Most (22 to 24) of the SpC3 cysteines align with or near cysteines that are generally conserved in the complement protein family. There are four to five positions at which SpC3 lacks a cysteine where the other proteins have a conserved one, and SpC3 has seven to nine extra cysteines that do not align with cysteines in other proteins.
In no case did any of the extra cysteines in SpC3 align with cysteines in the α2 M group rather than the complement group. These data suggest that SpC3 may have a folding pattern similar, but perhaps not identical, to that of the other complement proteins.
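The cysteine comparison above amounts to a column-by-column tally over a multiple alignment. A minimal sketch, assuming equal-length gapped strings and an arbitrary rule (ours, not the authors') that a column holds a conserved cysteine when at least half of the other non-gap residues are C; it counts exact column matches only, which is stricter than the “with or near” criterion in the text. The function name and toy alignment are hypothetical.

```python
from typing import Dict


def cysteine_tally(alignment: Dict[str, str], query: str,
                   threshold: float = 0.5) -> Dict[str, int]:
    """Classify query cysteines column by column against the other rows.

    A column counts as a conserved-Cys column when at least `threshold` of the
    other (non-gap) residues are C. The 0.5 cutoff is an arbitrary assumption."""
    q = alignment[query]
    others = [seq for name, seq in alignment.items() if name != query]
    counts = {"aligned": 0, "extra": 0, "missing": 0}
    for col, aa in enumerate(q):
        residues = [seq[col] for seq in others if seq[col] != "-"]
        conserved = bool(residues) and residues.count("C") / len(residues) >= threshold
        if aa == "C":
            counts["aligned" if conserved else "extra"] += 1
        elif conserved:
            counts["missing"] += 1   # query has another residue or a gap here
    return counts


toy = {"SpC3": "ACGCA-C", "hC3": "ACGAACA", "mC3": "ACGAACA"}
print(cysteine_tally(toy, "SpC3"))
```

Running the tally on each of the alternative alignments and comparing the counts gives a quick measure of how sensitive the cysteine assignments are to the alignment method.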
The human C3 protein is the most fully characterized of the complement proteins, and its functional regions have been mapped using a variety of techniques (reviewed in Ref. 10). These functionally mapped regions were taken from Figure 4 in Lambris et al. (36) and are indicated in Figure 3 for comparison between SpC3 and human C3. There are nine matches between SpC3 and human C3 in a span of 43 amino acids (20.9%) within the region in which human C3 interacts with the C3a receptor, CR1 and CR3, factor H, and factor B. This region is generally not well conserved in vertebrate C3 proteins (8), except for the C3 convertase cleavage site (Arg, Ser), which is conserved in most C3 proteins, including SpC3. The second factor H binding site, located farther along the α-chain and containing the CR2 binding site, has 18 amino acids in SpC3 that match human C3 in a span of 76 (23.7%). Eleven of thirty-two amino acids (34.4%) match between SpC3 and human C3 within the properdin binding site, which is known to be a region of high conservation (8). Five factor I cleavage sites, positioned relative to the human C3 sequence (10), are labeled in Figure 3. None of these sites matches SpC3 perfectly; however, two Arg/Ser sequences in SpC3 align near factor I sites 1 and 5 (Fig. 3). In general, the alignment with other complement proteins and the comparison with functional sites in human C3 suggest that the sea urchin protein may have a different folding pattern and fewer functions, and may interact with fewer proteins than human C3 is known to.
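Each regional percentage above is the number of matching residues divided by the length of the mapped span. A minimal sketch recomputing the three values with conventional half-up rounding to one decimal place; the region labels are shortened paraphrases and `percent_match` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent_match(matches: int, span: int) -> Decimal:
    """Percentage of matching residues in a region, rounded half-up to one decimal."""
    return (Decimal(100) * matches / span).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


regions = {
    "C3a receptor / CR1 / CR3 / factor H / factor B region": (9, 43),
    "second factor H site (contains CR2 site)": (18, 76),
    "properdin binding site": (11, 32),
}
for name, (m, n) in regions.items():
    print(f"{name}: {m}/{n} = {percent_match(m, n)}%")
```

`Decimal` is used so that exact ties such as 11/32 = 34.375% round predictably.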
Phylogenetic relationships between SpC3 and other thioester family proteins
Our first approach to understanding the phylogenetic relationships between SpC3 and other thioester family proteins was to generate pairwise alignments between SpC3 and 25 sequences that included several C3, C4, C5, and α2 M proteins from a number of species. We then calculated the percentage of amino acids that were identical, and the percentage that were identical plus similar, between each pair of proteins. The results, shown in Table I, indicate that SpC3 showed greater similarity to the complement proteins (23.1–27.9% identical; 41.1–44.9% similar) than to the α2 M family (20.4–21.4% identical; 36.6–39.2% similar), in general agreement with our previous report (15). However, these results could have been due to the shorter length of the α2 M proteins rather than to fewer amino acid identities and similarities between SpC3 and the α2 M proteins. We therefore generated a large alignment of all of the sequences listed in Table I with the Clustal W program (21) (available by E-mail, see legend to Fig. 4). The alignment was inspected at each amino acid position, and SpC3 was scored as matching each group of proteins according to the number of sequences in that group to which it matched: C3 when it matched six of the nine vertebrate C3 sequences; C4 when it matched four of the five C4 sequences; C5 when it matched both C5 sequences; and α2 M when it matched six of the nine α2 M sequences. By this measure, SpC3 was equally similar to all of the complement components (19%) and less similar to the α2 M proteins (15%). A similar result was obtained when matches were calculated for a shorter region corresponding only to the length of the α2 M proteins (20% similar to the complement components and 17% similar to the α2 M group). These alignment analyses suggest that SpC3 is a complement protein and not an α2 M protein.
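The group-scoring rule above can be written as a per-column threshold test. A minimal sketch, assuming all rows come from one multiple alignment and that the percentage is taken over non-gap SpC3 positions; the text does not state the denominator, so that choice is an assumption. The thresholds are those quoted in the text; the function name and toy data are hypothetical.

```python
from typing import Dict, List

# Thresholds quoted in the text: matches needed within each group.
THRESHOLDS = {"C3": 6, "C4": 4, "C5": 2, "a2M": 6}   # of 9, 5, 2, and 9 sequences


def group_match_fraction(query: str, groups: Dict[str, List[str]],
                         min_matches: Dict[str, int]) -> Dict[str, float]:
    """Fraction of non-gap query columns at which the query residue is identical
    to at least min_matches[g] rows of group g (all rows from one alignment).
    Using non-gap query columns as the denominator is an ASSUMPTION."""
    positions = [i for i, aa in enumerate(query) if aa != "-"]
    fractions = {}
    for name, rows in groups.items():
        hits = sum(1 for i in positions
                   if sum(row[i] == query[i] for row in rows) >= min_matches[name])
        fractions[name] = hits / len(positions)
    return fractions


toy_groups = {"G1": ["ACDE", "ACDF", "AC-E"], "G2": ["AAAA", "CCCC"]}
print(group_match_fraction("ACDE", toy_groups, {"G1": 2, "G2": 1}))
```

Requiring a majority of each group to agree keeps a single divergent sequence from inflating or deflating a group's score.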
Because the paired alignments (Table I) were not informative about the relationships between SpC3 and the other complement protein family members, a phylogenetic analysis was done using the PAUP program (22). This program compares and assembles related sequences into phylogenetic trees that can then be used to infer evolutionary relationships. We used the same alignment of SpC3 and the 25 other thioester proteins listed in Table I for the PAUP analysis. The shortest tree (length = 13,614) was recovered repeatedly when several variations of the Clustal W alignment were analyzed with PAUP. Because of the lower percentage identity between SpC3 and the α2 M group (Table I), these proteins were selected as the outgroup. The phylogenetic tree (Fig. 4) shows that SpC3 is positioned basal to the complement clade, which includes the C3, C4, and C5 proteins. Furthermore, the branching arrangement within the vertebrate complement clade generally agrees with other published trees (26, 39, 40); however, the internal positions of some proteins differed in our trees depending on the protocols used. For example, the hagfish component clustered with either the C3 or the C4 clade in different analyses. Note that the hagfish C3 position in Figure 4 is not supported by bootstrapping, and the position of lamprey C3 is poorly supported. The important result revealed in Figure 4 is that SpC3 appears as the first-diverging member of the complement protein family.
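In this parsimony framework, tree length is the number of character-state changes a tree requires, summed over alignment columns, with unordered characters of weight 1 as set in the Methods. For a fixed topology, Fitch's algorithm computes that count; a heuristic search then looks for the topology that minimizes it. A minimal sketch, assuming rooted binary trees written as nested tuples and treating gaps as an ordinary state (PAUP can instead treat gaps as missing data, so lengths would differ); the function names and toy data are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]   # a leaf name or a pair of subtrees


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch small-parsimony pass for one character on a rooted binary tree.
    Returns (candidate state set at this node, changes required below it)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, alignment: Dict[str, str]) -> int:
    """Parsimony length: Fitch changes summed over columns (unordered, weight 1).
    Gaps are treated as an ordinary state in this sketch."""
    n_cols = len(next(iter(alignment.values())))
    return sum(fitch_site(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(n_cols))


toy = {"a": "AC", "b": "AG", "c": "TG", "d": "TG"}   # hypothetical sequences
print(tree_length((("a", "b"), ("c", "d")), toy), tree_length((("a", "c"), ("b", "d")), toy))
```

On the toy data the grouping ((a,b),(c,d)) needs 2 changes and ((a,c),(b,d)) needs 3, so a parsimony search would prefer the first; the same comparison, over vastly more topologies, underlies the search for the 13,614-step tree.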
We also analyzed the α-chains of the thioester proteins separately because this chain contains many important functional regions. We selected a 490-amino acid region that began at the start of the α-chain, included the thioester site, and ended at the position corresponding to the end of the α2 M proteins. This region was used in a separate alignment and phylogenetic analysis, again with the α2 M proteins as the outgroup. The resulting tree differed from that in Figure 4: the complement clade consisted of six unresolved groups, 1) sea urchin C3, 2) hagfish C3, 3) lamprey C3, 4) higher vertebrate C3, 5) vertebrate C4, and 6) mammalian C5 (results not shown). This loss of resolution may reflect the high degree of sequence conservation in the α-chains of all of the complement proteins.
Analysis of SpC3 protein by SDS-PAGE and N-terminal sequencing
Because the deduced, unprocessed protein contains a single (βα) cleavage junction, SpC3 was predicted to consist of two chains. To test this prediction and to characterize the size of the protein, SpC3 was partially purified from coelomic fluid and separated by SDS-PAGE under reducing and nonreducing conditions (Fig. 5). The nonreduced protein migrates at 210 kDa (Fig. 5, lane 1), and under reducing conditions (lane 2), two chains are resolved: the α-chain (130 kDa) and the β-chain (80 kDa). These observed sizes are larger than those deduced from the cDNA sequence, suggesting that some or all of the consensus N-linked glycosylation sites are occupied during processing of SpC3 by the coelomocytes.
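Consensus N-linked glycosylation sites are conventionally defined as the sequon N-X-S/T, where X is any residue except proline. A minimal scan for such sites might look like the sketch below. The function name and the test peptide are hypothetical (the peptide is not SpC3 sequence), and a sequon marks only a potential site; it says nothing about whether the site is actually glycosylated in coelomocytes.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The zero-width
# lookahead lets overlapping sequons (e.g. in "NNST") both be reported.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of the Asn in each N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


print(find_sequons("MNGSAANPTKNNST"))
```

For this hypothetical peptide the scan reports positions 2, 11, and 12; the NPT motif at position 7 is skipped because of the proline.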
A rabbit antiserum was raised against a peptide designed from the deduced α-chain sequence (see Fig. 2). On Western blots of reducing gels, the antiserum bound to the larger chain, the α-chain (Fig. 5, lane 3). To confirm that the protein isolated from the coelomic fluid was encoded by Sp064, the N termini of both chains were sequenced. The β-chain was blocked; however, the α-chain yielded the peptide SIDRDQLXLYDP. The sequence deduced from the cDNA encoding the N terminus of the α-chain is SIDRDQLCLYDP (see Fig. 2). This match, in which the single unassigned residue (X) corresponds to a deduced cysteine, is evidence that the protein isolated from the coelomic fluid is the same as that encoded by Sp064.
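The comparison between the Edman result and the deduced sequence can be made explicit with a small wildcard match, in which an unassigned Edman cycle (X) is allowed to match any deduced residue. Cysteines are often not identified by Edman degradation unless they have been alkylated beforehand, which is consistent with the X falling at the deduced cysteine. The helpers below are hypothetical, not the authors' method, and the flanking glycines in the usage example are invented padding rather than SpC3 sequence.

```python
def matches_with_unknowns(edman: str, deduced: str, unknown: str = "X") -> bool:
    """True if every assigned Edman residue equals the deduced residue at the
    same position; residues reported as `unknown` match anything."""
    if len(edman) != len(deduced):
        return False
    return all(e == unknown or e == d for e, d in zip(edman, deduced))


def locate_peptide(edman: str, protein: str, unknown: str = "X") -> list[int]:
    """Return 1-based start positions where the Edman peptide fits the protein."""
    n = len(edman)
    return [
        i + 1
        for i in range(len(protein) - n + 1)
        if matches_with_unknowns(edman, protein[i:i + n], unknown)
    ]


print(matches_with_unknowns("SIDRDQLXLYDP", "SIDRDQLCLYDP"))
print(locate_peptide("SIDRDQLXLYDP", "GGGGSIDRDQLCLYDPGG"))
```

Scanning a full deduced sequence with `locate_peptide` also checks that the peptide fits at only one position, which is what makes a short N-terminal sequence informative.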
Sp064 gene expression in sea urchin tissues
To determine whether coelomocytes are the only tissue that expresses the Sp064 gene, we probed poly(A)+ RNA isolated from the major adult tissues (coelomocytes, ovary, testis, and gut) and detected Sp064 transcripts only in the coelomocytes (Fig. 6). The transcript is approximately 9 kb, which is longer than the 7.6 kb of overlapping cDNA sequence that we have obtained. Very weak bands appeared in all lanes after very long exposures (data not shown), with the signal in the gonads stronger than that in the gut. Although these bands suggest low-level expression of Sp064 in other tissues, coelomocytes were likely present in or on these organs at the time of dissection and RNA isolation. Because it is not possible to wash away or remove all coelomocytes from other sea urchin tissues during RNA isolation, these cells may account for the weak bands (Fig. 6). These data indicate that coelomocytes are the major, and perhaps the only, source of Sp064 expression in the adult sea urchin.
Sp064 gene copy number
In most animals, C3 is a single copy gene; however, gene duplications are known to have occurred in some organisms. Examples include C4A and C4B in humans (41), C4 and sex-limited protein in mice (30, 42), and multiple copies of the C3 gene in trout (24, 43) and cobra (27, 44). To determine whether Sp064 is present in one or multiple copies per haploid genome, we performed a genome blot. Spawning was induced in three male sea urchins by applying 15 V direct current and shaking. Sperm was collected, and DNA was isolated according to Lee et al. (45). Each sample was digested with three endonucleases (EcoRI, KpnI, and BamHI), and the blot was probed with a riboprobe corresponding to the α-chain region of the message (a 500-bp fragment from the 5′ end of pExCell054). Only one or two bands were seen in each lane, suggesting that Sp064 is a single copy gene (data not shown).
Sea urchins display a nonspecific, innate type of immune response. This was first determined from an extensive series of allograft rejection experiments, in which second-set rejections proceeded with the same kinetics as third-party rejections, and both were significantly faster than first-set rejections (46; reviewed in 16). Furthermore, general activation of the sea urchin immune response has been inferred from increases in profilin transcripts in coelomocytes responding either to injury (47) or to LPS (14). This inference is based on two observations: 1) profilin is a key regulatory protein involved in modifying the actin cytoskeleton (48), and 2) amoeboid, phagocytic cells, such as the major coelomocyte cell type, readily change their shape, i.e., their cytoskeleton, to increase motility, phagocytosis, and encapsulation when activated by immune challenge or injury.
The nonspecific, activatable immune response of the sea urchin is very effective at maintaining a healthy animal; however, little is known about the molecular mechanisms, at the level of gene expression or protein function, through which this system operates. Our preliminary study of sea urchin ESTs provided the first molecular evidence that a simple complement system exists in the sea urchin (15), and here we report the molecular characterization of SpC3. This is the first identification of a C3 homologue expressed in a sea urchin and, furthermore, the first complement component to be identified in an invertebrate. Homology of SpC3 with vertebrate C3 proteins is based on several pieces of evidence. 1) Sequence analysis reveals a βα junction, the absence of an αγ junction (which is present in C4), and the conserved thioester site (which is absent in C5). 2) Protein gel analysis shows that SpC3 is composed of two chains. The combination of a two-chain molecule with a conserved thioester site suggests that SpC3 is a C3 homologue. The identities and similarities calculated from the paired alignments in Table I, together with the analysis of the multiple alignment and the phylogenetic tree, all indicate that SpC3 is not an α2 M homologue and that it is the first diverging member of the thioester complement protein family.
In mammals, the primary site of C3 biosynthesis and secretion is the liver, with more than 90% of all C3 being produced by hepatocytes (10). However, several extrahepatic cell types also appear to produce and secrete C3, including macrophages, monocytes, fibroblasts, B lymphocytes, polymorphonuclear leukocytes, type II pneumocytes, astrocytes, and microglial cells (10, 49, 50, 51, 52, 53). These extrahepatic sites of complement production are very important in local inflammatory reactions. Because sea urchins lack an equivalent of a liver, or of the hepatopancreas found in sea stars, the specific expression of the Sp064 gene in coelomocytes suggests that these cells produce SpC3 for both local and systemic function. Although the minor bands present in all lanes in Figure 6 do not allow us to rule out expression of Sp064 in all major tissues, these bands may instead reflect coelomocytes within the tissues, because one of the functions of coelomocytes may involve patrolling and invading organs in response to microinjuries or focal sites of inflammation. This possibility has been suggested previously, based on several ESTs encoding putative proteases that may be capable of degrading the extracellular matrix (15). If so, the presence of coelomocytes in all tissues of the adult sea urchin would be revealed by low levels of Sp064 expression.
Simple complement systems have been identified in agnathan fishes and consist of a C3-like component (54, 55, 56, 57), factor B (58), and a putative complement receptor on circulating leukocytes (59). Complement in hagfish has been shown to function as an opsonin (60, 61). Although little is known about complement in tunicates, a mannan-binding lectin-associated protease homologue has been characterized from a tunicate, suggesting the presence of a lectin activation pathway (62); other complement components may therefore also be present in these animals. Preliminary sequence data from a PCR fragment have indicated that a thioester protein may be present in the compound tunicate Botryllus schlosseri (63). Based on what is known about complement function in lower vertebrates, SpC3 may function as an opsonin and be an important mechanism of host protection against pathogens.
The presence of a C3 protein in sea urchins suggests that a number of accessory and regulatory proteins may also be present in this organism. If sea urchin complement acts as an opsonin that marks foreign cells for removal and destruction by phagocytic coelomocytes, a C3 receptor should be present on these cells. Preliminary evidence for a receptor putatively associated with the sea urchin complement system has been reported previously, suggesting that it might be involved in augmented phagocytosis (64, 65, 66, 67). In vertebrates, spontaneously activated C3 can bind to self cells in the form of C3b, which is then inactivated by factor I acting with cofactors such as membrane cofactor protein, factor H, and CR1 (for review, see Refs. 68, 69). Decay-accelerating factor is an additional cell surface protein that can dissociate the complex of C3b and the Bb fragment of factor B, called C3bBb, thereby inactivating its C3 convertase activity (70). At present, predictions of regulatory proteins in the sea urchin can be based only on the few conserved cleavage sites identified in the deduced SpC3 sequence. These include sites for factor I and for C3 convertase, which is formed through an interaction between C3b and factor B. With one exception, none of these predicted proteins has yet been cloned and sequenced from the sea urchin: we have recently characterized the complete open reading frame from EST152 (15), which encodes a factor B-like protein (Smith, unpublished).
Lachmann (71) has proposed that the most primitive complement cascade, or “archeo-complement” system, would have resembled a simple alternate pathway consisting of a C3-like protein with a thioester site, a factor B-like protein containing short consensus repeats and a serine protease domain, and a complement receptor on phagocytic immune cells. Based on the sequence similarities among various complement components, it has been suggested that several of the complement protein families arose by gene duplication from a small number of primordial genes (35, 39, 72). Before the current work, the agnathans appeared to fulfill the prediction of an archeo-complement system, implying not only that the alternate cascade is more ancient than the classical cascade, but also that it was present in the common ancestor of the vertebrates. This first characterization of a C3 homologue from the Echinodermata, a phylogenetically older deuterostome phylum, now makes clear that this protein represents the first diverging complement component. Furthermore, it is interesting to consider that SpC3 may still bear some similarity to the ancestral protein of the common ancestor of the deuterostomes, which gave rise through gene duplication to the family of complement thioester proteins that function in the higher vertebrates today.
The authors thank Drs. Jonathan Rast and Eric Davidson for constructing the arrayed coelomocyte cDNA library and for screening the library to obtain additional Sp064 cDNA clones. We thank Lynn Spruce for her excellent technical assistance in running the peptide synthesizer. We are grateful to Drs. Diana Lipscomb and Marc Allard for advice on the phylogenetic analysis, to Drs. Paul Gross and Jonathan Rast for critically reading the manuscript, to Drs. Arvind Sahu and William Moore for helpful suggestions, and to the two anonymous reviewers for improvements and corrections.
This work was supported by grants from the National Science Foundation (NSF) (MCB9219330, MCB9596251, and MCB9603086) to L.C.S., and grants from NSF (MCB931911), the National Institutes of Health (AI 300040), and the Cancer Center and Diabetes Centers Core Support (CA 16520 and DK 19525) to J.D.L. GenBank accession number for the sea urchin complement component is AF025526.
Abbreviations used in this paper: EST, expressed sequence tag; α2 M, α2-macroglobulin; nt, nucleotide; UT, untranslated.