The evolution of the IFN system, the major innate antiviral mechanism of vertebrates, remains poorly understood. Based on the detection of type I IFN genes in cartilaginous fish genomes, the system appeared ∼500 My ago. However, the IFN system integrates many other components, most of which are encoded by IFN-stimulated genes (ISGs). To shed light on its evolution, we used deep RNA sequencing to generate a comprehensive list of zebrafish ISGs, taking advantage of the high-quality genome annotation in this species. We analyzed larvae after inoculation of recombinant zebrafish type I IFN or infection with chikungunya virus, a potent IFN inducer. We identified more than 400 zebrafish ISGs, defined as genes either directly induced by IFN or induced by the virus in an IFNR-dependent manner. Their human orthologs were highly enriched in ISGs, particularly among highly inducible genes. We identified 72 orthology groups containing ISGs in both zebrafish and humans, revealing a core ancestral ISG repertoire that includes most of the known signaling components of the IFN system. Many downstream effectors were also already present 450 My ago in the common ancestor of tetrapods and bony fish and diversified as multigene families independently in the two lineages. A large proportion of the ISG repertoire is lineage specific; around 40% of protein-coding zebrafish ISGs had no human ortholog. We identified 14 fish-specific gene families containing multiple ISGs, including finTRIMs. This work illuminates the evolution of the IFN system and provides a rich resource to explore new antiviral mechanisms.
All living organisms are targeted by viruses, and evolution has given rise to various antiviral strategies. Vertebrates possess many unique immune features, including their principal innate antiviral system based on signaling by type I IFNs. Type I IFNs induce the expression of hundreds of proteins encoded by IFN-stimulated genes (ISGs), making cells refractory to viral infection (1). The origin of the IFN system, which seems to have replaced the RNA interference antiviral system still used by plants and most invertebrates (2, 3), is shrouded in mystery. Cartilaginous fish are the most basal clade with genomes containing recognizable type I IFN genes (4, 5), suggesting that this system appeared ∼500 My ago with jawed vertebrates. The IFN system integrates many components, most of which are encoded by ISGs, which can be traced back in genomes from distant clades. However, finding the ortholog(s) of a human ISG in another taxon does not imply that this gene is part of its IFN system. To understand the evolution of antiviral immunity, it is therefore desirable to establish how the repertoire of ISGs changed from early to modern vertebrates. This can be inferred by comparing the ISGs of current living representatives of distant vertebrate taxa.
Bony fishes (hereafter simply called fish) diverged from the tetrapod lineage ∼450 My ago, and because viral infections are a major problem in aquaculture, their IFN system has been the subject of many studies, as reviewed in Refs. 6–9. Teleost fish possess several subgroups of type I IFNs (but no type III genes), with strong variation in gene numbers among fish taxa (8). Zebrafish possess four type I IFN genes, named ifnphi1-4; only ifnphi1 and ifnphi3 are active at the larval stage (10). Their receptors have been identified (10). Even before fish IFNs were known, the first fish ISGs were identified by homology cloning from cells stimulated by poly(I:C) (11) or by differential transcript analysis of cells infected by viruses (12, 13). Because many virus-induced genes (vig) were homologous to well-known mammalian ISGs, they were hypothesized to be IFN inducible, which was often confirmed by later studies, as in the case of vig-1, the rsad2/viperin ortholog (12, 14). Similarly, upon cloning of fish IFNs, induction of Mx (11) was used as a readout for their activity (15–17), confirming it was an ISG. The list of fish homologs of known ISGs rapidly grew with the release of new fish genomes and expressed sequence tag (EST) collections, allowing the development of microarrays to study the fish response to virus or recombinant type I IFNs (18, 19). Candidate gene approaches were also developed, testing orthologs of known mammalian ISGs in quantitative RT-PCR (qRT-PCR) assays in multiple fish infection models (14, 20, 21). In parallel, approaches without a priori assumptions identified fish ISGs that had no ortholog in mammals, although they belonged to gene families involved in antiviral immunity. A large set of tripartite motif protein–encoding genes, called fintrims (ftr) and distantly related to trim25, was identified in rainbow trout cells as induced by virus infection (13) and later shown to form multigene families in teleosts, particularly extensive in zebrafish (22).
Similarly, a family of IFN-induced ADP-ribosyltransferases named gig2 was identified in crucian carp cells treated with UV-inactivated grass carp hemorrhage virus (23, 24). Some ISGs are restricted to particular fish groups, such as the noncoding RNA vig-2, which is found only in salmonids (25).
We previously established a list of zebrafish candidate ISGs using microarray analysis (26). For this, we compared the response to a poor IFN inducer, infectious hematopoietic necrosis virus (27), with that to a strong IFN inducer, chikungunya virus (CHIKV) (28). However, the array did not include the full complement of zebrafish genes, and the study identified vig that were not necessarily ISGs. In this study, to directly identify ISGs, we analyze the transcriptional response of zebrafish larvae injected with recombinant type I IFN. We rely on deep RNA sequencing (RNAseq), which is intrinsically quasi-exhaustive; our approach is therefore limited mainly by the quality of genome assembly and annotation, which is excellent for the zebrafish (29). We complemented this analysis with a study of the response to CHIKV and its dependence on expression of the zebrafish IFNR chains crfb1 and crfb2 (10). We thus established a comprehensive list of ISGs of zebrafish larvae and performed a detailed comparison with the human ISG repertoire. Our comparative analysis was facilitated by a compilation of human ISGs made to perform a systematic screen (30) and by the specialized database Interferome (31). We identify ∼70 orthology groups that include ISGs in both species and thus approximate the ISG repertoire of the common ancestor of all Osteichthyes. As ISGs typically evolve fast, with frequent duplications and gene losses, we also identify many families of fish-specific ISGs that represent a rich resource for seeking new antiviral mechanisms. Our study provides a broad overview of the evolutionary patterns of genes belonging to the type I IFN pathway and identifies gene modules induced by a viral infection independently of IFN.
Materials and Methods
Wild-type AB zebrafish, initially obtained from the Zebrafish International Resource Center (Eugene, OR), were raised in the Institut Pasteur facility. Animal experiments were performed according to European Union guidelines for handling of laboratory animals (http://ec.europa.eu/environment/chemicals/lab_animals/home_en.htm) and were approved by the Institut Pasteur Animal Care and Use Committee. Eggs were obtained by marble-induced spawning, cleaned by treatment with 0.003% bleach for 5 min, and then kept in petri dishes containing Volvic source water at 28°C. All timings in the text refer to the developmental stage at the reference temperature of 28.5°C. At 3 d postfertilization (dpf), shortly before injections, larvae that had not hatched spontaneously were manually dechorionated. Larvae were anesthetized with 200 μg/ml tricaine (A5040; Sigma-Aldrich) during the injection procedure.
IFN and virus inoculation
Recombinant zebrafish IFNφ1 (10), kindly provided by R. Hartmann (University of Aarhus, Aarhus, Denmark), was inoculated by i.v. injection in the caudal cardinal vein. One nanoliter of 1 mg/ml IFNφ1, or, as a control, BSA (New England Biolabs) in 1× PBS/10% glycerol, was injected. CHIKV infections were performed as described (26, 28). Briefly, ∼200 PFU of CHIKV115 was injected i.v. in a volume of 1 nl at 3 dpf.
Morpholino antisense oligonucleotides (Gene Tools) were injected at the one- to two-cell stage as described (32). Two nanograms of crfb1 splice morpholino (5′-CGCCAAGATCATACCTGTAAAGTAA-3′) was injected together with 2 ng of crfb2 splice morpholino (5′-CTATGAATCCTCACCTAGGGTAAAC-3′), knocking down all type I IFN receptors (10). Control morphants were injected with 4 ng of control morpholino (5′-GAAAGCATGGCATCTGGATCATCGA-3′) with no known target.
For RNAseq analysis, total RNA was extracted from replicate pools of 10 injected larvae (at 6 h postinjection for IFNφ1 treatment, or 24 h postinjection for CHIKV infections), using TRIzol (Invitrogen), following the manufacturer’s protocol. The integrity of the RNA was confirmed by laboratory-on-chip analysis using the 2100 Bioanalyzer (Agilent Technologies), using only samples with an RNA integrity number of at least 8.
Libraries were built using a TruSeq mRNA Library Preparation Kit (Illumina), according to the manufacturer’s recommendations. Quality control was performed on an Agilent Bioanalyzer. Sequencing was performed on a HiSeq 2500 System (Illumina) and produced 65-base, single-end reads.
Mapping reads and gene expression counts
Sequences were trimmed using Cutadapt (v1.8.3), and read quality was checked with FastQC. Reads were then spliced-aligned to the zebrafish genome (GRCz10, Ensembl release 88) using TopHat2 (v2.0.14). The average number of mapped reads per sample was 16.5 million. Only fragments mapping coherently and unambiguously to genes were considered for gene counts, which were assigned using featureCounts v1.5.2 (33).
Identification of differentially expressed genes
Differentially expressed nuclear genes were identified between larvae treated with IFNφ1 and controls, between larvae infected with CHIKV and controls, and between CHIKV-infected crfb1+2 morphants and CHIKV-infected control morphants. Differential expression was analyzed using DESeq 1.18.0 (Bioconductor) (34) in R 3.1.2 (35). Briefly, raw gene counts were subjected to a minimal prefiltering step: genes were kept if their count sum within a group of samples was 10 or higher in at least one group. Raw counts were normalized for library size, and normalized data were fitted using a negative binomial generalized linear model. P values were adjusted for multiple testing using the Benjamini–Hochberg procedure (adjusted p value). Genes with an adjusted p value < 0.05 and an FC > 2 or FC < 0.5 were considered differentially expressed.
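The prefiltering, multiple-testing correction, and fold-change thresholds described above can be sketched in plain Python as follows. This is a minimal illustration, not the DESeq pipeline: it does not fit the negative binomial model and assumes that per-gene p values and linear FC values have already been produced by DESeq. The function names and the nested-dictionary count layout are hypothetical; the adjusted p value cutoff is left as a parameter, defaulting to the 5% value used in the Results.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: counts[gene][group] -> raw counts of the replicates in that group
Counts = Dict[str, Dict[str, List[int]]]


def prefilter(counts: Counts, min_sum: int = 10) -> List[str]:
    """Keep genes whose summed raw count reaches min_sum in at least one group."""
    return [gene for gene, groups in counts.items()
            if any(sum(replicates) >= min_sum for replicates in groups.values())]


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1 and kept monotone
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_differentially_expressed(padj: float, fold_change: float,
                                alpha: float = 0.05, fc_cutoff: float = 2.0) -> bool:
    """Significant after adjustment and changed more than fc_cutoff-fold up or down."""
    return padj < alpha and (fold_change > fc_cutoff or fold_change < 1.0 / fc_cutoff)


if __name__ == "__main__":
    toy_counts = {  # hypothetical counts: BSA controls versus IFN-treated pools
        "mxa": {"BSA": [1, 2, 0], "IFN": [50, 60, 55]},
        "geneX": {"BSA": [1, 2, 3], "IFN": [0, 1, 2]},
    }
    print(prefilter(toy_counts))  # ['mxa']
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

Keeping the fold-change test symmetric (FC > 2 or FC < 0.5) treats two-fold up- and downregulation alike, which matches the cutoffs stated in the text.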
Sequence data were deposited in the Sequence Read Archive and registered in the National Center for Biotechnology Information BioProject database (https://www.ncbi.nlm.nih.gov/bioproject) under accession number PRJNA531581.
Identification of human orthologs and ISGs
Orthology analysis was primarily based on data from the Ensembl database (www.ensembl.org), the zfin database (zfin.org), and the literature, notably our previous analysis of zebrafish orthologs of human ISGs (26). Data were systematically curated manually, and conflicts were resolved using a combination of literature search, synteny analysis, and sequence homology analysis (two-way protein BLAST). Human genes present on the list compiled by Schoggins et al. (30) were labeled as ISGs. Genes absent from that list were further queried on the Interferome Web site (http://www.interferome.org/interferome/search/showSearch.jspx), which compiles the results of many transcriptomic studies on human and mouse samples after IFN stimulation (31). We considered human genes present in Interferome to be ISGs when they were significantly induced at least 2-fold by type I IFN, after a stimulation of no more than 12 h, in at least four datasets.
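The two-step rule for calling a human gene an ISG can be written as a small predicate. The record structure below is a simplified, hypothetical stand-in for Interferome search results (it is not the database's export format); significance is taken as reported for each dataset, and each dataset is counted only once.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class InterferomeHit:
    """Hypothetical, simplified record of one gene in one Interferome dataset."""
    gene: str
    dataset_id: str
    ifn_type: str        # "I", "II" or "III"
    hours: float         # duration of IFN stimulation
    fold_change: float   # linear fold change versus unstimulated samples
    significant: bool    # significance as reported for that dataset


def is_human_isg(gene: str, schoggins_isgs: Set[str], hits: Iterable[InterferomeHit],
                 min_fc: float = 2.0, max_hours: float = 12.0, min_datasets: int = 4) -> bool:
    """Schoggins et al. list first; otherwise apply the Interferome criteria."""
    if gene in schoggins_isgs:
        return True
    supporting = {
        h.dataset_id for h in hits
        if h.gene == gene and h.ifn_type == "I" and h.hours <= max_hours
        and h.significant and h.fold_change >= min_fc
    }
    return len(supporting) >= min_datasets  # distinct datasets, not records


# Hypothetical example: four qualifying type I IFN datasets for GENE1
example_hits = [InterferomeHit("GENE1", f"ds{i}", "I", 6.0, 3.0, True) for i in range(4)]
print(is_human_isg("GENE1", set(), example_hits))  # True
```

Counting distinct dataset identifiers prevents a gene reported twice in one study from being counted as two independent datasets.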
RNA was extracted from individual larvae using RNeasy Mini Kit (QIAGEN). cDNA was obtained using Moloney Murine Leukemia Virus H Minus Reverse Transcriptase (Promega) with a dT17 primer. Quantitative PCR was then performed on an ABI 7300 thermocycler (Applied Biosystems) using Takyon ROX SYBR 2× MasterMix (Eurogentec) in a final volume of 25 μl. The following pairs of primers were used: ef1a (housekeeping gene used for normalization), 5′-GCTGATCGTTGGAGTCAACA-3′ and 5′-ACAGACTTGACCTCAGTGGT-3′; mxa, 5′-GACCGTCTCTGATGTGGTTA-3′ and 5′-GCATGCTTTAGACTCTGGCT-3′; ddx58, 5′-ACGCCGGAGAAAGAATTTTTC-3′ and 5′-TCGACAGACTCTCGATGTTG-3′; and aqp9a, 5′-CTGTACTACGACGCCTTCAT-3′ and 5′-GAGAATACAGAGCACCAGCA-3′.
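The text does not state how qRT-PCR data were quantified beyond the use of ef1a for normalization. Assuming the common comparative Ct (2^−ΔΔCt) method with ∼100% amplification efficiency, the relative expression of a target such as mxa in an IFN-treated larva versus a BSA control could be computed as below; the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target: float, ct_ef1a: float,
                        ct_target_ctrl: float, ct_ef1a_ctrl: float) -> float:
    """2^-ddCt fold change of a target gene, normalized to ef1a and to a control sample.

    Assumes perfect (100%) amplification efficiency for both primer pairs.
    """
    delta_ct_sample = ct_target - ct_ef1a            # target relative to housekeeping gene
    delta_ct_control = ct_target_ctrl - ct_ef1a_ctrl
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)


# Hypothetical Ct values: after normalization the target crosses the threshold
# 3 cycles earlier in the treated larva, i.e. an 8-fold induction.
print(relative_expression(22.0, 18.0, 25.0, 18.0))  # 8.0
```

Values below 1 indicate downregulation; a 0.75-fold value corresponds to a ΔΔCt of about +0.4 cycles.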
RNAseq analysis of IFNφ1-regulated genes
To make an inventory of zebrafish ISGs, we first injected 3-dpf larvae with recombinant zebrafish IFNφ1, the first type I IFN to be identified in zebrafish, or with BSA as a negative control. Based on preliminary kinetic experiments, we chose 6 h postinjection, the early plateau phase of ISG expression (Supplemental Fig. 1A). RNA was extracted from multiple pools of 10 larvae and subjected to deep sequencing on an Illumina platform. Reads were mapped to the zebrafish genome (GRCz10), and differential analysis was performed using the DESeq package.
Choosing as cutoff values adjusted p values < 5% and FC > 2, we identified 360 IFNφ1-upregulated genes (which are ISGs, by definition) and 75 downregulated genes (Supplemental Table I, Tab 1). As expected, genes with high basal expression levels tended to display lower FC (Fig. 1A). The top IFNφ1-upregulated genes (FC > 100) comprised many well-known ISGs, several of which have previously been used as IFN signature genes in zebrafish, including several mx genes, rsad2, cmpk2, several ifit genes, the ubiquitin-like isg15, the helicase dhx58 (also known as lgp2), the kinase pkz, the transcription factor stat1b, and the chemokine ccl19a.2 (Fig. 1B). To our surprise, ddx58 (encoding RIG-I), a well-known and conserved ISG (36), was not in that list. In fact, the gene model is missing from the zebrafish reference genome, with only fragments of the sequence present in the assembly. We therefore performed qRT-PCR for ddx58 and confirmed that it is induced by IFNφ1 (Supplemental Fig. 1B). We searched the list of zebrafish orthologs of human ISGs (26) for other genes not annotated on the reference genome and found only one besides ddx58: aqp9a. By qRT-PCR, this gene appeared to be moderately downregulated by IFNφ1 (0.75-fold; Supplemental Fig. 1B) and was thus not an ISG.
Among the 360 zebrafish ISGs identified by RNAseq, 23 corresponded to noncoding ISGs or transposons with no clear homologs in mammals and were excluded from further phylogenetic analyses.
Gene ontology analysis, performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID) and the Gene Ontology Enrichment Analysis and Visualization tool (GOrilla), showed, as expected, that IFNφ1-upregulated genes were strongly enriched in genes linked to antiviral response, type I IFN pathway, and Ag processing and presentation (data not shown). Similarly, enrichment analyses identified Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways for influenza, measles, and herpes simplex infection as well as RIG-I signaling and cytosolic DNA–sensing pathways. The downregulated genes were not associated with any particularly notable function in these analyses.
Human orthologs of zebrafish ISGs are enriched in ISGs
We then searched for the human orthologs of the 337 protein-coding zebrafish ISGs identified (Supplemental Table I, Tab 2). All types of orthology relationships between zebrafish and humans were observed, from none to “many-to-many.” One-to-one orthology was found for 77 genes (Fig. 1C). We identified one or several human orthologs for 200 zebrafish ISGs. This proportion (200/337, 59%) is significantly lower than the 71% reported for the whole genome (29) (Fisher exact test, p < 0.0001).
We then determined which of these human genes were themselves ISGs. We found 61 ISGs present in the list of 446 human ISGs compiled by Schoggins et al. (30); by querying the Interferome database (31), we identified 11 additional human ISGs (Supplemental Table I, Tab 2). In total, 97 zebrafish IFNφ1-inducible genes were orthologous to at least one human ISG (Fig. 1C). In addition, we identified a handful of genes that were not true orthologs but shared ancestry with a human ISG at the vertebrate level, such as MHC class I genes (see comments on Supplemental Table I, Tab 2).
As expected, human orthologs of zebrafish ISGs were strongly enriched in ISGs; whereas there are 446 human ISGs out of 20,454 genes in the genome (i.e., 2%), we found 72 ISGs among the 196 human orthologs of zebrafish ISGs (i.e., 37%) (Fisher exact test, p < 0.0001). Interestingly, FC values of zebrafish ISGs were higher when they were orthologous to a human ISG than when their orthologs were not ISGs (red versus blue in Fig. 1C, 1D), whereas FCs of zebrafish ISGs without human orthologs (gray in Fig. 1C, 1D) were intermediate (mean values, 127.7, 10.6, and 46.9, respectively; all groups significantly different from each other, p < 0.001, Kruskal–Wallis test). Thus, inducibility by type I IFNs is often evolutionarily conserved, making it possible to infer an ancestral set of ISGs.
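An enrichment of this kind can be assessed with a one-sided Fisher exact test, which is equivalent to the upper tail of the hypergeometric distribution. The sketch below plugs in the counts quoted above (446 human ISGs among 20,454 genes; 72 ISGs among 196 orthologs). These counts mix the Schoggins list with Interferome-derived ISGs, so the example illustrates the calculation rather than reproducing the exact contingency table used for the reported test. Only exact integer arithmetic from the standard library is used.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) when drawing n genes without replacement from N genes of which K are ISGs.

    This equals the one-sided (enrichment) Fisher exact test p value for the 2x2 table
    [[k, n - k], [K - k, (N - K) - (n - k)]].
    """
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return favourable / comb(N, n)  # exact big integers, divided once at the end


genome_fraction = 446 / 20454   # about 2% of human genes are ISGs
ortholog_fraction = 72 / 196    # about 37% of orthologs of zebrafish ISGs are ISGs
p = hypergeom_upper_tail(k=72, n=196, K=446, N=20454)
print(f"{genome_fraction:.1%} vs {ortholog_fraction:.1%}, p = {p:.2e}")
```

Working with exact integers avoids the loss of precision that summing many tiny floating-point probabilities would cause.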
Fish-specific ISG genes and families
Consistent with the expected high rate of duplication and divergence of ISGs, a significant proportion of zebrafish IFNφ1-upregulated protein-coding genes had no identifiable ortholog in the human genome (137/337; 41%). Interestingly, like genes with human orthologs, many of these genes belonged to multigene families. Fig. 2 shows the fish-specific gene families that contain several zebrafish ISGs, with domains identified by the SMART tool (37).
The genes listed in Fig. 2 include more than 20 fintrim (ftr) genes, a family first identified as virus-inducible genes in rainbow trout (13), highly diversified in zebrafish, and hypothesized to antagonize retroviruses (22). Of note, besides finTRIM, there are two other large TRIM gene expansions in zebrafish, each with a single human ortholog of unknown function (38). The bloodthirsty-like TRIMs (btr), related to human TRIM39, include several ISGs. By contrast, no member of the TRIM35-like family was upregulated by IFNφ1.
Another family had been described as virus inducible in fish: gig2 (23). gig2 genes are distantly related to the poly(ADP-ribose) polymerase (PARP) family (24), which includes several ISGs in humans and zebrafish. The zebrafish gig2p and gig2o are induced by IFNφ1.
To our knowledge, the genes in the remaining fish-specific families had not previously been described as IFN or virus inducible. These families are diverse, encoding predicted membrane receptors as well as presumably secreted, nuclear, or cytosolic proteins (Fig. 2).
Eight members of the very large NLR family were ISGs. These genes belong to groups 1, 2, 3, and 4, as defined in Ref. 39, and two of them belong to a fish-specific subset defined by the presence of a C-terminal B30.2 (or PRY/SPRY) domain (40, 41), which is most similar to the corresponding domain of finTRIM genes (22). The specific function(s) of zebrafish NLR genes remain poorly understood, but this highly expanded family may be central to inflammatory mechanisms.
Additionally, three ISGs encoded membrane proteins with two Ig domains and a transmembrane region, but no ITAM or ITIM. These genes belong to a very large family with 140 members that we propose to name fish genes with two Ig domains (f2Ig).
RNAseq analysis of CHIKV-induced genes
Experimental infection of zebrafish larvae with CHIKV induces a strong type I IFN response (28). Our previous microarray-based analysis indicated that the response to CHIKV was dominated by ISGs (26). However, to allow comparison with another virus with slower IFN induction kinetics, that analysis had been performed at 48 h postinfection (hpi), whereas the peak of the IFN and ISG response, as determined by qRT-PCR, is at 24 hpi (26). We therefore reanalyzed the transcriptome of CHIKV-infected larvae at 24 hpi using deep RNAseq. Choosing the same cutoff values as for the IFNφ1 analysis, we identified 466 CHIKV-upregulated genes and 26 downregulated genes (Supplemental Table I, Tab 3). Hundreds of new CHIKV-inducible genes were identified, either because they were absent from the microarray or because their induction was below the cutoff of the first analysis. All genes significantly upregulated in the microarray study that had a human ISG ortholog were also upregulated in this new dataset and, as expected, typically much more strongly (Supplemental Fig. 2).
About half of the genes induced by IFNφ1 were also induced by the viral infection (181/360; Fig. 3A, Supplemental Table I, Tab 3, yellow), including almost all (84 of 97) genes orthologous to a human ISG, such as mxa, mxb, and mxe, stat1a and stat1b, stat2, rsad2, and isg15. FC values were clearly correlated for genes induced by both IFNφ1 and CHIKV (Fig. 3B). However, almost two thirds of the genes induced by CHIKV infection were not significantly modulated by IFNφ1 (285/466; Fig. 3A). We then asked whether this CHIKV-specific response consisted essentially of genes weakly induced by rIFNφ1, below our arbitrary cutoff. We therefore extracted from this list the genes induced by IFNφ1 with FC > 1.5 and an adjusted p value < 20%, and found 66 genes matching these conditions (Supplemental Table I, Tab 3, green): 21 genes without annotation and 45 annotated genes, many of which are well known to be linked to the type I IFN system. These genes notably comprised crfb1, encoding a type I IFNR subunit, and two other cytokine receptors, il10Ra and il13Ra; four chemokines (ccl34, cxcl11.6, cxcl18b, and cxcl20); 10 additional fintrims and three other members of the gig2 family; and two irf transcription factors (irf2 and irf10). The list also included the metalloreductase steap3, whose mammalian ortholog is not an ISG but regulates the type I IFN response, CXCL10 induction, and iron homeostasis in mouse macrophages (42).
Besides this intermediate gene set, a conservative list of 219 genes appears to be upregulated only by the virus (FC > 2 and adjusted p value < 5%), independently of IFNφ1 (FC < 1.5 or adjusted p value > 20%) (Fig. 3C; Supplemental Table I, Tab 3, blue). This list contains 105 genes without annotation but also several functional modules, providing interesting insights into virus/host interactions. Functional analysis using DAVID identified six enriched KEGG pathways: cytokine/cytokine receptor interaction, cytosolic DNA sensing, TLR signaling, RIG-I–like receptor signaling, proteasome, and herpes simplex infection.
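The sequential cutoffs used to split CHIKV-upregulated genes into the yellow, green, and blue sets can be expressed as one classification function. This is a sketch with hypothetical names and toy values; how genes falling exactly on the 1.5 or 20% boundaries are handled is an assumption. Applied to every CHIKV-upregulated gene, these rules partition the 466 genes into 181 + 66 + 219.

```python
from collections import Counter
from typing import NamedTuple


class DEResult(NamedTuple):
    """Linear fold change and BH-adjusted p value for one gene in one comparison."""
    fc: float
    padj: float


def classify_chikv_gene(chikv: DEResult, ifn: DEResult) -> str:
    """Assign a gene to the categories used for CHIKV-upregulated genes.

    Colour labels mirror the highlighting described for Supplemental Table I, Tab 3.
    """
    if not (chikv.fc > 2 and chikv.padj < 0.05):
        return "not CHIKV-induced"
    if ifn.fc > 2 and ifn.padj < 0.05:
        return "CHIKV- and IFN-induced (yellow)"
    if ifn.fc > 1.5 and ifn.padj < 0.20:
        return "weakly IFN-induced (green)"
    return "CHIKV-only (blue)"


# Hypothetical genes: (CHIKV versus control, IFNphi1 versus BSA)
toy = {
    "gene_a": (DEResult(40.0, 1e-6), DEResult(120.0, 1e-8)),
    "gene_b": (DEResult(6.0, 1e-3), DEResult(1.8, 0.10)),
    "gene_c": (DEResult(9.0, 1e-4), DEResult(1.1, 0.70)),
}
print(Counter(classify_chikv_gene(c, i) for c, i in toy.values()))
```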
Importantly, type I IFNs were induced by the infection. Consistent with our previous report with another virus (10), ifnphi1 and ifnphi3 were clearly dominant at this larval stage, with 58 ± 3 and 47 ± 11 reads, respectively, compared with 9 ± 4 reads for ifnphi2 and none detected for ifnphi4. Two proinflammatory cytokine genes, il1b and tnfb, were also upregulated. Among typical sensors and signaling components, tlr3, mb21d1 (encoding cGAS) and its downstream adaptor tmem173 (encoding STING), and kinases of the IFN signaling pathways (ripk1 and tbk1) were present. Seven proteasome subunits were induced by the virus, suggesting activation of protein degradation and Ag presentation pathways. The complement pathway also stands out as an important module upregulated by CHIKV infection: 12 complement component genes (c1, c2, several c3, c7, c9, cfB, and cfhl-1, -3, and -5) were induced by CHIKV, suggesting that complement is an important defense triggered in a type I IFN–independent manner. Additionally, this response comprised three metalloaminopeptidases (anpepb, erap1b, and erap2); the myeloid markers ncf1, mpx, and marco; two guanylate binding proteins (gbp1 and gbp2) with well-known orthologs in humans; the transcription factors atf3 and irf1b; and, with a high level of expression, the enzyme rnasel3 (an ortholog of human RNASE4, not of RNASEL, an ISG with no fish counterpart). Nine fintrims and three btr genes were also induced, underscoring the overall importance of these PRY/SPRY-containing TRIMs in virus/host interactions.
Thus, CHIKV induces a typical IFN-stimulated response of high magnitude, but also a broader and less overt inflammatory response.
IFNR dependence of the response to CHIKV
To test the IFN dependence of the response to CHIKV, we used morpholinos to knock down crfb1 and crfb2 in zebrafish larvae; these genes encode specific chains of the two types of zebrafish type I IFNR (10). We previously showed that such IFNR morphant larvae are hypersusceptible to CHIKV infection, dying 2–3 d after virus injection (28). We analyzed by deep RNAseq their transcriptional response to CHIKV at 24 h postinoculation and compared it with that of control morphant larvae. Choosing as cutoff values adjusted p values < 5% and a ratio between IFNR morphants and controls > 2, we identified 187 genes whose induction was dampened by IFNR knockdown and 10 genes that were upregulated in morphants (Supplemental Table I, Tab 4). Among CHIKV-induced genes (Supplemental Table I, Tab 3), 181 were IFNR dependent, representing a substantial fraction (181/466; 39%) (Fig. 3A). Predictably, the list of genes upregulated by CHIKV in an IFNR-dependent manner largely, but not fully, overlapped with the gene set induced by rIFNφ1 (129/181; 71%; see Fig. 3A). This approach led us to classify 52 additional zebrafish genes as ISGs, being induced by CHIKV in an IFNR-dependent manner even though they were not significantly induced by rIFNφ1. As before, we searched for the human orthologs of these additional ISGs (Supplemental Table I, Tab 2, bottom), identifying a few more human ISGs in this list, such as cGAS, NLRC5, and IFI35.
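The IFNR-dependence call and the derivation of additional ISGs reduce to a ratio threshold followed by set operations. In the sketch below, the fold change is expressed as IFNR morphants over control morphants, so dampened genes fall below 0.5; this orientation, the helper names, and the toy gene names are assumptions. With the counts reported above, the same set logic gives 181 − 129 = 52 additional ISGs, about 29% of the IFNR-dependent, CHIKV-induced genes.

```python
from typing import Set


def ifnr_effect(fc_morphant_vs_control: float, padj: float,
                alpha: float = 0.05, ratio_cutoff: float = 2.0) -> str:
    """Classify a gene by its expression in CHIKV-infected IFNR morphants versus control morphants."""
    if padj >= alpha:
        return "unaffected"
    if fc_morphant_vs_control < 1.0 / ratio_cutoff:
        return "dampened by IFNR knockdown"
    if fc_morphant_vs_control > ratio_cutoff:
        return "upregulated in IFNR morphants"
    return "unaffected"


def additional_isgs(chikv_induced: Set[str], ifnr_dependent: Set[str],
                    ifn_induced: Set[str]) -> Set[str]:
    """Genes induced by CHIKV in an IFNR-dependent manner but not called as IFNphi1-induced."""
    return (chikv_induced & ifnr_dependent) - ifn_induced


# Hypothetical gene names
chikv = {"gene_a", "gene_b", "gene_c", "gene_d"}
dependent = {"gene_a", "gene_b", "gene_c"}
ifn = {"gene_a", "gene_b"}
print(sorted(additional_isgs(chikv, dependent, ifn)))  # ['gene_c']
```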
Together, our results provide a near-exhaustive list of zebrafish ISGs at the larval stage, identified by two independent approaches, and a useful reference for future studies.
Assuming that the common ancestor of genes that are IFN inducible in both humans and zebrafish was itself an ISG in their last common ancestor ∼450 My ago, we can define a list of ancestral ISGs. We identified 66 orthology groups that included an ISG on both the human and the zebrafish sides (Table I, Supplemental Table I, Tab 5). A few more ancestral ISGs were also defined by pairs of ISGs with orthology relationships at the early vertebrate or gnathostome level, meaning that the zebrafish gene is not directly orthologous to a human ISG but is paralogous (with an ancestral taxonomy level labeled in Ensembl as vertebrates or jawed vertebrates) to another gene itself orthologous to a human ISG. In total, our list includes 72 ancestral genes (Fig. 4, Table I).
| Orthology group | Zebrafish ISGs | Human ISGs |
|---|---|---|
| APOL | apol | APOL1, 2, 3, 6 |
| HERC5/6 | herc56.1, 2, 3, 4 | HERC5, HERC6 |
| IFI44 | ifi44a1, a5, c2, d, f3-6, g | IFI44, IFI44L |
| IFIT | ifit8–12, 14–16 | IFIT1-3, 5 |
| ISG12 | isg12.1–4, 6–7 | IFI6, IFI27 |
| LGALS9 | lgals9l1, 3 | LGALS9, 9C |
| MOV10 | mov10a, b.1, b.2 | MOV10 |
| MX | mxa, b, c, e | MX1, MX2 |
| SP100 | sp100.1, sp100.3, sp100.4 | SP100, 110, 140, 140L |
| MHC class I | mhc1zba | HLA-A, -B, -C |
Table I. Orthology groups that include ISGs in both zebrafish and humans and therefore define an ancestral ISG in their common ancestor. Orthologous genes arise by speciation, as opposed to paralogous genes, which arise by duplication. Thus, an osteichthyan-level orthology group includes human and zebrafish genes with a direct common ancestor in the last common ancestor of teleosts and tetrapods (LCATT). Vertebrate-level orthologs share their ancestral gene at the basal or jawed vertebrate level. This is a condensed version of Supplemental Table I, Tab 5, which also includes non-IFN–inducible genes belonging to the osteichthyan-level orthology groups and provides Ensembl Gene IDs.
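The rule used to define ancestral ISGs (an orthology group counts when it contains at least one ISG on both the zebrafish and the human side, at either the osteichthyan or the vertebrate level) can be written as a simple filter. The dataclass is a hypothetical representation; the MX and MHC class I entries are taken from Table I, whereas GROUP_X is invented to show a group that is excluded.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class OrthologyGroup:
    name: str
    level: str                                   # "osteichthyan" or "vertebrate"
    zebrafish_isgs: Set[str] = field(default_factory=set)
    human_isgs: Set[str] = field(default_factory=set)


def ancestral_isgs(groups: List[OrthologyGroup]) -> List[str]:
    """Groups with at least one ISG on each side are inferred to descend from an ancestral ISG."""
    return [g.name for g in groups if g.zebrafish_isgs and g.human_isgs]


groups = [
    OrthologyGroup("MX", "osteichthyan", {"mxa", "mxb", "mxc", "mxe"}, {"MX1", "MX2"}),
    OrthologyGroup("MHC class I", "vertebrate", {"mhc1zba"}, {"HLA-A", "HLA-B", "HLA-C"}),
    # Hypothetical group: ISG in zebrafish only, so it is not counted as ancestral
    OrthologyGroup("GROUP_X", "osteichthyan", {"gene_x"}, set()),
]
print(ancestral_isgs(groups))  # ['MX', 'MHC class I']
```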
Based on our orthology analysis, we propose new, more explicit names for many of the zebrafish ISGs with known human orthologs (in red on Supplemental Table I, Tab 5). This ancestral ISG core includes most ISGs with known functions. The IFN system of 450 My ago seems fairly similar to the present one, particularly in its signaling components (Fig. 4). Many ancestral genes have been duplicated independently in one or both lineages (Supplemental Table I, Tab 5), in addition to multiple ISGs apparently gained by either group (Fig. 5).
Two of the most strongly IFNφ1-downregulated genes (Supplemental Table I, Tab 1, bottom) were orthologous to human genes downregulated by type I IFNs, according to the Interferome database: perilipin 1 (plin1) and palmitoyl acyl-CoA oxidase 1 (acox1). This suggests that downregulation of the fatty acid oxidation pathway is an ancient feature of the IFN system.
Many IFNφ1-downregulated genes were orthologous to a human gene in a one-to-two manner, the two zebrafish paralogues having arisen during the teleost-specific whole-genome duplication (ohnologues). In each such case, only one of the two paralogues was downregulated.
The zebrafish has become an important model to study host/pathogen interactions, particularly at its early life stages, which are the most amenable to live imaging and the most genetically tractable. Although its antiviral IFN genes and receptors are now well identified, knowledge of IFN-induced genes, or ISGs, was only partial. In this work, we used deep sequencing to characterize the transcriptomic response of the 3-dpf zebrafish larva to rIFNφ1, the first type I IFN identified in zebrafish and the most highly inducible one. In parallel, we analyzed the response to an alphavirus that strongly induces type I IFN, and the impact of IFNR knockdown on this response. From these datasets, we established a comprehensive list of zebrafish ISGs. This list was compared with the human ISG repertoire, and a phylogenetic analysis was performed to approximate the ancestral ISG repertoire of early vertebrates.
New insights and limitations of the work
A number of studies have identified genes induced by IFN or viral infections in fish (reviewed in Ref. 7). However, very few global descriptions of the response to recombinant type I IFN have been reported, using microarrays (19). Microarray analyses are limited by probe choice and are typically biased toward genes with known human homologs. RNAseq, by contrast, is mainly limited by genome annotation quality and by the analysis method, and its data can be reanalyzed; this approach is thus more complete. Because the early zebrafish larva is a reference model for investigating innate immune responses, for drug screening, and for modeling diseases, we undertook a comprehensive description of the repertoire of ISGs upregulated at this developmental stage. Importantly, we previously reported a clear transcriptional response of zebrafish embryos to IFNφ1 as early as 24 h postfertilization (14); responsiveness to type I IFNs is thus already well established at 3 dpf. We are aware that cells present in adults but not yet in larvae, notably those of the adaptive immune system such as lymphocytes and dendritic cells, may express additional ISGs, which should be assessed in further work.
There are two groups of type I IFNs in teleost fish (43), with two different receptors (10). This study only addresses the ISG repertoire induced by IFNφ1 (a group 1 IFN), and it is possible that group 2 IFNs (IFNφ2 and IFNφ3) induce a different ISG subset. Determining this will require further studies; however, because CHIKV induces both IFNφ1 and IFNφ3, and crfb1 and crfb2 morpholinos target receptors for both type I IFN groups, ISGs induced only by IFNφ3 should be found among CHIKV-induced, IFNR-dependent, but non-IFNφ1–induced genes. Such genes (listed in Supplemental Table I, Tab 2, bottom) constituted ∼30% of genes for which induction by CHIKV was impacted in morphants, and only ∼15% if one also excludes genes for which induction by IFNφ1 is almost significant (Fig. 3A, 3C). A previous report by López-Muñoz et al. (20) suggests differences in ISG induction, notably in kinetics, between different IFNφs.
Comparative and phylogenetic analysis of zebrafish ISGs
Our comparative and phylogenetic approach led to a tentative reconstruction of the ISG repertoire of the last common ancestor of teleosts and tetrapods (LCATT), which lived ∼450 My ago and probably resembled the fossil osteichthyan Ligulalepis (44). To do so, we looked for the human (co)ortholog(s) of all zebrafish ISGs identified in our analysis. Based on available data compilations (30, 31), we then determined which of these human orthologs were themselves induced by type I IFN. In such cases, we considered that the zebrafish and human genes most likely originated from an “ancestral” ISG present in the LCATT. It is generally believed that the type I IFN system emerged during the early evolution of jawed vertebrates, because chondrichthyans (rays, sharks, and chimeras), but not agnathans (lampreys and hagfish), possess typical type I IFN genes (4, 45). Hence, the IFN system had already evolved, expanded, and become standardized for more than 50 My before our last common ancestor with zebrafish.
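A minimal sketch of the ancestral-ISG criterion, assuming a precomputed mapping from each zebrafish ISG to its human (co)ortholog(s) and a set of human type I IFN-induced genes. Both inputs are toy placeholders (the finTRIM name `ftr_x` is hypothetical), and the sketch ignores the more complex cases in which distinct paralogues were retained in each lineage.

```python
from typing import Dict, List, Set


def call_ancestral(
    zf_isgs: List[str],
    zf_to_human: Dict[str, List[str]],
    human_isgs: Set[str],
) -> Dict[str, str]:
    """Label each zebrafish ISG according to the IFN inducibility of its human orthologs."""
    calls = {}
    for gene in zf_isgs:
        orthologs = zf_to_human.get(gene, [])
        if not orthologs:
            calls[gene] = "no human ortholog"
        elif any(h in human_isgs for h in orthologs):
            # At least one human (co)ortholog is itself a human ISG.
            calls[gene] = "ancestral ISG"
        else:
            calls[gene] = "ortholog not an ISG in humans"
    return calls


# Toy input for illustration only.
zf_to_human = {"rsad2": ["RSAD2"], "irf3": ["IRF3"], "ftr_x": []}
human_isgs = {"RSAD2", "ISG15"}
calls = call_ancestral(["rsad2", "irf3", "ftr_x"], zf_to_human, human_isgs)
print(calls)
# {'rsad2': 'ancestral ISG', 'irf3': 'ortholog not an ISG in humans', 'ftr_x': 'no human ortholog'}
```

Using `any` means that a zebrafish gene with several human co-orthologs qualifies as soon as one of them is IFN inducible, which matches the "(co)ortholog(s)" wording of the criterion.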
Approximately half of what we defined as ancestral ISGs are represented by one-to-one orthologs in zebrafish and humans (Supplemental Table I, Tab 5, top rows), a situation of practical interest, as the likelihood of conservation of gene function is highest in this case. These are either isolated genes (e.g., RSAD2, ISG15, or cGAS) or members of “old” families already stabilized in the LCATT (e.g., IRF7 and IRF9) (26). The situation is relatively similar for a few ancestral genes, such as STAT1 or SOCS1, which have one human ortholog and two zebrafish co-orthologs that arose during the teleost-specific whole-genome duplication and were retained. In contrast, many other “young” families have clearly been subjected to further duplication during later evolution of fish or tetrapods, leading to orthology groups containing multiple ISGs both in zebrafish and humans, the most spectacular examples being the ISG12, IFIT, and IFI44 families.
The frequency of orthology with a human gene is lower for ISGs (59%) than for the entire genome (71%). This is probably a consequence of the stronger evolutionary pressure on genes involved in the arms race with pathogens, as postulated by the Red Queen hypothesis (46). Similar mechanisms also explain the frequent and extensive gene duplications, as well as gene losses when a virus disappears and the corresponding selective pressure on a given ISG is removed. Possibly, a greater diversity of aquatic viruses could further favor ISG retention and divergence after duplication, but little direct evidence is available.
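The orthology frequencies compared above are simple proportions. As a hedged illustration, assuming an orthology table in which each zebrafish gene maps to a possibly empty list of human orthologs, they could be computed as follows; the ten-gene "genome" and the ISG subset are toy placeholders, not real data.

```python
from typing import Dict, Iterable, List


def ortholog_fraction(genes: Iterable[str], zf_to_human: Dict[str, List[str]]) -> float:
    """Fraction of `genes` that have at least one human ortholog in `zf_to_human`."""
    genes = list(genes)
    if not genes:
        raise ValueError("empty gene list")
    # A gene counts as having an ortholog if its mapped list is non-empty.
    with_ortholog = sum(1 for g in genes if zf_to_human.get(g))
    return with_ortholog / len(genes)


# Toy orthology table: g0..g6 have a human ortholog, g7..g9 do not.
zf_to_human = {f"g{i}": [f"H{i}"] for i in range(7)}
zf_to_human.update({"g7": [], "g8": [], "g9": []})

genome = list(zf_to_human)             # all 10 toy genes
isgs = ["g0", "g1", "g2", "g7", "g8"]  # toy ISG subset

isg_frac = ortholog_fraction(isgs, zf_to_human)
genome_frac = ortholog_fraction(genome, zf_to_human)
print(f"ISGs: {isg_frac:.0%}, genome: {genome_frac:.0%}")  # ISGs: 60%, genome: 70%
```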
In addition to the ancestral genes with true zebrafish and human orthologs, we added to this list a few genes with a more complex history, whose human and zebrafish ISGs shared an ancestor at the basal vertebrate level (Table I, Supplemental Table I, Tab 5, bottom). These ancestral genes must have been duplicated in the LCATT genome, with the teleost and tetrapod lineages then retaining distinct paralogues. This category includes some genes whose evolutionary history is extremely difficult to trace because of multiple copies and extensive polymorphism, such as MHC class I genes. In this study, only mhc1zba was found to be a zebrafish ISG, but this does not necessarily imply that other zebrafish MHC class I genes are not ISGs; they may have been missed because of mapping issues, as the strain we used (AB) differs from that of the reference genome (Tü), and divergence between strains is considerable for MHC class I genes, with deep evolutionary roots (47). Importantly, we did not define ancestral ISGs for zebrafish/human ISG pairs that appeared related at first glance but, upon further analysis, were too distant; for example, zebrafish vamp5 and human VAMP8 are both ISGs but share their last common ancestor at the Opisthokonta level, before the split of fungi and animals, long before the emergence of IFNs.
Nevertheless, the type I IFN system also includes very old genes that were already present in basal metazoans. The RNAseL/OAS module is a good example, being found across metazoans from mammals to sponges (48) but lost in the fish lineage. Another striking example is the cGAS/STING module recently identified in cnidarians (49). The involvement of these genes in the antiviral immunity of basal animal branches is unknown but certainly worth investigating. The main models for invertebrate immunity are flies and mosquitoes, but they largely rely on RNA interference to contain viruses (3). Central signaling modules of the vertebrate IFN system, such as TLR/NF-κB and JAK/STAT, are also present in insects and in more distant metazoans, but they induce different sets of genes with other functions (50).
Additionally, a few important genes do not meet our criteria for ancestral ISGs, because they are inducible in only one of the two species (Fig. 5). For example, irf3 is an ISG in fish but not in humans, whereas the reverse is true for JAK2. Hence, our list of ancestral ISGs is likely incomplete, but it provides a core repertoire pointing to the most fundamental factors of the vertebrate innate antiviral arsenal.
A relatively large number of ISGs have no ortholog in the other lineage, such as human APOBEC3, RNASEL, OAS, and AIM2 (Fig. 5). Similarly, many fish-specific ISGs have likely been co-opted by the IFN pathway during fish evolution. In this case, they either have no clear ortholog in humans and other tetrapods (as for finTRIMs and nlr-B30.2), or their ortholog(s) have no link with the type I IFN system. The finTRIM family contains the largest number of zebrafish ISGs of any family, ancestral or not. Interestingly, ISGs are found only among the recently diversified, species-specific finTRIMs; the most basal members (ftr82–84), which are well conserved among fish, were not induced by IFNφ1 or by CHIKV in this study, consistent with previous studies (51). Nevertheless, ftr83 appears to mediate protection, especially in the gill region, by stimulating local ifnphi1 expression (52). The diversity of the IFN-inducible finTRIMs and their evolution under positive selection suggest a role in viral recognition (22), yet their functions remain unclear.
The co-optation of new genes into the ISG repertoire may occur quickly and in a group-specific manner, through the introduction of sequence motifs into regulatory regions, for example, via retroviral insertion (53, 54). However, we cannot exclude that these branch-specific ISGs are, in fact, ancestral but were lost in one of the two lineages; this is the case for Gig2 genes, which are present in the coelacanth genome as well as in fish and were thus lost in tetrapods. Our repertoire of ancestral ISGs is therefore underestimated, because it cannot include genes affected by lineage-specific losses.
Do the ancestral ISGs identified in this study define a minimal, but complete, set of response elements, from recognition to elimination of invading viruses? Probably not, as this ancestral core group of ISGs was backed up by additional ISGs in each species, including the LCATT. For example, the absence of the well-known OAS/RNAseL module genes in fish (and therefore from our list of conserved ancestral ISGs) is puzzling, and one could predict that other fish genes have taken over similar functions. Similarly, APOBEC3 genes are absent from fish, and their editing functions may be mediated by other genes, possibly ADAR1.
Ancestral ISGs encode proteins that are very diverse in localization and function (Fig. 4). We provide an extended discussion of their classification below.
Characterization of the IFN-independent response to CHIKV infection
Knowing the ISG repertoire also makes it possible to identify genes that are induced by viral infection independently of the type I IFN pathway. Some ISGs, such as rsad2, can nevertheless be induced through both IFN-dependent and IFN-independent pathways in humans and fish (12); indeed, IRF3-dependent, type I IFN–independent induction of many ISGs by particular viruses has been described (55).
However, about half of the genes upregulated by CHIKV were not induced by IFNφ1 injection, and most of these were not affected by IFNR knockdown. Notably, three gene sets stand out in this list: 1) components of the complement cascade, which are known to play a role in antiviral defense; 2) cytokines, including some CC and CXC chemokines, as well as the type I IFNs themselves, which do not appear to be strongly auto- or cross-inducible; and 3) many btr and ftr TRIM E3 ligases, as well as multiple proteasome components. Interestingly, irf1b, the zebrafish ortholog of IRF1 (a human ISG), is CHIKV inducible, but not in an IFNR-dependent manner, consistent with previous work (26), and was not induced by IFNφ1. Many other genes of unknown function share the same induction pattern and would be worth investigating. A strong redundancy of antiviral pathways has probably been selected during evolution, because viruses have developed multiple strategies of immune subversion.
Contrary to what was observed with upregulated genes, there was no overlap between the gene sets downregulated by IFNφ1 and by CHIKV. This remarkable difference could reflect the additional inflammatory response induced by the virus independently of type I IFNs, or differences in kinetics.
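The partition of the CHIKV response discussed above can be sketched with set operations. This is a hypothetical illustration with placeholder gene names; it assumes that each category (CHIKV-induced, IFNφ1-induced, IFNR-dependent, and the two downregulated sets) has already been defined by differential expression analysis.

```python
from typing import Dict, Iterable, Set


def partition_chikv_response(
    chikv_up: Iterable[str],
    ifnphi1_up: Iterable[str],
    ifnr_dependent: Iterable[str],
) -> Dict[str, Set[str]]:
    """Split CHIKV-upregulated genes into three disjoint categories."""
    chikv, phi1, ifnr = set(chikv_up), set(ifnphi1_up), set(ifnr_dependent)
    return {
        # also induced by recombinant IFNphi1
        "ifnphi1_induced": chikv & phi1,
        # not induced by IFNphi1, but CHIKV induction affected in IFNR morphants
        "ifnr_dependent_only": (chikv & ifnr) - phi1,
        # neither induced by IFNphi1 nor affected by IFNR knockdown
        "ifn_independent": chikv - phi1 - ifnr,
    }


# Placeholder gene sets (hypothetical, for illustration only).
parts = partition_chikv_response(
    chikv_up={"g1", "g2", "g3", "g4"},
    ifnphi1_up={"g1", "g5"},
    ifnr_dependent={"g1", "g2"},
)
print({k: sorted(v) for k, v in parts.items()})
# {'ifnphi1_induced': ['g1'], 'ifnr_dependent_only': ['g2'], 'ifn_independent': ['g3', 'g4']}

# Overlap between the two downregulated sets (empty in this toy example).
down_overlap = {"d1", "d2"} & {"d3"}
print(len(down_overlap))  # 0
```

Because the three categories are built from intersections and differences of the same CHIKV set, they are disjoint and together cover every CHIKV-upregulated gene.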
Classification of ancestral ISGs
The ancestral ISGs presented in Table I can be classified by molecular function into sensors, transcription factors and other signal transduction factors, secreted factors, enzymes (including ubiquitination factors), and membrane receptors, which we discuss below. The antiviral mechanisms described in humans or in other mammalian systems also provide hints about the likely conserved mode of action of these factors.
Many members of the list appear to have DNA-binding capacity and may be classified as transcription factors.
BATF2 is a member of the AP-1/ATF family of transcription factors, which control the differentiation of immune cells and play key regulatory roles in immune responses. BATF2 promotes TLR7-induced Th1 responses (56).
SP100 is a tumor suppressor and a major constituent of the PML bodies controlling transcription and/or chromatin conformation.
HELZ2 is a helicase that acts as a transcriptional coactivator for a number of nuclear receptors, including AHR, a nuclear receptor regulating lipid metabolism and the susceptibility to dengue virus (57).
PARPs can act as transcriptional coactivators and potentiate the induction of many ISGs (58). The zebrafish ifi44 family comprises 19 members, seven of which are ISGs, whereas both human co-orthologs are induced by type I IFN. Located in the nucleus, IFI44 binds and blocks the HIV-1 LTR promoter (59). The numerous zebrafish ifi44 genes have probably subfunctionalized and may mediate multiple antiviral mechanisms.
Sensors and related genes.
The helicases RIG-I, LGP2, and IFIH1 (i.e., MDA5) stand out as primary ISGs encoding viral sensors. As a cytoplasmic helicase, HELZ2 might also act as a sensor. Related to nucleic acid sensing, TREX proteins have a 3′-to-5′ DNA exonuclease activity that is important to block the STING-dependent initiation of IFN responses by DNA fragments derived from endogenous retroviruses and other retroelements (60).
Besides transcription factors, enzymes are the most important category of ancestral ISGs. They may play a role in signaling or have direct antiviral activity.
PARPs are involved in many cellular processes, from regulation of chromatin conformation to transcriptional control, and several PARPs are also induced by infection and inflammation. Among ancestral ISGs, PARPs are represented by parp9, parp12, and parp14. Strikingly, these three PARPs form a nuclear complex with the E3 ubiquitin ligase encoded by dtx3l, itself an ancestral ISG; this complex promotes RNA polymerase II recruitment at IRF3-dependent promoters (58). Our finding that key components of this complex belong to the core type I IFN system underscores its importance in the antiviral response. Other PARP activities may also contribute to antiviral mechanisms; for example, PARP12 mediates ADP-ribosylation of Zika virus NS1 and NS3, leading to their degradation by the proteasome (61). The ADP-ribose hydrolase encoded by CHIKV, which is required for its virulence, is another hint of the central importance of these enzymes in antiviral defense (62).
Several factors of the ubiquitin and ubiquitin-like pathways were found among ancestral ISGs, including the E3 ubiquitin ligases trim25, rnf114, and dtx3l and the ISG15-specific protease usp18. The mechanisms through which they exert antiviral activity or regulate the response are not fully resolved. The critical role of trim25 in RIG-I activation and its capacity for ISGylation (63) have been well documented in fish and mammals. isg15, a ubiquitin-like protein, is also an ancestral ISG that plays a central role in the type I IFN pathway in fish and mammals through multiple mechanisms (7, 63).
The proapoptotic caspase casp7 has type I IFN–induced orthologs in zebrafish and humans. Interestingly, ancestral ISGs also include pmaip1, which promotes caspase activation and apoptosis through modifications of the mitochondrial membrane, and xiaf1, a negative regulator of inhibitor of apoptosis proteins. Taken together, these observations indicate that the ancestral type I IFN system comprised a proapoptotic module.
The proinflammatory caspase casp1 is also an ancestral ISG, as is pycard, which encodes ASC, the major scaffold protein of the canonical inflammasome. Induction of the inflammasome is thus an ancestral property of the IFN response. Many upstream sensors of the inflammasome are IFN inducible, but they are generally divergent in the two lineages, nlrc5 being the only ancestral ISG.
Rsad2 (also known as viperin) is an enzyme with a direct antiviral function: it converts CTP into a previously unknown ribonucleotide, 3′-deoxy-3′,4′-didehydro-CTP, which acts as a terminator of viral RNA synthesis (64). Interestingly, the ancestral ISG rsad2 and the nucleotide kinase gene cmpk2 are located very close to each other in the genome in fish as well as in mammals, likely forming a conserved functional antiviral unit.
Adenosine deaminases acting on dsRNA (ADARs) deaminate adenosine to produce inosine in dsRNA structures, regulating the inflammation induced by such molecules (65). Accordingly, loss of function of adar in zebrafish larvae leads to brain inflammation in a model of Aicardi–Goutières syndrome, suggesting a key regulatory role of this gene during the type I IFN response (66).
Protein kinase R (PKR; encoded by eif2ak2) is activated by dsRNA (and thus could have been listed above as a sensor), leading to phosphorylation of eIF2α and to inhibition of protein synthesis and viral replication. Many viruses encode inhibitors of PKR, a cornerstone antiviral factor that also affects transcription factors such as IRF1, STATs, and NF-κB and upregulates many genes, including β2-microglobulin and isg15 (67). Interestingly, another ancestral ISG, epsti1, can activate the PKR promoter and induce PKR-dependent genes in humans (68), raising the question of whether pkr and epsti1 have been functionally coupled since the LCATT. Fish possess a lineage-specific paralogue of PKR called PKZ, which detects Z-DNA (69).
In humans and mice, CCL19 is implicated in lymphocyte migration and is important to define compartments within lymphoid tissues. In rainbow trout, one of the six ccl19 paralogues present in the genome participates in antiviral immunity by promoting mucosal and central CD8+ T cell responses (70).
Some of the fish homologs of the murine and human IFN–inducible CXC chemokines CXCL9–11 (which bind CXCR3, a receptor expressed by various leukocytes, including some T cells, macrophages, and dendritic cell subsets) are also upregulated by IFNφ in zebrafish larvae. These genes have expanded considerably in fish, and two lineages of CXCL11 have recently been distinguished, both closely related to mammalian CXCL9–11 (71, 72). The upregulated cxcl11.3 (i.e., cxc66 or cxcl11ac) identified in this work belongs to lineage 1. Zebrafish have three cxcr3 paralogues, and receptor/ligand binding, tested for three other zebrafish cxcl11 ligands, does not follow ligand lineage (73), so the receptor(s) of this ISG remain to be identified experimentally.
Another soluble factor upregulated by type I IFN and viral infection in fish is galectin-9 [this work and (13)]. In mammals, Galectin-9 is involved in multiple mechanisms of antiviral immunity. For example, it is a potent factor against human CMV because it blocks the entry of the virus into target cells (74). Galectin-9 can also regulate HIV transcription and induce the expression of the deaminase APOBEC3G, a potent antiviral factor (75). Besides, the galectin-9 receptor TIM3 is implicated in the control of Th1 cells (76).
Although zebrafish and human MHC class I genes are not direct orthologs, MHC class I genes are in the list, together with β2-microglobulin, the peptide transporters tap-1 and tap-2, and homologs of TAPBP and proteasome subunits, indicating that the MHC class I antigen-presentation pathway is a fundamental component of the type I IFN system.
Other important membrane proteins in the list are tetraspanins of the CD9 family that regulate degranulation of myeloid subsets and secretion of cytokines and, hence, constitute key players in inflammation (77).
Zebrafish possess eight isg12 genes located in tandem, six of which were highly inducible by IFNφ1 and by CHIKV. Their human ISG orthologs IFI6 and IFI27 (i.e., ISG12A) are internal membrane proteins that either stabilize the endoplasmic reticulum membrane, preventing the formation of flavivirus-induced endoplasmic reticulum membrane invaginations (78), or destabilize the mitochondrial membrane and promote apoptosis (79). IFI27 can also recruit an E3 ubiquitin ligase and target the HCV NS5 protein for degradation (80), illustrating the potential diversity of antiviral mechanisms mediated by members of this family.
APOL1 affects endocytosis and promotes an expansion of the lysosomal compartment, favoring, for example, the degradation of the HIV-1 protein Vif (81).
ISGs with unknown functions or unknown antiviral mechanisms.
Even in humans and mice, the basis of the antiviral activity of certain ISGs remains completely unknown. For example, the effects of PHF11, RNF114, or SAMD9 are elusive. In SAMD9, a DNA/RNA-binding AlbA domain, a nucleoside triphosphatase domain, and an OB domain with predicted RNA-binding properties suggest a link with nucleic acid metabolism or sensing (82). SAMD9 genes, which are very old ISGs with counterparts found across Metazoa and even in prokaryotes, are key restriction factors of poxviruses (83).
In conclusion, antiviral genes are well known to evolve very fast under strong pressure from pathogens, as postulated by the Red Queen hypothesis. This is indeed illustrated by the large number of ISGs that are either fish or mammal specific. Nevertheless, our data define a surprisingly stable set of core ISGs that were apparently co-opted into the new IFN system of early vertebrates ∼500 My ago and have been maintained for the last 450 My in both fish and tetrapods. The full list of zebrafish ISGs provides a powerful reference to characterize the subtle interactions between viruses and the host response, including redundancy of immune pathways and viral subversion mechanisms. It also constitutes a valuable resource for the study of autoinflammatory diseases using the emerging zebrafish model.
We thank Jean-Yves Coppee and Caroline Proux (Transcriptomics Platform, Institut Pasteur) for generation and sequencing of RNA libraries. We are indebted to Rune Hartmann (Aarhus University) for recombinant zebrafish IFN and stimulating discussion. We thank Emma Colucci and Pedro Hernandez-Cerda for critical reading of the manuscript.
This work was supported by Agence Nationale de la Recherche Project Fish-RNAvax (Grants ANR-16-CE20-0002-02 and ANR-16-CE20-0002-03) and the European Union Horizon 2020 Research and Innovation Programme under Marie Sklodowska-Curie Grant Agreement 721537–ImageInLife.
The sequences presented in this article have been submitted to the National Center for Biotechnology Information BioProject database (https://www.ncbi.nlm.nih.gov/bioproject) under the Sequence Read Archive accession number PRJNA531581.
The online version of this article contains supplemental material.
Abbreviations used in this article:
ADAR, adenosine deaminase acting on dsRNA
LCATT, last common ancestor of teleosts and tetrapods
PARP, poly(ADP-ribose) polymerase
PKR, protein kinase R
The authors have no financial conflicts of interest.