Profiling studies of mRNA and microRNA, particularly microarray-based studies, have been extensively used to create compendia of genes that are preferentially expressed in the immune system. In some instances, functional studies have been subsequently pursued. Recent efforts such as the Encyclopedia of DNA Elements have demonstrated the benefit of coupling RNA sequencing analysis with information from expressed sequence tags (ESTs) for transcriptomic analysis. However, the full characterization and identification of transcripts that function as modulators of human immune responses remains incomplete. In this study, we demonstrate that an integrated analysis of human ESTs provides a robust platform to identify the immune transcriptome. Beyond recovering a reference set of immune-enriched genes and providing large-scale cross-validation of previous microarray studies, we discovered hundreds of novel genes preferentially expressed in the immune system, including noncoding RNAs. As a result, we have established the Immunogene database, representing an integrated EST road map of gene expression in human immune cells, which can be used to further investigate the function of coding and noncoding genes in the immune system. Using this approach, we have uncovered a unique metabolic gene signature of human macrophages and identified PRDM15 as a novel overexpressed gene in human lymphomas. Thus, we demonstrate the utility of EST profiling as a basis for further deconstruction of physiologic and pathologic immune processes.

The immune system is our central defense against bacterial and viral pathogens. As such, misregulation of certain genes that are critical for normal immune function can result in immunodeficiency and autoimmune disorders. Furthermore, disorders such as diabetes, obesity, and cancer are increasingly viewed as having roots in immunological dysfunction (1). Despite this recognition of the immune system as a central player in health and disease, the full spectrum of genes and pathways capable of modulating immune responses is not known.

Analyzing the function of genes that are highly enriched or specific for immune cell/tissue expression may aid in capturing important regulators of immune function. It is important to note that this premise does not require that ubiquitously expressed genes cannot be critically important in immune function and development. However, considering examples such as BCR/TCR signaling cascade components, it is striking that genes that are expressed specifically in immune cells/tissues are often central players in immune physiology. In addition, genes responsible for Mendelian immunodeficiences or autoimmunity often show enriched expression patterns in the immune system. Examples include the causes of Omenn syndrome (RAG1/2; Online Mendelian Inheritance in Man [OMIM] identification number 603554), hyper-IgM syndrome (AICDA; OMIM identification number 308230), and X-linked lymphoproliferative disorder (SH2D1A; OMIM identification number 308240).

Historically, gene discovery in the immune system has relied on unbiased searches for genes exhibiting differential expression in cells/tissues of the immune system as compared with other cells/tissues. These approaches have used techniques such as differential display, microarray profiling, serial analysis of gene expression, and, most recently, RNA sequencing (RNA-Seq) (211). In the case of microarray profiling, experiments often reveal a large number of differentially expressed genes, making cross-validation by alternative approaches a challenge. In addition, microarrays suffer from difficulty in detecting low-abundance transcripts and may not contain valid probes for mRNAs, noncoding RNAs (ncRNAs), or microRNAs, because their design is based on human annotation. Recent studies have used RNA-Seq data to examine expression profiles of immune cells; however, this has not diminished the utility of expressed sequence tag (EST) data, as recently demonstrated by the Encyclopedia of DNA Elements (ENCODE) consortium’s extensive use of EST data (1214). For example, EST data are often used to aid in deciphering expression of alternative 5′ untranslated region isoforms, because RNA-Seq alone is insufficient to define transcription start sites (TSSs) of genes.

With the goal of identifying genes enriched in the immune system, we developed a rigorous quantitative systems-level expression database called Immunogene, which harnesses the ∼7 million human ESTs in the UniGene database. To date, the UniGene database has not been exploited in a systematic fashion for genome-wide immune system gene discovery, although it has successfully been used for retinal gene discovery (15, 16). Early studies employing UniGene for analysis of the bone marrow transcriptome in zebrafish further highlighted the value of this approach (17). Notably, the UniGene database has the benefit of not being biased toward human gene annotation, representing instead a resource of sequence data coupled with tissue/cell-type annotation. Using this approach, we demonstrate unique transmembrane, enzymatic, and nutrient/ion transport network signatures in the immune system. We also demonstrate the utility of the Immunogene database as an aid in deciphering blood-related cancers, identifying PRDM15 as a novel overexpressed gene in human B cell lymphomas. Thus, we show that EST profiling can significantly enhance our understanding of the ensemble of genes and processes important for immune function.

Human UniGene database build 202 and mouse build 163 were obtained from the National Center for Biotechnology Information (NCBI; ftp://ftp.ncbi.nih.gov/repository/UniGene). These databases contain mapping of each EST to a UniGene cluster and mapping of UniGene clusters to genes and tissue of origin for each EST. Upon analysis of library descriptors, we found that the human database contained 469 unique tissue/cell types, of which 91 were associated with the immune system, allowing each EST to be classified as either immune or nonimmune according to its associated library descriptor. A normalized or weighted immune percentage (WIP) was calculated for each UniGene cluster using the following formula: Normalized WIP score per UniGene cluster = (A/B)/(A/B + C/D), where A is the number of ESTs from immune-related libraries in UniGene Cluster X, B is the total number of ESTs from immune-related libraries in the entire UniGene database, C is the number of ESTs from non–immune-related libraries in UniGene Cluster X, and D is the total number of ESTs from non–immune-related libraries in the entire UniGene database.

A p value for immune enrichment was computed for each UniGene cluster using a χ2 test in R (version 2.5.1). For this test, a 2 × 2 contingency table was created with ESTs derived from immune tissues versus nonimmune tissues (Table I). The p value was calculated with respect to the number of ESTs in all clusters that had at least 10 ESTs.

Table I.
ESTs derived from immune tissues versus nonimmune tissues
Individual ClusterEntire UniGene Collection
ESTs derived from immune tissues 575,059 
ESTs derived from nonimmune tissues 5,474,516 
Individual ClusterEntire UniGene Collection
ESTs derived from immune tissues 575,059 
ESTs derived from nonimmune tissues 5,474,516 

The p values for all clusters with at least 10 ESTs were corrected for multiple hypotheses testing using the false discovery rate (FDR) of Benjamini and Hochberg (18).

For Fig. 5A, we identified the U133 Plus2/U133A quantile-normalized relative expression level of PRDM15 in various subtypes of lymphoma cell lines compared with median expression levels of PRDM15 in all lymphoma cell types in the body atlas database of NextBio. These normalized relative expression levels were used to generate a box plot to show the expression level of PRDM15 in different lymphoma subtypes. Nonlymphoma cell lines from lung and colon cancer were also used for comparison.

FIGURE 5.

The Immunogene PRDM15 is overexpressed in human lymphomas. (A) Box plot of normalized microarray expression levels of PRDM15 across 744 cell lines from various cancer types with PRDM15 levels shown for B cell lymphoma subtypes, lung cancer, and colon cancer. (B) PRDM15 mRNA is overexpressed in Burkitt lymphoma. RT-qPCR of PRDM15 transcript levels in EBV-transformed B cell lymphoblastoid GM06990 and GM12878 cell lines, embryonic kidney HEK293T cells, Burkitt lymphoma Daudi cells, and monocytic THP-1 cells. Levels were normalized to S18 transcript and expressed relative to PRDM15 levels in GM06990 cells. (C) PRDM15 localizes to the nucleus. HEK293T cells were mock transfected or transfected with N-terminal FLAG-tagged PRDM15 and stained with anti-PRDM15 Ab and DAPI (original magnification ×40). (D) Western blot analysis of PRDM15 in multiple human cell lines including Daudi and Raji cells (Burkitt lymphomas), K562 cells (myelogenous leukemia, BCR-ABL positive), U-937 cells (histiocytic lymphoma, negative IgH, negative EBV, monocytic), THP-1 cells (acute monocytic leukemia), HL-60 cells (acute promyelocytic leukemia), Jurkat cells (acute T cell leukemia), and MOLT4 cells (acute T-lymphoblastic leukemia). * indicates a likely nonspecific band as it does not correspond to any PRDM15 predicted isoform weight. (E) PRDM15 mRNA is overexpressed in primary follicular B cell lymphomas. Normalized log2 median-centered expression of PRDM15 in human follicular lymphomas (n = 7) versus normal immune cell populations including peripheral blood B cells (PBC B cell; n = 5), peripheral blood CD4+ T cells (PBC T cell; n = 5), germinal center B cells (GBC; n = 1), memory B lymphocytes (MBC; n = 1), umbilical cord B lymphocytes (UBC B cell; n = 1), and umbilical cord T lymphocytes (UBC T cell; n = 1). (F) PRDM15 protein is expressed in normal germinal centers and overexpressed in primary human follicular lymphomas. Shown are the results comparing PRDM15 levels in normal human tonsils and two of seven representative independent primary human follicular lymphomas samples that were stained with anti-PRDM15 Ab by immunohistochemistry.

FIGURE 5.

The Immunogene PRDM15 is overexpressed in human lymphomas. (A) Box plot of normalized microarray expression levels of PRDM15 across 744 cell lines from various cancer types with PRDM15 levels shown for B cell lymphoma subtypes, lung cancer, and colon cancer. (B) PRDM15 mRNA is overexpressed in Burkitt lymphoma. RT-qPCR of PRDM15 transcript levels in EBV-transformed B cell lymphoblastoid GM06990 and GM12878 cell lines, embryonic kidney HEK293T cells, Burkitt lymphoma Daudi cells, and monocytic THP-1 cells. Levels were normalized to S18 transcript and expressed relative to PRDM15 levels in GM06990 cells. (C) PRDM15 localizes to the nucleus. HEK293T cells were mock transfected or transfected with N-terminal FLAG-tagged PRDM15 and stained with anti-PRDM15 Ab and DAPI (original magnification ×40). (D) Western blot analysis of PRDM15 in multiple human cell lines including Daudi and Raji cells (Burkitt lymphomas), K562 cells (myelogenous leukemia, BCR-ABL positive), U-937 cells (histiocytic lymphoma, negative IgH, negative EBV, monocytic), THP-1 cells (acute monocytic leukemia), HL-60 cells (acute promyelocytic leukemia), Jurkat cells (acute T cell leukemia), and MOLT4 cells (acute T-lymphoblastic leukemia). * indicates a likely nonspecific band as it does not correspond to any PRDM15 predicted isoform weight. (E) PRDM15 mRNA is overexpressed in primary follicular B cell lymphomas. Normalized log2 median-centered expression of PRDM15 in human follicular lymphomas (n = 7) versus normal immune cell populations including peripheral blood B cells (PBC B cell; n = 5), peripheral blood CD4+ T cells (PBC T cell; n = 5), germinal center B cells (GBC; n = 1), memory B lymphocytes (MBC; n = 1), umbilical cord B lymphocytes (UBC B cell; n = 1), and umbilical cord T lymphocytes (UBC T cell; n = 1). (F) PRDM15 protein is expressed in normal germinal centers and overexpressed in primary human follicular lymphomas. Shown are the results comparing PRDM15 levels in normal human tonsils and two of seven representative independent primary human follicular lymphomas samples that were stained with anti-PRDM15 Ab by immunohistochemistry.

Close modal

To identify potential divergent ncRNAs emanating near the TSSs of immune-enriched protein-coding genes, we downloaded EST data from the University of California Santa Cruz genome browser. Using custom PERL scripts, we analyzed the data to identify sequences that met the following requirements: 1) at least one EST is spliced and transcribed on the opposite strand of a coding gene and has a start site within 5 kb of the TSS of the RefSeq Immunogene; 2) the spliced EST does not show evidence of splicing to a neighboring RefSeq gene; and 3) no open reading frame of >100 aa could be identified in the longest EST associated with the coding RefSeq, if there was more than one EST that met our criteria. We used the data feature “intronOrientation” downloaded from the University of California Santa Cruz Genome Browser to confirm EST strand orientation.

Enrichment analyses for gene ontology (GO) biological process and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways were conducted based on various gene list categories including the EST-defined Immunogenes with Entrez IDs (n = 2,232) and the reference set of genes (n = 570) documented in the literature as being preferentially expressed in the immune system. These gene lists were submitted to the Database for Annotation, Visualization and Integrated Discovery (http://david.abcc.ncifcrf.gov/) for enrichment analyses of GO biological processes and KEGG pathways, with the number of genes in each category and Bonferroni-corrected p values displayed (19).

GC robust multiarray average–normalized gene expression profiles across 79 tissues were obtained from the Genomics Institute of the Novartis Research Foundation (GNF) consortium (20). Expression profiles from the Affymetrix U133A platform (Affymetrix) and GNF custom probes were used. In accordance with Affymetrix guidelines, probes with highest expression value below log2(100) across all tissues were removed. Expression profiles were clustered using Cluster 3 (21) and visualized using JavaTreeView (22). Immune enrichment was calculated with the R program (version 2.5.1) using the Wilcoxon rank-sum test for each probe, and p values were corrected using the FDR method (18). In cases in which multiple probes were mapped to the same gene, the most significant p value was accepted. The following tissues in the GNF dataset were classified as immune and tested versus all other tissues: bone marrow, CD19 B cells, tonsils, lymph nodes, thymus, CD4 T cells, CD8 T cells, CD56 T cells, whole blood, CD33 myeloid cells, CD14 monocytes, dendritic cells, fetal liver, CD105 endothelial cells, leukemia cell lines, lymphoma cell lines, and erythroid cells. For Fig. 5E, we used the normalized log2 median-centered expression of PRDM15 and several coregulated genes obtained from published microarray expression profiling (23).

For each human protein, the homolog mouse protein was identified using NCBI HomoloGene (24). In some cases, multiple mouse proteins matched a single human protein and vice versa. For the purpose of immune enrichment, at least one mouse homolog was required to be immune enriched.

Human PRDM15 (GenBank identification number BC067102) was obtained from Open Biosystems and subcloned into N-terminal and C-terminal FLAG-tagged pCMV-3xFLAG vectors using 5′ EcoRI and 3′ HindIII; vectors were generated by modifying ClonTech pCMV-Myc vector by introduction of appropriate epitope tags and modification of the multiple cloning site. Primary Abs used were anti-PRDM15 (Santa Cruz Biotechnology; sc-83314), anti-FLAG M2 (Sigma-Aldrich; F1804), and anti-actin (Sigma-Aldrich; F2066). Secondary Abs for ECL development were HRP-conjugated anti-mouse (Amersham Biosciences; NXA931) and anti-rabbit (Amersham Biosciences; NA934).

Human Cell Line Blot II (ProSci; 1502) was obtained containing 15 μg protein from various human cell lines. Anti-PRDM15 Western blotting was performed under standard procedures with 1:1000 primary Ab and 1:3000 secondary Ab. For immunofluorescence, HEK293T cells were transfected with Lipofectamine (Invitrogen) with PRDM15 FLAG-tagged constructs and stained with either anti-FLAG (1:400) or anti-PRDM15 (1:200) Abs. Immunohistochemistry on human tonsil or follicular lymphomas obtained from the Pathology Core at Massachusetts General Hospital was performed as described using anti-PRDM15 Ab (1:100) (25).

Reads per kilobase of exon model per million mapped reads calls from an RNA-Seq atlas were downloaded from Medicalgenomics (http://medicalgenomics.org/rna_seq_atlas), which encompasses ribodepleted RNA-Seq data from 11 different tissues (colon, heart, hypothalamus, kidney, liver, lung, skeletal muscle, spleen, testes, ovary, and adipose tissue). Immune enrichment was calculated for each gene using the Wilcoxon signed-rank test. In the RNA-Seq atlas, spleen was classified as an immune tissue and tested versus all other tissues. In this dataset, 21,725 RefSeq isoforms for mRNAs or ncRNAs were designated by NM or NR accessions, encompassing 14,713 unique gene symbols (10). Of the 26,552 UniGene clusters analyzed for their WIP score, there were RNA-Seq data for 11,449 corresponding genes. Furthermore, of the 3352 Immunogene clusters identified in the UniGene database (see above analysis), there were RNA-Seq data for 1336 corresponding genes. The σ, W-, and Z-score statistics were calculated for all 11,449 genes versus the 1,336 Immunogenes to test for the presence of a trend toward immune enrichment based on RNA-Seq data.

The genome-wide association study (GWAS) single nucleotide polymorphism (SNP) catalog was downloaded from http://www.genome.gov/gwastudies/. SNPs were considered immune disease/trait-associated if they associated with the following diseases/traits: atopic dermatitis, systematic lupus erythematosus, Behcet's disease, primary biliary cirrhosis, rheumatoid arthritis, psoriasis, psoriatic arthritis, vitiligo, sarcoidosis, inflammatory bowel disease, Crohn's disease, ulcerative colitis, primary sclerosing cholangitis, type 1 diabetes, type 1 diabetes autoantibodies, multiple sclerosis (MS), ankylosing spondylitis, juvenile idiopathic arthritis, IgE levels, IgA levels, allergic rhinitis, leprosy, IgE grass sensitization, CMV Ab response, Graves disease, or celiac disease. These diseases/traits were associated with 1102 unique entries in the catalog, encompassing 808 unique immune disease/trait-associated SNPs given overlapping identification of SNPs among diseases and studies. The remaining diseases/traits were classified as nonimmune and were associated with 8795 entries encompassing 7355 unique SNPs. We next calculated the distance from each immune disease/trait-associated SNP, as well as each nonimmune disease/trait-associated SNP, to the nearest TSS of a coding RefSeq. We then calculated the frequency of immune disease/trait-associated SNPs within binned distances from EST-defined Immunogenes versus the frequency of nonimmune disease/trait-associated SNPs within the same binned distances from EST-defined Immunogenes (Fig. 4C). We computed a p value by permutation testing, selecting 808 SNPs at random from the non–immune disease/trait-associated SNP pool 10,000 times and calculating the percentage of SNPs with a nearest EST-defined Immunogene each time.

FIGURE 4.

Immune cell transporters and enzymes identified using Immunogene, lncRNA coregulated with LRRK2, and proximity of immune disease/trait GWAS SNPs to Immunogenes. (A) Graphical representation of known or predicted locations and functions of transmembrane proteins and/or genes related to metabolism or nutrient/ion transport in the Immunogene dataset. Green circles indicate transmembrane proteins or transporters associated with complex or Mendelian diseases; red circles indicate transporters with known function; black circles indicate transmembrane proteins with predicted localization and no known function. (B) A divergently transcribed lncRNA is coregulated with LRRK2. Top panel, Schematic of exon structure of LRRK2 and the divergent AS-LRRK2 transcript(s), with location of SNP rs11175593 associated with Crohn’s disease (37). Bottom panel, The THP-1 monocytic cell line was stimulated with IFN-γ for the indicated time. RT-qPCR of LRRK2 and AS-LRRK2 transcript levels is shown normalized to S18 transcript and relative to expression levels of LRRK2 in resting THP-1 cells. Data shown are representative of one of two biological replicates with errors bars representing SD of triplicate technical replicates. (C) The distance (2-kb bins) between SNPs associated by GWAS with immune diseases/traits (red) and the nearest Immunogene is shown in comparison with the distance between SNPs not associated with immune diseases/traits (blue) and the nearest Immunogene.

FIGURE 4.

Immune cell transporters and enzymes identified using Immunogene, lncRNA coregulated with LRRK2, and proximity of immune disease/trait GWAS SNPs to Immunogenes. (A) Graphical representation of known or predicted locations and functions of transmembrane proteins and/or genes related to metabolism or nutrient/ion transport in the Immunogene dataset. Green circles indicate transmembrane proteins or transporters associated with complex or Mendelian diseases; red circles indicate transporters with known function; black circles indicate transmembrane proteins with predicted localization and no known function. (B) A divergently transcribed lncRNA is coregulated with LRRK2. Top panel, Schematic of exon structure of LRRK2 and the divergent AS-LRRK2 transcript(s), with location of SNP rs11175593 associated with Crohn’s disease (37). Bottom panel, The THP-1 monocytic cell line was stimulated with IFN-γ for the indicated time. RT-qPCR of LRRK2 and AS-LRRK2 transcript levels is shown normalized to S18 transcript and relative to expression levels of LRRK2 in resting THP-1 cells. Data shown are representative of one of two biological replicates with errors bars representing SD of triplicate technical replicates. (C) The distance (2-kb bins) between SNPs associated by GWAS with immune diseases/traits (red) and the nearest Immunogene is shown in comparison with the distance between SNPs not associated with immune diseases/traits (blue) and the nearest Immunogene.

Close modal

We compiled a list of genes documented in the literature to be preferentially expressed in the immune system. We defined evidence for immune enrichment based on RT-PCR or Northern blots. To remove selection biases as much as possible, we cataloged the expression pattern of sets of genes that would be generally regarded as preferentially expressed in the immune system (e.g., cluster of differentiation Ags, cytokine genes and receptors, and TLR signaling components) using various Web-based resources that catalog these genes. We consider this very stringent approach necessary, because not all TLR genes (e.g., TLR5) or CD Ags (e.g., CD56) are enriched in the immune system.

First-strand cDNA from human spleen was obtained from OriGene. Reactions were performed in 25 μl volumes with 40 ng sample first-strand cDNA and 640 nM concentrations of both forward and reverse primers. PCR was performed under the following conditions: initial denaturation of 95°C for 5 min followed by 34 cycles of 95°C for 30 s and 60°C for 30 s. To control for specificity of PCR reactions and primer-dimers, the same PCR reaction was also conducted with no input cDNA. All primers were designed to be exon spanning. PCR products were cloned and sequenced to verify amplification of expected target long ncRNA (lncRNA). A complete list of UniGene clusters and associated PCR primers used to amplify spliced ncRNAs is shown in Table II. For quantitative RT-PCR experiments, total RNA was extracted from the indicated cell lines; the RT step was performed with Superscript, and the quantitative PCR (qPCR) step was performed with SYBR Green (Invitrogen).

Table II.
Complete list of UniGene clusters and associated PCR primers used to amplify spliced ncRNAs
UniGene Cluster
(Primer Name)SequenceProduct (bp)EST Accession No.NR Accession No.
Hs.662020 (ncRNA4) F: 5′-GGAGGATATGGCTCAGGACA-3′ 346 AK097922 NR_045486.1 
 R: 5′-TGGGCCACCAAGTTGTTTAT-3′    
Hs.690944 (ncRNA6) F: 5′-CACCTCATTCAGTGGCTCCT-3′ 353 DA953527  
 R: 5′-GGCAATGTCAGGGAAAAGAA-3′    
Hs.439791 (ncRNA12) F: 5′-TGCCTCACCACTTGTCTCAG-3′ 823 CR598049 NR_024464.1 
 R: 5′-CAGGATGACTCAGTGGAGCA-3′    
Hs.667175 (ncRNA16) F: 5′-CCAACCCAGGAGACAGTGTT-3′ 275 AL832284  
 R: 5′-CGTTCTCCCCATGTCTCTGT-3′    
Hs.657722 (ncRNA17) F: 5′-CCTCCTGCCTCCTTGGTAAT-3′ 399 CD367998 NR_026544.1 
 R: 5′-TGGTATGTTCACGGCTGAAA-3′    
Hs.130180 (ncRNA18) F: 5′-GCCATCTTGACTCACTGCAA-3′ 332 DA937565  
 R: 5′-AGGATGTGTGAGCCATGACA-3′    
Hs.672896 (ncRNA20) F: 5′-AGAGATGGAACCAGCTCGAA-3′ 366 DA468624  
 R: 5′-GATTCCCTCCAGCTCTCAAA-3′    
(AS-LRRK2) F: 5′-CTTCAACCCAGCAAATCTCC-3′ 158 DA673065  
 R: 5′-ATCAGCCCTTCCAATGTCAC-3′    
(LRRK2) F: 5′-TGACAGCACAGCTAGGAAGC-3′ 128 NM_198578.3  
 R: 5′-TGGAAGATTGATGTCCCAAA-3′    
(PRDM15) F: 5′-CGCAGTACAGAGCATTCAGC-3′ 186 NM_022115.3  
 R: 5′-AGCTGTAACTGGCGTTCAGG-3′    
UniGene Cluster
(Primer Name)SequenceProduct (bp)EST Accession No.NR Accession No.
Hs.662020 (ncRNA4) F: 5′-GGAGGATATGGCTCAGGACA-3′ 346 AK097922 NR_045486.1 
 R: 5′-TGGGCCACCAAGTTGTTTAT-3′    
Hs.690944 (ncRNA6) F: 5′-CACCTCATTCAGTGGCTCCT-3′ 353 DA953527  
 R: 5′-GGCAATGTCAGGGAAAAGAA-3′    
Hs.439791 (ncRNA12) F: 5′-TGCCTCACCACTTGTCTCAG-3′ 823 CR598049 NR_024464.1 
 R: 5′-CAGGATGACTCAGTGGAGCA-3′    
Hs.667175 (ncRNA16) F: 5′-CCAACCCAGGAGACAGTGTT-3′ 275 AL832284  
 R: 5′-CGTTCTCCCCATGTCTCTGT-3′    
Hs.657722 (ncRNA17) F: 5′-CCTCCTGCCTCCTTGGTAAT-3′ 399 CD367998 NR_026544.1 
 R: 5′-TGGTATGTTCACGGCTGAAA-3′    
Hs.130180 (ncRNA18) F: 5′-GCCATCTTGACTCACTGCAA-3′ 332 DA937565  
 R: 5′-AGGATGTGTGAGCCATGACA-3′    
Hs.672896 (ncRNA20) F: 5′-AGAGATGGAACCAGCTCGAA-3′ 366 DA468624  
 R: 5′-GATTCCCTCCAGCTCTCAAA-3′    
(AS-LRRK2) F: 5′-CTTCAACCCAGCAAATCTCC-3′ 158 DA673065  
 R: 5′-ATCAGCCCTTCCAATGTCAC-3′    
(LRRK2) F: 5′-TGACAGCACAGCTAGGAAGC-3′ 128 NM_198578.3  
 R: 5′-TGGAAGATTGATGTCCCAAA-3′    
(PRDM15) F: 5′-CGCAGTACAGAGCATTCAGC-3′ 186 NM_022115.3  
 R: 5′-AGCTGTAACTGGCGTTCAGG-3′    

F, Forward; R, reverse.

To identify immune-enriched genes using EST datasets, we developed a quantitative schema outlined in Fig. 1. We use the term “immune-enriched” to indicate higher mRNA transcript levels in immune tissues compared with nonimmune tissues.

FIGURE 1.

Schematic of the human EST-based profiling analysis pipeline. Starting with the NCBI UniGene database build 202, 26,552 clusters with at least 10 ESTs were profiled for their expression in immune tissues. Based on the genome-wide distribution of WIP scores, a p value was calculated for each gene. A significant p value (<0.05) indicated that there were enough ESTs to conclude that a particular WIP score was higher than average. Clusters identified as immune enriched were cross-validated using a microarray data platform and GO/KEGG annotation with subsequent experimental follow-up.

FIGURE 1.

Schematic of the human EST-based profiling analysis pipeline. Starting with the NCBI UniGene database build 202, 26,552 clusters with at least 10 ESTs were profiled for their expression in immune tissues. Based on the genome-wide distribution of WIP scores, a p value was calculated for each gene. A significant p value (<0.05) indicated that there were enough ESTs to conclude that a particular WIP score was higher than average. Clusters identified as immune enriched were cross-validated using a microarray data platform and GO/KEGG annotation with subsequent experimental follow-up.

Close modal

The NCBI Human UniGene database (build 202) contained 6,731,038 ESTs derived from 8,648 human cDNA libraries. These ESTs were clustered by sequence alignment with no genomic scaffold into 171,154 clusters. Of the resulting clusters, we focused on the 26,552 that had at least 10 ESTs, encompassing a total of 6,049,575 ESTs. The UniGene tissue library descriptors were annotated and standardized, resulting in 469 tissues/cell types. Of these tissues/cell types, 91 (∼20%) were classified as immune tissues, with the remaining defined as nonimmune tissues. Because each EST in a particular UniGene cluster has an associated library identification number, we were able to create a binary classification scheme, categorizing each EST as to whether it derived from immune or nonimmune tissues (Supplemental Fig. 1A).

Next, we introduced a scoring system to quantify the level of immune expression for each of the 26,552 clusters. Specifically, we calculated a WIP score for each Unigene cluster which reflects the normalized percent of ESTs derived from immune tissues within each cluster. To assess whether the WIP score for each UniGene cluster was significantly enriched in immune expression, we applied a χ2 statistic and corrected for multiple hypothesis testing using the FDR method (see 2Materials and Methods and Table I). The distribution of the WIP score (mean 0.095) and p values for all 26,552 UniGene clusters are shown in Fig. 2A and Supplemental Table I.

FIGURE 2.

Quantitative EST-based immune expression enrichment analysis utilizing 26,552 UniGene clusters. (A) The frequency distribution of the WIP score (mean 0.095) and p values (χ2 metric with FDR correction) for 26,552 UniGene clusters with ≥10 ESTs is shown. Red indicates clusters/genes significantly enriched in immune expression (WIP > 0.095; p < 0.05), whereas blue indicates clusters/genes significantly underexpressed in the immune system (WIP < 0.05 and p < 0.05). (B) The frequency distribution (density) of WIP scores for 26,552 UniGene clusters (gray; mean 0.095) versus the WIP scores of the 570 known immune-enriched reference gene sets (red; mean 0.44). (C) The distribution of WIP scores and p values for the 570 known immune-enriched reference gene sets showing that 85% (red) of this gold-standard gene set have significant WIP and p values.

FIGURE 2.

Quantitative EST-based immune expression enrichment analysis utilizing 26,552 UniGene clusters. (A) The frequency distribution of the WIP score (mean 0.095) and p values (χ2 metric with FDR correction) for 26,552 UniGene clusters with ≥10 ESTs is shown. Red indicates clusters/genes significantly enriched in immune expression (WIP > 0.095; p < 0.05), whereas blue indicates clusters/genes significantly underexpressed in the immune system (WIP < 0.05 and p < 0.05). (B) The frequency distribution (density) of WIP scores for 26,552 UniGene clusters (gray; mean 0.095) versus the WIP scores of the 570 known immune-enriched reference gene sets (red; mean 0.44). (C) The distribution of WIP scores and p values for the 570 known immune-enriched reference gene sets showing that 85% (red) of this gold-standard gene set have significant WIP and p values.

Close modal

To evaluate whether our EST immune-based expression profiling was sufficiently sensitive and specific to discover immune-related genes, we compiled a reference set of genes (n = 570) documented in the literature as being preferentially expressed in the immune system as defined by RT-PCR or Northern blot analysis (Supplemental Table I). To cross-validate our literature annotation for this immune-enriched gold-standard set of genes, we performed GO biological processes, KEGG pathway, and microarray enrichment analyses on this gene set. GO enrichment analysis revealed enrichment in categories such as immune system process, leukocyte activation, T cell activation, and inflammatory response (Supplemental Fig. 2A). All KEGG pathways identified as being significantly overrepresented were immune pathways (Supplemental Fig. 2B). Microarray expression analysis of our compiled immune-enriched reference set showed that these genes are preferentially expressed in human immune tissues, thus providing a high-confidence set of immune-enriched genes against which we could benchmark our EST profiling results (Supplemental Fig. 2C, 2D).

We identified 3352 UniGene clusters with a significant p value (p < 0.05) and a WIP score greater than the average (0.095). Importantly, the average WIP scores for the gold-standard genes were significantly higher, with a mean score of 0.44 and an SD of 0.235, suggesting that our EST profiling methodology was robust (Fig. 2B). In addition, the vast majority of our gold-standard genes had significant p values (p < 0.05), indicating the specificity of our methods (Fig. 2C). Using a conservative WIP score threshold of 0.15 and p < 0.05, we identified a total of 2834 (10.6%) clusters matching 2232 unique NCBI Gene LocusLink identification numbers as being immune enriched. We termed these 2232 genes “Immunogenes.” Although we focus on the results of human immune profiling in this report, application of our methodology to the mouse UniGene database ultimately identified 3108 unique mouse genes as immune enriched (Supplemental Fig. 1B, 1C, Supplemental Table I).

We next assessed the strength of EST profiling against microarray profiling to provide: 1) an independent method of cross-validation of each platform; and 2) increased cellular resolution within the immune system of our EST-derived dataset. We quantified immune enrichment by calculating a Wilcoxon rank-sum test and adjusted for multiple hypothesis testing (FDR corrected p < 0.05) using the publicly available BioGPS atlas (9). This atlas is a broad compendium of human tissues, with 22 of 79 profiled tissues categorized as immune tissues. Of the 2232 Immunogenes, 1927 had representative probes on the microarray. Strikingly, 1522 out of 1927 (79%) of the genes found to be immune enriched based on EST profiling WIP score (WIP >0.15) also exhibited significantly higher immune expression by microarray (p < 0.05; FDR microarray Wilcoxon test) (Fig. 3A). Hierarchical clustering and heat map representation of the 1927 EST Immunogene profiles separated by immune versus nonimmune tissue are shown in Fig. 3B, revealing enrichment of EST-derived Immunogenes in the BioGPS compendia and providing further cross-validation of our methodology.

FIGURE 3.

Cross-validation of EST profiling methodology with microarray profiling, GO pathway analysis, and epigenetic signatures. (A) Of the 1927 genes found to be significantly enriched by EST profiling, 1522 genes (79%) exhibited significantly higher expression by Wilcoxon rank-sum test and FDR (p < 0.05) in the immune system based on analysis of the GNF BioGPS microarray data tissue compendium. (B) Microarray profile clustering in GNF tissue/cell compendium of 1927 immune-enriched genes defined by EST profiling segregating immune versus nonimmune tissue types. (C) Examples of GO biological process terms statistically enriched (p < 0.05, Bonferroni corrected) for all EST Immunogenes. (D) The Wilcoxon signed-rank (y-axis) for 1336 Immunogenes compared with their WIP score (x-axis) showing that most Immunogenes show a positively signed rank.

FIGURE 3.

Cross-validation of EST profiling methodology with microarray profiling, GO pathway analysis, and epigenetic signatures. (A) Of the 1927 genes found to be significantly enriched by EST profiling, 1522 genes (79%) exhibited significantly higher expression by Wilcoxon rank-sum test and FDR (p < 0.05) in the immune system based on analysis of the GNF BioGPS microarray data tissue compendium. (B) Microarray profile clustering in GNF tissue/cell compendium of 1927 immune-enriched genes defined by EST profiling segregating immune versus nonimmune tissue types. (C) Examples of GO biological process terms statistically enriched (p < 0.05, Bonferroni corrected) for all EST Immunogenes. (D) The Wilcoxon signed-rank (y-axis) for 1336 Immunogenes compared with their WIP score (x-axis) showing that most Immunogenes show a positively signed rank.

Close modal

We also reasoned that if expression is an indicator of potential function, then enrichment would also be found in GO biological process categories and KEGG pathways relevant to the immune system, which was indeed the case (Fig. 3C, Supplemental Fig. 1D). In addition, analysis of publicly available RNA-Seq data comparing spleen to 10 other nonimmune tissues revealed that the list of Immunogenes defined by EST profiling showed a significantly higher tendency to be expressed in spleen versus nonimmune tissues (Z = 7.03 for all genes, Z = 151.42 for Immunogenes, Wilcoxon signed-rank test) (Fig. 3D). KIAA0226L (WIP = 0.37; p = 1.65 × 10−38) is an example of a novel immune-enriched gene identified by EST profiling. Interestingly, KIAA0226L has been implicated via proteomics to be involved in autophagy (26), and RNA-Seq has demonstrated expression of KIAA0226L in human spleen and mouse C19+ B cells, but not in mouse CD4+ T cells (3, 10). As the ability of EST analysis to spotlight immune-enriched genes was confirmed by multiple analyses, we designed a browseable Web-based portal of Immunogenes accessible at http://xavierlab2.mgh.harvard.edu/ESTProfiler/index.html.

To fully characterize the Immunogene dataset, we took advantage of structure-function relationships. For this approach, we analyzed each Immunogene systematically by: 1) determining the presence of potential protein domains using Simple Modular Architecture Research Tool and PFAM database criteria; 2) checking for homology to any known or predicted proteins in multiple species using BLASTP alignments; and 3) identifying predicted or known transmembrane regions using the TMHMM prediction algorithm. Structural annotation/prediction revealed that 20% of Immunogenes (440 of 2232) have known or predicted enzymatic functions using an enzymology classification system or homology modeling. These observations led us to question whether such a large fraction of enzymes in the Immunogene gene set was a reflection of the unique metabolic pathways and needs of immune cells. We used a similar number of UniGenes (460 of 2611) that were significantly underrepresented in immune tissues (WIP < 0.05; p < 0.05) as a control set of enzymes. Using two independent expression maps, BioGPS (p = 1.4 × 10−15) and NCBI serial analysis of gene expression tags (p = 1.2 × 10−20), we observed that our EST immune enzyme inventory was most enriched in monocytes/dendritic cells compared with other cell types. In contrast, the control set of enzymes was enriched in the nervous system (p = 1 × 10−13). These results strongly support our conclusion that EST profiling has revealed a genome-wide compendium of enzymes that are likely to be particularly relevant to immune system function and development. For example, one novel potential enzyme identified as being immune-enriched was ARL11 (WIP = 0.3; p = 0.028), which is predicted to encode an ADP-ribosylation factor.

In other instances, we identified enzymes that have clear enzymatic function, but which have only been evaluated to a limited degree in immune system function. One such enzyme is fructose 1,6-diphosphatase, which is known to be critical in gluconeogenesis in the liver, but was revealed by EST profiling to be enriched in macrophages. Fructose 1,6-diphosphatase was previously found to be a vitamin D–responsive gene in monocytes (27). Thus, our data highlight how EST profiling can also help focus attention on potential intriguing links between metabolism and immune function (27, 28).

In addition to the enzymatic footprints of immune cells, our structure-function analysis demonstrated highly significant enrichment of genes encoding transmembrane proteins (n = 215; p = 7 × 10−10 “integral to membrane” GO annotation) with an even greater number of genes (n = 285) identified by transmembrane prediction algorithms. This result correlated with our finding that the largest PFAM and InterPro classes of protein domains in the Immunogene set were Ig domains (n = 87; p = 3.9 × 10−12). These transmembrane proteins included cytokine receptors, ion channels such the calcium channel ORAI2, as well as novel transmembrane proteins (n = 69) and predicted secreted proteins (n = 22). One example of a transmembrane protein that we identified as being immune enriched is the nucleotide transporter SLC29A3, which causes H syndrome, Faisalabad histiocytosis, and Rosai-Dorfman disease (OMIM identification number 602782) when mutated. Patients with H syndrome have overlapping features of hemophagocytic syndromes, suggesting that functional characterization of SCL29A3 is likely to yield insights into immune function (29). The strong representation of known or predicted nutrient and ion transporters is also consistent with the unique metabolic needs of the immune system. Our results provide a guide to further dissect the links between metabolism and immune function. An integrated portrait of immune cell transporters and enzymes generated using Immunogene is shown in Fig. 4A.

Given that our EST profiling is agnostic to the type of transcripts it identifies, we also identified several immune-enriched microRNAs, such as miR-155 (BIC) (p = 2.33 × 10−14). Further inspection of the human EST dataset revealed that 30% (n = 388) of the UniGene clusters with WIP > 0.20 and p < 0.05 were not annotated as hypothetical or validated reference protein-coding genes at NCBI. Manual curation of these clusters revealed that a large fraction was likely composed of genomic contaminants, alternative exons, or untranslated regions of known protein-coding genes (see 2Materials and Methods). However, 46 UniGene clusters demonstrated evidence of splicing and did not show any significant similarity to known protein-coding genes using BLASTX, suggesting that these clusters likely represent lncRNAs (>200 bp). Using RT-PCR, we confirmed expression of 10 of 17 (59%) randomly selected clusters in human spleen (see Supplemental Fig. 3A, 3B, and Table II for examples). One such human-specific lncRNA identified as an Immunogene was AK056817 (WIP = 0.34; p = 7.0 × 10−9). Although the functions of the ncRNAs are largely unknown, analysis of ENCODE ChIP-Seq data in leukemic K562 cells revealed IFN-inducible binding of STAT1 to the promoter region of AK056817, suggesting that this lncRNA may function in IFN response pathways (30). Thus, we suggest that integrating additional genome-wide datasets can be used to refine potential functional pathways of Immunogenes.

AK056817 represents a likely lncRNA that is transcribed in an intergenic region away from other known genes. Yet many coding genes have recently been found to exist as pairs with lncRNAs that are divergently transcribed on the opposite strand with a TSS within a few kilobases of the known mRNA TSS (31). We therefore asked how many protein-coding Immunogenes harbored a divergently transcribed lncRNA based on the presence of a divergent spliced EST. For this analysis, we relaxed our stringency requirements such that a single EST from any tissue was sufficient to be counted, because most lncRNAs are transcribed at significantly lower levels than mRNAs. Interestingly, we identified 188 Immunogenes with divergently transcribed lncRNAs, including multiple susceptibility genes for immune disorders such as LRRK2, CYLD, and MFHAS1 (Supplemental Fig. 3C). We had previously shown that LRRK2 is a IFN-γ–responsive gene. Strikingly, we show that the antisense LRRK2 lncRNA, which we term antisense (AS)-LRRK2, is coordinately upregulated with LRRK2 upon IFN-γ stimulation (Fig. 4B). Thus, EST profiling can also be used to reveal novel lncRNAs responsive to specific immune-signaling pathways (30, 32, 33).

We found that the Immunogene compendium is significantly enriched in the KEGG category “primary immunodeficiency” (hsa05340; n = 28 of 34; p = 2.03 × 10−14). This KEGG pathway does not include other primary immunodeficiency genes that we identified as Immunogenes such as CARD9, CLEC7A, IL17RA (familial candidiasis 2, 4, 5; OMIM identification numbers 212050, 613108, and 613953), CXCR4 (WHIM syndrome; OMIM identification number 193670), DOCK8 (hyper-IgE syndrome; OMIM identification number 243700), WAS (Wiskott-Aldrich syndrome; OMIM identification number 301000), and SH2D1A (X-linked lymphoproliferative disease; OMIM identification number 308240) (34) (Supplemental Table I). These results demonstrate the ability of our approach to highlight genes responsible for monogenic immune-related human diseases.

In terms of complex traits, GWAS have identified thousands of genetic loci related to disease phenotypes or traits, although identification of the exact causative polymorphism and mechanism(s) of action including relevant target gene(s) remains a challenge. We postulated that EST-defined Immunogenes may be significantly more enriched at loci implicated in immune-related diseases/traits (e.g., inflammatory bowel disease) given their expression in disease/trait-relevant tissue/cells (see 2Materials and Methods for details of immune-related diseases/traits analyzed). To test this possibility, we plotted the frequency of SNPs associated with human immune diseases/traits relative to the distance to the nearest Immunogene. The distance distribution of SNPs associated with nonimmune-related diseases/traits to the nearest Immunogene was used as a comparison. As shown in Fig. 4C, there is a higher frequency of SNPs associated with immune-related diseases/traits near Immunogenes, as compared with SNPs not associated with immune-related SNPs. Strikingly, we found that 254 of 808 (31.44%) immune-related disease/trait-associated SNPs had their closest RefSeq coding gene as an EST-defined Immunogene versus 783 of 7355 (10.65%) nonimmune disease/trait-associated SNPs (p < 10−4 based on permutation testing).

A recent GWAS study identified the locus tagging SNP rs140522 as associated with MS. In this study, the authors cited the nearby gene encoding cytochrome oxidase deficient homolog 2 (SCO2) as the MS candidate gene within the locus (35). However, the nearest gene to rs140522 is not SCO2, but rather outer dense fiber of sperm tails 3B (ODF3B), which previously has not been implicated in immune function. Our approach identified ODF3B as novel Immunogene (Hs.531314; WIP = 0.229; p = 0.005). Thus, ODF3B is an alternative candidate gene for further testing in the pathophysiology of MS, illustrating the utility of integrating our EST expression profiling results with GWAS studies.

We identified PRDM15, a zinc finger and SET domain-encoding gene, as one of the most highly immune-enriched (WIP = 0.71; p = 5.1 × 10−19) genes with little known function(s). We chose to focus on PRDM15 given our interest in B cell lymphoma pathogenesis coupled with our observation that a high fraction of the PRDM15-derived ESTs were from either normal germinal center B cells (BGC) or from B cell lymphoma cell line cDNA libraries (36). BGC are thought to be the normal counterpart of some human B cell malignancies, including follicular lymphoma, Burkitt lymphoma, and diffuse large B cell lymphoma. To refine and further corroborate our EST data, we therefore analyzed the relative expression pattern of PRDM15 among various types of B cell lymphomas and non–B cell cancer-derived cell lines using published microarray data (23). As shown in Fig. 5A, PRDM15 showed the highest expression in cell lines derived from follicular lymphomas (n = 9) and the second highest expression level in Burkitt lymphomas (n = 17), compared with germinal B cell– and activated B cell–type non-Hodgkin diffuse large B cell lymphoma (n = 11), Hodgkin lymphomas (n = 9), lung cancers (n = 99), and colon cancers (n = 52). These results supported the observation that PRDM15 is specifically overexpressed in human B cell lymphomas. These data are consistent with RT-qPCR data showing that the Burkitt lymphoma cell line, Daudi, has 8–35 times more PRDM15 mRNA expression than several other cell lines/types (Fig. 5B). To examine PRDM15 expression at the protein level, we confirmed the nuclear localization of PRDM15 in HEK293T cells (Fig. 5C) and used Western blot analysis to demonstrate that PRDM15 protein is expressed in multiple Burkitt lymphomas and some monocytic leukemias, but not in T cell leukemias, suggesting that PRDM15 is expressed at higher levels in Burkitt B cell lymphomas not only at the mRNA level, but also at the protein level (Fig. 5D). In terms of follicular lymphoma expression, additional microarray analysis confirmed that PRDM15 was not only overexpressed in follicular lymphoma cell lines, but also in primary follicular lymphomas compared with primary normal B cell populations (Fig. 5E). Strikingly, immunohistochemistry analyses also revealed that PRDM15 was overexpressed in five of seven independent human follicular lymphoma samples examined, as compared with normal tonsillar germinal centers (Fig. 5F). Thus, our results demonstrate how initial insights gained by EST profiling can be used to identify a novel transcription factor overexpressed in human B cell lymphomas.

In this study, we present the application of a quantitative genome-wide analysis of the human/mouse dbEST database composed of >6 million data points to elucidate a collection of genes enriched in immune expression, which we term immunogenes. Using this database, we were able to identify an immune transmembrane and enzymatic signature. Furthermore, we implicated a novel transcription factor in B cell oncogenesis. Our results provide an important complement to other studies aimed at understanding the immune system including the RefDIC, ImmGen, and Immune Response In Silico databases, all of which largely used microarray profiling (25).

Having a genome-wide set of genes with preferential immune expression, we undertook a systems-level analysis of the Immunogene compendium, finding a unique enzymatic signature of immune cells. The study of the large set of known or predicted enzymes, transporters, and corresponding metabolites is likely to yield critical insights into the metabolism of immunity. Furthermore, the Immunogene database provides an enhanced view of the potential spectrum of potential transmembrane cell-surface markers expressed on various immune cells. These novel transmembrane Immunogenes represent potential candidates for targeted therapeutics or cell purification over the long-term via the development of Abs directed against these Ags. We demonstrated how the Immunogene approach can be used to identify genes relevant to the pathogenesis of lymphomas. Along these lines, we dissected the expression pattern of PRDM15, finding it to be overexpressed at the level of mRNA and protein in human follicular lymphomas. Thus, delineation of PRMD15 function may prove to be important in understanding BGC physiology and B cell malignancies.

The identification of genes involved in both Mendelian and complex traits encompassed within the dataset support the utility of the Immunogene database as an aid in genetic studies. In this regard, we demonstrate how the Immunogene dataset can be used to spotlight candidate genes in which GWAS studies, genetic linkage studies, quantitative trait locus analysis, or mouse mutagenesis screens have identified genomic intervals containing numerous genes. In addition, the Immunogene database can be harnessed to probe the variation of immune system responses through focused analysis of SNPs and copy number variations of the identified Immunogenes. The analysis of genetic variation in the Immunogene dataset, through methods such as coding variant SNP analysis or resequencing, is poised to yield a pool of human genes at the cornerstone of variation in human immune responses, including susceptibility to autoimmune diseases such as lupus, inflammatory bowel disease, and diabetes. In addition, we demonstrated how EST data can be used to identify lncRNAs in autoimmune disease loci, with the identification of IFN-inducible AS-LRRK2 lncRNA in the Crohn's disease LRRK2 locus. Our data suggest that Crohn's disease-associated SNPs such as rs11175593, which lies in an intron of the AS-LRRK2 lncRNA, may modulate the function and/or expression of not only LRRK2, but also its coregulated partner AS-LRRK2. Furthermore, extension of the Immunogene paradigm to other tissues may aid in deciphering human genes likely to be uniquely associated with other systems, such as the nervous system. We anticipate that by starting with the Immunogene dataset as a robust input, accompanied by multiple new layers of annotation, this resource will provide a dynamic platform to generate hypotheses to study known and novel genes in a variety of immune pathways.

We thank Natalia Nedelsky for substantive editorial assistance.

This work was supported by Grants DK083756, DK086502, HL088297, and DK043351 from the National Institutes of Health (to R.J.X.).

The online version of this article contains supplemental material.

Abbreviations used in this article:

BGC

germinal center B cell

ENCODE

Encyclopedia of DNA Elements

EST

expressed sequence tag

FDR

false discovery rate

GNF

Genomics Institute of the Novartis Research Foundation

GO

gene ontology

GWAS

genome-wide association study

KEGG

Kyoto Encyclopedia of Genes and Genomes

lncRNA

long noncoding RNA

MS

multiple sclerosis

NCBI

National Center for Biotechnology Information

ncRNA

noncoding RNA

ODF3B

outer dense fiber of sperm tails 3B

OMIM

Online Mendelian Inheritance in Man

qPCR

quantitative PCR

RNA-Seq

RNA sequencing

SNP

single nucleotide polymorphism

TSS

transcription start site

WIP

weighted immune percentage.

1
Osborn
O.
,
Olefsky
J. M.
.
2012
.
The cellular and signaling networks linking the immune system and metabolism in disease.
Nat. Med.
18
:
363
374
.
2
Hijikata
A.
,
Kitamura
H.
,
Kimura
Y.
,
Yokoyama
R.
,
Aiba
Y.
,
Bao
Y.
,
Fujita
S.
,
Hase
K.
,
Hori
S.
,
Ishii
Y.
, et al
.
2007
.
Construction of an open-access database that integrates cross-reference information from the transcriptome and proteome of immune cells.
Bioinformatics
23
:
2934
2941
.
3
Heng
T. S.
,
Painter
M. W.
Immunological Genome Project Consortium
.
2008
.
The Immunological Genome Project: networks of gene expression in immune cells.
Nat. Immunol.
9
:
1091
1094
.
4
Abbas
A. R.
,
Baldwin
D.
,
Ma
Y.
,
Ouyang
W.
,
Gurney
A.
,
Martin
F.
,
Fong
S.
,
van Lookeren Campagne
M.
,
Godowski
P.
,
Williams
P. M.
, et al
.
2005
.
Immune response in silico (IRIS): immune-specific genes identified from a compendium of microarray expression data.
Genes Immun.
6
:
319
331
.
5
Hyatt
G.
,
Melamed
R.
,
Park
R.
,
Seguritan
R.
,
Laplace
C.
,
Poirot
L.
,
Zucchelli
S.
,
Obst
R.
,
Matos
M.
,
Venanzi
E.
, et al
.
2006
.
Gene expression microarrays: glimpses of the immunological genome.
Nat. Immunol.
7
:
686
691
.
6
Hoffman
B. G.
,
Williams
K. L.
,
Tien
A. H.
,
Lu
V.
,
de Algara
T. R.
,
Ting
J. P.
,
Helgason
C. D.
.
2006
.
Identification of novel genes and transcription factors involved in spleen, thymus and immunological development and function.
Genes Immun.
7
:
101
112
.
7
Hashimoto
S. I.
,
Suzuki
T.
,
Nagai
S.
,
Yamashita
T.
,
Toyoda
N.
,
Matsushima
K.
.
2000
.
Identification of genes specifically expressed in human activated and mature dendritic cells through serial analysis of gene expression.
Blood
96
:
2206
2214
.
8
Castle
J. C.
,
Armour
C. D.
,
Löwer
M.
,
Haynor
D.
,
Biery
M.
,
Bouzek
H.
,
Chen
R.
,
Jackson
S.
,
Johnson
J. M.
,
Rohl
C. A.
,
Raymond
C. K.
.
2010
.
Digital genome-wide ncRNA expression, including SnoRNAs, across 11 human tissues using polyA-neutral amplification.
PLoS ONE
5
:
e11779
.
9
Wu
C.
,
Orozco
C.
,
Boyer
J.
,
Leglise
M.
,
Goodale
J.
,
Batalov
S.
,
Hodge
C. L.
,
Haase
J.
,
Janes
J.
,
Huss
J. W.
 III
,
Su
A. I.
.
2009
.
BioGPS: an extensible and customizable portal for querying and organizing gene annotation resources.
Genome Biol.
10
:
R130
.
10
Krupp
M.
,
Marquardt
J. U.
,
Sahin
U.
,
Galle
P. R.
,
Castle
J.
,
Teufel
A.
.
2012
.
RNA-Seq Atlas—a reference database for gene expression profiling in normal tissue by next-generation sequencing.
Bioinformatics
28
:
1184
1185
.
11
Gu
J.
,
He
T.
,
Pei
Y.
,
Li
F.
,
Wang
X.
,
Zhang
J.
,
Zhang
X.
,
Li
Y.
.
2006
.
Primary transcripts and expressions of mammal intergenic microRNAs detected by mapping ESTs to their flanking sequences.
Mamm. Genome
17
:
1033
1041
.
12
Dunham
I.
,
Kundaje
A.
,
Aldred
S. F.
,
Collins
P. J.
,
Davis
C. A.
,
Doyle
F.
,
Epstein
C. B.
,
Frietze
S.
,
Harrow
J.
,
Kaul
R.
, et al
ENCODE Project Consortium
.
2012
.
An integrated encyclopedia of DNA elements in the human genome.
Nature
489
:
57
74
.
13
Gerstein
M. B.
,
Kundaje
A.
,
Hariharan
M.
,
Landt
S. G.
,
Yan
K. K.
,
Cheng
C.
,
Mu
X. J.
,
Khurana
E.
,
Rozowsky
J.
,
Alexander
R.
, et al
.
2012
.
Architecture of the human regulatory network derived from ENCODE data.
Nature
489
:
91
100
.
14
Thurman
R. E.
,
Rynes
E.
,
Humbert
R.
,
Vierstra
J.
,
Maurano
M. T.
,
Haugen
E.
,
Sheffield
N. C.
,
Stergachis
A. B.
,
Wang
H.
,
Vernot
B.
, et al
.
2012
.
The accessible chromatin landscape of the human genome.
Nature
489
:
75
82
.
15
Katsanis
N.
,
Worley
K. C.
,
Gonzalez
G.
,
Ansley
S. J.
,
Lupski
J. R.
.
2002
.
A computational/functional genomics approach for the enrichment of the retinal transcriptome and the identification of positional candidate retinopathy genes.
Proc. Natl. Acad. Sci. USA
99
:
14326
14331
.
16
Liang
S.
,
Zhao
S.
,
Mu
X.
,
Thomas
T.
,
Klein
W. H.
.
2004
.
Novel retinal genes discovered by mining the mouse embryonic RetinalExpress database.
Mol. Vis.
10
:
773
786
.
17
Song
H. D.
,
Sun
X. J.
,
Deng
M.
,
Zhang
G. W.
,
Zhou
Y.
,
Wu
X. Y.
,
Sheng
Y.
,
Chen
Y.
,
Ruan
Z.
,
Jiang
C. L.
, et al
.
2004
.
Hematopoietic gene expression profile in zebrafish kidney marrow.
Proc. Natl. Acad. Sci. USA
101
:
16240
16245
.
18
Benjamini
Y.
,
Hochberg
Y.
.
1995
.
Controlling the false discovery rate: a practical and powerful approach to multiple testing.
J. R. Stat. Soc. B
57
:
289
300
.
19
Dennis
G.
 Jr.
,
Sherman
B. T.
,
Hosack
D. A.
,
Yang
J.
,
Gao
W.
,
Lane
H. C.
,
Lempicki
R. A.
.
2003
.
DAVID: Database for Annotation, Visualization, and Integrated Discovery.
Genome Biol.
4
:
3
.
20
Su
A. I.
,
Cooke
M. P.
,
Ching
K. A.
,
Hakak
Y.
,
Walker
J. R.
,
Wiltshire
T.
,
Orth
A. P.
,
Vega
R. G.
,
Sapinoso
L. M.
,
Moqrich
A.
, et al
.
2002
.
Large-scale analysis of the human and mouse transcriptomes.
Proc. Natl. Acad. Sci. USA
99
:
4465
4470
.
21
de Hoon
M. J.
,
Imoto
S.
,
Nolan
J.
,
Miyano
S.
.
2004
.
Open source clustering software.
Bioinformatics
20
:
1453
1454
.
22
Saldanha
A. J.
2004
.
Java Treeview—extensible visualization of microarray data.
Bioinformatics
20
:
3246
3248
.
23
Rosenwald
A.
,
Alizadeh
A. A.
,
Widhopf
G.
,
Simon
R.
,
Davis
R. E.
,
Yu
X.
,
Yang
L.
,
Pickeral
O. K.
,
Rassenti
L. Z.
,
Powell
J.
, et al
.
2001
.
Relation of gene expression phenotype to immunoglobulin mutation genotype in B cell chronic lymphocytic leukemia.
J. Exp. Med.
194
:
1639
1647
.
24
Wheeler
D. L.
,
Barrett
T.
,
Benson
D. A.
,
Bryant
S. H.
,
Canese
K.
,
Chetvernin
V.
,
Church
D. M.
,
DiCuccio
M.
,
Edgar
R.
,
Federhen
S.
, et al
.
2007
.
Database resources of the National Center for Biotechnology Information.
Nucleic Acids Res.
35
(
Database issue
):
D5
D12
.
25
Wu
C. L.
,
Kirley
S. D.
,
Xiao
H.
,
Chuang
Y.
,
Chung
D. C.
,
Zukerberg
L. R.
.
2001
.
Cables enhances cdk2 tyrosine 15 phosphorylation by Wee1, inhibits cell growth, and is lost in many human colon and squamous cancers.
Cancer Res.
61
:
7325
7332
.
26
Behrends
C.
,
Sowa
M. E.
,
Gygi
S. P.
,
Harper
J. W.
.
2010
.
Network organization of the human autophagy system.
Nature
466
:
68
76
.
27
Solomon
D. H.
,
Raynal
M. C.
,
Tejwani
G. A.
,
Cayre
Y. E.
.
1988
.
Activation of the fructose 1,6-bisphosphatase gene by 1,25-dihydroxyvitamin D3 during monocytic differentiation.
Proc. Natl. Acad. Sci. USA
85
:
6904
6908
.
28
Muindi
J. R.
,
Peng
Y.
,
Wilson
J. W.
,
Johnson
C. S.
,
Branch
R. A.
,
Trump
D. L.
.
2007
.
Monocyte fructose 1,6-bisphosphatase and cytidine deaminase enzyme activities: potential pharmacodynamic measures of calcitriol effects in cancer patients.
Cancer Chemother. Pharmacol.
59
:
97
104
.
29
Morgan
N. V.
,
Morris
M. R.
,
Cangul
H.
,
Gleeson
D.
,
Straatman-Iwanowska
A.
,
Davies
N.
,
Keenan
S.
,
Pasha
S.
,
Rahman
F.
,
Gentle
D.
, et al
.
2010
.
Mutations in SLC29A3, encoding an equilibrative nucleoside transporter ENT3, cause a familial histiocytosis syndrome (Faisalabad histiocytosis) and familial Rosai-Dorfman disease.
PLoS Genet.
6
:
e1000833
.
30
Rozowsky
J.
,
Euskirchen
G.
,
Auerbach
R. K.
,
Zhang
Z. D.
,
Gibson
T.
,
Bjornson
R.
,
Carriero
N.
,
Snyder
M.
,
Gerstein
M. B.
.
2009
.
PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls.
Nat. Biotechnol.
27
:
66
75
.
31
Hung
T.
,
Wang
Y.
,
Lin
M. F.
,
Koegel
A. K.
,
Kotake
Y.
,
Grant
G. D.
,
Horlings
H. M.
,
Shah
N.
,
Umbricht
C.
,
Wang
P.
, et al
.
2011
.
Extensive and coordinated transcription of noncoding RNAs within cell-cycle promoters.
Nat. Genet.
43
:
621
629
.
32
Rinn
J. L.
,
Chang
H. Y.
.
2012
.
Genome regulation by long noncoding RNAs.
Annu. Rev. Biochem.
81
:
145
166
.
33
Cabili
M. N.
,
Trapnell
C.
,
Goff
L.
,
Koziol
M.
,
Tazon-Vega
B.
,
Regev
A.
,
Rinn
J. L.
.
2011
.
Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses.
Genes Dev.
25
:
1915
1927
.
34
McKusick
V. A.
2007
.
Mendelian Inheritance in Man and its online version, OMIM.
Am. J. Hum. Genet.
80
:
588
604
.
35
International Multiple Sclerosis Genetics Consortium
; 
Wellcome Trust Case Control Consortium 2
Sawcer
S.
,
Hellenthal
G.
,
Pirinen
M.
,
Spencer
C. C.
,
Patsopoulos
N. A.
,
Moutsianas
L.
,
Dilthey
A.
,
Su
Z.
, et al
.
2011
.
Genetic risk and a primary role for cell-mediated immune mechanisms in multiple sclerosis.
Nature
476
:
214
219
.
36
Chiarle
R.
,
Zhang
Y.
,
Frock
R. L.
,
Lewis
S. M.
,
Molinie
B.
,
Ho
Y. J.
,
Myers
D. R.
,
Choi
V. W.
,
Compagno
M.
,
Malkin
D. J.
, et al
.
2011
.
Genome-wide translocation sequencing reveals mechanisms of chromosome breaks and rearrangements in B cells.
Cell
147
:
107
119
.
37
Barrett
J. C.
,
Hansoul
S.
,
Nicolae
D. L.
,
Cho
J. H.
,
Duerr
R. H.
,
Rioux
J. D.
,
Brant
S. R.
,
Silverberg
M. S.
,
Taylor
K. D.
,
Barmada
M. M.
, et al
NIDDK IBD Genetics Consortium
; 
Belgian-French IBD Consortium
; 
Wellcome Trust Case Control Consortium
.
2008
.
Genome-wide association defines more than 30 distinct susceptibility loci for Crohn’s disease.
Nat. Genet.
40
:
955
962
.

The authors have no financial conflicts of interest.