Abstract
Dendritic cells (DCs) are functionally diverse and are present in most adult tissues, but deep understanding of human DC biology is hampered by relatively small numbers of these in circulation and their short lifespan in human tissues. We built a transcriptional atlas of human DCs by combining samples from 14 expression profiling studies derived from 10 laboratories. We identified significant gene expression variation of DC subset–defining markers across tissue type and upon viral or bacterial stimulation. We further highlight critical gaps between in vitro–derived DC subsets and their in vivo counterparts and provide evidence that monocytes or cord blood progenitor in vitro–differentiated DCs fail to capture the repertoire of primary DC subsets or behaviors. In constructing a reference DC atlas, we provide an important resource for the community wishing to identify and annotate tissue-specific DC subsets from single-cell datasets, or benchmark new in vitro models of DC biology.
Introduction
Resident in most tissues, dendritic cells (DCs) are specialized Ag-presenting components of the immune system that play a key role in the mounting and regulating of Ag-specific responses by T and B lymphocytes. Their role in bridging innate and adaptive immunity has made them a popular target in vaccine development, immunotherapies, and autoimmune disease treatments (reviewed in Refs. 1, 2). There are several classes of DCs that are distinct in morphology, phenotype, and function. These classes are comprised of plasmacytoid DCs (pDCs), at least two functionally distinct subsets of conventional DCs (cDCs; type 1 cDCs [cDC1s] and type 2 cDCs [cDC2s]), and monocyte-derived DCs (MoDCs). The latter are not present at steady state but are differentiated from monocytes during inflammatory conditions.
As DCs are a rare cell population in human tissues, our understanding of their biology has largely arisen from model organisms, but cross-species comparisons have revealed the distinctiveness of mouse DC subsets from human equivalents. These include the absence of shared expression of subset-defining markers (reviewed in Ref. 3), TLR8-mediated responses to ssRNA (4), and expression of costimulatory molecules after stimulator of IFN genes (STING)–dependent activation (5). To study DC heterogeneity and complexity in humans, many studies have relied on putative DC isolation from blood. For example, one of the first single-cell transcriptome experiments of blood DCs (6) described a new DC subset termed AXL+ SIGLEC6+ DCs. Although some individual studies have described DC subsets from other tissues, including spleen (7–9), intestine (10), skin (11), or bone marrow (12), comparisons between these individual datasets are ad hoc in the absence of a reference that can integrate information from diverse sources. Indeed, although several transcriptome studies have enumerated the types and behaviors of human DC subsets, there is little agreement with respect to the subset-specific transcriptional phenotypes. For example, a recent comparison of three studies reported <5% agreement between “signature genes” associated with the cross-presenting specialist cDC1 subset (6, 9, 13, 14).
Given the importance of DCs in immunity, as well as their distinctive phenotype and functions compared with other myeloid components, including macrophages and monocytes, the lack of laboratory models of these cells is becoming a critical bottleneck for the field. The current in vitro models of human DC biology rely on the differentiation of CD34+ hematopoietic stem cells from blood or bone marrow (15), are differentiated in vitro from peripheral blood monocytes (16), or use induced pluripotent stem cells (17) or directed differentiation from fibroblasts (18, 19). Concordance between cord blood–derived cDC1s with their in vivo blood counterparts was previously reported (20). However, the experimental setup did not assess both sources simultaneously, and similarity was evaluated with a limited set of markers for each subset. Therefore, introducing a platform that incorporates different models of DCs and includes freshly isolated tissue-resident cells (in vivo), primary DCs cultured after isolation (ex vivo), DC models generated in vitro, or isolated from humanized mouse models (in vivo HuMouse) (21) benefits the community by providing the opportunity to compare and evaluate subsets of DCs across multiple sources.
In this study, we provide a systematic evaluation of tissue-resident, ex vivo, and in vitro models of human DC biology by constructing a reference human DC atlas that integrates data from multiple laboratories, derivation methods, and measurement platforms. We leverage the experimental and functional descriptions of DC phenotype to classify the transcriptional subsets observed in the atlas. By doing so, the variability of subset-specific genes between tissues of origin and status of activation explains the lack of concordance of previous studies. The DC atlas further reveals a transcriptional gap between in vitro–derived DCs and their primary counterparts, highlighting new opportunities to improve in vitro differentiation models.
Materials and Methods
Data collection and processing
For atlas construction, publicly available datasets were collected from database repositories such as the National Center for Biotechnology Information’s Gene Expression Omnibus and the European Bioinformatics Institute’s ArrayExpress. Mapping and analysis of microarray and RNA sequencing datasets were done using the standard Stemformatics processing pipeline described by Choi et al. (22). All datasets passed multiple stringent quality control steps required for hosting on the Stemformatics platform. Dataset processing scripts are available at Stemformatics GitHub. Datasets that have failed the quality control steps were removed from the integration step. Finally, 14 datasets consisting of 342 samples were selected for inclusion in the atlas (Supplemental Table I).
Dataset integration and gene selection for principal component analysis
For the integration step, the common genes between all datasets were selected and assessed for their platform dependency as described in detail by Angel et al. (23) and Rahab et al. (24). Briefly, the gene expression values from RNA sequencing and microarray were transformed into ranking percentiles. Then, using a linear mixed model, each gene’s variance composition was determined with regard to variance explained by the platform. Then, a threshold for gene selection was empirically determined to remove genes with a platform effect for the first three principal components, resulting in a platform variance ratio threshold of 0.16. A total of 2417 genes with low platform-dependent variance and concomitantly high variance by cell type were kept for generating the atlas, using a principal component analysis (PCA). Codes for constructing the atlas are available at Wells Lab GitHub, and the final gene list and interactive atlas are available at https://www.stemformatics.org/atlas/dc.
Facet annotation
A review of experimental details from each of the 14 studies was entered into a standardized metadata table, and the following facets were identified: cell type included cDC1s, 103 samples; cDC2s, 144 samples; pDCs, 19 samples; DC precursors, 17 samples; MoDCs, 25 samples; monocytes, 6 samples; and uncharacterized DCs, 27 samples. The activation status facet consisted of 202 samples that were unstimulated by experimental factors, 42 samples exposed to whole bacteria or bacterial ligands, and 97 exposed to virus or viral-mimicking ligands. In all, 308 of 342 samples were collected from healthy individuals. Tissue type consisted of 186 samples isolated from blood, 19 from bone marrow, 24 from skin, 11 from small intestine, 12 from spleen, 5 from synovial fluid, and 85 from an in vitro source. Finally, the sample source described 125 samples that were profiled directly from primary human tissue (in vivo), a further 105 primary samples that were cultured before profiling (ex vivo), 85 that were derived from progenitors in the laboratory (in vitro), and 26 that were isolated from humanized mice.
Pseudo-bulk samples from external single-cell dataset
To project external single-cell data onto the DC atlas, first single cells were aggregated to build pseudo-bulk samples to mitigate library size differences. For each cluster defined by the author of single-cell data, the cells were randomly separated into subgroups of a size 15, using Sincast (25). Thus, every pseudo-bulk sample was aggregated from 14 cells and the expression values of each aggregated group were a summation of the subgroup’s expression values for each gene.
Capybara similarity analysis
Capybara identity scores (26) were used to measure the similarity of the external pseudo-bulk samples to the reference atlas. Capybara cell scores were calculated by performing a restricted linear regression of atlas samples on each of the pseudo-bulk samples’ expression profiles as described previously by (24).
Differential expression analysis
Atlas comparisons (violin plots of individual genes) used prefiltered gene lists to account for platform variance, and p values were determined with a Student t test. For differential expression (DE) analyses on genes that were not used in construction of the atlas, a linear model was used. Specifically, for DE between in vivo and in vitro models, the linear model accounted for the experimental source by including the platform as the batch variable. Significant genes were determined when the variance explained by the sample source was higher than the variance explained by batch, and they were ranked by a Benjamini–Hochberg (27) adjusted p value (<0.05) and the difference of mean expression (i.e., δ expression) of in vivo and in vitro cells (>0.2).
DE comparing tissue markers used a linear mixed-effects model (28), where for each gene, the linear mixed-effects model assigned tissue as the fixed effect and batch was accounted for by assigning platform as the random effect, and genes were considered when the proportion of variance explained by tissue was >0.3. To identify top-ranked tissue-specific markers, a further pairwise analysis of each tissue against all of the others was performed. A gene was considered tissue specific when it was upregulated in only one tissue (maximum pairwise p value against the tissue <0.05 and mean δ expression against the tissue >0.15. The selection resulted in a set of 496 genes. A summary of these analyses accompanies each table and figure.
Enrichment analysis
Gene Ontology biological process enrichment analysis was conducted using clusterProfiler package 3.12.0 in Bioconductor (29).
Graphing and illustration
Violin plots of ranked gene expressions were generated using Plotly Python Graphing Library (30). PCA graphs were created through the Stemformatics platform (http://www.stemformatics.org) (22). Heatmaps were generated using R package pheatmap (31) and gplots (32). The schematic illustration was created by BioRender.com.
Results
Assessing the reproducibility of DC expression phenotypes in the Stemformatics DC atlas
As DCs are relatively rare cell types, it has been difficult to directly compare subsets derived from different tissues and evaluate how disease or Ag activation alters their molecular phenotypes. Most published studies on human DC biology are described for a specific experimental context, such as profiling blood DCs (14) or comparing the transcriptional phenotype of in vitro–derived DCs with those from blood (15). Comparing DC phenotypes derived from different studies remains a challenge for the field. We set out to develop a unified description of DC biology by assessing which transcriptional phenotypes were generalizable across multiple studies. The resulting DC atlas has been constructed using 342 human samples from 14 studies including tissue-resident, ex vivo, and in vitro generated conventional and nonconventional DCs (Fig. 1, Supplemental Fig. 1, Supplemental Table I). Samples were annotated using evidence from the original study consisting of FACS accompanying each DC subset, tissue of origin, and, where relevant, disease or activation status.
The DC atlas in Stemformatics.org consists of 342 samples derived from 14 transcriptome studies and 10 laboratories. (A) Annotation of samples by DC subtype: purple, cDC1s; light blue, cDC2s; red-violet, pDCs; dark blue, MoDCs; navy blue, monocytes; yellow-green, DC precursors; pink, unspecified (not FACS sorted) DCs. (B) Overlay of CLEC9A expression highlights behavior as a marker of cDC1s. Color intensity ranges from high expression (dark red, rank 1.0) to low/no expression (pale red, rank 0.1). (C and D) Ranked gene expression (median, interquartile range) of (C) CLEC9A and (D) CD1D comparing activation status between bacterial, unstimulated, or viral agonists. Sample sizes (N) are given under each combined category. The p values were determined by an independent t test. (E and F) Ranked gene expression (median, interquartile range) of IL6 for the subsets of (E) cDC1s and (F) cDC2s comparing activation status between bacterial, unstimulated, or viral agonists. Sample sizes (N) are given under each combined category. The p values were determined by a Student t test. See also Supplemental Fig. 1.
The DC atlas in Stemformatics.org consists of 342 samples derived from 14 transcriptome studies and 10 laboratories. (A) Annotation of samples by DC subtype: purple, cDC1s; light blue, cDC2s; red-violet, pDCs; dark blue, MoDCs; navy blue, monocytes; yellow-green, DC precursors; pink, unspecified (not FACS sorted) DCs. (B) Overlay of CLEC9A expression highlights behavior as a marker of cDC1s. Color intensity ranges from high expression (dark red, rank 1.0) to low/no expression (pale red, rank 0.1). (C and D) Ranked gene expression (median, interquartile range) of (C) CLEC9A and (D) CD1D comparing activation status between bacterial, unstimulated, or viral agonists. Sample sizes (N) are given under each combined category. The p values were determined by an independent t test. (E and F) Ranked gene expression (median, interquartile range) of IL6 for the subsets of (E) cDC1s and (F) cDC2s comparing activation status between bacterial, unstimulated, or viral agonists. Sample sizes (N) are given under each combined category. The p values were determined by a Student t test. See also Supplemental Fig. 1.
The dominant pattern shared by all studies was the clustering of known DC subtypes, which allowed us to assess the expression of genes commonly used to discriminate between these groups (Fig. 1A). MoDCs, for example, clustered with monocytes and were closely associated with the cDC2 cluster. A generic “dendritic cell” label was assigned to the samples that lacked sufficient subtype information, and as these were predominantly derived in vitro from CD34+ cells, they were closely associated with cDC2 and MoDC subsets. pDCs grouped closely with progenitors as well as cDC1s. Examination of technical variables such as study identifier or platform (Supplemental Fig. 1A, 1B) showed that these were not contributing to clustering of individual samples, or the DC subsets.
cDC1s have been described in both blood and primary tissues and are important for immune responses to pathogens and tumors (reviewed in Ref. 33). Samples were annotated as cDC1s when the original study had provided FACS data using current benchmarks of cDC1s, including CD141, CLEC9A, and CADM1. cDC2s are more abundant and are specialized in MHC class II Ag presentation (33). The gene expression of cDC1 markers such as CLEC9A is likewise restricted to cDC1 annotated subsets (Fig. 1, Supplemental Fig. 1C). However, not all markers were restricted to their cognate subsets at a transcript level in laboratory-derived DCs. For example, CD1C is a common marker of a cDC2 population and is widely used to isolate this cell type from blood and other tissues. In evaluating marker distribution across the cDC subsets, we noted high expression of CD1C mRNA by in vitro–differentiated cDC1s and pDCs, but not tissue-resident equivalents (Supplemental Fig. 1D, 1E). This observation agrees with previous research showing that FLT3L-driven differentiation of progenitors induces expression of the CD1C marker in cDC1s (34, 35).
Many of the classical markers of cDC1s and cDC2s displayed varying levels of expression on activation with bacterial or viral ligands. For example, cDC1 significant markers CLEC9A and TLR3 and cDC2 markers CLEC10A and CD1D were downregulated by bacterial or viral activation (Fig. 1C, 1D, Supplemental Fig. 2A, 2B). Activation markers such as IL6 and TNFAIP6 demonstrated subset-specific activation profiles: bacterial ligands induced IL6 and TNFAIP6 expression in cDC2s, whereas these factors were induced by viral ligands in cDC1s (Fig. 1E, 1F, Supplemental Fig. 2C–F). This is consistent with the reciprocal expression pattern of the bacterial and viral adjuvants’ receptors in cDC1s and cDC2s; the viral adjuvant receptor TLR3 is highly expressed by cDC1s whereas the bacterial adjuvant receptor TLR4 is highly expressed by cDC2s (36). Thus, the ability to look across multiple experimental attributes allows users of the DC atlas to assess the conditions likely to reproducibly alter cDC subset-specific behaviors.
Tissue of origin alters DC behavior
The tissue environment shapes human DC phenotypes (reviewed in Ref. 37), and indeed the DC atlas demonstrated reproducible patterns of DC subsets grouping by related tissue types. For example (Fig. 2A), spleen and small intestine groups overlapped substantially with blood clusters, but DCs isolated from primary human skin biopsies or DCs isolated from humanized mouse bone marrow formed discrete groupings. To assess the reproducibility of tissue clusters of DC subsets, we projected pseudo-bulk samples of an external single-cell dataset of blood-derived DC subsets (6) on the reference atlas. Projection of these samples showed the association of these data with the reference DC type and tissue (Fig. 2B–D). The Capybara similarity analysis identified not only the multiple DC subtypes, including cDC2-Bs (DC3s), as well as the AXL+ SIGLEC6+ DC (DC5) subset originally described by the Villani single-cell profiling, but also confirmed that the tissue of origin was blood, by showing a high Capybara similarity score with DC atlas samples from blood and lower scores for bone marrow, spleen, skin, and intestinal DCs (Fig. 2C, 2D). This demonstrates that subsets identified in high-resolution single-cell data such as the Villani dataset are effectively annotated by projection onto the DC atlas.
Tissue-specific behavior of DCs. (A) Stemformatics DC atlas cluster by subset and tissue of origin. (B) Projection of pseudo-bulk samples from single-cell data describing blood-derived DCs from Villani et al. (6) demonstrates reproducible grouping of new samples on the reference atlas. (C) Annotation of cell type from Villani et al. (6) single-cell libraries using the Capybara similarity score against the reference Stemformatics DC atlas predicting DC subsets and (D) tissue of origin. Similarity scores are shown: high (blue) to low (yellow). (E) Ranked gene expression of XCR1 or (F) IL1B (median, interquartile range) in samples selected for cDC1s and cDC2s from blood, bone marrow (bone m.), skin, small intestine (s.i.), spleen, and synovial fluid (s.f.). Sample size (N) for each combined category is listed under the x-axis. The p values were determined by a Student t test. (G) Heatmap of top 10 genes specific to each tissue (p < 0.05) (see Supplemental Table III), classifying DCs isolated from blood, skin, spleen, and synovial fluid tissues. Row-normalized color scale shows intensity ranges from high expression (dark red) to low/no expression (dark blue). The p value was determined by a two-sided mixed linear model. See also Supplemental Figs. 2 and 3.
Tissue-specific behavior of DCs. (A) Stemformatics DC atlas cluster by subset and tissue of origin. (B) Projection of pseudo-bulk samples from single-cell data describing blood-derived DCs from Villani et al. (6) demonstrates reproducible grouping of new samples on the reference atlas. (C) Annotation of cell type from Villani et al. (6) single-cell libraries using the Capybara similarity score against the reference Stemformatics DC atlas predicting DC subsets and (D) tissue of origin. Similarity scores are shown: high (blue) to low (yellow). (E) Ranked gene expression of XCR1 or (F) IL1B (median, interquartile range) in samples selected for cDC1s and cDC2s from blood, bone marrow (bone m.), skin, small intestine (s.i.), spleen, and synovial fluid (s.f.). Sample size (N) for each combined category is listed under the x-axis. The p values were determined by a Student t test. (G) Heatmap of top 10 genes specific to each tissue (p < 0.05) (see Supplemental Table III), classifying DCs isolated from blood, skin, spleen, and synovial fluid tissues. Row-normalized color scale shows intensity ranges from high expression (dark red) to low/no expression (dark blue). The p value was determined by a two-sided mixed linear model. See also Supplemental Figs. 2 and 3.
Most isolation methods rely on specific markers to isolate DC subsets from varied tissue types. For example, CLEC9A, XCR1, and CADM1 are the most common markers for isolating blood and tissue-resident cDC1s. However, a highly variable gene expression pattern of XCR1 was identified across multiple tissue types (Fig. 2E). This has implications when choosing a reference to identify and annotate innate immune cells isolated from different tissues. cDC1s isolated from peripheral blood had low expression of the proinflammatory factor IL-1β (IL1B), suggesting a less mature profile of blood cDC1s in comparison with cDC1s isolated from other tissues (Fig. 2F).
Given that the blood-derived cells appeared to have a reproducible tissue-associated phenotype, we next asked whether there was evidence of tissue-specific markers by examining gene sets associated with each tissue, regardless of the cell subset or characterizing platform. Altogether, 1270 genes were differentially expressed between tissues (Supplemental Fig. 3, Supplemental Table II, adjusted p < 0.05, δ value of 0.15). The 10 most discriminating genes were chosen for illustrative purposes (Fig. 2G, Supplemental Table III). Examples included regulation of genes required for cell division by spleen-derived DCs (such as KIF2C, CENPM, and AURKB), which implies a mitotic profile of spleen DCs. Notably, these factors were also expressed in pre-DCs isolated from peripheral blood. Blood DCs showed high expression of SELL, important for binding and subsequent rolling of leukocytes on endothelial cells; however, blood DCs, which are the seeding source of other tissues’ DC populations, also shared gene expression patterns with spleen and other tissues. Skin-derived DC subsets were the most distinct from other tissue DCs, with distinctively expression of 322 genes. These included the integrin SLAMF1 and TM4SF1, a transmembrane protein that regulates WNT signaling. Pathway enrichment analysis on the whole list of genes differentially expressed between at least two tissues suggested that IL signaling shapes some of the tissue environment (Supplemental Fig. 3, Supplemental Table II).
Cultured blood DCs remember their tissue environment
Given that tissue has such a strong impact on the transcriptional profiles of DCs, we wondered whether this was retained in culture. However, blood-derived DCs were the only tissue profiled directly, or cultured, and these clustered closely together (Fig. 3A). Culture alone did not induce expression of the chemokine CCL13 in ex vivo cells, whereas this was significantly higher in vitro (Fig. 3B). This suggests that although ex vivo blood DCs have experienced a culture environment, they appear to maintain memory of their tissue environment.
Cultured blood (ex vivo) DCs remember their tissue environment. (A) Stemformatics DC atlas colored by sample source to highlight the resemblance of ex vivo (light brown) DCs to their in vivo (green) models but apart from in vitro (dark brown) differentiated DCs. (B) Ranked gene expression of CCL13 chemokine in in vivo DCs (n = 125), ex vivo DCs (n = 105), in vitro–derived DCs (n = 85), and HuMouse models of DCs (n = 26). The p values were determined by a Student t test. (C and D) Genes significantly higher in (C) cDC1s or (D) cDC2s, plotted against healthy in vivo, untreated ex vivo, and untreated in vitro samples. To be consistent, in vivo and ex vivo samples were selected from blood tissue. Color intensity ranges from high expression (dark red, rank 1.0) to low/no expression (blue, rank 0.1). See also Supplemental Table IV.
Cultured blood (ex vivo) DCs remember their tissue environment. (A) Stemformatics DC atlas colored by sample source to highlight the resemblance of ex vivo (light brown) DCs to their in vivo (green) models but apart from in vitro (dark brown) differentiated DCs. (B) Ranked gene expression of CCL13 chemokine in in vivo DCs (n = 125), ex vivo DCs (n = 105), in vitro–derived DCs (n = 85), and HuMouse models of DCs (n = 26). The p values were determined by a Student t test. (C and D) Genes significantly higher in (C) cDC1s or (D) cDC2s, plotted against healthy in vivo, untreated ex vivo, and untreated in vitro samples. To be consistent, in vivo and ex vivo samples were selected from blood tissue. Color intensity ranges from high expression (dark red, rank 1.0) to low/no expression (blue, rank 0.1). See also Supplemental Table IV.
When examining the impact of culture on the identity of DC subsets (Fig. 3C, 3D, Supplemental Table IV), we noted that some subset-distinguishing markers such as CLEC9A, IRF8, CADM1, and BATF3 have been recapitulated by in vitro cDC1s, and, likewise, markers such as CD1C, CD1E, and CLEC10A recapitulated by in vitro cDC2s and MoDCs. Nevertheless, some other markers such as PLCD1 and TMEM173 were not shared by in vitro cDC1s and cDC2s, respectively. In contrast, ex vivo blood DC subsets expressed these subset-defining markers at a similar level with in vivo blood DCs, reinforcing our observation that although ex vivo blood DCs have experienced a culture environment, they maintain memory of their tissue environment. We do not know how generalizable this observation is to other tissues, but it suggests that the source of the DC is a more important factor than the culture environment when considering DC behaviors such as their activation or maturation status.
CD34-derived DCs differentiated in vitro are not equivalent to their in vivo counterparts
Human DCs develop from the hematopoietic stem cells resident in the bone marrow that later differentiate to common myeloid progenitors and common DC progenitors. Although the ontogeny of human DCs is still being elucidated, many in vitro DC differentiation protocols have been developed in line with what is currently understood with regard to the DC developmental trajectory. Most human models of DC biology rely on differentiation from peripheral blood monocytes (16) or CD34+ cord blood progenitors (15). It is critical to understand how faithfully these capture the repertoire of DC subsets or behaviors.
The transcriptional phenotypes of in vitro–differentiated cDCs from CD34+ cord blood progenitors (n = 60) or monocyte-derived DCs (n = 25) were distinct, shown by clustering away from tissue-isolated DCs (Fig. 4A). We wondered whether this lack of similarity is due to the number of MoDC models in our in vitro subset, as they were not sampled from in vivo conditions, but review of the other subsets of in vitro versus in vivo pDCs and cDC1s derived from FLT3L-dependent differentiation methods recapitulated this pattern of separation (Supplemental Fig. 4A, 4B). For example, B and T lymphocyte attenuator (BTLA), a significant marker of human cDC1s mediating immune regulation (38), is missing from the cDC1s differentiated from cord blood CD34+ progenitors, but its expression is rescued in cDC1s differentiated in the HuMouse models (Fig. 4B). The improved methods of differentiation using a feeder layer of mouse OP9 stromal cells expressing Notch ligand Delta-like 1 (DLL1) (20) pushed the in vitro–derived DCs more toward the DCs derived in vivo in the HuMouse models; nevertheless, these still sat apart from their in vivo counterparts (Fig. 4C, Supplemental Fig. 4C).
The phenotype captured by in vitro–derived DCs is distinct from primary cells: in vitro–derived DCs from CD34+ cord blood progenitors (n = 60) and monocyte-derived DCs (n = 25) are transcriptionally distinct from primary cell types. (A) Stemformatics DC atlas colored by in vitro (n = 85) and in vivo (n = 125) DC sources. (B) Ranked gene expression of cDC1 marker, BTLA, in in vivo cDC1s, ex vivo cDC1s, in vitro–derived from cord blood CD34+ progenitors, and in HuMouse-differentiated cDC1s. Sample size (n) for each combined category is listed under the x-axis. (C) Heatmap of similarity score of in vitro–derived DCs from single cell dataset by Balan et al. (20) against the reference Stemformatics DC atlas using Capybara analysis. Similarity scores are shown: high (red) to low (blue). (D) Volcano plot of differentially expressed genes lost (red) or gained (green) in in vitro DCs compared with primary cells. (E) Overlay of SMAD3 and (F) TNFRSF21 gene expression between in vivo and in vitro samples. Color intensity ranges from high expression (dark red) to low/no expression (pale red). The p values were determined by a Student t test.
The phenotype captured by in vitro–derived DCs is distinct from primary cells: in vitro–derived DCs from CD34+ cord blood progenitors (n = 60) and monocyte-derived DCs (n = 25) are transcriptionally distinct from primary cell types. (A) Stemformatics DC atlas colored by in vitro (n = 85) and in vivo (n = 125) DC sources. (B) Ranked gene expression of cDC1 marker, BTLA, in in vivo cDC1s, ex vivo cDC1s, in vitro–derived from cord blood CD34+ progenitors, and in HuMouse-differentiated cDC1s. Sample size (n) for each combined category is listed under the x-axis. (C) Heatmap of similarity score of in vitro–derived DCs from single cell dataset by Balan et al. (20) against the reference Stemformatics DC atlas using Capybara analysis. Similarity scores are shown: high (red) to low (blue). (D) Volcano plot of differentially expressed genes lost (red) or gained (green) in in vitro DCs compared with primary cells. (E) Overlay of SMAD3 and (F) TNFRSF21 gene expression between in vivo and in vitro samples. Color intensity ranges from high expression (dark red) to low/no expression (pale red). The p values were determined by a Student t test.
We identified 3139 differentially expressed genes between in vitro–generated DCs and in vivo DCs (Supplemental Table V), where key receptors such as CSF3R, CCR9, and TLR10 were among the downregulated differentially expressed genes by in vitro samples (Supplemental Fig. 4D, 4E). The gene set enrichment analysis on top-ranked genes missing in vitro revealed that the most impacted biological processes are cell activation and communication, including IL-18R1 (IL18R1), TNF receptor, TNFRSF21 (important for activating the NF-KB pathway) (39), signaling lymphocyte activation molecule (SLAM) family immunoregulatory receptor CD244, and SMAD3 controlling responsiveness to transforming the TGF-β1 cytokine (40) (Fig. 4E, 4F, Supplemental Fig. 4F, 4G). The missing factors from the transcriptional phenotype of the current in vitro DC models highlight the necessity to develop improved differentiation methods of in vitro DC derivation.
To better understand the phenotype captured in vitro, and to what extent this phenotype is different from ex vivo models, we compared the gene expression profile of stimulated and unstimulated cultured groups with in vivo samples. Four broad gene clusters were identified among the genes significantly captured by in vitro DCs. The genes that appear to be driven by the culture environment because they were upregulated by both in vitro and ex vivo models (Fig. 5A, clusters 3 and 4) were enriched for metabolite and energy-related processes (Fig. 5A). A subset of these genes (Fig. 5A, cluster 3) were induced by viral activation specifically by the in vitro group.
In vitro–derived DCs are primed for activation by pathogen. (A) Heatmap of top differentially upregulated genes by in vitro DCs (p < 0.05, see also Supplemental Table V) compared within varied activation statuses of cultured samples versus in vivo DCs, with gene set enrichment analysis (GSEA) of gene ontology terms (biological process) for each cluster including examples of some of the genes associated with each process. Row normalized color intensity from high expression (dark red) to low/no expression (blue) is shown. (B) Schematic of the impact of differentiation cytokines on the transcriptional profile of in vitro–derived DCs. (C and D) Ranked gene expression of (C) CCL17 chemokine and (D) SOCS2 genes compared between in vivo (n = 125), ex vivo (n = 105), and in vitro (n = 85) and HuMouse models (n = 26) of DC samples. The p values were determined by a Student t test.
In vitro–derived DCs are primed for activation by pathogen. (A) Heatmap of top differentially upregulated genes by in vitro DCs (p < 0.05, see also Supplemental Table V) compared within varied activation statuses of cultured samples versus in vivo DCs, with gene set enrichment analysis (GSEA) of gene ontology terms (biological process) for each cluster including examples of some of the genes associated with each process. Row normalized color intensity from high expression (dark red) to low/no expression (blue) is shown. (B) Schematic of the impact of differentiation cytokines on the transcriptional profile of in vitro–derived DCs. (C and D) Ranked gene expression of (C) CCL17 chemokine and (D) SOCS2 genes compared between in vivo (n = 125), ex vivo (n = 105), and in vitro (n = 85) and HuMouse models (n = 26) of DC samples. The p values were determined by a Student t test.
Genes that were upregulated in vitro but remained at a low expression level under ex vivo conditions and in vivo DCs (Fig. 5, clusters 1 and 2) shared commonalities with the immunoregulatory DC state (mature DCs enriched in immunoregulatory molecules [mregDCs)] (41), specifically expressing genes associated with the regulation of cell migration (e.g., CCL8 and WNT5B), immunoregulatory functions (e.g., CD274 and PDCD1LG2), communication with adaptive immune cells (e.g., BATF and FCER2), and the genes encoding CD1 family Ag-presenting molecules (CD1A, CD1B, CD1E) (Supplemental Fig. 5A, 5B).
This mregDC state of in vitro DCs is most likely explained by the presence of IL-4 and GM-CSF cytokines in most DC differentiation culture media (Fig. 5B). For example, highly expressed maturation genes by in vitro DCs such as CCL17, CCL22, and CCL1 chemokines are downstream of GM-CSF signaling (42, 43), and SOCS1 and SOCS2 genes are downstream of the IL-4 signaling pathway (44) (Fig. 5C, 5D, Supplemental Fig. 5C). This in vitro–activated profile might have implications for their application in immunotherapy, as matured DCs have a limited lifespan (45). Certainly, we observed low expression of the anti-apoptotic marker BCL-2 by in vitro models (Supplemental Fig. 5D). Unlike in vitro models, DCs differentiated in HuMouse from CD34+ progenitors with the administration of FLT3L showed low expression of maturation genes (Supplemental Fig. 6).
Discussion
DCs are specialized Ag processing and Ag presenters of the human immune system. However, our understanding of their biology is limited by their paucity in human tissues, as well as by a lack of appropriate in vitro models. Our aim in the present study was to develop a resource that allowed systematic benchmarking and assessment of diverse DC behaviors, by developing an integrated transcriptional atlas of human DCs that combined datasets from individual laboratories. We used a method of batch correction that does not require prior delegation of samples to biological categories, thereby avoiding normalization that predicates the analysis outcomes. In integrating many datasets together, emergent biological properties were reproducible across several independently derived studies, often from different laboratories. In doing so, we identified previously hidden variation of subset-defining markers across tissues or upon activation by viral or bacterial stimuli.
The DC atlas is implemented in Stemformatics.org, offering a readily accessible platform for benchmarking in vitro–generated models of in vivo biology. The human DC atlas is publicly available as an interactive three-dimensional PCA graph that can be explored across a single condition or a combination of conditions such as cell type, tissue type, derivation source, disease, or activation status. Users may project their own gene expression dataset, including single-cell RNA sequencing datasets, against the atlas for annotating subsets or benchmarking their models. The atlas is scalable and will grow as new datasets of DC biology become available.
Although individual studies emphasized the homology of their in vitro–generated models of DCs compared with putative DCs, in the present study we identified a transcriptional gap between these two. The genes overexpressed by in vitro models can be explained by the presence of growth factors and cytokines in the dish environment and indicate an activated phenotype that may impact their longevity in clinical settings. The missing genes from the in vitro models were associated with their ability to develop, differentiate, and communicate with T cells. These were not rescued by improved NOTCH-coculture methods, and they were only partially rescued in humanized mice. The coexpression of immunoregulatory genes (CD274, PDCD1LG2) and maturation genes (CD40, TNFRSF12A, CCL17) by in vitro DCs suggested a regulatory program captured by these models described previously as mregDCs (41). The regulatory module associated with decreased DC1 functionality in cancer was observed in the in vitro DC subsets but not HuMouse models (41).
Nevertheless, cultured primary DC models did partially recapitulate in vivo DC biology. For example, cultured blood DCs clustered tightly with primary blood DCs, suggesting that they remember their tissue environment, or, rather, have not yet been exposed to maturation factors in the tissue niche (reviewed in Refs. 33, 37). These observations suggest that it is the derivation stage of DC development that requires additional signals from the tissue stroma, which are absent in a culture dish. Although coculture with stromal cells such as the OP1 line or addition of NOTCH signaling to the culture environment seeks to address this gap, the DC atlas phenotypes show that these strategies only partially rescued the cord blood–derived DC phenotypes, as did reconstitution in HuMouse models. Altogether, there are further opportunities to improve in vitro DC derivation methods. The use of alternative progenitor sources, such as induced pluripotent stem cells, may assist in deconstructing these environmental gaps and provide an opportunity to understand the requirements, and impact, of various factors in shaping DC development, subset specificity, and function.
Acknowledgements
We thank the Stemformatics team at The University of Melbourne for hosting the DC atlas.
Footnotes
This work was supported by Department of Health/National Health and Medical Research Council Synergy Grant APP1186371 to C.A.W. Z.E. is funded by an Australian Postgraduate Award to The University of Melbourne.
Conceptualization, C.A.W. and Z.E; data curation, Z.E., P.W.A., S.K.B., J.C., and N.R.; formal analysis, Z.E. and P.W.A.; investigation, Z.E. and C.A.W; methodology, Z.E., P.W.A., J.C., and Y.D.; project administration, C.A.W. and J.C.; resources, C.A.W.; software, P.W.A., J.C., and Y.D.; supervision, J.D.M., K.R., and C.A.W.; validation, Z.E., P.W.A., and C.A.W.; visualization, J.C.; writing–original draft Z.E. and C.A.W.; writing—review and editing, Z.E., C.A.W., S.K.B., N.R., J.D.M., and K.R.
The online version of this article contains supplemental material.
Abbreviations used in this article:
- cDC
conventional DC
- cDC1
type 1 cDC
- cDC2
type 2 cDC
- DC
dendritic cell
- DE
differential expression
- MoDC
monocyte-derived DC
- mregDC
mature DC enriched in immunoregulatory molecules
- PCA
principal component analysis
- pDC
plasmacytoid DC
References
Disclosures
The authors have no financial conflicts of interest.