Abstract
With the goal of improving the reproducibility and annotatability of MHC multimer reagent data, we present the establishment of a new data standard: Minimal Information about MHC Multimers (https://miamm.lji.org/). Multimers are engineered reagents composed of a ligand and an MHC molecule, and they can be represented in a standardized format using ontology terminology. We provide a Web site that hosts the details of the standard, together with a validation tool to assist with its adoption. We hope that this publication will bring increased awareness of Minimal Information about MHC Multimers and drive acceptance, ultimately improving the quality and documentation of multimer data in the scientific literature.
Introduction
Major histocompatibility complex multimers are engineered reagents with precisely defined compositions that are used to detect Ag-specific T cells (1). Unfortunately, the compositional precision of these reagents is often obscured by shorthand nomenclatures that are useful in context, but that are an impediment to research reproducibility. Errors and confusion in how data are presented are well-known problems within the scientific literature (2, 3). To improve the description of these popular reagents in the literature and in publicly funded resources, such as the Immune Epitope Database (IEDB) (4), we propose minimal nomenclature guidelines for MHC multimers. Standardized nomenclature will ensure that multimer data are presented according to the FAIR principles and are more findable, accessible, interoperable, and reusable (5). Adherence to such standards has become increasingly endorsed or required across many areas of research (6–9). Following the precedent set by the Minimum Information about a Microarray Experiment, we present the Minimal Information about MHC Multimers (MIAMM; https://miamm.lji.org/) (10). As a key provider of such reagents, the National Institutes of Health (NIH) Tetramer Core Facility is driving this effort (11). Recognized as experts in this field, the NIH Tetramer Core Facility is ideally positioned to develop and endorse such standards. We present the details of MIAMM and also provide a public resource to validate multimer terminology.
These guidelines were established with the expertise of the members of the NIH Tetramer Core Facility, laboratory users of such reagents, and database curators tasked with describing such reagents in machine-readable and interoperable digital resources. We could not identify any existing standard of this kind in the FAIRsharing data standard portal (12). The only available standards related to the description of MHC molecules are the MHC Restriction Ontology (MRO) (13), which is used by this MIAMM effort, as described later, and MaHCO, an Ontology for MHC Alleles and Molecules (14), which is no longer maintained. The Minimal Information About T cell Assays project overlaps with our effort, because it prescribes that authors report details about assay procedures, including all reagents and materials used, that would allow others to repeat the assay accurately (example reports: Intracellular Cytokine Staining assay, ELISpot, Multimer) (15, 16). We extend the Minimal Information About T cell Assays recommendations to specify how the details of multimer reagents should be reported. The NIH Tetramer Core Facility identified the need for such standards when reviewing publications describing the use of its reagents. Far too often, descriptions of MHC multimer reagents in the published literature are absent, misleading, imprecise, or otherwise incorrect. Likewise, the IEDB, a well-established National Institute of Allergy and Infectious Diseases resource that has manually curated >3,860,000 assays from >21,600 manuscripts, has encountered similar obstacles when attempting to represent these data correctly and consistently (17). Therefore, we set out to provide guidelines on how to report MHC multimer reagents in publications based on three simple rules:
Provide a precise definition of the multimer reagent in the Materials and Methods section following the MIAMM standard. In these guidelines, this will be referred to as the “full name.”
Use reader-friendly abbreviations to refer to multimer reagents elsewhere in the manuscript, including figures, figure legends, and body text.
Within a manuscript, ensure that there is exactly one multimer abbreviation used for each multimer defined in the Materials and Methods section.
Results
Precise definitions of MHC ligand complexes
Most MHC molecules in a multimer contain three distinct subunits (for class I, the H chain, β2-microglobulin, and the ligand; for class II, the α-chain, the β-chain, and the ligand), and each of these must be precisely defined (or unambiguously assumed). In MHC multimers, the class I H chains and the class II α- and β-chains are always truncated to remove the transmembrane domains. For the purposes of these guidelines, the details of the truncations should be described in the Materials and Methods section; the names of the corresponding MHC alleles from the official nomenclature can then be used without introducing ambiguity.
MHC nomenclature
The Immuno Polymorphism Database (IPD; https://www.ebi.ac.uk/ipd/mhc/) is the most authoritative repository of MHC allele names for humans and a number of other important species in immunology research (with the notable and unfortunate exception of the mouse) (18). The IPD has subsections devoted to the highly standardized human MHC allele names (IPD-IMGT/HLA) and to allele names for other species (IPD-MHC), including dogs, horses, rats, chickens, goats, cows, and a wide array of nonhuman primates. The MRO was developed to provide a unifying source of terms for these vocabularies: it incorporates the human allele names from IPD-IMGT/HLA and the allele names for all other species in the IPD, using their approved nomenclature, and provides consistent additional properties, such as assignments to class I, class II, allele, serotype, and other relevant metadata (13). MRO also contains standardized MHC nomenclature for species not yet present in the IPD (e.g., the laboratory mouse), making these terms available for use now; future IPD guidelines for these species will be adopted once available. Thus, all MHC terms should be designated using MRO terminology. Importantly for data standards, MRO follows official IPD and IMGT nomenclature standards and thus stays current with new terminology. MRO is an openly developed and maintained ontology, with regular build cycles and versioning. The public can contribute input and request new terms through a GitHub issue tracker (https://github.com/IEDB/MRO).
For the full name of an MHC multimer reagent, the full MRO label must be used. For example, an allele written in the literature as “DQB0201” would instead be represented as “HLA-DQB1*02:01,” which has the stable and resolvable ontology identifier MRO:0000674. Note that MIAMM requires specifying the MHC protein chains included in the multimer, so names are given at protein-level resolution. Thus, for example, HLA-A*02:01 is the appropriate designation, rather than HLA-A*02:01:01 or HLA-A*02:01:02, which identify distinct alleles encoding the same protein chain.
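The protein-level rule can be illustrated with a small normalization helper. This is a minimal sketch, not part of the MIAMM tool: the function name protein_level_name is hypothetical, and it assumes HLA-style names in which the first two colon-separated fields identify the protein. It does not map legacy shorthand such as “DQB0201” onto MRO labels and does not handle expression-status suffixes.

```python
import re

# HLA-style allele name: gene prefix, '*', then two or more colon-separated
# numeric fields. Expression-status suffixes (e.g. 'N', 'L') are not handled.
ALLELE_RE = re.compile(r"(?P<gene>[A-Za-z0-9-]+)\*(?P<fields>\d+(?::\d+)+)")


def protein_level_name(allele: str) -> str:
    """Reduce an allele name to its first two fields, which identify the protein."""
    match = ALLELE_RE.fullmatch(allele)
    if match is None:
        raise ValueError(f"not a field-delimited allele name: {allele!r}")
    fields = match.group("fields").split(":")
    return f"{match.group('gene')}*{':'.join(fields[:2])}"


# Alleles that differ only beyond the second field collapse onto one protein name.
for name in ["HLA-A*02:01:01", "HLA-A*02:01:02", "HLA-A*02:01"]:
    print(name, "->", protein_level_name(name))
```

Legacy shorthand without the “*” separator is rejected rather than guessed at, because recovering the locus from such strings requires curator judgment.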
Class- and locus-specific nomenclature issues
The class I β2-microglobulin subunit.
Although alleles of β2-microglobulin (β2m) have been discovered in a number of species, β2m is not regarded as polymorphic and is therefore commonly ignored in nomenclatures (19). In MRO, the β2m chain is included in the annotations of classical and nonclassical class I MHC complexes but is not restated in the MHC complex term name, unless there is variability or the species differ between the two chains. In MHC multimer applications, the most notable fact about β2m is that reagents with H chains from the mouse (and other species) are commonly made with human β2m, in part because of data suggesting better stability of complexes using human β2m and in part out of historical practice (20–22). We suggest that authors describe the source of the β2m in the Materials and Methods section.
The class II α subunit.
Although the class II β subunit is always polymorphic and must always be included in the multimer full name, the extent of polymorphism of the class II α-chain varies from locus to locus and from species to species, and this determines whether the α-chain must appear in the full name. For example, the HLA-DRA1 locus is essentially monomorphic, so it can be omitted, whereas the SLA-DRA locus is polymorphic and probably should be included. In contrast to HLA-DRA1, the HLA-DPA1 and HLA-DQA1 loci show significant allelic diversity; thus, we recommend that these alleles be explicitly included in the full name of the multimer, even when experts may be aware of α and β alleles that are in linkage disequilibrium. For example, although HLA-DQ2.5 is a common shorthand for an isoform associated with autoimmune disease, its full name in MRO is HLA-DQA1*05:01/DQB1*02:01. MRO allows users to specify one or both chains for class II molecules; for example, both HLA-DRA*01:01/DRB1*03:01 and HLA-DRB1*03:01 are valid terms.
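The α-chain rule above can be sketched as a function that composes a class II full name in the “HLA-DQA1*05:01/DQB1*02:01” style. This is an illustrative sketch only: the names class_ii_full_name and ALPHA_OPTIONAL_PREFIXES are hypothetical, and the assumption that only HLA-DRB β-chains may be named without their α-chain is a simplification of the locus-by-locus judgment described in the text, not a rule taken from MRO.

```python
from typing import Optional, Tuple

# Assumption for this sketch: only these beta loci may be named without their
# alpha chain, because HLA-DRA is essentially monomorphic.
ALPHA_OPTIONAL_PREFIXES = ("HLA-DRB",)


def split_species(label: str) -> Tuple[str, str]:
    """Split 'HLA-DQB1*02:01' into ('HLA', 'DQB1*02:01')."""
    species, sep, chain = label.partition("-")
    if not sep or "*" not in chain:
        raise ValueError(f"expected '<species>-<locus>*<fields>': {label!r}")
    return species, chain


def class_ii_full_name(beta: str, alpha: Optional[str] = None) -> str:
    """Compose a class II name, writing the species prefix once, alpha chain first."""
    beta_species, beta_chain = split_species(beta)
    if alpha is None:
        if not beta.startswith(ALPHA_OPTIONAL_PREFIXES):
            raise ValueError(f"the alpha chain must be given for {beta}")
        return beta
    alpha_species, alpha_chain = split_species(alpha)
    if alpha_species != beta_species:
        raise ValueError("alpha and beta chains come from different species")
    return f"{alpha_species}-{alpha_chain}/{beta_chain}"


print(class_ii_full_name("HLA-DQB1*02:01", alpha="HLA-DQA1*05:01"))
print(class_ii_full_name("HLA-DRB1*03:01"))
```

Rejecting a DQ or DP β-chain without its α-chain mirrors the recommendation that linkage disequilibrium should not be relied on to fill in the missing chain.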
Nonnatural MHC mutants
Nonnatural MHC variants, often introduced to enhance or reduce CD4 and CD8 coreceptor binding, are common in the MHC tetramer literature. From one perspective, these are equivalent to the discovery of a new allele, but the curators of allele databases generally restrict new entries to naturally occurring alleles. To address this, the MRO includes engineered, nonnatural MHC molecules. The nomenclature for engineered alleles follows the convention commonly used for other gene variants: the natural sequence is given first using the IPD nomenclature, followed by the wild-type amino acid, the position of the mutation within the mature protein chain, and the mutant amino acid. For example, “HLA-A*02:01 A150P” indicates an amino acid sequence that is identical to the wild-type A*02:01 sequence apart from a substitution from alanine to proline at position 150. The amino acid numbering should be based on the accepted natural mature protein sequence. Mature protein sequences for many species are currently available from IMGT (http://www.imgt.org), which is working to make them more accessible. If a researcher engineers a new mutated MHC molecule, it will be added to MRO upon request.
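The substitution notation can be parsed and checked against a mature protein sequence as follows. This is a sketch under stated assumptions: the helpers parse_engineered_name and apply_substitutions are hypothetical, multiple substitutions are assumed to be separated by spaces (the text shows only a single one), and the short sequence used below is a toy string, not a real MHC sequence.

```python
import re

STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")
# Substitution token: wild-type residue, 1-based mature-chain position, mutant residue.
SUBSTITUTION_RE = re.compile(r"([A-Z])([1-9]\d*)([A-Z])")


def parse_engineered_name(name):
    """Split e.g. 'HLA-A*02:01 A150P' into the natural allele and its substitutions."""
    allele, *tokens = name.split()
    if not tokens:
        raise ValueError(f"no substitution in {name!r}")
    substitutions = []
    for token in tokens:
        match = SUBSTITUTION_RE.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed substitution {token!r}")
        wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
        if wild_type not in STANDARD_AA or mutant not in STANDARD_AA or wild_type == mutant:
            raise ValueError(f"invalid substitution {token!r}")
        substitutions.append((wild_type, position, mutant))
    return allele, substitutions


def apply_substitutions(mature_sequence, substitutions):
    """Apply substitutions, checking each wild-type residue against the sequence."""
    residues = list(mature_sequence)
    for wild_type, position, mutant in substitutions:
        if position > len(residues) or residues[position - 1] != wild_type:
            raise ValueError(f"{wild_type}{position} does not match the mature sequence")
        residues[position - 1] = mutant
    return "".join(residues)


print(parse_engineered_name("HLA-A*02:01 A150P"))
print(apply_substitutions("MKAWLV", [("A", 3, "P")]))  # toy sequence only
```

Checking the wild-type residue before substituting catches numbering errors, such as counting from the signal peptide rather than from the mature chain.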
Peptide ligand nomenclature
Peptide definitions must include their complete amino acid sequence using the standard one-letter code established by the Nomenclature Committee of the International Union of Biochemistry (23). Table I contains examples of commonly studied peptidic multimer reagents, together with their standardized names and MRO mappings.
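A check that a peptide uses only the standard one-letter code might look like the following minimal sketch. The function name validate_peptide is hypothetical. The sketch assumes that only the 20 standard amino acids are accepted: ambiguity codes (B, Z, X) and lowercase letters are rejected rather than silently converted, because stereochemistry and modifications are reported separately.

```python
# The 20 standard amino acids in the IUPAC-IUBMB one-letter code.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def validate_peptide(sequence: str) -> str:
    """Return the sequence unchanged if it uses only standard one-letter codes."""
    if not sequence:
        raise ValueError("empty peptide sequence")
    invalid = sorted(set(sequence) - STANDARD_AA)
    if invalid:
        # Lowercase letters and ambiguity codes are rejected here; stereochemistry
        # and modifications are reported separately.
        raise ValueError(f"non-standard characters {invalid} in {sequence!r}")
    return sequence


print(validate_peptide("SLYNTVATL"))
```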
Because peptide sequences are not necessarily unique to a single organism and a single Ag, organisms, Ags, and sequence positions within an Ag should not be included in the full name of a multimer reagent; to do so would often imply more information than is warranted. Nevertheless, we recognize that these attributes aid reader understanding, and we therefore recommend that the study-relevant organism, Ag, and position information be included in reagent abbreviations. In addition, when D-amino acids or retroinverso peptides are used, this should be stated in the Materials and Methods.
To describe nonstandard amino acids and/or amino acid modifications, the Protein Modification Ontology (PSI-MOD) terminology (24) should be used; this is also the convention used by UniProt and the Protein Ontology (25, 26). PSI-MOD is presented as a standard for protein modification terminology by the FAIRsharing resource and, like MRO, is a member of the Open Biological and Biomedical Ontology Foundry (27). Table II lists the commonly used modifications, including those relevant to the NIH Tetramer Core Facility and those found in the literature curated by the IEDB. Each modification must be described by its modification type followed by its position within the peptide, written as the one-letter amino acid code and the residue number, such as R5. Table III shows examples of commonly observed peptidic multimers with modifications and their ontology identifiers.
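A site such as R5 can be checked mechanically against the peptide it modifies. This is a minimal sketch: check_modification_site is a hypothetical helper, and the peptide below is a toy sequence chosen only so that residue 5 is arginine. The modification type itself would come from PSI-MOD and is not validated here.

```python
import re

# Site token: one-letter residue code followed by its 1-based position.
SITE_RE = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])([1-9]\d*)")


def check_modification_site(peptide, site):
    """Check that a site such as 'R5' names the residue actually at that position."""
    match = SITE_RE.fullmatch(site)
    if match is None:
        raise ValueError(f"malformed site {site!r}; expected e.g. 'R5'")
    residue, position = match.group(1), int(match.group(2))
    if position > len(peptide):
        raise ValueError(f"position {position} is beyond the end of {peptide!r}")
    if peptide[position - 1] != residue:
        raise ValueError(
            f"residue {position} of {peptide!r} is {peptide[position - 1]}, not {residue}"
        )
    return residue, position


print(check_modification_site("GAAARGAAA", "R5"))  # toy peptide
```

Requiring both the residue letter and the position makes the site notation self-checking, so an off-by-one error is caught instead of silently moving the modification.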
Nonpeptidic ligand nomenclature
Nonpeptide definitions must include their Chemical Entities of Biological Interest (ChEBI) name and identifier (https://www.ebi.ac.uk/chebi/) (28). ChEBI is a database and ontology containing information about chemical entities of biological interest and is also an Open Biological and Biomedical Ontology Foundry member, which promotes interoperability among different resources. An example of a chemical structure known to have been tested as a multimer ligand is 1-O-(α-D-galactosyl)-N-hexacosanoylphytosphingosine, which is identified in ChEBI as CHEBI:466659. The IEDB has been able to accommodate all such structures encountered in the literature using this approach, demonstrating its utility. Table IV presents some of the most commonly studied nonpeptidic multimer ligands with their corresponding ontology identifiers.
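Because MIAMM relies on identifiers from three ontologies (MRO, PSI-MOD, and ChEBI), a first-pass format check can catch typos before any lookup. This is a sketch only. The helper ontology_prefix is hypothetical, and the digit counts are assumptions inferred from identifiers such as MRO:0000674, CHEBI:466659, and the usual MOD:nnnnn form, not official patterns. A well-formed identifier is not necessarily an existing term.

```python
import re

# Assumed identifier shapes (format only; existence of the term is not checked).
ID_PATTERNS = {
    "MRO": re.compile(r"MRO:\d{7}"),
    "MOD": re.compile(r"MOD:\d{5}"),
    "CHEBI": re.compile(r"CHEBI:\d+"),
}


def ontology_prefix(curie):
    """Return the ontology prefix of a well-formed identifier, else raise."""
    prefix = curie.split(":", 1)[0]
    pattern = ID_PATTERNS.get(prefix)
    if pattern is None or pattern.fullmatch(curie) is None:
        raise ValueError(f"unrecognized or malformed identifier: {curie!r}")
    return prefix


for identifier in ["MRO:0000674", "CHEBI:466659"]:
    print(identifier, ontology_prefix(identifier))
```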
Validation tool
To ensure a common understanding of the guidelines specified earlier, we developed a publicly accessible, Web-based tool that allows a user either to (1) parse a full name they have generated and validate that it fulfills the specifications, or (2) compose the full name from information entered by the user. The tool ensures that valid MRO nomenclature is used for MHC alleles, that only valid amino acids are used for a peptide ligand, and that amino acid modifications are specified using PSI-MOD. The tool interface is shown in Fig. 1.
This tool accepts manually entered data and provides the ability to validate a set of data via an upload feature. It was tested by research scientists who frequently use MHC multimers in their research, as well as biocurators, who commonly encounter this type of data in the literature. Currently, the tool validates only multimers that contain peptidic ligands; however, development of a nonpeptidic ligand feature is underway. It is freely available online (https://miamm.lji.org/), and we welcome the public to use this tool to standardize their MHC multimer terms before publication. This Web site also serves as a portal to inform users of the MIAMM standard and the various ontologies employed by it.
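The two modes of the tool (parse and validate, or compose) and its batch upload can be sketched in a few lines. This is a hypothetical illustration, not the tool's implementation. It assumes, purely for illustration, that a peptidic full name is an MRO label followed by a space and the peptide sequence, and it uses a tiny hard-coded label set as a stand-in for a lookup against MRO. All function names are hypothetical.

```python
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")
# Stand-in for a lookup against the full set of MRO labels.
KNOWN_MHC_LABELS = frozenset({"HLA-A*02:01", "HLA-B*08:01", "HLA-DQA1*05:01/DQB1*02:01"})


def problems(mhc, peptide):
    """List the reasons an (MHC label, peptide) pair fails validation."""
    found = []
    if mhc not in KNOWN_MHC_LABELS:
        found.append(f"unknown MHC label {mhc!r}")
    if not peptide or not set(peptide) <= STANDARD_AA:
        found.append(f"invalid peptide {peptide!r}")
    return found


def compose_full_name(mhc, peptide):
    issues = problems(mhc, peptide)
    if issues:
        raise ValueError("; ".join(issues))
    return f"{mhc} {peptide}"


def parse_full_name(full_name):
    # The peptide is the last space-separated token, so MHC labels that contain
    # spaces (e.g. engineered variants) stay intact.
    mhc, _, peptide = full_name.strip().rpartition(" ")
    issues = problems(mhc, peptide)
    if issues:
        raise ValueError("; ".join(issues))
    return mhc, peptide


def validate_batch(full_names):
    """Map each full name to its problems (empty list when valid)."""
    report = {}
    for name in full_names:
        try:
            parse_full_name(name)
            report[name] = []
        except ValueError as err:
            report[name] = str(err).split("; ")
    return report


print(validate_batch(["HLA-A*02:01 SLYNTVATL", "A2 SLYNTVATL"]))
```

Reporting every problem for each entry, rather than stopping at the first, suits the upload use case, where a user wants one pass over a whole spreadsheet.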
Abbreviations
Once a multimer is precisely defined in the Materials and Methods section, the recommendations for how abbreviations should be constructed are decidedly less prescriptive, and the chosen approach depends on the context and on considerations of uniqueness and consistency throughout a manuscript. For abbreviations, we set one absolute rule and one strong suggestion: abbreviations for multimers must be unique within a manuscript and used consistently, and abbreviations for multimers should include a reference to both the MHC allele and its ligand.
As an example, in the very first article describing MHC multimers, we described three tetramers and referred to them as A2-gag, A2-pol, and A2-MP (29). Within the context of that article, these abbreviations are unambiguous and meet the criteria of these guidelines.
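The absolute rule (one abbreviation per multimer, and one multimer per abbreviation) can be checked mechanically once the abbreviations used in a manuscript have been collected. This is a minimal sketch with a hypothetical helper. The full names below follow an assumed “MHC label, space, peptide” layout for illustration only. The strong suggestion (mentioning both allele and ligand) still requires human judgment.

```python
from collections import defaultdict


def abbreviation_problems(usages):
    """usages: (full_name, abbreviation) pairs collected from one manuscript."""
    abbreviations_by_name = defaultdict(set)
    names_by_abbreviation = defaultdict(set)
    for full_name, abbreviation in usages:
        abbreviations_by_name[full_name].add(abbreviation)
        names_by_abbreviation[abbreviation].add(full_name)
    found = []
    # Consistency: each multimer is referred to by exactly one abbreviation.
    for full_name, abbreviations in sorted(abbreviations_by_name.items()):
        if len(abbreviations) > 1:
            found.append(f"{full_name}: several abbreviations {sorted(abbreviations)}")
    # Uniqueness: each abbreviation refers to exactly one multimer.
    for abbreviation, names in sorted(names_by_abbreviation.items()):
        if len(names) > 1:
            found.append(f"{abbreviation}: ambiguous for {sorted(names)}")
    return found


print(abbreviation_problems([
    ("HLA-A*02:01 SLYNTVATL", "A2-gag"),
    ("HLA-A*02:01 SLYNTVATL", "A2-p17"),  # same reagent, second abbreviation
]))
```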
A variety of peptide name abbreviation systems have been used in the MHC multimer literature. Three examples are as follows:
Use of the Ag name and a position indicator: a common example of this is the I-Ab–restricted GP66 epitope from lymphocytic choriomeningitis virus (LCMV).
Use of the first three amino acids of the peptide: common examples are the HLA-B*08:01–restricted EBV epitopes RAK and FLR.
Use of the first and last amino acid of the peptide sequence, followed by the length of the peptide: common examples of this are the Mamu-A*01–restricted epitope from SIV gag referred to as CM9 and the HLA-B*57:01–restricted epitope from HIV-1 gag called IW9.
Any of these approaches (and potentially others) are acceptable as abbreviations for peptides in MHC multimers.
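The second and third abbreviation styles can be derived directly from the peptide sequence, as in this minimal sketch. The function names are hypothetical. The inputs are the peptides usually denoted by RAK and FLR (RAKFKQLL, FLRGRAYGL) and by CM9 and IW9 (CTPYDINQM, ISPRTLNAW). The first style (Ag name and position, as in GP66) cannot be derived from the sequence alone. Different peptides can also yield the same abbreviation, so uniqueness within the manuscript still has to be checked.

```python
def first_three_residues(peptide):
    """Abbreviation style 2: the first three amino acids, e.g. 'RAK'."""
    return peptide[:3]


def ends_and_length(peptide):
    """Abbreviation style 3: first and last residue plus length, e.g. 'CM9'."""
    return f"{peptide[0]}{peptide[-1]}{len(peptide)}"


for peptide in ["RAKFKQLL", "FLRGRAYGL"]:
    print(peptide, first_three_residues(peptide))
for peptide in ["CTPYDINQM", "ISPRTLNAW"]:
    print(peptide, ends_and_length(peptide))
```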
Cautionary notes
Although we have embraced the principle of simplicity in these guidelines for multimer abbreviations, we do raise one important cautionary note. The simplest abbreviations are not future-proof; although this is unlikely to be a problem within a manuscript, it may become one when labeling reagent aliquots within a laboratory. An abbreviation such as A2-pol may be unique when the reagent is first made, but what happens if additional A2-restricted epitopes are discovered in the HIV pol protein, or if A2-restricted epitopes are discovered in the pol proteins of non-HIV viruses? It is hard to enforce forward-looking nomenclatures in the laboratory, especially for abbreviations, but we encourage them.
Other multimer features
There are numerous other components and formulations of an MHC multimer beyond the typical MHC allele and ligand combinations. For example, class II MHC molecules are often expressed with peptide ligands covalently tethered to the N terminus of the β-chain (30), sometimes with an added disulfide cross-link (31). Although this current nomenclature proposal does not easily capture these details (suggestions will be collected and discussed on the MIAMM Web site: https://miamm.lji.org/), they must be thoroughly described in the Materials and Methods section of the manuscript. Two common additional components, which we will discuss here, are (1) fusions to the MHC molecule and (2) the multimerization agent and its linked labels.
Fusions to the MHC molecule.
Features in this category include flexible linkers, leucine zippers, Ig Fc domains, BirA substrate peptides (or StrepTag equivalents), and affinity tags for purification (e.g., His6 tags). This information must be included in the Materials and Methods section of a manuscript; however, it is not required in reagent nomenclatures.
Multimerization agents.
Streptavidin is by far the most common multimerization agent used in MHC multimer technology, but it is not the only one. It is also sometimes combined with other agents to produce higher-order multimers, as in the Dextramer technology and the recently developed Spheromer technology (32, 33). Furthermore, in MHC multimer reagents, the multimerization agent almost always serves a second function: it is the component that is covalently linked to a label for detection.
The multimerization technology should always be described in the Materials and Methods section of a manuscript; common examples include “tetramer,” “pentamer,” and “dextramer.” Detection labels such as fluorophores must also be reported. Conjugation of protein fluorophores, such as R-phycoerythrin (R-PE) and allophycocyanin, results in mixtures of higher-order aggregates that vary by vendor, so vendors and catalog numbers for these reagents should be included in the Materials and Methods section when they are known.
Research Resource Identifier
To extend the standards proposed in this article and to further contribute to FAIR data, the NIH Tetramer Core Facility is working with the Resource Identification Initiative (RRI) (34) to produce Research Resource Identifiers (RRIDs) for each of our reagents. Furthermore, we and the Resource Identification Initiative encourage commercial multimer vendors to do the same (note that we are not imposing any guidelines on how vendors market their products). We are hopeful that this publication will draw attention to this effort and extend its adoption. We anticipate vendors and other researchers generating multimer reagents will be incentivized to standardize their descriptions and obtain RRIDs because many journals (e.g., Star Methods in Cell Press journals) and the NIH increasingly require RRIDs as part of their Rigor and Reproducibility initiatives.
Discussion
In this article, we have introduced guidelines for the MIAMM (miamm.lji.org), as summarized in Table V. The primary goals of these guidelines are 2-fold: to enhance the reproducibility and the annotatability of the literature using these reagents. We have attempted to keep the guidelines as minimally prescriptive as possible while still adding value. To achieve these goals, we have recommended two basic principles: (1) describe the reagents as precisely as possible in the Materials and Methods section of a manuscript, and (2) use suitable abbreviations elsewhere.
Adoption of these standards is facilitated by the Multimer Validation Tool, which has been used to standardize the data present in the NIH Tetramer Core Facility. The tool has also been used to validate the nearly 18,500 multimer assays in the IEDB; because these assays were derived from >2400 separate publications, this demonstrates its applicability to real-world data. In addition, it has been shared with stakeholders in both the academic and commercial sectors to facilitate implementation of these standards. We hope that this publication will bring increased awareness of MIAMM and drive acceptance. As a key stakeholder and recognized leader in this field, the NIH Tetramer Core Facility is well positioned to promote these standards. Both the NIH Tetramer Core Facility and the IEDB are stable, publicly funded resources well suited to the maintenance of this project. We hope the application of MIAMM will improve the reproducibility and annotatability of MHC multimer reagent data.
Footnotes
This work was supported by the U.S. Department of Health and Human Services, National Institutes of Health, Office of Extramural Research (Grant 75N93020D00005 to J.D.A., R.A.W., and D.L.L.; Grant R24 HG010032 to R.V., J.A.O., A.M., and B.P.; Grant 75N93019C00001 to R.V., J.A.O., A.M., B.P., and A.S.), the Yerkes National Primate Research Center, Emory University (Grant P51OD011132 to J.D.A., R.A.W., and D.L.L.), and the Emory Center for AIDS Research (Grant P30AI050409 to J.D.A., R.A.W., and D.L.L.).
Abbreviations used in this article
- ChEBI
Chemical Entities of Biological Interest
- FAIR
findable, accessible, interoperable, and reusable
- IEDB
Immune Epitope Database
- IPD
Immuno Polymorphism Database
- LCMV
lymphocytic choriomeningitis virus
- β2m
β2-microglobulin
- MIAMM
Minimal Information about MHC Multimers
- MRO
MHC Restriction Ontology
- NIH
National Institutes of Health
- PSI-MOD
Protein Modification Ontology
- RRID
Research Resource Identifier
References
Disclosures
The authors have no financial conflicts of interest.