Vanderbilt University

Bringing Order to Chaos

By: Carol A. Rouzer, VICB Communications
Published: April 25, 2014

A comprehensive exploration of relationships among the huge class of cytosolic glutathione transferases sheds light on structure-function relationships.

The cytosolic glutathione transferases (cytGSTs) comprise one of four major groups of enzymes that catalyze the nucleophilic attack of glutathione (GSH) on substrates containing an electrophilic carbon, nitrogen, sulfur, or oxygen atom (Figure 1). CytGSTs play various roles in endogenous metabolic pathways, xenobiotic detoxication, and the response to oxidative stress. The enzymes are distinguished by the presence of a minimum of two domains: an N-terminal thioredoxin-like fold that contains the GSH binding site and a C-terminal domain that may be variable in structure. Recent advances in genome sequencing have revealed more than 13,000 sequences that meet these criteria across the full phylogenetic spectrum. The number of potential cytGSTs has far exceeded our ability to structurally or functionally characterize the putative gene products and has rendered inadequate the ad hoc methods previously used to classify these enzymes. These challenges are exacerbated by the fact that cytGST activity is frequently established by using synthetic substrates (Figure 1), leaving the natural substrates for the enzymes a mystery. To begin to address these challenges, Vanderbilt Institute of Chemical Biology member Richard Armstrong, collaborators Steven Almo (Albert Einstein College of Medicine) and Patricia Babbitt (University of California, San Francisco), and their laboratories, with support from the Enzyme Function Initiative (NIH U54 GM093342), undertook the herculean task of classifying all known cytGSTs on the basis of their sequences. They then used this framework to select uncharacterized proteins for further study, providing the foundation for developing a comprehensive understanding of the structure-function relationships within this diverse protein superfamily [S. T. Mashiyama, et al. (2014) PLoS Biology, published online April 24, doi:10.1371/journal.pbio.1001843].



Figure 1. Examples of reactions catalyzed by cytGSTs. Substrates shown are typical synthetic compounds used to test for cytGST activity.

The investigators began by identifying 13,435 sequences that fit the criteria for a thioredoxin-like fold N-terminal domain. They added 58 more sequences based on similarity to those of known cytGSTs. After depositing all 13,493 sequences in the Structure-Function Linkage Database (SFLD), they used BLAST analysis to determine the degree of similarity among the sequences and Cytoscape to construct similarity networks based on the BLAST results. Because of the huge number of sequences involved, the researchers sought a way to simplify their network. They accomplished this by clustering together sequences that were more than 50% identical. This provided 2,190 nodes that they then used to construct a representative network. The result identified 7 Level 1 subgroups (Figure 2), which could be further divided into 35 Level 2 subgroups (Figure 3). Clusters of fewer than 50 proteins were also identified but were not given a subgroup designation. Construction of full networks, which classified each individual protein, provided greater detail of the relationships within the Level 2 subgroups.
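The idea of a representative network can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the study used BLAST scores and Cytoscape, whereas this toy version assumes pre-aligned, equal-length strings, a simple fraction-identity measure, a greedy clustering rule, and a hypothetical edge cutoff of 0.3. All function names and sequences here are invented for illustration.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length toy sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity expects pre-aligned, equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: Dict[str, str], cutoff: float = 0.5) -> List[List[str]]:
    """Place each sequence in the first node whose representative (its first
    member) is more than `cutoff` identical to it; otherwise start a new node."""
    clusters: List[List[str]] = []
    for name, seq in seqs.items():
        for members in clusters:
            if identity(seq, seqs[members[0]]) > cutoff:
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


def connected_components(n_nodes: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Group node indices that are linked by edges, using union-find."""
    parent = list(range(n_nodes))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(n_nodes):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def representative_network(seqs: Dict[str, str],
                           node_cutoff: float = 0.5,
                           edge_cutoff: float = 0.3):
    """Collapse sequences into nodes, link similar nodes, and report groups."""
    clusters = greedy_cluster(seqs, node_cutoff)
    reps = [seqs[c[0]] for c in clusters]
    edges = [(i, j) for i, j in combinations(range(len(reps)), 2)
             if identity(reps[i], reps[j]) >= edge_cutoff]
    return clusters, edges, connected_components(len(clusters), edges)


# Invented toy sequences, not real cytGSTs.
toy = {
    "a1": "AAAAAAAAAA",
    "a2": "AAAAAAAAAC",  # 90% identical to a1, so it joins a1's node
    "b1": "AAAACCCCCC",  # 40% identical to a1: separate node, but linked by an edge
    "c1": "GGGGGGGGGG",  # unrelated: separate node with no edges
}
clusters, edges, groups = representative_network(toy)
print(clusters, edges, groups)
```

In this sketch, collapsing similar sequences into one node shrinks the network before any edges are drawn, which is the same simplification that reduced 13,493 sequences to 2,190 nodes in the study. The groups of linked nodes play the role of subgroups.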



Figure 2. Level 1 representative network consisting of 2,190 nodes, each comprising cytGST sequences sharing >50% identity. Level 1 subgroups (AMPS, Main, R1, R2, R3, R4, and Xi) are labeled if they contain at least 50 sequences. Colors indicate that at least 50% of the annotated sequences in a node belong to the canonical class designated by that color in the key. Heavy borders indicate that 3D structural data are available for at least one member of the node, and the shape of the border indicates the source of the structural data. Reproduced under the Creative Commons Attribution License from S. T. Mashiyama, et al. (2014) PLoS Biology, published online April 24, doi:10.1371/journal.pbio.1001843. Copyright 2014, Mashiyama et al.


 


Figure 3. Level 2 representative network consisting of 2,190 nodes, each comprising cytGST sequences sharing >50% identity. Level 2 subgroups are labeled if they contain at least 50 sequences. Colors indicate that at least 50% of the annotated sequences in a node belong to the canonical class designated by that color in the key. Heavy borders indicate that 3D structural data are available for at least one member of the node, and the shape of the border indicates the source of the structural data. Reproduced under the Creative Commons Attribution License from S. T. Mashiyama, et al. (2014) PLoS Biology, published online April 24, doi:10.1371/journal.pbio.1001843. Copyright 2014, Mashiyama et al.


In the past, researchers classified cytGSTs on the basis of their enzymatic activity and structural features. These canonical classes, originally designated with Greek letters, have evolved over the years as new enzymes were discovered, but they have never been comprehensive or systematic. The investigators used data from the UniProtKB/Swiss-Prot resource, as well as other literature sources, to assign canonical classifications to all of the sequences that had previously been classified. They were not surprised to find that the vast majority of cytGST sequences have never been classified or characterized. In fact, a canonical class annotation was available for only 280 of the sequences. However, when they “painted” the canonical classifications onto their representative network, they found that sequences that shared the same classification tended to cluster together (Figures 2 and 3), indicating a general agreement between the traditional classification system and their sequence similarity network.
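The coloring rule stated in the captions of Figures 2 and 3 (a node takes a class color when at least 50% of its annotated sequences belong to that class) can be sketched in a few lines of Python. The function name and the example annotations are hypothetical, and the handling of an exact 50/50 tie is an assumption, since the article does not say how such nodes were treated.

```python
from collections import Counter
from typing import List, Optional


def paint_node(annotations: List[Optional[str]],
               min_fraction: float = 0.5) -> Optional[str]:
    """Return the canonical class to paint on a node, or None.

    `annotations` holds one entry per sequence in the node: a class name
    if the sequence has a canonical annotation, or None if it has none.
    Only annotated sequences count toward the fraction.
    """
    annotated = [a for a in annotations if a is not None]
    if not annotated:
        return None  # nothing to paint: no member has been classified
    cls, count = Counter(annotated).most_common(1)[0]
    # Assumption: on an exact tie, the class seen first in the list wins.
    return cls if count / len(annotated) >= min_fraction else None


# Illustrative nodes with made-up annotation lists.
print(paint_node(["Mu", "Mu", None, "Pi"]))   # 2 of 3 annotated are Mu
print(paint_node([None, None]))               # unannotated node
print(paint_node(["Mu", "Pi", "Alpha"]))      # no class reaches 50%
```

Because only 280 sequences carried a canonical annotation, most nodes in a network of this kind would fall into the unannotated case and stay uncolored.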

Using literature data, the investigators could find only 176 sequences that were confirmed to have cytGST activity. They used their network results to select 867 sequences from 31 subgroups and 64 smaller clusters as targets for further study. They successfully expressed 230 of these proteins and confirmed cytGST-like activity for 82, thereby increasing the number of known cytGSTs by nearly 50%. The investigators evaluated the known cytGSTs on the basis of fourteen different reaction types. Painting these results over their classification network revealed that the reactions catalyzed were widely distributed throughout the network without any clear and immediate correlation between type of reaction and sequence.
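The "nearly 50%" figure follows directly from the counts given above, as the short Python calculation below shows. It assumes that none of the 82 newly confirmed enzymes were among the 176 already reported in the literature. The variable names are illustrative only.

```python
# Counts taken from the paragraph above.
known_before = 176    # literature-confirmed cytGSTs
targets = 867         # sequences selected for study
expressed = 230       # targets successfully expressed
newly_confirmed = 82  # expressed proteins with cytGST-like activity

# Assumption: the 82 new enzymes do not overlap with the 176 known ones.
increase_pct = round(100 * newly_confirmed / known_before, 1)
expression_pct = round(100 * expressed / targets, 1)
activity_pct = round(100 * newly_confirmed / expressed, 1)
known_after = known_before + newly_confirmed

print(f"Increase in known cytGSTs: {increase_pct}%")
print(f"Targets expressed: {expression_pct}%")
print(f"Expressed proteins with confirmed activity: {activity_pct}%")
print(f"Known cytGSTs after the study: {known_after}")
```

The same counts show that roughly a quarter of the selected targets could be expressed and roughly a third of those showed activity.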

The investigators found crystal structure data for 398 cytGST sequences. Thirty-seven of these structures resulted from their own work on 27 cytGST proteins. They used all of these data to construct a structural similarity network (Figure 4). The results highlighted similarities in structure among superfamily members, despite their sequence diversity. The structural data also confirmed a relationship between sequence similarity and structural similarity and helped to identify subgroups that were most similar to each other. Again, as shown in Figure 4B, there was no strong correlation between structure and reaction type.

 



Figure 4. Structural similarity network showing the relationships between 131 crystal structures. (A) Each node is colored to show the Level 2 subgroup of the sequence in the sequence similarity network. (B) Each node is colored to show the type of reaction catalyzed by the enzyme in the case of those for which data were available. Reproduced under the Creative Commons Attribution License from S. T. Mashiyama, et al. (2014) PLoS Biology, published online April 24, doi:10.1371/journal.pbio.1001843. Copyright 2014, Mashiyama et al.


One reaction catalyzed by some cytGSTs is the reduction of a disulfide bond through the oxidation of two molecules of GSH to the corresponding disulfide GSSG (Figure 5). A careful study of the sequence similarity network indicated that the 44 enzymes confirmed to have this disulfide bond reductase (DSBR) activity were widely distributed, with the largest concentration in the Main.2 and Main.3 subgroups. Structural data available for 13 of these proteins allowed a direct comparison of their active sites. Main.2 enzymes with confirmed DSBR activity have conserved residues at Thr28, Gln57, Glu89, Ser90, and Arg152 (using numbering for the 4IKH protein from Pseudomonas fluorescens). X-ray crystal structures show that Thr28 hydrogen bonds with the sulfhydryl group of one GSH molecule, while Gln57, Glu89, and Ser90 interact at other points of this molecule. Arg152 interacts with the second substrate GSH. In DSBR enzymes from different subgroups, the residue that interacts with the GSH sulfhydryl (Thr in Main.2 subgroup enzymes) is frequently Ser or Cys. Residues in positions comparable to Gln57 and Arg152 in 4IKH are not highly conserved in enzymes from other subgroups. In contrast, residues in positions comparable to Glu89 and Ser90 in 4IKH are highly conserved (with Asp occasionally substituting for Glu89). The investigators note that Glu89 and Ser90 of 4IKH are found in a core ββα region that is preserved across a divergent range of cytGST subgroups and reaction types and may, therefore, be derived from a shared evolutionary ancestor.
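A comparison of this kind amounts to asking how often the same residue appears at equivalent positions across enzymes. The Python sketch below shows one simple way to score that, assuming the active-site positions have already been aligned. The four-letter strings are invented to mimic the pattern described above (Thr/Ser/Cys at the first position, a variable second position, Glu or Asp at the third, Ser at the fourth); they are not real sequences, and the function is hypothetical rather than the investigators' method.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def conservation(aligned: List[str], column: int,
                 merge: Optional[Dict[str, str]] = None) -> Tuple[str, float]:
    """Return the most common residue in an alignment column and its fraction.

    `merge` optionally maps residues onto one another so that conservative
    substitutions (for example Asp for Glu) count as the same residue.
    """
    merge = merge or {}
    residues = [merge.get(seq[column], seq[column]) for seq in aligned]
    residue, count = Counter(residues).most_common(1)[0]
    return residue, count / len(residues)


# Invented active-site strings, one per hypothetical enzyme, already aligned.
# Columns stand in for positions comparable to Thr28, Gln57, Glu89, and Ser90.
site = ["TQES", "SKES", "CQDS", "TRES"]

print(conservation(site, 0))                     # sulfhydryl-binding position
print(conservation(site, 1))                     # variable position
print(conservation(site, 2))                     # Glu with one Asp
print(conservation(site, 2, merge={"D": "E"}))   # treating Asp as Glu
print(conservation(site, 3))                     # invariant Ser
```

Treating Asp and Glu as equivalent is a design choice in this sketch; it reflects the observation that Asp occasionally substitutes for Glu89.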



Figure 5. The disulfide bond reductase reaction shown with a substrate frequently used to test for this activity.


A particularly interesting reaction catalyzed by the Main.2 subgroup enzyme DrcA is reductive dechlorination (Figure 6). DrcA, a protein from the slime mold Dictyostelium discoideum, dechlorinates differentiation-inducing factor (DIF, Figure 6), a chlorinated alkyl phenone that plays a role in stalk cell differentiation in response to starvation conditions. Prior studies of DrcA have shown that Cys54 is critical to its activity. Thus, it is notable that none of the 17 Main.2 proteins known to have cytGST activity has a Cys at this position. The reductive dechlorination reaction is important because it plays a role in detoxication of halogenated environmental pollutants such as polychlorinated biphenyls, dioxins, and dichlorodiphenyltrichloroethane (DDT). A full understanding of the structural requirements for this reaction may pave the way for bioengineering new enzymes that can facilitate the process of bioremediation.


Figure 6. The reductive dechlorination reaction shown with a substrate frequently used to test for this activity, and the structure of differentiation-inducing factor from Dictyostelium discoideum, the natural substrate for DrcA.


It is clear that our ability to identify proteins on the basis of sequence data has far exceeded our ability to characterize them functionally. This study begins to address this issue by laying the foundation for the systematic evaluation of structure-function relationships within the huge cytGST superfamily. A clear conclusion from this work is that we still have much to learn about these proteins, but future exploration can now be built upon a framework that facilitates elucidation of key similarities and differences among this diverse group of enzymes. Similar efforts are sorely needed for other large protein families that contain many poorly characterized members.

 

 

 



 



 



 

 


 

 


 

 
     

