Generating insights into protein function
Part of communicating the significance of your research is having figures that tell your story. Coding gives investigators the opportunity to build applications that not only facilitate research but also generate figures that tell a unique story. The intention of this blog is to share code I have collected over the years that has helped me tell better stories. I hope others will be able to use these tools not only to further their research, but also to tell really interesting stories in structural biology. The bottom line for me is that even if it isn’t as useful as I might hope, it is still a lot of fun to play around with!
Parsing PDB Files with Biopython
When creating protein structure network (PSN) visualizations, I typically begin by extracting key components from the Protein Data Bank (PDB) structure file using PDBParser from the Biopython package. For clarification, the PDB archive is a publicly accessible database that stores 3D structural data of biological molecules, such as proteins and nucleic acids, for use in scientific research and education. For the purpose of demonstration, I am using the PDB structure 4PLD, which is human liver receptor homolog-1 (LRH-1). It is worth noting that the workflow presented here is based on research conducted as part of a drug screening study on LRH-1. Note that you will need to update the line pdb_file = '7tt8.pdb' to match the path where your PDB file is stored.
If you’re using a Jupyter Notebook, running this snippet should produce the following output:
This creates a Pandas DataFrame that contains basic atomic information from the crystal structure. To create a 3D network representation of the 4PLD protein structure, we need to extract key information from the PDB file. In PSNs each residue is represented as a node. When constructing PSNs, I prefer to combine the residue number and name into each node label so that, on visual inspection, the researcher can ‘get a feel’ for how the primary sequence maps onto the network topology. As a rule, I limit the network to chain A and only include C-alpha atoms. Each residue is therefore represented by its C-alpha atom and the corresponding x, y, z coordinates, which are extracted as node features to construct the 3D network. It’s an exciting and insightful process!
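The extraction described above can be sketched roughly as follows. This is a minimal illustration, not the exact code behind the figures: the function name ca_nodes, the column names, and the label format (residue name followed by residue number, e.g. MET1) are assumptions made for this example.

```python
import pandas as pd
from Bio.PDB import PDBParser


def ca_nodes(pdb_file, chain_id="A"):
    """Return one row per residue of a chain: node label plus C-alpha x, y, z.

    The function name, column names, and label format are illustrative choices.
    """
    parser = PDBParser(QUIET=True)  # QUIET=True suppresses parser warnings
    structure = parser.get_structure("protein", pdb_file)
    chain = structure[0][chain_id]  # first model, requested chain

    rows = []
    for residue in chain:
        # Skip waters and ligands (non-blank hetero flag) and residues
        # that have no C-alpha atom.
        if residue.id[0] != " " or "CA" not in residue:
            continue
        x, y, z = residue["CA"].coord
        rows.append({
            # Combine residue name and sequence number, e.g. "MET1".
            "node": f"{residue.get_resname()}{residue.id[1]}",
            "x": float(x),
            "y": float(y),
            "z": float(z),
        })
    return pd.DataFrame(rows, columns=["node", "x", "y", "z"])
```

Calling ca_nodes with the path stored in pdb_file returns a data frame with one row per chain A residue, which can then serve as the node table for the network.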
Creating PSNs Using the Residue Interaction Network Generator
The Residue Interaction Network Generator (RING) is an online server that transforms protein structures into network representations. As mentioned earlier, residues are treated as nodes and interactions between them as edges. Generally, an interaction is interpreted in terms of proximity, i.e., Euclidean distance. However, other types of interactions are included, such as hydrogen bonds, salt bridges (ionic bonds), π-π stacking, and van der Waals interactions. RING helps visualize and quantify the topological, or structural, features that emerge from the residue-residue interaction network. Quantifying these structural features allows researchers to ask questions about functional hot spots, potential allosteric sites, and signaling pathways, which may advance our understanding of protein dynamics and contribute to computational drug repurposing.
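To make the proximity idea concrete, here is a minimal sketch that links two residues whenever their C-alpha atoms lie within a cutoff distance. The function name proximity_edges, the 8 Å cutoff, and the toy coordinates are assumptions chosen for illustration; RING applies its own interaction-specific criteria, which this sketch does not reproduce.

```python
from itertools import combinations
from math import dist


def proximity_edges(coords, cutoff=8.0):
    """Return (node_a, node_b, distance) for every pair within `cutoff` angstroms.

    `coords` maps a node label to its (x, y, z) position.
    The default cutoff is an assumed value, not a RING parameter.
    """
    edges = []
    for (a, pos_a), (b, pos_b) in combinations(coords.items(), 2):
        d = dist(pos_a, pos_b)  # Euclidean distance between the two points
        if d <= cutoff:
            edges.append((a, b, round(d, 3)))
    return edges


# Toy coordinates, invented for illustration only.
toy = {
    "MET1": (0.0, 0.0, 0.0),
    "GLY2": (3.8, 0.0, 0.0),
    "ALA3": (20.0, 0.0, 0.0),
}
edges = proximity_edges(toy)
print(edges)
```

Only the MET1 and GLY2 pair falls inside the cutoff here, so the toy network has a single edge.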
There are other methods for generating PSNs, that is, networks of residue-residue interactions. However, the RING server has been peer-reviewed and provides detailed documentation on how edges are calculated and what defines a connection. Below is a screenshot of a typical configuration I use for generating PSNs. I generally select the parameters that I think maximize edge inclusion. The RING server allows you to either retrieve a structure file from the PDB archive or upload a local file, which is what I have done in this case.
Once the server is finished with its computations, you’ll see an output similar to the screenshot below. Rather than going through the details of the results here, I encourage readers to explore the RING server and become familiar with its output by simply tinkering around. Three files are generated for download: a .cif_ringNodes, a .cif_ringEdges, and a .json file, which contain everything needed to build either a 2D or 3D network. The entire 3D network, including x, y, z coordinates, is contained in the .json file. In a separate post, I will demonstrate how to read the .json file and plot the 3D network using Plotly. Again, the reason I extract coordinates from the PDB file, rather than using the coordinates available in the .json file, is to ensure that the edges between residues map to the C-alpha atoms. It is a convention that structural biologists easily recognize and understand.
Next, we will import the .cif_ringEdges file downloaded from the RING server into a Pandas DataFrame, and then merge the residue-residue interactions (edges) with the C-alpha atom coordinates from the PDB file.
This should produce a data frame with ‘source’ and ‘target’ node columns, followed by the corresponding x, y, z coordinates for both the ‘source’ and ‘target’ nodes, similar to the example shown below.
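The import-and-merge step can be sketched as below. It assumes the edge file is tab-separated, has NodeId1 and NodeId2 columns, and writes node ids in the form chain:number:insertion:residue (for example A:300:_:ASP); check your own download before relying on this. The helper names, the output column names, and the toy data are hypothetical.

```python
import io

import pandas as pd


def node_label(ring_id):
    """Turn an id such as 'A:300:_:ASP' into a label such as 'ASP300'.

    The id layout is an assumption; chain and insertion code are ignored.
    """
    _chain, number, _insertion, resname = ring_id.split(":")
    return f"{resname}{number}"


def edges_with_coords(edges_source, nodes):
    """Read a tab-separated edge table and attach coordinates to both ends."""
    raw = pd.read_csv(edges_source, sep="\t")
    # Several interaction types can link the same pair; keep one row per pair.
    edges = pd.DataFrame({
        "source": raw["NodeId1"].map(node_label),
        "target": raw["NodeId2"].map(node_label),
    }).drop_duplicates()
    for end in ("source", "target"):
        renamed = nodes.rename(columns={
            "node": end, "x": f"x_{end}", "y": f"y_{end}", "z": f"z_{end}",
        })
        # Inner merge drops edges whose endpoint has no C-alpha coordinates.
        edges = edges.merge(renamed, on=end, how="inner")
    return edges


# Toy edge table and node table, invented for illustration only.
toy_edges = io.StringIO(
    "NodeId1\tInteraction\tNodeId2\tDistance\n"
    "A:1:_:MET\tHBOND:MC_MC\tA:2:_:GLY\t2.9\n"
    "A:1:_:MET\tVDW:SC_SC\tA:2:_:GLY\t3.5\n"
    "A:2:_:GLY\tVDW:MC_MC\tA:3:_:ALA\t3.7\n"
)
toy_nodes = pd.DataFrame({
    "node": ["MET1", "GLY2", "ALA3"],
    "x": [0.0, 3.8, 7.6],
    "y": [0.0, 0.0, 0.0],
    "z": [0.0, 0.0, 0.0],
})
edge_coords = edges_with_coords(toy_edges, toy_nodes)
print(edge_coords)
```

With a real download, the path to the .cif_ringEdges file would replace toy_edges, and the C-alpha table extracted from the PDB file would replace toy_nodes.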
Lastly, with the Plotly and NetworkX libraries, we can create a script to generate an interactive 3D network visualization.
Observe that the code creates a NetworkX graph object from the edgelist_7tt8_coords data frame. Please note that the graph object isn’t necessary to create the 3D network visualization. It is included for a future post, where the graph object will be used to calculate various measures of centrality that will be mapped to the network’s visual features. The data frame is parsed using standard Python operations. Coordinates for each residue are extracted, and duplicate nodes are removed. Each residue is linked to a text marker in the 3D plot that displays its residue name and sequence position. Hover labels are also assigned, but note that the label information is redundant; it was left as a placeholder. In a future post I will demonstrate how the hover label can be used to annotate the network with other information, such as centrality score, evolutionary conservation score, or links to other databases. The Plotly figure is easily customizable through the figure title, axis grids, and node and edge properties. The result is an interactive 3D network that allows users to explore the relationships between residues in any PSN. Images of the 7TT8 PSN are displayed below.
There’s a lot more we can do with this figure. We can enhance it by adding widgets that dynamically resize nodes based on different centrality measures, or include biological and analytical annotations in the hover information. I’ll explore these enhancements in a future post. You can find the Jupyter Notebook for this exercise on GitHub. If you have any questions, feel free to contact me at LastCodeBender42@gmail.com.
Unless otherwise noted, all images are created by the author.