Journal of Molecular Biology
Inference of Macromolecular Assemblies from Crystalline State
Introduction
Macromolecular assemblies are complexes of more than one polypeptide and/or nucleotide chain that are stable in the native environment. The way in which the chains assemble represents the protein quaternary structure (PQS). Often (but not always), an assembly is the biological unit that performs a certain physiological function by facilitating respective biochemical processes. The functionality of many, if not most, proteins is dependent on the context of a macromolecular assembly. A simple example is given by the two-gene product, hemoglobin.1 This protein complex, made of four polypeptide chains, is responsible for oxygen transport in the body, while no functional significance may be assigned to the isolated chains. Other important classes of macromolecular assemblies include holoenzymes, ion channels, DNA polymerase, microtubules, nucleosomes, virions, and many others.2
The physiological function of macromolecular complexes is known to be closely related to their 3D structure. While various techniques (e.g., light-scattering,3 X-ray and neutron scattering,4 mass spectrometry5) have been developed to study different properties of macromolecular assemblies, such as molecular weight, accessible surface area, chemical composition, and others, inference on the 3D structure is difficult in such experimental studies. Certain conclusions about the shape of an assembly may be derived from mobility and mass measurements,3 as well as from small-angle scattering experiments.6 Electron microscopy (EM) is applicable to studying large complexes, but it offers only low-resolution images. About 20% of structures in the Protein Data Bank (PDB7) were obtained using the NMR technique,8 which is capable of yielding atomic coordinates of macromolecular complexes in solution. However, this method is limited in the size of the objects it can study and is hardly applicable to medium and large assemblies. Besides, macromolecular complexes often exist in dynamic equilibrium, which further complicates interpretation of experimental results.
More than 80% of PDB entries were obtained by means of X-ray diffraction on macromolecular crystals.9 It is reasonable to expect that stable macromolecular complexes do not change during crystallization and therefore they should be identifiable in crystal packing. By convention, a PDB entry contains only the atomic coordinates for the asymmetric unit (ASU) of a crystal. The ASU is defined as the smallest unit that can be rotated and translated to generate one unit cell, using only the symmetry operators allowed by the crystallographic symmetry. Generally speaking, the ASU may be chosen in many different ways, any of which may be acceptable for PDB deposition provided that it contains the crystallographically unique covalently linked structure(s). However, macromolecular complexes, as a rule, are linked by weaker, non-covalent interactions, and often possess crystallographic symmetry. As a result, a macromolecular complex may be made of a single ASU or several ASUs, or of several parts of neighboring ASUs, or several complexes may be contained in a single ASU. The lack of a direct relationship between the ASU and the macromolecular complex poses considerable difficulties for the identification of the latter in crystal packing in a universal manner.
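The relationship between the ASU and the unit cell can be illustrated with a minimal Python sketch. It assumes, purely for illustration, the two symmetry operators of space group P2₁ (unique axis b) acting on fractional coordinates; the atom positions and the names `P21_OPERATORS` and `expand_asu` are hypothetical and do not come from the paper.

```python
import numpy as np

# Assumption for illustration: the two symmetry operators of space group P2_1
# (unique axis b), each given as (rotation matrix, translation) acting on
# fractional coordinates: (x, y, z) and (-x, y + 1/2, -z).
P21_OPERATORS = [
    (np.eye(3), np.zeros(3)),
    (np.diag([-1.0, 1.0, -1.0]), np.array([0.0, 0.5, 0.0])),
]


def expand_asu(frac_coords, operators):
    """Return one symmetry copy of the ASU per operator (fractional coordinates)."""
    frac_coords = np.asarray(frac_coords, dtype=float)
    copies = []
    for rotation, translation in operators:
        moved = frac_coords @ rotation.T + translation
        # Wrap every atom into the reference unit cell. Note that wrapping
        # atom by atom can split a molecule across a cell boundary.
        copies.append(moved % 1.0)
    return copies


asu = [[0.10, 0.20, 0.30], [0.15, 0.25, 0.35]]  # made-up atom positions
unit_cell = expand_asu(asu, P21_OPERATORS)
```

Nothing in such an expansion indicates which symmetry copies belong to the same complex, which is why the ASU alone does not reveal the assembly.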
Inference of macromolecular assemblies from crystalline state is often seen as a bioinformatical problem. In the framework of informatics-based approaches, macromolecular interfaces, found in crystals, are classified into “biologically relevant” and “insignificant” (crystal packing) ones according to a certain scoring system (cf. e.g., Ponstingl et al.10). The score may depend on the interface area, residue/atom composition and contacts, hydropathy index, charge distribution, topological complementarity, and other parameters. Disengagement of “insignificant” interfaces breaks the crystal apart, hypothetically leaving monomeric chains assembled by “significant” interfaces into biological units. This idea has found two different technical implementations. The first one was the Protein Quaternary Structure (PQS) server at the Macromolecular Structure Database group of the European Bioinformatics Institute (EBI-MSD),11 which builds assemblies by progressive addition of suitable chain contacts. Another approach is represented by PITA (Protein InTerfaces and Assemblies) software,12 which starts with the largest complex allowed by crystal symmetry and then iteratively splits it by bisectioning until a chosen threshold score is achieved. The interface scores in PITA were calibrated in the course of an exhaustive study on statistical discrimination between crystal contacts formed by homodimeric and monomeric proteins.10
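The general idea of disengaging "insignificant" interfaces can be sketched as a graph problem: chains are nodes, interfaces scoring at or above a threshold are edges, and the remaining connected groups are the candidate assemblies. The Python sketch below is neither the progressive-addition procedure of PQS nor the bisection procedure of PITA. The function name `assemble`, the scores, and the threshold are hypothetical, and a finite list of chains stands in for the infinite crystal lattice.

```python
from collections import defaultdict


def assemble(chains, interfaces, threshold):
    """Group chains connected by interfaces whose score reaches the threshold.

    interfaces: iterable of (chain_a, chain_b, score). The scores and the
    threshold are hypothetical stand-ins for whatever scoring system is used.
    """
    parent = {c: c for c in chains}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b, score in interfaces:
        if score >= threshold:  # "significant" interface: keep the link
            parent[find(a)] = find(b)
        # otherwise the interface is treated as crystal packing and dropped

    groups = defaultdict(list)
    for c in chains:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())


chains = ["A", "B", "C", "D"]
interfaces = [("A", "B", 0.9), ("B", "C", 0.2), ("C", "D", 0.8)]
assemblies = assemble(chains, interfaces, threshold=0.5)
```

With these made-up scores, the weak B–C contact is dropped and the four chains fall into the two groups A–B and C–D. The outcome depends entirely on per-interface scores, which is the limitation discussed next.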
There are, however, grounds to believe that interface properties alone are not indicative enough for unambiguous discrimination between relevant interfaces and artifacts of crystal packing. Indeed, if the binding energy of a particular interface is sufficient for dimerization of given macromolecules, it does not necessarily mean that an identical interface will bind a pair of considerably heavier objects. This was implicitly confirmed in a detailed study of interface properties reported in Jones and Thornton.13 It has been concluded that no ultimate discriminating parameters for the identification of biologically relevant protein interfaces may be proposed even in the simplest case of dimeric complexes and that assessment of interface biological significance should take assembly type into account. Many other attempts to assess the significance of protein interfaces have been performed,14–26 but no universal criteria were found. A few databases of protein–protein interactions and interfaces, derived from PDB, have been developed25–28 in an attempt to provide a systematic view on the factors that are responsible for macromolecular binding. One can expect that such databases and statistical analysis of different interface properties may be useful for the identification of transient interactions, which are extremely specific to the topology and chemical composition of binding sites. However, formation of stable complexes involves an interplay between affinity and entropy change and therefore it may be (and in fact it has been found to be, as shown below) less dependent on the interface characteristic features.
In this study, we discuss physical–chemical principles of macromolecular complexation and assess complex stability from the positions of chemical thermodynamics. We show that the binding energy of an interface is not a sole function of interface properties but also depends on the complex size and shape. We also show that entropy change due to complex formation is a major factor that, along with binding properties, determines complex size and symmetry. Next, we find that available theoretical approaches to the calculation of binding energy and entropy of dissociation have sufficient accuracy for the correct identification of macromolecular assemblies in crystals in 80–90% of instances.
Macromolecular Complexes in Solutions
Complexes differ from molecules in that, typically, their subunits do not make strong (covalent) chemical bonds. Rather, formation of complexes is only partly due to immediate (contact-dependent and electrostatic) interactions between the subunits. The dominant factors determining complex size and geometry are due to interaction with the solvent; therefore, the existence and stability of complexes cannot be considered outside the solvent context. Because of the relatively weak binding of their subunits,
Identification of Macromolecular Assemblies in Crystals
The theoretical analysis presented in Macromolecular Complexes in Solutions can answer the question of whether a particular macromolecular assembly is stable, that is, whether it may appear in solution in noticeable concentration in comparison with that of its subunits. Our next goal is to identify stable complexes in macromolecular crystals and to suggest a scoring of their chances to be biological units.
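The stability criterion can be illustrated with a short Python sketch. It assumes that the free energy of dissociation takes the form ΔG_diss = −ΔG_int − TΔS and that an assembly counts as stable when ΔG_diss > 0. This form is not stated in the excerpt, so it should be read as an assumption, as should the function names and the numerical inputs, which are invented.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def dissociation_free_energy(dg_int, t_ds):
    """Assumed form: dG_diss = -dG_int - T*dS, all terms in kcal/mol."""
    return -dg_int - t_ds


def is_stable(dg_int, t_ds):
    """Treat an assembly as stable when dissociation costs free energy."""
    return dissociation_free_energy(dg_int, t_ds) > 0.0


def dissociation_constant(dg_diss, temperature=298.15):
    """Standard-state equilibrium constant K = exp(-dG_diss / RT)."""
    return math.exp(-dg_diss / (R_KCAL * temperature))


# Invented numbers: binding energy of -25 kcal/mol against an entropic
# term T*dS of 12 kcal/mol gained on dissociation.
dg_diss = dissociation_free_energy(-25.0, 12.0)
stable = is_stable(-25.0, 12.0)
```

In this picture, a favorable binding energy alone is not sufficient: the entropy gained on dissociation must be outweighed before a complex is expected at noticeable concentration.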
Results and Discussion
Table 1 shows empirical parameters of theoretical models for binding energy ΔGint (Eq. (9)) and entropy of dissociation ΔS (Eq. (17)), obtained by the fitting procedure described in Implementation and calibration of parameters. As found, the system of inequalities (22) remains underfit for the data sets used, which means that the maximal number of satisfied inequalities is achieved by many different sets of fitted parameters. This implies that the data sets used may be insufficient for the
Conclusion
Here, we have described the theoretical background of a new approach to the identification of macromolecular complexes in crystals. We have also introduced the new publicly available EBI-MSD service PISA, which implements the method. The software provides a single-button analysis of X-ray-resolved structures, including the assessment of macromolecular interfaces, presence of thermodynamically stable complexes, and their probable dissociation patterns. Structure solution by means of X-ray
Protein data bank accession codes
The following PDB entries were used in order to calibrate the protein–protein interaction parameters of Eqs. (1), (9) and (17): 1a19, 1a6q, 1a8o, 1afk, 1ah7, 1ako, 1amj, 1aoh, 1aua, 1aun, 1ayi, 1ayl, 1bc2, 1be0, 1bea, 1bkz, 1bp1, 1bry, 1bwz, 1cki, 1ckm, 1ctj, 1dff, 1djx, 1dmr, 1esf, 1fdr, 1feh, 1fsu, 1iae, 1ips, 1kpt, 1kwa, 1lrv, 1mdt, 1mh1, 1mpg, 1np4, 1pda, 1pgs, 1pmi, 1ppo, 1ps1, 1rhs, 1ton, 1xgs, 232l, 2abx, 2acy, 2atj, 2bls, 2hex, 2ihl, 2mbr, 3cms, 1a3c, 1ad3, 1af5, 1afw, 1ajs, 1alk, 1hlr,
Acknowledgements
The authors would like to thank Prof. Joel Janin for reading the manuscript and helpful discussion of vibrational entropy effects. E.K. is supported by the research grant No. 721/B19544 from the Biotechnology and Biological Sciences Research Council (BBSRC) UK.
References (121)
- et al.
The crystal structure of human deoxyhaemoglobin at 1.74 Å resolution
J. Mol. Biol.
(1984)
- et al.
Advances in structure analysis using small-angle scattering in solution
Curr. Opin. Struct. Biol.
(2002)
- et al.
PQS: a protein quaternary structure file server
Trends Biochem. Sci.
(1998)
- et al.
The structure of protein–protein recognition sites
J. Biol. Chem.
(1990)
- et al.
Protein–protein interactions: a review of protein dimer structures
Prog. Biophys. Mol. Biol.
(1995)
- et al.
Understanding nature's catalytic toolkit
Trends Biochem. Sci.
(2005)
- et al.
Using a neural network and spatial clustering to predict the location of active sites in enzymes
J. Mol. Biol.
(2003)
- et al.
Dictionary of interfaces in proteins (DIP). Data bank of complementary molecular surface patches
J. Mol. Biol.
(1998)
- et al.
Hydrogen bonding in globular proteins
Prog. Biophys. Mol. Biol.
(1984)
- et al.
Surface, subunit interfaces and interior of oligomeric proteins
J. Mol. Biol.
(1988)
Conformation and hydrogen ion titration of proteins: a continuum electrostatic model with conformational flexibility
Biophys. J.
Satisfying hydrogen bonding potential in proteins
J. Mol. Biol.
Strength and co-operativity of contributions of surface salt bridges to protein stability
J. Mol. Biol.
A new method for computing the macromolecular electric potential
J. Mol. Biol.
Boundary element solution of macromolecular electrostatics: interaction energy between two proteins
Biophys. J.
Rapid refinement of protein interfaces incorporating solvation: application to the docking problem
J. Mol. Biol.
The entropic cost of protein–protein association: a case study on Acetylcholinesterase binding to Fasciculin-2
Biophys. J.
Structural basis of inhibition of the human NAD+-dependent deacetylase SIRT5 by Suramin
Structure
Unusual molecular architecture of the Yersinia pestis Cytotoxin YopM: a leucine-rich repeat protein with the shortest repeating unit
J. Mol. Biol.
The bacteriophage T4 DNA injection machine
Curr. Opin. Struct. Biol.
Crystal structure of carboxypeptidase G2, a bacterial enzyme with applications in cancer therapy
Structure
Structure of human chorionic gonadotropin at 2.6 Å resolution from MAD analysis of the selenomethionyl protein
Structure
Rat submaxillary gland serine protease, tonin structure solution and refinement at 1.8 Å resolution
J. Mol. Biol.
Selectivity and promiscuity in Eph receptors
Structure
Structural adaptations in the specialized bacteriophage T4 co-chaperonin Gp31 expand the size of the Anfinsen cage
Cell
Biochemistry
Light scattering by proteins
Structure Analysis by Small Angle X-ray and Neutron Scattering
Principles and Practice of Biological Mass Spectrometry
The Protein Data Bank
Nucleic Acids Res.
Protein NMR Spectroscopy, Principles and Practice
Protein Crystallography
Discriminating between homodimeric and monomeric proteins in the crystalline state
Proteins: Struct. Funct. Genet.
Automatic inference of protein quaternary structure from crystals
J. Appl. Crystallogr.
Principles of protein–protein interactions
Proc. Natl Acad. Sci. USA
An investigation of protein subunit and domain interfaces
Protein Eng.
The structure of interfaces between subunits of dimeric and tetrameric proteins
Protein Eng.
On the nature of antibody combining sites: unusual structural features that may confer on these sites an enhanced capacity for binding ligands
Proteins: Struct. Funct. Genet.
A method for localizing ligand binding pockets in protein structures
Proteins: Struct. Funct. Genet.
Protein-protein interfaces: architectures and interactions in protein–protein interfaces and in protein cores. Their similarities and differences
Crit. Rev. Biochem. Mol. Biol.
The atomic structure of protein–protein recognition sites
J. Mol. Biol.
Dissecting protein–protein recognition sites
Proteins: Struct. Funct. Genet.
A new, structurally nonredundant, diverse data set of protein interfaces and its implications
Protein Sci.
PRISM: protein interactions by structural matching
Nucleic Acids Res.
A protein domain interaction interface database: InterPare
BMC Bioinformatics
Physical Chemistry
Calculation of the free energy of association for protein complexes
Protein Sci.
Protein–protein interaction at crystal contacts
Proteins: Struct. Funct. Genet.
Theory of hydrophobic bonding. II. The correlation of hydrocarbon solubility in water with solvent cavity surface area
J. Phys. Chem.