Journal of Molecular Biology
Volume 372, Issue 3, 21 September 2007, Pages 774-797

Inference of Macromolecular Assemblies from Crystalline State

https://doi.org/10.1016/j.jmb.2007.05.022

Abstract

We discuss basic physical–chemical principles underlying the formation of stable macromolecular complexes, which in many cases are likely to be the biological units performing a certain physiological function. We also consider available theoretical approaches to the calculation of macromolecular affinity and entropy of complexation. The latter is shown to play an important role and to have a major effect on complex size and symmetry. We develop a new method, based on chemical thermodynamics, for automatic detection of macromolecular assemblies in the Protein Data Bank (PDB) entries that are the results of X-ray diffraction experiments. We find that biological units may be recovered with an 80–90% success rate, which makes X-ray crystallography an important source of experimental data on macromolecular complexes and protein–protein interactions. The method is implemented as a public WWW service.

Introduction

Macromolecular assemblies are complexes of more than one polypeptide and/or nucleotide chain that are stable in the native environment. The way in which the chains assemble represents the protein quaternary structure (PQS). Often (but not always), an assembly is the biological unit that performs a certain physiological function by facilitating the respective biochemical processes. The functionality of many, if not most, proteins depends on the context of a macromolecular assembly. A simple example is given by the two-gene product, hemoglobin.1 This protein complex, made of four polypeptide chains, is responsible for oxygen transport in the body, while no functional significance may be assigned to the isolated chains. Other important classes of macromolecular assemblies include holoenzymes, ion channels, DNA polymerase, microtubules, nucleosomes, virions, and many others.2

The physiological function of macromolecular complexes is known to be closely related to their 3D structure. While various techniques (e.g., light scattering,3 X-ray and neutron scattering,4 mass spectrometry5) have been developed to study different properties of macromolecular assemblies, such as molecular weight, accessible surface area, chemical composition, and others, inference of the 3D structure is difficult in such experimental studies. Certain conclusions about the shape of an assembly may be derived from mobility and mass measurements,3 as well as from small-angle scattering experiments.6 Electron microscopy (EM) is applicable to studying large complexes, but it offers only low-resolution images. About 20% of structures in the Protein Data Bank (PDB7) were obtained using NMR techniques,8 which are capable of yielding atomic coordinates of macromolecular complexes in solution. However, this method is limited in the size of the objects it can study and is hardly applicable to medium and large assemblies. Besides, macromolecular complexes often exist in dynamic equilibrium, which further complicates the interpretation of experimental results.

More than 80% of PDB entries were obtained by means of X-ray diffraction on macromolecular crystals.9 It is reasonable to expect that stable macromolecular complexes do not change during crystallization and therefore should be identifiable in crystal packing. By convention, a PDB entry contains only the atomic coordinates for the asymmetric unit (ASU) of a crystal. The ASU is defined as the smallest unit that can be rotated and translated to generate one unit cell, using only the symmetry operators allowed by the crystallographic symmetry. Generally speaking, the ASU may be chosen in many different ways, and any choice that contains the crystallographically unique covalently linked structure(s) may be acceptable for PDB deposition. However, macromolecular complexes, as a rule, are linked by weaker, non-covalent interactions, and often possess crystallographic symmetry. As a result, a macromolecular complex may be made of a single ASU or several ASUs, or of several parts of neighboring ASUs, or several complexes may be contained in a single ASU. The lack of a direct relationship between the ASU and the macromolecular complex poses considerable difficulties for the identification of the latter in crystal packing in a universal manner.
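The relationship between the ASU and the unit cell described above can be illustrated with a minimal Python sketch. It assumes fractional coordinates and uses the two general-position operators of space group P2₁ (unique axis b) as an example; the function name `expand_asu` and the atom coordinates are hypothetical and are not taken from the paper.

```python
import numpy as np

def expand_asu(frac_coords, symops):
    """Apply each (rotation, translation) operator to the fractional
    coordinates of the ASU and wrap every atom back into the cell [0, 1)."""
    copies = []
    for rot, trans in symops:
        moved = frac_coords @ np.asarray(rot).T + np.asarray(trans)
        copies.append(moved % 1.0)  # per-atom wrapping into one unit cell
    return np.vstack(copies)

# General positions of P2_1 (unique axis b): (x, y, z) and (-x, y + 1/2, -z)
P21_OPS = [
    (np.eye(3), np.zeros(3)),
    (np.diag([-1.0, 1.0, -1.0]), np.array([0.0, 0.5, 0.0])),
]

# Two hypothetical atoms in fractional coordinates
asu = np.array([[0.10, 0.20, 0.30],
                [0.15, 0.25, 0.35]])
cell = expand_asu(asu, P21_OPS)
print(cell.shape)
```

Wrapping each atom individually fills one unit cell but may cut a molecule across cell boundaries, which is one reason why the content of a cell or of an ASU need not coincide with a complete complex.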

Inference of macromolecular assemblies from crystalline state is often seen as a bioinformatics problem. In the framework of informatics-based approaches, macromolecular interfaces found in crystals are classified into “biologically relevant” and “insignificant” (crystal packing) ones according to a certain scoring system (see, e.g., Ponstingl et al.10). The score may depend on the interface area, residue/atom composition and contacts, hydropathy index, charge distribution, topological complementarity, and other parameters. Disengagement of “insignificant” interfaces breaks the crystal apart, hypothetically leaving monomeric chains assembled by “significant” interfaces into biological units. This idea has found two different technical implementations. The first one was the Protein Quaternary Structure (PQS) server at the Macromolecular Structure Database group of the European Bioinformatics Institute (EBI-MSD),11 which builds assemblies by progressive addition of suitable chain contacts. Another approach is represented by the PITA (Protein InTerfaces and Assemblies) software,12 which starts with the largest complex allowed by crystal symmetry and then iteratively splits it by bisection until a chosen threshold score is achieved. The interface scores in PITA were calibrated in the course of an exhaustive study on statistical discrimination between crystal contacts formed by homodimeric and monomeric proteins.10
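The general idea of disengaging low-scoring interfaces can be sketched as a graph problem: chains are nodes, interfaces are scored edges, and the groups that remain connected after discarding edges below a threshold are the candidate assemblies. The following Python sketch is a simplified illustration only and does not reproduce the PQS or PITA algorithms; the function name, scores, and threshold are hypothetical, and the infinite lattice of symmetry copies in a real crystal is ignored.

```python
from collections import defaultdict

def assemblies_from_interfaces(chains, interfaces, threshold):
    """Group chains linked by interfaces scoring at or above the threshold."""
    graph = defaultdict(set)
    for a, b, score in interfaces:
        if score >= threshold:  # keep only "significant" interfaces
            graph[a].add(b)
            graph[b].add(a)

    seen, groups = set(), []
    for chain in chains:
        if chain in seen:
            continue
        # Depth-first traversal collects one connected component
        stack, group = [chain], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups

# Hypothetical chains and interface scores
chains = ["A", "B", "C", "D"]
interfaces = [("A", "B", 0.9), ("B", "C", 0.2), ("C", "D", 0.8)]
result = assemblies_from_interfaces(chains, interfaces, threshold=0.5)
print(result)
```

In this toy input the weak B–C contact is discarded, so the four chains fall into two dimers. The outcome depends entirely on the scoring function and threshold, which is the limitation discussed in the next paragraph.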

There are, however, grounds to believe that interface properties alone are not indicative enough for unambiguous discrimination between relevant interfaces and artifacts of crystal packing. Indeed, if the binding energy of a particular interface is sufficient for dimerization of given macromolecules, it does not necessarily mean that an identical interface will bind a pair of considerably heavier objects. This was implicitly confirmed in a detailed study of interface properties reported by Jones and Thornton.13 It was concluded that no ultimate discriminating parameters for the identification of biologically relevant protein interfaces may be proposed even in the simplest case of dimeric complexes, and that assessment of the biological significance of an interface should take assembly type into account. Many other attempts to assess the significance of protein interfaces have been made,14–26 but no universal criteria were found. A few databases of protein–protein interactions and interfaces, derived from the PDB, have been developed25–28 in an attempt to provide a systematic view of the factors that are responsible for macromolecular binding. One can expect that such databases and statistical analysis of different interface properties may be useful for the identification of transient interactions, which are extremely specific to the topology and chemical composition of binding sites. However, formation of stable complexes involves an interplay between affinity and entropy change and therefore it may be (and in fact it has been found to be, as shown below) less dependent on the characteristic features of the interface.

In this study, we discuss physical–chemical principles of macromolecular complexation and assess complex stability from the standpoint of chemical thermodynamics. We show that the binding energy of an interface is not solely a function of interface properties but also depends on the complex size and shape. We also show that entropy change due to complex formation is a major factor that, along with binding properties, determines complex size and symmetry. Next, we find that available theoretical approaches to the calculation of binding energy and entropy of dissociation have sufficient accuracy for the correct identification of macromolecular assemblies in crystals in 80–90% of instances.

Macromolecular Complexes in Solutions

Complexes differ from molecules in that, typically, their subunits do not make strong (covalent) chemical bonds. Rather, formation of complexes is only partly due to immediate (contact-dependent and electrostatic) interactions between the subunits. The dominant factors determining complex size and geometry are due to interaction with the solvent; therefore, the existence and stability of complexes cannot be considered outside the solvent context. Because of the relatively weak binding of their subunits,

Identification of Macromolecular Assemblies in Crystals

The theoretical analysis presented in Macromolecular Complexes in Solutions can answer the question of whether a particular macromolecular assembly is stable, that is, whether it may appear in solution at a noticeable concentration in comparison with that of its subunits. Our next goal is to identify stable complexes in macromolecular crystals and to score their chances of being biological units.
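The notion of a complex appearing "at a noticeable concentration" can be made concrete with standard equilibrium thermodynamics. The Python sketch below is an illustration under stated assumptions and is not the paper's model: it considers a simple A + B ⇌ AB equilibrium with equal total concentrations of A and B, a 1 M standard state, and a standard free energy of dissociation (here called `dg_diss`, in kJ/mol) that is taken as positive when the complex is favored. All names and numbers are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

def dissociation_constant(dg_diss, temperature=298.15):
    """K_d in mol/L (1 M standard state) from the standard free energy
    of dissociation given in kJ/mol."""
    return math.exp(-dg_diss / (R * temperature))

def fraction_bound(dg_diss, total_conc, temperature=298.15):
    """Fraction of subunits found in the AB complex for A + B <-> AB,
    with equal total concentrations of A and B (mol/L)."""
    kd = dissociation_constant(dg_diss, temperature)
    s = 2.0 * total_conc + kd
    # Smaller root of x^2 - s*x + c^2 = 0, written in a form that
    # avoids cancellation when kd is much larger than total_conc
    complex_conc = 2.0 * total_conc ** 2 / (s + math.sqrt(s * s - 4.0 * total_conc ** 2))
    return complex_conc / total_conc

# Hypothetical example: two free energies at 1 micromolar total concentration
for dg in (40.0, -10.0):
    print(dg, fraction_bound(dg, 1e-6))
```

Under these assumptions, a sufficiently positive free energy of dissociation keeps most subunits in the complex even at micromolar concentrations, whereas a negative value leaves the complex essentially unpopulated.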

Results and Discussion

Table 1 shows the empirical parameters of the theoretical models for binding energy ΔGint (Eq. (9)) and entropy of dissociation ΔS (Eq. (17)), obtained by the fitting procedure described in Implementation and calibration of parameters. As found, the system of inequalities (22) remains underfit for the data sets used, which means that the maximal number of satisfied inequalities is achieved by many different sets of fitted parameters. This implies that the data sets used may be insufficient for the

Conclusion

Here, we have described the theoretical background of a new approach to the identification of macromolecular complexes in crystals. We have also introduced the new publicly available EBI-MSD service PISA, which implements the method. The software provides a single-button analysis of X-ray-resolved structures, including the assessment of macromolecular interfaces, presence of thermodynamically stable complexes, and their probable dissociation patterns. Structure solution by means of X-ray

Protein data bank accession codes

The following PDB entries were used in order to calibrate the protein–protein interaction parameters of Eqs. (1), (9) and (17): 1a19, 1a6q, 1a8o, 1afk, 1ah7, 1ako, 1amj, 1aoh, 1aua, 1aun, 1ayi, 1ayl, 1bc2, 1be0, 1bea, 1bkz, 1bp1, 1bry, 1bwz, 1cki, 1ckm, 1ctj, 1dff, 1djx, 1dmr, 1esf, 1fdr, 1feh, 1fsu, 1iae, 1ips, 1kpt, 1kwa, 1lrv, 1mdt, 1mh1, 1mpg, 1np4, 1pda, 1pgs, 1pmi, 1ppo, 1ps1, 1rhs, 1ton, 1xgs, 232l, 2abx, 2acy, 2atj, 2bls, 2hex, 2ihl, 2mbr, 3cms, 1a3c, 1ad3, 1af5, 1afw, 1ajs, 1alk, 1hlr,

Acknowledgements

The authors would like to thank Prof. Joel Janin for reading the manuscript and helpful discussion of vibrational entropy effects. E.K. is supported by the research grant No. 721/B19544 from the Biotechnology and Biological Sciences Research Council (BBSRC) UK.

References (121)

  • T.J. You et al.

    Conformation and hydrogen ion titration of proteins: a continuum electrostatic model with conformational flexibility

    Biophys. J.

    (1995)
  • I. McDonald et al.

    Satisfying hydrogen bonding potential in proteins

    J. Mol. Biol.

    (1994)
  • A. Horovitz et al.

    Strength and co-operativity of contributions of surface salt bridges to protein stability

    J. Mol. Biol.

    (1990)
  • R.J. Zauhar et al.

    A new method for computing the macromolecular electric potential

    J. Mol. Biol.

    (1985)
  • H.-X. Zhou

    Boundary element solution of macromolecular electrostatics: interaction energy between two proteins

    Biophys. J.

    (1993)
  • R.M. Jackson et al.

    Rapid refinement of protein interfaces incorporating solvation: application to the docking problem

    J. Mol. Biol.

    (1998)
  • D.D.L. Minh et al.

    The entropic cost of protein–protein association: a case study on Acetylcholinesterase binding to Fasciculin-2

    Biophys. J.

    (2005)
  • A. Schuetz et al.

    Structural basis of inhibition of the human NAD+-dependent deacetylase SIRT5 by Suramin

    Structure

    (2007)
  • A.G. Evdokimov et al.

    Unusual molecular architecture of the Yersinia pestis Cytotoxin YopM: a leucine-rich repeat protein with the shortest repeating unit

    J. Mol. Biol.

    (2001)
  • M.G. Rossmann et al.

    The bacteriophage T4 DNA injection machine

    Curr. Opin. Struct. Biol.

    (2004)
  • S. Rowsell et al.

    Crystal structure of carboxypeptidase G2, a bacterial enzyme with applications in cancer therapy

    Structure

    (1997)
  • H. Wu et al.

    Structure of human chorionic gonadotropin at 2.6 Å resolution from MAD analysis of the selenomethionyl protein

    Structure

    (1994)
  • M. Fujinaga et al.

    Rat submaxillary gland serine protease, tonin structure solution and refinement at 1.8 Å resolution

    J. Mol. Biol.

    (1987)
  • A.-P.G. Haramis et al.

    Selectivity and promiscuity in Eph receptors

    Structure

    (2006)
  • J.F. Hunt et al.

    Structural adaptations in the specialized bacteriophage T4 co-chaperonin Gp31 expand the size of the Anfinsen cage

    Cell

    (1997)
  • J.M. Berg et al.

    Biochemistry

    (2002)
  • T. Liu et al.

    Light scattering by proteins

  • L.A. Feigin et al.

    Structure Analysis by Small Angle X-ray and Neutron Scattering

    (1987)
  • C. Dass

    Principles and Practice of Biological Mass Spectrometry

    (2001)
  • H.M. Berman et al.

    The Protein Data Bank

    Nucleic Acids Res.

    (2000)
  • J. Cavanagh et al.

    Protein NMR Spectroscopy, Principles and Practice

    (1995)
  • T.L. Blundell et al.

    Protein Crystallography

    (1976)
  • H. Ponstingl et al.

    Discriminating between homodimeric and monomeric proteins in the crystalline state

    Proteins: Struct. Funct. Genet.

    (2000)
  • H. Ponstingl et al.

    Automatic inference of protein quaternary structure from crystals

    J. Appl. Crystallogr.

    (2003)
  • S. Jones et al.

    Principles of protein–protein interactions

    Proc. Natl Acad. Sci. USA

    (1996)
  • P. Argos

    An investigation of protein subunit and domain interfaces

    Protein Eng.

    (1988)
  • S. Miller

    The structure of interfaces between subunits of dimeric and tetrameric proteins

    Protein Eng.

    (1989)
  • E.A. Padlan

    On the nature of antibody combining sites: unusual structural features that may confer on these sites an enhanced capacity for binding ligands

    Proteins: Struct. Funct. Genet.

    (1990)
  • F. Glaser et al.

    A method for localizing ligand binding pockets in protein structures

    Proteins: Struct. Funct. Genet.

    (2006)
  • C.J. Tsai et al.

    Protein–protein interfaces: architectures and interactions in protein–protein interfaces and in protein cores. Their similarities and differences

    Crit. Rev. Biochem. Mol. Biol.

    (1996)
  • L. Lo Conte et al.

    The atomic structure of protein–protein recognition sites

    J. Mol. Biol.

    (1999)
  • P. Chakrabarti et al.

    Dissecting protein–protein recognition sites

    Proteins: Struct. Funct. Genet.

    (2002)
  • O. Keskin et al.

    A new, structurally nonredundant, diverse data set of protein interfaces and its implications

    Protein Sci.

    (2004)
  • U. Ogmen et al.

    PRISM: protein interactions by structural matching

    Nucleic Acids Res.

    (2005)
  • S. Gong et al.

    A protein domain interaction interface database: InterPare

    BMC Bioinformatics

    (2005)
  • W.J. Moore

    Physical Chemistry

    (1972)
  • N. Horton et al.

    Calculation of the free energy of association for protein complexes

    Protein Sci.

    (1992)
  • J. Janin et al.

    Protein–protein interaction at crystal contacts

    Proteins: Struct. Funct. Genet.

    (1995)
  • R.B. Hermann

    Theory of hydrophobic bonding. II. The correlation of hydrocarbon solubility in water with solvent cavity surface area

    J. Phys. Chem.

    (1972)