Tools and Resources

Biochemistry and Chemical Biology

Structural and functional characterization of G protein–coupled receptors with deep mutational scanning

Department of Chemistry and Biochemistry, UCLA-DOE Institute for Genomics and Proteomics, Molecular Biology Institute, Quantitative and Computational Biology Institute, Eli and Edythe Broad Center of Regenerative Medicine and Stem Cell Research, and Jonsson Comprehensive Cancer Center, UCLA, United States
MRC Laboratory of Molecular Biology, United Kingdom
Department of Computer Science, Stanford University, Department of Computer Science, Institute for Computational and Mathematical Engineering, Stanford University, Department of Computer Science, Department of Molecular and Cellular Physiology, Stanford University School of Medicine, Department of Computer Science, Department of Structural Biology, Stanford University School of Medicine, United States

Oct 21, 2020

https://doi.org/10.7554/eLife.54895

Open access

Version of Record

Accepted for publication after peer review and revision.


Version of Record published: December 1, 2020
Accepted Manuscript published: October 21, 2020
Accepted: October 16, 2020
Received: January 5, 2020

Abstract

The >800 human G protein–coupled receptors (GPCRs) are responsible for transducing diverse chemical stimuli to alter cell state and are the largest class of drug targets. Their myriad structural conformations and various modes of signaling make it challenging to understand their structure and function. Here, we developed a platform to characterize large libraries of GPCR variants in human cell lines with a barcoded transcriptional reporter of G protein signal transduction. We tested 7800 of 7828 possible single amino acid substitutions to the beta-2 adrenergic receptor (β₂AR) at four concentrations of the agonist isoproterenol. We identified residues specifically important for β₂AR signaling, mutations in the human population that are potentially loss of function, and residues that modulate basal activity. Using unsupervised learning, we identify residues critical for signaling, including all major structural motifs and molecular interfaces. We also find a previously uncharacterized structural latch spanning the first two extracellular loops that is highly conserved across Class A GPCRs and is conformationally rigid in both the inactive and active states of the receptor. More broadly, by linking deep mutational scanning with engineered transcriptional reporters, we establish a generalizable method for exploring pharmacogenomics, structure and function across broad classes of drug receptors.

Introduction

G-protein-coupled receptors (GPCRs) are central mediators of mammalian cells’ ability to sense and respond to their environment. The >800 human GPCRs respond to a wide range of chemical stimuli such as hormones, odors, natural products, and drugs by modulating a small set of defined pathways that affect cellular physiology (Isberg et al., 2016; Niimura et al., 2014). Their central role in altering relevant cell states makes them ideal targets for therapeutic intervention, with ~34% of all U.S. Food and Drug Administration (FDA)-approved drugs targeting the GPCR superfamily (Hauser et al., 2017).

Understanding GPCR signal transduction is non-trivial for several reasons. First, GPCRs exist in a complex conformational landscape, making traditional biochemical and biophysical characterization difficult (Deupi and Kobilka, 2010; Kobilka and Deupi, 2007). Consequently, most experimentally determined GPCR structures are truncated, non-native, or artificially stabilized (Isberg et al., 2016). Even when structures exist, the majority are of inactive states - GPCR conformations that cannot couple with a G protein and cause it to stimulate intracellular signaling. Second, the function of a GPCR depends on its ability to change shape. Static structures from both X-ray crystallography and cryo electron microscopy do not directly probe structural dynamics (Granier and Kobilka, 2012). Tools such as double electron-electron resonance (DEER) spectroscopy, nuclear magnetic resonance (NMR) spectroscopy, and computational simulation have aided our understanding of GPCR dynamics, but interpreting how structural dynamics relate to function is still difficult (Latorraca et al., 2017; Manglik and Kobilka, 2014).

Structure- and dynamics-based analyses generate sets of candidate residues that are potentially critical for function and warrant further characterization. These approaches are complemented by methods that directly perturb protein function such as mutagenesis followed by functional screening. Several reporter gene and protein complementation assays measure GPCR signal transduction by activation of a transcriptional reporter, and are often used to identify and validate important structural residues (Pei et al., 1994; Schönegge et al., 2017; Valentin-Hansen et al., 2012). Such transcriptional reporter assays exist for most major drug receptor classes, including the major GPCR pathways: G_ɑs, G_ɑq, G_ɑi/o, and arrestin signaling (Azimzadeh et al., 2017; Cheng et al., 2010; Kroeze et al., 2015).

Recent advances in DNA synthesis, genome editing, and next-generation sequencing have enabled deep mutational scanning (DMS) approaches that functionally assay all possible missense mutants of a given protein (Fowler and Fields, 2014; Starita et al., 2017). Several new methods allow for the generation and screening of DMS libraries in human cell lines and yeast (Kotler et al., 2018; Lee et al., 2018; Majithia et al., 2016; Mavor et al., 2018; Starita et al., 2018). Function is usually assessed by next-generation sequencing using screens that are bespoke to each gene’s function, or by more general approaches that allow characterization of expression levels rather than function (Matreyek et al., 2018). For GPCRs, the DMS of the CXCR4, CCR5, and T1R2 GPCRs used binding to external epitopes to test expression and ligand binding (Heredia et al., 2018; Park et al., 2019). Unfortunately, such assays tell us little about the signaling capacity of these mutants, which is the primary function of GPCRs and many other drug receptors.

Here, we develop an experimental approach to simultaneously profile variant libraries with barcoded transcriptional reporters in human cell lines using RNA-seq. Methods to detect GPCR activation in multiplex have been previously described by us and others (Botvinnik et al., 2010; Galinski et al., 2018; Jones et al., 2019). Galinski et al.’s method reports on GPCR activity with a β-arrestin proximity sensor, requiring engineering of both arrestin and the GPCR, and enabling broad detection of GPCR activation across multiple signaling modalities. Our method is widely applicable to GPCRs and across the druggable genome where transcriptional reporters exist. As a proof-of-principle, we perform DMS on a prototypical GPCR, the β₂-adrenergic receptor (β₂AR) and measure the consequences of these mutations through the cyclic AMP (cAMP) dependent pathway, the primary signaling modality of Gs-coupled GPCRs.

Results

Multiplexed screening platform for G_s-coupled GPCR signaling

We developed a system to build, stably express, and assay individual variants of the β₂AR in human cell lines. The β₂AR primarily signals through the heterotrimeric G_s protein, activating adenylyl cyclase upon agonist binding. In our platform, cAMP production stimulates transcription of a barcoded reporter gene, controlled by multimerized cAMP response elements (CRE, thus referred to as the CRE reporter for the rest of the manuscript), which can be quantified by RNA-seq (Figure 1A). Initially, we generated a HEK293T-derived cell line for stable integration of the GPCR-reporter construct (Figure 1B, Figure 1—figure supplement 1A,B). We also modified a previously developed Bxb1-landing pad system to allow for stable, once-only integration at the transcriptionally silent H11 safe-harbor locus to avoid placing the CRE reporter within transcribed genes (Cheung et al., 2019; Duportet et al., 2014; Matreyek et al., 2017). To prevent endogenous signaling, we knocked out ADRB2, the gene encoding β₂AR, and verified loss of CRE reporter gene activity in response to the β₂AR agonist isoproterenol (Figure 1—figure supplement 1C). Our donor vector configuration ensures the receptor and resistance marker are only activated upon successful integration into the landing pad (Figure 1B). Lastly, we included several sequence elements in the donor vector to improve signal-to-noise of the assay: an insulator upstream of the CRE reporter and an N-terminal affinity tag to the receptor (Figure 1—figure supplement 1D,E). As a result, upon integration of a donor vector expressing wild-type (WT) β₂AR, isoproterenol induces CRE reporter gene expression in a dose-dependent manner (Figure 1B).

A platform for deep mutational scanning of GPCRs.

(A) Overview of the multiplexed GPCR activity assay. Plasmids encoding *ADRB2* variants, a transcriptional CRE reporter of signaling activity, and 15 nucleotide barcode sequences that identify the variant are integrated into a defined genomic locus such that one variant is present per cell. Upon stimulation by isoproterenol, G-protein signaling induces transcription of the CRE genetic reporter and the barcode. Thus, the activity of a given variant is proportional to the amount of barcode mRNA which can be read out in multiplex by RNA-seq. (B) Schematic detailing the recombination of the reporter-receptor expression plasmid into the landing pad locus. Top right: activation of the CRE reporter integrated with (purple) or without (grey) exogenous *ADRB2* into the landing pad when stimulated with isoproterenol in Δ*ADRB2* cells via a luciferase CRE reporter gene assay. (C) Overview of library generation and functional assay. Missense variants are synthesized on an oligonucleotide microarray, the oligos are amplified with random DNA barcode sequences appended, and the variants are cloned into wild-type background vectors. Barcode-variant pairs are mapped with next-generation sequencing and the remaining wild-type receptor and CRE reporter sequences are cloned into the vector. Next, the variant library is integrated *en masse* into the serine recombinase (Bxb1) landing pad engineered at the H11 locus of Δ*ADRB2* HEK293T cells. This integration strategy ensures a single pair of receptor variant and barcoded CRE reporter is integrated per cell and avoids crosstalk. After selection, the library is stimulated with various concentrations of the β₂AR agonist, isoproterenol. Finally, mutant activity is determined by measuring the relative abundance of each variant’s barcoded reporter transcript with RNA-seq.

We designed and synthesized the receptor’s 7828 possible missense variants in eight segments on oligonucleotide microarrays (Figure 1C). We amplified the mutant oligos, attaching a random 15 nucleotide barcode sequence, and cloned them into one of eight background vectors encoding the upstream, wild-type portion of the gene. In this configuration, we mapped barcode-variant pairs with next-generation sequencing and subsequently utilized Type IIS restriction enzymes to insert the remaining sequence elements between the receptor and barcode. In the resulting mature donor vector, the barcode is located in the 3’ untranslated region (UTR) of the CRE reporter gene. We integrated the library into our engineered cell line, and developed protocols to ensure proper quantification of library members, most notably vastly increasing the numbers of cells we assayed and RNA processed for the RNA-seq (Figure 1—figure supplement 1F, Figure 2—figure supplement 1A).
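The library design described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy protein sequence, the function names, the choice to number mutable positions from 2, and the equal-width segment windows are all hypothetical and are not taken from the authors' design scripts.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def enumerate_missense(protein, first_position=1):
    """List every single amino acid substitution as (position, wild_type, alt)."""
    variants = []
    for offset, wild_type in enumerate(protein):
        for alt in AMINO_ACIDS:
            if alt != wild_type:
                variants.append((first_position + offset, wild_type, alt))
    return variants


def split_into_segments(variants, protein_length, n_segments, first_position=1):
    """Assign each variant to one of n contiguous, equal-width position windows."""
    window = -(-protein_length // n_segments)  # ceiling division
    segments = [[] for _ in range(n_segments)]
    for position, wild_type, alt in variants:
        index = (position - first_position) // window
        segments[index].append((position, wild_type, alt))
    return segments


# Toy 412-residue sequence (NOT the real receptor sequence), numbered from 2
# on the assumption that the initiator methionine is left unmutated.
toy_protein = (AMINO_ACIDS * 21)[:412]
variants = enumerate_missense(toy_protein, first_position=2)
segments = split_into_segments(variants, len(toy_protein), 8, first_position=2)
print(len(variants), [len(s) for s in segments])
```

With 412 mutable positions and 19 alternatives each, the enumeration yields 7828 variants, matching the library size quoted in the text.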

Measurement of mutant activities and comparison to evolutionary metrics

We screened the mutant library at four concentrations of the β₂AR full-agonist isoproterenol: vehicle control, an empirically determined half-maximal activity (EC₅₀, 150 nM), full activity (EC₁₀₀, 625 nM), and beyond saturation of the WT receptor (E_max, 5 µM). We obtained reliable measurements (coefficient of variation <1) for 95–99% (7,461–7,749/7,828 depending on the agonist concentration) of possible missense variants (412 residues * 19 amino acids = 7828 possible missense variants) with two biological replicates at each condition (Figure 1C). We normalized these measurements against forskolin treatment, which induces cAMP signaling independent of the β₂AR (Insel and Ostrom, 2003). Forskolin treatment maximally induces the CRE reporter gene; therefore, the relative barcode expression is proportional to the physical composition of the library. Each cell contains a single copy of the same CRE reporter sequence; therefore, any differences in maximum transcriptional output between barcodes will be due to differences in the frequency of each barcode within the cell library. Finally, we define activity as the ratio of this value to the mean frameshift (Materials and methods). Each variant was represented by 10 barcodes (median), with biological replicates displaying Pearson’s correlations of 0.87 to 0.90 at the barcode level and 0.66 to 0.75 when summarized by individual variants (Figure 1—figure supplement 1G,H, Figure 2—figure supplement 1A). Of note, we aimed for 10 barcodes per variant in order to account for any effects individual barcodes will have on CRE reporter transcription and to serve as statistical replicates for each variant.
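The normalization logic in this paragraph can be made concrete with a minimal sketch. It assumes per-barcode read counts are available as dictionaries, that each barcode's agonist read fraction is divided by its forskolin read fraction, and that barcodes are averaged per variant before dividing by the mean frameshift value. The exact aggregation the authors used is described in their Materials and methods, so the function names, the toy counts, and the choice of a mean over barcodes are assumptions made here for illustration.

```python
from statistics import mean, stdev


def normalized_signal(agonist_counts, forskolin_counts):
    """Per-barcode read fraction under agonist divided by its fraction under forskolin."""
    total_agonist = sum(agonist_counts.values())
    total_forskolin = sum(forskolin_counts.values())
    signal = {}
    for barcode, fsk in forskolin_counts.items():
        if fsk == 0:
            continue  # barcode not observed in the reference condition
        agonist_fraction = agonist_counts.get(barcode, 0) / total_agonist
        signal[barcode] = agonist_fraction / (fsk / total_forskolin)
    return signal


def group_by_variant(signal, barcode_to_variant):
    """Collect the normalized barcode values that belong to each variant."""
    grouped = {}
    for barcode, value in signal.items():
        grouped.setdefault(barcode_to_variant[barcode], []).append(value)
    return grouped


def coefficient_of_variation(values):
    """Sample standard deviation over the mean (needs at least two values)."""
    return stdev(values) / mean(values)


def variant_activity(grouped, frameshift_variants):
    """Mean normalized signal per variant, expressed relative to the mean frameshift."""
    variant_mean = {v: mean(vals) for v, vals in grouped.items()}
    frameshift_mean = mean(variant_mean[v] for v in frameshift_variants)
    return {v: m / frameshift_mean for v, m in variant_mean.items()}


# Hypothetical counts for four barcodes (illustrative only).
agonist = {"bc1": 40, "bc2": 40, "bc3": 10, "bc4": 10}
forskolin = {"bc1": 25, "bc2": 25, "bc3": 25, "bc4": 25}
mapping = {"bc1": "missense_x", "bc2": "missense_x",
           "bc3": "frameshift_y", "bc4": "frameshift_y"}

grouped = group_by_variant(normalized_signal(agonist, forskolin), mapping)
activity = variant_activity(grouped, ["frameshift_y"])
print(activity)
```

On this scale the mean frameshift sits at 1 by construction, so a variant's activity reads directly as fold signal over a presumed null receptor. A filter such as `coefficient_of_variation(values) < 1` could then be applied per variant, mirroring the reliability criterion quoted above.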

The heatmap representation of the variant-activity landscape reveals global and regional trends in response to specific subtypes of mutations (Figure 2A). For example, the transmembrane domain and intracellular helix 8 are more sensitive to substitution than the termini or loops, and this effect becomes more pronounced at higher agonist concentrations (Figure 2A; all p<0.001; Mann-Whitney U). The transmembrane domain and intracellular helix 8 are also sensitive to helix-disrupting proline substitutions (Figure 2B, Figure 2—figure supplement 1B; all p<<0.001 except TM vs Helix 8; Mann-Whitney U). Microarray-derived DNA often contains single-base deletions (47% of oligos in our library) that will introduce frameshift mutations into our library (LeProust et al., 2010). As expected, frameshifts consistently display lower activity than missense mutations regardless of agonist concentration (Figure 2C; p<<0.001; Mann-Whitney U). Furthermore, the effect of frameshifts is markedly decreased in the C-terminus of the protein (Figure 2D; p<<0.001; Mann-Whitney U). We also built and integrated previously characterized mutants (Elling et al., 1999; Sato et al., 1999; Shenoy et al., 2006) into our system individually and measured activity with a luciferase CRE reporter gene at the same induction conditions (Figure 2E and Figure 2—figure supplement 1C). As expected, known null mutations (D113A and I135W) have significantly diminished activity relative to WT in both systems, even at E_max (all p<<0.001; Wald Test). Known hypomorphic mutations (S203A and S204A) also have a significant decrease in activity relative to WT at EC₁₀₀ (all p<<0.001; Wald Test), but are not significantly different than WT at E_max as expected (all p>0.01; Wald Test).

Variant-activity landscape for 7800 missense variants of the β₂AR and multiplexed assay validation.

(A) Top: Secondary structure diagram of the β₂AR: the N and C termini are black, the transmembrane helices are purple blocks, and the intra- and extracellular domains are colored blue and green, respectively. The EVmutation track (EVmut.) displays the mean effect of mutation at each position as predicted from sequence covariation (Hopf et al., 2017). Conservation track (Cons.) displays the sequence conservation of each residue across 55 β₂AR orthologs from the OMA database (Capra and Singh, 2007; Altenhoff et al., 2018). A.U. stands for arbitrary units and the scales for the EVmutation and sequence conservation tracks are individually 0–1 normalized. The shaded guides represent positions in the transmembrane domain. Bottom: The heatmap representation of mutant activity at each agonist condition. Variants are colored by their activity score relative to the mean frameshift mutation. Activity is the measurement of signaling for each variant relative to the mean frameshift (see Materials and methods). (B) The distribution of mutant activity for proline substitutions is significantly different for amino acids that reside in the transmembrane domain/helix 8 compared to those in the flexible loops and termini at EC₁₀₀ (all p<<0.001 except TM vs Helix 8; Mann-Whitney U). (C) The distribution of frameshift mutant activity (red) is significantly different than the distribution of designed missense mutations (blue) across increasing isoproterenol concentrations (both p<<0.001; Mann-Whitney U). Mean frameshift activity marked with a dashed line. (D) Relative effect of the mean frameshift mutant activity per position is markedly decreased in the unstructured C-terminus of the protein (shaded region) and is consistent across agonist concentration (both p<<0.001; Mann-Whitney U). Blue line represents the LOESS fit. (E) Mutant activity measured individually with a luciferase CRE reporter gene compared to the multiplexed assay at EC₁₀₀ and E_max isoproterenol induction. Known null mutations (D113A, I135W) have no dose response between EC₁₀₀ and E_max and are significantly different than synonymous mutants at both concentrations in both systems (all p<<0.001; Wald test). Alternatively, known hypomorphic mutations (S203A, S204A) are significantly different than synonymous mutations at EC₁₀₀ (all p<<0.001; Wald test), but are not significantly different at E_max (all p>0.01; Wald test). Bars represent mean value in the luciferase data. In the Individual facet, each dot represents a replicate measurement and in the multiplexed facet, each dot represents a different barcode.

Metrics for sequence conservation and covariation are often used to predict the effects a mutation will have on protein function (Adzhubei et al., 2013; Capra and Singh, 2007; Hopf et al., 2017). Mutational tolerance, the mean activity of all amino acid substitutions per residue at each agonist concentration, is highly correlated with conservation, both across species for the β₂AR (Figure 3—figure supplement 1A; Spearman's ρ = −0.74; 55 orthologs, predominantly mammals but including a few other vertebrates as well as a small number of invertebrate beta-like sequences, identified from the OMA Database, Supplementary file 1), and across all Class A GPCRs (Spearman's ρ = −0.68; Figure 3A and Figure 3—figure supplement 1B; Altenhoff et al., 2018; Capra and Singh, 2007; Hopf et al., 2017) at EC₁₀₀. From this point on, any use of the words tolerance or intolerance in this manuscript refers to mutational tolerance. Correlation between our data and both predictors increases with agonist concentration up to EC₁₀₀ (Figure 3—figure supplement 1A,B). We found a subset of residues in extracellular loop 2 (ECL2), including C184 and C190 that form an intraloop disulfide bridge, that were more intolerant to mutation than expected given their conservation across Class A GPCRs. This suggests a fairly specific functional role for this motif in the β₂AR (Figure 3A). On an individual variant level, mutational responses correlate (Spearman's ρ = 0.520) with EVmutation, a predictor of mutational effects from sequence covariation (Figure 3B and Figure 3—figure supplement 1C; Altenhoff et al., 2018; Capra and Singh, 2007; Hopf et al., 2017).
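As an illustration of the two quantities used in this comparison, the sketch below computes a per-residue mutational tolerance (the mean activity over all substitutions at a position) and a Spearman rank correlation against a conservation score. It is written with the standard library only; the function names and the toy numbers are hypothetical, and the conservation values are placeholders rather than scores from any real alignment.

```python
from statistics import mean


def mutational_tolerance(activity):
    """activity: {(position, alt_aa): score} -> {position: mean score over substitutions}."""
    by_position = {}
    for (position, _alt), score in activity.items():
        by_position.setdefault(position, []).append(score)
    return {pos: mean(scores) for pos, scores in by_position.items()}


def midranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the midranks."""
    return pearson(midranks(x), midranks(y))


# Toy data: three positions, two substitutions each (illustrative only).
toy_activity = {(10, "A"): 1.0, (10, "P"): 3.0,
                (11, "A"): 5.0, (11, "P"): 7.0,
                (12, "A"): 9.0, (12, "P"): 11.0}
toy_conservation = {10: 0.9, 11: 0.5, 12: 0.1}

tolerance = mutational_tolerance(toy_activity)
positions = sorted(tolerance)
rho = spearman([tolerance[p] for p in positions],
               [toy_conservation[p] for p in positions])
print(tolerance, rho)
```

A negative rho, as in the toy case, corresponds to the pattern reported in the text: the more conserved a position, the lower its tolerance to substitution.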

Individual mutations and residues reveal evolutionary and structural insights into β₂AR function.

(A) Positional conservation across Class A GPCRs correlates with mutational tolerance (Spearman's ρ = −0.676, Pearson’s r = −0.681), the mean activity of all amino acid substitutions per residue at each agonist concentration, at EC₁₀₀. However, four of the least conserved positions (C190, C184, A181, Y185) are highly sensitive to mutation and are located in ECL2, suggesting this region is uniquely important to the β₂AR. The blue line is a simple linear regression. (B) Individual mutant activity correlates with EVmutation (Spearman's ρ = 0.521, Pearson’s r = 0.480) at EC₁₀₀. The blue line is a simple linear regression. (C) Activity of individual mutants present in the human population from the gnomAD database stratified by allele frequency. Mutations are classified as potential loss of function (LoF) mutations (orange, shaded region) if the mean activity at EC₁₀₀ plus the standard error of the mean (upper error bar) is less than two standard deviations above mean frameshift mutant activity (dashed line). (D) The distribution of the 100 most basally activating mutations across the β₂AR snake plot reveals a clustering of mutants in the termini, TM1, TM5, and TM6. (E) Top: Distribution of the 100 most basally activating mutations stratified by domain. Bottom: The distribution of the 100 most basally activating mutations across the β₂AR 3D structure (PDB: 3SN6). These positions (colored as in D) are concentrated on the surface of the β₂AR (G_ɑs shown in blue).

Population genetics and structural analysis of individual variants

In addition to evolutionary metrics, understanding the functional distribution of ADRB2 variants found within the human population is important given the extensive variation found among GPCR drug targets (Hauser et al., 2018). The Genome Aggregation Database (gnomAD) reports variants found across 141,456 individuals (Karczewski et al., 2019), and many of the 180 ADRB2 missense variants are of unknown significance. We classified 11 of these variants as potentially loss of function, by comparing their activity to the distribution of frameshift mutations found in our assay (Figure 3C; see Materials and methods). Given that measurements of individual mutations are noisy (average coefficient of variation = 0.55), this analysis is best suited as a funnel to guide further characterization (see Discussion).
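A classification of this kind, comparing a variant's upper error bound to a cutoff derived from the frameshift distribution, could be sketched as follows. The function names and toy values are hypothetical. The cutoff is written as the frameshift mean plus a signed multiple `k` of the frameshift standard deviation, and the value of `k` in the usage lines is an illustrative assumption; the criterion actually applied is the one given in the authors' Materials and methods.

```python
from statistics import mean, stdev


def standard_error(values):
    """Standard error of the mean (needs at least two values)."""
    return stdev(values) / len(values) ** 0.5


def lof_cutoff(frameshift_activities, k):
    """Cutoff placed k standard deviations from the mean frameshift activity."""
    return mean(frameshift_activities) + k * stdev(frameshift_activities)


def is_potential_lof(barcode_activities, cutoff):
    """Flag a variant whose mean activity plus one standard error stays below the cutoff."""
    upper_bound = mean(barcode_activities) + standard_error(barcode_activities)
    return upper_bound < cutoff


# Toy values on the frameshift-relative activity scale (illustrative only).
frameshifts = [0.8, 1.0, 1.2]
cutoff = lof_cutoff(frameshifts, k=2)       # k=2 is an assumption
print(is_potential_lof([0.9, 1.1, 1.0], cutoff))  # low-activity variant
print(is_potential_lof([3.0, 3.2, 2.8], cutoff))  # clearly active variant
```

Using the upper error bound rather than the mean makes the call conservative: a noisy variant with a low mean but a wide error bar is not flagged.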

However, our analysis is more robust when we aggregate the signal of multiple mutations at a given position. Therefore, we compiled a list of the 100 most activating mutations at vehicle control and the 100 least active mutations at EC₁₀₀ and mapped their location on the β₂AR structure. As expected, the least active mutations tended to reside within the core of the transmembrane domain (Figure 3—figure supplement 1D,E). Alternatively, the most activating mutations mapped to TM1, TM5, TM6, and residues that typically face away from the internal core of the receptor (Figure 3D,E). Of note, a group of these mutations in TM5 face TM6, which undergoes a large structural rearrangement during receptor activation (Weis and Kobilka, 2018). Activating mutants are also enriched in the termini, ICL3, and Helix 8. Concentration at the termini is unsurprising, as these regions have known involvement in surface expression and our current assay does not discriminate between increased signaling potency and expression (see Discussion; Dong et al., 2007). However, there are cases of constitutively active mutations in the N terminus that increase signaling potency without affecting surface expression, such as T11S of the melanocortin 4 receptor (MC4R) (Lotta et al., 2019). Similarly, the enrichment of activating mutants in ICL3 appears to reflect its role in G-protein binding (Ozcan et al., 2013; Ozgur et al., 2016; West et al., 2011). Lastly, we observe a number of activating mutations in the terminal residue, L413. A recent study of genetic variation in human MC4R also found a gain-of-function mutation at the terminal residue of the receptor, suggesting a possible conserved role for this position in regulating basal activity of GPCRs (Lotta et al., 2019).

Unsupervised learning reveals functionally relevant groupings of residues

Given that our data span thousands of mutations across several treatment conditions, we used unsupervised learning methods to reveal hidden regularities within groups of residues’ response to mutation. In particular, we applied Uniform Manifold Approximation and Projection (UMAP) (McInnes and Healy, 2018) to learn multiple different lower dimensional representations of our data and clustered the output with density-based hierarchical clustering (HDBSCAN; Figure 4—figure supplement 1; Campello et al., 2013). We found residues consistently separated into six clusters that exhibit distinct responses to mutation (Figure 4A,B). Clusters 1 and 2 are globally intolerant to all substitutions, whereas Cluster 3 is vulnerable to proline and charged substitutions. Cluster 4 is particularly inhibited by negatively charged substitutions and Cluster 5 by proline substitutions, while Cluster 6 is unaffected by any mutation. Mapping these clusters onto a 2D snake plot representation shows Clusters 1–5 primarily comprise the transmembrane domain, while Cluster 6 resides in the loops and termini (Figure 4C). These flexible regions are often truncated before crystal structure determination to minimize conformational variability (Rosenbaum et al., 2007). Surprisingly, a number of residues from Cluster 5 also map there, suggesting potential structured regions. However, Cluster 5 assignment is largely based on the response of a single proline mutation, and thus is more susceptible to noise than the other clusters (see Discussion).
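The input to an analysis like this is a per-residue feature vector. A minimal sketch of one way to build it is shown below: substitutions are grouped into physicochemical classes and the mean activity per class is recorded for each agonist condition. The class definitions, function names, and toy scores are assumptions made for illustration and are not the groupings the authors used; the dimensionality reduction and clustering steps themselves are omitted here, and the resulting vectors are simply the kind of table one could hand to UMAP and HDBSCAN implementations.

```python
from statistics import mean

# Hypothetical physicochemical grouping of the 20 amino acids.
AA_CLASSES = {
    "hydrophobic": "AVILMFWY",
    "polar": "STNQCG",
    "positive": "KRH",
    "negative": "DE",
    "proline": "P",
}


def class_of(amino_acid):
    """Return the name of the class that contains the given amino acid."""
    for name, members in AA_CLASSES.items():
        if amino_acid in members:
            return name
    raise ValueError(f"unknown amino acid: {amino_acid}")


def residue_features(activity_by_condition, conditions):
    """Build {position: [mean activity per (condition, class)]}; None marks missing data."""
    class_names = list(AA_CLASSES)
    buckets = {}  # (position, condition, class) -> list of scores
    positions = set()
    for condition in conditions:
        for (position, alt), score in activity_by_condition[condition].items():
            positions.add(position)
            key = (position, condition, class_of(alt))
            buckets.setdefault(key, []).append(score)
    features = {}
    for position in sorted(positions):
        vector = []
        for condition in conditions:
            for name in class_names:
                scores = buckets.get((position, condition, name))
                vector.append(mean(scores) if scores else None)
        features[position] = vector
    return features


# Toy scores for one position under two conditions (illustrative only).
toy = {
    "vehicle": {(5, "D"): 1.0, (5, "E"): 3.0, (5, "P"): 0.5},
    "EC100": {(5, "D"): 2.0, (5, "E"): 4.0, (5, "P"): 1.0},
}
features = residue_features(toy, ["vehicle", "EC100"])
print(features[5])
```

Entries left as `None` would need to be imputed or the residue dropped before any embedding step, a choice that is itself an assumption left open in this sketch.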

Unsupervised learning segregates residues into clusters with distinct responses to mutation.

(A) Amino acids were segregated into classes based on their physicochemical properties and mean activity scores were reported by class for each residue. With Uniform Manifold Approximation and Projection (UMAP) a 2D representation of every residue’s response to each mutation class across agonist conditions was learned. Each residue is assigned into one of six clusters using HDBSCAN (see Figure 4—figure supplement 1). (B) Class averages for each of these clusters reveal distinct responses to mutation. The upper dashed line represents the mean activity of Cluster 6 and the lower solid line represents the mean activity of frameshift mutations. (C) A 2D snake plot representation of β₂AR secondary structure with each residue colored by cluster identity.

Next, we projected the clusters onto the hydroxybenzyl isoproterenol-bound structure (Figure 5—figure supplement 1A; PDB: 4LDL). The globally intolerant Clusters 1 and 2 segregate to the core of the protein, while the charge-sensitive Cluster 3 is enriched in the lipid-facing portion (Figure 5—figure supplement 1B). This suggests that differential patterns of response to hydrophobic and charged substitutions could correlate with side chain orientation within the transmembrane domain. Indeed, residues that are uniquely charge sensitive are significantly more lipid-facing than those that are sensitive to both hydrophobic and charged mutations at EC₁₀₀ (Figure 5A, Figure 5—figure supplement 1C–D, p=0.000036; Mann-Whitney U) (Mitternacht, 2016).

Mutational tolerance elucidates broad structural features and critical residues of the β₂AR.

(A) Residues within the transmembrane domain colored by their tolerance to particular classes of amino acid substitution. Teal residues are intolerant to both hydrophobic and charged amino acids (globally intolerant), and brown residues are tolerant to hydrophobic amino acids but intolerant to charged amino acids (charge intolerant). The charge-sensitive positions’ side chains are enriched pointing into the membrane, while the globally intolerant positions’ side chains face into the core of the protein (see Figure 5—figure supplement 1). (B) The crystal structure of the hydroxybenzyl isoproterenol-activated state of the β₂AR (PDB: 4LDL) with residues from the mutationally intolerant Clusters 1 and 2 highlighted in maroon. (C) 412 β₂AR residues rank ordered by mutational tolerance at the EC₁₀₀ isoproterenol condition. Residues in known structural motifs (colored points) are significantly more sensitive to mutation than other positions on the protein (p<<0.001). Dashed line demarcates the median of the ranking. The top 15 mutationally intolerant residues are listed and colored by motif association. (D–F) Selected vignettes of residues from the mutationally intolerant UMAP clusters and ranking. (D) W286^6x48 of the CWxP motif and the neighboring G315^7x41 are positioned in close proximity. Substitutions at G315^7x41 are likely to cause a steric clash with W286^6x48 (PDB: 4LDL). (E) An inactive-state water-mediated hydrogen bond network (red) associates N51^1x50 and Y326^7x53 (PDB: 2RH1). Disruption of this network may destabilize the receptor. (F) The ligand-bound orthosteric site surface colored by mutational tolerance. Receptor-ligand contacts with the catecholamine head (present in agonist used in assay) are more intolerant to mutation than those in the hydroxybenzyl tail (not present in agonist used in assay) of the isoproterenol analog depicted in this crystal structure (PDB: 4LDL).

Mutational tolerance stratifies the functional relevance of structural features

Decades of research have revealed how ligand binding is coupled to G-protein activation through a series of conserved motifs (Weis and Kobilka, 2018). This comprehensive, unbiased screen enables us to systematically evaluate and rank the functional importance of every implicated residue. The globally intolerant UMAP clusters (1 and 2) highlight many residues from these motifs and suggest novel residues for investigation (Figure 5B). We can further resolve the significance of individual residues within these motifs by ranking the mutational tolerance of positions in these clusters at EC₁₀₀ (Figure 5C). In fact, 11 of the 15 most mutationally intolerant positions belong to the PIF, CWxP, and NPxxY motifs, the orthosteric site, a water-mediated bond network, an extracellular disulfide bond, and a cholesterol-binding site. Interestingly, the second most intolerant residue is the uncharacterized G315^7x41 (GPCRdb numbering in superscript; Isberg et al., 2016). In the active state, G315’s alpha carbon points directly at W286^6x48 of the CWxP motif, the fourth most intolerant residue, and any substitution at G315^7x41 will likely clash with W286^6x48 (Figure 5D). We confirmed G315’s intolerance with a luciferase CRE reporter gene assay, where mutants G315T and G315L resulted in complete loss of function (Figure 5—figure supplement 2A).

Recent simulations suggest water-mediated hydrogen bond networks play a critical role in GPCR function (Venkatakrishnan et al., 2018; Venkatakrishnan et al., 2019). The third most intolerant residue in our assay, Y326^7x53 of the NPxxY motif, is especially important as it switches between two of these networks during receptor activation. In the inactive state, Y326^7x53 contacts N51^1x50 and D79^2x50, two of the top 15 most intolerant positions (Figure 5E). N51L and N51Y also result in complete loss of function when assayed individually (Figure 5—figure supplement 2A). The movement of Y326^7x53 is also part of a broader rearrangement of residue contacts that are conserved across Class A GPCRs, with the majority of these residues being intolerant to mutation (Figure 5—figure supplement 2B; Venkatakrishnan et al., 2016). Aside from G315^7x41, the other uncharacterized residues in the top 15 include W99^23x50, S319^7x46, and G83^2x54. Given the correlation between mutational tolerance and functional relevance, further investigation of these residues will likely reveal insights into GPCR biology.

Next, we hypothesized residues in the orthosteric site that directly contact isoproterenol would respond uniquely to mutation; however, no crystal structure of β₂AR bound to isoproterenol exists. Using the crystal structure of the β₂AR bound to the analog, hydroxybenzyl isoproterenol (PDB: 4LDL), we find that residues responsible for binding the derivatized hydroxybenzyl tail have significantly higher mutational tolerance than residues that contact the catecholamine head common to both isoproterenol and hydroxybenzyl isoproterenol at EC₁₀₀ (p=0.0162; Figure 5F, Figure 5—figure supplement 2C). Given this discrimination, we believe DMS can be a powerful tool for mapping functional ligand-receptor contacts in GPCRs.
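The comparison between the two groups of contact residues can be illustrated with a simple significance test. The text above does not name the test behind the reported p-value, so the exact one-sided permutation test below is a stand-in chosen for illustration, and the tolerance values are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_pvalue(group_a, group_b):
    """One-sided exact permutation test on the difference of means.

    Returns the fraction of relabelings whose mean(A) - mean(B) is at
    least as large as the observed difference.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = mean(group_a) - mean(group_b)
    indices = range(len(pooled))
    as_extreme = total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance guards against floating-point ties.
        if mean(a) - mean(b) >= observed - 1e-12:
            as_extreme += 1
        total += 1
    return as_extreme / total


# Hypothetical tolerance values, invented for illustration only.
tail_contacts = [0.9, 1.0, 0.8]
head_contacts = [0.1, 0.2, 0.3]
p_value = exact_permutation_pvalue(tail_contacts, head_contacts)
```

Enumerating every relabeling is only practical for small groups such as a handful of binding-site residues; larger groups would need random sampling of permutations instead.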

GPCR signaling is dependent on a series of intermolecular interactions, and the numerous β₂AR crystal structures enable us to comprehensively evaluate residues mediating such interactions. For example, cholesterol is an important modulator of GPCR function (Thal et al., 2018), and the timolol-bound inactive-state β₂AR structure elucidated the location of a cholesterol-binding site (PDB: 3D4S) (Hanson et al., 2008). Of residues in this pocket, W158^4x50 is predicted to be most important for cholesterol binding, and in agreement, W158^4x50 is the most mutationally intolerant (Figure 5—figure supplement 2D). Similarly, a number of studies have mutagenized residues at the G_ɑs-β₂AR interface (Jensen et al., 2001; Moro et al., 1993; O'Dowd et al., 1988; Rasmussen et al., 2011; Sheikh et al., 1999; Swaminath et al., 2003; Valentin-Hansen et al., 2012; Valiquette et al., 1995), but a complete understanding of the relative contribution of each residue to maintaining the interface is unknown. Most residues are more mutationally tolerant than residues in the intolerant Clusters 1 and 2, but the four most intolerant positions are I135^3x54, V222^5x61, A271^6x33, and Q229^5x68 respectively (Figure 5—figure supplement 2E). Q229^5x68 appears to coordinate polar interactions between D381 and R385 of the α5 helix of G_ɑs, whereas V222^5x61 and I135^3x54 form a hydrophobic pocket on the receptor surface (Figure 5—figure supplement 2F).

A structural latch is conserved across Class A GPCRs

Analysis of the mutational tolerance data has highlighted the functional importance of previously uncharacterized residues. In particular, W99^23x50 of extracellular loop 1 (ECL1) is the 13th most intolerant residue, which is unusual as mutationally intolerant residues are rare in the flexible loops. Furthermore, W99^23x50 is proximal to the disulfide bond C106^3x25-C191^45x50, an important motif for stabilization of the receptor’s active state (Dohlman et al., 1990; Noda et al., 1994; Hulme, 2013). While aromatic residues are known to facilitate disulfide bond formation, only tryptophan is tolerated at this position (Bhattacharyya et al., 2004). We hypothesize W99’s indole group hydrogen bonds with the backbone carbonyl of the neighboring uncharacterized and mutationally intolerant G102^3x21, positioning W99^23x50 toward the disulfide bond. Other aromatic residues are unable to form this hydrogen bond and are less likely to be positioned properly. G102^3x21 also hydrogen bonds with the backbone amide of C106^3x25, further stabilizing this region. To verify this claim, we individually confirmed the mutational intolerance of both W99^23x50 and G102^3x21 (Figure 6—figure supplement 1A). Additionally, we evaluated surface expression for a subset of W99^23x50 and G102^3x21 mutants (Figure 6—figure supplement 1B). Relative to three previously characterized mutants with severely impaired surface expression (Parmar et al., 2017) and wild-type β₂AR, the mutants exhibited mildly impaired to normal surface expression, supporting a role in signaling for these residues.

Interestingly, W99^23x50, G102^3x21, and C106^3x25 are almost universally conserved across Class A GPCRs (Vass et al., 2018; Figure 6A, Figure 6—figure supplement 1C). Comparison of over 25 high-resolution structures of class A GPCRs from five functionally different sub-families and six different species revealed that these residues consistently contact each other (Figure 6B,C). Based on the evolutionary and structural conservation across Class A GPCRs, we find W99^23x50, G102^3x21, and the C106^3x25-C191^45x50 disulfide bond represent a conserved WxxGxxxC motif, forming an extracellular ‘structural latch’ that is maintained consistently throughout GPCRs spanning diverse molecular functions and phylogenetic origins. While a minority of Class A GPCRs lack the Trp/Gly combination of residues in the ECL1 region, these receptors have varying structures in ECL1: an alpha helix (sphingosine S1P receptor), beta strand (adenosine receptor), or even intrinsically disordered (viral chemokine receptor US28) (Figure 6—figure supplement 1D).
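The WxxGxxxC spacing can be checked directly against a protein sequence. The sketch below is illustrative only: the function name and the toy sequence are hypothetical, and a sequence match says nothing about whether the matched cysteine actually forms the TM3 disulfide bond.

```python
import re

# W, two arbitrary residues, G, three arbitrary residues, C.
# The lookahead lets overlapping matches be reported as well.
LATCH_PATTERN = re.compile(r"(?=(W[A-Z]{2}G[A-Z]{3}C))")


def find_latch_motif(sequence):
    """Return (1-based start position, matched segment) for each WxxGxxxC hit."""
    return [(m.start() + 1, m.group(1)) for m in LATCH_PATTERN.finditer(sequence.upper())]


# Toy sequence invented for illustration; it is not a real receptor sequence.
hits = find_latch_motif("MKTAWLSGAPQCLV")
```

In the β₂AR numbering used above, W99, G102, and C106 follow exactly this spacing (three residues from Trp to Gly, four from Gly to Cys).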

Figure 6

A conserved extracellular tryptophan-disulfide ‘structural latch’ in class A GPCRs is mutationally intolerant and conformation-independent.

(A) Sequence conservation of extracellular loop 1 (ECL1) and the extracellular interface of TM3 (202 Class A GPCRs with a disulfide bridge between TM3 and ECL1). (B) Left: Depiction of the interaction of W99^23x50, G102^3x21, and C106^3x25 in ECL1 of the β₂AR. Top Right: Conservation of the structure of the ECL1 region across functionally different class A GPCRs. Bottom Right: Activity of all 19 missense variants assayed for each of the three conserved residues, with the mean activity (mutational tolerance) shown as a blue bar. The shaded bars represent the mean mutational tolerance ± 1 SD of residues in the tolerant Cluster 6 (green) and the intolerant Clusters 1 and 2 (red). (C) A hydrogen bond network between mutationally intolerant positions W99^23x50, G102^3x21, and C106^3x25. Representative examples of the structural latch are shown. (D) This structural latch is maintained in both the inactive and active state structures for the β₂AR (inactive: 2RH1, active: 3P0G), the M2 muscarinic receptor (inactive: 3UON, active: 4MQS), the angiotensin II type one receptor (inactive: 4ZUD, active: 6OS1), and the mu-opioid receptor (inactive: 4DKL, active: 5C1M).

To better understand the dynamics of the structural latch, we compared the active and inactive state crystal structures of four representative GPCRs. While the overall RMSDs between the inactive and active states for the β₂AR, M2 muscarinic receptor, and μ-opioid receptor are 1 Å, 1.5 Å, and 1.7 Å, respectively, the conformation of the latch in the active and inactive states is nearly identical in each receptor (Figure 6D). This suggests that the extracellular structural latch is part of a larger rigid plug present at the interface of the transmembrane and extracellular regions, which could be important for the structural integrity of the receptor and possibly guide ligand entry.
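For readers unfamiliar with the metric, RMSD between two conformations is the root of the mean squared distance between matched atoms. The sketch below assumes the two coordinate sets are already superimposed and listed in the same atom order; the text does not state which software or atom selection produced the values above, and the coordinates here are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched, pre-aligned coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstroms: every atom is shifted by 1 Å along z.
inactive = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
active = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
shift = rmsd(inactive, active)
```

Restricting the atom list to the latch residues and comparing that value with the whole-receptor RMSD is one way to express the observation that the latch moves less than the receptor as a whole.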

In Class A receptors lacking components of the WxxGxxxC motif, introducing the Trp-Gly interaction could increase the stability of the receptor for structural studies. In fact, in the BLT1 receptor structure, a Gly mutation at 3x21 was found to be thermostabilizing (Hori et al., 2018). Other candidate receptors lacking a Gly at 3x21 include the alpha2B receptor and the neuropeptide FF2 receptor, where the R81G and D112G mutations, respectively, have the potential to increase receptor stability. More broadly, these ECL1/TM3 positions conserved across Class A GPCRs could serve as candidate sites for introducing thermostabilizing mutations.

Discussion

Our findings showcase a new, generalizable approach for DMS of human protein targets with transcriptional reporters. Such reporters enable precise measurements of gene-specific function that can be widely applied across the druggable genome. We show comprehensive mutagenesis can illuminate the structural organization of the protein and the local environment of individual residues. These results suggest DMS can work in concert with other techniques (e.g. X-ray crystallography, cryo-EM, and molecular dynamics) to augment our understanding of GPCR structure-function relationships. Moreover, we identify key residues for β₂AR function, including uncharacterized positions that inform receptor stability and activation. Importantly, these approaches can be undertaken when direct structural information is unavailable but reporters exist, which is true for most GPCRs.

There are still a number of limitations to our current approach that we expect will improve as we develop the method. Importantly, we did not quantify cell-surface expression directly in our high-throughput functional assay, and thus we cannot distinguish between mutations that substantially affect G-protein signaling and those that affect cell-surface expression. In particular, mutations that lead to increased signal in our assays could in fact work by reducing GPCR internalization and not by increasing the intrinsic activity of the receptor. However, we express our variant library in a genomic context at a controlled copy number, dampening the effects of expression-related artifacts typically associated with assays that involve transiently transfected receptor. In addition, expression level alterations can affect the dynamics of signaling and thus may be physiologically relevant. For example, the GPCR MC4R is haploinsufficient, and rare heterozygous mutations that eliminate or reduce receptor expression are associated with obesity (Farooqi et al., 2003; Khera et al., 2019; Lotta et al., 2019). Combining our assay with new generalized, multiplexed assays of protein expression levels in human cells can help tease apart mechanistic reasons for differences in signaling (Matreyek et al., 2018). Secondly, the current signal-to-noise ratio of this approach at single-variant resolution restricted our analyses to mutations with extreme effects on receptor function. This made interpreting single mutations challenging. For example, several mutations within the C terminus exhibited a sensitivity to proline substitution. This was surprising because the C terminus is thought to be a flexible, disordered region (Cherezov et al., 2007; Rasmussen et al., 2007). We individually synthesized and tested three of these mutations (E369P, R253P, and T360P) and found that they did not disrupt function (Figure 5—figure supplement 2A). 
Thus, individual variant data should be confirmed by more traditional assays until the signal-to-noise ratio is improved. However, our measurements are robust in aggregate, and pointed to new receptor biology, providing structural and functional insights. Further improvements to the signal-to-noise will facilitate the exploration of more subtle aspects of individual mutations.

Looking forward, our method is well-poised to investigate many outstanding questions in GPCR and drug receptor biology. First, individual GPCRs signal through multiple pathways, including pathways mediated by various G proteins and arrestins (Galandrin et al., 2007; DeWire et al., 2007; Hilger et al., 2018; Luttrell, 2008). We have only measured cAMP signaling in this manuscript, the primary signaling pathway of Gs-coupled GPCRs, but transcriptional reporters exist for the other signaling modalities and are compatible with our multiplexed approach. By leveraging transcriptional reporters for each of these pathways, we can understand the mechanisms that underpin signal transduction and biased signaling (Reiter et al., 2012). Second, GPCRs are often targeted by synthetic molecules with either unknown or predicted binding sites, and often have no known structures. We find ligands imprint a mutational signature on their receptor contacts which could potentially reveal the binding site for allosteric ligands. However, it should be noted that variation in receptor response to chemically diverse ligands at the cell surface may not reflect differences in downstream signal (Tsvetanova et al., 2017). We also found several regions on the external surface of the receptor where activating mutants are clustered. Since perturbations at these sites appear to increase receptor activity, they could potentially be targeted by positive allosteric modulators or allosteric agonists (Thal et al., 2018). Third, the identification of mutations that can stabilize specific conformations or increase receptor expression can aid in GPCR structure determination (Serrano-Vega et al., 2008; Tate and Schertler, 2009). 
Fourth, the development of stable cell libraries expressing human medicinally related GPCR variants can be combined with large-scale profiling against small molecule libraries to build very large-scale empirical maps for how small molecules modulate this broad class of receptors (Botvinik and Rossner, 2012; Galinski et al., 2018; Jones et al., 2019). Finally, our approach is generalizable to many classes of drug receptors where transcriptional reporters exist or can be developed (O'Connell et al., 2016), enabling the functional profiling, structural characterization, and pharmacogenomic analysis for most major drug target classes.

Materials and methods

Key resources table

Reagent type (species) or resource	Designation	Source or reference	Identifiers	Additional information
Cell line (Homo-sapiens)	HEK293T	ATCC	CRL-3216
Cell line (Homo-sapiens)	HEK293TΔADRB2 + Landing Pad	This paper		Construction Information found in Endogenous ADRB2 Deletion using CRISPR/Cas9 and Landing Pad Genome Editing Sections
Gene (Homo-sapiens)	ADRB2	NCBI	Gene ID 154
Chemical compound, drug	Isoproterenol	Millipore Sigma	I5627
Chemical compound, drug	Forskolin	Millipore Sigma	F6886
Commercial assay or kit	Dual Glo Luciferase Assay	Promega	E2920
Recombinant DNA reagent	TALEN plasmids	Addgene	#51554 #51555
Recombinant DNA reagent	SpCas9 plasmid	Addgene	pX339
Sequence-based reagent	Oligonucleotide Microarray	Agilent	Custom Synthesis
Commercial assay or kit	Nextseq Mid Output 300 cycle	Illumina	20024905
Commercial assay or kit	Nextseq High Output 75 cycle	Illumina	20024906
Strain, strain background (Escherichia coli)	DH5 alpha	New England Biolabs	C2989K
Antibody	AlexaFluor 488 Anti-Flag rat monoclonal	Thermo Fisher	MA1-142-A488	(1:100)
Transfected construct (Homo-sapiens)	ADRB2 barcoded variant-reporter library	This paper		Reagent Construction Information found in Variant Library Generation and Cloning Section
Commercial assay or kit	RNEasy Miniprep Kit	Qiagen	74104
Commercial assay or kit	Plasmid Plus DNA Maxi Kit	Qiagen	12963
Commercial assay or kit	Superscript IV	Thermo Fisher	18091050
Commercial assay or kit	Lipofectamine 3000	Thermo Fisher	L3000001
Commercial assay or kit	D1000 DNA Screen Tape	Agilent	5067–5582
Commercial assay or kit	D1000 Reagents	Agilent	5067–5583
Commercial assay or kit	SYBR FAST QPCR Master Mix	Roche	07959362001
Commercial assay or kit	Zymo Clean Gel DNA Recovery Kit	Zymo Research	D4007
Commercial assay or kit	Zymo DNA Clean and Concentrator Kit	Zymo Research	D4013
Chemical compound, drug	CD293	Thermo Fisher Scientific	11913019
Software, algorithm	BBTools	Brian Bushnell	https://jgi.doe.gov/data-and-tools/bbtools/
Software, algorithm	Jensen-Shannon Conservation	https://doi.org/10.1093/bioinformatics/btm270
Software, algorithm	OMA Orthology Database	https://doi.org/10.1093/nar/gkx1019
Software, algorithm	FreeSASA	https://doi.org/10.12688/f1000research.7931.1
Software, algorithm	EVmutation	https://doi.org/10.1038/nbt.3769
Software, algorithm	Parasail	http://dx.doi.org/10.1186/s12859-016-0930-z

Condition	Repeat	Reads
0	1	46811302
0	2	43527478
0.150	1	51795485
0.150	2	47528508
0.625	1	45295157
0.625	2	58560000
5	1	48206666
5	2	34977852
F	1	51172562
F	2	42013807
F_5	1	41727633
F_5	2	38259270

Author details

Eric M Jones (contributed equally)
Nathan B Lubock (contributed equally)
AJ Venkatakrishnan
Jeffrey Wang
Alex M Tseng
Joseph M Paggi
Naomi R Latorraca
Daniel Cancilla
Megan Satyadi
Jessica E Davis
M Madan Babu
Ron O Dror (for correspondence)
Sriram Kosuri (for correspondence)
