Introduction

Alternative splicing of pre-mRNAs is considered to be a major source of proteomic complexity in metazoans, and many alternative splicing events are controlled in tissue- and cell-type-dependent manners [14]. A variety of auxiliary trans-acting factors and cis-acting elements involved in the regulation of alternative splicing have been identified through biochemical, genetic, and bioinformatic analyses. The UGCAUG stretch was identified as a common cis-element critical for the cell-type-dependent regulation of well-studied model exons from alternatively spliced genes [5–11]. Jin et al. [12] first demonstrated that a zebrafish homologue of the Caenorhabditis elegans putative RNA-binding protein FOX-1 specifically binds to the GCAUG stretch in vitro and provided evidence that vertebrate Fox-1 family proteins regulate splicing of alternatively spliced exons via (U)GCAUG element(s). Since this discovery, the Fox-1 family proteins have been shown to be responsible for alternative splicing regulation of a variety of alternatively spliced genes with conserved (U)GCAUG element(s). This review summarizes the current literature on the Fox-1 family.

Overview of the Fox-1 family proteins

Table 1 summarizes Fox-1 family members in human, mouse, zebrafish, fruitfly, and nematode worm. The Fox-1 family is characterized by its highly conserved RNA recognition motif (RRM)-type RNA-binding domain (Fig. 1a). Mammals have three family members, Fox-1, Fox-2, and Fox-3 (Table 1). Fox-1 (also known as A2BP1, ataxin-2 binding protein 1) and Fox-2 (also known as RBM9, RNA-binding motif protein 9) have an identical RRM domain (Fig. 1a). The third member, Fox-3, has not yet been well characterized in the literature. The zebrafish has four Fox-1 family members, two of which may not be expressed. A2BP1-like (a2bp1l, also known as zFox-1) was characterized as the first vertebrate homologue of C. elegans FOX-1 by its specific binding to the GCAUG stretch [12]. Drosophila has one homologous gene, CG32062, encoding a nearly identical RRM domain (Fig. 1a). C. elegans has three family members, FOX-1, ASD-1, and SPN-4 (Fig. 1a), as described in detail in this review.

Table 1 The Fox-1 family genes from human, mouse, zebrafish, Drosophila, and C. elegans
Fig. 1

The Fox-1 family RNA-binding proteins are phylogenetically conserved. a Amino acid sequence alignment of the RRM domain of the Fox-1 family proteins in human and mouse (Fox-1, Fox-2, and Fox-3), zebrafish (a2bp1, LOC797814, zFox-1/a2bp1l, and zgc85694), fruitfly (Drosophila CG32062), and nematode (C. elegans FOX-1, ASD-1, and SPN-4). Identical amino acid residues are shaded in orange and residues with similar properties are in yellow. The secondary structure of the human Fox-1 RRM domain is indicated above the alignment [23]. b Schematic illustration of the domain structure of mouse Fox-1 and Fox-2, zebrafish Fox-1/A2bp1l, and C. elegans FOX-1. RRM domain is in orange. Identities of amino acid sequences of each domain compared to that of mouse Fox-1 are indicated. Amino acid positions are indicated. c Amino acid sequence alignment of putative hPY-NLS of the Fox-1 family proteins. The C-terminal-most regions of mouse Fox-1, Fox-2, zebrafish a2bp1, zFox-1/a2bp1l, Drosophila CG32062, and C. elegans FOX-1 and ASD-1 are aligned with confirmed hPY-NLS of human hnRNP A1, hnRNP D, hnRNP F, and TAP [18]. Residues that match the consensus of hPY-NLS [18] are shaded in orange and yellow. ϕ, hydrophobic side chain. Positions of the N-terminal-most residues are indicated. d Solution structure of the RRM domain of Fox-1/Fox-2 in complex with 5′-UGCAUGU-3′ in ribbon (protein backbone) and stick (RNA) representation. Right panel Image of left panel rotated by 90° as indicated. Important protein side chains and peptide bonds involved in hydrophobic or static interactions with the RNA are represented as sticks. Oxygen, nitrogen, hydrogen, and phosphorus atoms are in red, blue, light yellow, and magenta, respectively. Carbon atoms in the protein, the RNA backbone, and bases are in green, yellow, and gray, respectively. Hydrogen bonds are indicated as cyan dotted sticks. Hydrogen atoms in the RNA backbone are omitted except for one involved in a hydrogen bond. Illustrations were generated from a dataset deposited as ‘2err’ in PDBj database with MOLMOL [78].

Mammalian Fox-1 mRNA was exclusively detected in brain, skeletal muscle, and heart [12–14]. High levels of Fox-2 mRNA were detected in whole embryo and in adult brain, heart, and ovary [14]. Fox-1 and Fox-2 proteins were co-localized in nuclei of neurons but excluded from glial cells in mouse brain [14]. Fox-1 protein was specifically detected in differentiated neuroblastoma and myoblast cell lines, while Fox-2 protein was detected in a variety of cell lines as well as in neuroblastoma and myoblast cell lines [14]. A robust up-regulation of Fox-1 and a gradual decrease of Fox-2 were demonstrated during postnatal mouse heart development [15], although the impact of this switching on alternative splicing patterns remains to be elucidated. Zebrafish Fox-1/a2bp1l is specifically expressed during muscle and heart development as detected by whole mount in situ hybridization [12]. These expression patterns of the vertebrate Fox-1 family proteins are consistent with their functions as tissue-specific alternative splicing regulators mainly in brain, muscle, and heart, and with functions of Fox-2 in some other cell types as described in this review.

The Fox-1 family shares overall domain structure (Fig. 1b), although mammalian Fox-1 and Fox-2 have many splicing isoforms (see below). The single highly conserved RRM domain resides in the middle portion of the protein (Fig. 1b), and the function of the Fox-1 family as a splicing regulator absolutely relies on an intact RRM domain that can specifically bind to target RNAs [16, 17]. The N-terminal and C-terminal portions are less conserved (Fig. 1b) but unique to the Fox-1 family. The RF(A/T)PY sequence at the C-terminal end is especially conserved among the family members and matches the consensus of the hydrophobic PY nuclear localization signal (hPY-NLS) [18] (Fig. 1c), suggesting that the C-terminal-most region is a conserved nuclear localization signal (NLS). Consistent with this idea, most of the Fox-1 and Fox-2 isoforms were localized to the nucleus in transfected cells [12, 17, 19], and Fox-1 splice variants with a frame-shift in the C-terminal portion were not preferentially localized to the nucleus [17]. C. elegans FOX-1 and ASD-1 also have conserved C-termini (Fig. 1c) and are localized to the nucleus when expressed in HeLa cells, and deletion of the C-terminal 7 and 16 amino acid residues, respectively, completely abolished preferred localization to the nucleus (H.K., unpublished observation), suggesting that the hPY-NLS is evolutionarily conserved. Deleting most of the C-terminal portion, but not the N-terminal portion, eliminated activity as a splicing regulator even when the protein was supplemented with an exogenous NLS [12, 20, 21], indicating that the C-terminal portion of the Fox-1 family is indispensable for splicing regulation.

Fox-1 family proteins specifically bind to (U)GCAUG

A striking feature of the Fox-1 family among tissue-specific splicing regulators is its exceptionally specific binding to the (U)GCAUG stretch. The strict binding specificity of the Fox-1 family was demonstrated by systematic evolution of ligands by exponential enrichment (SELEX) analysis. Jin et al. [12] utilized this selection method in vitro to identify target RNA molecules for zebrafish Fox-1/a2bp1l, and reported that 14 out of 18 sequenced clones contained a GCAUG pentamer. Ponthier et al. [22] independently performed SELEX experiments with human Fox-1, and reported that 45 out of 47 winner sequences contained a UGCAUG hexamer with an additional preference for U at the seventh position. It is not yet clear what accounts for the minor difference in binding specificity between the fish and mammalian homologues.

The solution structure of the RRM domain of mammalian Fox-1/Fox-2 in complex with 5′-UGCAUGU-3′ was solved by utilizing nuclear magnetic resonance (NMR) spectroscopy. The Fox-1 family RRM adopts the typical β1α1β2β3α2β4 fold of an RRM (Fig. 1a) with the two α-helices packed against a four-stranded antiparallel β-sheet [23, 24] (Fig. 1d). The last three nucleotides, UGU, are recognized by a canonical interface of the RRM, the four-stranded β-sheet, although the base of U7 is not recognized by any specific hydrogen bond [23] (Fig. 1d). The first four nucleotides, UGCA, are bound by two loops independently from the β-sheet binding interface; nucleotides U1, G2, and C3 are wrapped around a single crucial phenylalanine, while G2 and A4 form a mismatch base-pair [23] (Fig. 1d). This unusual molecular mechanism confers the exceptionally high affinity and specificity in target RNA recognition to the Fox-1 family [23].

Fox-1 family proteins either enhance or repress inclusion of alternative exons

To date, several model genes have been extensively studied by utilizing reporter mini-genes. Common methodologies utilized in studies described here are to analyze effects of overexpression and/or knockdown of trans-acting factors including the Fox-1 family on splicing patterns of mRNAs derived from the modified reporter mini-genes or from the endogenous genes by detecting and quantifying alternative isoforms with RT-PCR. Two opposing roles for the Fox-1 family and (U)GCAUG element(s) in alternative splicing regulation are summarized here.

Figure 2a schematically illustrates exon repression by the Fox-1 family via (U)GCAUG element(s) in the upstream intronic flanking (UIF) region of a regulated exon. Exon 9 of the human mitochondrial ATP synthase γ-subunit (F1γ) gene is specifically excluded in skeletal muscle and heart [25]. Expression of Fox-1 induced exon 9 skipping of an F1γ mini-gene via specific binding to several copies of GCAUG stretches in the UIF region in various heterologous cells [12]. Fox-1 and Fox-2 mediated skipping of calcitonin-specific exon 4 in the human calcitonin/calcitonin gene-related peptide (CGRP) gene through intronic and exonic UGCAUG elements in a neuronal context [9, 26], as described in detail later. Expression of Fox-1 and Fox-2 repressed exon 9* of the CaV1.2 L-type calcium channel gene via UGCAUG elements in the UIF and exonic regions [27].

Fig. 2

Schematic illustration of alternative splicing regulation by the Fox-1 family. a The Fox-1 family represses exon inclusion by binding to (U)GCAUG element(s) in the upstream intronic flanking (UIF) region. b The Fox-1 family enhances exon inclusion by binding to the (U)GCAUG element(s) in the downstream intronic flanking (DIF) region. Boxes indicate exons and horizontal lines indicate introns. Blue horizontal lines indicate the UIF and DIF regions. Orange boxes indicate (U)GCAUG elements. (U)GCAUG elements are frequently found in multiple copies, and increasing the number of UGCAUG elements exerted a stronger effect on the splicing regulation

Figure 2b schematically illustrates Fox-1 family-mediated inclusion of a cassette exon via UGCAUG element(s) in the downstream intronic flanking (DIF) region. Expression of Fox-1 promoted inclusion of the EIIIB exon in the rat fibronectin gene via highly repeated and evolutionarily conserved UGCAUG motifs in the DIF region [12]. Expression of Fox-1 and Fox-2 mediated inclusion of a short neuron-specific cassette exon, N30, of the human non-muscle myosin II heavy chain-B (NMHC-B) gene through UGCAUG repeats located approximately 1.5 kb downstream of the N30 exon in Y79 retinoblastoma cells [8, 17]. Expression of Fox-1 or Fox-2 enhanced inclusion of a short neuron-specific exon, N1, of the c-src gene through a UGCAUG element in the DIF region in non-neuronal HeLa cells, and RNAi-mediated knockdown of Fox-2 inhibited splicing of the endogenous N1 exon in N2A neuroblastoma cells [14]. Fox-2 positively regulated inclusion of exon 16 of protein 4.1R pre-mRNA via three conserved copies of the UGCAUG motif in the DIF region in late erythroleukemia cell differentiation as well as in heterologous HeLa cells [19, 22]. Expression of Fox-2 enhanced splicing of exon 33 of the CaV1.2 L-type calcium channel gene via UGCAUG in the DIF region [27].

Fox-1/UGCAUG-mediated alternative splicing often forms part of the regulatory mechanisms for mutually exclusive alternative exons. For example, the α-actinin gene undergoes mutually exclusive splicing of an upstream non-muscle-specific (NM) exon and a downstream smooth muscle-specific (SM) exon [28, 29]. Fox-1 not only repressed the NM exon via multiple (U)GCAUG elements in the UIF region, but also promoted splicing of the SM exon, presumably by antagonizing the effect of PTB [12]. Regulation of mutually exclusive alternative exons of the FGFR2 gene and its C. elegans ortholog, egl-15, by the Fox-1 family proteins will be discussed later in this review.

Genomic structure of the mammalian Fox-1 family genes

Figure 3 shows the schematic structure of the mouse Fox-1/A2bp1 and Fox-2/Rbm9 genes. Expressed sequence tag (EST) data and RT-PCR analysis revealed that there are multiple first exons in both the Fox-1 and Fox-2 genes [14] (Fig. 3). Mouse brain utilizes exon 1D and skeletal muscle utilizes exon 1E of the Fox-2 gene [19]. Pre-differentiated murine erythroleukemia (MEL) cells predominantly express Fox-2F that utilizes exon 1F, while differentiation-induced MEL cells predominantly express Fox-2A that starts from exon 1A [19]. Fox-2A exerts a much stronger effect than Fox-2F on inclusion of exon 16 of protein 4.1R mRNA, and this isoform switching is consistent with increasing level of exon 16 inclusion during erythroid differentiation [19]. These data suggest that each of the multiple promoters is regulated in tissue- and/or development-specific manners.

Fig. 3

Schematic structure and major splicing patterns of the mouse Fox-1/A2bp1 and Fox-2/Rbm9 genes. Boxes indicate exons and horizontal lines indicate introns. Use of these exons is supported by cDNA and/or EST sequences in GenBank/EMBL/DDBJ databases, and the exons are designated after [19] for Fox-2/Rbm9 and human orthologs in [14]. Coding regions are coloured; RRM domains in orange, brain-specific region in green, muscle-specific region in magenta, other isoform-specific regions in blue, and common regions in yellow. The size of the exons is not proportional to that of the introns

The Fox-1 and Fox-2 genes can produce many mRNA and protein isoforms by variable use of multiple alternatively spliced internal exons and alternative splice sites [14, 17, 19]. Both the Fox-1 and Fox-2 genes have a pair of mutually exclusive internal exons, designated as B40 and M43 (Fig. 3), which are selectively included in brain and skeletal muscle, respectively [17]. B40 isoforms induced inclusion of the N30 exon of the NMHC-B gene more potently than M43 isoforms [17], pointing to specific functions of the isoform-specific regions. The Fox-1 gene has a conserved cassette exon designated as A53 (Fig. 3). Inclusion of the A53 exon causes a frame-shift in the C-terminal portion and, therefore, A53 isoforms lack the conserved RFAPY end, and are mostly cytoplasmic and less potent in splicing regulation [17]. As the C-terminal portions of the Fox-1 family proteins have a crucial role in splicing regulation besides serving as an NLS [20, 21], the A53 exon may be utilized for regulating the activity and/or sub-cellular localization of Fox-1 proteins. These and other alternative splicing events enable the Fox-1 and Fox-2 genes to produce various isoforms [14, 17, 19]. The expression profiles and functional properties of these isoforms, as well as the mechanisms regulating each alternative splicing event, remain to be elucidated.

Substantial amounts of inactive Fox-1 and Fox-2 isoforms are produced by skipping a cassette exon corresponding to a part of the RRM domain [17] (Fig. 3). Baraniak et al. [21] demonstrated that nucleotide sequences around exon 6, including three UGCAUG elements in the UIF region and one in the DIF region, are highly conserved among human, mouse, and rat Fox-2 genes, and that overexpression of exogenous Fox-2 led to exon 6 skipping of the endogenous Fox-2 gene. Cross-linking experiments (see below) revealed direct binding of Fox-2 protein to its own pre-mRNA in living cells [30]. These data illustrate direct negative auto-regulation of the Fox-2 gene. Similar auto-regulation is observed for C. elegans asd-1 (H.K., unpublished observation), suggesting that negative auto-regulation is a conserved and physiologically critical feature of the Fox-1 family genes.

(U)GCAUG(U) motif is a conserved intronic element enriched in the proximity of alternatively spliced exons

Early computational analysis of 25 brain-specific alternative cassette exons and adjacent introns demonstrated a highly statistically significant over-representation of UGCAUG hexanucleotide and GCAUG pentanucleotide in the proximal DIF region [31]. UGCAUG was also found at a high frequency in the DIF region of 12 muscle-specific internal exons [31]. Comparative analysis of sequences from various vertebrate genomes revealed that the UGCAUG stretch is phylogenetically and spatially conserved in the proximal DIF region of alternative exons enriched in brain, but not of non-tissue-specific alternative exons or constitutive exons, among orthologous genes [32]. Yeo et al. [33] undertook a genome-wide comparative genomics approach using available mammalian genomes, from human, dog, rat, and mouse, to identify 314 conserved intronic elements proximal to all the internal exons. Many of these elements, including UGCAUG, were enriched near known alternatively spliced exons and actually functioned as splicing regulatory elements in heterologous contexts in human cells [33].
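The over-representation analyses described above can be illustrated with a minimal sketch that counts overlapping k-mers in a set of DIF regions and compares their frequency with that in a control set. This is not the statistical procedure used in the cited studies; the function names, the pseudocount, and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter

def count_kmers(seqs, k):
    """Count every overlapping k-mer across a list of sequences (DNA is read as RNA)."""
    counts = Counter()
    for seq in seqs:
        rna = seq.upper().replace("T", "U")
        for i in range(len(rna) - k + 1):
            counts[rna[i:i + k]] += 1
    return counts

def enrichment(foreground, background, k=6, pseudocount=1.0):
    """Frequency ratio (foreground / background) for each k-mer seen in the foreground.

    The pseudocount is an illustrative smoothing choice, not taken from the literature.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values()) + pseudocount
    bg_total = sum(bg.values()) + pseudocount
    return {
        kmer: ((fg[kmer] + pseudocount) / fg_total) / ((bg[kmer] + pseudocount) / bg_total)
        for kmer in fg
    }

# Hypothetical toy DIF regions, not real genomic sequences.
dif_regulated = ["GUAAGUUGCAUGCCUUGCAUGA", "GUGAGUAUGCAUGUUUCC"]
dif_control = ["GUAAGUACCUUGGACCUUAGGA", "GUGAGUCCAUUGGAACUUCC"]

ratios = enrichment(dif_regulated, dif_control, k=6)
print(round(ratios["UGCAUG"], 2))
```

A real analysis would additionally need a significance test and a background matched for length and base composition, which this sketch leaves out.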

Recently, oligonucleotide microarrays with splicing junction and exon probes have enabled global analysis of the tissue specificity profiles of alternative splicing in mRNAs from multiple tissues and cell lines, leading to unbiased identification of candidate regulatory elements in the introns flanking alternative exons. Sugnet et al. identified 171 cassette exons that are differentially regulated in brain relative to other tissues and found that the GCAUG stretch is significantly enriched in the 150-nucleotide (nt) DIF region of brain-included exons [34]. Das et al. [35] identified 56 cassette exons that exhibited higher expression in muscle than in other normal adult tissues by using data from a human exon microarray. They demonstrated that UGCAUG was the most over-represented hexamer in the 200-nt DIF region of the identified exons in human as well as in frog, mouse, and chicken datasets [35]. Kalsotra et al. [15] identified 63 alternative splicing events that are coordinated during mouse heart development. Computational analysis of the 250-nt UIF and DIF regions identified enriched and conserved pentamer motifs including GCAUG [15].

Most recently, Castle et al. [36] generated the first genome-scale expression compendium of human alternative splicing events using custom whole-transcript microarrays that monitored 203,672 exons and 178,351 exon–exon junctions in 17,939 human genes, and thereby the expression of 24,426 alternative splicing events. They found that 9,516 splicing events were differentially expressed in at least 1 tissue out of 48 diverse human samples. A subsequent, unbiased, systematic screen of 21,760 4-mer to 7-mer words for cis-regulatory motifs identified 143 RNA ‘words’ enriched near regulated cassette exons, including UGCAUG and GCAUGU in the 200-nt DIF region of cassette exons [36]. The compendium of alternative splicing events also illustrated that the UGCAUG motif was enriched in the DIF region following upregulated cassette exons in skeletal muscle and heart, with limited enrichment in brain, adipose, and colon [36], consistent with high expression of Fox-1 and Fox-2 in skeletal muscle, brain, and heart.
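The figure of 21,760 words is what one obtains by counting every possible sequence of length 4 to 7 over a four-letter nucleotide alphabet. The short sketch below makes that arithmetic explicit; the function name is hypothetical, and the assumption that the screen covered the complete word space is inferred from the number itself rather than stated in the text.

```python
def count_words(min_len=4, max_len=7, alphabet_size=4):
    """Number of distinct words of each length from min_len to max_len, summed."""
    return sum(alphabet_size ** k for k in range(min_len, max_len + 1))

# 4**4 + 4**5 + 4**6 + 4**7 = 256 + 1024 + 4096 + 16384
print(count_words())
```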

Whole transcriptome analyses of diverse human tissues and cell lines by deep sequencing of complementary DNA fragments discovered many new exons and exon junctions, and revealed that up to 95% of human multi-exon genes undergo alternative splicing, and many of them are regulated in tissue-specific manners [37, 38]. Analysis of enrichment of hexanucleotides in regions adjacent to tissue-specific exons identified 362 motif/tissue enrichment patterns, and UGCAUG in the DIF region of exons with high inclusion in skeletal muscle was the third most significant motif/tissue pair [38]. The UGCAUG stretch was also substantially enriched in the DIF region of exons with increased inclusion in heart and brain, and in the UIF region of exons that had reduced inclusion in skeletal muscle [38]. The transcriptome analysis by deep sequencing also revealed that patterns of alternative polyadenylation were strongly correlated with those of alternative splicing across tissues, and all eight heptanucleotides NUGCAUG and UGCAUGN were highly enriched in the extension region of tandem 3′ untranslated regions (UTRs) [38].
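The "eight heptanucleotides" are simply the UGCAUG hexamer extended by one arbitrary nucleotide (N) on either side. A minimal sketch, with a hypothetical function name, enumerates them:

```python
CORE = "UGCAUG"
BASES = "ACGU"

def flanked_heptamers(core=CORE, bases=BASES):
    """Return the heptamers NUGCAUG followed by UGCAUGN for every base N."""
    upstream = [base + core for base in bases]
    downstream = [core + base for base in bases]
    return upstream + downstream

for heptamer in flanked_heptamers():
    print(heptamer)
```

The two groups do not overlap, so four bases on each side give eight distinct words.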

Venables et al. [39] utilized a sensitive and high-throughput RT-PCR platform to screen 2,186 alternative splicing events found in a human RefSeq database for those associated with ovarian cancer and breast cancer. They identified 115 cancer-associated alternative splicing events common to both ovary and breast tissues, and found an enrichment of (U/A)GCAUG sequences in the DIF region of exons significantly downregulated in cancer [39].

Global analyses of the splicing regulatory networks of the Fox-1 family

Three approaches were applied to comprehensively search for targets of splicing regulation by the Fox-1 family proteins. Two of them relied on the highly specific recognition of the UGCAUG stretch by the Fox-1 family; one study analyzed phylogenetic conservation of UGCAUG stretch(es) in various vertebrate species; the other analyzed effects of Fox-2 knockdown on alternative exons with nearby A/UGCAUG stretch(es) in human cell lines. The third utilized UV-cross-linking and immunoprecipitation (CLIP) of the protein-RNA complex for comprehensive identification of RNA molecules that directly bind to Fox-2 protein in living cells. These studies increased the number of experimentally validated Fox-dependent alternative splicing events to over 120 [39].

Zhang et al. [40] systematically searched all human internal exons and 200-nt UIF and DIF regions for conserved UGCAUG elements by phylogenetic analysis of 28 sequenced vertebrate genomes, and comprehensively predicted 1,457 target exons from 1,103 genes at a false discovery rate of approximately 24%. Microarray analysis of mRNAs from 47 tissues and cell lines on 234 of predicted target cassette exons revealed tissue-dependent inclusion level of these exons, and unbiased hierarchical clustering of tissues grouped brain, muscle, and heart in one cluster [40], confirming the prominent roles of the Fox-1 family in these tissues.

Yeo et al. [30] recently constructed an RNA map of Fox-2-regulated alternative splicing via CLIP coupled with high-throughput sequencing (CLIP-seq) in human embryonic stem cells (hESCs). They used hESCs, which unexpectedly and abundantly express Fox-2, for identifying Fox-2 binding targets because they had previously identified 1,737 internal exons predicted to undergo alternative splicing in hESCs compared to neural progenitor cells, and GCAUG as a conserved intronic element proximal to the candidate alternatively spliced exons [41]. Using CLIP-seq technology and computational analyses, they uncovered thousands of Fox-2 binding clusters in 1,876 protein-coding genes, suggesting that ~7% of human genes are subject to Fox-2 regulation in hESCs [30]. Consistent with earlier computational sequence analyses of UIF and DIF regions, Fox-2 preferentially bound to exonic, UIF, and DIF regions, especially of alternatively spliced exons [30]. The two most significantly enriched hexamers within the clusters were UGCAUG and GCAUGU [30], confirming that Fox-2 recognizes (U)GCAUG(U) in living cells. The Fox-2 CLIP tags were often clustered in regions that lack a GCAUG motif [30], implying that Fox-2 may also bind to other elements under certain conditions.

Venables et al. [39] analyzed the impact of Fox-2 knockdown on the splicing of 810 alternative exons that have at least one nearby A/UGCAUG site by utilizing a high-throughput RT-PCR platform, and identified 87 exons responsive to Fox-2 knockdown in ovarian and breast cancer cell lines. Upon Fox-2 knockdown, the splicing of most of these exons shifted in the same direction as in ovarian and breast cancer tissues, and expression of Fox-2 itself was downregulated or its splicing was altered in cancer tissues [39], suggesting that cancer-associated modulation of alternative splicing is correlated with the decreased expression of Fox-2 in cancer tissues.

Global identification of many predicted target genes in these studies allowed gene ontology (GO) analysis of the putative target genes to reveal splicing regulatory networks of the Fox-1 family. Zhang et al. [40] reported that predicted targets of the Fox-1 family are enriched in genes with neuromuscular functions, as reflected in top GO terms related to cytoskeleton organization, ion channels, protein phosphorylation, muscle contraction, etc. They also mentioned that the predicted Fox-1 family targets are more likely to be disease genes in the OMIM (Online Mendelian Inheritance in Man) database [40]. Fox-1 was originally identified as an interactor of ataxin-2 [13], and both Fox-1 and Fox-2 interact with ataxin-1 [42]; because ataxin-1 and ataxin-2 are mutated in spinocerebellar ataxia, characterization of the splicing regulatory network of the Fox-1 family will be important for understanding these neurological and other diseases. GO analysis of the Fox-2 CLIP tag clusters in hESCs revealed an enrichment of RNA-binding proteins, nuclear mRNA splicing factors, and serine/threonine kinases [30]. Among these Fox-2 target genes were heterogeneous nuclear ribonucleoproteins (hnRNPs) such as A2/B1, H1, H2, PTB, and R, alternative splicing regulators including Fox-1, PTB, nPTB, QKI, SRp20, SRp40, SRp55, SFRS11, and Tra2α, and other RNA-binding proteins important for stem-cell biology [30]. The finding that many Fox-2 targets in hESCs are themselves splicing regulators, and that Fox-2 is important for the survival of hESCs but not of other types of cells, implied that Fox-2 may function as an upstream regulator of the splicing network critical for maintaining hESCs [30].

Experimental validation of many predicted target genes for the Fox-1 family confirmed a previously demonstrated trend with respect to exon inclusion or skipping, depending on the location of the UGCAUG element. The alternative exon is included when the Fox-1 family binds to the UGCAUG element in the DIF region, while the exon usage is repressed when the Fox-1 family binds to the UGCAUG element in the UIF region [30, 40] (Fig. 2). The RNA map for the Fox-1/UGCAUG-mediated alternative splicing is reminiscent of an RNA map for a brain-specific splicing regulator family NOVA [43, 44]. These global studies, however, also demonstrated complexity of alternative splicing regulation by the Fox-1 family. The UGCAUG element is often conserved in the UIF and DIF regions of apparent constitutive exons, whose splicing was not necessarily affected by expression or depletion of the Fox-1 family proteins [30, 40]. Neural progenitors differentiated from hESCs and fetal neural stem cells, which also express Fox-2, show different splicing patterns of the validated Fox-2 target genes in hESCs [30]. These facts underscore the importance of experimentally identifying in vivo targets of the Fox-1 family, and presumably of other splicing regulators, in each cell and tissue context.
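The position-dependent trend can be written down as a minimal sketch that scans the UIF and DIF regions of an exon for the GCAUG core and reports the direction expected from the simple rule alone. The function names, labels, and example sequences are hypothetical; as noted above, exonic elements, cell context, and conserved motifs near constitutive exons mean that such a prediction is only a first approximation and needs experimental confirmation.

```python
import re

# Zero-width lookahead so that overlapping sites (e.g. GCAUGCAUG) are all reported.
GCAUG_SITE = re.compile(r"(?=GCAUG)")

def find_sites(region):
    """Return 0-based start positions of every GCAUG core in a region."""
    rna = region.upper().replace("T", "U")
    return [match.start() for match in GCAUG_SITE.finditer(rna)]

def predict_effect(uif, dif):
    """Apply the simple positional rule: DIF sites enhance, UIF sites repress."""
    upstream_sites = find_sites(uif)
    downstream_sites = find_sites(dif)
    if upstream_sites and downstream_sites:
        return "ambiguous"
    if downstream_sites:
        return "inclusion"
    if upstream_sites:
        return "repression"
    return "no site"

# Hypothetical flanking sequences for illustration only.
print(predict_effect(uif="CCUUUGCAUGCCAG", dif="GUAAGUACCUUA"))
print(predict_effect(uif="CCUUUACCUUCCAG", dif="GUAAGUUGCAUGUU"))
```

Matching only the GCAUG core covers both the pentamer and the UGCAUG hexamer; a stricter scan could require the leading U.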

Coordinated alternative splicing regulation by the Fox-1 family and other splicing regulators

Several studies focusing on molecular mechanisms with which the Fox-1 family together with other splicing regulators ‘represses’ inclusion of regulated exons will be summarized in this section. Little is known, on the other hand, about how the Fox-1 family ‘promotes’ inclusion of cassette exons via binding to UGCAUG element(s) in the DIF region, as is the case with other tissue-specific alternative splicing regulators. A clue to elucidating the promotion mechanism may be interaction between Fox-1/Fox-2 and a U1 small nuclear ribonucleoprotein (snRNP)-specific protein U1-C in a yeast two-hybrid system [45], which is reminiscent of the case of TIA-1 recruiting U1 snRNP to the 5′ splice site [46].

Zhou et al. [26, 49] focused on the alternative processing of calcitonin/CGRP pre-mRNA. Calcitonin-specific exon 4 is regulated by a balance between competing effects of the Fox-1 family proteins binding to UGCAUG elements at positions −34 in the UIF region and +45 in exon 4, and of Tra2β and SRp55 binding to exonic splicing enhancers (ESEs) [26, 47, 48]. Zhou et al. [49] demonstrated that the Fox-1 family proteins bound to the −34 UGCAUG silencer element to prevent SF1 from binding to the branch point without affecting U1 snRNP binding to the pre-mRNA, and that the −34 UGCAUG element repressed formation of the pre-spliceosome E′ complex (Fig. 4), a pre-spliceosome complex formed in U2AF-depleted HeLa nuclear extracts prior to early (E) complex formation [50]. They also demonstrated that the Fox-1 family proteins interfered with binding of Tra2β and SRp55 to the ESEs via the +45 UGCAUG element and that the +45 UGCAUG element blocked recruitment of U2AF65 and formation of the pre-spliceosome E complex [49] (Fig. 4). These results suggested a fail-safe two-step model for repression of calcitonin-specific exon 4 by the Fox-1 family and the two UGCAUG elements [49].

Fig. 4

A model for repression of prespliceosome complex formation by the Fox-1 family. The repression of calcitonin-specific exon 4 of calcitonin/CGRP pre-mRNA in neuronal cells by the Fox-1 family involves two distinct regulatory events. First, the −34 element in the UIF region prevents E′ complex formation through repressing SF1 binding to the branch point (B). Second, the +45 exonic element blocks transition to E complex via inhibiting U2AF65 binding to polypyrimidine tract (PY). This figure is modified from [49]

Fukumura et al. [20, 51] utilized exon 9 of the human F1γ gene as a model system of Fox-1-mediated exon skipping. Fox-1 repressed F1γ exon 9 by inhibiting splicing of the downstream intron 9 via GCAUG elements in the UIF region, without affecting the efficiency of splicing of intron 8, where the GCAUG elements reside. Unexpectedly, U1 snRNP components were specifically absent from the pre-spliceosomal E complex formed on intron 9 [51], and Fox-1 prevented formation of the E complex in the in vitro splicing reaction with HeLa nuclear extract [20]. F1γ intron 9 was efficiently spliced in U1-disrupted Xenopus oocytes and in nuclear extracts from U1-disrupted HeLa cells, confirming that F1γ intron 9 was the natural substrate of U1-independent and U2-dependent splicing [51]. Mutations in the 5′ splice site of intron 9 conferred U1 dependency and concomitantly impaired Fox-1-mediated regulation, indicating that U1-independent splicing, and presumably a suboptimal 5′ splice site sequence, are indispensable for the Fox-1-mediated repression of F1γ exon 9 [51]. It is worth investigating whether Fox-1-mediated repression of alternative exons in other target genes is generally accompanied by U1-independent splicing.

The fibroblast growth factor receptor 2 (FGFR2) transcripts are spliced in a cell-type-specific manner, leading to the mutually exclusive use of exon IIIb in epithelia or exon IIIc in mesenchyme, and the alternative exons determine the ligand-specificity of the receptor. Baraniak et al. [21, 52] demonstrated that expression of Fox-2 can modestly activate exon IIIb and strongly repress exon IIIc via UGCAUG elements in the UIF and exonic regions of exon IIIc in epithelial cells. Mauger et al. [53] demonstrated that hnRNP H and hnRNP F proteins act as silencers of exon IIIc via binding to exonic GGG motifs overlapping with a critical ESE, and that hnRNP H1 forms a complex with Fox-2 to better antagonize ASF/SF2 binding to the ESE. Fox-2 is a critical but insufficient regulator for mesenchymal-epithelial transitions, as mesenchyme-like cells expressing Fox-2 express the FGFR2(IIIc) form [21]. Recent cell-based cDNA expression screening identified the epithelial cell-type-specific regulators epithelial splicing regulatory proteins 1 and 2 (ESRP1 and ESRP2), which specifically bind to the ISE/ISS-3 element in the UIF region of exon IIIc, as critical regulators for FGFR2(IIIc)-to-(IIIb) switching [54]. The mutually exclusive alternative splicing of the FGFR2 gene has been utilized as a model system for visualization of the splicing patterns with fluorescent proteins [55–57], which will be reviewed elsewhere (H.K., in press).

Fox-1 family/UGCAUG-mediated alternative splicing is evolutionarily conserved

Caenorhabditis elegans is an excellent model organism for studying the regulatory mechanisms of alternative splicing in vivo. At least 5% of total genes in C. elegans are alternatively spliced in manners similar to those in vertebrates [58, 59]. Kabat et al. [60] compared the sequence of the C. elegans genome with that of a related nematode species, C. briggsae, and identified 147 alternatively spliced cassette exons that exhibit high nucleotide conservation in the UIF and DIF regions. They analyzed the frequency of pentamers and hexamers in the conserved intronic elements and found that high-scoring nematode motifs corresponded to known mammalian splicing regulatory elements, including (U)GCAUG [60]. In vivo experiments with transgenic reporter mini-genes confirmed the involvement of these conserved elements in alternative splicing regulation [60], suggesting that mechanisms of alternative splicing regulation are well conserved in metazoans.
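The pentamer/hexamer frequency analysis can be illustrated with a minimal Python sketch. It is not the scoring scheme used by Kabat et al. [60]; the function names, the pseudocount-based enrichment ratio, and the toy sequences are all assumptions made for illustration only.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        rna = seq.upper().replace("T", "U")  # accept DNA or RNA input
        for i in range(len(rna) - k + 1):
            counts[rna[i:i + k]] += 1
    return counts


def kmer_enrichment(foreground, background, k):
    """Frequency ratio (foreground / background) for each foreground k-mer.

    A pseudocount of 1 is added everywhere (an illustrative choice) so that
    k-mers absent from the background do not cause division by zero.
    """
    fg = count_kmers(foreground, k)
    bg = count_kmers(background, k)
    fg_total = sum(fg.values()) + 1
    bg_total = sum(bg.values()) + 1
    return {
        kmer: ((n + 1) / fg_total) / ((bg[kmer] + 1) / bg_total)
        for kmer, n in fg.items()
    }


# Hypothetical toy sequences, not real intronic data.
conserved = ["AAUGCAUGUU", "CCUGCAUGGA"]
control = ["AACCGGUUAA", "GGAUCCAAUU"]

scores = kmer_enrichment(conserved, control, 6)
top = max(scores, key=scores.get)
print(top, round(scores[top], 2))
```

In a real analysis the foreground would be the conserved intronic elements and the background a matched set of non-conserved intronic sequence, with a statistical test replacing the simple ratio used here.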

Recent genetic analysis demonstrated that the Fox-1 family regulates tissue-specific alternative splicing in C. elegans. The target gene egl-15 encodes the sole homolog of fibroblast growth factor receptors (FGFRs) in C. elegans [61], and selection of its mutually exclusive alternative exons, 5A and 5B (Fig. 5a), confers ligand specificity on the receptor [62–65]. We visualized tissue-specific expression profiles of the egl-15 alternative exons in vivo by utilizing green fluorescent protein (GFP) and red fluorescent protein (RFP) [16]. Screening for mutants defective in expression of the fluorescent splicing reporters in muscles led to identification of the Fox-1 family proteins ASD-1 (alternative-splicing-defective-1) and FOX-1 and the muscle-specific RNA-binding protein SUP-12 as regulators of muscle-specific repression of egl-15 exon 5B [16, 66]. In muscles, the Fox-1 family and SUP-12 cooperatively bind to UGCAUG and GUGUG stretches, respectively, in the UIF region to repress exon 5B, presumably by interfering with recognition of a branch point, leading to inclusion of exon 5A [16, 66] (Fig. 5a, b). A remarkable phenotype of the asd-1; fox-1 double mutant is an egg-laying defect (Egl) due to aberrant migration of sex myoblasts in hermaphrodites [16] (Fig. 5c, d). The same phenotype has been reported for mutants that specifically lack the EGL-15 (5A) isoform [65] or the EGL-15 (5A)-specific ligand EGL-17 [63, 64], indicating that the major role of the Fox-1 family in C. elegans is to repress exon 5B and promote exon 5A of the egl-15 gene.

Fig. 5

Fox-1 family proteins ASD-1 and FOX-1 regulate tissue-specific mutually exclusive alternative splicing of the FGFR gene, egl-15, in C. elegans. a Tissue-specific mutually exclusive selection of exons of the egl-15 gene of C. elegans. Note that muscle-specific exon 5A resides downstream of exon 5B. b Schematic illustration of muscle-specific repression of exon 5B. The nucleotide sequence of the UIF region is presented. The cis-elements for tissue-specific regulation are in orange. Fox, ASD-1 or FOX-1; SUP, SUP-12. (c,d) Left lateral view of adult hermaphrodite worms expressing the egl-15 alternative splicing reporter in body wall muscles and vulval muscles in the wild-type (wt) background (c) and the asd-1; fox-1 double mutant background (d). Merged images of DIC and confocal images of GFP (green) and RFP (magenta). The egl-15 alternative splicing reporter worms express either RFP, representing exon 5A selection, or GFP, representing exon 5B selection, in body wall muscles and vulval muscles (see [16] for detail). Note that in the asd-1; fox-1 double mutant, vulval muscles (arrowheads) express GFP and are mislocalized off the vulva (Vul, arrows), and late embryos with reporter expression in body wall muscles are retained in the uterus. Anterior is to the left. Scale bar in (c) 50 μm. (c,d) are modified from [16]

The RRM domain of C. elegans FOX-1 is 77% identical to that of human Fox-1 (Fig. 1b), and not only the amino acid residues whose side chains directly contact the UGCAUGU heptamer RNA but also the complete side of the RRM domain facing the UGCAUGU are conserved [23] (Fig. 1a, d). It is therefore natural that C. elegans FOX-1 and ASD-1 also specifically recognize UGCAUG [16]. The studies on alternative splicing regulation of the egl-15 gene provided the first genetic evidence that two families of trans-factors cooperatively regulate tissue-specific alternative splicing of a specific target gene in vivo through cooperative binding to their juxtaposed cis-elements in a subset of tissues where the trans-factors are co-expressed [16, 66]. The coordination of multiple trans-factors via multiple cis-elements would be a common mechanism for specific and robust regulation of tissue-specific alternative splicing in higher organisms. Furthermore, determination of the ligand specificity of FGFRs by Fox-1 family-mediated alternative splicing shows a marked analogy between vertebrates and nematodes, suggesting selective pressure for conservation. The fully sequenced genomes of C. elegans and other nematodes, combined with the elegant genetics of C. elegans, thus offer unique advantages for exploring alternative splicing regulation in metazoans [67, 68].

C. elegans SPN-4 is an outlying member of the Fox-1 family (Fig. 1a). Genetic evidence indicated that SPN-4 positively regulates translation of maternal GLP-1 mRNA in early blastomeres by counteracting a CCCH zinc-finger protein, POS-1, via direct binding to the 3′ UTR of the GLP-1 mRNA, which does not have a UGCAUG stretch [69]. Consistent with its function as a translational regulator and the fact that it lacks a PY-NLS, SPN-4 was exclusively localized in the cytoplasm in immunohistochemical staining [69]. Although SPN-4 is most closely related to the Fox-1 family among RNA-binding protein families, its apparent counterpart is missing in higher eukaryotes and, therefore, its evolutionary origin is unclear.

Role of the fox-1 gene in sex determination of C. elegans

The founder member of the Fox-1 family, FOX-1 of C. elegans, was originally identified as a numerator element of its sex chromosome. Sex in C. elegans is determined by the ratio of X chromosome number to autosomal set number; the chromosome-counting mechanism reliably distinguishes the twofold difference in X-chromosome dose between males (XO, one X chromosome) and hermaphrodites (XX, two X chromosomes) [70]. Genetic studies with X chromosome duplications and transgenic overexpression suggested that FOX-1 is one of only a small number of major numerator sites on the X chromosome; when overexpressed, XX animals were viable, but XO animals were lethal and feminized [71].

The target gene of FOX-1 in sex determination is believed to be xol-1 (XO Lethal), the master sex-determination switch gene that specifies the male fate when active and the hermaphrodite fate when inactive [70]. The dose-sensitive signal elements on chromosome X control xol-1 through two different molecular mechanisms: a nuclear hormone receptor-like protein, SEX-1, represses xol-1 transcription [72], and FOX-1 post-transcriptionally downregulates the xol-1 level [73]. The small quantitative difference in X chromosome dosage is thus translated into the ‘on/off’ response of the xol-1 gene [70, 74]. The xol-1 pre-mRNA is alternatively spliced into three mRNA isoforms, only one of which is necessary and sufficient for xol-1 activity in XO animals [75], and disruption and overexpression of fox-1 affected the functional xol-1 mRNA level in XO animals [76]. These genetic studies and the presence of multiple (U)GCAUG elements in the xol-1 pre-mRNA strongly suggest that FOX-1 regulates processing of xol-1 pre-mRNA in a sex-dependent manner. However, it has not yet been experimentally demonstrated which processing event is enhanced or repressed by FOX-1 expression. This is at least in part due to the low expression level and rapid degradation of xol-1 mRNA in hermaphrodites, in which SEX-1 and FOX-1 are active. The fox-1 single mutant shows no apparent phenotype, and even the asd-1; fox-1 double mutant is normal in hermaphrodite fate specification [16], suggesting that sex determination in C. elegans does not strictly rely on the post-transcriptional regulation of xol-1 by the Fox-1 family.

Perspectives

As described in this review, the (U)GCAUG element is one of the most widely distributed and evolutionarily conserved splicing regulatory elements in metazoans, and the Fox-1 family is the only subset of RNA-binding proteins known to specifically recognize the (U)GCAUG element and regulate splicing of nearby exons. One of the remaining questions about the Fox-1 family is the functional difference between Fox-1 and Fox-2, which has not yet been described in the literature. Conditional knockout models will reveal specific and redundant functions of each of the Fox-1 family proteins in vivo.

Deep sequencing analysis of the transcriptome revealed a strong correlation between alternative splicing and alternative cleavage and polyadenylation, and enrichment of the UGCAUG stretch in 3′ UTRs [38], raising the interesting hypothesis that the Fox-1 family is also involved in 3′ UTR-related functions such as regulation of alternative cleavage and polyadenylation, mRNA stability, localization, or translation. Indeed, recent purification of human mRNA 3′ processing complexes revealed that Fox-2 is one of ~85 protein components of the complex [77]. These findings are reminiscent of the CLIP-seq analysis of NOVA that led to the discovery that NOVA interacts with 3′ UTRs of target mRNAs and regulates alternative polyadenylation in the brain [43]. Further biochemical analysis will reveal how these tissue-specific trans-acting factors coordinate with the core machineries for splicing and 3′-end processing at the molecular level.

Another challenging problem is refinement of the ‘RNA map’ or location-dependent activity of the Fox-1 family via (U)GCAUG element(s). Recent global analysis of target genes indicated that the Fox-1 family is not necessarily the major regulator for all target genes. Indeed, coordinating or antagonizing activities of other RNA-binding proteins are described in this review. Future global analyses should be directed at identifying direct target genes for multiple trans-factors and elucidating ‘RNA maps’ for cooperating and/or antagonizing cis-elements in various tissues. Such systematic analyses will lead to a comprehensive understanding of splicing regulatory networks of the Fox-1 family and other tissue-specific regulators in metazoans.
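As a rough illustration of the bookkeeping that underlies such an ‘RNA map’, the following Python sketch records where (U)GCAUG elements lie relative to an alternative exon. The function name, the 0-based half-open exon coordinates, and the toy sequence are assumptions for illustration; the sketch only locates elements and makes no prediction about whether a given element enhances or represses the exon.

```python
import re


def map_fox_elements(pre_mrna, exon_start, exon_end):
    """List (region, distance, motif) for each GCAUG in a pre-mRNA.

    exon_start/exon_end are 0-based, half-open coordinates of the exon
    (an assumed convention). distance is the number of nucleotides
    between the GCAUG core and the nearest exon boundary.
    """
    seq = pre_mrna.upper().replace("T", "U")  # accept DNA or RNA input
    hits = []
    # Lookahead so that overlapping cores (e.g. GCAUGCAUG) are all found.
    for match in re.finditer(r"(?=GCAUG)", seq):
        pos = match.start()
        has_u = pos > 0 and seq[pos - 1] == "U"
        motif = "UGCAUG" if has_u else "GCAUG"
        core_end = pos + 5
        if core_end <= exon_start:
            hits.append(("upstream intron", exon_start - core_end, motif))
        elif pos >= exon_end:
            hits.append(("downstream intron", pos - exon_end, motif))
        else:
            hits.append(("exon or exon boundary", 0, motif))
    return hits


# Hypothetical toy pre-mRNA: 10-nt intron, 6-nt exon, 9-nt intron.
toy = "AAUGCAUGAA" + "CCCCCC" + "AAGCAUGAA"
for hit in map_fox_elements(toy, exon_start=10, exon_end=16):
    print(hit)
```

Aggregating such positions over many experimentally validated target exons, together with the observed direction of regulation, is what would turn this kind of positional list into an actual map.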