Notable clustering of transcription-factor-binding motifs in human pericentric regions and its biological significance

Chromosome Res. 2013 Aug;21(5):461-74. doi: 10.1007/s10577-013-9371-y. Epub 2013 Jul 30.

Abstract

Because oligonucleotide composition in genome sequences varies significantly among species, even among species with the same genomic G + C content, it has been used to distinguish a wide range of genomes and is called the "genome signature". Oligonucleotides often represent motif sequences responsible for sequence-specific protein binding (e.g., transcription-factor binding). Occurrences of such motif oligonucleotides in the genome should therefore be biased relative to those expected in random sequences, and the bias may differ among genomes and among genomic regions. The Self-Organizing Map (SOM) is a powerful tool for clustering high-dimensional data, such as oligonucleotide composition, on a single plane. We previously modified the conventional SOM for genome informatics, developing the batch-learning SOM (BLSOM). When we constructed BLSOMs to analyze pentanucleotide composition in 20-, 50-, and 100-kb sequences derived from the human genome, the BLSOMs did not classify human sequences according to chromosome but revealed several specific zones composed primarily of sequences derived from pericentric regions. Interestingly, various transcription-factor-binding motifs were characteristically overrepresented in pericentric regions but underrepresented in most genomic sequences. When we focused on much shorter sequences (e.g., 1 kb), clustering of transcription-factor-binding motifs was evident in pericentric, subtelomeric, and sex-chromosome pseudoautosomal regions. The biological significance of this clustering is discussed in connection with cell-type- and stage-dependent chromocenter formation and nuclear organization.
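The analysis outlined in the abstract (counting oligonucleotide frequencies in fixed-length windows, asking whether a motif occurs more often than in a random sequence, and clustering the composition vectors on a two-dimensional map) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' BLSOM implementation: all function names are hypothetical, the "random sequence" expectation assumes independent bases drawn at the sequence's own base frequencies, and the map is a generic batch SOM with a simplified initialization and neighborhood schedule chosen for brevity.

```python
import itertools
import math

BASES = "ACGT"


def kmer_composition(seq, k=5):
    """Relative frequency of each of the 4**k k-mers, in lexicographic order.

    Uses a sliding step of 1 and skips k-mers containing non-ACGT
    characters (e.g. N). k=5 gives the 1,024 pentanucleotides.
    """
    kmers = ["".join(p) for p in itertools.product(BASES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def obs_exp_ratio(seq, motif):
    """Observed motif count / count expected in a random sequence.

    Assumption: the random model draws bases independently with the
    base frequencies of `seq` itself. Ratio > 1 means overrepresented.
    """
    seq, motif = seq.upper(), motif.upper()
    n = len(seq) - len(motif) + 1
    observed = sum(seq[i:i + len(motif)] == motif for i in range(n))
    freq = {b: seq.count(b) / len(seq) for b in BASES}
    expected = n * math.prod(freq[b] for b in motif)
    return observed / expected if expected else float("nan")


def split_windows(seq, size):
    """Non-overlapping windows (e.g. 1 kb); a shorter trailing piece is dropped."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(x, weights):
    """Index of the node whose weight vector is closest to x (ties: lowest index)."""
    return min(range(len(weights)), key=lambda j: _sq_dist(x, weights[j]))


def train_batch_som(data, rows, cols, epochs=10):
    """Generic batch SOM on a rows x cols grid (not the authors' BLSOM code).

    Each epoch assigns every vector to its best-matching unit, then sets each
    node to the neighborhood-weighted mean of all vectors, so the update rule
    is independent of input order (up to floating-point rounding).
    """
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    # Simplifying assumption: seed nodes with evenly spaced input vectors.
    # Unlike the update rule, this choice does depend on input order.
    weights = [list(data[j * len(data) // len(grid)]) for j in range(len(grid))]
    dim = len(data[0])
    for epoch in range(epochs):
        # Gaussian neighborhood radius shrinks linearly, never below 1 node.
        sigma = max(1.0, max(rows, cols) / 2 * (1 - epoch / epochs))
        bmus = [grid[best_matching_unit(x, weights)] for x in data]
        new_weights = []
        for r, c in grid:
            num, den = [0.0] * dim, 0.0
            for x, (br, bc) in zip(data, bmus):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                den += h
                for d in range(dim):
                    num[d] += h * x[d]
            new_weights.append([v / den for v in num])
        weights = new_weights
    return weights


if __name__ == "__main__":
    # Toy input: an AT repeat and a GC repeat cut into 100-base windows;
    # k=3 keeps the demo fast (the study used pentanucleotides).
    windows = split_windows("AT" * 500, 100) + split_windows("GC" * 500, 100)
    data = [kmer_composition(w, k=3) for w in windows]
    weights = train_batch_som(data, rows=2, cols=2)
    print([best_matching_unit(x, weights) for x in data])
```

On this toy input, all windows from one repeat share a node and the two repeats land on different nodes. A run on pentanucleotide vectors from 1- to 100-kb human windows would likely need a vectorized implementation rather than nested Python loops.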

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Base Sequence
  • Binding Sites*
  • Chromosome Mapping
  • Cluster Analysis
  • Computational Biology / methods*
  • Consensus Sequence
  • Databases, Genetic
  • Genome, Human*
  • Genomics / methods*
  • Humans
  • Nucleotide Motifs*
  • Transcription Factors / metabolism*

Substances

  • Transcription Factors