TY - JOUR
T1 - Short tandem repeats bind transcription factors to tune eukaryotic gene expression
AU - Horton, Connor A.
AU - Alexandari, Amr M.
AU - Hayes, Michael G.B.
AU - Marklund, Emil
AU - Schaepe, Julia M.
AU - Aditham, Arjun K.
AU - Shah, Nilay
AU - Suzuki, Peter H.
AU - Shrikumar, Avanti
AU - Afek, Ariel
AU - Greenleaf, William J.
AU - Gordân, Raluca
AU - Zeitlinger, Julia
AU - Kundaje, Anshul
AU - Fordyce, Polly M.
N1 - Publisher Copyright:
Copyright © 2023 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
PY - 2023/9/22
Y1 - 2023/9/22
N2 - INTRODUCTION: Gene expression is regulated by transcription factor (TF) proteins that bind DNA-regulatory elements in the genome. Despite decades of research cataloging TF “motifs,” these do not fully explain observed genomic binding in cells. Many TFs bind regions lacking motifs, whereas other regions with apparently strong motifs remain unoccupied, and emerging evidence suggests that the DNA sequence context surrounding motifs can strongly affect binding (see the figure, panel A). Short tandem repeats (STRs, consecutively repeated units of one to six nucleotides) provide a good example of these sequence contexts. STRs comprise ~5% of the human genome (compared with 1.5% for all protein-coding genes) and are enriched in enhancers. Variations in STR length have been associated with changes in gene expression and implicated in several complex phenotypes, such as schizophrenia, cancer, autism, and Crohn’s disease. However, the mechanism by which STRs affect transcription remains unknown. RATIONALE: One mechanism by which STRs could affect gene expression is by altering the affinity and/or kinetics of TF binding to regulatory DNA (see the figure, panel A). To investigate this, we used various high-throughput microfluidic binding assays (i.e., MITOMI, k-MITOMI, and STAMMP) and bioinformatic analyses to systematically quantify the impacts of different sequence contexts on TF binding. We measured affinities (Kds) and kinetics (koffs) for two basic helix-loop-helix TFs that bind a CACGTG E-box motif (Pho4 from Saccharomyces cerevisiae and MAX from Homo sapiens) to DNA sequences with or without an E-box motif surrounded by random sequence or multiple different types of STRs (see the figure, panel B). RESULTS: Measured binding constants (Kds) for 609 distinct TF-DNA combinations revealed that different STRs can alter binding affinities by >70-fold (see the figure, panel C), approaching or exceeding effects from mutating the consensus motif. 
Preferred STRs differed for Pho4 and MAX TFs, demonstrating that motifs are not sufficient to predict preferred STRs. Gel-shift assays and additional experiments using TF truncation constructs established that TFs directly bind STRs (see the figure, panel C) through their DNA-binding domains in the presence or absence of motifs. Although not predicted by standard mononucleotide models, the observed STR binding is well explained by a simple partition function model from statistical mechanics in which multiple repeated weak binding sites contribute additively to binding affinity (see the figure, panel D). Measured apparent dissociation rates (koffs) for 106 TF-DNA combinations and kinetic modeling suggested that STRs primarily alter macroscopic apparent association rates and increase the local density of DNA-bound TFs. Finally, neural networks trained only on in vivo genome-wide chromatin immunoprecipitation data predict effects identical to those measured in vitro, suggesting that STR preferences play a substantial role in properly localizing TFs in cells. CONCLUSION: Analysis of previously published protein-binding microarray and SELEX data suggests that ~90% of eukaryotic TFs preferentially bind at least one type of STR (see the figure, panel E). Because STRs are highly mutable, we propose that they should be considered an easily evolvable class of cis-regulatory elements. Preferred STRs need not resemble known motifs, suggesting a mechanism by which TF paralogs can be recruited to different regulatory regions and regulate distinct target genes. Although STRs maximize the number of potential weak binding sites, we anticipate that nonrepetitive sequence contexts containing many low-affinity binding sites should similarly increase binding. Thus, we propose that STRs function as “rheostats” to tune local TF concentration and binding responses to regulate gene expression in disease, development, and homeostasis.
UR - http://www.scopus.com/inward/record.url?scp=85171955389&partnerID=8YFLogxK
U2 - 10.1126/science.add1250
DO - 10.1126/science.add1250
M3 - Article
C2 - 37733848
AN - SCOPUS:85171955389
SN - 0036-8075
VL - 381
JO - Science
JF - Science
IS - 6664
M1 - add1250
ER -