CGGBP1

gene
Also known as p20-CGGBP, CGGBP

Summary

CGGBP1 (CGG triplet repeat binding protein 1, HGNC:1888) is a protein-coding gene on chromosome 3p11.1, encoding CGG triplet repeat-binding protein 1 (Q9UFW8). The protein binds to nonmethylated 5’-d(CGG)(n)-3’ trinucleotide repeats in the FMR1 promoter.

This gene encodes a CGG repeat-binding protein that primarily localizes to the nucleus. CGG trinucleotide repeats are implicated in many disorders as they often act as transcription- and translation-regulatory elements, can produce hairpin structures which cause DNA replication errors, and form regions prone to chromosomal breakage. CGG repeats are also targets for CpG methylation. In addition to its ability to bind CGG repeats and regulate transcription, this gene is believed to play a role in DNA damage repair and telomere protection. In vitro studies indicate this protein does not bind to methylated CpG sequences.

Source: NCBI Gene 8545 — RefSeq curated summary.

At a glance

  • GWAS associations: 8
  • Clinical variants (ClinVar): 69 total
  • Druggable target: yes
  • MANE Select transcript: NM_001008390

Identifiers

Gene identifiers

Field | Value
HGNC ID | HGNC:1888
Approved symbol | CGGBP1
Name | CGG triplet repeat binding protein 1
Location | 3p11.1
Locus type | gene with protein product
Status | Approved
Aliases | p20-CGGBP, CGGBP
Ensembl gene | ENSG00000163320
Ensembl biotype | protein_coding
OMIM | 603363
Entrez | 8545

Gene structure

Transcript identifiers

Ensembl transcripts: 11 — 10 protein_coding, 1 protein_coding_CDS_not_defined

ENST00000309534, ENST00000398392, ENST00000462901, ENST00000467332, ENST00000474441, ENST00000482016, ENST00000675130, ENST00000894700, ENST00000913949, ENST00000913951, ENST00000964552

RefSeq mRNA: 3 (MANE Select: NM_001008390): NM_001008390, NM_001195308, NM_003663

CCDS: CCDS43111

Canonical transcript exons

ENST00000482016 — 4 exons

Exon | Start | End
ENSE00001533031 | 88057191 | 88057292
ENSE00001533033 | 88058019 | 88058223
ENSE00001823242 | 88058815 | 88059005
ENSE00001909098 | 88051950 | 88055999
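
The exon coordinates above can be converted into exon lengths with a short calculation. The sketch below assumes that Start and End are 1-based, inclusive positions (the usual convention for Ensembl exports), so each length is end - start + 1. The names `EXONS` and `exon_length` are illustrative and not part of any Ensembl API.

```python
# Exon coordinates copied from the table above (chromosome 3).
# Assumption: Start/End are 1-based and inclusive, as in Ensembl exports.
EXONS = [
    ("ENSE00001533031", 88057191, 88057292),
    ("ENSE00001533033", 88058019, 88058223),
    ("ENSE00001823242", 88058815, 88059005),
    ("ENSE00001909098", 88051950, 88055999),
]


def exon_length(start: int, end: int) -> int:
    """Length in base pairs of the closed interval [start, end]."""
    return end - start + 1


# Map each exon ID to its length and sum the exonic sequence.
lengths = {exon_id: exon_length(start, end) for exon_id, start, end in EXONS}
total_exonic_bp = sum(lengths.values())

for exon_id, bp in lengths.items():
    print(f"{exon_id}\t{bp} bp")
print(f"total\t{total_exonic_bp} bp")
```

Under that assumption the four exons are 102, 205, 191 and 4050 bp long, for 4548 bp of exonic sequence in total.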

Expression profiles

Bgee: expression breadth ubiquitous, 290 present calls, max score 97.29.

FANTOM5 (CAGE): breadth ubiquitous, TPM avg 42.8188 / max 260.8929, expressed in 1823 samples.

FANTOM5 promoters (5 alternative TSS)

Promoter ID | TPM avg | Samples expressed
43294 | 39.0747 | 1823
43293 | 1.7051 | 730
43297 | 0.8552 | 485
43296 | 0.7493 | 399
43295 | 0.4345 | 174

Top tissues by expression

299 total, by Bgee expression score (0-100, higher = more expressed):

Tissue | Anatomy ID | Expression score | Quality
germinal epithelium of ovary | UBERON:0001304 | 97.29 | gold quality
visceral pleura | UBERON:0002401 | 97.27 | gold quality
cortical plate | UBERON:0005343 | 97.23 | gold quality
epithelium of nasopharynx | UBERON:0001951 | 96.92 | gold quality
ventricular zone | UBERON:0003053 | 96.83 | gold quality
parietal pleura | UBERON:0002400 | 96.76 | gold quality
pleura | UBERON:0000977 | 96.70 | gold quality
ganglionic eminence | UBERON:0004023 | 96.64 | gold quality
pancreatic ductal cell | CL:0002079 | 96.54 | gold quality
superficial temporal artery | UBERON:0001614 | 96.37 | gold quality
tibia | UBERON:0000979 | 96.23 | gold quality
lymph node | UBERON:0000029 | 96.19 | gold quality
caput epididymis | UBERON:0004358 | 96.10 | gold quality
endometrium | UBERON:0001295 | 96.01 | gold quality
mammary duct | UBERON:0001765 | 95.92 | gold quality
urethra | UBERON:0000057 | 95.84 | gold quality
tonsil | UBERON:0002372 | 95.81 | gold quality
renal medulla | UBERON:0000362 | 95.77 | gold quality
pylorus | UBERON:0001166 | 95.76 | gold quality
cauda epididymis | UBERON:0004360 | 95.69 | gold quality
monocyte | CL:0000576 | 95.65 | gold quality
trabecular bone tissue | UBERON:0002483 | 95.60 | gold quality
skin of hip | UBERON:0001554 | 95.56 | gold quality
cardia of stomach | UBERON:0001162 | 95.50 | gold quality
mononuclear cell | CL:0000842 | 95.48 | gold quality
pericardium | UBERON:0002407 | 95.46 | gold quality
leukocyte | CL:0000738 | 95.41 | gold quality
trigeminal ganglion | UBERON:0001675 | 95.40 | gold quality
esophagus squamous epithelium | UBERON:0006920 | 95.25 | gold quality
skeletal muscle tissue of rectus abdominis | UBERON:0004511 | 95.24 | gold quality

Single-cell (SCXA)

Detected in 1 experiment(s), a significant marker in 1.

Experiment | Marker? | Max mean expression
E-ANND-3 | yes | 5.94

Regulation

Is transcription factor: yes

Downstream targets (CollecTRI)

1 target.

Target | Regulation
FMR1 | Repression

JASPAR motifs

Motif | Name | Family
MA2538.1 | CGGBP1 | -

JASPAR matrix evidence (PMIDs): PMID:33561217

Upstream regulators (CollecTRI, top): NFIX

miRNA regulators (miRDB)

259 targeting CGGBP1, top 30 by miRDB confidence (max_score; target_count = how many genes the miRNA targets in total — lower means more specific):

miRNA | Max score | Avg score | miRNA target_count
HSA-MIR-3646 | 100.00 | 73.56 | 5283
HSA-MIR-3163 | 100.00 | 77.23 | 8605
HSA-MIR-3613-3P | 100.00 | 76.36 | 7965
HSA-MIR-518D-5P | 100.00 | 67.51 | 979
HSA-MIR-518E-5P | 100.00 | 67.66 | 954
HSA-MIR-518F-5P | 100.00 | 67.51 | 979
HSA-MIR-519A-5P | 100.00 | 67.66 | 954
HSA-MIR-519B-5P | 100.00 | 67.66 | 954
HSA-MIR-519C-5P | 100.00 | 67.66 | 954
HSA-MIR-520C-5P | 100.00 | 67.51 | 979
HSA-MIR-522-5P | 100.00 | 67.66 | 954
HSA-MIR-523-5P | 100.00 | 67.66 | 954
HSA-MIR-526A-5P | 100.00 | 67.51 | 979
HSA-MIR-340-5P | 100.00 | 72.50 | 4437
HSA-MIR-200B-3P | 100.00 | 73.31 | 2693
HSA-MIR-200C-3P | 100.00 | 73.35 | 2685
HSA-MIR-429 | 100.00 | 73.44 | 2698
HSA-MIR-29A-3P | 100.00 | 73.11 | 1835
HSA-MIR-29B-3P | 100.00 | 73.18 | 1833
HSA-MIR-29C-3P | 100.00 | 73.15 | 1833
HSA-MIR-5692A | 100.00 | 74.40 | 6850
HSA-MIR-3662 | 99.99 | 73.82 | 5684
HSA-MIR-4282 | 99.99 | 75.36 | 6408
HSA-MIR-548AW | 99.99 | 72.57 | 3559
HSA-MIR-548C-3P | 99.99 | 74.01 | 7587
HSA-MIR-520G-5P | 99.99 | 66.76 | 658
HSA-MIR-4789-3P | 99.99 | 70.75 | 2484
HSA-MIR-524-5P | 99.98 | 73.43 | 4882
HSA-MIR-4775 | 99.98 | 75.00 | 6394
HSA-MIR-520D-5P | 99.98 | 73.34 | 4883

Literature-anchored findings (GeneRIF, showing 13)

  • CGGBP1 was mapped to chromosome 3p and a sequence of 235 nucleotides 5’ upstream of CGGBP1 is essential for promoter activity. (PMID:14667814)
  • Differences in factors implicated in CGG repeat instability (CGG repeat size, XS548/FRAXAC1 haplotypes, and AGG interspersion pattern) are present in the Basque populations analyzed. (PMID:19728537)
  • CGGBP-20 downregulates the activity of the 5’ region of the FMR1 gene only in the presence of GCC triplets. (PMID:20141036)
  • CGGBP1 depletion by RNA interference in tumor-derived cells caused an increase in the cell population at G0/G1 phase and reduced the number of cells in the S phase. (PMID:21733196)
  • Our results suggest that CGGBP1 phosphorylation at S164 is a novel telomere protection signal, which can affect the telomere-protective function of the shelterin complex. (PMID:24196442)
  • Over-expression of miR-7 could significantly inhibit the growth of human lung cancer cells in vivo and in vitro, which might be related to the down-regulated expression of the tumor growth-associated protein CGGBP1. (PMID:24491049)
  • By studying global gene expression patterns and genome-wide DNA-binding patterns of CGGBP1, it has been shown that a possible mechanism through which it affects the expression of RNA Pol II-transcribed genes in trans depends on Alu RNA. (PMID:25483050)
  • CGGBP1 is a possible bidirectional regulator of CpG methylation at Alus, and acts as a repressor of methylation at L1 retrotransposons. (PMID:25981527)
  • CGGBP1 has no direct effect on FMR1 transcription and CGG repeat stability. (PMID:26306647)
  • CGGBP1 protein plays an important cytoprotective role in patients with cancer. (PMID:26482656)
  • CGGBP1 is a regulator of the CTCF-DNA-binding pattern, with a direct effect on a potential chromatin barrier-like functioning of repeat-rich and motif-rich regions. (PMID:31547883)
  • CGGBP1-regulated cytosine methylation at CTCF-binding motifs resists stochasticity. (PMID:32727353)
  • CGGBP1-dependent CTCF-binding sites restrict ectopic transcription. (PMID:34585631)

Cross-species orthologs

2 orthologs

Organism | Symbol | Gene ID
mus_musculus | Cggbp1 | ENSMUSG00000054604
rattus_norvegicus | Cggbp1 | ENSRNOG00000000718

Protein

Protein identifiers

CGG triplet repeat-binding protein 1: Q9UFW8 (reviewed: Q9UFW8)

Alternative names: 20 kDa CGG-binding protein, p20-CGGBP DNA-binding protein

All UniProt accessions (2): C9JUJ0, Q9UFW8

UniProt curated annotations

Function. Binds to nonmethylated 5’-d(CGG)(n)-3’ trinucleotide repeats in the FMR1 promoter. May play a role in regulating the FMR1 promoter.

Subcellular location. Nucleus.

Tissue specificity. Ubiquitous. Highly expressed in placenta, thymus, lymph nodes, cerebellum and cerebral cortex. Low expression in other regions of the brain.

Miscellaneous. Binding is severely inhibited by complete or partial cytosine-specific DNA methylation of the binding motif.

RefSeq proteins (3): NP_001008391, NP_001182237, NP_003654 (=MANE)

Domains & families (InterPro)

ID | Name | Type
IPR033375 | Cggbp1 | Family

UniProt features (5 total): modified residue 2, chain 1, short sequence motif 1, sequence conflict 1

Structure

Experimental structures (PDB)

0 structures.

Predicted structure (AlphaFold)

Model | pLDDT | Fraction very-high
AF-Q9UFW8-F1 | 76.26 | 0.15

Functional residue map

Curated UniProt residues grouped by drug-discovery relevance — catalytic, ligand-binding, modification, and mutation-validated positions. Source: UniProtKB sequence features.

Post-translational modifications (2): 56, 164

Function

Pathways and Gene Ontology

Reactome pathways

0 pathways

MSigDB gene sets: 251 (showing top): RNGTGGGC_UNKNOWN, ENK_UV_RESPONSE_KERATINOCYTE_UP, SP1_Q2_01, ATGTTAA_MIR302C, NFKB_C, BLALOCK_ALZHEIMERS_DISEASE_UP, TGCTGAY_UNKNOWN, KMCATNNWGGA_UNKNOWN, GTGTTGA_MIR505, ZIC1_01, DOUGLAS_BMI1_TARGETS_DN, SANSOM_APC_TARGETS_DN, EGR1_01, GOBP_CHROMATIN_REMODELING, GTGACTT_MIR224

GO Biological Process (4): negative regulation of transcription by RNA polymerase II (GO:0000122), regulation of gene expression (GO:0010468), epigenetic regulation of gene expression (GO:0040029), regulation of transcription by RNA polymerase II (GO:0006357)

GO Molecular Function (5): double-stranded DNA binding (GO:0003690), identical protein binding (GO:0042802), DNA-binding transcription factor binding (GO:0140297), DNA binding (GO:0003677), protein binding (GO:0005515)

GO Cellular Component (3): nucleus (GO:0005634), nucleoplasm (GO:0005654), cytosol (GO:0005829)

GO top-level categories

Rollup of top GO terms by namespace:

Category | Terms
transcription by RNA polymerase II | 2
cellular anatomical structure | 2
regulation of transcription by RNA polymerase II | 1
negative regulation of DNA-templated transcription | 1
gene expression | 1
regulation of macromolecule biosynthetic process | 1
chromatin remodeling | 1
regulation of gene expression | 1
regulation of DNA-templated transcription | 1
DNA binding | 1
protein binding | 1
transcription factor binding | 1
nucleic acid binding | 1
binding | 1
intracellular membrane-bounded organelle | 1
nuclear lumen | 1
cytoplasm | 1

Protein interactions and networks

STRING

1545 interactions, top by confidence (×1000):

Protein A | Protein B | Partner UniProt | Score
CGGBP1 | FMR1 | Q06787 | 793
CGGBP1 | ZNF654 | Q8IZM8 | 619
CGGBP1 | C3orf38 | Q5JPI3 | 605
CGGBP1 | IQSEC1 | Q6DN90 | 555
CGGBP1 | NBEAL2 | Q6ZNJ1 | 546
CGGBP1 | GIN1 | Q9NXP7 | 492
CGGBP1 | ATXN7L3B | Q96GX2 | 490
CGGBP1 | ZIC4 | Q8N9L1 | 486
CGGBP1 | LRRC3B | Q96PB8 | 476
CGGBP1 | URB1 | O60287 | 470
CGGBP1 | TMEM209 | Q96SK2 | 450
CGGBP1 | CLN8 | Q9UBY8 | 410
CGGBP1 | NKIRAS1 | Q9NYS0 | 396
CGGBP1 | TMCC3 | Q9ULS5 | 393
CGGBP1 | TOPAZ1 | Q8N9V7 | 384
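
The STRING scores above are integers scaled by 1000, so 793 corresponds to a combined confidence of 0.793. As a minimal sketch, the scores can be rescaled and binned into confidence bands. The cutoffs used here (0.4 medium, 0.7 high, 0.9 highest) are the thresholds commonly used with STRING and are an assumption of this example, not something stated on this page; `confidence_band` is a hypothetical helper.

```python
# A subset of partner scores copied from the table above (STRING score x1000).
STRING_SCORES = {
    "FMR1": 793,
    "ZNF654": 619,
    "C3orf38": 605,
    "IQSEC1": 555,
    "GIN1": 492,
    "TOPAZ1": 384,
}


def confidence_band(score_x1000: int) -> str:
    """Bin a x1000 STRING score using assumed conventional cutoffs."""
    confidence = score_x1000 / 1000
    if confidence >= 0.9:
        return "highest"
    if confidence >= 0.7:
        return "high"
    if confidence >= 0.4:
        return "medium"
    return "low"


bands = {partner: confidence_band(score) for partner, score in STRING_SCORES.items()}
for partner, band in bands.items():
    print(f"{partner}\t{STRING_SCORES[partner] / 1000:.3f}\t{band}")
```

With these cutoffs only the FMR1 link falls in the high band; the remaining partners listed are medium or low confidence.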

IntAct

102 interactions, top by confidence:

A | B | Type | Score
MRM1 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.800
CGGBP1 | REL | psi-mi:“MI:0915”(physical association) | 0.670
REL | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.670
CGGBP1 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.670
SPIN1 | SPINDOC | psi-mi:“MI:0914”(association) | 0.640
SDCBP | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
GLRX3 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
FAM124A | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
CGGBP1 | FAM124A | psi-mi:“MI:0915”(physical association) | 0.560
NTAQ1 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
BOLA2-SMG1P6 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
CDKN2D | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
TXN2 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
REL | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
PICK1 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
CDC37 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
GORASP2 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
PIAS2 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
PAX6 | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
POLR1C | CGGBP1 | psi-mi:“MI:0915”(physical association) | 0.560
SNRNP27 | UBA6 | psi-mi:“MI:0914”(association) | 0.530
SPIN2B | WDHD1 | psi-mi:“MI:0914”(association) | 0.530

BioGRID (79): CGGBP1 (Two-hybrid), CGGBP1 (Two-hybrid), CGGBP1 (Two-hybrid), GLRX3 (Two-hybrid), FAM124A (Two-hybrid), CGGBP1 (Affinity Capture-MS), CGGBP1 (Affinity Capture-MS), CGGBP1 (Affinity Capture-MS), CGGBP1 (Co-fractionation), CGGBP1 (Co-fractionation), DDB1 (Co-fractionation), CGGBP1 (Two-hybrid), CGGBP1 (Affinity Capture-MS), CGGBP1 (Affinity Capture-MS), CGGBP1 (Affinity Capture-MS)

ESM2 similar proteins: A2PYH4, A2RUV5, A4Z943, A4Z944, B2RD01, D3Z4R1, F1QWA8, O00463, O18475, O60308, O75417, O95786, O96006, P12258, P46063, P49916, P57075, P70191, P97386, Q0V842, Q29RQ5, Q49AG3, Q5MAE6, Q5N870, Q5RF63, Q5SVZ6, Q5SZJ8, Q6AYJ1, Q6PCN7, Q6PFX2, Q6Q899, Q6R2W3, Q7M3K2, Q7X7E9, Q8BHG9, Q8BYH3, Q8CGS6, Q8GT06, Q8N328, Q8R5F7

Diamond homologs: Q8BHG9, Q9UFW8

SIGNOR signaling

0 interactions.

Disease & clinical

Clinical variants and AI predictions

ClinVar

69 variants total. Per-class counts are floors (≥ shown; pagination cap):

Classification | Count (floor)
Pathogenic | 0
Likely pathogenic | 0
Uncertain significance | 57
Likely benign | 1
Benign | 0

Top pathogenic / likely-pathogenic (0)

SpliceAI

819 predictions. Top by Δscore:

Variant | Effect | Δscore
3:88056000:C:CC | acceptor_gain | 1.0000
3:88056001:T:C | acceptor_gain | 1.0000
3:88056004:C:CT | acceptor_gain | 1.0000
3:88058055:G:C | donor_gain | 1.0000
3:88058383:C:A | donor_gain | 1.0000
3:88055905:C:CT | acceptor_gain | 0.9900
3:88055995:TGGTT:T | acceptor_gain | 0.9900
3:88055996:GGTT:G | acceptor_gain | 0.9900
3:88055997:GTT:G | acceptor_gain | 0.9900
3:88056001:T:TC | acceptor_gain | 0.9900
3:88056005:A:T | acceptor_gain | 0.9900
3:88056009:C:CT | acceptor_gain | 0.9900
3:88058054:A:AC | donor_gain | 0.9900
3:88058054:AG:A | donor_gain | 0.9900
3:88058107:T:A | donor_gain | 0.9900
3:88058219:CTTTA:C | acceptor_gain | 0.9900
3:88058224:C:CC | acceptor_gain | 0.9900
3:88058382:T:TA | donor_gain | 0.9900
3:88058397:T:TA | donor_gain | 0.9900
3:88058424:T:TA | donor_gain | 0.9900
3:88058914:T:TA | donor_gain | 0.9900
3:88141049:G:GG | donor_gain | 0.9900
3:88056000:C:CA | acceptor_loss | 0.9800
3:88056001:T:A | acceptor_loss | 0.9800
3:88056010:A:T | acceptor_gain | 0.9800
3:88058013:GTTTA:G | donor_loss | 0.9800
3:88058014:TTTA:T | donor_loss | 0.9800
3:88058015:TTACC:T | donor_loss | 0.9800
3:88058016:TA:T | donor_loss | 0.9800
3:88058017:A:T | donor_loss | 0.9800
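
The variant keys in this table appear to follow a colon-separated chromosome:position:ref:alt layout. The sketch below parses such a key and labels it as an SNV, insertion or deletion by comparing allele lengths. The layout is inferred from the rows above, and `Variant` and `parse_variant` are hypothetical names for illustration, not part of SpliceAI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str

    @property
    def kind(self) -> str:
        """Classify the variant by comparing REF and ALT allele lengths."""
        if len(self.ref) == 1 and len(self.alt) == 1:
            return "SNV"
        if len(self.alt) > len(self.ref):
            return "insertion"
        if len(self.ref) > len(self.alt):
            return "deletion"
        return "MNV"  # equal lengths, more than one base


def parse_variant(key: str) -> Variant:
    """Parse a 'chrom:pos:ref:alt' key (layout assumed from the table)."""
    chrom, pos, ref, alt = key.split(":")
    return Variant(chrom, int(pos), ref, alt)


# Three keys copied from the table above.
for key in ("3:88056001:T:C", "3:88056000:C:CC", "3:88058013:GTTTA:G"):
    variant = parse_variant(key)
    print(f"{key}\t{variant.kind}")
```

Classifying by allele length alone is a simplification; it does not normalize or left-align indels.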

AlphaMissense

0 scored; no likely-pathogenic variants to list.

dbSNP variants (sampled 300 via entrez): RS1000021226 (3:88072811 A>C,G), RS1000088188 (3:88071515 G>A), RS1000198310 (3:88143149 C>G,T), RS1000255636 (3:88143402 C>T), RS1000296323 (3:88054686 C>T), RS1000302459 (3:88110295 A>G), RS1000429451 (3:88060099 G>A,C), RS1000458099 (3:88073305 CAAAAA>C,CAAAA), RS1000460330 (3:88103599 A>T), RS1000569410 (3:88091998 AATCTT>A), RS1000621905 (3:88099189 T>C), RS1000690018 (3:88097703 G>A), RS1000756065 (3:88061643 A>G), RS1000767204 (3:88104805 C>T), RS1000826514 (3:88149424 G>A,T)

Disease associations

OMIM: gene MIM:603363 | disease phenotypes: none listed

GenCC curated gene-disease

Mondo (0):

Orphanet (0):

HPO phenotypes

0 total (0 of 0 shown, HPO-id order):

GWAS associations

8 associations (top):

Study | Trait | p-value
GCST004630_119 | Mean corpuscular hemoglobin | 3.000000e-10
GCST005830_81 | Hand grip strength | 9.000000e-09
GCST010989_229 | Body size at age 10 | 5.000000e-15
GCST011122_46 | Walking pace | 1.000000e-08
GCST90002390_153 | Mean corpuscular hemoglobin | 4.000000e-23
GCST90002392_194 | Mean corpuscular volume | 9.000000e-22
GCST90002396_197 | Mean reticulocyte volume | 6.000000e-13
GCST90002403_547 | Red blood cell count | 3.000000e-16
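
The p-values above are easier to compare on a -log10 scale. The sketch below converts them and checks each against the conventional genome-wide significance threshold of 5e-8; that threshold is an assumption of this example and is not stated on this page, and the variable names are illustrative.

```python
import math

# (study accession, trait, p-value) copied from the table above.
GWAS_HITS = [
    ("GCST004630_119", "Mean corpuscular hemoglobin", 3e-10),
    ("GCST005830_81", "Hand grip strength", 9e-09),
    ("GCST010989_229", "Body size at age 10", 5e-15),
    ("GCST011122_46", "Walking pace", 1e-08),
    ("GCST90002390_153", "Mean corpuscular hemoglobin", 4e-23),
    ("GCST90002392_194", "Mean corpuscular volume", 9e-22),
    ("GCST90002396_197", "Mean reticulocyte volume", 6e-13),
    ("GCST90002403_547", "Red blood cell count", 3e-16),
]

# Assumption: conventional genome-wide significance threshold.
GENOME_WIDE_ALPHA = 5e-8


def neg_log10(p_value: float) -> float:
    """Return -log10(p), the scale used on Manhattan plots."""
    return -math.log10(p_value)


significant = [hit for hit in GWAS_HITS if hit[2] < GENOME_WIDE_ALPHA]
strongest = min(GWAS_HITS, key=lambda hit: hit[2])

# Print hits from strongest to weakest association.
for study, trait, p_value in sorted(GWAS_HITS, key=lambda hit: hit[2]):
    print(f"{study}\t{trait}\t{neg_log10(p_value):.2f}")
```

All eight associations fall below the assumed threshold, and the strongest signal is for mean corpuscular hemoglobin (GCST90002390_153).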

EFO canonical traits (5, from GWAS)

EFO ID | Trait name
EFO:0004527 | mean corpuscular hemoglobin
EFO:0006941 | grip strength measurement
EFO:0009819 | comparative body size at age 10, self-reported
EFO:0010701 | mean reticulocyte volume
EFO:0004305 | erythrocyte count

Drugs & pharmacology

Drug and pharmacology data

Is drug target: yes

ChEMBL targets (1): CHEMBL5724774 (SINGLE PROTEIN)

PharmGKB: 1 entry (VIP=true, CPIC=false)

CTD chemical–gene interactions

27 total (human), top 27 by PubMed support.

Chemical | Actions (top 5) | PubMed papers
Valproic Acid | affects expression, decreases expression | 2
aristolochic acid I | decreases expression | 1
triphenyl phosphate | affects expression | 1
sodium arsenate | increases abundance, increases expression | 1
cobaltous chloride | decreases expression | 1
beta-methylcholine | affects expression | 1
K 7174 | increases expression | 1
jinfukang | decreases expression | 1
Resveratrol | affects cotreatment, increases expression | 1
Sunitinib | increases expression | 1
Acetaminophen | increases expression | 1
Ethanol | decreases expression, increases abundance, affects cotreatment | 1
Arsenic | increases abundance, increases expression | 1
Enzyme Inhibitors | decreases activity, increases O-linked glycosylation | 1
Ethyl Methanesulfonate | increases expression | 1
Formaldehyde | increases expression | 1
Gasoline | affects cotreatment, decreases expression, increases abundance | 1
Ivermectin | decreases expression | 1
Methyl Methanesulfonate | increases expression | 1
Plant Extracts | affects cotreatment, increases expression | 1
Polycyclic Aromatic Hydrocarbons | affects cotreatment, decreases expression, increases abundance | 1
Cyclosporine | decreases expression | 1
Gold Compounds | decreases expression | 1
Uranium Compounds | decreases expression | 1
Cadmium Chloride | decreases expression | 1
Copper Sulfate | decreases expression | 1
Particulate Matter | affects cotreatment, decreases expression, increases abundance | 1

ChEMBL screening assays

6 unique, capped per target: 6 binding

Representative assays (with source publication via chembl_document):

Assay ID | Type | Description | Source paper
CHEMBL5697440 | Binding | Inhibition of CGGBP1 (unknown origin) assessed as fold change at 10 uM incubated for 1 hr by colloidal coomassie staining based LC-MS/MS analysis | Inhibition of BET recruitment to chromatin as an effective treatment for MLL-fusion leukaemia (Nature)

Clinical trials (associated diseases)

0 trials via MONDO — disease-level, not drug-specific.