ARID1A Gene Complete Identifier and Functional Mapping Reference
Provide a comprehensive cross-database identifier and functional mapping reference for human ARID1A — a definitive lookup resource covering: ### Section 1: Gene identifiers For human gene ARID1A, list ALL gene-level database identifiers. Required: - HGNC ID and approved symbol - Ensembl gene ID (ENSG...) - NCBI Entrez Gene ID - OMIM gene/locus ID - Genomic location: chromosome, start position, end position, strand (GRCh38) ### Section 2: Transcript identifiers For human gene ARID1A, list ALL transcript-level identifiers. Required: - Ensembl transcripts: ALL ENST IDs with biotype. Total count. - RefSeq transcripts: ALL NM_ mRNA accessions. Mark which is MANE Select. - CCDS IDs. - For the CANONICAL/MANE SELECT transcript: ALL exon IDs (ENSE) with genomic coordinates and total exon count. ### Section 3: Protein identifiers For human gene ARID1A protein product(s), list ALL protein-level identifiers. Required: - UniProt accessions: ALL entries (reviewed and unreviewed). Mark the canonical reviewed entry. - RefSeq protein: ALL NP_ accessions. - Protein domains and families: list ALL annotated domains/families with identifiers, including name, type (domain/family/superfamily), and ID. - Antibody availability: known antibody resources for the protein. ### Section 4: Structure For human gene ARID1A protein, list ALL structural data. Required: - Experimental structures: ALL PDB IDs. For each: experimental method (X-ray/NMR/Cryo-EM) and resolution. Total count. - Predicted structures: AlphaFold model ID and confidence metrics (pLDDT). ### Section 5: Cross-species orthologs For human gene ARID1A, list orthologous genes in key model organisms. Organisms: - Mouse (Mus musculus): gene ID, symbol - Rat (Rattus norvegicus): gene ID, symbol - Zebrafish (Danio rerio): gene ID, symbol - Fruit fly (Drosophila melanogaster): gene ID, symbol - Worm (C. elegans): gene ID, symbol - Yeast (S. 
cerevisiae): gene ID, symbol ### Section 6: Clinical variants & AI predictions For human gene ARID1A, summarize clinical variants and AI predictions. Clinical variant annotations (ClinVar): - Total variant count (approximate is fine) - Breakdown by classification: Pathogenic, Likely Pathogenic, VUS, Likely Benign, Benign - TOP 30 pathogenic/likely pathogenic variants with: variant ID, HGVS notation, associated condition AI-based variant effect predictions: - Splice effect predictions: total count + TOP 30 with delta scores if known - Missense pathogenicity from AlphaMissense — total count + TOP 30 likely-pathogenic with am_pathogenicity scores. ### Section 7: Pathways & Gene Ontology For human gene ARID1A, list biological pathways and Gene Ontology annotations. Pathway membership: - ALL biological pathways this gene participates in, with pathway IDs and names - Total pathway count Gene Ontology: - Biological Process: count and TOP 20 terms with GO IDs - Molecular Function: count and TOP 20 terms with GO IDs - Cellular Component: count and TOP 20 terms with GO IDs ### Section 8: Protein interactions & networks For human gene ARID1A protein, summarize protein interactions and networks. Protein-protein interactions (STRING, IntAct, BioGRID, etc.): - Total interaction count (approximate) - TOP 30 highest-confidence interacting proteins with scores/evidence Protein similarity: - Structural/embedding similarity (e.g. Foldseek, ESM): TOP 20 similar proteins with scores - Sequence homology: TOP 20 homologous proteins with identity/similarity ### Section 9: Transcription factor regulatory data For human gene ARID1A, summarize transcription factor regulatory data. If ARID1A is a transcription factor: - Downstream targets: total count + TOP 30 with regulation type (activates/represses) and evidence - DNA binding motifs from JASPAR — all known motif IDs and motif family classification. 
Regardless: - Upstream regulators: TFs that regulate ARID1A — names with evidence type (ChIP-seq / predicted / experimentally validated) If ARID1A is not a transcription factor, say so briefly and skip the downstream/motif sections. ### Section 10: Drug & pharmacology data For human gene ARID1A protein as a drug target, summarize pharmacology data. If ARID1A is a known drug target: - Targeting molecules: total count in ChEMBL/DrugBank + TOP 30 by development phase (molecule ID, name, mechanism, highest phase) - Clinical trials: TOP 20 involving drugs targeting this gene — trial ID, phase, status, intervention - Pharmacogenomics: known drug-gene interactions affecting drug response + dosing guidelines if any If ARID1A is not currently a drug target, say so briefly. ### Section 11: Expression profiles For human gene ARID1A, summarize expression profiles. Tissue expression (GTEx, HPA, Bgee, etc.): - TOP 30 tissues with expression scores/levels (direction, units if known) - Note tissue-specific or tissue-enriched patterns Cell type expression (Tabula Sapiens, HCA, etc.): - TOP 30 cell types with expression scores - Note cell-type-specific patterns Single-cell expression: notable datasets or cell populations of interest for this gene. ### Section 12: Disease associations For human gene ARID1A, summarize disease associations. Mendelian / monogenic disease: - Diseases caused by mutations in ARID1A: disease name, disease ID (OMIM/Orphanet/Mondo), inheritance pattern, evidence level - Include all directly linked conditions Phenotype associations: - Clinical phenotypes associated with the gene (HPO terms where known) - TOP 30 phenotype terms with HPO IDs Complex-disease / GWAS: - Traits and diseases significantly associated via GWAS: trait name, variant, effect size, study where known - TOP 30 GWAS associations
Executive summary
ARID1A (AT-rich interactive domain-containing protein 1A; HGNC:11110) maps to chromosome 1p36.11 (GRCh38 chr1:26,693,236-26,782,104, forward strand) and encodes a 2,285-amino-acid, ~242 kDa protein (UniProt O14497). The protein is a core structural subunit of the BAF/SWI-SNF chromatin-remodeling complex and a major tumor suppressor. Its primary biological role is chromatin remodeling and transcription coactivation rather than sequence-specific DNA binding. Summing the STRING, BioGRID, and IntAct counts gives ~3,461 documented protein interactions (not deduplicated across databases), spanning SWI/SNF subunits (SMARCA4, SMARCB1), histone modifiers (EP300, EZH2), and cancer-associated proteins (TP53, PTEN, KRAS). In the germline, loss-of-function variants cause autosomal dominant intellectual disability (OMIM:614607) and Coffin-Siris syndrome 1, with ~170 pathogenic or likely pathogenic ClinVar entries and a phenotypic spectrum that includes global developmental delay, microcephaly, and congenital heart defects. Somatically, ARID1A is among the most frequently mutated tumor suppressors across cancers, including endometrial, colorectal, and bladder cancer, though no approved therapeutics directly target it. Expression is ubiquitous, detected in 286 of 295 profiled Bgee conditions, with peak scores in bone marrow cells and embryonic neural tissues.
ARID1A — Reference
Cross-database identifier and functional mapping reference for ARID1A.
Gene identifiers
| Identifier | Value |
|---|---|
| HGNC ID | HGNC:11110 |
| Approved symbol | ARID1A |
| Ensembl gene ID | ENSG00000117713 |
| NCBI Entrez gene ID | 8289 |
| OMIM gene ID | 603024 |
| Genomic location (GRCh38) | |
| Chromosome | 1 |
| Start position | 26,693,236 bp |
| End position | 26,782,104 bp |
| Strand | + (forward) |
Transcript identifiers
Ensembl Transcripts
Total: 17 transcripts
| Ensembl ID | Biotype |
|---|---|
| ENST00000324856 | protein_coding |
| ENST00000374152 | protein_coding |
| ENST00000430291 | retained_intron |
| ENST00000430799 | protein_coding |
| ENST00000457599 | protein_coding |
| ENST00000466382 | nonsense_mediated_decay |
| ENST00000524572 | protein_coding |
| ENST00000532781 | nonsense_mediated_decay |
| ENST00000636072 | retained_intron |
| ENST00000636110 | retained_intron |
| ENST00000636219 | protein_coding |
| ENST00000636422 | retained_intron |
| ENST00000636794 | nonsense_mediated_decay |
| ENST00000636958 | protein_coding_CDS_not_defined |
| ENST00000637465 | protein_coding |
| ENST00000637788 | retained_intron |
| ENST00000850904 | protein_coding |
RefSeq mRNA Transcripts
Total: 12 transcripts
| NM ID | MANE Select |
|---|---|
| NM_006015 | ✓ |
| NM_001080819 | |
| NM_001341479 | |
| NM_001363070 | |
| NM_001401271 | |
| NM_001401273 | |
| NM_001401275 | |
| NM_001401276 | |
| NM_001401278 | |
| NM_001401279 | |
| NM_118259 | |
| NM_139135 | |
CCDS IDs
- CCDS285
- CCDS44091
MANE Select Transcript Exons
MANE Select: ENST00000324856 / NM_006015
Total exons: 20
| Exon ID | Start (GRCh38) | End (GRCh38) | Strand | Chromosome |
|---|---|---|---|---|
| ENSE00001907429 | 26696015 | 26697540 | + | 1 |
| ENSE00003471930 | 26729651 | 26729863 | + | 1 |
| ENSE00000902180 | 26731152 | 26731604 | + | 1 |
| ENSE00001227857 | 26732676 | 26732792 | + | 1 |
| ENSE00001349762 | 26760856 | 26761096 | + | 1 |
| ENSE00001349761 | 26761384 | 26761473 | + | 1 |
| ENSE00001349760 | 26762152 | 26762319 | + | 1 |
| ENSE00001157462 | 26762973 | 26763285 | + | 1 |
| ENSE00000766221 | 26766221 | 26766366 | + | 1 |
| ENSE00001349753 | 26766457 | 26766566 | + | 1 |
| ENSE00001349752 | 26767790 | 26767999 | + | 1 |
| ENSE00000872621 | 26771119 | 26771326 | + | 1 |
| ENSE00001227772 | 26772500 | 26772632 | + | 1 |
| ENSE00001227767 | 26772812 | 26772987 | + | 1 |
| ENSE00003672172 | 26773346 | 26773496 | + | 1 |
| ENSE00003589420 | 26773580 | 26773717 | + | 1 |
| ENSE00003460238 | 26773802 | 26773898 | + | 1 |
| ENSE00001349739 | 26774329 | 26775220 | + | 1 |
| ENSE00003552035 | 26775577 | 26775707 | + | 1 |
| ENSE00001883917 | 26779023 | 26782104 | + | 1 |
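Exon records like those above are easier to audit once they are placed in transcript order. The following is a minimal Python sketch, assuming 1-based inclusive coordinates (the Ensembl convention) and using four rows copied from the table; the helper names `order_exons`, `exon_length`, and `has_overlap` are hypothetical and are not part of any database API.

```python
# Four exon records copied from the table above: (exon_id, start, end), GRCh38.
exons = [
    ("ENSE00001349760", 26762152, 26762319),
    ("ENSE00001349761", 26761384, 26761473),
    ("ENSE00001349762", 26760856, 26761096),
    ("ENSE00001157462", 26762973, 26763285),
]


def order_exons(records, strand="+"):
    """Return exons in transcript order: ascending start on '+', descending on '-'."""
    return sorted(records, key=lambda r: r[1], reverse=(strand == "-"))


def exon_length(start, end):
    """Length in bp of a 1-based, inclusive interval (assumed convention)."""
    return end - start + 1


def has_overlap(records):
    """True if any two coordinate-adjacent exons share at least one base."""
    by_start = sorted(records, key=lambda r: r[1])
    return any(a[2] >= b[1] for a, b in zip(by_start, by_start[1:]))


ordered = order_exons(exons)
total_bp = sum(exon_length(start, end) for _, start, end in ordered)
```

Applying the same functions to all 20 rows would give the total exonic length of the MANE Select transcript and flag any overlapping or duplicated coordinates.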
Protein identifiers
UniProt Accessions
- O14497 (canonical reviewed) — AT-rich interactive domain-containing protein 1A
RefSeq Protein Accessions (NP_)
- NP_006006 (MANE Select, canonical)
- NP_624361
Protein Domains and Families (InterPro)
| ID | Name | Type |
|---|---|---|
| IPR001606 | ARID_dom | Domain |
| IPR030094 | ARID1A_ARID_BRIGHT_DNA-bd | Domain |
| IPR033388 | BAF250_C | Domain |
| IPR021906 | BAF250/Osa | Family |
| IPR011989 | ARM-like | Homologous_superfamily |
| IPR016024 | ARM-type_fold | Homologous_superfamily |
| IPR036431 | ARID_dom_sf | Homologous_superfamily |
Pfam Domains
- PF01388
- PF12031
Antibody Availability
No antibodies found in biobtree antibody resources for ARID1A (O14497).
Structure
Experimental Structures (PDB)
Total: 7 structures
| PDB ID | Method | Resolution | Title |
|---|---|---|---|
| 1RYU | NMR | N/A (solution NMR) | Solution Structure of the SWI1 ARID |
| 6LTH | Cryo-EM | 3.0 Å | Structure of human BAF Base module |
| 6LTJ | Cryo-EM | 3.7 Å | Structure of nucleosome-bound human BAF complex |
| 9RL4 | Cryo-EM | 3.5 Å | Structure of BAF in complex with OCT4-SOX2-bound nucleosome - SHL-6 |
| 9RMC | Cryo-EM | 4.2 Å | Structure of BAF in complex with OCT4-SOX2-bound nucleosome - SHL+6 class 1 |
| 9RN1 | Cryo-EM | 5.9 Å | Structure of BAF-nucleosome complex with OCT4-SOX2 at SHL+6 in ADP-bound state, BAF47 bound to ATPase lobe 2 |
| 9RN2 | Cryo-EM | 4.1 Å | Structure of BAF in complex with OCT4-SOX2-bound nucleosome - SHL+6 class 2 |
Predicted Structures (AlphaFold)
| Model ID | Global pLDDT | Very High Confidence Region (pLDDT ≥90) |
|---|---|---|
| O14497 | 47.81 | 17% |
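To put the global pLDDT in context, the sketch below maps a score to the confidence bands commonly shown in the AlphaFold Database legend (very high, confident, low, very low). The function name `plddt_band` is hypothetical, and treating the boundaries as inclusive lower bounds (for example, ≥90 as very high, matching the column header above) is an assumption.

```python
def plddt_band(score: float) -> str:
    """Map a pLDDT value (0-100) to a confidence band.

    Band edges (90, 70, 50) follow the AlphaFold Database legend;
    inclusive lower bounds are an assumption made for this sketch.
    """
    if not 0.0 <= score <= 100.0:
        raise ValueError("pLDDT must lie between 0 and 100")
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


# Global pLDDT reported for the ARID1A model in the table above.
arid1a_band = plddt_band(47.81)
```

A global score in the lowest band would be consistent with a protein that is largely disordered outside its annotated domains, so per-residue scores are more informative than the global mean when choosing regions to interpret.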
Cross-species orthologs
| Organism | Gene ID | Symbol |
|---|---|---|
| Mouse (Mus musculus) | ENSMUSG00000007880 | Arid1a |
| Rat (Rattus norvegicus) | ENSRNOG00000006137 | Arid1a |
| Zebrafish (Danio rerio) | ENSDARG00000101710 | arid1aa |
| Fruit fly (Drosophila melanogaster) | FBgn0261885 | osa |
| Worm (C. elegans) | WBGene00002717 | let-526 |
| Yeast (S. cerevisiae) | none | none |
Clinical variants & AI predictions
ClinVar Summary
Total variants: ~1,780
Classification breakdown:
| Classification | Count |
|---|---|
| Pathogenic | ~50 |
| Likely Pathogenic | ~120 |
| Uncertain Significance | ~600 |
| Likely Benign | ~380 |
| Benign | ~420 |
| Benign/Likely Benign | ~150 |
| Conflicting | ~60 |
Top Pathogenic/Likely Pathogenic Variants (ClinVar; 16 listed)
| Variant ID | HGVS Notation | Classification | Associated Condition |
|---|---|---|---|
| 1065491 | c.175G>T (p.Glu59Ter) | Pathogenic | ARID1A-related disorder |
| 1177329 | c.166C>T (p.Gln56Ter) | Pathogenic | ARID1A-related disorder |
| 1177330 | c.1708_1766del (p.Pro570fs) | Pathogenic | ARID1A-related disorder |
| 1177343 | c.2914del (p.Asp972fs) | Pathogenic | ARID1A-related disorder |
| 1182296 | c.3230C>A (p.Ala1077Glu) | Pathogenic/Likely pathogenic | ARID1A-related disorder |
| 1120179 | c.5940_6000del (p.Val1982fs) | Pathogenic | ARID1A-related disorder |
| 1323396 | c.1850C>A (p.Ser617Ter) | Pathogenic | ARID1A-related disorder |
| 1323404 | c.2122C>T (p.Gln708Ter) | Pathogenic | ARID1A-related disorder |
| 1028997 | c.5963T>C (p.Ile1988Thr) | Likely pathogenic | ARID1A-related disorder |
| 1172645 | c.4049del (p.Ser1350fs) | Likely pathogenic | ARID1A-related disorder |
| 1176188 | c.3169T>C (p.Ser1057Pro) | Likely pathogenic | ARID1A-related disorder |
| 1177344 | c.3146T>G (p.Leu1049Arg) | Likely pathogenic | ARID1A-related disorder |
| 1298412 | c.791C>A (p.Ser264Ter) | Likely pathogenic | ARID1A-related disorder |
| 1307168 | c.4101G>A (p.Gln1367=) | Likely pathogenic | ARID1A-related disorder |
| 1314744 | c.2341A>G (p.Ile781Val) | Likely pathogenic | ARID1A-related disorder |
| 1320102 | c.595C>T (p.Gln199Ter) | Likely pathogenic | ARID1A-related disorder |
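A quick sanity check on entries like these is that the coding position and the protein position agree: coding nucleotide n falls in codon ⌈n/3⌉. The sketch below applies that rule to the HGVS strings as written in the table; the regular expression and the names `codon_index` and `is_consistent` are illustrative assumptions and do not form a full HGVS parser.

```python
import re

# Matches labels such as "c.175G>T (p.Glu59Ter)" or "c.1708_1766del (p.Pro570fs)".
# Captures the first coding position and the first affected residue number.
HGVS_PAIR = re.compile(r"c\.(\d+)(?:_\d+)?\S*\s+\(p\.([A-Z][a-z]{2})(\d+)")


def codon_index(c_pos: int) -> int:
    """Codon number containing coding position c_pos (ceil(c_pos / 3))."""
    return (c_pos + 2) // 3


def is_consistent(label: str) -> bool:
    """True if the first coding position falls in the stated protein codon."""
    match = HGVS_PAIR.search(label)
    if match is None:
        raise ValueError(f"unrecognized label: {label!r}")
    return codon_index(int(match.group(1))) == int(match.group(3))


checks = {
    label: is_consistent(label)
    for label in (
        "c.175G>T (p.Glu59Ter)",
        "c.4101G>A (p.Gln1367=)",
        "c.1708_1766del (p.Pro570fs)",
        "c.5940_6000del (p.Val1982fs)",
    )
}
```

A mismatch is not necessarily an error. For deletions, the protein-level description names the first residue that actually changes, which can lie downstream of the first deleted base, so the last entry is worth a manual look rather than automatic rejection.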
AlphaMissense Pathogenicity Predictions
Total AlphaMissense predictions: 197
Likely pathogenic predictions: 100+
TOP 30 Likely-Pathogenic Missense Variants (AlphaMissense)
| Protein Variant | am_pathogenicity Score | Genomic Position (GRCh38) |
|---|---|---|
| A2D | 0.969 | 1:26696408 |
| D75H | 0.954 | 1:26696626 |
| K25N | 0.949 | 1:26696478 |
| A2V | 0.940 | 1:26696408 |
| D75V | 0.932 | 1:26696627 |
| S79R | 0.924 | 1:26696638 |
| E72K | 0.911 | 1:26696617 |
| A3V | 0.908 | 1:26696411 |
| E78K | 0.908 | 1:26696635 |
| A3E | 0.888 | 1:26696411 |
| S12R | 0.885 | 1:26696437 |
| K71N | 0.880 | 1:26696616 |
| D75A | 0.880 | 1:26696627 |
| A6D | 0.879 | 1:26696420 |
| K26I | 0.868 | 1:26696480 |
| S11R | 0.868 | 1:26696434 |
| K25T | 0.859 | 1:26696477 |
| G14R | 0.822 | 1:26696443 |
| A3T | 0.821 | 1:26696410 |
| A3P | 0.810 | 1:26696410 |
| G76R | 0.806 | 1:26696629 |
| D75N | 0.797 | 1:26696626 |
| N80K | 0.782 | 1:26696643 |
| G76E | 0.781 | 1:26696630 |
| A2P | 0.779 | 1:26696407 |
| K26E | 0.717 | 1:26696479 |
| A9D | 0.706 | 1:26696429 |
| A8D | 0.672 | 1:26696426 |
| D75E | 0.600 | 1:26696628 |
| A10D | 0.598 | 1:26696432 |
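The "likely pathogenic" label depends on score cutoffs. The sketch below uses the class boundaries published with AlphaMissense (below 0.34 likely benign, above 0.564 likely pathogenic, ambiguous in between) and ranks a few rows from the table; the function name `am_class` and the tuple layout are assumptions made for illustration.

```python
LIKELY_BENIGN_BELOW = 0.34        # published AlphaMissense cutoff
LIKELY_PATHOGENIC_ABOVE = 0.564   # published AlphaMissense cutoff


def am_class(score: float) -> str:
    """Assign an AlphaMissense class label from an am_pathogenicity score."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("am_pathogenicity must lie between 0 and 1")
    if score < LIKELY_BENIGN_BELOW:
        return "likely_benign"
    if score > LIKELY_PATHOGENIC_ABOVE:
        return "likely_pathogenic"
    return "ambiguous"


# (protein_variant, score) pairs copied from the table above.
rows = [("A10D", 0.598), ("A2D", 0.969), ("D75E", 0.600), ("K25N", 0.949)]

# Rank from most to least pathogenic and attach the class label.
ranked = [
    (variant, score, am_class(score))
    for variant, score in sorted(rows, key=lambda r: r[1], reverse=True)
]
```

Under these cutoffs the two lowest-scoring rows (0.600 and 0.598) sit just above the likely-pathogenic boundary, so they warrant more caution than the entries scoring above 0.9.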
Splice Effect Predictions (SpliceAI)
Total SpliceAI predictions: 2,653
Effect types: donor gain, donor loss, acceptor gain, acceptor loss
TOP 30 High-Impact Splice Variants
| Position (GRCh38) | Variant (REF>ALT) | Effect | Delta Score |
|---|---|---|---|
| 1:26697536 | CTCAG>C | donor_loss | 0.99 |
| 1:26697537 | TCAG>T | donor_loss | 0.99 |
| 1:26697538 | CAGG>C | donor_loss | 0.99 |
| 1:26697539 | AG>A | donor_loss | 0.99 |
| 1:26697540 | GG>G | donor_loss | 0.99 |
| 1:26697541 | G>GA | donor_loss | 0.99 |
| 1:26697542 | T>A | donor_loss | 0.99 |
| 1:26697549 | C>G | donor_gain | 0.99 |
| 1:26696470 | G>A | donor_gain | 0.91 |
| 1:26696224 | GAGCC>G | donor_gain | 0.86 |
| 1:26697036 | G>GG | donor_gain | 0.80 |
| 1:26698162 | C>G | donor_gain | 0.80 |
| 1:26697035 | A>AG | donor_gain | 0.79 |
| 1:26697541 | G>GG | donor_gain | 0.79 |
| 1:26696102 | C>T | donor_gain | 0.73 |
| 1:26698175 | GTA>G | donor_gain | 0.70 |
| 1:26697112 | G>GA | donor_gain | 0.68 |
| 1:26698178 | G>GG | donor_gain | 0.68 |
| 1:26696573 | A>T | donor_gain | 0.67 |
| 1:26696163 | C>G | donor_gain | 0.63 |
| 1:26697497 | G>GT | donor_gain | 0.62 |
| 1:26697693 | G>T | donor_gain | 0.62 |
| 1:26697111 | T>TA | donor_gain | 0.61 |
| 1:26698173 | GAGTA>G | donor_gain | 0.59 |
| 1:26697548 | GC>G | donor_gain | 0.59 |
| 1:26698145 | G>GT | donor_gain | 0.52 |
| 1:26697381 | C>T | donor_gain | 0.26 |
| 1:26697656 | G>T | donor_gain | 0.26 |
| 1:26697367 | G>GT | donor_gain | 0.25 |
| 1:26698162 | C>CG | donor_gain | 0.21 |
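Delta scores are usually read against the cutoffs suggested by the SpliceAI authors: 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision). The sketch below bins a few rows from the table by those cutoffs; the function name `spliceai_tier` and the tier labels are hypothetical.

```python
def spliceai_tier(delta: float) -> str:
    """Bin a SpliceAI delta score using the commonly cited 0.2 / 0.5 / 0.8 cutoffs."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta score must lie between 0 and 1")
    if delta >= 0.8:
        return "high_precision"
    if delta >= 0.5:
        return "recommended"
    if delta >= 0.2:
        return "high_recall"
    return "below_cutoff"


# (position, variant, effect, delta) rows copied from the table above.
rows = [
    (26697539, "AG>A", "donor_loss", 0.99),
    (26696224, "GAGCC>G", "donor_gain", 0.86),
    (26697111, "T>TA", "donor_gain", 0.61),
    (26697367, "G>GT", "donor_gain", 0.25),
]

tiers = {variant: spliceai_tier(delta) for _, variant, _, delta in rows}

# Variants at or above the recommended cutoff of 0.5.
strong = [variant for _, variant, _, delta in rows if delta >= 0.5]
```

Entries with delta scores between 0.2 and 0.5 fall only in the high-recall tier, so they are better treated as candidates for follow-up than as confident splice-altering calls.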
Pathways & Gene Ontology
Reactome Pathways
Total: 8 pathways
| ID | Pathway Name |
|---|---|
| R-HSA-3214858 | RMTs methylate histone arginines |
| R-HSA-8939243 | RUNX1 interacts with co-factors whose precise effect on RUNX1 targets is not known |
| R-HSA-9764790 | Positive Regulation of CDH1 Gene Transcription |
| R-HSA-9824585 | Regulation of MITF-M-dependent genes involved in pigmentation |
| R-HSA-9845323 | Regulation of endogenous retroelements by Piwi-interacting RNAs (piRNAs) |
| R-HSA-9933937 | Formation of the canonical BAF (cBAF) complex |
| R-HSA-9933946 | Formation of the embryonic stem cell BAF (esBAF) complex |
| R-HSA-9934037 | Formation of neuronal progenitor and neuronal BAF (npBAF and nBAF) |
MSigDB Gene Sets
Total: 697 gene sets (curated gene set membership from MSigDB database)
Gene Ontology Annotations
Biological Process
Count: 16 terms
| GO ID | Term |
|---|---|
| GO:0006325 | Chromatin organization |
| GO:0006337 | Nucleosome disassembly |
| GO:0006338 | Chromatin remodeling |
| GO:0006357 | Regulation of transcription by RNA polymerase II |
| GO:0007399 | Nervous system development |
| GO:0030071 | Regulation of mitotic metaphase/anaphase transition |
| GO:0045582 | Positive regulation of T cell differentiation |
| GO:0045597 | Positive regulation of cell differentiation |
| GO:0045663 | Positive regulation of myoblast differentiation |
| GO:0045815 | Transcription initiation-coupled chromatin remodeling |
| GO:0045893 | Positive regulation of DNA-templated transcription |
| GO:0070316 | Regulation of G0 to G1 transition |
| GO:1902459 | Positive regulation of stem cell population maintenance |
| GO:2000045 | Regulation of G1/S transition of mitotic cell cycle |
| GO:2000781 | Positive regulation of double-strand break repair |
| GO:2000819 | Regulation of nucleotide-excision repair |
Molecular Function
Count: 5 terms
| GO ID | Term |
|---|---|
| GO:0003677 | DNA binding |
| GO:0003713 | Transcription coactivator activity |
| GO:0005515 | Protein binding |
| GO:0016922 | Nuclear receptor binding |
| GO:0031491 | Nucleosome binding |
Cellular Component
Count: 8 terms
| GO ID | Term |
|---|---|
| GO:0000785 | Chromatin |
| GO:0005634 | Nucleus |
| GO:0005654 | Nucleoplasm |
| GO:0016514 | SWI/SNF complex |
| GO:0035060 | Brahma complex |
| GO:0071564 | npBAF complex |
| GO:0071565 | nBAF complex |
| GO:0140092 | bBAF complex |
Protein interactions & networks
ARID1A (O14497) – Human AT-rich interactive domain-containing protein 1A
- Length: 2285 amino acids
- Molecular weight: ~242 kDa
Protein-Protein Interactions
Total interaction count (approximate):
- STRING: 2,774 interactions
- BioGRID: 486 interactions
- IntAct: 201 interactions (from xref count, direct mapping not available)
- Combined: ~3,461 interactions (simple sum of the three counts; pairs shared between databases are not deduplicated)
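The combined figure is the arithmetic sum of the per-database counts, which the short sketch below reproduces. Because the same interacting pair can appear in more than one database, the sum is an upper bound on distinct interactions; the variable names are illustrative.

```python
# Per-database interaction counts as listed above.
counts = {"STRING": 2774, "BioGRID": 486, "IntAct": 201}

# Simple sum: an upper bound, since shared pairs are counted once per database.
combined_upper_bound = sum(counts.values())

# A lower bound on distinct interactions is the largest single-database count.
combined_lower_bound = max(counts.values())

# Share of the summed total contributed by each database, in percent.
shares = {db: round(100 * n / combined_upper_bound, 1) for db, n in counts.items()}
```

A true deduplicated total would require the underlying pair lists, which are not part of this reference.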
TOP 30 highest-confidence STRING interacting proteins:
| Rank | Protein | Interaction Count | Size (aa) | Function |
|---|---|---|---|---|
| 1 | SMARCA4 (BRG1) | 5,148 | 1,647 | SWI/SNF ATPase catalytic subunit |
| 2 | CREBBP (CBP) | 5,086 | 2,442 | Histone acetyltransferase |
| 3 | TERT | 5,450 | 1,132 | Telomerase reverse transcriptase |
| 4 | ATRX | 5,224 | 2,492 | Chromatin remodeler, H3.3 chaperone |
| 5 | ACTL6A (BAF53A) | 4,850 | 429 | SWI/SNF actin-like component |
| 6 | ACTL6B (BAF53B) | 4,366 | 426 | Neural-specific SWI/SNF subunit |
| 7 | SMARCA2 (BRM) | 3,380 | 1,590 | SWI/SNF ATPase variant |
| 8 | SMARCB1 (BAF47) | 3,478 | 394 | SWI/SNF core subunit |
| 9 | PPP2R1A | 3,046 | 589 | Protein phosphatase 2A regulatory subunit |
| 10 | KMT2C (MLL3) | 2,988 | 4,911 | Histone H3K4 methyltransferase |
| 11 | PBRM1 (BAF180) | 2,972 | 1,634 | SWI/SNF-B (PBAF) subunit |
| 12 | SMARCC1 | 2,936 | 1,105 | SWI/SNF core component |
| 13 | SMARCC2 | 2,760 | 1,245 | SWI/SNF core component |
| 14 | SMARCA1 (SNF2L1) | 4,094 | 1,070 | NURF/CERF ATPase |
| 15 | SMARCA5 (ISWI) | 5,382 | 1,052 | Chromatin remodeler |
| 16 | BRD4 | 5,234 | 1,362 | Acetyl-histone reader, elongation factor |
| 17 | EP300 (p300) | 6,450 | 2,414 | Histone acetyltransferase |
| 18 | KMT2D (MLL2) | 2,772 | 5,537 | H3K4 methyltransferase |
| 19 | EZH2 | 7,058 | 751 | Polycomb PRC2 methyltransferase |
| 20 | TP53 (p53) | 14,764 | 393 | Tumor suppressor |
| 21 | PTEN | 9,614 | 403 | Phosphatase tumor suppressor |
| 22 | BRCA1 | 6,120 | 1,884 | DNA repair E3 ubiquitin ligase |
| 23 | BRCA2 | 3,778 | 3,418 | DNA repair recombination protein |
| 24 | ATM | 6,446 | 3,056 | DNA damage sensor kinase |
| 25 | PIK3CA | 4,602 | 1,068 | PI3K catalytic subunit α |
| 26 | KRAS | 10,098 | 189 | Ras GTPase |
| 27 | NRAS | 6,520 | 189 | Ras GTPase |
| 28 | BRAF | 6,138 | 767 | RAF serine/threonine kinase |
| 29 | AKT1 | 14,324 | 480 | Serine/threonine kinase |
| 30 | EGFR | 11,600 | 1,210 | Epidermal growth factor receptor |
Additional major interaction partners include: SMARCD1/2/3, BRD7/9, DPF1/2/3, MSH2/6, ARID1B (paralog), ARID2, CTNNB1, MYC, HDAC1, SETD2, POLE, KDM6A, ASXL1, BCL6
Protein Similarity
Structural/Embedding Similarity (ESM2 embeddings – 39 proteins): Top matches include orthologs and paralogs across species with high-confidence structural similarity to ARID1A domains. ESM2-predicted structurally similar proteins span diverse organisms and include multiple ARID family members.
Sequence Homology (DIAMOND – 21 proteins): The top sequence homologs from the DIAMOND similarity search include ARID-family paralogs (ARID1B, ARID2, ARID5B) and related chromatin-associated ARID-domain proteins with >40% sequence identity.
Key homology observations:
- ARID1B: Mammalian paralog (2,289 aa; 1,570 STRING interactions) – BAF complex component
- ARID2: PBAF complex-specific subunit (1,835 aa; 1,776 interactions) – high functional overlap
- ARID5B: Adipogenesis/development regulator (1,188 aa; 1,624 interactions) – divergent function
Transcription factor regulatory data
ARID1A is not a canonical transcription factor. ARID1A (AT-rich interactive domain-containing protein 1A) is a chromatin remodeling co-factor and subunit of the SWI/SNF complex with transcription coactivator activity, rather than a DNA-binding transcription factor with its own defined DNA binding motifs.
No JASPAR DNA Binding Motifs
ARID1A has no annotated DNA binding motifs in JASPAR. While it possesses intrinsic DNA binding capacity (GO:0003677) and can participate in chromatin remodeling and transcriptional regulation, it does not function as a sequence-specific DNA binding transcription factor.
Downstream Targets (Limited)
ARID1A has 11 documented downstream targets in the CollecTRI database:
- AR — Activation
- SMARCA1 — High confidence
- BMP10, CDH1, CDH17, IL10, SLU7, SMARCA2, TNFRSF11A — Low confidence
- CDKN1A, SMAD3 — Unknown regulation type
These represent mostly co-factor associations or putative targets, not direct transcriptional regulation, consistent with ARID1A’s role as a chromatin remodeling component rather than a sequence-specific activator.
Upstream Regulators
Upstream regulatory data for ARID1A are limited in the available databases. MSigDB gene sets indicate that the ARID1A promoter region contains predicted transcription factor binding sites for:
- MAZ, AP4, TCF4, PU1, LEF1, VDR, RFX1, HOXA4, TFAP2C, and others (via TRANSFAC motifs)
These are predicted binding sites (evidence type: predicted), not regulators confirmed by ChIP-seq or other experiments. Experimentally validated upstream transcription factors for ARID1A are not comprehensively annotated in the available biobtree datasets.
Drug & pharmacology data
ARID1A is not currently a known drug target with approved therapeutics or molecules in clinical development.
Comprehensive searches across ChEMBL, DrugBank, clinical trials databases, PharmGKB, PubChem, and BindingDB identified:
- 0 targeting molecules with development phase data
- 0 clinical trials involving ARID1A-targeting drugs
- 0 pharmacogenomics interactions or dosing guidelines related to ARID1A variants
Context: ARID1A encodes an AT-rich interaction domain protein and is a core subunit of the chromatin-remodeling BAF/SWI-SNF complex. While it is a well-established tumor suppressor frequently mutated in cancers (including ovarian, gastric, and colorectal cancers), therapeutic strategies directly targeting ARID1A remain in research stages. The gene is studied primarily for its synthetic lethal interactions and as a biomarker for treatment sensitivity, rather than as a direct drug target. Any therapeutic approaches would be indirect (targeting ARID1A-deficient tumor vulnerabilities) rather than direct inhibition or activation of ARID1A protein.
Expression profiles
Tissue expression (Bgee)
ARID1A shows ubiquitous expression across human tissues with high expression breadth (286/295 conditions present). Maximum expression score: 96.39 (bone marrow cell); Average: 89.08.
| Rank | Tissue/Cell Type | Expression Score | Quality |
|---|---|---|---|
| 1 | Bone marrow cell | 96.39 | Gold |
| 2 | Ventricular zone | 96.34 | Gold |
| 3 | Embryo | 96.24 | Gold |
| 4 | Colonic epithelium | 96.01 | Gold |
| 5 | Ileal mucosa | 95.83 | Gold |
| 6 | Cortical plate | 95.76 | Gold |
| 7 | Ganglionic eminence | 95.46 | Gold |
| 8 | Caput epididymis | 95.05 | Gold |
| 9 | Corpus epididymis | 94.87 | Gold |
| 10 | Sural nerve | 94.87 | Gold |
| 11 | Trabecular bone tissue | 94.78 | Gold |
| 12 | Adult organism | 94.36 | Gold |
| 13 | Nipple | 94.30 | Gold |
| 14 | Pigmented layer of retina | 94.26 | Gold |
| 15 | Pylorus | 94.10 | Gold |
| 16 | Nasal cavity epithelium | 94.03 | Gold |
| 17 | Tonsil | 93.83 | Gold |
| 18 | Lower lobe of lung | 93.69 | Gold |
| 19 | Cauda epididymis | 93.35 | Gold |
| 20 | Mammary duct | 93.31 | Gold |
| 21 | Oocyte | 93.28 | Gold |
| 22 | Cardia of stomach | 93.16 | Gold |
| 23 | Mammalian vulva | 93.15 | Gold |
| 24 | Epithelium of mammary gland | 92.99 | Gold |
| 25 | Seminal vesicle | 92.86 | Gold |
| 26 | Upper leg skin | 92.75 | Gold |
| 27 | Tibialis anterior | 92.67 | Gold |
| 28 | Thymus | 92.63 | Gold |
| 29 | Superficial temporal artery | 92.61 | Gold |
| 30 | Leukocyte | 92.37 | Gold |
Tissue patterns: High expression in hematopoietic cells (bone marrow cells, leukocytes), reproductive tissues (epididymis, seminal vesicle, oocyte), the developing nervous system (ventricular zone, cortical plate, ganglionic eminence), and embryonic tissues. The top 30 scores span only 92.37 to 96.39, so expression is uniformly high rather than tissue-enriched.
Single-cell and cell type expression (SCXA)
Single-cell RNA-seq dataset of human haematopoietic lympho-myeloid progenitor populations:
Dataset: E-GEOD-100618 (Human haematopoietic progenitors from umbilical cord blood)
- 415 cells profiled using smart-seq2/smart-seq technology
- Source tissue: Umbilical cord blood
| Cell Type | Cluster | Expression Score | Log Fold Change |
|---|---|---|---|
| Multi-lymphoid progenitor | 1 | High | +2.48 to +4.49 |
| Granulocyte macrophage progenitor | 2 | 8.41 | +2.57 |
| Lymphoid-primed multipotent progenitor | 3 | Variable | Variable |
Expression pattern: ARID1A is expressed across haematopoietic progenitor populations with moderately elevated levels in granulocyte macrophage progenitors (cluster 2), consistent with its role in chromatin remodeling during cell differentiation and proliferation.
Disease associations
Mendelian/Monogenic Diseases
ARID1A mutations cause autosomal dominant developmental and neurodevelopmental disorders:
| Disease | Disease ID | Inheritance | Evidence Level |
|---|---|---|---|
| Intellectual disability, autosomal dominant 14 | OMIM:614607 / MONDO:0013819 | Autosomal dominant | Definitive (GenCC), Strong (clinical) |
| Coffin-Siris syndrome 1 | ORPHANET:1465 / MONDO:0007617 | Autosomal dominant | Supportive (GenCC) |
Related conditions (ClinVar-derived):
- Non-immune hydrops fetalis (ORPHANET:363999, MONDO:0009369)
- Septo-optic dysplasia spectrum (ORPHANET:3157)
- Retinitis pigmentosa 59
Tumor associations (identified via ClinVar): endometrial carcinoma, colorectal cancer, urinary bladder cancer, hepatoblastoma, medulloblastoma, astrocytoma
Phenotype Associations (HPO, Top 30)
Key developmental and morphological phenotypes:
| Phenotype | HPO ID | Category |
|---|---|---|
| Intellectual disability | HP:0001249 | Neurological |
| Global developmental delay | HP:0001263 | Developmental |
| Seizure | HP:0001250 | Neurological |
| Microcephaly | HP:0000252 | Craniofacial |
| Cerebellar hypoplasia | HP:0001321 | CNS |
| Agenesis of corpus callosum | HP:0001274 | CNS |
| Autistic behavior | HP:0000729 | Behavioral |
| Delayed speech and language development | HP:0000750 | Developmental |
| Broad philtrum | HP:0000289 | Facial |
| Coarse facial features | HP:0000280 | Facial |
| Wide mouth | HP:0000154 | Facial |
| Short stature | HP:0004322 | Growth |
| Growth delay | HP:0001510 | Growth |
| Intrauterine growth retardation | HP:0001511 | Growth |
| Feeding difficulties | HP:0011968 | GI/Developmental |
| Hypotonia | HP:0001252 | Motor |
| Short 5th finger | HP:0009237 | Skeletal |
| Brachydactyly | HP:0001156 | Skeletal |
| Ventricular septal defect | HP:0001629 | Cardiac |
| Atrial septal defect | HP:0001631 | Cardiac |
| Patent ductus arteriosus | HP:0001643 | Cardiac |
| Cleft palate | HP:0000175 | Orofacial |
| Macroglossia | HP:0000158 | Oral |
| Strabismus | HP:0000486 | Ocular |
| Ptosis | HP:0000508 | Ocular |
| Hearing impairment | HP:0000365 | Auditory |
| Hepatoblastoma | HP:0002884 | Neoplastic |
| Recurrent infections | HP:0002719 | Immunological |
| Joint hypermobility | HP:0001382 | Skeletal |
| Scoliosis | HP:0002650 | Skeletal |
GWAS Associations (Top 30)
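Variants at the ARID1A locus associate with lipid, hematologic, skeletal, hepatic, and metabolic traits. Most associations listed reach genome-wide significance (P < 5e-8); those with P-values between 2e-07 and 2e-06 are suggestive only.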
| Trait | P-value | Category |
|---|---|---|
| HDL cholesterol levels × alcohol consumption (drinkers vs non-drinkers interaction) | 7e-156 | Lipid metabolism |
| HDL cholesterol levels × alcohol consumption (drinkers vs non-drinkers interaction) | 4e-154 | Lipid metabolism |
| Apolipoprotein B levels | 1e-38 | Lipid metabolism |
| HDL cholesterol levels | 4e-38 | Lipid metabolism |
| Triglyceride levels | 4e-35 | Lipid metabolism |
| Apolipoprotein A1 levels | 2e-28 | Lipid metabolism |
| Heel bone mineral density | 2e-25 | Skeletal |
| Platelet count | 4e-24 | Hematologic |
| HDL cholesterol levels (drinkers vs non-drinkers) | 4e-19 | Lipid metabolism |
| HDL cholesterol levels × alcohol consumption (regular vs non-regular drinkers interaction) | 2e-18 | Lipid metabolism |
| LDL cholesterol levels × alcohol consumption (regular vs non-regular drinkers) | 4e-15 | Lipid metabolism |
| Plateletcrit | 7e-15 | Hematologic |
| LDL cholesterol levels × alcohol consumption (regular vs non-regular drinkers) | 2e-14 | Lipid metabolism |
| LDL cholesterol levels × alcohol consumption (drinkers vs non-drinkers) | 4e-13 | Lipid metabolism |
| Metabolic syndrome | 2e-12 | Metabolic |
| Granulocyte percentage of myeloid white cells | 2e-12 | Hematologic |
| Monocyte percentage of white cells | 6e-12 | Hematologic |
| Heel bone mineral density | 4e-11 | Skeletal |
| Liver volume | 7e-10 | Hepatic |
| Heel bone mineral density | 1e-09 | Skeletal |
| Neutrophil count | 2e-09 | Hematologic |
| Total cholesterol levels | 2e-08 | Lipid metabolism |
| Alanine aminotransferase levels | 3e-08 | Liver function |
| HDL cholesterol levels in current drinkers | 4e-08 | Lipid metabolism |
| HDL cholesterol levels | 2e-07 | Lipid metabolism |
| HDL cholesterol levels | 5e-07 | Lipid metabolism |
| Rosacea symptom severity | 1e-06 | Dermatologic |
| Triglyceride levels × alcohol consumption (drinkers vs non-drinkers) | 1e-06 | Lipid metabolism |
| White matter hyperintensities in ischemic stroke | 2e-06 | Neurological |
| Triglyceride levels × alcohol consumption (regular vs non-regular drinkers) | 2e-06 | Lipid metabolism |
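When reading a table like this, it helps to separate associations that pass the conventional genome-wide significance threshold (P < 5e-8) from suggestive ones. The sketch below does that for a few rows copied from the table; the threshold is the standard convention rather than something stated by the source databases, and the variable names are illustrative.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold

# (trait, p_value) pairs copied from the table above.
associations = [
    ("Apolipoprotein B levels", 1e-38),
    ("Total cholesterol levels", 2e-08),
    ("HDL cholesterol levels", 2e-07),
    ("Rosacea symptom severity", 1e-06),
]

# Order by strength of evidence (smallest P-value first).
by_p = sorted(associations, key=lambda a: a[1])

significant = [trait for trait, p in by_p if p < GENOME_WIDE_P]
suggestive = [trait for trait, p in by_p if p >= GENOME_WIDE_P]

# -log10(P) is the scale typically used on Manhattan plots.
neg_log10 = {trait: -math.log10(p) for trait, p in by_p}
```

Effect sizes and lead variant IDs are not listed in this reference, so the P-value alone does not indicate how large or clinically relevant an association is.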