ARID1A Gene Complete Identifier and Functional Mapping Reference


Provide a comprehensive cross-database identifier and functional mapping reference for human ARID1A, a definitive lookup resource covering the following sections.

Section 1: Gene identifiers
For human gene ARID1A, list ALL gene-level database identifiers. Required:
  • HGNC ID and approved symbol
  • Ensembl gene ID (ENSG...)
  • NCBI Entrez Gene ID
  • OMIM gene/locus ID
  • Genomic location: chromosome, start position, end position, strand (GRCh38)

Section 2: Transcript identifiers
For human gene ARID1A, list ALL transcript-level identifiers. Required:
  • Ensembl transcripts: ALL ENST IDs with biotype. Total count.
  • RefSeq transcripts: ALL NM_ mRNA accessions. Mark which is MANE Select.
  • CCDS IDs.
  • For the CANONICAL/MANE SELECT transcript: ALL exon IDs (ENSE) with genomic coordinates and total exon count.

Section 3: Protein identifiers
For human gene ARID1A protein product(s), list ALL protein-level identifiers. Required:
  • UniProt accessions: ALL entries (reviewed and unreviewed). Mark the canonical reviewed entry.
  • RefSeq protein: ALL NP_ accessions.
  • Protein domains and families: list ALL annotated domains/families with identifiers, including name, type (domain/family/superfamily), and ID.
  • Antibody availability: known antibody resources for the protein.

Section 4: Structure
For human gene ARID1A protein, list ALL structural data. Required:
  • Experimental structures: ALL PDB IDs. For each: experimental method (X-ray/NMR/Cryo-EM) and resolution. Total count.
  • Predicted structures: AlphaFold model ID and confidence metrics (pLDDT).

Section 5: Cross-species orthologs
For human gene ARID1A, list orthologous genes in key model organisms. Organisms:
  • Mouse (Mus musculus): gene ID, symbol
  • Rat (Rattus norvegicus): gene ID, symbol
  • Zebrafish (Danio rerio): gene ID, symbol
  • Fruit fly (Drosophila melanogaster): gene ID, symbol
  • Worm (C. elegans): gene ID, symbol
  • Yeast (S. cerevisiae): gene ID, symbol

Section 6: Clinical variants & AI predictions
For human gene ARID1A, summarize clinical variants and AI predictions.
Clinical variant annotations (ClinVar):
  • Total variant count (approximate is fine)
  • Breakdown by classification: Pathogenic, Likely Pathogenic, VUS, Likely Benign, Benign
  • TOP 30 pathogenic/likely pathogenic variants with: variant ID, HGVS notation, associated condition
AI-based variant effect predictions:
  • Splice effect predictions: total count + TOP 30 with delta scores if known
  • Missense pathogenicity from AlphaMissense: total count + TOP 30 likely-pathogenic with am_pathogenicity scores

Section 7: Pathways & Gene Ontology
For human gene ARID1A, list biological pathways and Gene Ontology annotations.
Pathway membership:
  • ALL biological pathways this gene participates in, with pathway IDs and names
  • Total pathway count
Gene Ontology:
  • Biological Process: count and TOP 20 terms with GO IDs
  • Molecular Function: count and TOP 20 terms with GO IDs
  • Cellular Component: count and TOP 20 terms with GO IDs

Section 8: Protein interactions & networks
For human gene ARID1A protein, summarize protein interactions and networks.
Protein-protein interactions (STRING, IntAct, BioGRID, etc.):
  • Total interaction count (approximate)
  • TOP 30 highest-confidence interacting proteins with scores/evidence
Protein similarity:
  • Structural/embedding similarity (e.g. Foldseek, ESM): TOP 20 similar proteins with scores
  • Sequence homology: TOP 20 homologous proteins with identity/similarity

Section 9: Transcription factor regulatory data
For human gene ARID1A, summarize transcription factor regulatory data.
If ARID1A is a transcription factor:
  • Downstream targets: total count + TOP 30 with regulation type (activates/represses) and evidence
  • DNA binding motifs from JASPAR: all known motif IDs and motif family classification
Regardless:
  • Upstream regulators: TFs that regulate ARID1A, with names and evidence type (ChIP-seq / predicted / experimentally validated)
If ARID1A is not a transcription factor, say so briefly and skip the downstream/motif sections.

Section 10: Drug & pharmacology data
For human gene ARID1A protein as a drug target, summarize pharmacology data.
If ARID1A is a known drug target:
  • Targeting molecules: total count in ChEMBL/DrugBank + TOP 30 by development phase (molecule ID, name, mechanism, highest phase)
  • Clinical trials: TOP 20 involving drugs targeting this gene (trial ID, phase, status, intervention)
  • Pharmacogenomics: known drug-gene interactions affecting drug response + dosing guidelines if any
If ARID1A is not currently a drug target, say so briefly.

Section 11: Expression profiles
For human gene ARID1A, summarize expression profiles.
Tissue expression (GTEx, HPA, Bgee, etc.):
  • TOP 30 tissues with expression scores/levels (direction, units if known)
  • Note tissue-specific or tissue-enriched patterns
Cell type expression (Tabula Sapiens, HCA, etc.):
  • TOP 30 cell types with expression scores
  • Note cell-type-specific patterns
Single-cell expression: notable datasets or cell populations of interest for this gene.

Section 12: Disease associations
For human gene ARID1A, summarize disease associations.
Mendelian / monogenic disease:
  • Diseases caused by mutations in ARID1A: disease name, disease ID (OMIM/Orphanet/Mondo), inheritance pattern, evidence level
  • Include all directly linked conditions
Phenotype associations:
  • Clinical phenotypes associated with the gene (HPO terms where known)
  • TOP 30 phenotype terms with HPO IDs
Complex-disease / GWAS:
  • Traits and diseases significantly associated via GWAS: trait name, variant, effect size, study where known
  • TOP 30 GWAS associations

ARID1A

Executive summary

ARID1A (AT-rich interactive domain-containing protein 1A; chromosome 1p36.11) is a core structural subunit of the BAF/SWI-SNF chromatin-remodeling complex and a major tumor suppressor. The gene (HGNC:11110) encodes a 2,285-amino-acid, ~242 kDa protein (UniProt O14497). Its primary role is chromatin remodeling and transcriptional coactivation rather than sequence-specific DNA binding. Interaction databases report ~3,461 interaction records in total. This figure sums the STRING, BioGRID and IntAct counts, so pairs found in more than one database are counted more than once. The partners span SWI/SNF subunits (SMARCA4, SMARCB1), histone modifiers (EP300, EZH2), and cancer-associated proteins (TP53, PTEN, KRAS). In the germline, loss-of-function variants cause an autosomal dominant neurodevelopmental disorder catalogued in OMIM as 614607. OMIM now titles this entry Coffin-Siris syndrome 2; it was formerly intellectual disability, autosomal dominant 14. GenCC also links ARID1A to Coffin-Siris syndrome 1 with supportive evidence. ClinVar holds ~170 pathogenic or likely pathogenic entries, and the phenotypic spectrum includes global developmental delay, microcephaly, and congenital heart defects. Somatically, ARID1A is among the most frequently mutated tumor suppressors across cancers, including endometrial, colorectal, and bladder cancer, though no approved therapeutics directly target it. Expression is ubiquitous (present in 286 of 295 Bgee conditions), with peak levels in bone marrow and embryonic neural tissues.

ARID1A — Reference

Cross-database identifier and functional mapping reference for ARID1A.

Gene identifiers

Identifier | Value
HGNC ID | HGNC:11110
Approved symbol | ARID1A
Ensembl gene ID | ENSG00000117713
NCBI Entrez gene ID | 8289
OMIM gene ID | 603024
Chromosome (GRCh38) | 1
Start position (GRCh38) | 26,693,236 bp
End position (GRCh38) | 26,782,104 bp
Strand | + (forward)
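The start and end positions above follow the Ensembl convention of 1-based, inclusive coordinates. The Python sketch below assumes that convention. It derives the gene span and a BED6 line, which uses a 0-based start and an exclusive end. The `chr1` naming, the `Interval` type and the helper names are illustrative choices, not part of any database API.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int   # 1-based, inclusive (Ensembl-style)
    end: int     # 1-based, inclusive
    strand: str


def span_length(iv: Interval) -> int:
    """Number of bases covered by the closed interval [start, end]."""
    return iv.end - iv.start + 1


def to_bed6(iv: Interval, name: str, score: int = 0) -> str:
    """BED6 line: start shifted to 0-based; end unchanged because BED ends are exclusive."""
    fields = [iv.chrom, iv.start - 1, iv.end, name, score, iv.strand]
    return "\t".join(str(f) for f in fields)


arid1a = Interval("chr1", 26_693_236, 26_782_104, "+")
print(span_length(arid1a))  # 88869
print(to_bed6(arid1a, "ARID1A"))
```

The two conventions differ only in the start coordinate. Subtracting 1 from the start is the entire conversion, and the end value stays numerically the same.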

Transcript identifiers

Ensembl Transcripts

Total: 17 transcripts

Ensembl ID | Biotype
ENST00000324856 | protein_coding
ENST00000374152 | protein_coding
ENST00000430291 | retained_intron
ENST00000430799 | protein_coding
ENST00000457599 | protein_coding
ENST00000466382 | nonsense_mediated_decay
ENST00000524572 | protein_coding
ENST00000532781 | nonsense_mediated_decay
ENST00000636072 | retained_intron
ENST00000636110 | retained_intron
ENST00000636219 | protein_coding
ENST00000636422 | retained_intron
ENST00000636794 | nonsense_mediated_decay
ENST00000636958 | protein_coding_CDS_not_defined
ENST00000637465 | protein_coding
ENST00000637788 | retained_intron
ENST00000850904 | protein_coding

RefSeq mRNA Transcripts

Total: 12 transcripts

NM ID | MANE Select
NM_006015 | Yes (paired with ENST00000324856)
NM_001080819 | No
NM_001341479 | No
NM_001363070 | No
NM_001401271 | No
NM_001401273 | No
NM_001401275 | No
NM_001401276 | No
NM_001401278 | No
NM_001401279 | No
NM_118259 | No
NM_139135 | No

CCDS IDs

  • CCDS285
  • CCDS44091

MANE Select Transcript Exons

MANE Select: ENST00000324856 / NM_006015
Total exons: 20

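Exons are listed in transcript order, which is ascending genomic start because the gene is on the + strand. Coordinates are GRCh38.

Exon | Exon ID | Start | End | Strand | Chromosome
1 | ENSE00001907429 | 26696015 | 26697540 | + | 1
2 | ENSE00003471930 | 26729651 | 26729863 | + | 1
3 | ENSE00000902180 | 26731152 | 26731604 | + | 1
4 | ENSE00001227857 | 26732676 | 26732792 | + | 1
5 | ENSE00001349762 | 26760856 | 26761096 | + | 1
6 | ENSE00001349761 | 26761384 | 26761473 | + | 1
7 | ENSE00001349760 | 26762152 | 26762319 | + | 1
8 | ENSE00001157462 | 26762973 | 26763285 | + | 1
9 | ENSE00000766221 | 26766221 | 26766366 | + | 1
10 | ENSE00001349753 | 26766457 | 26766566 | + | 1
11 | ENSE00001349752 | 26767790 | 26767999 | + | 1
12 | ENSE00000872621 | 26771119 | 26771326 | + | 1
13 | ENSE00001227772 | 26772500 | 26772632 | + | 1
14 | ENSE00001227767 | 26772812 | 26772987 | + | 1
15 | ENSE00003672172 | 26773346 | 26773496 | + | 1
16 | ENSE00003589420 | 26773580 | 26773717 | + | 1
17 | ENSE00003460238 | 26773802 | 26773898 | + | 1
18 | ENSE00001349739 | 26774329 | 26775220 | + | 1
19 | ENSE00003552035 | 26775577 | 26775707 | + | 1
20 | ENSE00001883917 | 26779023 | 26782104 | + | 1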
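The exon rows were originally returned out of genomic order. The Python sketch below puts them in transcript order and derives per-exon and spliced lengths. It assumes 1-based, inclusive coordinates (the Ensembl convention) and uses only the IDs and positions from the exon table. The helper names are hypothetical.

```python
# Exon rows as returned (GRCh38, 1-based, inclusive); order is not transcript order.
EXONS = [
    ("ENSE00001907429", 26696015, 26697540),
    ("ENSE00003471930", 26729651, 26729863),
    ("ENSE00000902180", 26731152, 26731604),
    ("ENSE00001227857", 26732676, 26732792),
    ("ENSE00001349760", 26762152, 26762319),
    ("ENSE00001349761", 26761384, 26761473),
    ("ENSE00001349762", 26760856, 26761096),
    ("ENSE00001157462", 26762973, 26763285),
    ("ENSE00003672172", 26773346, 26773496),
    ("ENSE00003589420", 26773580, 26773717),
    ("ENSE00003460238", 26773802, 26773898),
    ("ENSE00000872621", 26771119, 26771326),
    ("ENSE00001227767", 26772812, 26772987),
    ("ENSE00001227772", 26772500, 26772632),
    ("ENSE00001349739", 26774329, 26775220),
    ("ENSE00003552035", 26775577, 26775707),
    ("ENSE00000766221", 26766221, 26766366),
    ("ENSE00001349753", 26766457, 26766566),
    ("ENSE00001349752", 26767790, 26767999),
    ("ENSE00001883917", 26779023, 26782104),
]


def transcript_order(exons, strand="+"):
    """5'->3' order: ascending start on '+', descending start on '-'."""
    return sorted(exons, key=lambda exon: exon[1], reverse=(strand == "-"))


def exon_length(exon):
    """Length of a closed 1-based interval."""
    _, start, end = exon
    return end - start + 1


ordered = transcript_order(EXONS, strand="+")
for number, exon in enumerate(ordered, start=1):
    print(number, exon[0], exon[1], exon[2], exon_length(exon))
print("spliced length:", sum(exon_length(e) for e in ordered))
```

With these coordinates the 20 exons add up to 8,595 nt of spliced sequence. For a minus-strand gene, the same function sorts by descending start.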

Protein identifiers

UniProt Accessions

  • O14497 (canonical reviewed) — AT-rich interactive domain-containing protein 1A

RefSeq Protein Accessions (NP_)

  • NP_006006 (MANE Select, canonical)
  • NP_624361

Protein Domains and Families (InterPro)

ID | Name | Type
IPR001606 | ARID_dom | Domain
IPR030094 | ARID1A_ARID_BRIGHT_DNA-bd | Domain
IPR033388 | BAF250_C | Domain
IPR021906 | BAF250/Osa | Family
IPR011989 | ARM-like | Homologous superfamily
IPR016024 | ARM-type_fold | Homologous superfamily
IPR036431 | ARID_dom_sf | Homologous superfamily

Pfam Domains

  • PF01388
  • PF12031

Antibody Availability

No antibody records for ARID1A (O14497) were found in the antibody resources indexed by BioBTree. This reflects only those indexed resources.

Structure

Experimental Structures (PDB)

Total: 7 structures

PDB ID | Method | Resolution | Title
1RYU | NMR | n/a (solution structure) | Solution Structure of the SWI1 ARID
6LTH | Cryo-EM | 3.0 Å | Structure of human BAF Base module
6LTJ | Cryo-EM | 3.7 Å | Structure of nucleosome-bound human BAF complex
9RL4 | Cryo-EM | 3.5 Å | Structure of BAF in complex with OCT4-SOX2-bound nucleosome - SHL-6
9RMC | Cryo-EM | 4.2 Å | Structure of BAF in complex with OCT4-SOX2-bound nucleosome - SHL+6 class 1
9RN1 | Cryo-EM | 5.9 Å | Structure of BAF-nucleosome complex with OCT4-SOX2 at SHL+6 in ADP-bound state, BAF47 bound to ATPase lobe 2
9RN2 | Cryo-EM | 4.1 Å | Structure of BAF in complex with OCT4-SOX2-bound nucleosome - SHL+6 class 2

Predicted Structures (AlphaFold)

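Model ID | Global pLDDT | Residues with very high confidence (pLDDT ≥ 90)
AF-O14497-F1 (UniProt O14497) | 47.81 | 17%

A global pLDDT below 50 falls in AlphaFold's "very low" confidence band. Low pLDDT typically marks disordered segments, so only the confidently predicted regions (about 17% of residues at very high confidence) should be interpreted structurally.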
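The band fractions can be reproduced from a downloaded model. The Python sketch below reads per-residue pLDDT, which AlphaFold DB stores in the B-factor column of its PDB files, and bins each value. The band edges (90, 70, 50) follow AlphaFold's published confidence bands, with 90 counted as very high to match the table. The example input values are hypothetical, and the function names are illustrative.

```python
from collections import Counter


def ca_plddt(pdb_lines):
    """Per-residue pLDDT from the B-factor field (PDB columns 61-66) of CA atoms."""
    scores = []
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    return scores


def plddt_band(score):
    """AlphaFold confidence bands; 90 counts as very high, as in the table."""
    if score >= 90:
        return "very high"
    if score >= 70:
        return "confident"
    if score >= 50:
        return "low"
    return "very low"


def summarize(scores):
    """Mean pLDDT and the fraction of residues falling in each band."""
    counts = Counter(plddt_band(s) for s in scores)
    n = len(scores)
    return sum(scores) / n, {band: counts[band] / n for band in counts}


# Hypothetical per-residue values, for illustration only.
mean, fractions = summarize([95.0, 92.0, 40.0, 35.0, 60.0])
print(round(mean, 2), fractions)
```

The mean over CA atoms corresponds to the global pLDDT. The band fractions correspond to the "very high confidence" percentage in the table.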

Cross-species orthologs

Organism | Gene ID | Symbol
Mouse (Mus musculus) | ENSMUSG00000007880 | Arid1a
Rat (Rattus norvegicus) | ENSRNOG00000006137 | Arid1a
Zebrafish (Danio rerio) | ENSDARG00000101710 | arid1aa
Fruit fly (Drosophila melanogaster) | FBgn0261885 | osa
Worm (C. elegans) | WBGene00002717 | let-526
Yeast (S. cerevisiae) | None reported | None reported

Clinical variants & AI predictions

ClinVar Summary

Total variants: ~1,780
Classification breakdown:

Classification | Count
Pathogenic | ~50
Likely pathogenic | ~120
Uncertain significance | ~600
Likely benign | ~380
Benign | ~420
Benign/Likely benign | ~150
Conflicting | ~60

Pathogenic/Likely Pathogenic Variants (ClinVar; 16 returned, fewer than the 30 requested)

Variant ID | HGVS notation | Classification | Associated condition
1065491 | c.175G>T (p.Glu59Ter) | Pathogenic | ARID1A-related disorder
1177329 | c.166C>T (p.Gln56Ter) | Pathogenic | ARID1A-related disorder
1177330 | c.1708_1766del (p.Pro570fs) | Pathogenic | ARID1A-related disorder
1177343 | c.2914del (p.Asp972fs) | Pathogenic | ARID1A-related disorder
1182296 | c.3230C>A (p.Ala1077Glu) | Pathogenic/Likely pathogenic | ARID1A-related disorder
1120179 | c.5940_6000del (p.Val1982fs) | Pathogenic | ARID1A-related disorder
1323396 | c.1850C>A (p.Ser617Ter) | Pathogenic | ARID1A-related disorder
1323404 | c.2122C>T (p.Gln708Ter) | Pathogenic | ARID1A-related disorder
1028997 | c.5963T>C (p.Ile1988Thr) | Likely pathogenic | ARID1A-related disorder
1172645 | c.4049del (p.Ser1350fs) | Likely pathogenic | ARID1A-related disorder
1176188 | c.3169T>C (p.Ser1057Pro) | Likely pathogenic | ARID1A-related disorder
1177344 | c.3146T>G (p.Leu1049Arg) | Likely pathogenic | ARID1A-related disorder
1298412 | c.791C>A (p.Ser264Ter) | Likely pathogenic | ARID1A-related disorder
1307168 | c.4101G>A (p.Gln1367=) | Likely pathogenic | ARID1A-related disorder
1314744 | c.2341A>G (p.Ile781Val) | Likely pathogenic | ARID1A-related disorder
1320102 | c.595C>T (p.Gln199Ter) | Likely pathogenic | ARID1A-related disorder
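The c. and p. notations in the table are linked by simple arithmetic. Coding position c.1 is the A of the ATG start codon, so nucleotide c.N lies in codon ⌈N/3⌉. A deletion shifts the reading frame when its length is not a multiple of 3. The Python sketch below checks these relationships against rows from the table. It handles only exonic coding positions: intronic offsets such as c.100+1 and UTR positions are out of scope. The function names are hypothetical.

```python
import re


def codon_number(c_pos: int) -> int:
    """Codon (amino-acid position) containing coding nucleotide c.<c_pos>."""
    if c_pos < 1:
        raise ValueError("only coding positions c.1 and above are handled")
    return (c_pos - 1) // 3 + 1


def codon_base(c_pos: int) -> int:
    """Which base (1, 2 or 3) of its codon c.<c_pos> occupies."""
    return (c_pos - 1) % 3 + 1


_SIMPLE_DEL = re.compile(r"c\.(\d+)(?:_(\d+))?del")


def deletion_is_frameshift(hgvs_c: str) -> bool:
    """True if a simple coding deletion removes a length not divisible by 3."""
    m = _SIMPLE_DEL.fullmatch(hgvs_c)
    if m is None:
        raise ValueError(f"not a simple coding deletion: {hgvs_c}")
    start = int(m.group(1))
    end = int(m.group(2) or m.group(1))
    return (end - start + 1) % 3 != 0


# Rows from the ClinVar table: c.175G>T is p.Glu59Ter, c.4101G>A is p.Gln1367=.
print(codon_number(175), codon_base(175))        # 59 1
print(codon_number(4101), codon_base(4101))      # 1367 3
print(deletion_is_frameshift("c.1708_1766del"))  # True (59 nt removed)
```

The synonymous c.4101G>A changes the third base of codon 1367, which is why the protein consequence is written p.Gln1367=.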

AlphaMissense Pathogenicity Predictions

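AlphaMissense predictions returned: 197. All listed variants fall in residues 2 to 80, so this appears to be a partial, N-terminal extract. A complete scan of the 2,285-residue protein would comprise 2,285 × 19 = 43,415 possible substitutions.
Likely pathogenic predictions: 100+ (within the returned set)

TOP 30 Likely-Pathogenic Missense Variants (AlphaMissense, ranked by score within the returned set)

Protein variant | am_pathogenicity | Genomic position (GRCh38)
A2D | 0.969 | 1:26696408
D75H | 0.954 | 1:26696626
K25N | 0.949 | 1:26696478
A2V | 0.940 | 1:26696408
D75V | 0.932 | 1:26696627
S79R | 0.924 | 1:26696638
E72K | 0.911 | 1:26696617
A3V | 0.908 | 1:26696411
E78K | 0.908 | 1:26696635
A3E | 0.888 | 1:26696411
S12R | 0.885 | 1:26696437
K71N | 0.880 | 1:26696616
D75A | 0.880 | 1:26696627
A6D | 0.879 | 1:26696420
K26I | 0.868 | 1:26696480
S11R | 0.868 | 1:26696434
K25T | 0.859 | 1:26696477
G14R | 0.822 | 1:26696443
A3T | 0.821 | 1:26696410
A3P | 0.810 | 1:26696410
G76R | 0.806 | 1:26696629
D75N | 0.797 | 1:26696626
N80K | 0.782 | 1:26696643
G76E | 0.781 | 1:26696630
A2P | 0.779 | 1:26696407
K26E | 0.717 | 1:26696479
A9D | 0.706 | 1:26696429
A8D | 0.672 | 1:26696426
D75E | 0.600 | 1:26696628
A10D | 0.598 | 1:26696432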
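Two pieces of arithmetic sit behind this table. The first is the set of published AlphaMissense class cut-offs: likely benign below 0.34, likely pathogenic above 0.564, and ambiguous in between. The second is the offset from a residue number to its codon on the genome. The Python sketch below rests on two stated assumptions. The CDS start of 1:26696404 is inferred from the table, where codon 2 begins at 1:26696407. The simple offset is valid only on the + strand and only inside the first MANE Select exon, which ends at 1:26697540. The constant and function names are hypothetical.

```python
# Sketch only. Assumptions (see lead-in): published AlphaMissense cut-offs,
# a CDS start inferred from the table, and a + strand, exon-1-only mapping.
CDS_START = 26_696_404   # inferred first base of the ATG start codon (GRCh38)
EXON1_END = 26_697_540   # last base of MANE Select exon 1 (GRCh38)


def am_class(score: float) -> str:
    """Map an am_pathogenicity score to its AlphaMissense class label."""
    if score > 0.564:
        return "likely_pathogenic"
    if score < 0.34:
        return "likely_benign"
    return "ambiguous"


def codon_first_base(aa_pos: int) -> int:
    """Genomic position of the first base of codon `aa_pos` (exon 1 only)."""
    start = CDS_START + 3 * (aa_pos - 1)
    if start + 2 > EXON1_END:
        raise ValueError(f"codon {aa_pos} extends past exon 1; use a full exon model")
    return start


# D75H, D75V and D75E change bases 1, 2 and 3 of codon 75 respectively.
first = codon_first_base(75)
print(first, first + 1, first + 2)  # 26696626 26696627 26696628
print(am_class(0.954), am_class(0.598), am_class(0.2))
```

Beyond codon 379, a residue's codon crosses into later exons. Mapping those residues needs the full exon model rather than a single offset, so the function refuses them.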

Splice Effect Predictions (SpliceAI)

Total SpliceAI predictions: 2,653
Effect types: donor gain, donor loss, acceptor gain, acceptor loss

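TOP 30 Splice Variants by Delta Score (SpliceAI)

Rows are ranked by delta score. The seven donor-loss variants at 1:26697536 to 1:26697542 are predicted to abolish the exon 1 donor site; exon 1 of the MANE Select transcript ends at 1:26697540. The four lowest rows (delta 0.21 to 0.26) fall below SpliceAI's recommended 0.5 cut-off and are low-confidence predictions.

Position | Variant (REF>ALT) | Effect | Delta score
1:26697536 | CTCAG>C | donor_loss | 0.99
1:26697537 | TCAG>T | donor_loss | 0.99
1:26697538 | CAGG>C | donor_loss | 0.99
1:26697539 | AG>A | donor_loss | 0.99
1:26697540 | GG>G | donor_loss | 0.99
1:26697541 | G>GA | donor_loss | 0.99
1:26697542 | T>A | donor_loss | 0.99
1:26697549 | C>G | donor_gain | 0.99
1:26696470 | G>A | donor_gain | 0.91
1:26696224 | GAGCC>G | donor_gain | 0.86
1:26697036 | G>GG | donor_gain | 0.80
1:26698162 | C>G | donor_gain | 0.80
1:26697035 | A>AG | donor_gain | 0.79
1:26697541 | G>GG | donor_gain | 0.79
1:26696102 | C>T | donor_gain | 0.73
1:26698175 | GTA>G | donor_gain | 0.70
1:26697112 | G>GA | donor_gain | 0.68
1:26698178 | G>GG | donor_gain | 0.68
1:26696573 | A>T | donor_gain | 0.67
1:26696163 | C>G | donor_gain | 0.63
1:26697497 | G>GT | donor_gain | 0.62
1:26697693 | G>T | donor_gain | 0.62
1:26697111 | T>TA | donor_gain | 0.61
1:26698173 | GAGTA>G | donor_gain | 0.59
1:26697548 | GC>G | donor_gain | 0.59
1:26698145 | G>GT | donor_gain | 0.52
1:26697381 | C>T | donor_gain | 0.26
1:26697656 | G>T | donor_gain | 0.26
1:26697367 | G>GT | donor_gain | 0.25
1:26698162 | C>CG | donor_gain | 0.21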
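Rows like these can be triaged in two steps. The Python sketch below first classifies each REF>ALT pair as an SNV, insertion, deletion or multi-base substitution by comparing allele lengths, assuming VCF-style left-anchored alleles as in the table. It then buckets the delta score with the 0.2 / 0.5 / 0.8 cut-offs suggested in the SpliceAI documentation (high recall, recommended, high precision). The tier labels and function names are hypothetical.

```python
def variant_kind(ref: str, alt: str) -> str:
    """Classify a VCF-style allele pair by comparing allele lengths."""
    if len(ref) == len(alt) == 1:
        return "snv"
    if len(ref) > len(alt):
        return "deletion"
    if len(alt) > len(ref):
        return "insertion"
    return "mnv"  # equal-length multi-base substitution


def spliceai_tier(delta: float) -> str:
    """Bucket a SpliceAI delta score by the documented cut-offs."""
    if delta >= 0.8:
        return "high_precision"
    if delta >= 0.5:
        return "recommended"
    if delta >= 0.2:
        return "high_recall"
    return "below_0.2"


# A few rows copied from the table above.
ROWS = [
    ("1:26697536", "CTCAG>C", "donor_loss", 0.99),
    ("1:26697541", "G>GA", "donor_loss", 0.99),
    ("1:26698145", "G>GT", "donor_gain", 0.52),
    ("1:26698162", "C>CG", "donor_gain", 0.21),
]

for pos, change, effect, delta in ROWS:
    ref, alt = change.split(">")
    print(pos, variant_kind(ref, alt), effect, spliceai_tier(delta))
```

Treating the cut-offs as tiers rather than a single pass/fail line preserves the distinction between the 0.99 donor-site calls and the weaker 0.2 to 0.3 donor-gain calls.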

Pathways & Gene Ontology

Reactome Pathways

Total: 8 pathways

ID | Pathway name
R-HSA-3214858 | RMTs methylate histone arginines
R-HSA-8939243 | RUNX1 interacts with co-factors whose precise effect on RUNX1 targets is not known
R-HSA-9764790 | Positive Regulation of CDH1 Gene Transcription
R-HSA-9824585 | Regulation of MITF-M-dependent genes involved in pigmentation
R-HSA-9845323 | Regulation of endogenous retroelements by Piwi-interacting RNAs (piRNAs)
R-HSA-9933937 | Formation of the canonical BAF (cBAF) complex
R-HSA-9933946 | Formation of the embryonic stem cell BAF (esBAF) complex
R-HSA-9934037 | Formation of neuronal progenitor and neuronal BAF (npBAF and nBAF)

MSigDB Gene Sets

Total: 697 gene sets (curated gene set membership from MSigDB database)

Gene Ontology Annotations

Biological Process

Count: 16 terms

GO ID | Term
GO:0006325 | Chromatin organization
GO:0006337 | Nucleosome disassembly
GO:0006338 | Chromatin remodeling
GO:0006357 | Regulation of transcription by RNA polymerase II
GO:0007399 | Nervous system development
GO:0030071 | Regulation of mitotic metaphase/anaphase transition
GO:0045582 | Positive regulation of T cell differentiation
GO:0045597 | Positive regulation of cell differentiation
GO:0045663 | Positive regulation of myoblast differentiation
GO:0045815 | Transcription initiation-coupled chromatin remodeling
GO:0045893 | Positive regulation of DNA-templated transcription
GO:0070316 | Regulation of G0 to G1 transition
GO:1902459 | Positive regulation of stem cell population maintenance
GO:2000045 | Regulation of G1/S transition of mitotic cell cycle
GO:2000781 | Positive regulation of double-strand break repair
GO:2000819 | Regulation of nucleotide-excision repair

Molecular Function

Count: 5 terms

GO ID | Term
GO:0003677 | DNA binding
GO:0003713 | Transcription coactivator activity
GO:0005515 | Protein binding
GO:0016922 | Nuclear receptor binding
GO:0031491 | Nucleosome binding

Cellular Component

Count: 8 terms

GO ID | Term
GO:0000785 | Chromatin
GO:0005634 | Nucleus
GO:0005654 | Nucleoplasm
GO:0016514 | SWI/SNF complex
GO:0035060 | Brahma complex
GO:0071564 | npBAF complex
GO:0071565 | nBAF complex
GO:0140092 | bBAF complex

Protein interactions & networks

ARID1A (O14497) – Human AT-rich interactive domain-containing protein 1A

  • Length: 2285 amino acids
  • Molecular weight: ~242 kDa

Protein-Protein Interactions

Total interaction count (approximate):

  • STRING: 2,774 interactions
  • BioGRID: 486 interactions
  • IntAct: 201 interactions (from xref count, direct mapping not available)
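  • Combined: ~3,461 (sum of the three per-database counts). Interactions reported by more than one database are counted more than once, so the number of distinct partners is lower.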
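Set union shows why a summed count exceeds the distinct-partner count. The Python sketch below uses toy partner sets with placeholder names P1 to P5, not actual database content. With real data, each set would hold the partner identifiers returned by one database, mapped to a common namespace (for example UniProt accessions) before the union.

```python
# Toy data: placeholder partner IDs, not real STRING/BioGRID/IntAct content.
def summed_vs_distinct(partners_by_db: dict[str, set[str]]) -> tuple[int, int]:
    """Return (sum of per-database counts, number of distinct partners)."""
    summed = sum(len(partners) for partners in partners_by_db.values())
    distinct = set().union(*partners_by_db.values())
    return summed, len(distinct)


toy = {
    "STRING": {"P1", "P2", "P3", "P4"},
    "BioGRID": {"P1", "P2", "P5"},
    "IntAct": {"P1", "P4"},
}
print(summed_vs_distinct(toy))  # (9, 5)
```

Mapping identifiers to one namespace is the step that makes the union meaningful. Without it, the same protein listed under different IDs would still be counted twice.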

30 selected STRING interaction partners

The "STRING degree" column appears to give each partner's own total number of STRING interactions. It is not the partner's confidence score with ARID1A. Per-pair STRING combined scores are not listed here, so the row order should not be read as a confidence ranking.

# | Protein | STRING degree | Size (aa) | Function
1 | SMARCA4 (BRG1) | 5,148 | 1,647 | SWI/SNF ATPase catalytic subunit
2 | CREBBP (CBP) | 5,086 | 2,442 | Histone acetyltransferase
3 | TERT | 5,450 | 1,132 | Telomerase reverse transcriptase
4 | ATRX | 5,224 | 2,492 | Chromatin remodeler, H3.3 chaperone
5 | ACTL6A (BAF53A) | 4,850 | 429 | SWI/SNF actin-like component
6 | ACTL6B (BAF53B) | 4,366 | 426 | Neural-specific SWI/SNF subunit
7 | SMARCA2 (BRM) | 3,380 | 1,590 | SWI/SNF ATPase, paralog of SMARCA4
8 | SMARCB1 (BAF47) | 3,478 | 394 | SWI/SNF core subunit
9 | PPP2R1A | 3,046 | 589 | Protein phosphatase 2A regulatory subunit
10 | KMT2C (MLL3) | 2,988 | 4,911 | Histone H3K4 methyltransferase
11 | PBRM1 (BAF180) | 2,972 | 1,634 | SWI/SNF-B (PBAF) subunit
12 | SMARCC1 | 2,936 | 1,105 | SWI/SNF core component
13 | SMARCC2 | 2,760 | 1,245 | SWI/SNF core component
14 | SMARCA1 (SNF2L1) | 4,094 | 1,070 | NURF/CERF ATPase
15 | SMARCA5 (ISWI) | 5,382 | 1,052 | Chromatin remodeler
16 | BRD4 | 5,234 | 1,362 | Acetyl-histone reader, elongation factor
17 | EP300 (p300) | 6,450 | 2,414 | Histone acetyltransferase
18 | KMT2D (MLL2) | 2,772 | 5,537 | H3K4 methyltransferase
19 | EZH2 | 7,058 | 751 | Polycomb PRC2 methyltransferase
20 | TP53 (p53) | 14,764 | 393 | Tumor suppressor
21 | PTEN | 9,614 | 403 | Phosphatase tumor suppressor
22 | BRCA1 | 6,120 | 1,884 | DNA repair E3 ubiquitin ligase
23 | BRCA2 | 3,778 | 3,418 | DNA repair recombination protein
24 | ATM | 6,446 | 3,056 | DNA damage sensor kinase
25 | PIK3CA | 4,602 | 1,068 | PI3K catalytic subunit α
26 | KRAS | 10,098 | 189 | Ras GTPase
27 | NRAS | 6,520 | 189 | Ras GTPase
28 | BRAF | 6,138 | 767 | RAF serine/threonine kinase
29 | AKT1 | 14,324 | 480 | Serine/threonine kinase
30 | EGFR | 11,600 | 1,210 | Epidermal growth factor receptor

Additional major interaction partners include SMARCD1/2/3, BRD7/9, DPF1/2/3, MSH2/6, ARID1B (paralog), ARID2, CTNNB1, MYC, HDAC1, SETD2, POLE, KDM6A, ASXL1 and BCL6.

Protein Similarity

Structural/embedding similarity (ESM2 embeddings): 39 similar proteins were returned, but individual hits and similarity scores are not listed here. The matches are described as orthologs and paralogs from diverse organisms, including multiple ARID family members.

Sequence homology (DIAMOND): 21 homologous proteins were returned, but per-hit identity values are not listed here. The top hits include the ARID family paralogs ARID1B, ARID2 and ARID5B and related ARID-domain chromatin proteins, reported at >40% sequence identity. DIAMOND reports identity over the locally aligned region, which for the more distant paralogs may cover only part of the protein, such as the ARID domain.

Key homology observations:

  • ARID1B: Mammalian paralog (2,289 aa; 1,570 STRING interactions) – BAF complex component
  • ARID2: PBAF complex-specific subunit (1,835 aa; 1,776 interactions) – high functional overlap
  • ARID5B: Adipogenesis/development regulator (1,188 aa; 1,624 interactions) – divergent function


Transcription factor regulatory data

ARID1A is not a canonical transcription factor. ARID1A (AT-rich interactive domain-containing protein 1A) is a chromatin remodeling co-factor and subunit of the SWI/SNF complex with transcription coactivator activity, rather than a DNA-binding transcription factor with its own defined DNA binding motifs.

No JASPAR DNA Binding Motifs

ARID1A has no annotated DNA-binding motifs in JASPAR. Its ARID domain binds DNA (GO:0003677), but without the sequence specificity of a classical transcription factor. ARID1A therefore contributes to transcriptional regulation through chromatin remodeling rather than by recognizing a defined motif.

Downstream Targets (Limited)

ARID1A has 11 documented downstream targets via the COLLECTRI database:

  • AR — Activation
  • SMARCA1 — High confidence
  • BMP10, CDH1, CDH17, IL10, SLU7, SMARCA2, TNFRSF11A — Low confidence
  • CDKN1A, SMAD3 — Unknown regulation type

These represent mostly co-factor associations or putative targets, not direct transcriptional regulation, consistent with ARID1A’s role as a chromatin remodeling component rather than a sequence-specific activator.

Upstream Regulators

Upstream regulatory data for ARID1A are limited in the available databases. ARID1A belongs to MSigDB transcription-factor-target gene sets (C3 collection), which group genes whose promoter regions contain a given TRANSFAC motif. The motifs found for ARID1A include those for:

  • MAZ, AP4, TCF4, PU1, LEF1, VDR, RFX1, HOXA4, TFAP2C, and others (via TRANSFAC motifs)

These are predicted binding sites, not regulators confirmed by ChIP-seq or other experiments. Experimentally validated upstream transcription factors for ARID1A are not comprehensively annotated in the BioBTree datasets used here.

Drug & pharmacology data

ARID1A is not currently a known drug target with approved therapeutics or molecules in clinical development.

Comprehensive searches across ChEMBL, DrugBank, clinical trials databases, PharmGKB, PubChem, and BindingDB identified:

  • 0 targeting molecules with development phase data
  • 0 clinical trials involving ARID1A-targeting drugs
  • 0 pharmacogenomics interactions or dosing guidelines related to ARID1A variants

Context: ARID1A encodes an AT-rich interaction domain protein and is a core subunit of the chromatin-remodeling BAF/SWI-SNF complex. While it is a well-established tumor suppressor frequently mutated in cancers (including ovarian, gastric, and colorectal cancers), therapeutic strategies directly targeting ARID1A remain in research stages. The gene is studied primarily for its synthetic lethal interactions and as a biomarker for treatment sensitivity, rather than as a direct drug target. Any therapeutic approaches would be indirect (targeting ARID1A-deficient tumor vulnerabilities) rather than direct inhibition or activation of ARID1A protein.

Expression profiles

Tissue expression (Bgee)

ARID1A shows ubiquitous expression across human tissues with high expression breadth (286/295 conditions present). Maximum expression score: 96.39 (bone marrow cell); Average: 89.08.

Rank | Tissue/cell type | Expression score | Call quality
1 | Bone marrow cell | 96.39 | Gold
2 | Ventricular zone | 96.34 | Gold
3 | Embryo | 96.24 | Gold
4 | Colonic epithelium | 96.01 | Gold
5 | Ileal mucosa | 95.83 | Gold
6 | Cortical plate | 95.76 | Gold
7 | Ganglionic eminence | 95.46 | Gold
8 | Caput epididymis | 95.05 | Gold
9 | Corpus epididymis | 94.87 | Gold
10 | Sural nerve | 94.87 | Gold
11 | Trabecular bone tissue | 94.78 | Gold
12 | Adult organism | 94.36 | Gold
13 | Nipple | 94.30 | Gold
14 | Pigmented layer of retina | 94.26 | Gold
15 | Pylorus | 94.10 | Gold
16 | Nasal cavity epithelium | 94.03 | Gold
17 | Tonsil | 93.83 | Gold
18 | Lower lobe of lung | 93.69 | Gold
19 | Cauda epididymis | 93.35 | Gold
20 | Mammary duct | 93.31 | Gold
21 | Oocyte | 93.28 | Gold
22 | Cardia of stomach | 93.16 | Gold
23 | Mammalian vulva | 93.15 | Gold
24 | Epithelium of mammary gland | 92.99 | Gold
25 | Seminal vesicle | 92.86 | Gold
26 | Upper leg skin | 92.75 | Gold
27 | Tibialis anterior | 92.67 | Gold
28 | Thymus | 92.63 | Gold
29 | Superficial temporal artery | 92.61 | Gold
30 | Leukocyte | 92.37 | Gold

Tissue patterns: Expression is high in hematopoietic tissues (bone marrow, leukocytes), reproductive tissues (epididymis, seminal vesicle, oocyte), the developing nervous system (ventricular zone, cortical plate, ganglionic eminence), and embryonic tissue. The top 30 scores span a narrow range (92.37 to 96.39), so no tissue-enriched or tissue-specific pattern is evident, and ARID1A behaves as a broadly expressed gene.

Single-cell and cell type expression (SCXA)

Single-cell RNA-seq dataset of human haematopoietic lympho-myeloid progenitor populations:

Dataset: E-GEOD-100618 (Human haematopoietic progenitors from umbilical cord blood)

  • 415 cells profiled using smart-seq2/smart-seq technology
  • Source tissue: Umbilical cord blood

Cell type | Cluster | Expression score | Log fold change
Multi-lymphoid progenitor | 1 | High | +2.48 to +4.49
Granulocyte macrophage progenitor | 2 | 8.41 | +2.57
Lymphoid-primed multipotent progenitor | 3 | Variable | Variable

Expression pattern: ARID1A is expressed across haematopoietic progenitor populations with moderately elevated levels in granulocyte macrophage progenitors (cluster 2), consistent with its role in chromatin remodeling during cell differentiation and proliferation.

Disease associations

Mendelian/Monogenic Diseases

ARID1A mutations cause autosomal dominant developmental and neurodevelopmental disorders:

Disease | Disease ID | Inheritance | Evidence level
Intellectual disability, autosomal dominant 14 | OMIM:614607 / MONDO:0013819 | Autosomal dominant | Definitive (GenCC), Strong (clinical)
Coffin-Siris syndrome 1 | ORPHANET:1465 / MONDO:0007617 | Autosomal dominant | Supportive (GenCC)

Other conditions linked through ClinVar submissions:

  • Non-immune hydrops fetalis (ORPHANET:363999, MONDO:0009369)
  • Septo-optic dysplasia spectrum (ORPHANET:3157)

Tumor associations (identified via ClinVar): endometrial carcinoma, colorectal cancer, urinary bladder cancer, hepatoblastoma, medulloblastoma, and astrocytoma. ClinVar submissions also list retinitis pigmentosa 59, which is a retinal dystrophy rather than a tumor.


Phenotype Associations (HPO, Top 30)

Key developmental and morphological phenotypes:

Phenotype | HPO ID | Category
Intellectual disability | HP:0001249 | Neurological
Global developmental delay | HP:0001263 | Developmental
Seizure | HP:0001250 | Neurological
Microcephaly | HP:0000252 | Craniofacial
Cerebellar hypoplasia | HP:0001321 | CNS
Agenesis of corpus callosum | HP:0001274 | CNS
Autistic behavior | HP:0000729 | Behavioral
Delayed speech and language development | HP:0000750 | Developmental
Broad philtrum | HP:0000289 | Facial
Coarse facial features | HP:0000280 | Facial
Wide mouth | HP:0000154 | Facial
Short stature | HP:0004322 | Growth
Growth delay | HP:0001510 | Growth
Intrauterine growth retardation | HP:0001511 | Growth
Feeding difficulties | HP:0011968 | GI/Developmental
Hypotonia | HP:0001252 | Motor
Short 5th finger | HP:0009237 | Skeletal
Brachydactyly | HP:0001156 | Skeletal
Ventricular septal defect | HP:0001629 | Cardiac
Atrial septal defect | HP:0001631 | Cardiac
Patent ductus arteriosus | HP:0001643 | Cardiac
Cleft palate | HP:0000175 | Orofacial
Macroglossia | HP:0000158 | Oral
Strabismus | HP:0000486 | Ocular
Ptosis | HP:0000508 | Ocular
Hearing impairment | HP:0000365 | Auditory
Hepatoblastoma | HP:0002884 | Neoplastic
Recurrent infections | HP:0002719 | Immunological
Joint hypermobility | HP:0001382 | Skeletal
Scoliosis | HP:0002650 | Skeletal

GWAS Associations (Top 30)

The table lists GWAS associations mapped to the ARID1A locus, ranked by P-value; lead variants and effect sizes are not included in this extract. Mapping to the locus does not establish ARID1A as the causal gene. Six rows (P ≥ 2 × 10⁻⁷) fall short of the conventional genome-wide significance threshold of P < 5 × 10⁻⁸. The associations cluster in lipid, hematologic, skeletal and liver traits.

Trait | P-value | Category
HDL cholesterol levels × alcohol consumption (drinkers vs non-drinkers interaction) | 7e-156 | Lipid metabolism
HDL cholesterol levels × alcohol consumption (drinkers vs non-drinkers interaction) | 4e-154 | Lipid metabolism
Apolipoprotein B levels | 1e-38 | Lipid metabolism
HDL cholesterol levels | 4e-38 | Lipid metabolism
Triglyceride levels | 4e-35 | Lipid metabolism
Apolipoprotein A1 levels | 2e-28 | Lipid metabolism
Heel bone mineral density | 2e-25 | Skeletal
Platelet count | 4e-24 | Hematologic
HDL cholesterol levels (drinkers vs non-drinkers) | 4e-19 | Lipid metabolism
HDL cholesterol levels × alcohol consumption (regular vs non-regular drinkers interaction) | 2e-18 | Lipid metabolism
LDL cholesterol levels × alcohol consumption (regular vs non-regular drinkers) | 4e-15 | Lipid metabolism
Plateletcrit | 7e-15 | Hematologic
LDL cholesterol levels × alcohol consumption (regular vs non-regular drinkers) | 2e-14 | Lipid metabolism
LDL cholesterol levels × alcohol consumption (drinkers vs non-drinkers) | 4e-13 | Lipid metabolism
Metabolic syndrome | 2e-12 | Metabolic
Granulocyte percentage of myeloid white cells | 2e-12 | Hematologic
Monocyte percentage of white cells | 6e-12 | Hematologic
Heel bone mineral density | 4e-11 | Skeletal
Liver volume | 7e-10 | Hepatic
Heel bone mineral density | 1e-09 | Skeletal
Neutrophil count | 2e-09 | Hematologic
Total cholesterol levels | 2e-08 | Lipid metabolism
Alanine aminotransferase levels | 3e-08 | Liver function
HDL cholesterol levels in current drinkers | 4e-08 | Lipid metabolism
HDL cholesterol levels | 2e-07 | Lipid metabolism
HDL cholesterol levels | 5e-07 | Lipid metabolism
Rosacea symptom severity | 1e-06 | Dermatologic
Triglyceride levels × alcohol consumption (drinkers vs non-drinkers) | 1e-06 | Lipid metabolism
White matter hyperintensities in ischemic stroke | 2e-06 | Neurological
Triglyceride levels × alcohol consumption (regular vs non-regular drinkers) | 2e-06 | Lipid metabolism
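The Python sketch below implements the significance check: it splits rows at P < 5 × 10⁻⁸, the conventional genome-wide threshold. The rows are a subset copied from the GWAS table, and the function name is hypothetical.

```python
GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold

# Subset of (trait, P-value) rows copied from the GWAS table.
ROWS = [
    ("HDL cholesterol levels", 4e-38),
    ("Total cholesterol levels", 2e-08),
    ("HDL cholesterol levels in current drinkers", 4e-08),
    ("HDL cholesterol levels", 2e-07),
    ("Rosacea symptom severity", 1e-06),
]


def split_by_significance(rows, threshold=GENOME_WIDE_P):
    """Partition (trait, p) rows into genome-wide significant and the rest."""
    significant = [row for row in rows if row[1] < threshold]
    below = [row for row in rows if row[1] >= threshold]
    return significant, below


significant, below = split_by_significance(ROWS)
print(len(significant), len(below))  # 3 2
```

Rows in the second list are best treated as suggestive associations that would need replication.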

Structured Data Sources

Generated with Claude Haiku 4.5 + BioBTree MCP, drawing on data BioBTree aggregates from 54 biological databases. Every identifier and figure traces to one of 128 reproducible API calls; the call log is not reproduced here.


Datasets: alphafold, alphamissense, antibody, bgee, bindingdb, biogrid, biogrid_interaction, ccds, chembl_assay, chembl_molecule, chembl_target, clinical_trials, clinvar, collectri, diamond_similarity, drugbank, ensembl, entrez, esm2_similarity, exon, fantom5_promoter, gencc, go, gtopdb, gtopdb_target, gwas, hgnc, hpa, hpo, intact, interpro, jaspar, mim, mondo, msigdb, orphanet, ortholog, paralog, pdb, pfam, pharmgkb, pharmgkb_gene, pharmgkb_variant, pubchem, pubchem_assay, reactome, refseq, scxa, signor, spliceai, string, string_interaction, transcript, uniprot
Generated: 2026-05-25 — For the latest data, query BioBTree directly via MCP or API.