THOC3

gene

Also known as TEX1, MGC5469

Summary

THOC3 (THO complex subunit 3, HGNC:19072) is a protein-coding gene on chromosome 5q35.2, encoding THO complex subunit 3 (Q96J01). The protein is a component of the THO subcomplex of the TREX complex, which is thought to couple mRNA transcription, processing and nuclear export, and which specifically associates with spliced mRNA and not with unspliced pre-mRNA. THOC3 is a common-essential gene (DepMap: required in 99.7% of cancer cell lines).

This gene encodes a component of the nuclear THO transcription elongation complex, which is part of the larger transcription export (TREX) complex that couples messenger RNA processing and export. In humans, the transcription export complex is recruited to the 5’-end of messenger RNAs in a splicing- and cap-dependent manner. Studies of a related complex in mouse suggest that the metazoan transcription export complex is involved in cell differentiation and development. A pseudogene of this gene has been defined on chromosome 5.

Source: NCBI Gene 84321 — RefSeq curated summary.

At a glance

  • GWAS associations: 1
  • Clinical variants (ClinVar): 46 total
  • Cancer dependency (DepMap): dependent in 99.7% of screened cell lines (common-essential)
  • MANE Select transcript: NM_032361

Identifiers

Gene identifiers

Field | Value
HGNC ID | HGNC:19072
Approved symbol | THOC3
Name | THO complex subunit 3
Location | 5q35.2
Locus type | gene with protein product
Status | Approved
Aliases | TEX1, MGC5469
Ensembl gene | ENSG00000051596
Ensembl biotype | protein_coding
OMIM | 606929
Entrez | 84321

Gene structure

Transcript identifiers

Ensembl transcripts: 20 — 15 protein_coding, 3 nonsense_mediated_decay, 1 protein_coding_CDS_not_defined, 1 retained_intron

ENST00000265097, ENST00000432305, ENST00000505969, ENST00000510300, ENST00000511062, ENST00000513006, ENST00000513118, ENST00000513482, ENST00000514250, ENST00000514861, ENST00000515016, ENST00000628318, ENST00000909314, ENST00000909315, ENST00000928777, ENST00000928778, ENST00000928779, ENST00000928780, ENST00000928781, ENST00000960460

RefSeq mRNA (2): NM_001376902, NM_032361 (MANE Select: NM_032361)

CCDS: CCDS4397, CCDS93829

Canonical transcript exons

ENST00000265097 — 6 exons

Exon | Start | End
ENSE00002467360 | 175959531 | 175960132
ENSE00003459846 | 175961051 | 175961151
ENSE00003557520 | 175964951 | 175965155
ENSE00003618727 | 175961273 | 175961434
ENSE00003681177 | 175967111 | 175967267
ENSE00003899832 | 175967942 | 175968315
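
The exon rows above are listed in exon-ID order rather than genomic order. The following is a minimal sketch, not part of the source data, of how one might sort them by start coordinate and derive exon lengths. It assumes the coordinates are 1-based and inclusive (the usual Ensembl convention), so a length is end - start + 1; the names EXONS, exon_length and sorted_by_start are illustrative only.

```python
# Exon rows copied from the table: (exon ID, start, end).
EXONS = [
    ("ENSE00002467360", 175959531, 175960132),
    ("ENSE00003459846", 175961051, 175961151),
    ("ENSE00003557520", 175964951, 175965155),
    ("ENSE00003618727", 175961273, 175961434),
    ("ENSE00003681177", 175967111, 175967267),
    ("ENSE00003899832", 175967942, 175968315),
]


def exon_length(start: int, end: int) -> int:
    """Length in bases, ASSUMING 1-based inclusive coordinates."""
    return end - start + 1


def sorted_by_start(exons):
    """Return the exon rows ordered by genomic start coordinate."""
    return sorted(exons, key=lambda row: row[1])


if __name__ == "__main__":
    for exon_id, start, end in sorted_by_start(EXONS):
        print(exon_id, start, end, exon_length(start, end))
    total = sum(exon_length(s, e) for _, s, e in EXONS)
    print("total exonic bases:", total)
```

Sorting by start gives the order along the reference genome only; the transcript's own 5' to 3' exon order depends on the strand, which this table does not state.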

Expression profiles

Bgee: expression breadth ubiquitous, 143 present calls, max score 96.73.

Top tissues by expression

143 total, by Bgee expression score (0-100, higher = more expressed):

Tissue | Anatomy ID | Expression score | Quality
lower esophagus mucosa | UBERON:0035834 | 96.73 | gold quality
esophagus mucosa | UBERON:0002469 | 96.57 | gold quality
right testis | UBERON:0004534 | 96.21 | gold quality
left testis | UBERON:0004533 | 95.46 | gold quality
testis | UBERON:0000473 | 94.98 | gold quality
embryo | UBERON:0000922 | 94.01 | gold quality
ganglionic eminence | UBERON:0004023 | 94.01 | gold quality
tonsil | UBERON:0002372 | 93.68 | gold quality
lymph node | UBERON:0000029 | 93.25 | gold quality
prefrontal cortex | UBERON:0000451 | 92.82 | gold quality
cerebellum | UBERON:0002037 | 92.52 | gold quality
cerebellar cortex | UBERON:0002129 | 92.47 | gold quality
cerebellar hemisphere | UBERON:0002245 | 92.45 | gold quality
islet of Langerhans | UBERON:0000006 | 92.17 | gold quality
rectum | UBERON:0001052 | 91.88 | gold quality
ventricular zone | UBERON:0003053 | 91.70 | gold quality
right hemisphere of cerebellum | UBERON:0014890 | 91.70 | gold quality
vagina | UBERON:0000996 | 91.62 | gold quality
cortical plate | UBERON:0005343 | 91.60 | gold quality
mucosa of transverse colon | UBERON:0004991 | 91.52 | gold quality
superior frontal gyrus | UBERON:0002661 | 91.25 | gold quality
primordial germ cell in gonad | CL:0000670 ∩ UBERON:0000991 | 90.99 | gold quality
olfactory segment of nasal mucosa | UBERON:0005386 | 90.93 | gold quality
frontal cortex | UBERON:0001870 | 90.60 | gold quality
frontal lobe | UBERON:0016525 | 90.60 | gold quality
bone marrow | UBERON:0002371 | 90.51 | gold quality
stromal cell of endometrium | CL:0002255 | 90.48 | gold quality
esophagus | UBERON:0001043 | 90.22 | gold quality
Brodmann (1909) area 9 | UBERON:0013540 | 89.88 | gold quality
placenta | UBERON:0001987 | 89.84 | gold quality

Single-cell (SCXA)

Detected in 1 experiment(s), a significant marker in 1.

Experiment | Marker? | Max mean expression
E-ANND-3 | yes | 14.51

Regulation

Is transcription factor: no

Functional genomics

DepMap (CRISPR cell-line fitness): dependent in 99.7% of screened cell lines, common-essential.

Literature-anchored findings (GeneRIF, showing 2)

  • recruitment of the human TREX complex to spliced mRNA is not directly coupled to transcription, but is instead coupled to transcription indirectly through splicing (PMID:15998806)
  • THOC3 interacts with YBX1 to promote lung squamous cell carcinoma progression through PFKFB4 mRNA modification. (PMID:37500615)

Cross-species orthologs

4 orthologs

Organism | Symbol | Gene ID
danio_rerio | thoc3 | ENSDARG00000056517
mus_musculus | Thoc3 | ENSMUSG00000025872
rattus_norvegicus | Thoc3 | ENSRNOG00000000104
drosophila_melanogaster | tex | FBGN0037569

Protein

Protein identifiers

THO complex subunit 3: Q96J01 (reviewed: Q96J01)

Alternative names: TEX1 homolog, hTREX45

All UniProt accessions (8): D6REC9, D6RGZ2, Q96J01, H0Y9T1, H0YA11, H0YAG7, H0YAJ0, H7C0C5

UniProt curated annotations

Function. Component of the THO subcomplex of the TREX complex which is thought to couple mRNA transcription, processing and nuclear export, and which specifically associates with spliced mRNA and not with unspliced pre-mRNA. Required for efficient export of polyadenylated RNA and spliced mRNA. The THOC1-THOC2-THOC3 core complex alone is sufficient to bind export factor NXF1-NXT1 and promote ATPase activity of DDX39B. TREX is recruited to spliced mRNAs by a transcription-independent mechanism, binds to mRNA upstream of the exon-junction complex (EJC) and is recruited in a splicing- and cap-dependent manner to a region near the 5’ end of the mRNA where it functions in mRNA export to the cytoplasm via the TAP/NXF1 pathway. (Microbial infection) The TREX complex is essential for the export of Kaposi’s sarcoma-associated herpesvirus (KSHV) intronless mRNAs and infectious virus production.

Subunit / interactions. Component of the THO subcomplex, which is composed of THOC1, THOC2, THOC3, THOC5, THOC6 and THOC7. The THO subcomplex interacts with DDX39B to form the THO-DDX39B complex which multimerizes into a 28-subunit tetrameric assembly. Component of the transcription/export (TREX) complex at least composed of ALYREF/THOC4, DDX39B, SARNP/CIP29, CHTOP and the THO subcomplex; in the complex interacts with THOC2. TREX seems to have a dynamic structure involving ATP-dependent remodeling.

Subcellular location. Nucleus. Nucleus speckle.

Similarity. Belongs to the THOC3 family.

Isoforms (2)

UniProt ID | Names | Canonical?
Q96J01-1 | 1 | yes
Q96J01-2 | 2 |

RefSeq proteins (2): NP_001363831, NP_115737* (*=MANE)

Domains & families (InterPro)

ID | Name | Type
IPR001680 | WD40_rpt | Repeat
IPR015943 | WD40/YVTN_repeat-like_dom_sf | Homologous_superfamily
IPR020472 | WD40_PAC1 | Repeat
IPR036322 | WD40_repeat_dom_sf | Homologous_superfamily
IPR040132 | Tex1/THOC3 | Family

Pfam: PF25174

UniProt features (45 total): strand 27, repeat 6, turn 5, initiator methionine 1, chain 1, splice variant 1, sequence variant 1, helix 1, region of interest 1, modified residue 1

Structure

Experimental structures (PDB)

4 structures.

PDB | Method | Resolution (Å)
7APK | ELECTRON MICROSCOPY | 3.3
7ZNL | ELECTRON MICROSCOPY | 3.45
7ZNK | ELECTRON MICROSCOPY | 3.9
8R7L | ELECTRON MICROSCOPY | 4.12

Predicted structure (AlphaFold)

Model | pLDDT | Fraction very-high
AF-Q96J01-F1 | 91.07 | 0.87

Functional residue map

Curated UniProt residues grouped by drug-discovery relevance — catalytic, ligand-binding, modification, and mutation-validated positions. Source: UniProtKB sequence features.

Post-translational modifications (1): 2

Function

Pathways and Gene Ontology

Reactome pathways

8 pathways

ID | Pathway
R-HSA-159236 | Transport of Mature mRNA derived from an Intron-Containing Transcript
R-HSA-72187 | mRNA 3’-end processing
R-HSA-73856 | RNA Polymerase II Transcription Termination
R-HSA-72202 | Transport of Mature Transcript to Cytoplasm
R-HSA-72203 | Processing of Capped Intron-Containing Pre-mRNA
R-HSA-73857 | RNA Polymerase II Transcription
R-HSA-74160 | Gene expression (Transcription)
R-HSA-8953854 | Metabolism of RNA

MSigDB gene sets: 100 (showing top): MONNIER_POSTRADIATION_TUMOR_ESCAPE_UP, GOBP_NUCLEAR_TRANSPORT, REACTOME_MRNA_3_END_PROCESSING, REACTOME_PROCESSING_OF_CAPPED_INTRON_CONTAINING_PRE_MRNA, GOBP_NUCLEOBASE_CONTAINING_COMPOUND_TRANSPORT, GOBP_RNA_SPLICING, GOBP_NUCLEAR_EXPORT, ACEVEDO_METHYLATED_IN_LIVER_CANCER_DN, GOBP_RNA_LOCALIZATION, GRADE_COLON_AND_RECTAL_CANCER_UP, REACTOME_METABOLISM_OF_RNA, GOCC_TRANSCRIPTION_EXPORT_COMPLEX, GOCC_NUCLEAR_SPECK, GOCC_CHROMOSOMAL_REGION, GOCC_NUCLEAR_BODY

GO Biological Process (4): mRNA processing (GO:0006397), mRNA export from nucleus (GO:0006406), RNA splicing (GO:0008380), mRNA transport (GO:0051028)

GO Molecular Function (2): RNA binding (GO:0003723), protein binding (GO:0005515)

GO Cellular Component (7): transcription export complex (GO:0000346), THO complex part of transcription export complex (GO:0000445), nucleoplasm (GO:0005654), nuclear speck (GO:0016607), chromosome, telomeric region (GO:0000781), nucleus (GO:0005634), membrane (GO:0016020)

Reactome top-level categories

Rollup of top-5 pathways:

Category | Pathways
Processing of Capped Intron-Containing Pre-mRNA | 2
Transport of Mature Transcript to Cytoplasm | 1
RNA Polymerase II Transcription | 1
Metabolism of RNA | 1
Gene expression (Transcription) | 1

GO top-level categories

Rollup of top GO terms by namespace:

Category | Terms
RNA processing | 2
cellular anatomical structure | 2
mRNA metabolic process | 1
RNA export from nucleus | 1
gene expression | 1
mRNA transport | 1
RNA transport | 1
nucleic acid binding | 1
binding | 1
nuclear protein-containing complex | 1
transcription export complex | 1
THO complex | 1
nuclear lumen | 1
nuclear ribonucleoprotein granule | 1
chromosomal region | 1
intracellular membrane-bounded organelle | 1

Protein interactions and networks

STRING

960 interactions, top by confidence (×1000):

Protein A | Protein B | Partner UniProt | Score
THOC3 | THOC2 | Q8NI27 | 999
THOC3 | THOC1 | Q96FV9 | 999
THOC3 | DDX39B | Q13838 | 998
THOC3 | THOC5 | Q13769 | 997
THOC3 | THOC7 | Q6I9Y2 | 997
THOC3 | THOC6 | Q86W42 | 997
THOC3 | CYLD | Q9NQC7 | 989
THOC3 | GLI2 | P10070 | 986
THOC3 | ALYREF | Q86V81 | 963
THOC3 | SARNP | P82979 | 846
THOC3 | NXF1 | Q9UBU9 | 750
THOC3 | TEX2 | Q8IWB9 | 670
THOC3 | FYTTD1 | Q96QD9 | 665
THOC3 | CHTOP | Q9Y3Y2 | 631
THOC3 | NCBP3 | Q53F19 | 628

IntAct

66 interactions, top by confidence:

Interactors (A, B) | Type | Score
DDX39B, THOC5 | psi-mi:"MI:0915"(physical association) | 0.800
CCT2, TXNDC9 | psi-mi:"MI:0914"(association) | 0.730
THOC1, EIF4A3 | psi-mi:"MI:0914"(association) | 0.660
THOC1, DDX39A | psi-mi:"MI:0914"(association) | 0.640
THOC3, CLUH | psi-mi:"MI:0914"(association) | 0.530
NCBP3, SAP18 | psi-mi:"MI:0914"(association) | 0.530
CCT7, PEX7 | psi-mi:"MI:0914"(association) | 0.530
EIF4A3 (second interactor not given) | psi-mi:"MI:0915"(physical association) | 0.490
DDX21, MED19 | psi-mi:"MI:2364"(proximity) | 0.480
CHTOP, SAP18 | psi-mi:"MI:0915"(physical association) | 0.400
NUDCD2, THOC3 | psi-mi:"MI:0915"(physical association) | 0.400
THOC3, DLG4 | psi-mi:"MI:0915"(physical association) | 0.370
THOC3, MLKL | psi-mi:"MI:0915"(physical association) | 0.370
TK1, THOC3 | psi-mi:"MI:0915"(physical association) | 0.370
THOC2 (second interactor not given) | psi-mi:"MI:0914"(association) | 0.350
THOC1, TARS3 | psi-mi:"MI:0914"(association) | 0.350
THOC5, MYO1G | psi-mi:"MI:0914"(association) | 0.350
THOC7, ALYREF | psi-mi:"MI:0914"(association) | 0.350
BCLAF1, PABPN1 | psi-mi:"MI:0914"(association) | 0.350
Srsf1, SRRM1 | psi-mi:"MI:0914"(association) | 0.350
ANKRD28 (second interactor not given) | psi-mi:"MI:0914"(association) | 0.350
KSR1, FBLL1 | psi-mi:"MI:0914"(association) | 0.350
KSR1, DDX39A | psi-mi:"MI:0914"(association) | 0.350
KSR1, FAM168B | psi-mi:"MI:0914"(association) | 0.350
KSR1 (second interactor not given) | psi-mi:"MI:0914"(association) | 0.350

BioGRID (212): PAK2 (Co-fractionation), THOC1 (Co-fractionation), THOC2 (Co-fractionation), THOC3 (Co-fractionation), THOC3 (Co-fractionation), THOC6 (Co-fractionation), THOC3 (Affinity Capture-MS), THOC3 (Affinity Capture-MS), THOC3 (Affinity Capture-MS), THOC3 (Affinity Capture-MS), THOC3 (Affinity Capture-MS), THOC3 (Affinity Capture-MS), THOC1 (Affinity Capture-MS), THOC7 (Affinity Capture-MS), THOC5 (Affinity Capture-MS)

ESM2 similar proteins: A0A1L8EXB5, A4QNE6, A8WGF4, C1BK83, O35142, O43684, O55029, P35605, P35606, Q17QU5, Q1JP79, Q1JQB2, Q29RH4, Q29RZ9, Q3UGF1, Q4FZW5, Q4R4I8, Q561Y0, Q5I0B4, Q5M7F6, Q5MNZ6, Q5R664, Q5RB58, Q5U4Y8, Q5VQ78, Q6GNF1, Q6NWV3, Q6PA72, Q6TGU2, Q803V5, Q8AVT9, Q8BGF3, Q8IWZ6, Q8K2G4, Q8L828, Q8NEZ3, Q8VE80, Q92747, Q96J01, Q96MX6

Diamond homologs: A0JMQ0, A1CQI9, A1D3F5, A2QPZ4, A3LXF0, A4H6F7, A4HUV2, A4IHS2, A4R0Q1, A4RDD7, A5DBG1, A5DWF4, A6QX61, A6RRD4, A6RT32, A6RUL1, A6ZMA9, A7EF03, A8ID74, A8NWR2, A8PWB6, A8QD31, A8XYW9, A9UZS7, B0WC36, B0XQ42, B2AY28, B2VR76, B3MHX6, B3NLK7, B4GIU9, B4HN85, B4J9K1, B4KQU8, B4LKS9, B4MYI5, B4P528, B8M7Q5, B8NGT5, B9WD30

SIGNOR signaling

1 interaction.

A | Effect | B | Mechanism
THOC3 | form complex | TREX complex | binding

Enriched among interaction partners

Reactome pathways and GO biological processes over-represented among this gene’s 73 IntAct physical interaction partners (hypergeometric vs the genome-wide background, BH-FDR, gene-set size 15–500, ranked by fold). A functional readout of the neighbourhood — distinct from this gene’s own memberships above, and biased toward well-studied / hub proteins, so read it as themes rather than proof.

Reactome pathways:

Pathway | Partners | Fold | FDR
Transport of Mature Transcript to Cytoplasm | 7 | 56.7× | 2e-09
mRNA 3’-end processing | 13 | 54.5× | 1e-17
Transport of Mature mRNA derived from an Intron-Containing Transcript | 12 | 38.9× | 2e-14
RNA Polymerase II Transcription Termination | 8 | 37.4× | 2e-09
Processing of Capped Intron-Containing Pre-mRNA | 9 | 15.7× | 2e-07
mRNA Splicing | 5 | 11.7× | 1e-03
mRNA Splicing - Major Pathway | 9 | 10.5× | 7e-06
Metabolism of RNA | 9 | 8.0× | 6e-05

GO biological processes:

GO term | Partners | Fold | FDR
mRNA export from nucleus | 10 | 47.7× | 4e-12
RNA splicing | 11 | 15.7× | 3e-08
mRNA processing | 11 | 14.0× | 6e-08
mRNA splicing, via spliceosome | 6 | 8.9× | 5e-03
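
The over-representation statistics named above (fold enrichment, a hypergeometric test against a background, and Benjamini-Hochberg FDR) can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline that produced the tables: the background size and gene-set size in the demo are hypothetical placeholders, and the function names are invented for this example. Only the partner count (73) and the overlap (7) are taken from the text.

```python
from math import comb


def hypergeom_sf(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail probability P(X >= k): n partners drawn from a background
    of N genes, K of which belong to the gene set."""
    upper = min(n, K)
    numer = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numer / comb(N, n)


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed fraction of partners in the set over the background fraction."""
    return (k / n) / (K / N)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    N = 20000  # HYPOTHETICAL background size (not given in the source)
    K = 34     # HYPOTHETICAL gene-set size (not given in the source)
    n = 73     # IntAct partners, from the text above
    k = 7      # partners in the pathway, from the first table row
    print("fold:", round(fold_enrichment(k, n, K, N), 1))
    print("p-value:", hypergeom_sf(k, n, K, N))
    print("BH-adjusted:", benjamini_hochberg([0.01, 0.04, 0.03]))
```

Exact integer binomials are used instead of a statistics library to keep the sketch dependency-free; for many gene sets a library implementation would be faster.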

Disease & clinical

Clinical variants and AI predictions

ClinVar

46 variants total. Per-class counts are floors (≥ shown; pagination cap):

Classification | Count (floor)
Pathogenic | 0
Likely pathogenic | 0
Uncertain significance | 36
Likely benign | 2
Benign | 0

Top pathogenic / likely-pathogenic (0)

SpliceAI

1214 predictions. Top by Δscore:

Variant | Effect | Δscore
5:175960132:CCT:C | acceptor_loss | 1.0000
5:175960133:C:A | acceptor_loss | 1.0000
5:175960134:T:A | acceptor_loss | 1.0000
5:175961147:CCAGC:C | acceptor_gain | 1.0000
5:175961148:CAGC:C | acceptor_gain | 1.0000
5:175961148:CAGCC:C | acceptor_gain | 1.0000
5:175961149:AGC:A | acceptor_gain | 1.0000
5:175961150:GC:G | acceptor_gain | 1.0000
5:175961150:GCC:G | acceptor_loss | 1.0000
5:175961151:CCTA:C | acceptor_gain | 1.0000
5:175961152:C:CC | acceptor_gain | 1.0000
5:175961152:C:CG | acceptor_loss | 1.0000
5:175961152:C:T | acceptor_gain | 1.0000
5:175961153:T:C | acceptor_loss | 1.0000
5:175961154:A:AC | acceptor_gain | 1.0000
5:175961154:A:C | acceptor_gain | 1.0000
5:175961157:A:AC | acceptor_gain | 1.0000
5:175961157:A:C | acceptor_gain | 1.0000
5:175961160:CAGAG:C | acceptor_gain | 1.0000
5:175961161:A:T | acceptor_gain | 1.0000
5:175961164:G:C | acceptor_gain | 1.0000
5:175961164:G:GC | acceptor_gain | 1.0000
5:175961169:C:CT | acceptor_gain | 1.0000
5:175961269:TTACC:T | donor_loss | 1.0000
5:175961270:TACCT:T | donor_loss | 1.0000
5:175961271:A:AC | donor_gain | 1.0000
5:175961271:AC:A | donor_gain | 1.0000
5:175961272:C:CC | donor_gain | 1.0000
5:175961272:CC:C | donor_gain | 1.0000
5:175961338:T:TA | donor_gain | 1.0000
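
The variant keys in this table appear to follow a chromosome:position:reference:alternate pattern. As a hedged sketch, assuming that layout and a VCF-style representation in which indels carry a shared leading base, the keys can be split and classified by allele length. The Variant type and both function names are hypothetical helpers for illustration, not part of SpliceAI.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant(key: str) -> Variant:
    """Split a 'chrom:pos:ref:alt' key (ASSUMED layout) into typed fields."""
    chrom, pos, ref, alt = key.split(":")
    return Variant(chrom, int(pos), ref, alt)


def variant_class(v: Variant) -> str:
    """Classify by allele lengths: SNV, deletion, insertion, or MNV."""
    if len(v.ref) == 1 and len(v.alt) == 1:
        return "SNV"
    if len(v.ref) > len(v.alt):
        return "deletion"
    if len(v.ref) < len(v.alt):
        return "insertion"
    return "MNV"  # equal lengths greater than one base


if __name__ == "__main__":
    for key in ("5:175960132:CCT:C", "5:175960133:C:A", "5:175961152:C:CC"):
        print(key, variant_class(parse_variant(key)))
```

Length-based classification is a simplification; complex substitutions with unequal allele lengths would also fall under "deletion" or "insertion" here.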

AlphaMissense

2350 scored. Top likely-pathogenic:

Variant | Protein change | am_pathogenicity
5:175960080:C:A | W315C | 1.000
5:175960080:C:G | W315C | 1.000
5:175960082:A:G | W315R | 1.000
5:175960082:A:T | W315R | 1.000
5:175960085:C:G | A314P | 1.000
5:175961084:C:G | D287H | 1.000
5:175961090:A:G | S285P | 1.000
5:175961092:G:T | A284E | 1.000
5:175961098:G:T | A282E | 1.000
5:175961121:G:C | F274L | 1.000
5:175961121:G:T | F274L | 1.000
5:175961122:A:G | F274S | 1.000
5:175961123:A:G | F274L | 1.000
5:175961133:T:A | R270S | 1.000
5:175961133:T:G | R270S | 1.000
5:175961311:C:A | W251C | 1.000
5:175961311:C:G | W251C | 1.000
5:175961312:C:G | W251S | 1.000
5:175961313:A:G | W251R | 1.000
5:175961313:A:T | W251R | 1.000
5:175961315:A:G | L250P | 1.000
5:175961321:A:T | V248D | 1.000
5:175961328:C:G | A246P | 1.000
5:175961330:T:A | D245V | 1.000
5:175961331:C:G | D245H | 1.000
5:175961335:A:C | S243R | 1.000
5:175961335:A:T | S243R | 1.000
5:175961336:C:A | S243I | 1.000
5:175961337:T:G | S243R | 1.000
5:175961339:C:T | G242E | 1.000

dbSNP variants (sampled 300 via entrez): RS1002031700 (5:175968941 A>G), RS1003702495 (5:175962218 T>C), RS1003744992 (5:175962835 A>G), RS1007937640 (5:175960591 G>A), RS1008021935 (5:175966375 C>G), RS1008340533 (5:175959567 A>G,T), RS1010776544 (5:175967011 T>C), RS1011213376 (5:175968531 T>G), RS1012864460 (5:175960772 G>A), RS1012927901 (5:175962682 G>A), RS1015042995 (5:175962240 C>T), RS1015064385 (5:175962961 T>G), RS1018230572 (5:175963868 T>A,C), RS1019257804 (5:175960707 A>G), RS1019329616 (5:175964826 T>C)

Disease associations

OMIM: gene MIM:606929 | disease phenotypes:

GenCC curated gene-disease

Mondo (0):

Orphanet (0):

HPO phenotypes

0 total (0 of 0 shown, HPO-id order):

GWAS associations

1 association (top):

Study | Trait | p-value
GCST007576_353 | Chronotype | 2.000000e-08

EFO canonical traits (1, from GWAS)

EFO ID | Trait name
EFO:0008328 | chronotype measurement

Drugs & pharmacology

Drug and pharmacology data

Is drug target: no

PharmGKB: 1 entry (VIP=true, CPIC=false)

CTD chemical–gene interactions

25 total (human), top 25 by PubMed support.

Chemical | Actions (top 5) | PubMed papers
bisphenol A | affects expression, decreases expression | 2
sodium arsenite | increases expression, decreases expression | 2
TAK-243 | increases sumoylation | 1
dicrotophos | decreases expression | 1
arsenite | affects binding, increases reaction | 1
tris(1,3-dichloro-2-propyl)phosphate | decreases expression | 1
4-aminophenylarsenoxide | affects binding, decreases reaction | 1
beta-methylcholine | affects expression | 1
di-n-butylphosphoric acid | affects expression | 1
K 7174 | decreases expression | 1
abrine | decreases expression | 1
jinfukang | increases expression | 1
Resveratrol | affects cotreatment, increases expression | 1
Temozolomide | increases expression | 1
Arsenic Trioxide | affects binding, decreases reaction | 1
Acetaminophen | decreases expression | 1
Dichlorodiphenyl Dichloroethylene | increases expression | 1
Doxorubicin | increases expression | 1
Enzyme Inhibitors | decreases activity, increases O-linked glycosylation | 1
Ivermectin | decreases expression | 1
Plant Extracts | affects cotreatment, increases expression | 1
Rotenone | decreases expression | 1
Thiram | decreases expression | 1
Tretinoin | decreases expression | 1
Aflatoxin B1 | decreases methylation | 1

Cellosaurus cell lines

1 cell line: 1 cancer cell line

First 10 cell lines (id-ordered, not curated):

Cellosaurus | Name | Category | Sex
CVCL_B2II | Abcam HeLa THOC3 KO | Cancer cell line | Female

Clinical trials (associated diseases)

0 trials via MONDO — disease-level, not drug-specific.
