GPA33

gene
Also known as A33

Summary

GPA33 (glycoprotein A33, HGNC:4445) is a protein-coding gene on chromosome 1q24.1 that encodes Cell surface A33 antigen (UniProt Q99795). The protein may play a role in cell-cell recognition and signaling.

The glycoprotein encoded by this gene is a cell surface antigen that is expressed in greater than 95% of human colon cancers. The open reading frame encodes a 319-amino acid polypeptide having a putative secretory signal sequence and 3 potential glycosylation sites. The predicted mature protein has a 213-amino acid extracellular region, a single transmembrane domain, and a 62-amino acid intracellular tail. The sequence of the extracellular region contains 2 domains characteristic of the CD2 subgroup of the immunoglobulin (Ig) superfamily.
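The lengths quoted in the summary above can be checked with simple arithmetic. The sketch below subtracts the extracellular region and the intracellular tail from the full open reading frame to see how many residues remain for the signal sequence and the transmembrane domain together. The summary does not say how that remainder is split between the two, so the code reports only the combined figure; the function name is a hypothetical helper.

```python
# Lengths quoted in the RefSeq summary (amino acids).
ORF_LENGTH = 319          # full polypeptide, including the signal sequence
EXTRACELLULAR = 213       # extracellular region of the mature protein
INTRACELLULAR_TAIL = 62   # intracellular tail


def unassigned_residues(total: int, *regions: int) -> int:
    """Return how many residues remain after subtracting the named regions."""
    remainder = total - sum(regions)
    if remainder < 0:
        raise ValueError("regions exceed the total length")
    return remainder


# Residues left over for the signal sequence plus the transmembrane domain.
signal_plus_tm = unassigned_residues(ORF_LENGTH, EXTRACELLULAR, INTRACELLULAR_TAIL)
print(signal_plus_tm)  # 44
```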

Source: NCBI Gene 10223 — RefSeq curated summary.

At a glance

  • GWAS associations: 1
  • Clinical variants (ClinVar): 61 total
  • Druggable target: yes
  • MANE Select transcript: NM_005814

Identifiers

Gene identifiers

Field            Value
HGNC ID          HGNC:4445
Approved symbol  GPA33
Name             glycoprotein A33
Location         1q24.1
Locus type       gene with protein product
Status           Approved
Aliases          A33
Ensembl gene     ENSG00000143167
Ensembl biotype  protein_coding
OMIM             602171
Entrez           10223

Gene structure

Transcript identifiers

Ensembl transcripts: 14 — 12 protein_coding, 1 protein_coding_CDS_not_defined, 1 nonsense_mediated_decay

ENST00000367868, ENST00000527955, ENST00000534512, ENST00000632571, ENST00000903063, ENST00000903064, ENST00000903065, ENST00000903066, ENST00000903067, ENST00000903068, ENST00000903069, ENST00000903070, ENST00000903071, ENST00000903072

RefSeq mRNA: 1 (MANE Select: NM_005814)

CCDS: CCDS1258

Canonical transcript exons

ENST00000367868 — 7 exons

Exon             Start      End
ENSE00001445813  167090245  167090377
ENSE00003466883  167068922  167069138
ENSE00003509770  167063582  167063737
ENSE00003566010  167073385  167073539
ENSE00003627151  167054976  167055111
ENSE00003658709  167055730  167055849
ENSE00003661478  167052836  167054466
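The exon coordinates above are enough to derive per-exon lengths and the spliced transcript length. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive (the usual Ensembl convention) and that the gene lies on the reverse strand, so that transcript order runs from the highest to the lowest coordinate. The strand is an assumption here, inferred only from the first listed exon having the highest coordinates.

```python
# (exon ID, start, end) copied from the exon table.
EXONS = [
    ("ENSE00001445813", 167090245, 167090377),
    ("ENSE00003466883", 167068922, 167069138),
    ("ENSE00003509770", 167063582, 167063737),
    ("ENSE00003566010", 167073385, 167073539),
    ("ENSE00003627151", 167054976, 167055111),
    ("ENSE00003658709", 167055730, 167055849),
    ("ENSE00003661478", 167052836, 167054466),
]


def exon_length(start: int, end: int) -> int:
    """Length of an exon given 1-based inclusive coordinates (assumed)."""
    return end - start + 1


lengths = {exon_id: exon_length(start, end) for exon_id, start, end in EXONS}
total_length = sum(lengths.values())

# Assumed reverse strand: transcript order is descending genomic position.
transcript_order = [
    exon_id for exon_id, start, _ in sorted(EXONS, key=lambda e: e[1], reverse=True)
]

for exon_id in transcript_order:
    print(exon_id, lengths[exon_id])
print("spliced length:", total_length)  # 2548
```

Under these assumptions the last exon in transcript order (ENSE00003661478, 1631 bp) accounts for well over half of the spliced length.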

Expression profiles

Bgee: expression breadth ubiquitous, 110 present calls, max score 99.12.

FANTOM5 (CAGE): breadth tissue_specific, TPM avg 0.9094 / max 306.4906, expressed in 103 samples.

FANTOM5 promoters (6 alternative TSS)

Promoter ID  TPM avg  Samples expressed
15758        0.4350   94
15753        0.2731   16
15754        0.1042   11
15757        0.0563   9
15755        0.0321   9
15756        0.0086   8

Top tissues by expression

260 total, by Bgee expression score (0-100, higher = more expressed):

Tissue                         Anatomy ID                   Expression score  Quality
ileal mucosa                   UBERON:0000331               99.12             gold quality
mucosa of transverse colon     UBERON:0004991               99.04             gold quality
colonic mucosa                 UBERON:0000317               98.86             gold quality
rectum                         UBERON:0001052               98.72             gold quality
jejunal mucosa                 UBERON:0000399               98.68             gold quality
mucosa of sigmoid colon        UBERON:0004993               98.63             gold quality
duodenum                       UBERON:0002114               94.69             gold quality
transverse colon               UBERON:0001157               89.47             gold quality
small intestine Peyer’s patch  UBERON:0003454               89.05             gold quality
small intestine                UBERON:0002108               88.22             gold quality
vermiform appendix             UBERON:0001154               86.17             gold quality
caecum                         UBERON:0001153               83.90             gold quality
intestine                      UBERON:0000160               81.37             gold quality
colonic epithelium             UBERON:0000397               79.69             gold quality
large intestine                UBERON:0000059               79.20             gold quality
colon                          UBERON:0001155               78.34             gold quality
jejunum                        UBERON:0002115               77.08             gold quality
granulocyte                    CL:0000094                   72.63             gold quality
blood                          UBERON:0000178               68.65             gold quality
lymph node                     UBERON:0000029               65.85             gold quality
smooth muscle tissue           UBERON:0001135               65.77             gold quality
decidua                        UBERON:0002450               64.91             gold quality
primordial germ cell in gonad  CL:0000670 ∩ UBERON:0000991  63.87             gold quality
endothelial cell               CL:0000115                   63.54             gold quality
germinal epithelium of ovary   UBERON:0001304               63.32             gold quality
sperm                          CL:0000019                   63.05             gold quality
apex of heart                  UBERON:0002098               62.57             gold quality
male germ cell                 CL:0000015                   62.15             gold quality
leukocyte                      CL:0000738                   61.94             gold quality
sigmoid colon                  UBERON:0001159               61.88             gold quality

Single-cell (SCXA)

Detected in 2 experiment(s), a significant marker in 2.

Experiment     Marker?  Max mean expression
E-GEOD-125970  yes      28.80
E-ANND-3       yes      10.60

Regulation

Is transcription factor: no

Upstream regulators (CollecTRI, top): CDX1, KLF4, PPARA, PPARG

miRNA regulators (miRDB)

57 targeting GPA33, top 30 by miRDB confidence (max_score; target_count = how many genes the miRNA targets in total — lower means more specific):

miRNA             Max score  Avg score  miRNA target_count
HSA-MIR-5011-5P   100.00     83.46      5820
HSA-MIR-190A-3P   100.00     80.35      5520
HSA-MIR-4795-3P   100.00     74.62      4024
HSA-MIR-513A-5P   100.00     69.77      2465
HSA-MIR-4481      100.00     66.42      1669
HSA-MIR-6867-5P   100.00     82.21      3464
HSA-MIR-126-5P    100.00     72.71      3180
HSA-MIR-4745-5P   99.98      65.95      1028
HSA-MIR-6793-5P   99.97      65.95      758
HSA-MIR-7152-3P   99.97      67.47      849
HSA-MIR-4725-3P   99.96      69.53      2520
HSA-MIR-6780B-5P  99.96      69.60      2562
HSA-MIR-551B-5P   99.96      71.28      3493
HSA-MIR-338-5P    99.92      72.34      2951
HSA-MIR-9902      99.89      69.15      2250
HSA-MIR-153-5P    99.89      73.86      6317
HSA-MIR-4271      99.88      68.32      2244
HSA-MIR-4422      99.72      72.07      2908
HSA-MIR-4306      99.72      70.50      3630
HSA-MIR-6086      99.70      65.38      699
HSA-MIR-1249-5P   99.61      66.55      2049
HSA-MIR-6797-5P   99.61      66.55      2084
HSA-MIR-12132     99.47      68.90      1341
HSA-MIR-4498      99.47      67.42      2360
HSA-MIR-6165      99.44      67.12      1389
HSA-MIR-5697      99.39      67.74      1249
HSA-MIR-3911      99.38      66.95      1087
HSA-MIR-520E-5P   99.27      68.90      1513
HSA-MIR-3922-3P   99.25      64.96      1136
HSA-MIR-3978      99.24      68.39      2201
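Because a lower target_count means a more specific miRNA, rows that tie on max score can be ordered by specificity. The sketch below does this for the seven miRNAs listed above with a max score of 100.00; the values are copied from those rows, and the ranking rule (max score descending, then target_count ascending) is an illustrative choice rather than anything miRDB prescribes.

```python
# (miRNA, max_score, target_count) for the rows with a max score of 100.00.
TOP_SCORING = [
    ("HSA-MIR-5011-5P", 100.00, 5820),
    ("HSA-MIR-190A-3P", 100.00, 5520),
    ("HSA-MIR-4795-3P", 100.00, 4024),
    ("HSA-MIR-513A-5P", 100.00, 2465),
    ("HSA-MIR-4481", 100.00, 1669),
    ("HSA-MIR-6867-5P", 100.00, 3464),
    ("HSA-MIR-126-5P", 100.00, 3180),
]


def rank_by_specificity(rows):
    """Sort by max_score descending, then by target_count ascending."""
    return sorted(rows, key=lambda row: (-row[1], row[2]))


ranked = rank_by_specificity(TOP_SCORING)
most_specific = ranked[0][0]

for name, score, target_count in ranked:
    print(f"{name:<18}{score:>7.2f}{target_count:>6}")
```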

Literature-anchored findings (GeneRIF, showing 10)

  • cloning, chromosomal localizations, exon-intron structures and transcription start sites - target gene for the intestine-specific homeobox transcription factor, CDX1. (PMID:12114523)
  • Regulation of A33 antigen expression by GKLF. (PMID:12853980)
  • This work has demonstrated that the antigen is both highly immobile and extremely persistent, retaining its surface localization with a turnover half-life of greater than 2 days. (PMID:18236042)
  • The A33-dependent incorporation of B5 into extracellular enveloped vaccinia virions is mediated through an interaction between their lumenal domains. (PMID:22623777)
  • Increased interaction between vaccinia virus proteins A33 and B5 is detrimental to infectious extracellular enveloped virion production. (PMID:22623782)
  • Data indicate that EpCAM- and A33-Exos were released from the apical and basolateral surfaces, respectively. (PMID:23230278)
  • Data indicate positive staining was observed for both A33 expression and KRN330 binding from patients biopsied on Day 8, 9 or 14 following initial dosing of 3 mg/kg. (PMID:23294608)
  • A33 expression is directly correlated to colorectal tumor differentiation. (PMID:27272411)
  • A33 shows similar sensitivity to CDX2 but is more specific as an immunomarker of colorectal carcinoma. (PMID:28226180)
  • A panel of intestinal differentiation markers (CDX2, GPA33, and LI-cadherin) identifies gastric cancer patients with favourable prognosis. (PMID:32215766)

Cross-species orthologs

3 orthologs

Organism           Symbol  Gene ID
danio_rerio        gpa33b  ENSDARG00000040898
mus_musculus       Gpa33   ENSMUSG00000000544
rattus_norvegicus  Gpa33   ENSRNOG00000003740

Paralogs (14): VSIG2 (ENSG00000019102), VSIG1 (ENSG00000101842), VSIR (ENSG00000107738), IGSF11 (ENSG00000144847), ESAM (ENSG00000149564), CXADR (ENSG00000154639), JAM2 (ENSG00000154721), F11R (ENSG00000158769), MXRA8 (ENSG00000162576), JAM3 (ENSG00000166086), CLMP (ENSG00000166250), MUC15 (ENSG00000169550), VSTM2B (ENSG00000187135), VSIG8 (ENSG00000243284)

Protein

Protein identifiers

Cell surface A33 antigen: Q99795 (reviewed: Q99795)

Alternative names: Glycoprotein A33

All UniProt accessions (3): Q99795, A0A0J9YXH7, E9PMB2

UniProt curated annotations

Function. May play a role in cell-cell recognition and signaling.

Subcellular location. Membrane.

Tissue specificity. Expressed in normal gastrointestinal epithelium and in 95% of colon cancers.

Post-translational modifications. N-glycosylated, contains approximately 8 kDa of N-linked carbohydrate. Palmitoylated.

RefSeq proteins (1): NP_005805* (*=MANE)

Domains & families (InterPro)

ID         Name            Type
IPR003598  Ig_sub2         Domain
IPR003599  Ig_sub          Domain
IPR007110  Ig-like_dom     Domain
IPR013106  Ig_V-set        Domain
IPR013783  Ig-like_fold    Homologous_superfamily
IPR036179  Ig-like_dom_sf  Homologous_superfamily
IPR042474  A33             Family

Pfam: PF07686, PF13927

UniProt features (18 total): glycosylation site 3, disulfide bond 3, sequence variant 2, topological domain 2, domain 2, compositionally biased region 2, signal peptide 1, chain 1, transmembrane region 1, region of interest 1

Structure

Experimental structures (PDB)

0 structures.

Predicted structure (AlphaFold)

Model         pLDDT  Fraction very-high
AF-Q99795-F1  84.47  0.67
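To put the mean pLDDT in context, it can be mapped onto the confidence bands commonly used when reading AlphaFold models (above 90 very high, 70 to 90 confident, 50 to 70 low, below 50 very low). The sketch below applies those cutoffs to the model-level value above. How values falling exactly on a boundary are assigned is an assumption of this sketch, and the function name is hypothetical.

```python
def plddt_band(score: float) -> str:
    """Map a pLDDT value (0-100) to a confidence band.

    Boundary handling (a score of exactly 90, 70 or 50 falls into the
    lower band) is an assumption of this sketch.
    """
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


mean_band = plddt_band(84.47)
print(mean_band)  # confident
```

A mean in the confident band together with 0.67 of residues scored very high implies that the remaining third of the chain pulls the average down.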

Functional residue map

Curated UniProt residues grouped by drug-discovery relevance — catalytic, ligand-binding, modification, and mutation-validated positions. Source: UniProtKB sequence features.

Disulfide bonds (3): 43–117, 146–222, 162–211

Glycosylation sites (3): 112, 200, 223

Function

Pathways and Gene Ontology

Reactome pathways

0 pathways

MSigDB gene sets: 128 (showing top): VERHAAK_AML_WITH_NPM1_MUTATED_DN, MYOGENIN_Q6, MODULE_45, MODULE_64, MODULE_16, CHANG_IMMORTALIZED_BY_HPV31_DN, MODULE_88, MODULE_113, ELK1_01, MORF_PDPK1, MODULE_18, MODULE_11, MODULE_60, MODULE_38, LEE_RECENT_THYMIC_EMIGRANT

GO Biological Process (0):

GO Molecular Function (2): signaling receptor activity (GO:0038023), protein binding (GO:0005515)

GO Cellular Component (3): plasma membrane (GO:0005886), extracellular exosome (GO:0070062), membrane (GO:0016020)

GO top-level categories

Rollup of top GO terms by namespace:

Category                       Terms
molecular transducer activity  1
binding                        1
membrane                       1
cell periphery                 1
extracellular vesicle          1
cellular anatomical structure  1

Protein interactions and networks

STRING

918 interactions, top by confidence (×1000):

Protein A  Protein B  Partner UniProt  Score
GPA33      HSF4       Q9ULV5           810
GPA33      CYP27A1    Q02318           766
GPA33      EPCAM      P16422           590
GPA33      EGFR       P00533           503
GPA33      FRMD4B     Q9Y2L6           461
GPA33      GSTA3      Q16772           443
GPA33      TM4SF20    Q53R12           441
GPA33      CD24       P25063           415
GPA33      SCAMP2     O15127           413
GPA33      BRD3       Q15059           410
GPA33      PARP4      Q9UKK3           410
GPA33      FOLH1      Q04609           397
GPA33      NADK       O95544           382
GPA33      TNFAIP3    P21580           371
GPA33      FCER2      P06734           370
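Since the STRING scores above are reported ×1000, they can be read against STRING's customary combined-score cutoffs (0.15 low, 0.4 medium, 0.7 high, 0.9 highest). The sketch below bins a few of the listed partners. The cutoffs are compared as integers on the ×1000 scale to avoid floating-point edge cases, and the label for scores under 150 is a hypothetical name chosen for this example.

```python
# (partner, score x1000) for a few of the STRING rows listed.
STRING_SCORES = [
    ("HSF4", 810),
    ("CYP27A1", 766),
    ("EPCAM", 590),
    ("EGFR", 503),
    ("FOLH1", 397),
    ("FCER2", 370),
]


def confidence_band(score_x1000: int) -> str:
    """Bin a STRING combined score (0-1000) using the customary cutoffs."""
    if not 0 <= score_x1000 <= 1000:
        raise ValueError("score must be between 0 and 1000")
    if score_x1000 >= 900:
        return "highest"
    if score_x1000 >= 700:
        return "high"
    if score_x1000 >= 400:
        return "medium"
    if score_x1000 >= 150:
        return "low"
    return "below low cutoff"  # hypothetical label for scores under 150


bands = {partner: confidence_band(score) for partner, score in STRING_SCORES}
for partner, score in STRING_SCORES:
    print(f"{partner:<8}{score / 1000:.3f}  {bands[partner]}")
```

By these cutoffs only HSF4 and CYP27A1 reach high confidence, and none of the listed partners reaches the highest band.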

IntAct

29 interactions, top by confidence:

A       B       Type                                    Score
UPK2    GPA33   psi-mi:"MI:0915"(physical association)  0.560
NRSN1   GPA33   psi-mi:"MI:0915"(physical association)  0.560
ENTPD3  GPA33   psi-mi:"MI:0915"(physical association)  0.560
EMP3    GPA33   psi-mi:"MI:0915"(physical association)  0.560
MALL    GPA33   psi-mi:"MI:0915"(physical association)  0.560
GPA33   SMCO4   psi-mi:"MI:0915"(physical association)  0.560
NSG1    GPA33   psi-mi:"MI:0915"(physical association)  0.560
POT1    GPA33   psi-mi:"MI:0915"(physical association)  0.510
GPA33   CTDSP2  psi-mi:"MI:0915"(physical association)  0.400
SDCBP   -       psi-mi:"MI:0914"(association)           0.350
GPA33   ATE1    psi-mi:"MI:0914"(association)           0.350
IPO5    -       psi-mi:"MI:0914"(association)           0.350
MYO1C   -       psi-mi:"MI:0914"(association)           0.350
POT1    GPA33   psi-mi:"MI:0915"(physical association)  0.000
UPK2    GPA33   psi-mi:"MI:0915"(physical association)  0.000
NRSN1   GPA33   psi-mi:"MI:0915"(physical association)  0.000
ENTPD3  GPA33   psi-mi:"MI:0915"(physical association)  0.000
EMP3    GPA33   psi-mi:"MI:0915"(physical association)  0.000
NSG1    GPA33   psi-mi:"MI:0915"(physical association)  0.000
MALL    GPA33   psi-mi:"MI:0915"(physical association)  0.000
SMCO4   GPA33   psi-mi:"MI:0915"(physical association)  0.000

BioGRID (16): GPA33 (Affinity Capture-MS), GPA33 (Affinity Capture-RNA), GPA33 (Two-hybrid), GPA33 (Two-hybrid), MALL (Two-hybrid), NRSN1 (Two-hybrid), SMCO4 (Two-hybrid), ENTPD3 (Two-hybrid), NSG1 (Two-hybrid), CTDSP2 (Affinity Capture-MS), NSF (Affinity Capture-MS), KLHL22 (Affinity Capture-MS), ATE1 (Affinity Capture-MS), DNAJC11 (Affinity Capture-MS), GPX4 (Affinity Capture-MS)

ESM2 similar proteins: A0A0R4IGV4, A0A8M2B818, A0JM41, A2VD98, B0CLX4, B6ZK77, D3YX43, F1LW30, O00241, O18906, O54901, O88775, O95256, P00545, P04218, P0C673, P10522, P13369, P17948, P21995, P27931, P35916, P35917, P35969, P37301, P42071, P42703, P53767, Q08DK1, Q15762, Q58EG3, Q5DX21, Q5FWR8, Q5R412, Q5U2P2, Q5VJ70, Q6GMZ9, Q6PCB8, Q6X936, Q7TSN7

Diamond homologs: O08859, O14594, P03994, P07354, P07897, P07898, P10859, P10915, P13608, P13611, P16112, P41725, P55066, P55067, P55068, P55252, P81282, P98065, P98066, Q28062, Q28343, Q28381, Q28670, Q28858, Q29011, Q5IS41, Q61282, Q61361, Q62059, Q80WM4, Q80WM5, Q86UW8, Q8CFM6, Q8R4U0, Q8R4Y4, Q8WWQ8, Q90953, Q96GW7, Q96S86, Q99795

SIGNOR signaling

0 interactions.

Disease & clinical

Clinical variants and AI predictions

ClinVar

61 variants total. Per-class counts are floors (≥ shown; pagination cap):

Classification          Count (floor)
Pathogenic              0
Likely pathogenic       0
Uncertain significance  51
Likely benign           5
Benign                  0

Top pathogenic / likely-pathogenic (0)

SpliceAI

2870 predictions. Top by Δscore:

Variant              Effect         Δscore
1:167054464:TTCCT:T  acceptor_loss  1.0000
1:167054465:TC:T     acceptor_gain  1.0000
1:167054465:TCCTG:T  acceptor_loss  1.0000
1:167054466:CC:C     acceptor_gain  1.0000
1:167054467:C:CC     acceptor_gain  1.0000
1:167054476:A:T      acceptor_gain  1.0000
1:167054478:C:CT     acceptor_gain  1.0000
1:167054479:A:T      acceptor_gain  1.0000
1:167054974:A:AC     donor_gain     1.0000
1:167054974:ACGGC:A  donor_gain     1.0000
1:167054975:C:CC     donor_gain     1.0000
1:167054975:CG:C     donor_gain     1.0000
1:167054975:CGG:C    donor_gain     1.0000
1:167054975:CGGCC:C  donor_gain     1.0000
1:167054985:T:TA     donor_gain     1.0000
1:167054988:T:TA     donor_gain     1.0000
1:167055108:GAGG:G   acceptor_gain  1.0000
1:167055110:GG:G     acceptor_gain  1.0000
1:167055112:C:CC     acceptor_gain  1.0000
1:167055728:A:AC     donor_gain     1.0000
1:167055729:C:CC     donor_gain     1.0000
1:167055729:CGAG:C   donor_gain     1.0000
1:167055848:GGC:G    acceptor_loss  1.0000
1:167055850:C:CA     acceptor_loss  1.0000
1:167055850:C:CC     acceptor_gain  1.0000
1:167055851:T:C      acceptor_loss  1.0000
1:167063579:TA:T     donor_loss     1.0000
1:167063581:CC:C     donor_loss     1.0000
1:167063581:CCTGG:C  donor_gain     1.0000
1:167063733:TGGCA:T  acceptor_gain  1.0000
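The top-scoring SpliceAI positions sit within a few bases of the canonical transcript's exon boundaries. The sketch below parses the chrom:pos:ref:alt variant strings and reports the nearest exon edge for a handful of them, using the ENST00000367868 exon coordinates given earlier on this page. The helper names are hypothetical, and "start" and "end" refer to genomic coordinates, not to transcript orientation.

```python
# Exon ID -> (start, end) for ENST00000367868, as listed on this page.
EXONS = {
    "ENSE00001445813": (167090245, 167090377),
    "ENSE00003466883": (167068922, 167069138),
    "ENSE00003509770": (167063582, 167063737),
    "ENSE00003566010": (167073385, 167073539),
    "ENSE00003627151": (167054976, 167055111),
    "ENSE00003658709": (167055730, 167055849),
    "ENSE00003661478": (167052836, 167054466),
}


def parse_variant(variant: str):
    """Split 'chrom:pos:ref:alt' into its parts, with pos as an int."""
    chrom, pos, ref, alt = variant.split(":")
    return chrom, int(pos), ref, alt


def nearest_boundary(pos: int):
    """Return (exon_id, edge, distance) for the closest exon start or end."""
    candidates = []
    for exon_id, (start, end) in EXONS.items():
        candidates.append((abs(pos - start), exon_id, "start"))
        candidates.append((abs(pos - end), exon_id, "end"))
    distance, exon_id, edge = min(candidates)
    return exon_id, edge, distance


SAMPLE = [
    "1:167054464:TTCCT:T",
    "1:167054974:A:AC",
    "1:167055848:GGC:G",
    "1:167063579:TA:T",
]
nearest = {v: nearest_boundary(parse_variant(v)[1]) for v in SAMPLE}
for variant, (exon_id, edge, distance) in nearest.items():
    print(variant, exon_id, edge, distance)
```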

AlphaMissense

2077 scored. Top likely-pathogenic:

Variant          Protein change  am_pathogenicity
1:167063628:C:A  W175C           0.998
1:167063628:C:G  W175C           0.998
1:167073409:C:A  W58C            0.996
1:167073409:C:G  W58C            0.996
1:167055789:C:G  C211S           0.995
1:167055790:A:T  C211S           0.995
1:167068987:C:G  C117S           0.994
1:167068988:A:T  C117S           0.994
1:167063630:A:G  W175R           0.992
1:167063630:A:T  W175R           0.992
1:167063668:C:G  C162S           0.992
1:167063669:A:T  C162S           0.992
1:167055788:A:C  C211W           0.991
1:167055789:C:T  C211Y           0.990
1:167068988:A:G  C117R           0.990
1:167073411:A:G  W58R            0.990
1:167073411:A:T  W58R            0.990
1:167055790:A:G  C211R           0.989
1:167068986:A:C  C117W           0.989
1:167068994:A:C  Y115D           0.988
1:167063669:A:G  C162R           0.987
1:167063668:C:T  C162Y           0.986
1:167063667:G:C  C162W           0.984
1:167069005:T:A  D111V           0.983
1:167063674:A:G  L160P           0.982
1:167063716:C:G  C146S           0.981
1:167063717:A:G  C146R           0.981
1:167063717:A:T  C146S           0.981
1:167073455:C:G  C43S            0.981
1:167073456:A:T  C43S            0.981
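Many of the top AlphaMissense substitutions replace a cysteine, which invites a cross-check against the disulfide bonds listed in the functional residue map (43–117, 146–222, 162–211). The sketch below extracts the residue position from each distinct protein change above and separates the positions that are disulfide-bonded cysteines from the rest. It is a minimal illustration of the overlap, not a statement about mechanism, and the helper name is hypothetical.

```python
import re

# Disulfide bonds from the functional residue map on this page.
DISULFIDE_BONDS = [(43, 117), (146, 222), (162, 211)]

# Distinct protein changes among the top AlphaMissense rows listed.
PROTEIN_CHANGES = [
    "W175C", "W58C", "C211S", "C117S", "W175R", "C162S", "C211W",
    "C211Y", "C117R", "W58R", "C211R", "C117W", "Y115D", "C162R",
    "C162Y", "C162W", "D111V", "L160P", "C146S", "C146R", "C43S",
]


def parse_change(change: str):
    """Split e.g. 'C211S' into (reference residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", change)
    if match is None:
        raise ValueError(f"unrecognised protein change: {change}")
    return match.group(1), int(match.group(2)), match.group(3)


disulfide_positions = {pos for bond in DISULFIDE_BONDS for pos in bond}
changed_positions = {parse_change(change)[1] for change in PROTEIN_CHANGES}

at_disulfide = sorted(changed_positions & disulfide_positions)
elsewhere = sorted(changed_positions - disulfide_positions)

print("disulfide cysteines hit:", at_disulfide)  # [43, 117, 146, 162, 211]
print("other positions:", elsewhere)             # [58, 111, 115, 160, 175]
```

Five of the six disulfide-bonded cysteines appear among the listed changes; position 222 does not.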

dbSNP variants (sampled 300 via entrez): RS1000008862 (1:167090573 C>T), RS1000076985 (1:167079455 G>A,T), RS1000106831 (1:167059168 A>C), RS1000169540 (1:167058168 C>T), RS1000228067 (1:167090944 C>T), RS1000267778 (1:167052899 C>T), RS1000277193 (1:167063859 T>G), RS1000336251 (1:167090307 C>T), RS1000370211 (1:167080166 A>G), RS1000431425 (1:167085638 C>T), RS1000486260 (1:167079856 A>T), RS1000551570 (1:167086428 C>A,T), RS1000607244 (1:167056170 G>A), RS1000616389 (1:167085362 G>A,C), RS1000703905 (1:167081231 G>T)

Disease associations

OMIM: gene MIM:602171 | disease phenotypes:

GenCC curated gene-disease

Mondo (0):

Orphanet (0):

HPO phenotypes

0 total (0 of 0 shown, HPO-id order):

GWAS associations

1 association (top):

Study         Trait                                                          p-value
GCST009249_1  Eosinophilic granulomatosis with polyangiitis (ANCA negative)  7.000000e-10
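For scale, the reported p-value can be compared with the conventional genome-wide significance threshold of 5e-8 and converted to the -log10 scale used on Manhattan plots. The sketch below does both; the threshold is the customary one and is an assumption here, since this page does not state which cutoff the study used.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8  # conventional cutoff (assumed, not from this page)
P_VALUE = 7.0e-10             # GCST009249_1, as reported

# Smaller p-values are stronger; -log10 puts them on a "higher is stronger" scale.
neg_log10_p = -math.log10(P_VALUE)
is_genome_wide_significant = P_VALUE < GENOME_WIDE_THRESHOLD

print(f"-log10(p) = {neg_log10_p:.2f}")  # 9.15
print(is_genome_wide_significant)         # True
```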

Drugs & pharmacology

Drug and pharmacology data

Is drug target: yes

ChEMBL targets (1): CHEMBL3712927 (SINGLE PROTEIN)

PharmGKB: 1 entry (VIP=true, CPIC=false)

CTD chemical–gene interactions

13 total (human), top 13 by PubMed support.

Chemical                                                Actions (top 5)                                                       PubMed papers
(+)-JQ1 compound                                        decreases expression                                                  2
Cadmium Chloride                                        increases expression                                                  2
GSK-J4                                                  decreases expression                                                  1
chloroacetaldehyde                                      decreases expression                                                  1
benzo(e)pyrene                                          increases methylation                                                 1
S-(1,2-dichlorovinyl)cysteine                           affects response to substance, increases expression                   1
3-hydroxy-4-prenyl-5-methoxystilbene-2-carboxylic acid  decreases expression                                                  1
Cidofovir                                               increases expression                                                  1
Lipopolysaccharides                                     decreases expression, affects response to substance, increases expression  1
Methapyrilene                                           increases methylation                                                 1
Tobacco Smoke Pollution                                 decreases expression                                                  1
Tretinoin                                               increases expression                                                  1
Antirheumatic Agents                                    decreases expression                                                  1

Cellosaurus cell lines

3 cell lines: 1 spontaneously immortalized cell line, 1 cancer cell line, 1 transformed cell line

First 10 cell lines (id-ordered, not curated):

Cellosaurus  Name                          Category                              Sex
CVCL_E6QL    Genomeditech CHO-K1 H_GPA33   Spontaneously immortalized cell line  Female
CVCL_E6T2    Genomeditech CT26 H_GPA33     Cancer cell line                      Female
CVCL_E6TZ    Genomeditech HEK-293 H_GPA33  Transformed cell line                 Female

Clinical trials (associated diseases)

0 trials via MONDO — disease-level, not drug-specific.