Question

Provide a comprehensive cross-database identifier and functional mapping reference for human INS — a definitive lookup resource covering:

### Section 1: Gene identifiers
For human gene INS, list ALL gene-level database identifiers.

Required:
- HGNC ID and approved symbol
- Ensembl gene ID (ENSG...)
- NCBI Entrez Gene ID
- OMIM gene/locus ID
- Genomic location: chromosome, start position, end position, strand (GRCh38)

### Section 2: Transcript identifiers
For human gene INS, list ALL transcript-level identifiers.

Required:
- Ensembl transcripts: ALL ENST IDs with biotype. Total count.
- RefSeq transcripts: ALL NM_ mRNA accessions. Mark which is MANE Select.
- CCDS IDs.
- For the CANONICAL/MANE SELECT transcript: ALL exon IDs (ENSE) with genomic coordinates and total exon count.

### Section 3: Protein identifiers
For human gene INS protein product(s), list ALL protein-level identifiers.

Required:
- UniProt accessions: ALL entries (reviewed and unreviewed). Mark the canonical reviewed entry.
- RefSeq protein: ALL NP_ accessions.
- Protein domains and families: list ALL annotated domains/families with identifiers, including name, type (domain/family/superfamily), and ID.
- Antibody availability: known antibody resources for the protein.

### Section 4: Structure
For human gene INS protein, list ALL structural data.

Required:
- Experimental structures: ALL PDB IDs. For each: experimental method (X-ray/NMR/Cryo-EM) and resolution. Total count.
- Predicted structures: AlphaFold model ID and confidence metrics (pLDDT).

### Section 5: Cross-species orthologs
For human gene INS, list orthologous genes in key model organisms.

Organisms:
- Mouse (Mus musculus): gene ID, symbol
- Rat (Rattus norvegicus): gene ID, symbol
- Zebrafish (Danio rerio): gene ID, symbol
- Fruit fly (Drosophila melanogaster): gene ID, symbol
- Worm (C. elegans): gene ID, symbol
- Yeast (S. cerevisiae): gene ID, symbol

### Section 6: Clinical variants & AI predictions
For human gene INS, summarize clinical variants and AI predictions.

Clinical variant annotations (ClinVar):
- Total variant count (approximate is fine)
- Breakdown by classification: Pathogenic, Likely Pathogenic, VUS, Likely Benign, Benign
- TOP 30 pathogenic/likely pathogenic variants with: variant ID, HGVS notation, associated condition

AI-based variant effect predictions:
- Splice effect predictions: total count + TOP 30 with delta scores if known
- Missense pathogenicity from AlphaMissense — total count + TOP 30 likely-pathogenic with am_pathogenicity scores.

### Section 7: Pathways & Gene Ontology
For human gene INS, list biological pathways and Gene Ontology annotations.

Pathway membership:
- ALL biological pathways this gene participates in, with pathway IDs and names
- Total pathway count

Gene Ontology:
- Biological Process: count and TOP 20 terms with GO IDs
- Molecular Function: count and TOP 20 terms with GO IDs
- Cellular Component: count and TOP 20 terms with GO IDs

### Section 8: Protein interactions & networks
For human gene INS protein, summarize protein interactions and networks.

Protein-protein interactions (STRING, IntAct, BioGRID, etc.):
- Total interaction count (approximate)
- TOP 30 highest-confidence interacting proteins with scores/evidence

Protein similarity:
- Structural/embedding similarity (e.g. Foldseek, ESM): TOP 20 similar proteins with scores
- Sequence homology: TOP 20 homologous proteins with identity/similarity

### Section 9: Transcription factor regulatory data
For human gene INS, summarize transcription factor regulatory data.

If INS is a transcription factor:
- Downstream targets: total count + TOP 30 with regulation type (activates/represses) and evidence
- DNA binding motifs from JASPAR — all known motif IDs and motif family classification.

Regardless:
- Upstream regulators: TFs that regulate INS — names with evidence type (ChIP-seq / predicted / experimentally validated)

If INS is not a transcription factor, say so briefly and skip the downstream/motif sections.

### Section 10: Drug & pharmacology data
For human gene INS protein as a drug target, summarize pharmacology data.

If INS is a known drug target:
- Targeting molecules: total count in ChEMBL/DrugBank + TOP 30 by development phase (molecule ID, name, mechanism, highest phase)
- Clinical trials: TOP 20 involving drugs targeting this gene — trial ID, phase, status, intervention
- Pharmacogenomics: known drug-gene interactions affecting drug response + dosing guidelines if any

If INS is not currently a drug target, say so briefly.

### Section 11: Expression profiles
For human gene INS, summarize expression profiles.

Tissue expression (GTEx, HPA, Bgee, etc.):
- TOP 30 tissues with expression scores/levels (direction, units if known)
- Note tissue-specific or tissue-enriched patterns

Cell type expression (Tabula Sapiens, HCA, etc.):
- TOP 30 cell types with expression scores
- Note cell-type-specific patterns

Single-cell expression: notable datasets or cell populations of interest for this gene.

### Section 12: Disease associations
For human gene INS, summarize disease associations.

Mendelian / monogenic disease:
- Diseases caused by mutations in INS: disease name, disease ID (OMIM/Orphanet/Mondo), inheritance pattern, evidence level
- Include all directly linked conditions

Phenotype associations:
- Clinical phenotypes associated with the gene (HPO terms where known)
- TOP 30 phenotype terms with HPO IDs

Complex-disease / GWAS:
- Traits and diseases significantly associated via GWAS: trait name, variant, effect size, study where known
- TOP 30 GWAS associations

Accepted Answer

Executive summary

INS (HGNC:6081) encodes human insulin, a pancreatic peptide hormone translated as a 110-amino-acid precursor (preproinsulin). It is the master regulator of glucose homeostasis and one of the most clinically consequential proteins in medicine. Expressed almost exclusively in pancreatic beta cells and the islets of Langerhans (Bgee expression score 100.00 in type B pancreatic cells), insulin acts through the insulin receptor (INSR) to drive glucose uptake, glycogen synthesis, and lipid metabolism, and the gene is annotated to 14 Reactome pathways. The gene harbors ~170 …

Identifier	Value
HGNC ID	HGNC:6081
Approved symbol	INS
Ensembl gene ID	ENSG00000254647
NCBI Entrez Gene ID	3630
OMIM gene ID	176730
Chromosome	11
Start position (GRCh38)	2,159,779
End position (GRCh38)	2,161,221
Strand	− (minus)

Transcript ID	Biotype
ENST00000250971	protein_coding
ENST00000381330	protein_coding
ENST00000397262	protein_coding
ENST00000421783	non_stop_decay
ENST00000512523	protein_coding

Accession	MANE Select
NM_000207	✓ Yes
NM_001185097	No
NM_001185098	No
NM_001291897	No

Exon ID	Start (GRCh38)	End (GRCh38)	Chromosome	Strand
ENSE00001938789	2161168	2161209	11	−
ENSE00003494357	2160785	2160988	11	−
ENSE00003901829	2159779	2159997	11	−
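
The exon coordinates above can be sanity-checked with a little arithmetic. The following is a minimal sketch, assuming the coordinates are 1-based and inclusive (the usual Ensembl convention); the helper names `exon_length` and `transcript_order` are illustrative and not part of any database API.

```python
# Exon coordinates copied from the table above (chromosome 11, minus strand).
EXONS = [
    ("ENSE00001938789", 2161168, 2161209),
    ("ENSE00003494357", 2160785, 2160988),
    ("ENSE00003901829", 2159779, 2159997),
]


def exon_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive interval (assumed convention)."""
    return end - start + 1


def transcript_order(exons, strand: str):
    """Sort exons 5'->3': descending start on the minus strand, ascending otherwise."""
    return sorted(exons, key=lambda exon: exon[1], reverse=(strand == "-"))


ordered = transcript_order(EXONS, "-")
lengths = {exon_id: exon_length(start, end) for exon_id, start, end in ordered}
spliced_length = sum(lengths.values())

for exon_id, length in lengths.items():
    print(exon_id, length)
print("spliced length (bp):", spliced_length)
```

Under these assumptions the three exons are 42, 204, and 219 bp long, for a spliced length of 465 bp.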

Organism	Gene ID	Gene Symbol
Mouse (Mus musculus)	16333, 16334	Ins1, Ins2
Rat (Rattus norvegicus)	24505, 24506	Ins1, Ins2
Zebrafish (Danio rerio)	30262, 566735	ins, insb
Fruit fly (Drosophila melanogaster)	39149, 39150	Ilp1, Ilp2
Worm (C. elegans)	none	—
Yeast (S. cerevisiae)	none	—

Classification	Count
Benign	~50
Likely Benign	~30
Uncertain Significance (VUS)	~45
Likely Pathogenic	~28
Pathogenic	~17
Total	~170
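
As a quick consistency check on the approximate counts above, this small sketch sums the five classes and reports each class as a share of the total. The numbers are the rounded values from the table, so the shares are only indicative.

```python
# Approximate ClinVar counts copied from the table above.
CLINVAR_COUNTS = {
    "Benign": 50,
    "Likely Benign": 30,
    "Uncertain Significance (VUS)": 45,
    "Likely Pathogenic": 28,
    "Pathogenic": 17,
}

total = sum(CLINVAR_COUNTS.values())
shares = {label: count / total for label, count in CLINVAR_COUNTS.items()}

# Combined pathogenic + likely pathogenic fraction.
plp_share = (CLINVAR_COUNTS["Pathogenic"] + CLINVAR_COUNTS["Likely Pathogenic"]) / total

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
print(f"Total: {total}; P/LP share: {plp_share:.1%}")
```

The class counts add up to the stated total of ~170, with pathogenic and likely pathogenic variants together making up roughly a quarter of it.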

ClinVar ID	HGVS Notation	Classification
1455986	NM_000207.3(INS):c.1A>G (p.Met1Val)	Pathogenic/Likely pathogenic
1457228	NC_000011.9:g.(?_2181023)_(2193087_?)del	Pathogenic
1459937	NC_000011.9:g.(?_2181023)_(2182533_?)del	Pathogenic
13378	NM_000207.3(INS):c.143T>C (p.Phe48Ser)	Pathogenic
13382	NM_000207.3(INS):c.266G>T (p.Arg89Leu)	Pathogenic
13383	NM_000207.3(INS):c.266G>C (p.Arg89Pro)	Pathogenic
13389	NM_000207.3(INS):c.143T>G (p.Phe48Cys)	Likely pathogenic/Likely risk allele
13392	NM_000207.3(INS):c.163C>T (p.Arg55Cys)	Pathogenic/Likely pathogenic/Likely risk allele
21122	NM_000207.3(INS):c.94G>A (p.Gly32Ser)	Pathogenic/Likely pathogenic
211186	NM_000207.3(INS):c.188-31G>A	Pathogenic/Likely risk allele
253331	NM_000207.3(INS):c.125T>C (p.Val42Ala)	Pathogenic/Likely pathogenic
431442	NM_000207.3(INS):c.-152C>A	Pathogenic/Likely pathogenic
431443	NM_000207.3(INS):c.-152C>G	Pathogenic
1162205	NM_000207.3(INS):c.115C>T (p.Leu39Phe)	Likely pathogenic
1336487	NM_000207.3(INS):c.289A>C (p.Thr97Pro)	Likely pathogenic
1338622	NM_000207.3(INS):c.95G>T (p.Gly32Val)	Likely pathogenic
1338640	NM_000207.3(INS):c.103C>G (p.Leu35Val)	Likely pathogenic
1526009	NM_000207.3(INS):c.293G>T (p.Ser98Ile)	Likely pathogenic
1526010	NM_000207.3(INS):c.322T>G (p.Tyr108Asp)	Likely pathogenic
1526012	NM_000207.3(INS):c.101A>C (p.His34Pro)	Likely pathogenic
1526013	NM_000207.3(INS):c.103C>A (p.Leu35Met)	Likely pathogenic
1801850	NM_000207.3(INS):c.155C>G (p.Pro52Arg)	Likely pathogenic
2630345	NM_000207.3(INS):c.136C>T (p.Arg46Ter)	Likely pathogenic
2631502	NM_000207.3(INS):c.284G>A (p.Cys95Tyr)	Likely pathogenic
3393374	NM_000207.3(INS):c.283T>C (p.Cys95Arg)	Likely pathogenic
36401	NM_000207.3(INS):c.71C>T (p.Ala24Val)	Likely pathogenic
3773933	NM_000207.3(INS):c.129C>G (p.Cys43Trp)	Likely pathogenic
65581	NM_000207.3(INS):c.*59A>G	Likely pathogenic
916729	NM_000207.3(INS):c.174del (p.Glu59fs)	Likely pathogenic
931331	NM_001042376.3(INS-IGF2):c.155C>T (p.Pro52Leu)	Pathogenic
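
For downstream lookups it can help to split the transcript-level HGVS names above into their parts. The sketch below uses a hypothetical regular expression written for the strings in this table only; it is not a full HGVS parser, and it deliberately returns `None` for the genomic (`NC_`) deletion entries.

```python
import re

# Illustrative pattern for names such as "NM_000207.3(INS):c.143T>C (p.Phe48Ser)".
HGVS_RE = re.compile(
    r"^(?P<reference>[A-Z]{2}_\d+\.\d+)"   # versioned accession, e.g. NM_000207.3
    r"\((?P<gene>[^)]+)\)"                 # gene symbol in parentheses
    r":(?P<coding>c\.\S+)"                 # coding-DNA change
    r"(?:\s+\((?P<protein>p\.[^)]+)\))?$"  # optional protein consequence
)


def parse_hgvs(name: str):
    """Return the named parts of a transcript-level HGVS string, or None if it does not match."""
    match = HGVS_RE.match(name.strip())
    return match.groupdict() if match else None


print(parse_hgvs("NM_000207.3(INS):c.143T>C (p.Phe48Ser)"))
print(parse_hgvs("NM_000207.3(INS):c.-152C>A"))
```

Entries without a protein consequence (promoter, intronic, and UTR variants) come back with `protein` set to `None`.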

Protein Variant	am_pathogenicity	Protein Variant	am_pathogenicity
C109W	0.999	C100W	0.998
C109F	0.999	C100R	0.998
C109Y	0.999	C100S	0.998
Y108C	0.999	Y103C	0.920
C109G	0.983	L105P	0.997
C109R	0.996	L105R	0.983
C109S	0.998	L102H	0.961
C96W	0.997	L102P	0.940
C96Y	0.998	Q104P	0.957
C96R	0.995	S101F	0.979
N110K	0.970	E106V	0.972
N110I	0.950	Y108D	0.992
N110Y	0.835	Y108H	0.994
C95W	0.998	C96F	0.996
C95R	0.997	C95F	0.997
Y108S	0.985	Y108N	0.988
E106D	0.965	E106K	0.910
L105Q	0.991	C100F	0.997
L105M	0.797	L105V	0.814
S101C	0.960	S101Y	0.933
C96S	0.997	E106A	0.836
C95Y	0.997	Q104H	0.605
S101P	0.882	N110D	0.595
S101A	0.708	R89P	0.861
L102R	0.877	G90C	0.990
C100G	0.991	C43Y	0.964
C95S	0.998	L39P	0.962
C95G	0.976	F48L	0.990
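
To relate these scores to AlphaMissense classes, the sketch below applies the commonly cited cutoffs (likely benign below 0.34, likely pathogenic above 0.564, ambiguous in between). Treat the cutoffs as an assumption of this example rather than something stated in the table; the function name `am_class` is illustrative.

```python
# Assumed AlphaMissense class cutoffs (not taken from the table above).
LIKELY_BENIGN_MAX = 0.34
LIKELY_PATHOGENIC_MIN = 0.564


def am_class(score: float) -> str:
    """Map an am_pathogenicity score to a class label under the assumed cutoffs."""
    if score > LIKELY_PATHOGENIC_MIN:
        return "likely_pathogenic"
    if score < LIKELY_BENIGN_MAX:
        return "likely_benign"
    return "ambiguous"


# A few rows copied from the table above, including the two lowest scores.
SCORES = {"C109W": 0.999, "Y103C": 0.920, "L105M": 0.797, "Q104H": 0.605, "N110D": 0.595}
classes = {variant: am_class(score) for variant, score in SCORES.items()}

for variant, label in classes.items():
    print(variant, SCORES[variant], label)
```

Under these cutoffs even the lowest-scoring entries listed (Q104H at 0.605 and N110D at 0.595) still fall in the likely-pathogenic class.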

Pathway ID	Pathway Name
R-HSA-210745	Regulation of gene expression in beta cells
R-HSA-264876	Insulin processing
R-HSA-422085	Synthesis, secretion, and deacylation of Ghrelin
R-HSA-422356	Regulation of insulin secretion
R-HSA-6807878	COPI-mediated anterograde transport
R-HSA-6811558	PI5P, PP2A and IER3 Regulate PI3K/AKT Signaling
R-HSA-74713	IRS activation
R-HSA-74749	Signal attenuation
R-HSA-74751	Insulin receptor signalling cascade
R-HSA-74752	Signaling by Insulin receptor
R-HSA-77387	Insulin receptor recycling
R-HSA-9615017	FOXO-mediated transcription of oxidative stress, metabolic and neuronal genes
R-HSA-9768919	NPAS4 regulates expression of target genes
R-HSA-977225	Amyloid fiber formation

GO ID	Biological Process term
GO:0006006	Glucose metabolic process
GO:0008286	Insulin receptor signaling pathway
GO:0042593	Glucose homeostasis
GO:0046628	Positive regulation of insulin receptor signaling pathway
GO:0045821	Positive regulation of glycolytic process
GO:0045725	Positive regulation of glycogen biosynthetic process
GO:0045721	Negative regulation of gluconeogenesis
GO:0046326	Positive regulation of D-glucose import across plasma membrane
GO:0043410	Positive regulation of MAPK cascade
GO:0045840	Positive regulation of mitotic nuclear division
GO:0045597	Positive regulation of cell differentiation
GO:0030307	Positive regulation of cell growth
GO:0008284	Positive regulation of cell population proliferation
GO:0030335	Positive regulation of cell migration
GO:0001819	Positive regulation of cytokine production
GO:0046889	Positive regulation of lipid biosynthetic process
GO:0045922	Negative regulation of fatty acid metabolic process
GO:0055089	Fatty acid homeostasis
GO:0051897	Positive regulation of phosphatidylinositol 3-kinase/protein kinase B signal transduction
GO:0010628	Positive regulation of gene expression

GO ID	Molecular Function term
GO:0005179	Hormone activity
GO:0005158	Insulin receptor binding
GO:0005159	Insulin-like growth factor receptor binding
GO:0048018	Receptor ligand activity
GO:0042802	Identical protein binding
GO:0002020	Protease binding

GO ID	Cellular Component term
GO:0005576	Extracellular region
GO:0005615	Extracellular space
GO:0005788	Endoplasmic reticulum lumen
GO:0005796	Golgi lumen
GO:0000139	Golgi membrane
GO:0034774	Secretory granule lumen
GO:0033116	Endoplasmic reticulum-Golgi intermediate compartment membrane
GO:0031904	Endosome lumen
GO:0030133	Transport vesicle

Rank	UniProt	Protein Name	STRING Score
1	P01343	Insulin-like growth factor 1 (IGF1)	999
2	P02768	Albumin (ALB)	999
3	P06213	Insulin receptor (INSR)	999
4	P08069	Insulin-like growth factor 1 receptor (IGF1R)	999
5	P35568	Insulin receptor substrate 1 (IRS1)	999
6	P01275	Pro-glucagon (GCG)	997
7	P17936	Insulin-like growth factor-binding protein 3 (IGFBP3)	995
8	P08833	Insulin-like growth factor-binding protein 1 (IGFBP1)	994
9	P11717	Cation-independent mannose-6-phosphate receptor (IGF2R)	990
10	Q12988	Heat shock protein beta-3 (HSPB3)	990
11	Q16270	Insulin-like growth factor-binding protein 7 (IGFBP7)	990
12	P04629	High affinity nerve growth factor receptor (NTRK1)	989
13	P41159	Leptin (LEP)	986
14	Q9Y4H2	Insulin receptor substrate 2 (IRS2)	986
15	P02144	Myoglobin (MB)	985
16	P01344	Insulin-like growth factor 2 (IGF2)	982
17	P18065	Insulin-like growth factor-binding protein 2 (IGFBP2)	982
18	P14672	Glucose transporter 4 (SLC2A4/GLUT4)	978
19	P10997	Islet amyloid polypeptide (IAPP)	970
20	P14735	Insulin-degrading enzyme (IDE)	967
21	Q15848	Adiponectin (ADIPOQ)	961
22	P99999	Cytochrome c (CYCS)	953
23	P01133	Pro-epidermal growth factor (EGF)	953
24	P31749	RAC-alpha serine/threonine-protein kinase (AKT1)	953
25	P61278	Somatostatin (SST)	949
26	P01189	Pro-opiomelanocortin (POMC)	947
27	P35557	Hexokinase-4 (GCK)	945
28	P37231	Peroxisome proliferator-activated receptor gamma (PPARG)	944
29	P01236	Prolactin (PRL)	941
30	P11168	Glucose transporter 2 (SLC2A2/GLUT2)	941
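
The STRING scores above are on a 0 to 1000 scale. As a rough guide to reading them, this sketch rescales a score to 0 to 1 and assigns a confidence band. The band cutoffs (0.15, 0.4, 0.7, 0.9) are the conventional STRING confidence levels and are an assumption of this example; `confidence_band` is a hypothetical helper.

```python
# A few rows copied from the table above: (gene symbol, STRING combined score, 0-1000 scale).
PARTNERS = [("INSR", 999), ("GCG", 997), ("IDE", 967), ("SLC2A2", 941)]

# Assumed confidence bands on the 0-1 scale, highest first.
BANDS = [(0.9, "highest"), (0.7, "high"), (0.4, "medium"), (0.15, "low")]


def confidence_band(score_1000: int) -> str:
    """Rescale a 0-1000 STRING score to 0-1 and return its confidence band."""
    score = score_1000 / 1000
    for cutoff, label in BANDS:
        if score >= cutoff:
            return label
    return "below low-confidence cutoff"


for symbol, score in PARTNERS:
    print(symbol, score / 1000, confidence_band(score))
```

Every partner in the top 30 scores at least 941, so all of them sit in the highest band under these cutoffs.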

Rank	UniProt	Protein Name	ESM2 Similarity	Avg Similarity
1	P01308	Insulin (self)	1.0000	0.9679
2	Q6YK33	Insulin homolog (mouse ortholog)	1.0000	0.9679
3	P30406	Insulin-like peptide	0.9998	0.9647
4	P30407	Insulin-like peptide	0.9998	0.9658
5	Q8HXV2	Insulin (species variant)	0.9996	0.9680
6	P01313	Insulin (species variant)	0.9980	0.9795
7	P01311	Insulin (species variant)	0.9945	0.9756
8	P51463	Insulin-related peptide	0.9961	0.9700
9	O55232	Insulin (rodent ortholog)	0.9986	0.9362
10	O55241	Insulin (rodent ortholog)	0.9986	0.9371
11	P01322	Insulin (species variant)	0.9989	0.9787
12	P01323	Insulin (species variant)	0.9991	0.9797
13	P01326	Insulin (species variant)	0.9991	0.9792
14	P67970	Insulin (species variant)	0.9961	0.9686
15	P67972	Insulin (species variant)	0.9970	0.9773
16	Q62587	Insulin (rodent ortholog)	0.9979	0.9793
17	P69045	Insulin (species variant)	0.9988	0.9753
18	P81025	Insulin (species variant)	0.9969	0.9743
19	P01321	Insulin (species variant)	0.9973	0.9721
20	P06306	Insulin (species variant)	0.9969	0.9726

Rank	UniProt	Identity (%)	Bitscore
1	P17085	92.0	347.0
2	P10764	96.1	340.0
3	P33712	97.5	306.0
4	P23695	92.2	315.0
5	P18254	100.0	268.0
6	P51462	100.0	258.0
7	P16501	93.2	254.0
8	P41694	93.8	251.0
9	P01322	93.6	215.0
10	P01333	84.8	149.0
11	P30410	98.2	224.0
12	P30407	99.1	227.0
13	P30406	99.1	226.0
14	P01308	100.0	226.0
15	P04667	78.2	174.0
16	P01310	91.3	158.0
17	P67971	93.3	109.0
18	P01335	90.7	211.0
19	P01315	91.3	189.0
20	P01314	100.0	112.0

TF	Regulation	Confidence
PDX1	Activation	High
HNF1A	Activation	High
HNF1B	Activation	High
ISL1	Activation	High
NEUROG3	Activation	High
CDX2	Activation	High
MAFB	Activation	High
MAFA	Unknown	High
MAF	Activation	High
KLF11	Activation	High
ESR1	Activation	High
NR1H4	Activation	High
ATF2	Activation	High
FOXA2	Unknown	High
HNF4A	Unknown	High
STAT5B	Unknown	High
CREB1	Unknown	High
CREM	Unknown	High
GLIS3	Unknown	High
NKX6-1	Unknown	High
NEUROD1	Unknown	High
FOXO1	Unknown	High
SOX9	Unknown	High
SOX6	Unknown	High
SREBF1	Unknown	High
SRF	Unknown	High
TCF3	Unknown	High
JUN	Repression	High
AP1	Repression	High
ATF6	Repression	High
PAX6	Repression	High
PAX4	Repression	High
NR0B2	Repression	High
PPARG	Repression	High
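
A short tally makes the balance of regulation types in the table above easier to see. The sketch simply counts the rows as listed; the grouping into three lists is only a convenience of this example.

```python
from collections import Counter

# Regulator names copied from the table above, grouped by the listed regulation type.
REGULATORS = {
    "Activation": ["PDX1", "HNF1A", "HNF1B", "ISL1", "NEUROG3", "CDX2", "MAFB",
                   "MAF", "KLF11", "ESR1", "NR1H4", "ATF2"],
    "Unknown": ["MAFA", "FOXA2", "HNF4A", "STAT5B", "CREB1", "CREM", "GLIS3",
                "NKX6-1", "NEUROD1", "FOXO1", "SOX9", "SOX6", "SREBF1", "SRF", "TCF3"],
    "Repression": ["JUN", "AP1", "ATF6", "PAX6", "PAX4", "NR0B2", "PPARG"],
}

tally = Counter({kind: len(names) for kind, names in REGULATORS.items()})
total_regulators = sum(tally.values())

for kind, count in tally.most_common():
    print(f"{kind}: {count}")
print("Total:", total_regulators)
```

Of the 34 listed regulators, 12 are annotated as activators, 7 as repressors, and 15 have no stated direction.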

Molecule ID	Name	Type	Phase	Clinical Trials
CHEMBL1201631	Insulin human	Protein	4	603
CHEMBL1201497	Insulin glargine	Protein analog	4	510
CHEMBL1201496	Insulin aspart	Protein analog	4	294
CHEMBL2104391	Insulin detemir	Protein analog	4	185
CHEMBL1201538	Insulin lispro	Protein analog	4	154

Rank	Tissue/Condition	Expression Status	Score	Quality
1	Type B pancreatic cell	Present	100.00	Gold
2	Islet of Langerhans	Present	99.96	Gold
3	Body of pancreas	Present	99.78	Gold
4	Pancreas	Present	99.05	Gold
5	Epithelial cell of pancreas	Present	80.80	Gold
6	Right lobe of liver	Present	64.33	Gold
7	Triceps brachii	Absent	64.27	Gold
8	Gluteal muscle	Absent	64.12	Gold
9	Right adrenal gland	Present	63.29	Gold
10	Left adrenal gland	Present	62.29	Gold
11	Olfactory bulb	Absent	62.27	Gold
12	Left adrenal gland cortex	Present	61.07	Gold
13	Tongue squamous epithelium	Absent	61.07	Gold
14	Right adrenal gland cortex	Present	60.43	Gold
15	Adrenal cortex	Present	59.95	Gold
16	Ectocervix	Present	59.63	Gold
17	Descending thoracic aorta	Present	59.02	Gold
18	Vastus lateralis	Absent	58.39	Gold
19	Adrenal gland	Present	58.33	Gold
20	Oocyte	Absent	58.03	Gold
21	Left uterine tube	Present	57.44	Gold
22	Quadriceps femoris	Absent	57.41	Gold
23	Right coronary artery	Present	55.87	Gold
24	Lower esophagus mucosa	Present	55.80	Gold
25	Myocardium	Absent	55.26	Gold
26	Endocervix	Present	54.73	Gold
27	Diaphragm	Absent	54.62	Gold
28	Fundus of stomach	Present	54.26	Gold
29	Lateral nuclear group of thalamus	Absent	54.02	Gold
30	Substantia nigra	Present	53.69	Gold

Dataset ID	Study	Species	Cell Count
E-ENAD-27	Single cell transcriptomics defines human islet cell signatures and reveals cell-type-specific expression changes in type 2 diabetes	Homo sapiens	1,145
E-GEOD-81547	Single cell transcriptome analysis of human pancreas	Homo sapiens	2,544
E-GEOD-81608	Single cell RNA-seq of human islet cells from non-diabetic and type II diabetes organ donors	Homo sapiens	1,600
E-GEOD-83139	Single cell RNA-seq of human pancreatic endocrine cells from juvenile, adult control, and type 2 diabetic donors	Homo sapiens	635
E-HCAD-31	Massively parallel single-cell RNA-seq analysis of pancreatic islet cells from healthy and type II diabetic donors	Homo sapiens	38,217
E-MTAB-10137	Unraveling transcriptomic heterogeneity in human dermal blood vascular endothelium at single-cell resolution	Homo sapiens	1,523
E-MTAB-5061	Single-cell RNA-seq analysis of human pancreas from healthy individuals and type 2 diabetes patients	Homo sapiens	3,386

Disease	Disease IDs	Inheritance	Evidence Level
Diabetes mellitus, permanent neonatal 4	OMIM:618858, MONDO:0030089, MONDO:0100164	Autosomal dominant, Autosomal recessive	Strong (Genomics England PanelApp, Labcorp), Moderate (Ambry Genetics)
Transient neonatal diabetes mellitus	MONDO:0020525	Autosomal dominant, Autosomal recessive	Strong (Genomics England PanelApp)
Permanent neonatal diabetes mellitus	Orphanet:99885, MONDO:0100164	Autosomal dominant	Strong (Genomics England PanelApp, Labcorp), Supportive (Orphanet)
Type 1 diabetes mellitus 2	OMIM:125852, MONDO:0007454	Autosomal dominant	Strong (Genomics England PanelApp)
Maturity-onset diabetes of the young type 10	OMIM:613370, MONDO:0013240, Orphanet:552	Autosomal dominant	Strong (Genomics England PanelApp, Labcorp), Supportive (Orphanet)
Hyperproinsulinemia	OMIM:616214, MONDO:0014535	Autosomal dominant	Strong (Genomics England PanelApp), Limited (Labcorp)
Maturity-onset diabetes of the young	Orphanet:552	Autosomal dominant	Supportive (Orphanet)

Trait/Disease	Associated Gene(s)	p-value	Study
Type 1 diabetes	INS-IGF2, INS	1e-196	GCST001191
Type 1 diabetes	INS-IGF2, INS	1e-160	GCST010681
Type 1 diabetes	INS-IGF2, INS	1e-100	GCST005536
Type 1 diabetes	INS-IGF2, INS	1e-18	GCST007246
Type 1 diabetes	INS - TH	1e-13	GCST009916
Type 1 diabetes	INS-IGF2, IGF2, IGF2-AS	8e-11	GCST003097
Type 1 diabetes	INS-IGF2, IGF2	4e-09	GCST000054
Type 1 diabetes	MIR4686 - ASCL2	2e-31	GCST90000529
Type 1 diabetes	IGF2, INS-IGF2, IGF2-AS	1e-09	GCST90000529
Type 1 diabetes	H19 - IGF2	1e-09	GCST90000529
Type 1 diabetes in high risk HLA genotype individuals	INS - TH	3e-07	GCST006196
Type 1 diabetes	INS-IGF2, IGF2	2e-07	GCST000038
Severe autoimmune type 2 diabetes	INS-IGF2, INS	3e-07	GCST90026412
Type 1 diabetes autoantibodies in high risk HLA genotype individuals	INS - TH	1e-07	GCST006197
Type 1 diabetes	INS	1e-06	GCST008377
Type 2 diabetes	MIR4686 - ASCL2	1e-16	GCST010118
Type 2 diabetes	MIR4686 - ASCL2	4e-26	GCST009379
Type 2 diabetes	MIR4686 - ASCL2	3e-13	GCST007847
Type 2 diabetes	MIR4686 - ASCL2	2e-07	GCST008114
Type 2 diabetes	INS - TH	1e-06	GCST009379
Type 2 diabetes	TH	2e-08	GCST008464
Type 2 diabetes	H19 - IGF2	2e-08	GCST009379
Type 2 diabetes	IGF2	4e-08	GCST009379
Type 2 diabetes	FAM99B, LINC02708	2e-06	GCST009379
Prostate cancer	MIR4686 - ASCL2	3e-33	GCST000488
Birth weight	H19 - IGF2	7e-10	GCST005146
Celiac disease	H19 - IGF2	7e-06	GCST002112
Latent autoimmune diabetes vs. type 2 diabetes	INS-IGF2, INS	1e-18	GCST007246
Type 1 diabetes autoantibodies (time to event)	INS - TH	6e-06	GCST006197
Pediatric autoimmune diseases	INS-IGF2, IGF2, IGF2-AS	8e-11	GCST003097
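
Several rows above have p-values weaker than the conventional genome-wide significance threshold. The sketch below filters a handful of rows at p < 5e-8; that threshold is an assumption of this example, since the table does not state which cutoff was used.

```python
import math

GENOME_WIDE = 5e-8  # conventional threshold, assumed here

# A few rows copied from the table above: (trait, gene(s), p-value text, study accession).
ROWS = [
    ("Type 1 diabetes", "INS-IGF2, INS", "1e-196", "GCST001191"),
    ("Type 1 diabetes", "INS", "1e-06", "GCST008377"),
    ("Type 2 diabetes", "TH", "2e-08", "GCST008464"),
    ("Celiac disease", "H19 - IGF2", "7e-06", "GCST002112"),
]


def is_significant(p_text: str, threshold: float = GENOME_WIDE) -> bool:
    """True when the p-value is strictly below the threshold."""
    return float(p_text) < threshold


significant = [row for row in ROWS if is_significant(row[2])]
neg_log10 = {row[3]: -math.log10(float(row[2])) for row in significant}

for trait, genes, p_text, study in significant:
    print(trait, genes, p_text, study, round(neg_log10[study], 1))
```

Rows that fail this filter, such as the celiac disease association at 7e-06, are better read as suggestive signals than as established associations.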

Executive summary

Gene identifiers

Transcript identifiers

Ensembl Transcripts

RefSeq mRNA Accessions

CCDS IDs

MANE SELECT / Canonical Transcript (ENST00000381330)

Protein identifiers

UniProt Accessions

RefSeq Protein (NP_ accessions)

Protein Domains and Families

Antibody Resources

Structure

Experimental Structures: PDB

By Experimental Method:

Predicted Structure: AlphaFold

Cross-species orthologs

Clinical variants & AI predictions

ClinVar Summary

Top 30 Pathogenic/Likely Pathogenic Variants

AlphaMissense Pathogenicity Predictions

Splice Effect Predictions

Pathways & Gene Ontology

Reactome Pathways (14)

MSigDB Gene Sets (100 total)

Gene Ontology

Protein interactions & networks

Protein-Protein Interactions

Protein Structural & Sequence Similarity

Transcription factor regulatory data

Upstream regulators

Drug & pharmacology data

Insulin-based therapeutics in ChEMBL

Clinical trials (top 20 involving insulin therapeutics)

Pharmacogenomics

Expression profiles

Tissue Expression (Bgee)

Single-Cell Expression Datasets (scxa)

Disease associations

Mendelian / Monogenic Disease

Phenotype Associations (Top 30 HPO Terms)

Complex Disease / GWAS Associations (Top 30)