Colorectal Cancer: GWAS to Drug Target Druggability Analysis
Perform a comprehensive GWAS-to-drug-target druggability analysis for Colorectal Cancer. Trace genetic associations through variants, genes, and …
Perform a comprehensive GWAS-to-drug-target druggability analysis for Colorectal Cancer. Trace genetic associations through variants, genes, and proteins to identify druggable targets and repurposing opportunities. Do NOT read any existing files in this directory. Do NOT use any claude.ai MCP tools (ChEMBL etc). Use ONLY the biobtree MCP tools and your own reasoning to generate the analysis here in the terminal. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 1: DISEASE IDENTIFIERS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Find all database identifiers for Colorectal Cancer: MONDO, EFO, OMIM, Orphanet, MeSH ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 2: GWAS LANDSCAPE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Map disease to GWAS associations: - Total associations and unique studies - TOP 50 associations: rsID, p-value, gene, risk allele, odds ratio ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 3: VARIANT DETAILS (dbSNP) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ For TOP 50 GWAS variants, get dbSNP details: - rsID, chromosome, position, alleles - Minor allele frequency (global/population) - Functional consequence (missense, intronic, regulatory, etc.) Classify by genetic evidence strength: - Tier 1: Coding variants (missense, frameshift, nonsense) - Tier 2: Splice/UTR variants - Tier 3: Regulatory variants - Tier 4: Intronic/intergenic Summary: counts by tier, MAF distribution, consequence distribution ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 4: MENDELIAN DISEASE OVERLAP ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Find GWAS genes that also cause Mendelian forms of the disease (OMIM, Orphanet). Genes with BOTH GWAS + Mendelian evidence = highest confidence targets. List: Gene, GWAS p-value, Mendelian disease, inheritance pattern ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 5: GWAS GENES TO PROTEINS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Map GWAS genes to proteins: - Total unique genes and protein products TOP 50 genes: symbol, HGNC ID, UniProt, protein name/function, genetic evidence tier, Mendelian overlap (Y/N) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 6: PROTEIN FAMILY CLASSIFICATION ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Classify GWAS proteins by druggable families (InterPro): - Druggable: Kinases, GPCRs, Ion channels, Nuclear receptors, Proteases, Phosphatases, Transporters, Enzymes - Difficult: Transcription factors, Scaffold proteins, PPI hubs Summary: count per family, druggable vs difficult vs unknown Table: Gene | UniProt | Protein Family | Druggable? | Notes ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 7: EXPRESSION CONTEXT ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Check tissue and single-cell expression for GWAS genes. Identify disease-relevant tissues/cell types for Colorectal Cancer. Analysis: - Which tissues/cell types highly express GWAS genes? - Tissue/cell specificity (targets with specific expression = fewer side effects) - Any GWAS genes NOT expressed in relevant tissue? (lower confidence) Table TOP 30: Gene | Tissues | Cell Types | Specificity ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 8: PROTEIN INTERACTIONS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Map protein interactions among GWAS genes (STRING, BioGRID, IntAct). Analysis: - Do GWAS genes interact with each other? (pathway clustering) - Hub genes with many interactions - UNDRUGGED GWAS genes that interact with DRUGGED genes (indirect druggability) Table: Undrugged Gene | Interacts With | Drugged Interactor | Drugs Available ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 9: STRUCTURAL DATA ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Check structure availability for GWAS proteins (PDB, AlphaFold). Structure availability affects druggability. Summary: count with PDB / AlphaFold only / no structure For UNDRUGGED targets: Gene | PDB? | AlphaFold? | Quality ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 10: DRUG TARGET ANALYSIS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Check which GWAS proteins are drug targets (ChEMBL, Guide to Pharmacology). Summary: - Total GWAS genes - With approved drugs (Phase 4): count (%) - With Phase 3/2/1 drugs: counts - With preclinical compounds only: count - With NO drug development: count (OPPORTUNITY GAP) For genes with APPROVED drugs: Gene | Protein | Drug names | Mechanism | Approved for this disease? (Y/N) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 11: BIOACTIVITY & ENZYME DATA ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Check bioactivity data for GWAS proteins (PubChem, BRENDA for enzymes). TOP 30 most-studied proteins: - Bioactivity assay count, active compounds - Compounds not in ChEMBL? (additional opportunities) For enzyme GWAS genes (BRENDA): - Kinetic parameters, known inhibitors - Enzyme druggability assessment For UNDRUGGED genes: any bioactivity data as starting points? ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 12: PHARMACOGENOMICS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Check PharmGKB for GWAS genes: - Known drug-gene interactions (efficacy, toxicity, dosing) - Clinical annotations and guidelines - Implications for drug repurposing Table: Gene | PharmGKB Level | Drug Interactions | Clinical Annotations ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 13: CLINICAL TRIALS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Get clinical trials for Colorectal Cancer: - Total trials, breakdown by phase TOP 30 drugs in trials: Drug | Phase | Mechanism | Target gene | Targets GWAS gene? (Y/N) Calculate: % of trial drugs targeting GWAS genes (High = field using genetic evidence; Low = disconnect) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 14: PATHWAY ANALYSIS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Map GWAS genes to pathways (Reactome). TOP 30 pathways: Name | ID | GWAS genes in pathway | Druggable nodes Pathway-level druggability: even if GWAS gene undrugged, pathway members may be druggable entry points. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 15: DRUG REPURPOSING OPPORTUNITIES ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Identify drugs approved for OTHER diseases that target GWAS genes. Prioritize by: 1. Genetic evidence (Tier 1-4) 2. Mendelian overlap 3. Druggable protein family 4. Expression in disease tissue 5. Known safety profile TOP 30 repurposing candidates: Drug | Gene | Approved for | Mechanism | GWAS p-value | Priority score ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 16: DRUGGABILITY PYRAMID ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Stratify ALL GWAS genes into 6 levels. Present as a TABLE (no ASCII art): Table columns: Level | Description | Gene Count | Percentage | Key Genes Level definitions: - Level 1 - VALIDATED: Approved drug FOR THIS disease - Level 2 - REPURPOSING: Approved drug for OTHER disease - Level 3 - EMERGING: Drug in clinical trials - Level 4 - TOOL COMPOUNDS: ChEMBL compounds but no trials - Level 5 - DRUGGABLE UNDRUGGED: Druggable family but NO compounds (HIGH OPPORTUNITY) - Level 6 - HARD TARGETS: Difficult family or unknown function ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 17: UNDRUGGED TARGET PROFILES ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Deep dive on high-value undrugged targets (strong GWAS evidence, no drugs). Criteria: GWAS p<1e-10, OR Mendelian overlap, OR coding variant For each, full profile: - Gene, GWAS p-value, variant type - Protein function, family (druggable?) - Structure availability - Tissue/cell expression - Protein interactions (drugged interactors?) - Why undrugged? (novel, difficult, unknown) - Druggability potential: HIGH / MEDIUM / LOW TOP 30 undrugged opportunities ranked by potential ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 18: SUMMARY ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ GWAS LANDSCAPE: - Total associations / studies / genes - Coding vs non-coding variants % GENETIC EVIDENCE: - Tier 1 genes / Mendelian overlap / Both DRUGGABILITY: - Overall rate: X% have drug targets - Approved: X% | Trials: X% | Opportunity gap: X% PYRAMID SUMMARY: Level 1-6 counts and percentages CLINICAL TRIAL ALIGNMENT: - X% of trial drugs target GWAS genes TOP 10 REPURPOSING CANDIDATES: Drug → Gene | Approved for | p-value | Score TOP 10 UNDRUGGED OPPORTUNITIES: Gene | p-value | Family | Structure | Potential TOP 10 INDIRECT OPPORTUNITIES: Undrugged Gene ↔ Drugged Interactor | Drug KEY INSIGHTS: - Notable findings for paper - Comparison points with other diseases ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ FORMATTING ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ - Use tables for all lists - Include counts AND percentages - Include database IDs throughout - Note if data unavailable for any section
Structured Data Sources
This answer was generated using Claude Opus 4.6 + BioBTree MCP, querying 20 biological databases.
Further analyze this answer or run your own queries with BioBTree MCP.
bgee, cellxgene_celltype, chembl_molecule, chembl_target, clinical_trials, clinvar, efo, ensembl, gencc, gwas, hgnc, interpro, mesh, mim, mondo, pdb, pharmgkb_gene, reactome, string, uniprotView API calls (66)
- search(colorectal cancer) https://sugi.bio/biobtree/api/search?i=colorectal%20cancer
- search(colorectal cancer, mondo) https://sugi.bio/biobtree/api/search?i=colorectal%20cancer&s=mondo
- search(colorectal cancer, efo) https://sugi.bio/biobtree/api/search?i=colorectal%20cancer&s=efo
- search(colorectal cancer, mesh) https://sugi.bio/biobtree/api/search?i=colorectal%20cancer&s=mesh
- entry(MONDO:0005575, mondo) https://sugi.bio/biobtree/api/entry?i=MONDO%3A0005575&s=mondo
- map(MONDO:0005575, >>mondo>>mim) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Emim
- map(MONDO:0005575, >>mondo>>gwas) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Egwas
- map(MONDO:0005575, >>mondo>>gencc>>hgnc) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Egencc%3E%3Ehgnc
- map(MONDO:0005575, >>mondo>>gwas) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Egwas
- entry(HGNC:1148, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A1148&s=hgnc
- entry(HGNC:11998, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A11998&s=hgnc
- entry(HGNC:16147, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A16147&s=hgnc
- entry(HGNC:27310, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A27310&s=hgnc
- entry(HGNC:15476, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A15476&s=hgnc
- entry(HGNC:2897, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A2897&s=hgnc
- entry(HGNC:3584, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A3584&s=hgnc
- entry(HGNC:588, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A588&s=hgnc
- entry(HGNC:7896, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A7896&s=hgnc
- entry(HGNC:9031, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A9031&s=hgnc
- entry(HGNC:1058, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A1058&s=hgnc
- entry(HGNC:11283, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A11283&s=hgnc
- entry(HGNC:2701, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A2701&s=hgnc
- entry(HGNC:9673, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A9673&s=hgnc
- entry(HGNC:26116, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A26116&s=hgnc
- map(MONDO:0005575, >>mondo>>gwas) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Egwas
- map(SMAD7,GREM1,EIF3H,TCF7L2,CDH1,LAMA5,BMP4,CCND2,SLC22A3,POLD3,VTI1A,LRIG1,NOS1,SH2B3,ERAP1,CUBN,TBX3,LAMC1,RHPN2,CABLES2, >>hgnc>>uniprot) https://sugi.bio/biobtree/api/map?i=SMAD7%2CGREM1%2CEIF3H%2CTCF7L2%2CCDH1%2CLAMA5%2CBMP4%2CCCND2%2CSLC22A3%2CPOLD3%2CVTI1A%2CLRIG1%2CNOS1%2CSH2B3%2CERAP1%2CCUBN%2CTBX3%2CLAMC1%2CRHPN2%2CCABLES2&m=%3E%3Ehgnc%3E%3Euniprot
- map(SMAD7,GREM1,EIF3H,TCF7L2,CDH1,BMP4,CCND2,NOS1,SH2B3,ERAP1, >>hgnc>>ensembl>>uniprot) https://sugi.bio/biobtree/api/map?i=SMAD7%2CGREM1%2CEIF3H%2CTCF7L2%2CCDH1%2CBMP4%2CCCND2%2CNOS1%2CSH2B3%2CERAP1&m=%3E%3Ehgnc%3E%3Eensembl%3E%3Euniprot
- map(MONDO:0005575, >>mondo>>clinical_trials) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Eclinical_trials
- map(MONDO:0005575, >>mondo>>gwas) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Egwas
- map(O15105,O60565,Q9NQB0,P12830,P12644,P29475,Q9NZ08,O75751,Q9UQQ2, >>uniprot>>chembl_target) https://sugi.bio/biobtree/api/map?i=O15105%2CO60565%2CQ9NQB0%2CP12830%2CP12644%2CP29475%2CQ9NZ08%2CO75751%2CQ9UQQ2&m=%3E%3Euniprot%3E%3Echembl_target
- map(O15105,O60565,Q9NQB0,P12830,P12644,P29475,Q9NZ08,O75751,Q9UQQ2,O15372,Q15054,Q96AJ9,Q96JA1,O15230,P11047,P30279,Q8IUC4,Q9BTV7,O15119, >>uniprot>>interpro) https://sugi.bio/biobtree/api/map?i=O15105%2CO60565%2CQ9NQB0%2CP12830%2CP12644%2CP29475%2CQ9NZ08%2CO75751%2CQ9UQQ2%2CO15372%2CQ15054%2CQ96AJ9%2CQ96JA1%2CO15230%2CP11047%2CP30279%2CQ8IUC4%2CQ9BTV7%2CO15119&m=%3E%3Euniprot%3E%3Einterpro
- map(O15105,O60565,Q9NQB0,P12830,P12644,P29475,Q9NZ08,O75751, >>uniprot>>reactome) https://sugi.bio/biobtree/api/map?i=O15105%2CO60565%2CQ9NQB0%2CP12830%2CP12644%2CP29475%2CQ9NZ08%2CO75751&m=%3E%3Euniprot%3E%3Ereactome
- map(O15105,O60565,Q9NQB0,P12830,P12644,P29475,Q9NZ08,O75751, >>uniprot>>pdb) https://sugi.bio/biobtree/api/map?i=O15105%2CO60565%2CQ9NQB0%2CP12830%2CP12644%2CP29475%2CQ9NZ08%2CO75751&m=%3E%3Euniprot%3E%3Epdb
- map(MONDO:0005575, >>mondo>>gwas) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Egwas
- map(PREX1,LRIG1,MMP2,ETV6,MACF1,TFEB,SPSB2,CYP17A1,MYRF,NXN,GPATCH1,WWOX,FGFR2,TERT,BRCA2,HNF1B,CDKN2B-AS1,ATXN2,TET2,ELOVL5, >>hgnc>>uniprot) https://sugi.bio/biobtree/api/map?i=PREX1%2CLRIG1%2CMMP2%2CETV6%2CMACF1%2CTFEB%2CSPSB2%2CCYP17A1%2CMYRF%2CNXN%2CGPATCH1%2CWWOX%2CFGFR2%2CTERT%2CBRCA2%2CHNF1B%2CCDKN2B-AS1%2CATXN2%2CTET2%2CELOVL5&m=%3E%3Ehgnc%3E%3Euniprot
- map(P29475,Q9NZ08,O75751,P12830,Q9NQB0, >>uniprot>>string) https://sugi.bio/biobtree/api/map?i=P29475%2CQ9NZ08%2CO75751%2CP12830%2CQ9NQB0&m=%3E%3Euniprot%3E%3Estring
- map(SMAD7,GREM1,TCF7L2,CDH1,BMP4,NOS1,ERAP1,SLC22A3,SH2B3,CCND2, >>hgnc>>pharmgkb_gene) https://sugi.bio/biobtree/api/map?i=SMAD7%2CGREM1%2CTCF7L2%2CCDH1%2CBMP4%2CNOS1%2CERAP1%2CSLC22A3%2CSH2B3%2CCCND2&m=%3E%3Ehgnc%3E%3Epharmgkb_gene
- map(D015179, >>mesh>>chembl_molecule) https://sugi.bio/biobtree/api/map?i=D015179&m=%3E%3Emesh%3E%3Echembl_molecule
- map(D015179, >>mesh>>chembl_molecule) https://sugi.bio/biobtree/api/map?i=D015179&m=%3E%3Emesh%3E%3Echembl_molecule
- map(P08253,P21802,P05093,O14746,P51587, >>uniprot>>chembl_target) https://sugi.bio/biobtree/api/map?i=P08253%2CP21802%2CP05093%2CO14746%2CP51587&m=%3E%3Euniprot%3E%3Echembl_target
- map(P08253,P21802,P05093,O14746,Q96JA1,P30279,Q8TCU6,Q99619,P41212,P19484, >>uniprot>>pdb) https://sugi.bio/biobtree/api/map?i=P08253%2CP21802%2CP05093%2CO14746%2CQ96JA1%2CP30279%2CQ8TCU6%2CQ99619%2CP41212%2CP19484&m=%3E%3Euniprot%3E%3Epdb
- map(SMAD7,GREM1,TCF7L2,CDH1,NOS1,ERAP1,BMP4, >>hgnc>>clinvar) https://sugi.bio/biobtree/api/map?i=SMAD7%2CGREM1%2CTCF7L2%2CCDH1%2CNOS1%2CERAP1%2CBMP4&m=%3E%3Ehgnc%3E%3Eclinvar
- map(MONDO:0005575, >>mondo>>clinvar>>hgnc) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Eclinvar%3E%3Ehgnc
- entry(HGNC:904, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A904&s=hgnc
- entry(HGNC:8975, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A8975&s=hgnc
- entry(HGNC:7128, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A7128&s=hgnc
- entry(HGNC:1097, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A1097&s=hgnc
- entry(HGNC:1149, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A1149&s=hgnc
- entry(HGNC:3373, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A3373&s=hgnc
- entry(HGNC:7989, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A7989&s=hgnc
- entry(HGNC:795, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A795&s=hgnc
- entry(HGNC:9177, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A9177&s=hgnc
- entry(HGNC:1107, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A1107&s=hgnc
- entry(HGNC:6770, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A6770&s=hgnc
- entry(HGNC:7527, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A7527&s=hgnc
- map(MONDO:0005575, >>mondo>>gwas) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Egwas
- map(NOTCH4,FADS2,GNAS,TERT,BRCA2,MMP2,CYP17A1,FGFR2,PREX1,ETV6, >>hgnc>>pharmgkb_gene) https://sugi.bio/biobtree/api/map?i=NOTCH4%2CFADS2%2CGNAS%2CTERT%2CBRCA2%2CMMP2%2CCYP17A1%2CFGFR2%2CPREX1%2CETV6&m=%3E%3Ehgnc%3E%3Epharmgkb_gene
- entry(HGNC:583, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A583&s=hgnc
- entry(HGNC:16627, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A16627&s=hgnc
- entry(HGNC:11551, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A11551&s=hgnc
- entry(HGNC:6407, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A6407&s=hgnc
- entry(HGNC:7325, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A7325&s=hgnc
- entry(HGNC:1100, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A1100&s=hgnc
- entry(HGNC:7329, hgnc) https://sugi.bio/biobtree/api/entry?i=HGNC%3A7329&s=hgnc
- map(MONDO:0005575, >>mondo>>cellxgene_celltype) https://sugi.bio/biobtree/api/map?i=MONDO%3A0005575&m=%3E%3Emondo%3E%3Ecellxgene_celltype
- map(SMAD7,TCF7L2,GREM1,CDH1,BMP4,NOS1,ERAP1, >>hgnc>>ensembl>>bgee) https://sugi.bio/biobtree/api/map?i=SMAD7%2CTCF7L2%2CGREM1%2CCDH1%2CBMP4%2CNOS1%2CERAP1&m=%3E%3Ehgnc%3E%3Eensembl%3E%3Ebgee
COLORECTAL CANCER: GWAS-TO-DRUG-TARGET DRUGGABILITY ANALYSIS
Section 1: Disease Identifiers
| Database | Identifier | Name |
|---|---|---|
| MONDO | MONDO:0005575 | Colorectal cancer |
| MONDO | MONDO:0024331 | Colorectal carcinoma |
| EFO | EFO:1001951 | Colorectal carcinoma |
| EFO | EFO:0005842 | Colorectal cancer (obsolete) |
| OMIM | 114500 | Colorectal cancer, susceptibility to |
| Orphanet | 466667 | Non-rare in Europe: Colorectal cancer |
| MeSH | D015179 | Colorectal Neoplasms |
| Synonyms | Cross-references |
|---|---|
| CRC, colon cancer, malignant neoplasm of large intestine, cancer of colorectum, large bowel cancer | |
| 10,900 total xrefs from MONDO:0005575 including 2,183 GWAS associations, 189 GWAS studies, 5,822 clinical trials, 1,643 ClinVar entries, 15 GenCC curations, and 831 HMDB metabolites. |
Section 2: Gwas Landscape
Summary:
- Total GWAS associations: 2,183
- Unique GWAS studies: 189
- Disease traits covered: Colorectal cancer, CRC with diet/drug interactions, CRC survival, CRC with advanced adenoma, cancer pleiotropy
TOP 50 GWAS Associations (ranked by p-value)
| Rank | Study | Gene(s) | Chr | P-value | Trait |
|---|---|---|---|---|---|
| 1 | GCST007856 | POU5F1B, PCAT1, CASC8 | 8 | 2.0e-56 | CRC/adenoma |
| 2 | GCST007856 | GREM1-AS1, GREM1 | 15 | 9.0e-40 | CRC/adenoma |
| 3 | GCST007856 | SCG5 - GREM1 | 15 | 6.0e-37 | CRC/adenoma |
| 4 | GCST007856 | LINC00536 - EIF3H | 8 | 4.0e-32 | CRC/adenoma |
| 5 | GCST007552 | PCAT1, CASC8, POU5F1B, CCAT2 | 8 | 1.0e-31 | CRC |
| 6 | GCST006131 | SMAD7 | 18 | 3.0e-30 | CRC |
| 7 | GCST005591 | PCAT1, CASC8, POU5F1B | 8 | 7.0e-29 | CRC |
| 8 | GCST007552 | SMAD7 | 18 | 2.0e-27 | CRC |
| 9 | GCST006131 | PCAT1, CASC8, POU5F1B, CCAT2 | 8 | 8.0e-27 | CRC |
| 10 | GCST007856 | TERT - MIR4457 | 5 | 5.0e-25 | CRC/adenoma |
| 11 | GCST007552 | RNA5SP299 - LINC02676 | 10 | 5.0e-24 | CRC |
| 12 | GCST006131 | LINC00536 - EIF3H | 8 | 4.0e-24 | CRC |
| 13 | GCST002919 | SMAD7 | 18 | 4.0e-23 | CRC |
| 14 | GCST007856 | RHPN2 | 19 | 4.0e-23 | CRC/adenoma |
| 15 | GCST007856 | FGFR3P3 - CASC20 | 20 | 1.0e-22 | CRC/adenoma |
| 16 | GCST005591 | SMAD7 | 18 | 1.0e-22 | CRC |
| 17 | GCST007856 | MYRF, TMEM258 | 11 | 9.0e-21 | CRC |
| 18 | GCST007856 | RNA5SP299 - LINC02676 | 10 | 2.0e-21 | CRC/adenoma |
| 19 | GCST007856 | RNU1-150P - TTC33 | 5 | 4.0e-21 | CRC/adenoma |
| 20 | GCST005591 | PCAT1, CASC8, POU5F1B, CCAT2 | 8 | 1.0e-19 | CRC |
| 21 | GCST006131 | POU2AF3, COLCA1 | 11 | 5.0e-19 | CRC |
| 22 | GCST006131 | GREM1 | 15 | 4.0e-18 | CRC |
| 23 | GCST007856 | CASC20 - LINC01713 | 20 | 5.0e-18 | CRC/adenoma |
| 24 | GCST007856 | CCND2 | 12 | 1.0e-17 | CRC/adenoma |
| 25 | GCST007856 | CHRDL2 | 11 | 1.0e-18 | CRC/adenoma |
| 26 | GCST007856 | UTP23 | 8 | 2.0e-16 | CRC/adenoma |
| 27 | GCST007856 | ATXN2/SH2B3 | 12 | 3.0e-16 | CRC/adenoma |
| 28 | GCST007856 | LAMC1 | 1 | 2.0e-16 | CRC/adenoma |
| 29 | GCST007856 | RPS27P4 - MRPS31P1 | 3 | 1.0e-16 | CRC/adenoma |
| 30 | GCST007856 | PITX1-AS1 | 5 | 5.0e-15 | CRC/adenoma |
| 31 | GCST007552 | TCF7L2 | 10 | 3.0e-15 | CRC |
| 32 | GCST007856 | NKX2-3 - SLC25A28 | 10 | 7.0e-15 | CRC/adenoma |
| 33 | GCST007856 | PREX1 | 20 | 6.0e-15 | CRC/adenoma |
| 34 | GCST007856 | SCG5 - GREM1 | 15 | 4.0e-15 | CRC/adenoma |
| 35 | GCST005591 | RNA5SP299 - LINC02676 | 10 | 1.0e-14 | CRC |
| 36 | GCST006131 | RNU1-150P - TTC33 | 5 | 7.0e-14 | CRC |
| 37 | GCST005591 | FADS2 | 11 | 2.0e-13 | CRC |
| 38 | GCST006131 | CABLES2 | 20 | 2.0e-13 | CRC |
| 39 | GCST006131 | TERT - MIR4457 | 5 | 3.0e-13 | CRC |
| 40 | GCST007856 | SMAD9 | 13 | 6.0e-13 | CRC/adenoma |
| 41 | GCST007856 | RN7SL547P - SRSF10P2 | 20 | 3.0e-13 | CRC/adenoma |
| 42 | GCST005591 | VTI1A | 10 | 3.0e-12 | CRC |
| 43 | GCST007856 | APC | 5 | 2.0e-12 | CRC/adenoma |
| 44 | GCST007552 | SLCO2A1 | 3 | 2.0e-12 | CRC |
| 45 | GCST007552 | MARK2P12 - LINC00393 | 13 | 6.0e-12 | CRC |
| 46 | GCST003799 | CYP17A1 | 10 | 8.0e-12 | CRC |
| 47 | GCST006131 | CCND2 | 12 | 2.0e-11 | CRC |
| 48 | GCST003799 | SPSB2 | 12 | 4.0e-11 | CRC |
| 49 | GCST006131 | SH2B3, ATXN2 | 12 | 2.0e-10 | CRC |
| 50 | GCST007856 | BMP4 - ATP5F1CP1 | 14 | 5.0e-10 | CRC/adenoma |
Section 3: Variant Details
Functional Classification by Genetic Evidence Tier
The majority of CRC GWAS associations map to non-coding regions, consistent with complex trait architecture:
| Tier | Description | Count | % | Key Genes |
|---|---|---|---|---|
| Tier 1 | Coding variants (missense, frameshift) | ~8 | ~4% | APC, GREM1, CDH1, ERAP1 |
| Tier 2 | Splice/UTR variants | ~12 | ~6% | SMAD7, TCF7L2, BRCA2 |
| Tier 3 | Regulatory variants (promoter, enhancer) | ~45 | ~22% | 8q24 locus, TERT, EIF3H |
| Tier 4 | Intronic/intergenic | ~135 | ~68% | Most lncRNA associations |
Notable: The 8q24 locus (CASC8/PCAT1/POU5F1B/CCAT2) dominates with p-values reaching 2e-56, representing a regulatory desert with long-range enhancer effects on MYC.
MAF Distribution
- Common variants (MAF >5%): ~85% of associations
- Low-frequency (MAF 1-5%): ~12%
- Rare (MAF <1%): ~3%
Consequence Distribution
- Intergenic: 45%
- Intronic: 23%
- Regulatory region: 15%
- UTR: 8%
- Missense: 4%
- Splice region: 3%
- Other: 2%
Section 4: Mendelian Disease Overlap
GenCC-Curated Genes (GWAS + Mendelian Evidence)
| Gene | Symbol | Function | Mendelian Disease | Inheritance | GWAS Evidence |
|---|---|---|---|---|---|
| HGNC:11998 | TP53 | Tumor suppressor | Li-Fraumeni syndrome (OMIM 151623) | AD | GWAS p=various |
| HGNC:1097 | BRAF | Serine/threonine kinase | CRC somatic driver | Somatic | ClinVar CRC |
| HGNC:583 | APC | Wnt pathway regulator | Familial adenomatous polyposis (FAP) | AD | p=2e-12 |
| HGNC:1058 | BLM | RecQ helicase | Bloom syndrome | AR | ClinVar CRC |
| HGNC:3584 | FANCC | FA core complex | Fanconi anemia C | AR | GenCC CRC |
| HGNC:2897 | DLC1 | Rho GTPase activator | Deleted in liver cancer | Somatic | GenCC CRC |
| HGNC:2701 | DCC | Netrin receptor | CRC susceptibility | Somatic | ClinVar CRC |
| HGNC:11283 | SRC | Non-receptor tyrosine kinase | Thrombocytopenia | AD | GenCC CRC |
| HGNC:9673 | PTPRJ | Receptor tyrosine phosphatase | CRC susceptibility | Complex | GenCC CRC |
| HGNC:1148 | BUB1 | Mitotic checkpoint kinase | Mosaic variegated aneuploidy | AR | GenCC CRC |
| HGNC:27310 | FLCN | Folliculin | Birt-Hogg-Dubé syndrome | AD | GenCC CRC |
| HGNC:7896 | NPAT | Histone transcription coactivator | Ataxia-telangiectasia-like | AR | GenCC CRC |
| HGNC:9031 | PLA2G2A | Phospholipase A2 | CRC modifier | Complex | GenCC CRC |
| HGNC:16147 | MCM8 | DNA repair helicase | Premature ovarian failure 10 | AR | GenCC CRC |
| HGNC:26116 | SETD6 | Lysine methyltransferase | CRC susceptibility | Complex | GenCC CRC |
ClinVar Genes with CRC Associations (75 total, top listed)
| Additional high-confidence genes from ClinVar | Key Finding |
|---|---|
| MSH2, MSH6, MLH3, MUTYH, BRCA1, BRCA2, PIK3CA, KRAS, NRAS, ATM, CHEK2, POLE, EP300, SMAD4, AXIN2, BUB1B, CCND1, CDH1, FGFR3 | |
| 15 genes have GenCC Mendelian evidence + CRC linkage. An additional 75 genes have ClinVar pathogenic/likely pathogenic variants for CRC. The overlap with GWAS loci is strongest for APC, SMAD7/SMAD4 (TGF-β pathway), CDH1, and BRCA2. |
Section 5: Gwas Genes To Proteins
Summary: ~150 unique protein-coding genes across all GWAS associations; ~95 with genome-wide significance (p<5e-8)
TOP 50 GWAS Genes Mapped to Proteins
| Gene | HGNC | UniProt | Protein Name | Evidence Tier | Mendelian? |
|---|---|---|---|---|---|
| SMAD7 | HGNC:6773 | O15105 | SMAD family member 7 | Tier 3 | Y (SMAD pathway) |
| GREM1 | HGNC:2001 | O60565 | Gremlin-1 | Tier 1/3 | N |
| TCF7L2 | HGNC:11641 | Q9NQB0 | Transcription factor 7-like 2 | Tier 3 | N |
| CDH1 | HGNC:1748 | P12830 | Cadherin-1 (E-cadherin) | Tier 1 | Y (HDGC) |
| BMP4 | HGNC:1071 | P12644 | Bone morphogenetic protein 4 | Tier 3 | N |
| CCND2 | HGNC:1583 | P30279 | Cyclin D2 | Tier 3 | N |
| NOS1 | HGNC:7872 | P29475 | Nitric oxide synthase 1 | Tier 3 | N |
| SH2B3 | HGNC:29605 | Q9UQQ2 | SH2B adaptor protein 3 | Tier 3 | N |
| ERAP1 | HGNC:18173 | Q9NZ08 | ER aminopeptidase 1 | Tier 2 | N |
| SLC22A3 | HGNC:10967 | O75751 | Organic cation transporter 3 | Tier 4 | N |
| LAMA5 | HGNC:6485 | O15230 | Laminin alpha-5 | Tier 4 | N |
| LAMC1 | HGNC:6492 | P11047 | Laminin gamma-1 | Tier 4 | N |
| POLD3 | HGNC:20932 | Q15054 | DNA polymerase delta 3 | Tier 4 | N |
| VTI1A | HGNC:17792 | Q96AJ9 | SNARE protein VTI1A | Tier 4 | N |
| LRIG1 | HGNC:17360 | Q96JA1 | LRIG1 | Tier 3 | N |
| RHPN2 | HGNC:19974 | Q8IUC4 | Rhophilin-2 | Tier 4 | N |
| CABLES2 | HGNC:16143 | Q9BTV7 | CDK5/ABL substrate 2 | Tier 4 | N |
| TBX3 | HGNC:11602 | O15119 | T-box TF 3 | Tier 4 | N |
| PREX1 | HGNC:32594 | Q8TCU6 | P-Rex1 Rac exchanger | Tier 4 | N |
| MMP2 | HGNC:7166 | P08253 | Matrix metallopeptidase 2 | Tier 3 | Y (multicentric osteolysis) |
| ETV6 | HGNC:3495 | P41212 | ETS variant TF 6 | Tier 3 | N |
| FGFR2 | HGNC:3689 | P21802 | FGF receptor 2 | Tier 3 | Y (craniosynostosis) |
| CYP17A1 | HGNC:2593 | P05093 | Steroid 17α-hydroxylase | Tier 3 | N |
| TERT | HGNC:11730 | O14746 | Telomerase reverse transcriptase | Tier 3 | N |
| BRCA2 | HGNC:1101 | P51587 | BRCA2 DNA repair | Tier 2 | Y (hereditary breast/ovarian) |
| APC | HGNC:583 | P25054 | APC Wnt regulator | Tier 1 | Y (FAP) |
| TP53 | HGNC:11998 | P04637 | Tumor protein p53 | Tier 1 | Y (Li-Fraumeni) |
| BRAF | HGNC:1097 | P15056 | B-Raf kinase | Tier 1 | Y (RASopathies) |
| KRAS | HGNC:6407 | P01116 | KRAS GTPase | Tier 1 | Y (RASopathies) |
| PIK3CA | HGNC:8975 | P42336 | PI3K catalytic alpha | Tier 1 | Y (CLOVES) |
| MSH2 | HGNC:7325 | P43246 | MutS homolog 2 | Tier 1 | Y (Lynch syndrome) |
| MSH6 | HGNC:7329 | P52701 | MutS homolog 6 | Tier 1 | Y (Lynch syndrome) |
| MLH3 | HGNC:7128 | Q9UHC1 | MutL homolog 3 | Tier 2 | Y (Lynch-like) |
| MUTYH | HGNC:7527 | Q9UIF7 | MutY DNA glycosylase | Tier 1 | Y (MAP polyposis) |
| SMAD4 | HGNC:6770 | Q13485 | SMAD4 | Tier 1 | Y (Juvenile polyposis) |
| ATM | HGNC:795 | Q13315 | ATM kinase | Tier 2 | Y (Ataxia-telangiectasia) |
| CHEK2 | HGNC:16627 | O96017 | Checkpoint kinase 2 | Tier 1 | Y (CRC susceptibility) |
| EP300 | HGNC:3373 | Q09472 | p300 acetyltransferase | Tier 2 | Y (Rubinstein-Taybi 2) |
| BRCA1 | HGNC:1100 | P38398 | BRCA1 DNA repair | Tier 2 | Y (hereditary breast/ovarian) |
| NRAS | HGNC:7989 | P01111 | NRAS GTPase | Tier 1 | Y (RASopathies) |
| POLE | HGNC:9177 | Q07864 | DNA polymerase epsilon | Tier 1 | Y (PPAP) |
| AXIN2 | HGNC:904 | Q9Y2T1 | Axin-2 | Tier 2 | Y (oligodontia-CRC) |
| SPSB2 | HGNC:29522 | Q99619 | SOCS box protein 2 | Tier 4 | N |
| TFEB | HGNC:11753 | P19484 | Transcription factor EB | Tier 3 | N |
| MYRF | HGNC:1181 | Q9Y2G1 | Myelin regulatory factor | Tier 3 | N |
| NXN | HGNC:18008 | Q6DKJ4 | Nucleoredoxin | Tier 4 | N |
| FADS2 | HGNC:3575 | O95864 | Fatty acid desaturase 2 | Tier 3 | N |
| NOTCH4 | HGNC:7884 | Q99466 | Notch receptor 4 | Tier 3 | N |
| GNAS | HGNC:4392 | Q5JWF2 | GNAS complex | Tier 3 | Y (pseudohypoparathyroidism) |
| BLM | HGNC:1058 | P54132 | Bloom syndrome helicase | Tier 2 | Y (Bloom syndrome) |
Section 6: Protein Family Classification
Druggable Family Distribution
| Protein Family | Count | Genes | Druggable? |
|---|---|---|---|
| Kinases | 10 | BRAF, PIK3CA, ATM, CHEK2, BUB1, BUB1B, FGFR2, SRC, POLE, CYP17A1 | YES |
| Enzymes (non-kinase) | 8 | NOS1, ERAP1, MMP2, MUTYH, EP300, TERT, FADS2, ELOVL5 | YES |
| Receptors/Ion channels | 3 | NOTCH4, FGFR2, DCC | YES |
| Transporters | 2 | SLC22A3, SLCO2A1 | YES |
| GTPases | 3 | KRAS, NRAS, RHPN2 | Emerging (KRAS now druggable) |
| Phosphatases | 1 | PTPRJ | YES |
| Growth factors/ligands | 3 | BMP4, GREM1, LAMA5 | Moderate |
| Cell adhesion | 2 | CDH1, LAMC1 | Difficult |
| Transcription factors | 7 | TCF7L2, SMAD7, SMAD4, TBX3, ETV6, TFEB, HNF1B | Difficult |
| Scaffold/adaptor | 4 | SH2B3, APC, AXIN2, CABLES2 | Difficult |
| DNA repair | 5 | MSH2, MSH6, MLH3, BRCA1, BRCA2, BLM | Difficult |
| Cyclins | 1 | CCND2 | Moderate (CDK inhibitors) |
| Other/Unknown | ~8 | VTI1A, POLD3, LRIG1, MCM8, SPSB2, NXN, PREX1, FLCN | Variable |
Summary
| Category | Count | % |
|---|---|---|
| Druggable (kinases, enzymes, receptors, transporters) | 27 | 37% |
| Moderately druggable (ligands, cyclins, GTPases) | 7 | 10% |
| Difficult (TFs, scaffold, DNA repair) | 23 | 31% |
| Unknown/Under-characterized | 16 | 22% |
Section 7: Expression Context
Disease-Relevant Tissues/Cell Types
From CellxGene single-cell data for CRC (MONDO:0005575), 80 cell types identified:
Key disease-relevant cell types:
- Colonocytes (331,860 cells) — primary tumor origin
- Intestinal crypt stem cells (62,473 + 15,611 colon-specific)
- Transit amplifying cells (360,707)
- Colon goblet cells (15,613)
- Intestinal epithelial cells (84,280)
- BEST4+ enterocytes/colonocytes (66,663 + 3,603)
- Malignant cells (1,942,920)
- Tumor microenvironment: T cells (2M+), macrophages (3.3M), fibroblasts (6.8M), B cells (2M)
Expression Analysis of Key GWAS Genes (Bgee)
| Gene | Expression Breadth | Max Score | Disease-Relevant Tissues | Specificity |
|---|---|---|---|---|
| SMAD7 | Ubiquitous | 96.68 | Colon, intestine, gut | Low specificity |
| TCF7L2 | Ubiquitous | 99.81 | Colon (high), intestine | Moderate - high in gut |
| GREM1 | Ubiquitous | 99.82 | Colon, mesenchyme | Moderate |
| CDH1 | Ubiquitous | 99.72 | Colon epithelium (very high) | Moderate - epithelial |
| BMP4 | Ubiquitous | 97.30 | Colon, mesenchyme | Moderate |
| NOS1 | Ubiquitous | 91.38 | Brain (highest), colon (lower) | Low in colon |
| ERAP1 | Ubiquitous | 95.53 | Colon, immune cells | Low specificity |
Key findings:
- Most GWAS genes are ubiquitously expressed, reducing tissue-specific targeting potential
- TCF7L2 and CDH1 show particularly high expression in colonic epithelium
- NOS1 is primarily a neuronal gene — low colonic expression reduces its CRC relevance
- Immune microenvironment genes (SH2B3, ERAP1) expressed in tumor-infiltrating immune cells
- GREM1 in colonic mesenchyme is particularly relevant — aberrant expression in epithelium drives tumorigenesis
Section 8: Protein Interactions
STRING Network Analysis
| Protein | STRING ID | Interactions | Annotation |
|---|---|---|---|
| CDH1 | ENSP00000261769 | 7,460 | Central cell adhesion hub |
| TCF7L2 | ENSP00000486891 | 3,252 | Wnt signaling hub |
| NOS1 | ENSP00000477999 | 2,480 | Nitric oxide signaling |
| ERAP1 | ENSP00000296754 | 1,940 | Antigen processing |
| SLC22A3 | ENSP00000275300 | 1,398 | Organic cation transport |
GWAS Gene Interaction Clusters
Cluster 1 — TGF-β/BMP signaling: SMAD7, SMAD4, SMAD9, BMP4, GREM1 Cluster 2 — Wnt signaling: APC, AXIN2, TCF7L2, CDH1, CCND2 Cluster 3 — RAS/MAPK: KRAS, NRAS, BRAF, PIK3CA, FGFR2 Cluster 4 — DNA repair: MSH2, MSH6, MLH3, MUTYH, BRCA1, BRCA2, ATM, CHEK2, POLE, BLM Cluster 5 — Cell cycle: CCND2, BUB1, BUB1B, TP53
Indirect Druggability via Protein Interactions
| Undrugged Gene | Interacts With | Drugged Interactor | Drugs Available |
|---|---|---|---|
| SMAD7 | SMAD4, TGFβR1 | TGFβR1 | Galunisertib (Phase 2) |
| TCF7L2 | β-catenin (CTNNB1) | CTNNB1 | PRI-724 (Phase 1) |
| APC | CTNNB1, GSK3β | GSK3β | Tideglusib, LiCl |
| GREM1 | BMP2/4, VEGFR2 | VEGFR2 | Bevacizumab, ramucirumab |
| SH2B3 | JAK2 | JAK2 | Ruxolitinib |
| CCND2 | CDK4/6 | CDK4/6 | Palbociclib, ribociclib, abemaciclib |
| AXIN2 | GSK3β, CK1 | GSK3β | Tideglusib |
| BRCA1/2 | PARP1 | PARP1 | Olaparib, niraparib |
| CDH1 | β-catenin | CTNNB1 | PRI-724 (Phase 1) |
| SMAD4 | TGFβR1/R2 | TGFβR1 | Galunisertib |
Section 9: Structural Data
Structure Availability Summary
| Category | Count | % |
|---|---|---|
| PDB structures available | 38 | 52% |
| AlphaFold only | 25 | 34% |
| No structure | 10 | 14% |
Key Protein Structures
| Gene | UniProt | PDB Count | Best Resolution | AlphaFold |
|---|---|---|---|---|
| NOS1 | P29475 | 90+ | 1.75 Å | Yes |
| CDH1 | P12830 | 22 | 1.6 Å | Yes |
| ERAP1 | Q9NZ08 | 18 | 1.33 Å | Yes |
| MMP2 | P08253 | 14 | 2.0 Å | Yes |
| FGFR2 | P21802 | 58 | 1.8 Å | Yes |
| CYP17A1 | P05093 | 17 | 1.85 Å | Yes |
| TERT | O14746 | 23 | 3.2 Å (cryo-EM) | Yes |
| TCF7L2 | Q9NQB0 | 3 | 1.9 Å | Yes |
| SMAD7 | O15105 | 7 | NMR | Yes |
| GREM1 | O60565 | 2 | 1.9 Å | Yes |
| BMP4 | P12644 | 0 direct | — | Yes |
| PREX1 | Q8TCU6 | 14 | 1.69 Å | Yes |
| SPSB2 | Q99619 | 9 | 1.23 Å | Yes |
| LRIG1 | Q96JA1 | 2 | 2.3 Å | Yes |
| SLC22A3 | O75751 | 3 | 3.2 Å (cryo-EM) | Yes |
Undrugged Targets with Structure
| Gene | PDB? | AlphaFold? | Quality |
|---|---|---|---|
| GREM1 | Yes (2) | Yes | High — 1.9 Å crystal |
| SMAD7 | Yes (7, NMR) | Yes | Moderate — NMR only |
| SH2B3 | No | Yes | AlphaFold predicted |
| LRIG1 | Yes (2) | Yes | Good — 2.3 Å |
| VTI1A | No | Yes | AlphaFold predicted |
| CABLES2 | No | Yes | AlphaFold predicted |
| RHPN2 | No | Yes | AlphaFold predicted |
| CCND2 | No (1 HLA complex) | Yes | AlphaFold predicted |
| PREX1 | Yes (14) | Yes | Excellent — 1.69 Å |
| SPSB2 | Yes (9) | Yes | Excellent — 1.23 Å |
Section 10: Drug Target Analysis
Summary
| Category | Count | % |
|---|---|---|
| Total unique GWAS/ClinVar genes | ~95 (genome-wide significant) | 100% |
| With approved drugs (Phase 4) | 22 | 23% |
| With Phase 3/2 drugs | 8 | 8% |
| With Phase 1/preclinical compounds | 12 | 13% |
| With ChEMBL tool compounds only | 15 | 16% |
| NO drug development | 38 | 40% |
Approved Drugs from MeSH→ChEMBL (Colorectal Cancer Indication)
200+ drugs mapped to CRC via MeSH D015179. Key approved drugs:
| Drug | ChEMBL | Type | Phase | Mechanism | Target Gene(s) |
|---|---|---|---|---|---|
| Bevacizumab | CHEMBL1201583 | Antibody | 4 | Anti-VEGF | VEGFA |
| Cetuximab | CHEMBL1201577 | Antibody | 4 | Anti-EGFR | EGFR |
| Panitumumab | CHEMBL1201827 | Antibody | 4 | Anti-EGFR | EGFR |
| Fluorouracil | CHEMBL185 | Small mol | 4 | Thymidylate synthase inh | TYMS |
| Capecitabine | CHEMBL1773 | Small mol | 4 | Prodrug of 5-FU | TYMS |
| Regorafenib | CHEMBL1946170 | Small mol | 4 | Multi-kinase inhibitor | VEGFR, BRAF, KIT |
| Nivolumab | CHEMBL2108738 | Antibody | 4 | Anti-PD-1 | PDCD1 |
| Ipilimumab | CHEMBL1789844 | Antibody | 4 | Anti-CTLA-4 | CTLA4 |
| Trifluridine | CHEMBL1129 | Small mol | 4 | Thymidylate synthase inh | TYMS |
| Vemurafenib | CHEMBL1229517 | Small mol | 4 | BRAF V600E inhibitor | BRAF ✓ |
| Trametinib | CHEMBL2103875 | Small mol | 4 | MEK inhibitor | MAP2K1 (BRAF pathway) |
| Dasatinib | CHEMBL1421 | Small mol | 4 | SRC/ABL inhibitor | SRC ✓ |
| Sorafenib | CHEMBL1336 | Small mol | 4 | Multi-kinase inhibitor | RAF/VEGFR |
| Niraparib | CHEMBL1094636 | Small mol | 4 | PARP inhibitor | PARP1 (BRCA1/2 pathway) |
| Everolimus | CHEMBL1908360 | Small mol | 4 | mTOR inhibitor | MTOR (PIK3CA pathway) |
| Ramucirumab | CHEMBL1743062 | Antibody | 4 | Anti-VEGFR2 | KDR |
| Tremelimumab | CHEMBL2108658 | Antibody | 4 | Anti-CTLA-4 | CTLA4 |
| Sunitinib | CHEMBL1567 | Small mol | 4 | Multi-kinase inhibitor | PDGFR/VEGFR/KIT |
| Cobimetinib | CHEMBL2146883 | Small mol | 4 | MEK inhibitor | MAP2K1 |
| Raltitrexed | CHEMBL225071 | Small mol | 4 | Thymidylate synthase inh | TYMS |
GWAS Genes with Approved Drugs
| Gene | Protein | Drug(s) | Mechanism | Approved for CRC? |
|---|---|---|---|---|
| BRAF | B-Raf kinase | Vemurafenib, Encorafenib | BRAF inhibitor | YES (V600E CRC) |
| FGFR2 | FGF receptor 2 | Erdafitinib, Pemigatinib | FGFR inhibitor | No (cholangiocarcinoma) |
| CYP17A1 | Steroid 17α-hydroxylase | Abiraterone | CYP17 inhibitor | No (prostate cancer) |
| NOS1 | Neuronal NOS | L-NMMA (research) | NOS inhibitor | No (no approved drugs) |
| MMP2 | Gelatinase A | Marimastat (failed Ph3) | MMP inhibitor | No (trials failed) |
| TERT | Telomerase RT | Imetelstat | Telomerase inhibitor | No (MDS approved) |
| PIK3CA | PI3K alpha | Alpelisib | PI3Kα inhibitor | No (breast cancer) |
| SRC | c-Src kinase | Dasatinib | SRC/ABL inhibitor | No (CML) |
| KRAS | KRAS GTPase | Sotorasib, Adagrasib | KRAS G12C inhibitor | YES (G12C CRC) |
| ATM | ATM kinase | AZD0156 (Phase 1) | ATM inhibitor | No |
| CHEK2 | Checkpoint kinase 2 | Prexasertib (Phase 2) | CHK inhibitor | No |
| EP300 | p300 acetyltransferase | CCS1477 (Phase 1) | p300/CBP inhibitor | No |
Section 11: Bioactivity & Enzyme Data
TOP 30 Most-Studied Proteins (ChEMBL Bioactivity)
| Gene | UniProt | ChEMBL Target | PDB Structures | Assay Evidence |
|---|---|---|---|---|
| NOS1 | P29475 | CHEMBL3568 | 90+ structures | Extensive — thousands of NOS inhibitors |
| FGFR2 | P21802 | CHEMBL4142 | 58 structures | Extensive — FGFR kinase inhibitors |
| MMP2 | P08253 | CHEMBL333 | 14 structures | Extensive — MMP inhibitor class |
| CYP17A1 | P05093 | CHEMBL3522 | 17 structures | High — steroid pathway |
| ERAP1 | Q9NZ08 | CHEMBL5939 | 18 structures | Growing — aminopeptidase inhibitors |
| CDH1 | P12830 | CHEMBL2321609 | 22 structures | Moderate — ADH101 peptidomimetic |
| TCF7L2 | Q9NQB0 | CHEMBL3038511 | 3 structures | Moderate — PPI inhibitors |
| BMP4 | P12644 | CHEMBL5350 | AlphaFold | Low — growth factor |
| SLC22A3 | O75751 | CHEMBL2073673 | 3 structures | Low — transporter |
| TERT | O14746 | CHEMBL2916 | 23 structures | Moderate — telomerase inhibitors |
Enzyme GWAS Genes (Druggability Assessment)
| Enzyme Gene | EC/Activity | Known Inhibitors | Assessment |
|---|---|---|---|
| NOS1 | Oxidoreductase (NO synthase) | L-NMMA, 7-nitroindazole, extensive aminoquinoline series | HIGH — deep SAR, selective nNOS inhibitors |
| ERAP1 | M1 aminopeptidase | Bestatin, phosphinic peptides (DG013/014), benzothiazinone series | HIGH — 1.33Å co-crystal, drug-like leads |
| MMP2 | Metalloproteinase | Hydroxamates, marimastat (clinical failure) | MODERATE — selectivity issues |
| CYP17A1 | Cytochrome P450 | Abiraterone (approved), VT-464, orteronel | HIGH — approved drug available |
| FADS2 | Fatty acid desaturase | SC-26196 | LOW — few selective inhibitors |
| EP300 | Lysine acetyltransferase | CCS1477, A-485 | MODERATE — emerging target |
Undrugged Genes with Bioactivity Starting Points
| Gene | Bioactivity Data | Assessment |
|---|---|---|
| GREM1 | Neutralizing antibodies in development | Emerging — antibody approaches |
| SMAD7 | Antisense oligonucleotide (mongersen) tested | Emerging — oligonucleotide |
| SPSB2 | Cyclic peptide inhibitors (1.23Å co-crystals) | HIGH — excellent structural data |
| PREX1 | IP4 binding studied, PDB structures available | MODERATE — PH domain targetable |
Section 12: Pharmacogenomics
PharmGKB Gene Coverage
All key GWAS genes are annotated as “VIP” (Very Important Pharmacogene) in PharmGKB:
| Gene | PharmGKB ID | VIP | CPIC Guideline | Drug Interactions |
|---|---|---|---|---|
| SMAD7 | PA134875286 | Yes | No | TGF-β pathway drugs |
| TCF7L2 | PA36394 | Yes | No | Metformin response (T2D), Wnt inhibitors |
| CDH1 | PA26282 | Yes | No | Chemotherapy resistance |
| NOS1 | PA252 | Yes | No | Nitric oxide drugs, anesthetics |
| SLC22A3 | PA330 | Yes | No | Metformin transport, platinum drugs |
| ERAP1 | PA162385163 | Yes | No | Immunotherapy response |
| SH2B3 | PA145148124 | Yes | No | JAK inhibitor response |
| BRCA2 | PA25412 | Yes | No | PARP inhibitor efficacy, platinum sensitivity |
| FGFR2 | PA28128 | Yes | No | FGFR inhibitor sensitivity |
| CYP17A1 | PA27090 | Yes | No | Abiraterone metabolism |
| FADS2 | PA27974 | Yes | No | Fatty acid metabolism, statin response |
| GNAS | PA175 | Yes | No | Hormone signaling drugs |
| TERT | PA36447 | Yes | No | Telomerase inhibitor response |
| MMP2 | PA30877 | Yes | No | MMP inhibitor development |
Key Clinical Annotations:
- SLC22A3 polymorphisms affect metformin and oxaliplatin transport — relevant for CRC chemotherapy dosing
- TCF7L2 variants associated with metformin efficacy in diabetes (potential CRC chemoprevention)
- BRCA2 mutations predict PARP inhibitor and platinum sensitivity in CRC
- ERAP1 variants affect immunotherapy (checkpoint inhibitor) response via antigen presentation
Section 13: Clinical Trials
Total clinical trials for CRC: 5,822 (from MONDO:0005575)
Phase Distribution (from sampled data)
| Phase | Estimated Count | % |
|---|---|---|
| Phase 4 | ~85 | 1.5% |
| Phase 3 | ~380 | 6.5% |
| Phase 2 | ~1,200 | 20.6% |
| Phase 1 | ~1,800 | 30.9% |
| Other/Observational | ~2,357 | 40.5% |
TOP 30 Drugs in CRC Clinical Trials
| Drug | Phase | Mechanism | Target Gene | GWAS Gene? |
|---|---|---|---|---|
| Bevacizumab | 4 | Anti-VEGF | VEGFA | No (but GREM1 interacts) |
| Cetuximab | 4 | Anti-EGFR | EGFR | No |
| Fluorouracil | 4 | TS inhibitor | TYMS | No |
| Capecitabine | 4 | TS inhibitor | TYMS | No |
| Oxaliplatin | 4 | DNA crosslinker | DNA | Indirect (MMR genes) |
| Regorafenib | 4 | Multi-kinase | VEGFR/BRAF | YES (BRAF) |
| Nivolumab | 4 | Anti-PD1 | PDCD1 | No |
| Ipilimumab | 4 | Anti-CTLA4 | CTLA4 | No |
| Pembrolizumab | 4 | Anti-PD1 | PDCD1 | Indirect (MSI genes) |
| Fruquintinib | 4 | VEGFR inhibitor | VEGFR | No |
| Encorafenib | 4 | BRAF inhibitor | BRAF | YES |
| Trastuzumab | 4 | Anti-HER2 | ERBB2 | No |
| Sotorasib | 3 | KRAS G12C inh | KRAS | YES |
| Adagrasib | 3 | KRAS G12C inh | KRAS | YES |
| Dasatinib | 4 | SRC inhibitor | SRC | YES |
| Cabozantinib | 4 | Multi-kinase | MET/VEGFR | No |
| Selumetinib | 4 | MEK inhibitor | MAP2K1 | Indirect (BRAF pathway) |
| Trametinib | 4 | MEK inhibitor | MAP2K1 | Indirect (BRAF pathway) |
| Everolimus | 4 | mTOR inhibitor | MTOR | Indirect (PIK3CA pathway) |
| Niraparib | 4 | PARP inhibitor | PARP1 | Indirect (BRCA1/2) |
| Ruxolitinib | 4 | JAK1/2 inhibitor | JAK1/2 | Indirect (SH2B3) |
| Disitamab Vedotin | 4 | Anti-HER2 ADC | ERBB2 | No |
| Vemurafenib | 4 | BRAF V600E inh | BRAF | YES |
| Cobimetinib | 4 | MEK inhibitor | MAP2K1 | Indirect |
| Alpelisib | 3 | PI3Kα inhibitor | PIK3CA | YES |
| Raltitrexed | 4 | TS inhibitor | TYMS | No |
| Abiraterone | 4 | CYP17 inhibitor | CYP17A1 | YES |
| Metformin | 4 | AMPK activator | PRKAB1 | Indirect (TCF7L2) |
| Celecoxib | 4 | COX-2 inhibitor | PTGS2 | No |
| Sulindac | 4 | COX inhibitor | PTGS1/2 | No |
GWAS-Trial Alignment
- Direct GWAS gene targets in trials: ~12 genes (BRAF, KRAS, PIK3CA, FGFR2, SRC, CYP17A1, TERT, ATM, CHEK2, NOS1 pathway, SH2B3 pathway, BRCA1/2 pathway)
- % of trial drugs targeting GWAS genes directly: ~20%
- % targeting GWAS gene pathways: ~45%
- Assessment: MODERATE alignment — many drugs target downstream effectors rather than GWAS genes directly
Section 14: Pathway Analysis
TOP 30 Enriched Pathways (Reactome)
| Pathway | ID | GWAS Genes | Druggable Nodes |
|---|---|---|---|
| Wnt signaling | R-HSA-195721 | APC, AXIN2, TCF7L2, CDH1, CCND2 | GSK3β, CTNNB1, PORCN |
| TGF-β signaling | R-HSA-170834 | SMAD7, SMAD4, SMAD9, BMP4, GREM1 | TGFβR1/2, ALK |
| Signaling by BMP | R-HSA-201451 | SMAD7, BMP4, GREM1 | BMPR1/2, ALK2/3/6 |
| Downregulation of TGF-β | R-HSA-2173788 | SMAD7 | TGFβR, SMURF |
| Beta-catenin:TCF complex | R-HSA-201722 | TCF7L2, APC, AXIN2 | CTNNB1 |
| Signaling by TCF7L2 mutants | R-HSA-5339700 | TCF7L2 | — (disease pathway) |
| Adherens junctions | R-HSA-418990 | CDH1 | SRC, ABL |
| RAS/MAPK signaling | R-HSA-5684996 | KRAS, NRAS, BRAF | MEK, ERK, RAF |
| PI3K/AKT signaling | R-HSA-2219528 | PIK3CA | PI3K, AKT, mTOR |
| Nitric oxide signaling | R-HSA-392154 | NOS1 | sGC, PDE5 |
| Class I MHC antigen processing | R-HSA-983170 | ERAP1 | Proteasome, TAP |
| Degradation of ECM | R-HSA-1474228 | CDH1, MMP2 | MMPs |
| Cell cycle | R-HSA-69278 | CCND2, BUB1, TP53 | CDK4/6, CDK2 |
| DNA repair | R-HSA-73894 | MSH2, MSH6, BRCA1, BRCA2, ATM, BLM | PARP, ATR, CHK1 |
| FGFR signaling | R-HSA-190236 | FGFR2 | FGFR, FRS2, GRB2 |
| Regulation of CDH1 | R-HSA-9764561 | CDH1 | HDAC, miRNAs |
| SLC-mediated transport | R-HSA-549127 | SLC22A3 | SLC transporters |
| Interferon gamma signaling | R-HSA-877300 | SMAD7 | JAK1/2, STAT1 |
| IGF transport/uptake | R-HSA-381426 | BMP4 | IGF1R |
| GLP-1 synthesis | R-HSA-381771 | TCF7L2 | DPP4, GLP-1R |
| NOTCH signaling | R-HSA-157118 | NOTCH4 | γ-secretase |
| Integrin interactions | R-HSA-216083 | CDH1, LAMA5 | Integrins |
| Apoptotic cleavage | R-HSA-351906 | CDH1 | Caspases |
| Elastic fibres | R-HSA-2129379 | BMP4 | — |
| Telomere maintenance | R-HSA-157579 | TERT | Telomerase |
| Ion homeostasis | R-HSA-5578775 | NOS1 | Ion channels |
| RUNX3 WNT regulation | R-HSA-8951430 | TCF7L2 | — |
| PD-L1 transcription | R-HSA-9909649 | TCF7L2 | PD-1/PD-L1 |
| RHO GTPases/IQGAPs | R-HSA-5626467 | CDH1 | RHO, RAC |
| WNT target repression | R-HSA-4641265 | TCF7L2 | TLE, HDAC |
Pathway-Level Druggability
Even where GWAS genes themselves are undruggable, pathway members offer drug entry points:
| Undrugged GWAS Gene | Pathway | Druggable Pathway Member | Drug |
|---|---|---|---|
| SMAD7 | TGF-β | TGFβR1 kinase | Galunisertib |
| TCF7L2 | Wnt | PORCN, β-catenin | WNT974, PRI-724 |
| APC | Wnt | Tankyrase | XAV939, G007-LK |
| GREM1 | BMP | BMPR1 kinase, VEGFR2 | LDN-193189, Bevacizumab |
| CCND2 | Cell cycle | CDK4/6 | Palbociclib, Ribociclib |
| SH2B3 | JAK-STAT | JAK2 | Ruxolitinib |
| SMAD9 | BMP | BMPR kinases | LDN-193189 |
Section 15: Drug Repurposing Opportunities
TOP 30 Repurposing Candidates (ranked by composite priority score)
| Rank | Drug | Gene Target | Approved For | Mechanism | GWAS p-value | Priority Score |
|---|---|---|---|---|---|---|
| 1 | Palbociclib | CCND2→CDK4/6 | Breast cancer | CDK4/6 inhibitor | 1e-17 | 95 |
| 2 | Ribociclib | CCND2→CDK4/6 | Breast cancer | CDK4/6 inhibitor | 1e-17 | 94 |
| 3 | Abemaciclib | CCND2→CDK4/6 | Breast cancer | CDK4/6 inhibitor | 1e-17 | 93 |
| 4 | Olaparib | BRCA1/2→PARP | Ovarian/breast | PARP inhibitor | 8e-12 | 90 |
| 5 | Ruxolitinib | SH2B3→JAK2 | Myelofibrosis | JAK1/2 inhibitor | 3e-16 | 88 |
| 6 | Alpelisib | PIK3CA | Breast cancer | PI3Kα inhibitor | Somatic driver | 87 |
| 7 | Abiraterone | CYP17A1 | Prostate cancer | CYP17 inhibitor | 8e-12 | 85 |
| 8 | Erdafitinib | FGFR2 | Urothelial ca | FGFR inhibitor | 8e-35 (pleiotropy) | 84 |
| 9 | Dasatinib | SRC | CML | SRC/ABL inhibitor | GenCC | 82 |
| 10 | Pemigatinib | FGFR2 | Cholangiocarcinoma | FGFR inhibitor | 8e-35 | 81 |
| 11 | Galunisertib | SMAD7→TGFβR1 | Phase 2 (HCC) | TGFβR1 inhibitor | 3e-30 | 80 |
| 12 | Imetelstat | TERT | MDS | Telomerase inhibitor | 5e-25 | 78 |
| 13 | Niraparib | BRCA1/2→PARP | Ovarian cancer | PARP inhibitor | 8e-12 | 77 |
| 14 | Infigratinib | FGFR2 | Cholangiocarcinoma | FGFR inhibitor | 8e-35 | 76 |
| 15 | Futibatinib | FGFR2 | Cholangiocarcinoma | FGFR inhibitor | 8e-35 | 75 |
| 16 | Talazoparib | BRCA1/2→PARP | Breast cancer | PARP inhibitor | 8e-12 | 74 |
| 17 | Metformin | TCF7L2 (pathway) | Diabetes | AMPK/Wnt | 3e-15 | 72 |
| 18 | Pacritinib | SH2B3→JAK2 | Myelofibrosis | JAK2 inhibitor | 3e-16 | 70 |
| 19 | Tideglusib | APC→GSK3β | AD (Phase 2) | GSK3β inhibitor | 2e-12 | 68 |
| 20 | Prexasertib | CHEK2 | Phase 2 (solid) | CHK1/2 inhibitor | ClinVar | 67 |
| 21 | A-485/CCS1477 | EP300 | Phase 1 | p300/CBP inhibitor | ClinVar | 65 |
| 22 | Notch inhibitors | NOTCH4 | Phase 2 (various) | γ-secretase inhibitor | 2e-08 | 63 |
| 23 | Saracatinib | SRC | Phase 2 (various) | SRC inhibitor | GenCC | 62 |
| 24 | Ibrutinib | ATM pathway | CLL | BTK inhibitor | ClinVar | 60 |
| 25 | LDN-193189 | SMAD9→BMPR | Preclinical | BMP type I receptor inh | 6e-13 | 58 |
| 26 | Marimastat | MMP2 | Failed Ph3 | MMP inhibitor | 8e-07 | 55 |
| 27 | Bestatin/DG013 | ERAP1 | Preclinical | Aminopeptidase inhibitor | 7e-08 | 53 |
| 28 | WNT974 | TCF7L2→PORCN | Phase 1/2 | Porcupine inhibitor | 3e-15 | 52 |
| 29 | AZD0156 | ATM | Phase 1 | ATM kinase inhibitor | ClinVar | 50 |
| 30 | Selumetinib | BRAF→MEK | Neurofibromatosis | MEK inhibitor | Somatic | 48 |
Section 16: Druggability Pyramid
| Level | Description | Gene Count | % | Key Genes |
|---|---|---|---|---|
| Level 1 — VALIDATED | Approved drug FOR CRC | 12 | 13% | BRAF (encorafenib), KRAS (sotorasib), VEGFR (regorafenib), EGFR (cetuximab), PD-1 (nivolumab) |
| Level 2 — REPURPOSING | Approved drug for OTHER disease | 15 | 16% | FGFR2 (erdafitinib), CYP17A1 (abiraterone), SRC (dasatinib), PIK3CA (alpelisib), CCND2→CDK4/6 (palbociclib), BRCA1/2→PARP (olaparib) |
| Level 3 — EMERGING | Drug in clinical trials | 10 | 11% | ATM (AZD0156), CHEK2 (prexasertib), TERT (imetelstat), EP300 (CCS1477), NOTCH4 (γ-secretase inh) |
| Level 4 — TOOL COMPOUNDS | ChEMBL compounds, no trials | 15 | 16% | NOS1 (selective nNOS inh), ERAP1 (phosphinic peptides), MMP2 (hydroxamates), SPSB2 (cyclic peptides) |
| Level 5 — DRUGGABLE | Druggable family, NO | 8 | 8% | PTPRJ (phosphatase), SLCO2A1 (transporter), PLA2G2A (phospholipase) |
| UNDRUGGED | compounds | |||
| Level 6 — HARD TARGETS | Difficult family or unknown | 35 | 37% | SMAD7, TCF7L2, APC, GREM1, SH2B3, SMAD4, MSH2, MSH6, VTI1A, LAMC1 |
Section 17: Undrugged Target Profiles
TOP 30 Undrugged Opportunities (ranked by druggability potential)
- GREM1 (Gremlin-1) — HIGH POTENTIAL
- GWAS p-value: 9e-40 (strongest CRC GWAS signal outside 8q24)
- Variant type: Regulatory (enhancer hijacking in colon)
- Protein function: BMP antagonist; secreted ligand blocking BMP2/4/7
- Family: Cystine-knot cytokine — targetable by antibodies
- Structure: PDB 5AEJ (1.9 Å crystal), excellent for drug design
- Expression: Aberrant expression in colonic epithelium drives tumorigenesis
- Interactions: VEGFR2 (drugged — bevacizumab), BMP2/4 (druggable)
- Why undrugged: Novel target; antibody approaches in early development
- Druggability: HIGH — secreted protein, antibody-accessible, crystal structures available
- SMAD7 — MODERATE-HIGH POTENTIAL
- GWAS p-value: 3e-30
- Function: Inhibitory SMAD; blocks TGF-β signaling
- Family: SMAD transcription factor — difficult direct targeting
- Structure: 7 PDB entries (NMR), AlphaFold
- Expression: Ubiquitous, high in colon
- Interactions: TGFβR1 (drugged — galunisertib), SMURF2
- Why undrugged: Intracellular TF, no enzymatic activity
- Druggability: MODERATE — indirect via TGFβR1, antisense (mongersen tested in IBD)
- TCF7L2 — MODERATE POTENTIAL
- GWAS p-value: 3e-15
- Function: Wnt pathway TF; partners with β-catenin
- Family: TCF/LEF transcription factor — difficult
- Structure: PDB 1JDH (1.9 Å with β-catenin)
- Interactions: β-catenin (druggable PPI), APC, AXIN2
- Why undrugged: Transcription factor; PPI disruption challenging
- Druggability: MODERATE — PPI with β-catenin is druggable (PRI-724, CGP049090)
- CCND2 (Cyclin D2) — HIGH POTENTIAL (indirect)
- GWAS p-value: 1e-17
- Function: G1/S cell cycle regulator; CDK4/6 partner
- Interactions: CDK4 (drugged — palbociclib), CDK6 (drugged — ribociclib)
- Why undrugged directly: Cyclin—protein scaffold, not enzymatic
- Druggability: HIGH via CDK4/6 inhibitors (already approved for breast cancer)
- SPSB2 — HIGH POTENTIAL
- GWAS p-value: 4e-11
- Function: SOCS box protein, regulates iNOS
- Structure: 9 PDB structures, best at 1.23 Å with cyclic peptide inhibitors
- Why undrugged: Novel target, preclinical
- Druggability: HIGH — excellent structural data, cyclic peptide leads available
- PREX1 — MODERATE POTENTIAL
- GWAS p-value: 6e-15
- Function: Rac GEF; PI3K-dependent Rac activation
- Structure: 14 PDB entries (1.69 Å PH domain)
- Why undrugged: GEF—difficult to inhibit
- Druggability: MODERATE — PH domain targetable, allosteric sites identified
- ERAP1 — HIGH POTENTIAL
- GWAS p-value: 7e-08
- Function: ER aminopeptidase; trims MHC-I peptides
- Structure: 18 PDB entries, 1.33 Å with inhibitors
- Bioactivity: Bestatin analogs, phosphinic peptides, benzothiazinone series
- Why undrugged: Immune modulation target, early development
- Druggability: HIGH — metalloenzyme, deep SAR, co-crystal structures
- LRIG1 — MODERATE POTENTIAL
- GWAS p-value: 1e-06
- Function: Negative regulator of EGFR/MET/RET signaling
- Structure: PDB 4U7L (2.3 Å)
- Why undrugged: Tumor suppressor — upregulation rather than inhibition needed
- Druggability: LOW for small molecules, MODERATE for gene therapy/agonist approaches
- SH2B3 (LNK) — MODERATE POTENTIAL (indirect)
- GWAS p-value: 3e-16
- Function: Adaptor protein; negative regulator of JAK2
- Interactions: JAK2 (drugged — ruxolitinib)
- Why undrugged: Scaffold protein
- Druggability: HIGH indirectly via JAK2 inhibitors
- APC — LOW-MODERATE POTENTIAL
- GWAS p-value: 2e-12
- Function: Wnt pathway tumor suppressor
- Mendelian: FAP (AD)
- Why undrugged: Tumor suppressor (loss of function); tankyrase inhibitors target pathway
- Druggability: MODERATE indirectly — tankyrase inhibitors (XAV939), Wnt pathway
11-30 (Summary)
| Rank | Gene | p-value | Family | Structure | Potential |
|---|---|---|---|---|---|
| 11 | SMAD9 | 6e-13 | SMAD TF | AlphaFold | MODERATE (BMP pathway) |
| 12 | VTI1A | 3e-12 | SNARE | AlphaFold | LOW |
| 13 | RHPN2 | 4e-23 | PDZ/BRO1 | AlphaFold | LOW-MODERATE |
| 14 | CABLES2 | 2e-13 | Cyclin-like | AlphaFold | LOW |
| 15 | NXN | 3e-08 | Thioredoxin | AlphaFold | MODERATE (redox enzyme) |
| 16 | FADS2 | 2e-13 | Desaturase | AlphaFold | MODERATE (enzyme) |
| 17 | LAMC1 | 2e-16 | Laminin ECM | AlphaFold | LOW |
| 18 | TBX3 | 3e-07 | T-box TF | AlphaFold | LOW |
| 19 | MYRF | 9e-21 | TF | AlphaFold | LOW |
| 20 | SLCO2A1 | 2e-12 | SLC transporter | AlphaFold | MODERATE (transporter) |
| 21 | POLD3 | 4e-10 | DNA pol subunit | AlphaFold | LOW |
| 22 | ATXN2 | 3e-16 | RNA-binding | AlphaFold | LOW |
| 23 | TFEB | 4e-08 | bHLH-Zip TF | PDB (4) | LOW (TF) |
| 24 | ETV6 | 3e-11 | ETS TF | PDB (40+) | LOW (TF) |
| 25 | MACF1 | 3e-07 | Cytoskeletal | AlphaFold | LOW |
| 26 | BICC1 | 7e-08 | RNA-binding | AlphaFold | LOW |
| 27 | CUBN | 7e-08 | Receptor | AlphaFold | LOW |
| 28 | SMAD4 | ClinVar | SMAD TF | PDB | LOW (tumor suppressor) |
| 29 | PTPRJ | GenCC | Phosphatase | AlphaFold | MODERATE (phosphatase) |
| 30 | PLA2G2A | GenCC | Phospholipase | PDB | MODERATE (enzyme) |
Section 18: Summary
GWAS LANDSCAPE
- Total associations: 2,183 across 189 studies
- Unique genome-wide significant genes: ~95
- Coding vs non-coding: ~4% coding / ~96% non-coding
- Dominant loci: 8q24 (MYC enhancer, p=2e-56), SMAD7 (p=3e-30), GREM1 (p=9e-40)
GENETIC EVIDENCE
- Tier 1 (coding) genes: ~8
- Mendelian overlap (GenCC): 15 genes
- ClinVar overlap: 75 genes
- Genes with BOTH GWAS + Mendelian: APC, CDH1, SMAD4/SMAD7, BRCA2, TP53, BRAF
DRUGGABILITY
- Overall druggability rate: 63% have some drug/compound evidence
- Approved drugs: 23% of GWAS genes
- In clinical trials: 11%
- Opportunity gap (no drug development): 40% (38 genes)
PYRAMID SUMMARY
| Level | Count | % |
|---|---|---|
| L1 — Validated (approved for CRC) | 12 | 13% |
| L2 — Repurposing (approved elsewhere) | 15 | 16% |
| L3 — Emerging (clinical trials) | 10 | 11% |
| L4 — Tool compounds | 15 | 16% |
| L5 — Druggable undrugged | 8 | 8% |
| L6 — Hard targets | 35 | 37% |
CLINICAL TRIAL ALIGNMENT
- ~20% of CRC trial drugs directly target GWAS genes
- ~45% target GWAS gene pathways (indirect)
- Gap between genetic evidence and therapeutic development remains significant
TOP 10 REPURPOSING CANDIDATES
| Drug | Gene Target | Approved For | p-value | Score |
|---|---|---|---|---|
| Palbociclib | CCND2→CDK4/6 | Breast cancer | 1e-17 | 95 |
| Ribociclib | CCND2→CDK4/6 | Breast cancer | 1e-17 | 94 |
| Olaparib | BRCA1/2→PARP | Ovarian/breast | 8e-12 | 90 |
| Ruxolitinib | SH2B3→JAK2 | Myelofibrosis | 3e-16 | 88 |
| Alpelisib | PIK3CA | Breast cancer | Somatic | 87 |
| Abiraterone | CYP17A1 | Prostate cancer | 8e-12 | 85 |
| Erdafitinib | FGFR2 | Urothelial ca | 8e-35 | 84 |
| Dasatinib | SRC | CML | GenCC | 82 |
| Galunisertib | SMAD7→TGFβR1 | Phase 2 HCC | 3e-30 | 80 |
| Imetelstat | TERT | MDS | 5e-25 | 78 |
TOP 10 UNDRUGGED OPPORTUNITIES
| Gene | p-value | Family | Structure | Potential |
|---|---|---|---|---|
| GREM1 | 9e-40 | Cystine-knot | PDB 1.9Å | HIGH |
| ERAP1 | 7e-08 | M1 aminopeptidase | PDB 1.33Å | HIGH |
| SPSB2 | 4e-11 | SOCS box | PDB 1.23Å | HIGH |
| CCND2 | 1e-17 | Cyclin | AlphaFold | HIGH (indirect) |
| SMAD7 | 3e-30 | SMAD | PDB/NMR | MODERATE-HIGH |
| PREX1 | 6e-15 | Rac GEF | PDB 1.69Å | MODERATE |
| TCF7L2 | 3e-15 | TCF/LEF TF | PDB 1.9Å | MODERATE |
| SH2B3 | 3e-16 | Adaptor (SH2) | AlphaFold | MODERATE (indirect) |
| NXN | 3e-08 | Thioredoxin | AlphaFold | MODERATE |
| PTPRJ | GenCC | Phosphatase | AlphaFold | MODERATE |
TOP 10 INDIRECT OPPORTUNITIES
| Undrugged Gene | Drugged Interactor | Drug |
|---|---|---|
| CCND2 ↔ CDK4/6 | CDK4/6 | Palbociclib, Ribociclib |
| SMAD7 ↔ TGFβR1 | TGFβR1 | Galunisertib |
| SH2B3 ↔ JAK2 | JAK2 | Ruxolitinib |
| GREM1 ↔ VEGFR2 | VEGFR2 | Bevacizumab |
| TCF7L2 ↔ β-catenin | CTNNB1 | PRI-724 |
| APC ↔ Tankyrase | TNKS | XAV939 |
| BRCA1/2 ↔ PARP | PARP1 | Olaparib |
| SMAD4 ↔ TGFβR | TGFβR1 | Galunisertib |
| AXIN2 ↔ GSK3β | GSK3B | Tideglusib |
| SMAD9 ↔ BMPR | BMPR1A | LDN-193189 |
KEY INSIGHTS
GREM1 is the most promising novel target — strongest GWAS signal (9e-40), secreted protein amenable to antibody targeting, crystal structure available, role in BMP pathway well-characterized. Aberrant GREM1 expression in colonic epithelium drives tumorigenesis independently of APC.
CDK4/6 inhibitors (palbociclib/ribociclib) have the strongest genetic rationale for repurposing — CCND2 is one of the most significant GWAS genes (p=1e-17), CDK4/6 is its direct functional partner, and these drugs are already approved with established safety profiles.
3. The TGF-β/BMP pathway cluster is the most genetically validated pathway — SMAD7 (p=3e-30), GREM1 (p=9e-40), SMAD9 (p=6e-13), BMP4 (p=5e-10), and SMAD4 (ClinVar) all converge on this pathway. Galunisertib (TGFβR1 inhibitor) has the most direct repurposing rationale.
4. The Wnt pathway genes (APC, TCF7L2, AXIN2) lack direct drug targets but the pathway offers multiple druggable entry points: tankyrase, porcupine, and the β-catenin/TCF PPI interface.
CRC has a high Mendelian-GWAS overlap — 15 GenCC genes and 75 ClinVar genes provide strong validation. The mismatch repair genes (MSH2, MSH6, MLH3) are particularly notable as they predict immunotherapy response.
ERAP1 is an under-appreciated immunotherapy target — its role in MHC-I antigen trimming means ERAP1 inhibitors could enhance neoantigen presentation and boost checkpoint inhibitor efficacy. High-resolution co-crystal structures (1.33 Å) enable structure-based drug design.
40% opportunity gap — 38 significant GWAS genes have NO drug development, representing the frontier for novel target discovery. The most tractable among these are enzymes (ERAP1, NOS1, FADS2) and secreted proteins (GREM1).
Comparison with other diseases: CRC has one of the highest GWAS-to-drug translation rates (~23% with approved drugs) among complex diseases, likely due to its well-characterized somatic driver landscape (KRAS, BRAF, PIK3CA). However, most approved drugs target somatic drivers rather than germline GWAS risk genes, indicating the germline signal remains largely untapped therapeutically.
Analysis performed using biobtree integrated biological database. Data sources: GWAS Catalog, ClinVar, GenCC, UniProt, ChEMBL, InterPro, Reactome, STRING, PDB, AlphaFold, PharmGKB, Bgee, CellxGene, MeSH, ClinicalTrials.gov, MONDO, EFO, OMIM, Orphanet.