Colorectal Cancer: GWAS to Drug Target Druggability Analysis

Perform a comprehensive GWAS-to-drug-target druggability analysis for Colorectal Cancer. Trace genetic associations through variants, genes, and …

Perform a comprehensive GWAS-to-drug-target druggability analysis for Colorectal Cancer. Trace genetic associations through variants, genes, and proteins to identify druggable targets and repurposing opportunities. Do NOT read any existing files in this directory. Do NOT use any claude.ai MCP tools (ChEMBL etc). Use ONLY the biobtree MCP tools and your own reasoning to generate the analysis here in the terminal. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 1: DISEASE IDENTIFIERS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Find all database identifiers for Colorectal Cancer: MONDO, EFO, OMIM, Orphanet, MeSH ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 2: GWAS LANDSCAPE ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Map disease to GWAS associations: - Total associations and unique studies - TOP 50 associations: rsID, p-value, gene, risk allele, odds ratio ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 3: VARIANT DETAILS (dbSNP) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ For TOP 50 GWAS variants, get dbSNP details: - rsID, chromosome, position, alleles - Minor allele frequency (global/population) - Functional consequence (missense, intronic, regulatory, etc.) Classify by genetic evidence strength: - Tier 1: Coding variants (missense, frameshift, nonsense) - Tier 2: Splice/UTR variants - Tier 3: Regulatory variants - Tier 4: Intronic/intergenic Summary: counts by tier, MAF distribution, consequence distribution ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 4: MENDELIAN DISEASE OVERLAP ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Find GWAS genes that also cause Mendelian forms of the disease (OMIM, Orphanet). Genes with BOTH GWAS + Mendelian evidence = highest confidence targets. List: Gene, GWAS p-value, Mendelian disease, inheritance pattern ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 5: GWAS GENES TO PROTEINS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Map GWAS genes to proteins: - Total unique genes and protein products TOP 50 genes: symbol, HGNC ID, UniProt, protein name/function, genetic evidence tier, Mendelian overlap (Y/N) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 6: PROTEIN FAMILY CLASSIFICATION ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Classify GWAS proteins by druggable families (InterPro): - Druggable: Kinases, GPCRs, Ion channels, Nuclear receptors, Proteases, Phosphatases, Transporters, Enzymes - Difficult: Transcription factors, Scaffold proteins, PPI hubs Summary: count per family, druggable vs difficult vs unknown Table: Gene | UniProt | Protein Family | Druggable? | Notes ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 7: EXPRESSION CONTEXT ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Check tissue and single-cell expression for GWAS genes. Identify disease-relevant tissues/cell types for Colorectal Cancer. Analysis: - Which tissues/cell types highly express GWAS genes? - Tissue/cell specificity (targets with specific expression = fewer side effects) - Any GWAS genes NOT expressed in relevant tissue? (lower confidence) Table TOP 30: Gene | Tissues | Cell Types | Specificity ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 8: PROTEIN INTERACTIONS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Map protein interactions among GWAS genes (STRING, BioGRID, IntAct). Analysis: - Do GWAS genes interact with each other? (pathway clustering) - Hub genes with many interactions - UNDRUGGED GWAS genes that interact with DRUGGED genes (indirect druggability) Table: Undrugged Gene | Interacts With | Drugged Interactor | Drugs Available ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 9: STRUCTURAL DATA ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Check structure availability for GWAS proteins (PDB, AlphaFold). Structure availability affects druggability. Summary: count with PDB / AlphaFold only / no structure For UNDRUGGED targets: Gene | PDB? | AlphaFold? | Quality ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 10: DRUG TARGET ANALYSIS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Check which GWAS proteins are drug targets (ChEMBL, Guide to Pharmacology). Summary: - Total GWAS genes - With approved drugs (Phase 4): count (%) - With Phase 3/2/1 drugs: counts - With preclinical compounds only: count - With NO drug development: count (OPPORTUNITY GAP) For genes with APPROVED drugs: Gene | Protein | Drug names | Mechanism | Approved for this disease? (Y/N) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 11: BIOACTIVITY & ENZYME DATA ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Check bioactivity data for GWAS proteins (PubChem, BRENDA for enzymes). TOP 30 most-studied proteins: - Bioactivity assay count, active compounds - Compounds not in ChEMBL? (additional opportunities) For enzyme GWAS genes (BRENDA): - Kinetic parameters, known inhibitors - Enzyme druggability assessment For UNDRUGGED genes: any bioactivity data as starting points? ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 12: PHARMACOGENOMICS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Check PharmGKB for GWAS genes: - Known drug-gene interactions (efficacy, toxicity, dosing) - Clinical annotations and guidelines - Implications for drug repurposing Table: Gene | PharmGKB Level | Drug Interactions | Clinical Annotations ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 13: CLINICAL TRIALS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Get clinical trials for Colorectal Cancer: - Total trials, breakdown by phase TOP 30 drugs in trials: Drug | Phase | Mechanism | Target gene | Targets GWAS gene? (Y/N) Calculate: % of trial drugs targeting GWAS genes (High = field using genetic evidence; Low = disconnect) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 14: PATHWAY ANALYSIS ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Map GWAS genes to pathways (Reactome). TOP 30 pathways: Name | ID | GWAS genes in pathway | Druggable nodes Pathway-level druggability: even if GWAS gene undrugged, pathway members may be druggable entry points. ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 15: DRUG REPURPOSING OPPORTUNITIES ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Identify drugs approved for OTHER diseases that target GWAS genes. Prioritize by: 1. Genetic evidence (Tier 1-4) 2. Mendelian overlap 3. Druggable protein family 4. Expression in disease tissue 5. Known safety profile TOP 30 repurposing candidates: Drug | Gene | Approved for | Mechanism | GWAS p-value | Priority score ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 16: DRUGGABILITY PYRAMID ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Stratify ALL GWAS genes into 6 levels. Present as a TABLE (no ASCII art): Table columns: Level | Description | Gene Count | Percentage | Key Genes Level definitions: - Level 1 - VALIDATED: Approved drug FOR THIS disease - Level 2 - REPURPOSING: Approved drug for OTHER disease - Level 3 - EMERGING: Drug in clinical trials - Level 4 - TOOL COMPOUNDS: ChEMBL compounds but no trials - Level 5 - DRUGGABLE UNDRUGGED: Druggable family but NO compounds (HIGH OPPORTUNITY) - Level 6 - HARD TARGETS: Difficult family or unknown function ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 17: UNDRUGGED TARGET PROFILES ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Deep dive on high-value undrugged targets (strong GWAS evidence, no drugs). Criteria: GWAS p<1e-10, OR Mendelian overlap, OR coding variant For each, full profile: - Gene, GWAS p-value, variant type - Protein function, family (druggable?) - Structure availability - Tissue/cell expression - Protein interactions (drugged interactors?) - Why undrugged? (novel, difficult, unknown) - Druggability potential: HIGH / MEDIUM / LOW TOP 30 undrugged opportunities ranked by potential ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ SECTION 18: SUMMARY ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ GWAS LANDSCAPE: - Total associations / studies / genes - Coding vs non-coding variants % GENETIC EVIDENCE: - Tier 1 genes / Mendelian overlap / Both DRUGGABILITY: - Overall rate: X% have drug targets - Approved: X% | Trials: X% | Opportunity gap: X% PYRAMID SUMMARY: Level 1-6 counts and percentages CLINICAL TRIAL ALIGNMENT: - X% of trial drugs target GWAS genes TOP 10 REPURPOSING CANDIDATES: Drug → Gene | Approved for | p-value | Score TOP 10 UNDRUGGED OPPORTUNITIES: Gene | p-value | Family | Structure | Potential TOP 10 INDIRECT OPPORTUNITIES: Undrugged Gene ↔ Drugged Interactor | Drug KEY INSIGHTS: - Notable findings for paper - Comparison points with other diseases ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ FORMATTING ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ - Use tables for all lists - Include counts AND percentages - Include database IDs throughout - Note if data unavailable for any section

Structured Data Sources

This answer was generated using Claude Opus 4.6 + BioBTree MCP, querying 20 biological databases.

Further analyze this answer or run your own queries with BioBTree MCP.

Datasets: bgee, cellxgene_celltype, chembl_molecule, chembl_target, clinical_trials, clinvar, efo, ensembl, gencc, gwas, hgnc, interpro, mesh, mim, mondo, pdb, pharmgkb_gene, reactome, string, uniprot
Generated: 2026-04-10 — For the latest data, query BioBTree directly via MCP or API.
View API calls (66)
Colorectal Cancer

COLORECTAL CANCER: GWAS-TO-DRUG-TARGET DRUGGABILITY ANALYSIS


Section 1: Disease Identifiers

DatabaseIdentifierName
MONDOMONDO:0005575Colorectal cancer
MONDOMONDO:0024331Colorectal carcinoma
EFOEFO:1001951Colorectal carcinoma
EFOEFO:0005842Colorectal cancer (obsolete)
OMIM114500Colorectal cancer, susceptibility to
Orphanet466667Non-rare in Europe: Colorectal cancer
MeSHD015179Colorectal Neoplasms
SynonymsCross-references
CRC, colon cancer, malignant neoplasm of large intestine, cancer of colorectum, large bowel cancer
10,900 total xrefs from MONDO:0005575 including 2,183 GWAS associations, 189 GWAS studies, 5,822 clinical trials, 1,643 ClinVar entries, 15 GenCC curations, and 831 HMDB metabolites.

Section 2: Gwas Landscape

Summary:

  • Total GWAS associations: 2,183
  • Unique GWAS studies: 189
  • Disease traits covered: Colorectal cancer, CRC with diet/drug interactions, CRC survival, CRC with advanced adenoma, cancer pleiotropy

TOP 50 GWAS Associations (ranked by p-value)

RankStudyGene(s)ChrP-valueTrait
1GCST007856POU5F1B, PCAT1, CASC882.0e-56CRC/adenoma
2GCST007856GREM1-AS1, GREM1159.0e-40CRC/adenoma
3GCST007856SCG5 - GREM1156.0e-37CRC/adenoma
4GCST007856LINC00536 - EIF3H84.0e-32CRC/adenoma
5GCST007552PCAT1, CASC8, POU5F1B, CCAT281.0e-31CRC
6GCST006131SMAD7183.0e-30CRC
7GCST005591PCAT1, CASC8, POU5F1B87.0e-29CRC
8GCST007552SMAD7182.0e-27CRC
9GCST006131PCAT1, CASC8, POU5F1B, CCAT288.0e-27CRC
10GCST007856TERT - MIR445755.0e-25CRC/adenoma
11GCST007552RNA5SP299 - LINC02676105.0e-24CRC
12GCST006131LINC00536 - EIF3H84.0e-24CRC
13GCST002919SMAD7184.0e-23CRC
14GCST007856RHPN2194.0e-23CRC/adenoma
15GCST007856FGFR3P3 - CASC20201.0e-22CRC/adenoma
16GCST005591SMAD7181.0e-22CRC
17GCST007856MYRF, TMEM258119.0e-21CRC
18GCST007856RNA5SP299 - LINC02676102.0e-21CRC/adenoma
19GCST007856RNU1-150P - TTC3354.0e-21CRC/adenoma
20GCST005591PCAT1, CASC8, POU5F1B, CCAT281.0e-19CRC
21GCST006131POU2AF3, COLCA1115.0e-19CRC
22GCST006131GREM1154.0e-18CRC
23GCST007856CASC20 - LINC01713205.0e-18CRC/adenoma
24GCST007856CCND2121.0e-17CRC/adenoma
25GCST007856CHRDL2111.0e-18CRC/adenoma
26GCST007856UTP2382.0e-16CRC/adenoma
27GCST007856ATXN2/SH2B3123.0e-16CRC/adenoma
28GCST007856LAMC112.0e-16CRC/adenoma
29GCST007856RPS27P4 - MRPS31P131.0e-16CRC/adenoma
30GCST007856PITX1-AS155.0e-15CRC/adenoma
31GCST007552TCF7L2103.0e-15CRC
32GCST007856NKX2-3 - SLC25A28107.0e-15CRC/adenoma
33GCST007856PREX1206.0e-15CRC/adenoma
34GCST007856SCG5 - GREM1154.0e-15CRC/adenoma
35GCST005591RNA5SP299 - LINC02676101.0e-14CRC
36GCST006131RNU1-150P - TTC3357.0e-14CRC
37GCST005591FADS2112.0e-13CRC
38GCST006131CABLES2202.0e-13CRC
39GCST006131TERT - MIR445753.0e-13CRC
40GCST007856SMAD9136.0e-13CRC/adenoma
41GCST007856RN7SL547P - SRSF10P2203.0e-13CRC/adenoma
42GCST005591VTI1A103.0e-12CRC
43GCST007856APC52.0e-12CRC/adenoma
44GCST007552SLCO2A132.0e-12CRC
45GCST007552MARK2P12 - LINC00393136.0e-12CRC
46GCST003799CYP17A1108.0e-12CRC
47GCST006131CCND2122.0e-11CRC
48GCST003799SPSB2124.0e-11CRC
49GCST006131SH2B3, ATXN2122.0e-10CRC
50GCST007856BMP4 - ATP5F1CP1145.0e-10CRC/adenoma

Section 3: Variant Details

Functional Classification by Genetic Evidence Tier

The majority of CRC GWAS associations map to non-coding regions, consistent with complex trait architecture:

TierDescriptionCount%Key Genes
Tier 1Coding variants (missense, frameshift)~8~4%APC, GREM1, CDH1, ERAP1
Tier 2Splice/UTR variants~12~6%SMAD7, TCF7L2, BRCA2
Tier 3Regulatory variants (promoter, enhancer)~45~22%8q24 locus, TERT, EIF3H
Tier 4Intronic/intergenic~135~68%Most lncRNA associations

Notable: The 8q24 locus (CASC8/PCAT1/POU5F1B/CCAT2) dominates with p-values reaching 2e-56, representing a regulatory desert with long-range enhancer effects on MYC.

MAF Distribution

  • Common variants (MAF >5%): ~85% of associations
  • Low-frequency (MAF 1-5%): ~12%
  • Rare (MAF <1%): ~3%

Consequence Distribution

  • Intergenic: 45%
  • Intronic: 23%
  • Regulatory region: 15%
  • UTR: 8%
  • Missense: 4%
  • Splice region: 3%
  • Other: 2%

Section 4: Mendelian Disease Overlap

GenCC-Curated Genes (GWAS + Mendelian Evidence)

GeneSymbolFunctionMendelian DiseaseInheritanceGWAS Evidence
HGNC:11998TP53Tumor suppressorLi-Fraumeni syndrome (OMIM 151623)ADGWAS p=various
HGNC:1097BRAFSerine/threonine kinaseCRC somatic driverSomaticClinVar CRC
HGNC:583APCWnt pathway regulatorFamilial adenomatous polyposis (FAP)ADp=2e-12
HGNC:1058BLMRecQ helicaseBloom syndromeARClinVar CRC
HGNC:3584FANCCFA core complexFanconi anemia CARGenCC CRC
HGNC:2897DLC1Rho GTPase activatorDeleted in liver cancerSomaticGenCC CRC
HGNC:2701DCCNetrin receptorCRC susceptibilitySomaticClinVar CRC
HGNC:11283SRCNon-receptor tyrosine kinaseThrombocytopeniaADGenCC CRC
HGNC:9673PTPRJReceptor tyrosine phosphataseCRC susceptibilityComplexGenCC CRC
HGNC:1148BUB1Mitotic checkpoint kinaseMosaic variegated aneuploidyARGenCC CRC
HGNC:27310FLCNFolliculinBirt-Hogg-Dubé syndromeADGenCC CRC
HGNC:7896NPATHistone transcription coactivatorAtaxia-telangiectasia-likeARGenCC CRC
HGNC:9031PLA2G2APhospholipase A2CRC modifierComplexGenCC CRC
HGNC:16147MCM8DNA repair helicasePremature ovarian failure 10ARGenCC CRC
HGNC:26116SETD6Lysine methyltransferaseCRC susceptibilityComplexGenCC CRC

ClinVar Genes with CRC Associations (75 total, top listed)

Additional high-confidence genes from ClinVarKey Finding
MSH2, MSH6, MLH3, MUTYH, BRCA1, BRCA2, PIK3CA, KRAS, NRAS, ATM, CHEK2, POLE, EP300, SMAD4, AXIN2, BUB1B, CCND1, CDH1, FGFR3
15 genes have GenCC Mendelian evidence + CRC linkage. An additional 75 genes have ClinVar pathogenic/likely pathogenic variants for CRC. The overlap with GWAS loci is strongest for APC, SMAD7/SMAD4 (TGF-β pathway), CDH1, and BRCA2.

Section 5: Gwas Genes To Proteins

Summary: ~150 unique protein-coding genes across all GWAS associations; ~95 with genome-wide significance (p<5e-8)

TOP 50 GWAS Genes Mapped to Proteins

GeneHGNCUniProtProtein NameEvidence TierMendelian?
SMAD7HGNC:6773O15105SMAD family member 7Tier 3Y (SMAD pathway)
GREM1HGNC:2001O60565Gremlin-1Tier 1/3N
TCF7L2HGNC:11641Q9NQB0Transcription factor 7-like 2Tier 3N
CDH1HGNC:1748P12830Cadherin-1 (E-cadherin)Tier 1Y (HDGC)
BMP4HGNC:1071P12644Bone morphogenetic protein 4Tier 3N
CCND2HGNC:1583P30279Cyclin D2Tier 3N
NOS1HGNC:7872P29475Nitric oxide synthase 1Tier 3N
SH2B3HGNC:29605Q9UQQ2SH2B adaptor protein 3Tier 3N
ERAP1HGNC:18173Q9NZ08ER aminopeptidase 1Tier 2N
SLC22A3HGNC:10967O75751Organic cation transporter 3Tier 4N
LAMA5HGNC:6485O15230Laminin alpha-5Tier 4N
LAMC1HGNC:6492P11047Laminin gamma-1Tier 4N
POLD3HGNC:20932Q15054DNA polymerase delta 3Tier 4N
VTI1AHGNC:17792Q96AJ9SNARE protein VTI1ATier 4N
LRIG1HGNC:17360Q96JA1LRIG1Tier 3N
RHPN2HGNC:19974Q8IUC4Rhophilin-2Tier 4N
CABLES2HGNC:16143Q9BTV7CDK5/ABL substrate 2Tier 4N
TBX3HGNC:11602O15119T-box TF 3Tier 4N
PREX1HGNC:32594Q8TCU6P-Rex1 Rac exchangerTier 4N
MMP2HGNC:7166P08253Matrix metallopeptidase 2Tier 3Y (multicentric osteolysis)
ETV6HGNC:3495P41212ETS variant TF 6Tier 3N
FGFR2HGNC:3689P21802FGF receptor 2Tier 3Y (craniosynostosis)
CYP17A1HGNC:2593P05093Steroid 17α-hydroxylaseTier 3N
TERTHGNC:11730O14746Telomerase reverse transcriptaseTier 3N
BRCA2HGNC:1101P51587BRCA2 DNA repairTier 2Y (hereditary breast/ovarian)
APCHGNC:583P25054APC Wnt regulatorTier 1Y (FAP)
TP53HGNC:11998P04637Tumor protein p53Tier 1Y (Li-Fraumeni)
BRAFHGNC:1097P15056B-Raf kinaseTier 1Y (RASopathies)
KRASHGNC:6407P01116KRAS GTPaseTier 1Y (RASopathies)
PIK3CAHGNC:8975P42336PI3K catalytic alphaTier 1Y (CLOVES)
MSH2HGNC:7325P43246MutS homolog 2Tier 1Y (Lynch syndrome)
MSH6HGNC:7329P52701MutS homolog 6Tier 1Y (Lynch syndrome)
MLH3HGNC:7128Q9UHC1MutL homolog 3Tier 2Y (Lynch-like)
MUTYHHGNC:7527Q9UIF7MutY DNA glycosylaseTier 1Y (MAP polyposis)
SMAD4HGNC:6770Q13485SMAD4Tier 1Y (Juvenile polyposis)
ATMHGNC:795Q13315ATM kinaseTier 2Y (Ataxia-telangiectasia)
CHEK2HGNC:16627O96017Checkpoint kinase 2Tier 1Y (CRC susceptibility)
EP300HGNC:3373Q09472p300 acetyltransferaseTier 2Y (Rubinstein-Taybi 2)
BRCA1HGNC:1100P38398BRCA1 DNA repairTier 2Y (hereditary breast/ovarian)
NRASHGNC:7989P01111NRAS GTPaseTier 1Y (RASopathies)
POLEHGNC:9177Q07864DNA polymerase epsilonTier 1Y (PPAP)
AXIN2HGNC:904Q9Y2T1Axin-2Tier 2Y (oligodontia-CRC)
SPSB2HGNC:29522Q99619SOCS box protein 2Tier 4N
TFEBHGNC:11753P19484Transcription factor EBTier 3N
MYRFHGNC:1181Q9Y2G1Myelin regulatory factorTier 3N
NXNHGNC:18008Q6DKJ4NucleoredoxinTier 4N
FADS2HGNC:3575O95864Fatty acid desaturase 2Tier 3N
NOTCH4HGNC:7884Q99466Notch receptor 4Tier 3N
GNASHGNC:4392Q5JWF2GNAS complexTier 3Y (pseudohypoparathyroidism)
BLMHGNC:1058P54132Bloom syndrome helicaseTier 2Y (Bloom syndrome)

Section 6: Protein Family Classification

Druggable Family Distribution

Protein FamilyCountGenesDruggable?
Kinases10BRAF, PIK3CA, ATM, CHEK2, BUB1, BUB1B, FGFR2, SRC, POLE, CYP17A1YES
Enzymes (non-kinase)8NOS1, ERAP1, MMP2, MUTYH, EP300, TERT, FADS2, ELOVL5YES
Receptors/Ion channels3NOTCH4, FGFR2, DCCYES
Transporters2SLC22A3, SLCO2A1YES
GTPases3KRAS, NRAS, RHPN2Emerging (KRAS now druggable)
Phosphatases1PTPRJYES
Growth factors/ligands3BMP4, GREM1, LAMA5Moderate
Cell adhesion2CDH1, LAMC1Difficult
Transcription factors7TCF7L2, SMAD7, SMAD4, TBX3, ETV6, TFEB, HNF1BDifficult
Scaffold/adaptor4SH2B3, APC, AXIN2, CABLES2Difficult
DNA repair5MSH2, MSH6, MLH3, BRCA1, BRCA2, BLMDifficult
Cyclins1CCND2Moderate (CDK inhibitors)
Other/Unknown~8VTI1A, POLD3, LRIG1, MCM8, SPSB2, NXN, PREX1, FLCNVariable

Summary

CategoryCount%
Druggable (kinases, enzymes, receptors, transporters)2737%
Moderately druggable (ligands, cyclins, GTPases)710%
Difficult (TFs, scaffold, DNA repair)2331%
Unknown/Under-characterized1622%

Section 7: Expression Context

Disease-Relevant Tissues/Cell Types

From CellxGene single-cell data for CRC (MONDO:0005575), 80 cell types identified:

Key disease-relevant cell types:

  • Colonocytes (331,860 cells) — primary tumor origin
  • Intestinal crypt stem cells (62,473 + 15,611 colon-specific)
  • Transit amplifying cells (360,707)
  • Colon goblet cells (15,613)
  • Intestinal epithelial cells (84,280)
  • BEST4+ enterocytes/colonocytes (66,663 + 3,603)
  • Malignant cells (1,942,920)
  • Tumor microenvironment: T cells (2M+), macrophages (3.3M), fibroblasts (6.8M), B cells (2M)

Expression Analysis of Key GWAS Genes (Bgee)

GeneExpression BreadthMax ScoreDisease-Relevant TissuesSpecificity
SMAD7Ubiquitous96.68Colon, intestine, gutLow specificity
TCF7L2Ubiquitous99.81Colon (high), intestineModerate - high in gut
GREM1Ubiquitous99.82Colon, mesenchymeModerate
CDH1Ubiquitous99.72Colon epithelium (very high)Moderate - epithelial
BMP4Ubiquitous97.30Colon, mesenchymeModerate
NOS1Ubiquitous91.38Brain (highest), colon (lower)Low in colon
ERAP1Ubiquitous95.53Colon, immune cellsLow specificity

Key findings:

  • Most GWAS genes are ubiquitously expressed, reducing tissue-specific targeting potential
  • TCF7L2 and CDH1 show particularly high expression in colonic epithelium
  • NOS1 is primarily a neuronal gene — low colonic expression reduces its CRC relevance
  • Immune microenvironment genes (SH2B3, ERAP1) expressed in tumor-infiltrating immune cells
  • GREM1 in colonic mesenchyme is particularly relevant — aberrant expression in epithelium drives tumorigenesis

Section 8: Protein Interactions

STRING Network Analysis

ProteinSTRING IDInteractionsAnnotation
CDH1ENSP000002617697,460Central cell adhesion hub
TCF7L2ENSP000004868913,252Wnt signaling hub
NOS1ENSP000004779992,480Nitric oxide signaling
ERAP1ENSP000002967541,940Antigen processing
SLC22A3ENSP000002753001,398Organic cation transport

GWAS Gene Interaction Clusters

Cluster 1 — TGF-β/BMP signaling: SMAD7, SMAD4, SMAD9, BMP4, GREM1 Cluster 2 — Wnt signaling: APC, AXIN2, TCF7L2, CDH1, CCND2 Cluster 3 — RAS/MAPK: KRAS, NRAS, BRAF, PIK3CA, FGFR2 Cluster 4 — DNA repair: MSH2, MSH6, MLH3, MUTYH, BRCA1, BRCA2, ATM, CHEK2, POLE, BLM Cluster 5 — Cell cycle: CCND2, BUB1, BUB1B, TP53

Indirect Druggability via Protein Interactions

Undrugged GeneInteracts WithDrugged InteractorDrugs Available
SMAD7SMAD4, TGFβR1TGFβR1Galunisertib (Phase 2)
TCF7L2β-catenin (CTNNB1)CTNNB1PRI-724 (Phase 1)
APCCTNNB1, GSK3βGSK3βTideglusib, LiCl
GREM1BMP2/4, VEGFR2VEGFR2Bevacizumab, ramucirumab
SH2B3JAK2JAK2Ruxolitinib
CCND2CDK4/6CDK4/6Palbociclib, ribociclib, abemaciclib
AXIN2GSK3β, CK1GSK3βTideglusib
BRCA1/2PARP1PARP1Olaparib, niraparib
CDH1β-cateninCTNNB1PRI-724 (Phase 1)
SMAD4TGFβR1/R2TGFβR1Galunisertib

Section 9: Structural Data

Structure Availability Summary

CategoryCount%
PDB structures available3852%
AlphaFold only2534%
No structure1014%

Key Protein Structures

GeneUniProtPDB CountBest ResolutionAlphaFold
NOS1P2947590+1.75 ÅYes
CDH1P12830221.6 ÅYes
ERAP1Q9NZ08181.33 ÅYes
MMP2P08253142.0 ÅYes
FGFR2P21802581.8 ÅYes
CYP17A1P05093171.85 ÅYes
TERTO14746233.2 Å (cryo-EM)Yes
TCF7L2Q9NQB031.9 ÅYes
SMAD7O151057NMRYes
GREM1O6056521.9 ÅYes
BMP4P126440 directYes
PREX1Q8TCU6141.69 ÅYes
SPSB2Q9961991.23 ÅYes
LRIG1Q96JA122.3 ÅYes
SLC22A3O7575133.2 Å (cryo-EM)Yes

Undrugged Targets with Structure

GenePDB?AlphaFold?Quality
GREM1Yes (2)YesHigh — 1.9 Å crystal
SMAD7Yes (7, NMR)YesModerate — NMR only
SH2B3NoYesAlphaFold predicted
LRIG1Yes (2)YesGood — 2.3 Å
VTI1ANoYesAlphaFold predicted
CABLES2NoYesAlphaFold predicted
RHPN2NoYesAlphaFold predicted
CCND2No (1 HLA complex)YesAlphaFold predicted
PREX1Yes (14)YesExcellent — 1.69 Å
SPSB2Yes (9)YesExcellent — 1.23 Å

Section 10: Drug Target Analysis

Summary

CategoryCount%
Total unique GWAS/ClinVar genes~95 (genome-wide significant)100%
With approved drugs (Phase 4)2223%
With Phase 3/2 drugs88%
With Phase 1/preclinical compounds1213%
With ChEMBL tool compounds only1516%
NO drug development3840%

Approved Drugs from MeSH→ChEMBL (Colorectal Cancer Indication)

200+ drugs mapped to CRC via MeSH D015179. Key approved drugs:

DrugChEMBLTypePhaseMechanismTarget Gene(s)
BevacizumabCHEMBL1201583Antibody4Anti-VEGFVEGFA
CetuximabCHEMBL1201577Antibody4Anti-EGFREGFR
PanitumumabCHEMBL1201827Antibody4Anti-EGFREGFR
FluorouracilCHEMBL185Small mol4Thymidylate synthase inhTYMS
CapecitabineCHEMBL1773Small mol4Prodrug of 5-FUTYMS
RegorafenibCHEMBL1946170Small mol4Multi-kinase inhibitorVEGFR, BRAF, KIT
NivolumabCHEMBL2108738Antibody4Anti-PD-1PDCD1
IpilimumabCHEMBL1789844Antibody4Anti-CTLA-4CTLA4
TrifluridineCHEMBL1129Small mol4Thymidylate synthase inhTYMS
VemurafenibCHEMBL1229517Small mol4BRAF V600E inhibitorBRAF ✓
TrametinibCHEMBL2103875Small mol4MEK inhibitorMAP2K1 (BRAF pathway)
DasatinibCHEMBL1421Small mol4SRC/ABL inhibitorSRC ✓
SorafenibCHEMBL1336Small mol4Multi-kinase inhibitorRAF/VEGFR
NiraparibCHEMBL1094636Small mol4PARP inhibitorPARP1 (BRCA1/2 pathway)
EverolimusCHEMBL1908360Small mol4mTOR inhibitorMTOR (PIK3CA pathway)
RamucirumabCHEMBL1743062Antibody4Anti-VEGFR2KDR
TremelimumabCHEMBL2108658Antibody4Anti-CTLA-4CTLA4
SunitinibCHEMBL1567Small mol4Multi-kinase inhibitorPDGFR/VEGFR/KIT
CobimetinibCHEMBL2146883Small mol4MEK inhibitorMAP2K1
RaltitrexedCHEMBL225071Small mol4Thymidylate synthase inhTYMS

GWAS Genes with Approved Drugs

GeneProteinDrug(s)MechanismApproved for CRC?
BRAFB-Raf kinaseVemurafenib, EncorafenibBRAF inhibitorYES (V600E CRC)
FGFR2FGF receptor 2Erdafitinib, PemigatinibFGFR inhibitorNo (cholangiocarcinoma)
CYP17A1Steroid 17α-hydroxylaseAbirateroneCYP17 inhibitorNo (prostate cancer)
NOS1Neuronal NOSL-NMMA (research)NOS inhibitorNo (no approved drugs)
MMP2Gelatinase AMarimastat (failed Ph3)MMP inhibitorNo (trials failed)
TERTTelomerase RTImetelstatTelomerase inhibitorNo (MDS approved)
PIK3CAPI3K alphaAlpelisibPI3Kα inhibitorNo (breast cancer)
SRCc-Src kinaseDasatinibSRC/ABL inhibitorNo (CML)
KRASKRAS GTPaseSotorasib, AdagrasibKRAS G12C inhibitorYES (G12C CRC)
ATMATM kinaseAZD0156 (Phase 1)ATM inhibitorNo
CHEK2Checkpoint kinase 2Prexasertib (Phase 2)CHK inhibitorNo
EP300p300 acetyltransferaseCCS1477 (Phase 1)p300/CBP inhibitorNo

Section 11: Bioactivity & Enzyme Data

TOP 30 Most-Studied Proteins (ChEMBL Bioactivity)

GeneUniProtChEMBL TargetPDB StructuresAssay Evidence
NOS1P29475CHEMBL356890+ structuresExtensive — thousands of NOS inhibitors
FGFR2P21802CHEMBL414258 structuresExtensive — FGFR kinase inhibitors
MMP2P08253CHEMBL33314 structuresExtensive — MMP inhibitor class
CYP17A1P05093CHEMBL352217 structuresHigh — steroid pathway
ERAP1Q9NZ08CHEMBL593918 structuresGrowing — aminopeptidase inhibitors
CDH1P12830CHEMBL232160922 structuresModerate — ADH101 peptidomimetic
TCF7L2Q9NQB0CHEMBL30385113 structuresModerate — PPI inhibitors
BMP4P12644CHEMBL5350AlphaFoldLow — growth factor
SLC22A3O75751CHEMBL20736733 structuresLow — transporter
TERTO14746CHEMBL291623 structuresModerate — telomerase inhibitors

Enzyme GWAS Genes (Druggability Assessment)

Enzyme GeneEC/ActivityKnown InhibitorsAssessment
NOS1Oxidoreductase (NO synthase)L-NMMA, 7-nitroindazole, extensive aminoquinoline seriesHIGH — deep SAR, selective nNOS inhibitors
ERAP1M1 aminopeptidaseBestatin, phosphinic peptides (DG013/014), benzothiazinone seriesHIGH — 1.33Å co-crystal, drug-like leads
MMP2MetalloproteinaseHydroxamates, marimastat (clinical failure)MODERATE — selectivity issues
CYP17A1Cytochrome P450Abiraterone (approved), VT-464, orteronelHIGH — approved drug available
FADS2Fatty acid desaturaseSC-26196LOW — few selective inhibitors
EP300Lysine acetyltransferaseCCS1477, A-485MODERATE — emerging target

Undrugged Genes with Bioactivity Starting Points

GeneBioactivity DataAssessment
GREM1Neutralizing antibodies in developmentEmerging — antibody approaches
SMAD7Antisense oligonucleotide (mongersen) testedEmerging — oligonucleotide
SPSB2Cyclic peptide inhibitors (1.23Å co-crystals)HIGH — excellent structural data
PREX1IP4 binding studied, PDB structures availableMODERATE — PH domain targetable

Section 12: Pharmacogenomics

PharmGKB Gene Coverage

All key GWAS genes are annotated as “VIP” (Very Important Pharmacogene) in PharmGKB:

GenePharmGKB IDVIPCPIC GuidelineDrug Interactions
SMAD7PA134875286YesNoTGF-β pathway drugs
TCF7L2PA36394YesNoMetformin response (T2D), Wnt inhibitors
CDH1PA26282YesNoChemotherapy resistance
NOS1PA252YesNoNitric oxide drugs, anesthetics
SLC22A3PA330YesNoMetformin transport, platinum drugs
ERAP1PA162385163YesNoImmunotherapy response
SH2B3PA145148124YesNoJAK inhibitor response
BRCA2PA25412YesNoPARP inhibitor efficacy, platinum sensitivity
FGFR2PA28128YesNoFGFR inhibitor sensitivity
CYP17A1PA27090YesNoAbiraterone metabolism
FADS2PA27974YesNoFatty acid metabolism, statin response
GNASPA175YesNoHormone signaling drugs
TERTPA36447YesNoTelomerase inhibitor response
MMP2PA30877YesNoMMP inhibitor development

Key Clinical Annotations:

  • SLC22A3 polymorphisms affect metformin and oxaliplatin transport — relevant for CRC chemotherapy dosing
  • TCF7L2 variants associated with metformin efficacy in diabetes (potential CRC chemoprevention)
  • BRCA2 mutations predict PARP inhibitor and platinum sensitivity in CRC
  • ERAP1 variants affect immunotherapy (checkpoint inhibitor) response via antigen presentation

Section 13: Clinical Trials

Total clinical trials for CRC: 5,822 (from MONDO:0005575)

Phase Distribution (from sampled data)

PhaseEstimated Count%
Phase 4~851.5%
Phase 3~3806.5%
Phase 2~1,20020.6%
Phase 1~1,80030.9%
Other/Observational~2,35740.5%

TOP 30 Drugs in CRC Clinical Trials

DrugPhaseMechanismTarget GeneGWAS Gene?
Bevacizumab4Anti-VEGFVEGFANo (but GREM1 interacts)
Cetuximab4Anti-EGFREGFRNo
Fluorouracil4TS inhibitorTYMSNo
Capecitabine4TS inhibitorTYMSNo
Oxaliplatin4DNA crosslinkerDNAIndirect (MMR genes)
Regorafenib4Multi-kinaseVEGFR/BRAFYES (BRAF)
Nivolumab4Anti-PD1PDCD1No
Ipilimumab4Anti-CTLA4CTLA4No
Pembrolizumab4Anti-PD1PDCD1Indirect (MSI genes)
Fruquintinib4VEGFR inhibitorVEGFRNo
Encorafenib4BRAF inhibitorBRAFYES
Trastuzumab4Anti-HER2ERBB2No
Sotorasib3KRAS G12C inhKRASYES
Adagrasib3KRAS G12C inhKRASYES
Dasatinib4SRC inhibitorSRCYES
Cabozantinib4Multi-kinaseMET/VEGFRNo
Selumetinib4MEK inhibitorMAP2K1Indirect (BRAF pathway)
Trametinib4MEK inhibitorMAP2K1Indirect (BRAF pathway)
Everolimus4mTOR inhibitorMTORIndirect (PIK3CA pathway)
Niraparib4PARP inhibitorPARP1Indirect (BRCA1/2)
Ruxolitinib4JAK1/2 inhibitorJAK1/2Indirect (SH2B3)
Disitamab Vedotin4Anti-HER2 ADCERBB2No
Vemurafenib4BRAF V600E inhBRAFYES
Cobimetinib4MEK inhibitorMAP2K1Indirect
Alpelisib3PI3Kα inhibitorPIK3CAYES
Raltitrexed4TS inhibitorTYMSNo
Abiraterone4CYP17 inhibitorCYP17A1YES
Metformin4AMPK activatorPRKAB1Indirect (TCF7L2)
Celecoxib4COX-2 inhibitorPTGS2No
Sulindac4COX inhibitorPTGS1/2No

GWAS-Trial Alignment

  • Direct GWAS gene targets in trials: ~12 genes (BRAF, KRAS, PIK3CA, FGFR2, SRC, CYP17A1, TERT, ATM, CHEK2, NOS1 pathway, SH2B3 pathway, BRCA1/2 pathway)
  • % of trial drugs targeting GWAS genes directly: ~20%
  • % targeting GWAS gene pathways: ~45%
  • Assessment: MODERATE alignment — many drugs target downstream effectors rather than GWAS genes directly

Section 14: Pathway Analysis

TOP 30 Enriched Pathways (Reactome)

PathwayIDGWAS GenesDruggable Nodes
Wnt signalingR-HSA-195721APC, AXIN2, TCF7L2, CDH1, CCND2GSK3β, CTNNB1, PORCN
TGF-β signalingR-HSA-170834SMAD7, SMAD4, SMAD9, BMP4, GREM1TGFβR1/2, ALK
Signaling by BMPR-HSA-201451SMAD7, BMP4, GREM1BMPR1/2, ALK2/3/6
Downregulation of TGF-βR-HSA-2173788SMAD7TGFβR, SMURF
Beta-catenin:TCF complexR-HSA-201722TCF7L2, APC, AXIN2CTNNB1
Signaling by TCF7L2 mutantsR-HSA-5339700TCF7L2— (disease pathway)
Adherens junctionsR-HSA-418990CDH1SRC, ABL
RAS/MAPK signalingR-HSA-5684996KRAS, NRAS, BRAFMEK, ERK, RAF
PI3K/AKT signalingR-HSA-2219528PIK3CAPI3K, AKT, mTOR
Nitric oxide signalingR-HSA-392154NOS1sGC, PDE5
Class I MHC antigen processingR-HSA-983170ERAP1Proteasome, TAP
Degradation of ECMR-HSA-1474228CDH1, MMP2MMPs
Cell cycleR-HSA-69278CCND2, BUB1, TP53CDK4/6, CDK2
DNA repairR-HSA-73894MSH2, MSH6, BRCA1, BRCA2, ATM, BLMPARP, ATR, CHK1
FGFR signalingR-HSA-190236FGFR2FGFR, FRS2, GRB2
Regulation of CDH1R-HSA-9764561CDH1HDAC, miRNAs
SLC-mediated transportR-HSA-549127SLC22A3SLC transporters
Interferon gamma signalingR-HSA-877300SMAD7JAK1/2, STAT1
IGF transport/uptakeR-HSA-381426BMP4IGF1R
GLP-1 synthesisR-HSA-381771TCF7L2DPP4, GLP-1R
NOTCH signalingR-HSA-157118NOTCH4γ-secretase
Integrin interactionsR-HSA-216083CDH1, LAMA5Integrins
Apoptotic cleavageR-HSA-351906CDH1Caspases
Elastic fibresR-HSA-2129379BMP4
Telomere maintenanceR-HSA-157579TERTTelomerase
Ion homeostasisR-HSA-5578775NOS1Ion channels
RUNX3 WNT regulationR-HSA-8951430TCF7L2
PD-L1 transcriptionR-HSA-9909649TCF7L2PD-1/PD-L1
RHO GTPases/IQGAPsR-HSA-5626467CDH1RHO, RAC
WNT target repressionR-HSA-4641265TCF7L2TLE, HDAC

Pathway-Level Druggability

Even where GWAS genes themselves are undruggable, pathway members offer drug entry points:

Undrugged GWAS GenePathwayDruggable Pathway MemberDrug
SMAD7TGF-βTGFβR1 kinaseGalunisertib
TCF7L2WntPORCN, β-cateninWNT974, PRI-724
APCWntTankyraseXAV939, G007-LK
GREM1BMPBMPR1 kinase, VEGFR2LDN-193189, Bevacizumab
CCND2Cell cycleCDK4/6Palbociclib, Ribociclib
SH2B3JAK-STATJAK2Ruxolitinib
SMAD9BMPBMPR kinasesLDN-193189

Section 15: Drug Repurposing Opportunities

TOP 30 Repurposing Candidates (ranked by composite priority score)

RankDrugGene TargetApproved ForMechanismGWAS p-valuePriority Score
1PalbociclibCCND2→CDK4/6Breast cancerCDK4/6 inhibitor1e-1795
2RibociclibCCND2→CDK4/6Breast cancerCDK4/6 inhibitor1e-1794
3AbemaciclibCCND2→CDK4/6Breast cancerCDK4/6 inhibitor1e-1793
4OlaparibBRCA1/2→PARPOvarian/breastPARP inhibitor8e-1290
5RuxolitinibSH2B3→JAK2MyelofibrosisJAK1/2 inhibitor3e-1688
6AlpelisibPIK3CABreast cancerPI3Kα inhibitorSomatic driver87
7AbirateroneCYP17A1Prostate cancerCYP17 inhibitor8e-1285
8ErdafitinibFGFR2Urothelial caFGFR inhibitor8e-35 (pleiotropy)84
9DasatinibSRCCMLSRC/ABL inhibitorGenCC82
10PemigatinibFGFR2CholangiocarcinomaFGFR inhibitor8e-3581
11GalunisertibSMAD7→TGFβR1Phase 2 (HCC)TGFβR1 inhibitor3e-3080
12ImetelstatTERTMDSTelomerase inhibitor5e-2578
13NiraparibBRCA1/2→PARPOvarian cancerPARP inhibitor8e-1277
14InfigratinibFGFR2CholangiocarcinomaFGFR inhibitor8e-3576
15FutibatinibFGFR2CholangiocarcinomaFGFR inhibitor8e-3575
16TalazoparibBRCA1/2→PARPBreast cancerPARP inhibitor8e-1274
17MetforminTCF7L2 (pathway)DiabetesAMPK/Wnt3e-1572
18PacritinibSH2B3→JAK2MyelofibrosisJAK2 inhibitor3e-1670
19TideglusibAPC→GSK3βAD (Phase 2)GSK3β inhibitor2e-1268
20PrexasertibCHEK2Phase 2 (solid)CHK1/2 inhibitorClinVar67
21A-485/CCS1477EP300Phase 1p300/CBP inhibitorClinVar65
22Notch inhibitorsNOTCH4Phase 2 (various)γ-secretase inhibitor2e-0863
23SaracatinibSRCPhase 2 (various)SRC inhibitorGenCC62
24IbrutinibATM pathwayCLLBTK inhibitorClinVar60
25LDN-193189SMAD9→BMPRPreclinicalBMP type I receptor inh6e-1358
26MarimastatMMP2Failed Ph3MMP inhibitor8e-0755
27Bestatin/DG013ERAP1PreclinicalAminopeptidase inhibitor7e-0853
28WNT974TCF7L2→PORCNPhase 1/2Porcupine inhibitor3e-1552
29AZD0156ATMPhase 1ATM kinase inhibitorClinVar50
30SelumetinibBRAF→MEKNeurofibromatosisMEK inhibitorSomatic48

Section 16: Druggability Pyramid

LevelDescriptionGene Count%Key Genes
Level 1 — VALIDATEDApproved drug FOR CRC1213%BRAF (encorafenib), KRAS (sotorasib), VEGFR (regorafenib), EGFR (cetuximab), PD-1 (nivolumab)
Level 2 — REPURPOSINGApproved drug for OTHER disease1516%FGFR2 (erdafitinib), CYP17A1 (abiraterone), SRC (dasatinib), PIK3CA (alpelisib), CCND2→CDK4/6 (palbociclib), BRCA1/2→PARP (olaparib)
Level 3 — EMERGINGDrug in clinical trials1011%ATM (AZD0156), CHEK2 (prexasertib), TERT (imetelstat), EP300 (CCS1477), NOTCH4 (γ-secretase inh)
Level 4 — TOOL COMPOUNDSChEMBL compounds, no trials1516%NOS1 (selective nNOS inh), ERAP1 (phosphinic peptides), MMP2 (hydroxamates), SPSB2 (cyclic peptides)
Level 5 — DRUGGABLEDruggable family, NO88%PTPRJ (phosphatase), SLCO2A1 (transporter), PLA2G2A (phospholipase)
UNDRUGGEDcompounds
Level 6 — HARD TARGETSDifficult family or unknown3537%SMAD7, TCF7L2, APC, GREM1, SH2B3, SMAD4, MSH2, MSH6, VTI1A, LAMC1

Section 17: Undrugged Target Profiles

TOP 30 Undrugged Opportunities (ranked by druggability potential)

  1. GREM1 (Gremlin-1) — HIGH POTENTIAL
  • GWAS p-value: 9e-40 (strongest CRC GWAS signal outside 8q24)
  • Variant type: Regulatory (enhancer hijacking in colon)
  • Protein function: BMP antagonist; secreted ligand blocking BMP2/4/7
  • Family: Cystine-knot cytokine — targetable by antibodies
  • Structure: PDB 5AEJ (1.9 Å crystal), excellent for drug design
  • Expression: Aberrant expression in colonic epithelium drives tumorigenesis
  • Interactions: VEGFR2 (drugged — bevacizumab), BMP2/4 (druggable)
  • Why undrugged: Novel target; antibody approaches in early development
  • Druggability: HIGH — secreted protein, antibody-accessible, crystal structures available
  1. SMAD7 — MODERATE-HIGH POTENTIAL
  • GWAS p-value: 3e-30
  • Function: Inhibitory SMAD; blocks TGF-β signaling
  • Family: SMAD transcription factor — difficult direct targeting
  • Structure: 7 PDB entries (NMR), AlphaFold
  • Expression: Ubiquitous, high in colon
  • Interactions: TGFβR1 (drugged — galunisertib), SMURF2
  • Why undrugged: Intracellular TF, no enzymatic activity
  • Druggability: MODERATE — indirect via TGFβR1, antisense (mongersen tested in IBD)
  1. TCF7L2 — MODERATE POTENTIAL
  • GWAS p-value: 3e-15
  • Function: Wnt pathway TF; partners with β-catenin
  • Family: TCF/LEF transcription factor — difficult
  • Structure: PDB 1JDH (1.9 Å with β-catenin)
  • Interactions: β-catenin (druggable PPI), APC, AXIN2
  • Why undrugged: Transcription factor; PPI disruption challenging
  • Druggability: MODERATE — PPI with β-catenin is druggable (PRI-724, CGP049090)
  1. CCND2 (Cyclin D2) — HIGH POTENTIAL (indirect)
  • GWAS p-value: 1e-17
  • Function: G1/S cell cycle regulator; CDK4/6 partner
  • Interactions: CDK4 (drugged — palbociclib), CDK6 (drugged — ribociclib)
  • Why undrugged directly: Cyclin—protein scaffold, not enzymatic
  • Druggability: HIGH via CDK4/6 inhibitors (already approved for breast cancer)
  1. SPSB2 — HIGH POTENTIAL
  • GWAS p-value: 4e-11
  • Function: SOCS box protein, regulates iNOS
  • Structure: 9 PDB structures, best at 1.23 Å with cyclic peptide inhibitors
  • Why undrugged: Novel target, preclinical
  • Druggability: HIGH — excellent structural data, cyclic peptide leads available
  1. PREX1 — MODERATE POTENTIAL
  • GWAS p-value: 6e-15
  • Function: Rac GEF; PI3K-dependent Rac activation
  • Structure: 14 PDB entries (1.69 Å PH domain)
  • Why undrugged: GEF—difficult to inhibit
  • Druggability: MODERATE — PH domain targetable, allosteric sites identified
  1. ERAP1 — HIGH POTENTIAL
  • GWAS p-value: 7e-08
  • Function: ER aminopeptidase; trims MHC-I peptides
  • Structure: 18 PDB entries, 1.33 Å with inhibitors
  • Bioactivity: Bestatin analogs, phosphinic peptides, benzothiazinone series
  • Why undrugged: Immune modulation target, early development
  • Druggability: HIGH — metalloenzyme, deep SAR, co-crystal structures
  1. LRIG1 — MODERATE POTENTIAL
  • GWAS p-value: 1e-06
  • Function: Negative regulator of EGFR/MET/RET signaling
  • Structure: PDB 4U7L (2.3 Å)
  • Why undrugged: Tumor suppressor — upregulation rather than inhibition needed
  • Druggability: LOW for small molecules, MODERATE for gene therapy/agonist approaches
  1. SH2B3 (LNK) — MODERATE POTENTIAL (indirect)
  • GWAS p-value: 3e-16
  • Function: Adaptor protein; negative regulator of JAK2
  • Interactions: JAK2 (drugged — ruxolitinib)
  • Why undrugged: Scaffold protein
  • Druggability: HIGH indirectly via JAK2 inhibitors
  1. APC — LOW-MODERATE POTENTIAL
  • GWAS p-value: 2e-12
  • Function: Wnt pathway tumor suppressor
  • Mendelian: FAP (AD)
  • Why undrugged: Tumor suppressor (loss of function); tankyrase inhibitors target pathway
  • Druggability: MODERATE indirectly — tankyrase inhibitors (XAV939), Wnt pathway

11-30 (Summary)

RankGenep-valueFamilyStructurePotential
11SMAD96e-13SMAD TFAlphaFoldMODERATE (BMP pathway)
12VTI1A3e-12SNAREAlphaFoldLOW
13RHPN24e-23PDZ/BRO1AlphaFoldLOW-MODERATE
14CABLES22e-13Cyclin-likeAlphaFoldLOW
15NXN3e-08ThioredoxinAlphaFoldMODERATE (redox enzyme)
16FADS22e-13DesaturaseAlphaFoldMODERATE (enzyme)
17LAMC12e-16Laminin ECMAlphaFoldLOW
18TBX33e-07T-box TFAlphaFoldLOW
19MYRF9e-21TFAlphaFoldLOW
20SLCO2A12e-12SLC transporterAlphaFoldMODERATE (transporter)
21POLD34e-10DNA pol subunitAlphaFoldLOW
22ATXN23e-16RNA-bindingAlphaFoldLOW
23TFEB4e-08bHLH-Zip TFPDB (4)LOW (TF)
24ETV63e-11ETS TFPDB (40+)LOW (TF)
25MACF13e-07CytoskeletalAlphaFoldLOW
26BICC17e-08RNA-bindingAlphaFoldLOW
27CUBN7e-08ReceptorAlphaFoldLOW
28SMAD4ClinVarSMAD TFPDBLOW (tumor suppressor)
29PTPRJGenCCPhosphataseAlphaFoldMODERATE (phosphatase)
30PLA2G2AGenCCPhospholipasePDBMODERATE (enzyme)

Section 18: Summary

GWAS LANDSCAPE

  • Total associations: 2,183 across 189 studies
  • Unique genome-wide significant genes: ~95
  • Coding vs non-coding: ~4% coding / ~96% non-coding
  • Dominant loci: 8q24 (MYC enhancer, p=2e-56), SMAD7 (p=3e-30), GREM1 (p=9e-40)

GENETIC EVIDENCE

  • Tier 1 (coding) genes: ~8
  • Mendelian overlap (GenCC): 15 genes
  • ClinVar overlap: 75 genes
  • Genes with BOTH GWAS + Mendelian: APC, CDH1, SMAD4/SMAD7, BRCA2, TP53, BRAF

DRUGGABILITY

  • Overall druggability rate: 63% have some drug/compound evidence
  • Approved drugs: 23% of GWAS genes
  • In clinical trials: 11%
  • Opportunity gap (no drug development): 40% (38 genes)

PYRAMID SUMMARY

LevelCount%
L1 — Validated (approved for CRC)1213%
L2 — Repurposing (approved elsewhere)1516%
L3 — Emerging (clinical trials)1011%
L4 — Tool compounds1516%
L5 — Druggable undrugged88%
L6 — Hard targets3537%

CLINICAL TRIAL ALIGNMENT

  • ~20% of CRC trial drugs directly target GWAS genes
  • ~45% target GWAS gene pathways (indirect)
  • Gap between genetic evidence and therapeutic development remains significant

TOP 10 REPURPOSING CANDIDATES

DrugGene TargetApproved Forp-valueScore
PalbociclibCCND2→CDK4/6Breast cancer1e-1795
RibociclibCCND2→CDK4/6Breast cancer1e-1794
OlaparibBRCA1/2→PARPOvarian/breast8e-1290
RuxolitinibSH2B3→JAK2Myelofibrosis3e-1688
AlpelisibPIK3CABreast cancerSomatic87
AbirateroneCYP17A1Prostate cancer8e-1285
ErdafitinibFGFR2Urothelial ca8e-3584
DasatinibSRCCMLGenCC82
GalunisertibSMAD7→TGFβR1Phase 2 HCC3e-3080
ImetelstatTERTMDS5e-2578

TOP 10 UNDRUGGED OPPORTUNITIES

Genep-valueFamilyStructurePotential
GREM19e-40Cystine-knotPDB 1.9ÅHIGH
ERAP17e-08M1 aminopeptidasePDB 1.33ÅHIGH
SPSB24e-11SOCS boxPDB 1.23ÅHIGH
CCND21e-17CyclinAlphaFoldHIGH (indirect)
SMAD73e-30SMADPDB/NMRMODERATE-HIGH
PREX16e-15Rac GEFPDB 1.69ÅMODERATE
TCF7L23e-15TCF/LEF TFPDB 1.9ÅMODERATE
SH2B33e-16Adaptor (SH2)AlphaFoldMODERATE (indirect)
NXN3e-08ThioredoxinAlphaFoldMODERATE
PTPRJGenCCPhosphataseAlphaFoldMODERATE

TOP 10 INDIRECT OPPORTUNITIES

Undrugged GeneDrugged InteractorDrug
CCND2 ↔ CDK4/6CDK4/6Palbociclib, Ribociclib
SMAD7 ↔ TGFβR1TGFβR1Galunisertib
SH2B3 ↔ JAK2JAK2Ruxolitinib
GREM1 ↔ VEGFR2VEGFR2Bevacizumab
TCF7L2 ↔ β-cateninCTNNB1PRI-724
APC ↔ TankyraseTNKSXAV939
BRCA1/2 ↔ PARPPARP1Olaparib
SMAD4 ↔ TGFβRTGFβR1Galunisertib
AXIN2 ↔ GSK3βGSK3BTideglusib
SMAD9 ↔ BMPRBMPR1ALDN-193189

KEY INSIGHTS

  1. GREM1 is the most promising novel target — strongest GWAS signal (9e-40), secreted protein amenable to antibody targeting, crystal structure available, role in BMP pathway well-characterized. Aberrant GREM1 expression in colonic epithelium drives tumorigenesis independently of APC.

  2. CDK4/6 inhibitors (palbociclib/ribociclib) have the strongest genetic rationale for repurposing — CCND2 is one of the most significant GWAS genes (p=1e-17), CDK4/6 is its direct functional partner, and these drugs are already approved with established safety profiles.

3. The TGF-β/BMP pathway cluster is the most genetically validated pathway — SMAD7 (p=3e-30), GREM1 (p=9e-40), SMAD9 (p=6e-13), BMP4 (p=5e-10), and SMAD4 (ClinVar) all converge on this pathway. Galunisertib (TGFβR1 inhibitor) has the most direct repurposing rationale.

4. The Wnt pathway genes (APC, TCF7L2, AXIN2) lack direct drug targets but the pathway offers multiple druggable entry points: tankyrase, porcupine, and the β-catenin/TCF PPI interface.

  1. CRC has a high Mendelian-GWAS overlap — 15 GenCC genes and 75 ClinVar genes provide strong validation. The mismatch repair genes (MSH2, MSH6, MLH3) are particularly notable as they predict immunotherapy response.

  2. ERAP1 is an under-appreciated immunotherapy target — its role in MHC-I antigen trimming means ERAP1 inhibitors could enhance neoantigen presentation and boost checkpoint inhibitor efficacy. High-resolution co-crystal structures (1.33 Å) enable structure-based drug design.

  3. 40% opportunity gap — 38 significant GWAS genes have NO drug development, representing the frontier for novel target discovery. The most tractable among these are enzymes (ERAP1, NOS1, FADS2) and secreted proteins (GREM1).

  4. Comparison with other diseases: CRC has one of the highest GWAS-to-drug translation rates (~23% with approved drugs) among complex diseases, likely due to its well-characterized somatic driver landscape (KRAS, BRAF, PIK3CA). However, most approved drugs target somatic drivers rather than germline GWAS risk genes, indicating the germline signal remains largely untapped therapeutically.


Analysis performed using biobtree integrated biological database. Data sources: GWAS Catalog, ClinVar, GenCC, UniProt, ChEMBL, InterPro, Reactome, STRING, PDB, AlphaFold, PharmGKB, Bgee, CellxGene, MeSH, ClinicalTrials.gov, MONDO, EFO, OMIM, Orphanet.