# sugi.bio

> Biomedical data tools and databases for research and discovery.

## BioBTree

BioBTree is a unified biomedical graph database integrating 70+ primary databases (UniProt, Ensembl, ChEMBL, ClinVar, PDB, Reactome, GO, and more) into a single queryable system with billions of cross-reference edges.

## MCP Server (for AI Assistants)

Connect to BioBTree via MCP (Model Context Protocol):

```json
{
  "mcpServers": {
    "biobtree": {
      "type": "http",
      "url": "https://sugi.bio/biobtree/mcp"
    }
  }
}
```

### Available Tools

#### biobtree_search

Search 70+ biological databases.

**Syntax:** `biobtree_search(terms="entity", dataset="optional_filter")`

**Workflow:**

1. Search WITHOUT a dataset filter first (to discover where the entity exists)
2. Use the IDs from the results with biobtree_map

**ID Type Recognition:**

- ENSG* → ensembl
- P*/Q*/O* → uniprot
- CHEMBL* → chembl_molecule
- GO:* → go
- MONDO:* → mondo
- HP:* → hpo
- Gene symbols → hgnc

**Returns:** id | dataset | name | xref_count

#### biobtree_map

Map identifiers between databases.

**Syntax:** `biobtree_map(terms="ID", chain=">>source>>target")`

- Chain MUST start with `>>`
- Source MUST match input ID type

**Discovery approach:**

1. Use biobtree_entry to see xrefs (what's connected)
2. Check dataset edges to see where each dataset leads
3. Build chains based on what connections exist

**Returns:** mapped identifiers with dataset and name

#### biobtree_entry

Get full details for one identifier.

**Syntax:** `biobtree_entry(identifier="ID", dataset="dataset_name")`

**Use for:**

- See all attributes of an entry
- Discover filterable fields
- Get detailed info (sequences, scores, descriptions)
- Discover connections via xrefs

**Returns:** All attributes + xref counts to connected datasets

### Example Queries

Once connected, you can answer questions like:

- "What proteins does BRCA1 encode?"
- "Find drugs that target TP53"
- "What pathways involve P04637?"
- "Show me pathogenic variants in SCN9A"
- "What tissues express CFTR most highly?"
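
The ID type recognition list and the two chain rules for biobtree_map (the chain starts with `>>`, and the source matches the input ID type) can be sketched in a few lines of Python. This is an illustrative helper, not part of BioBTree: the function names `guess_dataset`, `build_chain`, and `map_arguments` are hypothetical, and the regular expressions are assumptions layered on the prefix list above. In particular, the UniProt rule is narrowed to a six-character accession shape so that gene symbols starting with P, Q, or O (for example PTEN) still fall through to hgnc.

```python
import re

# Prefix rules follow the "ID Type Recognition" list; the exact regexes are
# assumptions made for this sketch, not patterns published by BioBTree.
_RULES = [
    (re.compile(r"^ENSG\d+(\.\d+)?$"), "ensembl"),
    # Assumed shape: O/P/Q, digit, three alphanumerics, digit (e.g. P04637).
    (re.compile(r"^[OPQ][0-9][A-Z0-9]{3}[0-9]$"), "uniprot"),
    (re.compile(r"^CHEMBL\d+$"), "chembl_molecule"),
    (re.compile(r"^GO:\d+$"), "go"),
    (re.compile(r"^MONDO:\d+$"), "mondo"),
    (re.compile(r"^HP:\d+$"), "hpo"),
]


def guess_dataset(identifier: str) -> str:
    """Return the dataset name suggested by the identifier's shape."""
    for pattern, dataset in _RULES:
        if pattern.match(identifier):
            return dataset
    # Anything unrecognised is treated as a gene symbol.
    return "hgnc"


def build_chain(source: str, *targets: str) -> str:
    """Join dataset names into a chain that starts with '>>'."""
    if not targets:
        raise ValueError("a chain needs at least one target dataset")
    return ">>" + ">>".join((source, *targets))


def map_arguments(identifier: str, *targets: str) -> dict:
    """Build biobtree_map arguments whose source matches the input ID type."""
    source = guess_dataset(identifier)
    return {"terms": identifier, "chain": build_chain(source, *targets)}


if __name__ == "__main__":
    print(map_arguments("P04637", "reactome"))
    print(map_arguments("ENSG00000141510", "uniprot", "pdb"))
```

Guessing from the identifier shape only complements the documented workflow: an unfiltered biobtree_search remains the reliable way to confirm which dataset an identifier actually belongs to.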
## REST API

Base URL: https://sugi.bio/biobtree/api/

### Endpoints

**Search** - Find identifiers across databases

```
GET /api/search?i={terms}&s={dataset}
```

Example: `https://sugi.bio/biobtree/api/search?i=BRCA1`

**Map** - Chain queries across databases

```
GET /api/map?i={terms}&m={chain}
```

Example: `https://sugi.bio/biobtree/api/map?i=TP53&m=>>ensembl>>uniprot`

**Entry** - Get full entry details

```
GET /api/entry?i={identifier}&s={dataset}
```

Example: `https://sugi.bio/biobtree/api/entry?i=P04637&s=uniprot`

**Help** - Get schema reference

```
GET /api/help?topic={topic}
```

Topics: `edges`, `filters`, `patterns`, `examples`, `all`

### Query Patterns

**Drug for Disease/Condition:**

- Search disease → mondo → clinical_trials → chembl_molecule
- Or: `>>mesh>>chembl_molecule` (MeSH disease → drugs with indications)
- Or: `>>mondo>>clinical_trials>>chembl_molecule`

**Drug Targets:**

- `>>chembl_molecule>>chembl_target>>uniprot` (mechanism-level)
- `>>pubchem>>pubchem_activity>>uniprot` (bioactivity data)
- `>>gtopdb_ligand>>gtopdb_interaction>>gtopdb>>uniprot` (curated pharmacology with affinity data)
- Filter approved: `>>chembl_molecule[highestDevelopmentPhase==4]`

**Disease Genes:**

- `>>mondo>>gencc>>hgnc` (curated associations)
- `>>mondo>>clinvar>>hgnc` (variant-based)

**Gene/Protein Function:**

- `>>ensembl>>go` (gene ontology)
- `>>uniprot>>reactome` (pathways)

**Expression Data:**

- `>>ensembl>>bgee` (tissue expression)

**Protein Structures:**

- `>>uniprot>>pdb`

**Clinical Variants:**

- `>>hgnc>>clinvar`
- Filter pathogenic: `>>clinvar[germline_classification=="Pathogenic"]`

### Common Mapping Chains

- `>>ensembl>>uniprot` - Gene to proteins
- `>>uniprot>>pdb` - Protein to structures
- `>>hgnc>>clinvar` - Gene to clinical variants
- `>>chembl_molecule>>chembl_target>>uniprot` - Drug to protein targets
- `>>ensembl>>reactome` - Gene to pathways
- `>>ensembl>>bgee` - Gene to expression data

### Filter Syntax

```
>>dataset[field operator value]

Operators: ==, !=, >, <, >=, <=, .contains()
Logical: &&, ||, !

Examples:
>>chembl_molecule[highestDevelopmentPhase==4]       # approved drugs
>>clinvar[germline_classification=="Pathogenic"]    # pathogenic variants
>>go[type=="biological_process"]                    # BP terms only
>>gtopdb[type=="gpcr"]                              # GPCR targets
>>gtopdb_ligand[approved==true]                     # approved drugs only
```

### Dataset Edges (what connects to what)

```
ensembl: uniprot, go, transcript, hgnc, entrez, refseq, bgee, gwas, gencc
hgnc: ensembl, uniprot, entrez, gencc, clinvar, mim, refseq, gwas, dbsnp, hpo
uniprot: ensembl, alphafold, interpro, pdb, intact, string, chembl_target, go, reactome
chembl_molecule: mesh, chembl_activity, chembl_target, pubchem, chebi, clinical_trials
chembl_target: chembl_assay, uniprot, chembl_molecule
pubchem: chembl_molecule, chebi, hmdb, pubchem_activity, pubmed, bindingdb, ctd, pharmgkb
clinvar: hgnc, mondo, hpo, dbsnp, orphanet
mondo: gencc, clinvar, efo, mesh, hpo, clinical_trials, orphanet
gencc: mondo, hpo, hgnc, ensembl
clinical_trials: mondo, chembl_molecule
go: ensembl, uniprot, reactome, msigdb, bgee, interpro
hpo: clinvar, gencc, mondo, msigdb, orphanet, mim, hmdb, hgnc
reactome: ensembl, uniprot, chebi, go
pdb: uniprot, go, interpro, taxonomy
bgee: ensembl, uberon, cl, taxonomy
string: uniprot, string_interaction
orphanet: hpo, uniprot, mondo, hgnc, clinvar, mim
pharmgkb: hgnc, dbsnp, mesh, pharmgkb_gene, pharmgkb_variant
gtopdb: uniprot, hgnc, gtopdb_ligand, gtopdb_interaction            # GPCRs, ion channels, enzymes
gtopdb_ligand: pubchem, chebi, chembl_molecule, gtopdb_interaction  # ligands with binding data
gtopdb_interaction: gtopdb, gtopdb_ligand, pubmed                   # target-ligand affinity values
```

Full edge list available via: `GET /api/help?topic=edges`

## Q&A Content (/biobtree/)

Pre-generated answers to biological questions: https://sugi.bio/biobtree/

Each answer includes:

- Source datasets used
- API calls for reproducibility
- HTML tables with structured data

## Key Topics

- Gene function and identifiers (HGNC, Ensembl, UniProt)
- Drug mechanisms and targets (ChEMBL, PubChem)
- Disease associations (ClinVar, OMIM, Orphanet)
- Protein structures and interactions (PDB, STRING)
- Pathway analysis (Reactome, GO)
- Expression profiles (Bgee)
- Clinical variants (ClinVar, PharmGKB)

## Citation

When referencing BioBTree, please cite:

- Repository: https://github.com/tamerh/biobtree
- Preprint: https://zenodo.org/records/18962899

## Contact

GitHub: https://github.com/tamerh/biobtree
Email: tamer.gur07@gmail.com