About Sugi Predict

Predicted protein targets for patent chemistry, open and reproducible.

Sugi Predict is an open, reproducible system that predicts the human protein targets of a small molecule and serves those predictions at the scale of the patent record. Every drug-like compound in SureChEMBL (roughly 30 million compounds) carries its predicted targets and its patent provenance, with targets grounded to stable gene and protein identifiers through BioBTree.

Predictions follow an established principle: chemically similar molecules tend to share targets. A compound's chemical fingerprint is compared against a reference set of known ChEMBL ligands, and the targets of its nearest neighbours are carried over. Each carried-over target receives a confidence calibrated to the accuracy observed at that similarity, together with a count of how many neighbours support it. A molecule with no close analogue is flagged as novel rather than given a low-quality guess.
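The nearest-neighbour transfer described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Sugi Predict implementation: fingerprints are modelled as plain sets of on-bit indices compared by Tanimoto similarity, and the calibration table, novelty cutoff and neighbour count `K` are hypothetical placeholders rather than values from the preprint. Here each target's confidence comes from its most similar supporting neighbour; the actual method may aggregate neighbours differently.

```python
from dataclasses import dataclass, field

# Hypothetical calibration table: (minimum Tanimoto similarity, observed precision).
# Placeholder values for illustration only, not figures from the preprint.
CALIBRATION = [(0.8, 0.9), (0.6, 0.7), (0.4, 0.4)]
NOVELTY_CUTOFF = 0.4  # hypothetical: below this, no neighbour counts as a close analogue
K = 5                 # hypothetical number of nearest neighbours considered


def tanimoto(a: frozenset[int], b: frozenset[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def calibrated_confidence(sim: float) -> float:
    """Map a similarity to the precision of the first calibration bin it reaches."""
    for threshold, precision in CALIBRATION:
        if sim >= threshold:
            return precision
    return 0.0


@dataclass
class Prediction:
    novel: bool
    # target -> (calibrated confidence, number of supporting neighbours)
    targets: dict[str, tuple[float, int]] = field(default_factory=dict)


def predict(query: frozenset[int], reference, k: int = K) -> Prediction:
    """reference: list of (fingerprint, set of target IDs) for known ligands."""
    # Rank every reference ligand by similarity to the query, most similar first.
    scored = sorted(
        ((tanimoto(query, fp), targets) for fp, targets in reference),
        key=lambda pair: pair[0],
        reverse=True,
    )
    # Keep the top-k neighbours that clear the novelty cutoff.
    neighbours = [(s, t) for s, t in scored[:k] if s >= NOVELTY_CUTOFF]
    if not neighbours:
        # No close analogue: flag as novel instead of guessing.
        return Prediction(novel=True)
    best: dict[str, float] = {}
    support: dict[str, int] = {}
    for sim, targets in neighbours:
        for target in targets:
            best[target] = max(best.get(target, 0.0), sim)
            support[target] = support.get(target, 0) + 1
    return Prediction(
        novel=False,
        targets={t: (calibrated_confidence(best[t]), support[t]) for t in best},
    )


# Usage with a toy reference of three ligands (fingerprints and targets invented).
reference = [
    (frozenset({1, 2, 3, 4}), {"EGFR"}),
    (frozenset({1, 2, 3, 5}), {"EGFR", "ERBB2"}),
    (frozenset({7, 8, 9}), {"DRD2"}),
]
print(predict(frozenset({1, 2, 3, 4}), reference).targets)
# {'EGFR': (0.9, 2), 'ERBB2': (0.7, 1)}
```

In a real pipeline the fingerprints would come from a cheminformatics toolkit such as RDKit and the calibration table from validation data; both are omitted here to keep the sketch self-contained.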

One index answers two questions: which targets a compound is predicted to hit and, by inverting the same search, which patented compounds are predicted to act on a given target. Each surfaced compound resolves to its SureChEMBL provenance: the patents that disclose or claim it, their assignees, and their dates. Clinical trials, related proteins, and patent text are held alongside as supporting context; they are used for retrieval, not for prediction.
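The two query directions can be illustrated with a toy inverted index. This sketch assumes predictions are already stored as a compound → {target: confidence} map and simply inverts that map into target → compounds, then joins each hit to its patent provenance; the way the real system inverts its search may differ. `PatentRecord`, `invert` and `compounds_for_target` are hypothetical names, and the compound IDs, patent numbers, assignee and date are invented placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PatentRecord:
    # Hypothetical provenance fields, mirroring the text: patent, assignee, date.
    patent: str
    assignee: str
    date: str  # ISO 8601


def invert(forward: dict[str, dict[str, float]]) -> dict[str, list[tuple[str, float]]]:
    """Turn compound -> {target: confidence} into target -> [(compound, confidence)],
    with each target's hits sorted by descending confidence."""
    inverted: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for compound, targets in forward.items():
        for target, confidence in targets.items():
            inverted[target].append((compound, confidence))
    for hits in inverted.values():
        hits.sort(key=lambda hit: hit[1], reverse=True)
    return dict(inverted)


def compounds_for_target(target, inverted, provenance, min_confidence=0.0):
    """Return (compound, confidence, patent records) for every compound predicted
    against `target` at or above `min_confidence`."""
    return [
        (compound, confidence, provenance.get(compound, []))
        for compound, confidence in inverted.get(target, [])
        if confidence >= min_confidence
    ]


# Usage with invented placeholder data.
forward = {"cpd-A": {"EGFR": 0.9, "ERBB2": 0.7}, "cpd-B": {"EGFR": 0.4}}
provenance = {"cpd-A": [PatentRecord("PATENT-1", "Assignee X", "2015-03-01")]}
inverted = invert(forward)
print(compounds_for_target("EGFR", inverted, provenance, min_confidence=0.5))
# [('cpd-A', 0.9, [PatentRecord(patent='PATENT-1', assignee='Assignee X', date='2015-03-01')])]
```

The forward direction is a plain lookup in `forward`; the inverted map makes the reverse direction an equally cheap lookup, at the cost of storing each prediction twice.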

Sugi Predict is the predictive counterpart to the deterministic Sugi Atlas: where the Atlas catalogues what is known, Sugi Predict surfaces what is plausible but not yet curated. The full method and validation are described in the preprint. Sugi Predict is part of the broader sugi.bio project; for questions or issues, get in touch.