Research

The science behind Provotics.

Provotics reads cancer's molecular signature from gene expression. The work is about doing that accurately, calibrating the uncertainty honestly, and keeping every call traceable to biology.

Approach

How it works under the hood

Transcriptome-wide signal

Each tumor carries roughly 18,000 gene measurements. Provotics learns the patterns that distinguish tissues of origin and molecular states from that full expression profile.

Ensemble modeling

Multiple models score each profile and are combined, which is more robust than any single classifier on noisy, correlated, high-dimensional data.
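To make the combining step concrete, here is a minimal sketch that assumes the simplest rule: average each model's per-class probabilities (soft voting). The page does not say how Provotics combines its models, so the function name, the three example models, and the three-class layout are all hypothetical.

```python
def average_probabilities(model_probs):
    """Soft voting: average per-class probabilities across models.

    model_probs is a list with one probability vector per model,
    all vectors covering the same classes in the same order.
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    return [
        sum(probs[c] for probs in model_probs) / n_models
        for c in range(n_classes)
    ]


# Three hypothetical models scoring one profile over three hypothetical sites.
model_probs = [
    [0.70, 0.20, 0.10],
    [0.50, 0.40, 0.10],
    [0.60, 0.10, 0.30],
]

combined = average_probabilities(model_probs)
predicted = max(range(len(combined)), key=combined.__getitem__)
```

Averaging tends to cancel the idiosyncratic errors of individual models, which is one common reason an ensemble holds up better on noisy data than its members do.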

Calibration

Probabilities are temperature-scaled so that a stated 80% is right roughly 80% of the time in practice. That is the difference between a number you can act on and one you can't.
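Temperature scaling divides a model's logits by a single scalar T before the softmax, with T chosen to minimize negative log-likelihood on validation data. The sketch below is a toy illustration under stated assumptions: the grid search, the helper names, and the four validation examples are hypothetical and are not taken from the Provotics pipeline.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels."""
    losses = [
        -math.log(softmax(row, temperature)[label])
        for row, label in zip(logit_rows, labels)
    ]
    return sum(losses) / len(losses)


def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature on a grid that minimizes validation NLL."""
    # Assumed search range of 0.5 to 5.0 in steps of 0.05.
    grid = grid or [0.5 + 0.05 * i for i in range(91)]
    return min(grid, key=lambda t: nll(logit_rows, labels, t))


# Toy validation set: very confident logits, but one of four calls is wrong.
val_logits = [
    [4.0, 0.0, 0.0],
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 4.0],
    [4.0, 0.0, 0.0],
]
val_labels = [0, 1, 2, 1]

T = fit_temperature(val_logits, val_labels)
before = max(softmax(val_logits[0]))
after = max(softmax(val_logits[0], T))
```

On this toy set the fitted temperature is above 1, and the top probability falls from about 0.96 to about 0.75, in line with the three-in-four accuracy. Dividing by a positive T never changes which class ranks first, so accuracy is untouched; only the stated confidence moves.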

Novelty detection

Out-of-distribution profiles are flagged, and the model abstains rather than forcing them into a label, so a likely wrong call is caught instead of reported.
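One simple way to picture abstention is a confidence threshold: if the top calibrated probability is too low, return no label. This is a common baseline and a stand-in only; the page does not describe the detector Provotics actually uses, and the function name, the 0.5 threshold, and the site labels below are hypothetical.

```python
def call_or_abstain(probs, labels, threshold=0.5):
    """Return the top label, or None when confidence is below the threshold."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return None  # abstain: flag the profile instead of forcing a label
    return labels[best]


sites = ["lung", "breast", "colon"]  # hypothetical site labels

confident_call = call_or_abstain([0.90, 0.05, 0.05], sites)
uncertain_call = call_or_abstain([0.40, 0.35, 0.25], sites)
```

Here the first profile gets a label and the second comes back as None. A rule like this only makes sense on calibrated probabilities, since an overconfident model would clear the threshold even on samples it has never seen the like of.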

Validation

Measured on tumors it never trained on

These numbers come from held-out tumors and an independent cohort, not the training set.

92.6% accuracy on 1,724 held-out tumors

0.96 site accuracy on an independent cohort (n=381)

0.14 → 0.03 calibration error after temperature scaling

0.87 macro-F1 across 25 sites
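For readers who want to see how two of these metrics are typically computed, here is a minimal sketch. It assumes "calibration error" means expected calibration error (ECE) with equal-width confidence bins, which the page does not confirm, and it uses the standard unweighted definition of macro-F1. The toy inputs are hypothetical and unrelated to the reported figures.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: size-weighted gap between accuracy and mean confidence per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: four calls, three correct.
correct = [True, True, True, False]
ece_overconfident = expected_calibration_error([0.95] * 4, correct)
ece_calibrated = expected_calibration_error([0.75] * 4, correct)

y_true = ["lung", "lung", "lung", "breast"]
y_pred = ["lung", "lung", "lung", "lung"]
toy_macro_f1 = macro_f1(y_true, y_pred)
```

In the toy data, stating 95% confidence at 75% accuracy gives an ECE of 0.20, while stating 75% gives zero. The macro-F1 example shows why that metric is reported alongside accuracy: missing the single rare-class sample leaves accuracy at 75% but pulls macro-F1 down to about 0.43.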

Honesty

What it can't do yet

On fully independent, out-of-distribution cohorts, performance falls below the held-out figures; this is the generalization gap that honest external validation always reveals. Some rare sites remain weak, and the model is deliberately built to abstain on samples it doesn't recognize rather than guess.

Provotics is a research and educational project. It is not a medical device, and its outputs are research hypotheses, not clinical decisions. Read more on the Safety page.

Work with the science

Open to research-lab and biotech collaborations. Request access to evaluate the model under our access agreement.

Request access