Clinical Report
A clinical report is a single, traceable piece of clinical evidence that links a drug to a disease or safety outcome. Conceptually, each clinical report answers four core questions:
Which drug is this evidence about?
Which disease or safety outcome is it linked to?
Where does this evidence come from?
What development or regulatory status does the source describe?
Clinical evidence is heterogeneous in format and origin, so a clinical report is not limited to a single source type. Depending on the source, a report may represent a registered clinical trial record, a regulatory medicine or approval record, a curated indication reference, or a curated warning or withdrawal-related record. Across all source types, the core requirement is the same: the record must support a clinically meaningful link between a drug and a disease (or safety outcome) with identifiable provenance.
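As a rough illustration, the four core questions can be pictured as fields on a small record type. This is a minimal sketch only: the class name, field names, and the placeholder identifier are assumptions made for this example and do not reflect the pipeline's actual schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ClinicalReport:
    """Illustrative record; all field names here are hypothetical."""

    drugs: Tuple[str, ...]        # which drug(s) the evidence is about
    diseases: Tuple[str, ...]     # linked disease(s) or safety outcome(s)
    source: str                   # where the evidence comes from
    source_record_id: str         # identifier that makes the report traceable
    stage: Optional[str] = None   # status described by the source, if any

    def has_provenance(self) -> bool:
        # Traceable only if both the source and its record ID are present.
        return bool(self.source and self.source_record_id)


report = ClinicalReport(
    drugs=("example-drug",),
    diseases=("example-disease",),
    source="ClinicalTrials.gov via AACT",
    source_record_id="NCT00000000",  # placeholder, not a real trial
    stage="Phase II",
)
print(report.has_provenance())
```

Using tuples for drugs and diseases reflects that some source records cover more than one of each.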
Clinical reports are later combined into aggregated drug-disease or drug-target views.
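The aggregation step can be sketched as a simple grouping of reports by drug/disease pair. The tuple layout, function name, and example values below are hypothetical; the real pipeline's aggregation logic is not shown on this page and may differ.

```python
from collections import defaultdict

# Hypothetical flattened reports as (drug, disease, source) tuples.
reports = [
    ("drug-a", "disease-x", "ClinicalTrials.gov via AACT"),
    ("drug-a", "disease-x", "EMA Human Medicines"),
    ("drug-a", "disease-y", "ChEMBL curated indications"),
]


def aggregate_by_drug_disease(rows):
    """Collect the sources that support each (drug, disease) pair."""
    grouped = defaultdict(list)
    for drug, disease, source in rows:
        grouped[(drug, disease)].append(source)
    return dict(grouped)


views = aggregate_by_drug_disease(reports)
for pair, sources in views.items():
    print(pair, sources)
```

A drug-target view would additionally need a drug-to-target mapping, which is outside the scope of this sketch.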
The clinical reports and all downstream datasets described on this page are generated by the Clinical Mining Pipeline. The pipeline is modular and designed to be extended to additional clinical data sources. The code is openly available on GitHub.
Data sources used to extract clinical reports
We currently integrate six source families. They are complementary: each source contributes a different evidence profile and helps provide a more complete picture of the journey from discovery to market.
ClinicalTrials.gov via AACT
Description: Structured registry records of clinical studies, including interventions and studied conditions.
Report unit: A single trial record (one NCT ID), which may involve multiple drugs and multiple disease conditions.
ChEMBL curated indications
Description: Curated indication records, linked to external references (for example FDA, EMA, ATC, DailyMed, INN, USAN).
Report unit: A single indication record, which may correspond to either a curated drug/indication pair (one drug, one disease) or a DailyMed medicine reference (one label, potentially covering multiple drugs and diseases).
ChEMBL drug warnings
Description: Curated warning and withdrawal-oriented records linked to drugs.
Report unit: A single warning record associated with a drug (either a black box warning or a withdrawal).
Therapeutic Target Database (TTD)
Description: Curated drug and disease information from the Therapeutic Target Database.
Report unit: A single drug/indication pair.
EMA Human Medicines
Description: Regulatory evidence for authorised human medicines and therapeutic-use context in Europe.
Report unit: A single medicine label, which may cover one or more active ingredients and one or more approved indications.
PMDA approvals
Description: Public approvals information from Japan's Pharmaceuticals and Medical Devices Agency.
Report unit: A single drug/indication pair.
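Because some sources produce records covering several drugs and several diseases, one way to reason about such a record is to enumerate the drug/disease combinations it names. The sketch below is only an illustration of that idea; the function name is hypothetical, and how the pipeline actually treats multi-drug or multi-disease records is not described here.

```python
from itertools import product


def candidate_pairs(drugs, diseases):
    """Return every drug/disease combination named in a single report."""
    # Deduplicate inputs, then sort so the output order is deterministic.
    return sorted(product(set(drugs), set(diseases)))


# A trial-style record naming two drugs and one condition.
pairs = candidate_pairs(["drug-a", "drug-b"], ["disease-x"])
print(pairs)
```

A record naming exactly one drug and one disease yields a single pair, which is the simplest case to interpret.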
Clinical stage categories
Each source reports development or regulatory status using its own terms. To support cross-source analysis, Open Targets harmonises source-reported values into a shared clinical-stage framework.
This harmonisation has two goals:
make stage labels comparable across heterogeneous sources;
support consistent ranking of evidence in downstream clinical precedence views.
The harmonised framework includes the following categories:
Withdrawal: Evidence that a medicine was withdrawn, revoked, lapsed, suspended, or otherwise removed from use.
Approval: Evidence of marketing authorisation or approved status from regulatory or authoritative sources.
Phase IV: Post-marketing interventional evidence from trials conducted after regulatory approval.
Preapproval: Late-stage regulatory submission evidence before full marketing authorisation, including submitted applications, formal opinions, and equivalent pre-market review stages.
Phase III: Late-stage clinical development evidence (Phase III).
Phase II/III: Evidence from trials spanning or bridging mid- to late-stage clinical development.
Phase II: Mid-stage clinical development evidence (Phase II, including subphases such as IIa/IIb).
Phase I/II: Evidence from trials spanning or bridging early to mid-stage clinical development.
Phase I: Early human clinical development evidence (Phase I, including subphases such as Ib).
Early Phase I: Exploratory early human studies conducted prior to standard Phase I, typically with a limited number of participants and a primary focus on safety or pharmacokinetics.
IND: Investigational New Drug application or equivalent regulatory filing authorising first-in-human studies; the drug has not yet entered clinical trials.
Preclinical: Evidence reported as preclinical or patented/preclinical development.
Unknown: Source value is missing, ambiguous, not mappable, or not directly comparable to standard stage labels.
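The harmonisation idea can be sketched as a lookup from source-reported values to a shared, ordered set of stages. Everything specific in this sketch is an assumption: the numeric ranks simply mirror the listing order of the categories (reversed, so a larger number means a later stage), and the source strings in the mapping are illustrative rather than the vocabularies the pipeline actually handles.

```python
from enum import IntEnum


class ClinicalStage(IntEnum):
    # Ranks are an assumption based on the order in which categories are listed.
    UNKNOWN = 0
    PRECLINICAL = 1
    IND = 2
    EARLY_PHASE_1 = 3
    PHASE_1 = 4
    PHASE_1_2 = 5
    PHASE_2 = 6
    PHASE_2_3 = 7
    PHASE_3 = 8
    PREAPPROVAL = 9
    PHASE_4 = 10
    APPROVAL = 11
    WITHDRAWAL = 12


# Hypothetical source-reported values; real sources use their own terms.
SOURCE_VALUE_TO_STAGE = {
    "early phase 1": ClinicalStage.EARLY_PHASE_1,
    "phase 1": ClinicalStage.PHASE_1,
    "phase 1/phase 2": ClinicalStage.PHASE_1_2,
    "phase 2": ClinicalStage.PHASE_2,
    "phase 2/phase 3": ClinicalStage.PHASE_2_3,
    "phase 3": ClinicalStage.PHASE_3,
    "phase 4": ClinicalStage.PHASE_4,
    "approved": ClinicalStage.APPROVAL,
    "withdrawn": ClinicalStage.WITHDRAWAL,
    "preclinical": ClinicalStage.PRECLINICAL,
}


def harmonise(source_value):
    """Map a source-reported status to a shared stage, defaulting to UNKNOWN."""
    if source_value is None:
        return ClinicalStage.UNKNOWN
    key = source_value.strip().lower()
    return SOURCE_VALUE_TO_STAGE.get(key, ClinicalStage.UNKNOWN)


# With comparable stages, picking the most advanced one is a plain max().
most_advanced = max(harmonise("Phase 1"), harmonise("Phase 3"))
print(most_advanced.name)
```

An ordered enumeration makes cross-source comparison a matter of comparing integers, and missing or unmappable values fall into Unknown instead of raising an error.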
Reason to stop categories
For clinical reports derived from clinical trials, we integrate a machine learning-based classification of the reasons why a trial ended earlier than expected. The model was trained on free-text stop reasons from 28,842 stopped trials on ClinicalTrials.gov and classifies them into 17 categories covering negative, neutral, and positive reasons for stoppage. The model is available on Hugging Face.
The 17 classes are: Another Study, Business or Administrative, Negative, Study Design, Invalid Reason, Ethical Reason, Insufficient Data, Insufficient Enrolment, Study Staff Moved, Endpoint Met, Regulatory, Logistics or Resources, Safety and Side Effects, No Context, Success, Interim Analysis, and Covid 19.
This classification is used to down-weight evidence from trials that stopped for negative reasons in downstream scoring.
Reference: Razuvayevskaya et al., Nature Genetics, 2024
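The down-weighting described above could look roughly like the following. This is a sketch under stated assumptions: which predicted classes count as negative and the 0.5 penalty factor are illustrative choices for this example, not values taken from the pipeline or the publication.

```python
# Assumption: classes treated as negative for the purpose of this example.
NEGATIVE_STOP_REASONS = {"Negative", "Safety and Side Effects"}

# Assumption: illustrative penalty factor, not the pipeline's actual weight.
NEGATIVE_WEIGHT = 0.5


def adjusted_score(base_score, stop_reasons):
    """Reduce the score if any predicted stop reason is in the negative set."""
    if NEGATIVE_STOP_REASONS.intersection(stop_reasons):
        return base_score * NEGATIVE_WEIGHT
    return base_score


print(adjusted_score(1.0, ["Safety and Side Effects"]))
print(adjusted_score(1.0, ["Covid 19"]))
```

The function accepts a list of reasons because a single free-text stop reason may plausibly be assigned more than one class.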
Quality filters
Not all clinical reports are used to derive downstream datasets. Before aggregation into clinical indications, clinical targets, or target/disease evidence, each report is evaluated against a set of quality criteria. Reports that fail any of these checks are flagged and excluded from downstream use.
Three exclusion criteria are currently applied:
Phase IV reports whose indication is not supported by any approval record.
Reports whose indication cannot be validated by a high-confidence report, that is, a manually curated report, an approved indication, or a report linking one drug to a single indication.
Clinical trial reports in which the primary purpose of the study is not to directly measure the effect of the intervention on the target condition or its symptomatology.
Flagged reports remain accessible at the clinical report level but do not contribute to clinical indications, clinical targets, or target/disease evidence.
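The three exclusion criteria can be sketched as a function that returns the flags raised for a report. The field names, flag names, and the choice to look up approvals per drug/disease pair are assumptions made for illustration; the pipeline's real checks may be defined differently.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical, simplified report used only for this sketch."""

    drug: str
    disease: str
    stage: str
    is_trial: bool = False
    measures_treatment_effect: bool = True      # assumed precomputed
    validated_by_high_confidence: bool = True   # assumed precomputed


def quality_flags(report, approved_pairs):
    """Return the list of exclusion flags raised for one report."""
    flags = []
    # Criterion 1: Phase IV evidence with no approval for the same pair.
    if report.stage == "Phase IV" and (report.drug, report.disease) not in approved_pairs:
        flags.append("phase_iv_without_approval")
    # Criterion 2: indication not backed by any high-confidence report.
    if not report.validated_by_high_confidence:
        flags.append("unvalidated_indication")
    # Criterion 3: trial whose primary purpose is not to measure the effect.
    if report.is_trial and not report.measures_treatment_effect:
        flags.append("non_therapeutic_purpose")
    return flags


example = Report("drug-a", "disease-x", "Phase IV", is_trial=True)
flags = quality_flags(example, approved_pairs=set())
print(flags, "usable downstream:", not flags)
```

Returning flags instead of dropping the report mirrors the behaviour described above: flagged reports stay visible at the report level and are only excluded from downstream datasets.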