🆕Clinical Report

A clinical report is a single, traceable piece of clinical evidence that links a drug to a disease or safety outcome. Conceptually, each clinical report answers four core questions:

  • Which drug is this evidence about?

  • Which disease or safety outcome is it linked to?

  • Where does this evidence come from?

  • What development or regulatory status does the source describe?

Clinical evidence is heterogeneous in format and origin, so a clinical report is not limited to a single source type. Depending on the source, a report may represent a registered clinical trial record, a regulatory medicine or approval record, a curated indication reference, or a curated warning or withdrawal-related record. Across all source types, the core requirement is the same: the record must support a clinically meaningful link between a drug and a disease (or safety outcome) with identifiable provenance.
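As a rough illustration, the four questions can be pictured as fields on a small record type. This is a minimal sketch only: the class name, field names, and source-type labels below are hypothetical and are not the schema used by the pipeline.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical labels paraphrasing the source types described above.
SOURCE_TYPES = {
    "clinical_trial",
    "regulatory_record",
    "curated_indication",
    "curated_warning",
}


@dataclass(frozen=True)
class ClinicalReport:
    """Illustrative record; field names are assumptions, not the real schema."""

    drug: str                              # which drug the evidence is about
    outcome: str                           # linked disease or safety outcome
    source_type: str                       # what kind of source it comes from
    source_id: str                         # identifier that keeps it traceable
    reported_status: Optional[str] = None  # status as described by the source

    def __post_init__(self):
        # A report needs a known source type and identifiable provenance.
        if self.source_type not in SOURCE_TYPES:
            raise ValueError(f"unknown source type: {self.source_type}")
        if not (self.drug and self.outcome and self.source_id):
            raise ValueError("drug, outcome and source_id are required")


# Placeholder values, not real data.
example = ClinicalReport(
    drug="DRUG_A",
    outcome="DISEASE_X",
    source_type="clinical_trial",
    source_id="EXAMPLE-ID",
    reported_status="Phase III",
)
```

The validation in `__post_init__` mirrors the core requirement stated above: a record without a drug, an outcome, or identifiable provenance is rejected.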

Clinical reports are later combined into aggregated drug-disease or drug-target views.

The clinical reports and all downstream datasets described on this page are generated by the Clinical Mining Pipeline. The pipeline is modular and designed to be extended to additional clinical data sources. The code is openly available on GitHub.

Data sources used to extract clinical reports

We currently integrate six source families. They are complementary: each source contributes a different evidence profile and helps provide a more complete picture of the journey from discovery to market.

| Source | Description | Unit of evidence | Reference |
| --- | --- | --- | --- |
| ClinicalTrials.gov via AACT | Structured registry records of clinical studies, including interventions and studied conditions. | A single trial record (one NCT ID), which may involve multiple drugs and multiple disease conditions | |
| ChEMBL curated indications | Curated indication records, linked to external references (for example FDA, EMA, ATC, DailyMed, INN, USAN). | A single indication record, which may correspond to either a curated drug/indication pair (one drug, one disease) or a DailyMed medicine reference (one label, potentially covering multiple drugs and diseases) | |
| ChEMBL drug warnings | Curated warning and withdrawal-oriented records linked to drugs. | A single warning record associated with a drug (either a black box warning or a withdrawal) | |
| Therapeutic Target Database (TTD) | Curated drug and disease information from the Therapeutic Target Database. | A single drug/indication pair | |
| EMA Human Medicines | Regulatory evidence for authorised human medicines and therapeutic-use context in Europe. | A single medicine label, which may cover one or more active ingredients and one or more approved indications | |
| PMDA approvals | Public approvals information from Japan's Pharmaceuticals and Medical Devices Agency. | A single drug/indication pair | |

Clinical stage categories

Each source reports development or regulatory status using its own terms. To support cross-source analysis, Open Targets harmonises source-reported values into a shared clinical-stage framework.

This harmonisation has two goals:

  • make stage labels comparable across heterogeneous sources;

  • support consistent ranking of evidence in downstream clinical precedence views.

The harmonised framework includes the following categories:

| Category | Description |
| --- | --- |
| Withdrawal | Evidence that a medicine was withdrawn, revoked, lapsed, suspended, or otherwise removed from use |
| Approval | Evidence of marketing authorisation or approved status from regulatory or authoritative sources |
| Phase IV | Post-marketing interventional evidence from trials conducted after regulatory approval |
| Preapproval | Late-stage regulatory submission evidence before full marketing authorisation, including submitted applications, formal opinions, and equivalent pre-market review stages |
| Phase III | |
| Phase II/III | Evidence from trials spanning or bridging mid- to late-stage clinical development |
| Phase II | Mid-stage clinical development evidence (Phase II, including subphases such as IIa/IIb) |
| Phase I/II | Evidence from trials spanning or bridging early to mid-stage clinical development |
| Phase I | Early human clinical development evidence (Phase I, including subphases such as Ib) |
| Early Phase I | Exploratory early human studies conducted prior to standard Phase I, typically with a limited number of participants and a primary focus on safety or pharmacokinetics |
| IND | Investigational New Drug application or equivalent regulatory filing authorising first-in-human studies; the drug has not yet entered clinical trials |
| Preclinical | Evidence reported as preclinical or patented/preclinical development |
| Unknown | Source value is missing, ambiguous, not mappable, or not directly comparable to standard stage labels |
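The harmonisation step can be sketched as a lookup from source-reported strings to an ordered set of categories. The sketch below makes two assumptions that this page does not confirm: that the order in which the categories are listed reflects their rank, and that the raw strings in `RAW_TO_STAGE` resemble what sources report. The names `ClinicalStage`, `RAW_TO_STAGE`, `harmonise`, and `max_stage` are hypothetical.

```python
from enum import IntEnum
from typing import Iterable, Optional


class ClinicalStage(IntEnum):
    """Harmonised categories; ranks follow the listing order (an assumption)."""

    UNKNOWN = 0
    PRECLINICAL = 1
    IND = 2
    EARLY_PHASE_1 = 3
    PHASE_1 = 4
    PHASE_1_2 = 5
    PHASE_2 = 6
    PHASE_2_3 = 7
    PHASE_3 = 8
    PREAPPROVAL = 9
    PHASE_4 = 10
    APPROVAL = 11
    WITHDRAWAL = 12


# Hypothetical source-reported values; real sources use their own vocabularies.
RAW_TO_STAGE = {
    "early phase 1": ClinicalStage.EARLY_PHASE_1,
    "phase 1": ClinicalStage.PHASE_1,
    "phase 1/phase 2": ClinicalStage.PHASE_1_2,
    "phase 2": ClinicalStage.PHASE_2,
    "phase 2/phase 3": ClinicalStage.PHASE_2_3,
    "phase 3": ClinicalStage.PHASE_3,
    "phase 4": ClinicalStage.PHASE_4,
    "approved": ClinicalStage.APPROVAL,
    "withdrawn": ClinicalStage.WITHDRAWAL,
}


def harmonise(raw: Optional[str]) -> ClinicalStage:
    """Map a source-reported status to a harmonised stage, else UNKNOWN."""
    if raw is None:
        return ClinicalStage.UNKNOWN
    key = " ".join(raw.strip().lower().split())  # normalise case and spacing
    return RAW_TO_STAGE.get(key, ClinicalStage.UNKNOWN)


def max_stage(raw_values: Iterable[Optional[str]]) -> ClinicalStage:
    """Highest-ranked harmonised stage among several source-reported values."""
    return max((harmonise(v) for v in raw_values), default=ClinicalStage.UNKNOWN)
```

Using an ordered enumeration makes stage labels directly comparable, so the most advanced stage across several reports reduces to a single `max` call.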

Reason to stop categories

For clinical reports derived from clinical trials, we integrate a machine learning-based classification of the reasons why a trial ended earlier than expected. The model was trained on free-text stop reasons from 28,842 stopped trials on ClinicalTrials.gov and classifies them into 17 categories covering negative, neutral, and positive reasons for stoppage. The model is available on Hugging Face.

The 17 classes are: Another Study, Business or Administrative, Negative, Study Design, Invalid Reason, Ethical Reason, Insufficient Data, Insufficient Enrolment, Study Staff Moved, Endpoint Met, Regulatory, Logistics or Resources, Safety and Side Effects, No Context, Success, Interim Analysis, and Covid 19.

This classification is used to down-weight evidence from trials that stopped for negative reasons in downstream scoring.
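A down-weighting step of this kind could look like the sketch below. The class names are taken from the list above, but the choice of which classes count as negative and the penalty factor are illustrative assumptions, not the values used in downstream scoring. The names `ASSUMED_NEGATIVE`, `ASSUMED_PENALTY`, and `adjust_score` are hypothetical.

```python
STOP_REASON_CLASSES = (
    "Another Study",
    "Business or Administrative",
    "Negative",
    "Study Design",
    "Invalid Reason",
    "Ethical Reason",
    "Insufficient Data",
    "Insufficient Enrolment",
    "Study Staff Moved",
    "Endpoint Met",
    "Regulatory",
    "Logistics or Resources",
    "Safety and Side Effects",
    "No Context",
    "Success",
    "Interim Analysis",
    "Covid 19",
)

# Assumptions for illustration only: neither the set nor the factor is
# taken from this page.
ASSUMED_NEGATIVE = {"Negative", "Safety and Side Effects"}
ASSUMED_PENALTY = 0.5


def adjust_score(base_score, predicted_classes):
    """Scale a score down if any predicted stop reason is assumed negative."""
    predicted = set(predicted_classes)
    unknown = predicted - set(STOP_REASON_CLASSES)
    if unknown:
        raise ValueError(f"unrecognised stop-reason classes: {sorted(unknown)}")
    if predicted & ASSUMED_NEGATIVE:
        return base_score * ASSUMED_PENALTY
    return base_score
```

The function accepts a collection of classes because a single free-text stop reason may plausibly fall into more than one category.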

Reference: Razuvayevskaya et al., Nature Genetics, 2024

Quality filters

Not all clinical reports are used to derive downstream datasets. Before aggregation into clinical indications, clinical targets, or target/disease evidence, each report is evaluated against a set of quality criteria. Reports that fail any of these checks are flagged and excluded from downstream use.

Three exclusion criteria are currently applied:

  • Phase IV reports whose indication has no corresponding approval in the integrated sources.

  • Reports whose indications cannot be validated by a high-confidence report, that is, a report that has been manually curated, an approved indication, or a report linking one drug to a single indication.

  • Clinical trial reports in which the primary purpose of the study is not to directly measure the effect of the intervention on the target condition or its symptomatology.

Flagged reports remain accessible at the clinical report level but do not contribute to clinical indications, clinical targets, or target/disease evidence.
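The three criteria can be sketched as a flagging pass over drug/indication links. This is a simplified illustration under stated assumptions: each row stands for one drug/indication link taken from a report, the field names and flag labels are hypothetical, and the reading of "high-confidence" below is one interpretation of the definition above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReportLink:
    """One drug/indication link from a report (hypothetical structure)."""

    drug: str
    indication: str
    stage: str                          # harmonised stage label
    is_trial: bool = False
    measures_treatment_effect: bool = True
    manually_curated: bool = False
    n_drugs: int = 1                    # drugs named in the parent report
    n_indications: int = 1              # indications in the parent report
    flags: List[str] = field(default_factory=list)


def is_high_confidence(link: ReportLink) -> bool:
    # Assumed reading: curated, approved, or a one-drug/one-indication report.
    one_to_one = link.n_drugs == 1 and link.n_indications == 1
    return link.manually_curated or link.stage == "Approval" or one_to_one


def flag_links(links: List[ReportLink]) -> List[ReportLink]:
    """Attach flags in place and return the links that pass every check."""
    approved = {(l.drug, l.indication) for l in links if l.stage == "Approval"}
    validated = {(l.drug, l.indication) for l in links if is_high_confidence(l)}
    for link in links:
        link.flags.clear()
        pair = (link.drug, link.indication)
        if link.stage == "Phase IV" and pair not in approved:
            link.flags.append("phase_iv_without_approval")
        if pair not in validated:
            link.flags.append("unvalidated_indication")
        if link.is_trial and not link.measures_treatment_effect:
            link.flags.append("non_therapeutic_purpose")
    return [link for link in links if not link.flags]
```

Flags are stored on each link rather than dropping rows, which matches the behaviour described above: flagged reports stay inspectable while only unflagged ones feed downstream datasets.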
