Target–disease evidence

Last updated 2 months ago

Every event or set of events pinpointing a target as a potential causal gene or protein for a disease represents the unit of information, most often referred to as evidence. Within the Open Targets Platform, a series of pipelines ensure information is retrieved from its sources and standardised in a way that can be immediately applied to answer drug development queries.

All evidence is mapped to the reference target entity identifier (Ensembl gene) and disease or phenotype identifier (Experimental Factor Ontology, EFO), as well as other reference controlled vocabularies and ontologies when appropriate. Evidence is also reviewed to minimise the presence of duplicates within the same data source.

Data sources are also grouped into broader categories that abstract the type of evidence they predominantly capture. In the Platform, these categories are usually referred to as data types, as opposed to the individual resources, which are referred to as data sources.

The Open Targets Platform provides a scoring framework for each data source to contextualise the relative importance of each piece of evidence. This score will be more relevant when understanding the association scoring in later sections.

Evidence data sources

GWAS associations

The GWAS associations data source aggregates target-disease relationships supported by significant genome-wide associations (GWAS) in the context of other functional genomics data.

The evidence in this data source results from a comprehensive statistical genetics analysis described in the GWAS and functional genomics section. The aim of this analysis is to identify GWAS-significant signals across an extensive set of GWAS studies covering binary and quantitative traits. To address linkage disequilibrium, all significant signals are fine-mapped and the resulting credible sets are colocalised against molQTL studies. All GWAS and functional genomics features are leveraged by the Locus-to-Gene machine-learning method, which aims to prioritise likely causal genes in the region.

The GWAS association evidence is defined as any credible set in a GWAS trait associated with a gene with a Locus-to-Gene (L2G) score > 0.05. The feature contributions for the L2G predictions are also explained by SHAP analysis, which helps with the interpretation of the observed features. All credible sets can also be further interrogated on their own page, including an interpretation of the directionality in the context of colocalising molQTL studies.

Datatype: Genetic associations

Evidence scoring: Locus-to-Gene score, filtered to use scores above 0.05
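The 0.05 cut-off can be illustrated with a minimal sketch. The record type, field names, and identifiers below are hypothetical and are not the Platform's schema; only the strict "above 0.05" filter comes from the text.

```python
from dataclasses import dataclass

L2G_THRESHOLD = 0.05  # cut-off stated in the text


@dataclass(frozen=True)
class L2GPrediction:
    """Hypothetical record: one credible set paired with one candidate gene."""
    credible_set_id: str
    gene_id: str
    l2g_score: float


def select_gwas_evidence(predictions, threshold=L2G_THRESHOLD):
    """Keep predictions whose L2G score is strictly above the threshold."""
    return [p for p in predictions if p.l2g_score > threshold]


# Made-up identifiers and scores, for illustration only.
predictions = [
    L2GPrediction("credible_set_1", "gene_A", 0.82),
    L2GPrediction("credible_set_1", "gene_B", 0.05),  # not above the cut-off
    L2GPrediction("credible_set_2", "gene_C", 0.31),
]
evidence = select_gwas_evidence(predictions)
print([(p.credible_set_id, p.gene_id) for p in evidence])
```

In this sketch the retained L2G score would itself be used as the evidence score.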

Gene Burden

Gene burden data comprises gene–phenotype relationships observed in gene-level association tests using rare variant collapsing analyses. The Platform integrates burden tests carried out by several sources:

  • REGENERON (Backman et al., 2021), a whole-exome sequencing analysis of individuals from the UK Biobank.

  • AstraZeneca PheWAS Portal (Wang et al., 2021), a whole-exome sequencing analysis of individuals from the UK Biobank.

  • Genebass (Karczewski et al., 2022): Gene-based Association Summary Statistics (Genebass), a whole-exome sequencing analysis of individuals from the UK Biobank.

  • The results of whole-exome and whole-genome sequencing analysis based on the SPARK cohort bring evidence of novel targets implicated in autism spectrum disorder (Zhou et al., 2022).

  • The SCHEMA consortium (Singh et al., 2022), a whole-exome sequencing analysis of individuals with schizophrenia.

  • The Epi25 collaborative (Epi25 Collaborative, 2019), a whole-exome sequencing analysis of individuals with epilepsy.

  • The Autism Sequencing Consortium (Satterstrom et al., 2020), a whole-exome sequencing analysis of individuals with autism spectrum disorder.

  • The results of an Open Targets project (Bomba et al., 2022), a whole-exome sequencing analysis of individuals from the INTERVAL cohort testing for associations between rare coding variants and blood metabolites.

  • The results of a pan-ancestry whole-exome sequencing analysis identify relevant genes associated with fat distribution (Akbari et al., 2022).

  • The results of whole-exome and whole-genome sequencing analyses of Parkinson disease, promoted by the AMP-PD initiative and other collaborators (Makarious et al., 2022).

  • The results of gene-based analyses of rare variants and circulating metabolic biomarkers relevant to cardiovascular disease (Riveros-McKay et al., 2020).

  • The results of rare coding variant analyses from whole-exome sequencing of Black South African men to identify genes significantly associated with prostate cancer (Soh et al., 2023).

These associations result from collapsing the rare variants in a gene into a single burden statistic and regressing the phenotype on that statistic to test for the combined effect of all rare variants in the gene. The different collapsing methods describe the filters used to select the set of qualifying variants, mostly based on their pathogenicity and frequency in the population.

Datatype: Genetic associations

Evidence scoring: Scaled p-value from 0.25 (p = 1e-7) to 1 (p < 1e-17).
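A minimal sketch of this kind of rescaling is shown below. It assumes linear interpolation on the -log10(p) scale and clamping outside the stated range; the text gives only the two end points, so both choices, and the function name, are assumptions.

```python
import math


def scaled_pvalue_score(p_value, p_floor=1e-7, p_ceiling=1e-17,
                        score_min=0.25, score_max=1.0):
    """Map a p-value onto [score_min, score_max].

    Assumption: the mapping is linear in -log10(p) between the two
    anchor p-values and is clamped outside that range.
    """
    if not 0 < p_value <= 1:
        raise ValueError("p_value must be in (0, 1]")
    x = -math.log10(p_value)
    low = -math.log10(p_floor)     # 7 for p = 1e-7
    high = -math.log10(p_ceiling)  # 17 for p = 1e-17
    fraction = (x - low) / (high - low)
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the stated range
    return score_min + fraction * (score_max - score_min)


for p in (1e-7, 1e-12, 1e-17, 1e-30):
    print(p, round(scaled_pvalue_score(p), 3))
```

The same function could be reused for other sources that describe a scaled p-value by passing different anchors, for example `p_floor=1e-4, p_ceiling=1e-14, score_min=0.5`.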

Direction of Effect assessment:

Direction on Target (Gain of Function (GoF) / Loss of Function (LoF))
Direction on Trait (Risk/Protective)

Assumption of all variants LoF

ClinVar

ClinVar is an NIH public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. The ClinVar data source in the Open Targets Platform captures the subset of ClinVar that refers to germline variants (as opposed to somatic variants). Each piece of evidence in the Platform aims to capture an individual RCV record in ClinVar.

Information on variants is covered extensively for both single point and structural variants. When available, genomic coordinates are reported with RS numbers, or by following the CHROM_POS_REF_ALT and HGVS notations.

Datatype: Genetic associations

Evidence scoring: ClinVar evidence is scored in a 2-step process. In Step 1, a score is assigned to every piece of evidence based on the clinical significance:

| Clinical significance | Evidence score |
| --- | --- |
| association not found | 0 |
| benign | 0 |
| not provided | 0 |
| likely benign | 0 |
| conflicting data from submitters | 0.3 |
| conflicting interpretations of pathogenicity | 0.3 |
| low penetrance | 0.3 |
| other | 0.3 |
| uncertain risk allele | 0.3 |
| uncertain significance | 0.3 |
| established risk allele | 0.5 |
| risk factor | 0.5 |
| affects | 0.5 |
| likely pathogenic | 0.7 |
| association | 0.9 |
| confers sensitivity | 0.9 |
| drug response | 0.9 |
| protective | 0.9 |
| pathogenic | 0.9 |

In Step 2, the score is modulated based on the ClinVar review status:

| Confidence | Evidence score modifier |
| --- | --- |
| no assertion provided | +0 |
| no assertion criteria provided | +0 |
| no assertion for the individual variant | +0 |
| criteria provided, single submitter | +0.02 |
| criteria provided, conflicting interpretations | +0.02 |
| criteria provided, multiple submitters, no conflicts | +0.05 |
| reviewed by expert panel | +0.07 |
| practice guideline | +0.1 |
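The two steps can be sketched as a pair of lookups. The values are copied from the two tables above. The function name, the rounding, and the cap at 1 are assumptions, as is applying the modifier to every record regardless of its Step 1 score.

```python
# Step 1: base score by clinical significance (values from the first table).
CLINICAL_SIGNIFICANCE_SCORE = {
    "association not found": 0.0,
    "benign": 0.0,
    "not provided": 0.0,
    "likely benign": 0.0,
    "conflicting data from submitters": 0.3,
    "conflicting interpretations of pathogenicity": 0.3,
    "low penetrance": 0.3,
    "other": 0.3,
    "uncertain risk allele": 0.3,
    "uncertain significance": 0.3,
    "established risk allele": 0.5,
    "risk factor": 0.5,
    "affects": 0.5,
    "likely pathogenic": 0.7,
    "association": 0.9,
    "confers sensitivity": 0.9,
    "drug response": 0.9,
    "protective": 0.9,
    "pathogenic": 0.9,
}

# Step 2: additive modifier by review status (values from the second table).
REVIEW_STATUS_MODIFIER = {
    "no assertion provided": 0.0,
    "no assertion criteria provided": 0.0,
    "no assertion for the individual variant": 0.0,
    "criteria provided, single submitter": 0.02,
    "criteria provided, conflicting interpretations": 0.02,
    "criteria provided, multiple submitters, no conflicts": 0.05,
    "reviewed by expert panel": 0.07,
    "practice guideline": 0.1,
}


def clinvar_evidence_score(clinical_significance, review_status):
    """Base score plus review-status modifier.

    Assumptions: the result is capped at 1 and rounded to 2 decimals.
    """
    base = CLINICAL_SIGNIFICANCE_SCORE[clinical_significance]
    modifier = REVIEW_STATUS_MODIFIER[review_status]
    return round(min(base + modifier, 1.0), 2)


print(clinvar_evidence_score("pathogenic", "reviewed by expert panel"))
print(clinvar_evidence_score("likely pathogenic", "criteria provided, single submitter"))
```

Under these assumptions, a pathogenic record backed by a practice guideline reaches the maximum score of 1.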

Direction of Effect assessment:

Direction on Target (Gain of Function (GoF) / Loss of Function (LoF))
Direction on Trait (Risk/Protective)

LoF variants

Genomics England (GEL) PanelApp

The Genomics England PanelApp is a knowledge base that combines crowdsourced expertise with curation to provide gene–disease relationships. Virtual gene panels related to human disorders are reviewed by experts within the clinical and scientific community to support the interpretation of genomes within the 100,000 Genomes Project. Within a panel, genes are rated based on the level of evidence supporting the association with the phenotypes identified by the panel. Genes are then classified according to a traffic light system with red/stop, amber/pause, and green/go classifications. To receive a green rating (diagnostic-grade) on a version 1+ panel, the gene requires "evidence from 3 or more unrelated families or from 2-3 unrelated families where there is strong additional functional data" and "genes that do not meet these criteria are rated as Amber (borderline) or Red (low level of evidence)."

Data type: Genetic associations

Evidence scoring: Based on Genomics England gene rating:

| Gene Rating in GEL Panel | Evidence score |
| --- | --- |
| Amber | 0.5 |
| Green | 1 |

Gene2Phenotype

G2P evidence in the Platform is the result of any target-disease curation by any of the expert panels.

Data type: Genetic associations

Evidence scoring:

| Gene2Phenotype confidence | Evidence score |
| --- | --- |
| Limited | 0.01 |
| Moderate | 0.5 |
| Strong | 1 |
| Both RD and IF | 1 |
| Definitive | 1 |

Direction of Effect assessment:

| Direction on Target (Gain of Function (GoF) / Loss of Function (LoF)) | Direction on Trait (Risk/Protective) |
| --- | --- |
| LoF and GoF variants | Assumption of Risk |

UniProt literature

The Universal Protein Resource (UniProt) provides a large compendium of sequence and functional information at the protein level. As part of their functional annotation effort, UniProt curators also annotate proteins with publications supporting their involvement in pathogenic processes.

All publications supporting a given target–disease relationship are aggregated into a single piece of Platform evidence.

Data type: Genetic associations

Evidence scoring:

| UniProt confidence | Evidence score |
| --- | --- |
| Medium | 0.5 |
| High | 1 |

UniProt variants

The Universal Protein Resource (UniProt) also curates variants, supported by publications, that are known to alter protein function in disease. Curated mutations are predominantly protein coding or in regulatory regions clearly associated with the causal protein.

Each variant linked to a disease constitutes an individual piece of evidence, and all publications supporting it are aggregated within that evidence.

Data type: Genetic associations

Evidence scoring:

| UniProt confidence | Evidence score |
| --- | --- |
| Medium | 0.5 |
| High | 1 |

Orphanet

Orphanet is an international network that offers a range of resources to improve the understanding of rare disorders of genetic origin. These resources include an inventory of rare disease and gene associations, classification of the gene–disease relationship, information on the kind of mutation, and supporting publication references.

Data type: Genetic associations

Evidence scoring:

| Orphanet Disorder Gene Association Status | Evidence score |
| --- | --- |
| Not yet assessed | 0.5 |
| Assessed | 1 |

Direction of Effect assessment:

| Direction on Target (Gain of Function (GoF) / Loss of Function (LoF)) | Direction on Trait (Risk/Protective) |
| --- | --- |
| LoF and GoF variants | Assumption of Risk |

ClinGen

The Clinical Genome Resource (ClinGen) Gene–Disease Validity Curation aims to evaluate the strength of evidence supporting or refuting a claim that variation in a particular gene causes a particular disease. ClinGen provides a framework of guidelines to assess clinical validity in a semi-quantitative manner, allowing curators to classify the validity of a given gene–disease pair.

All gene–disease pairs mapped to EFO constitute individual evidence in the Platform.

Data type: Genetic associations

Evidence scoring:

| ClinGen classification | Evidence score |
| --- | --- |
| No reported evidence | 0.01 |
| Refuted | 0.01 |
| Disputed | 0.01 |
| Limited | 0.01 |
| Moderate | 0.5 |
| Strong | 1 |
| Definitive | 1 |

ChEMBL

EMBL-EBI's ChEMBL is a manually curated database of bioactive molecules with drug-like properties, either approved for marketing by the U.S. Food and Drug Administration (FDA) or clinical candidates. ChEMBL also captures information regarding the drug molecule indications, as well as their curated pharmacological target.

In the Platform, ChEMBL evidence represents any target–disease relationship that can be explained by an approved or clinical candidate drug, targeting the gene product and indicated for the disease. Independent studies are treated as individual evidence.

To provide additional context, we integrate a machine learning-based analysis of the reasons why a clinical trial has ended earlier than scheduled. This sorts the stop reasons into a set of 17 classes which include negative, neutral, and positive reasons. This information is available when hovering on the tooltip of the Source column.

The 17 classes are: Another Study, Business or Administrative, Negative, Study Design, Invalid Reason, Ethical Reason, Insufficient Data, Insufficient Enrolment, Study Staff Moved, Endpoint Met, Regulatory, Logistics or Resources, Safety and Side Effects, No Context, Success, Interim Analysis, and Covid 19.

Data type: Drugs

Evidence scoring: ChEMBL evidence is scored in a 2-step process. In Step 1, a score is assigned to every piece of evidence based on the clinical precedence:

| Clinical Precedence | Evidence score |
| --- | --- |
| Phase I (Early) | 0.05 |
| Phase I | 0.1 |
| Phase II | 0.2 |
| Phase III | 0.7 |
| Phase IV (only for approved indications) | 1 |

In Step 2, for those clinical trials that have stopped early, the score is down-weighted based on the classification of the reason to stop. In this way, less importance is attributed to evidence of studies that have been stopped due to negative outcomes or safety concerns:

| Reason to stop class | Score weight |
| --- | --- |
| Negative | 0.5 |
| Safety or side effects | 0.5 |
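As a rough illustration of the two steps, the sketch below looks up the clinical precedence score and then applies the stop-reason weight. It assumes the weight is multiplicative and that every other stop-reason class (or no early stop) leaves the score unchanged; the function and dictionary names are hypothetical.

```python
# Step 1: score by clinical precedence (values from the first table).
CLINICAL_PRECEDENCE_SCORE = {
    "Phase I (Early)": 0.05,
    "Phase I": 0.1,
    "Phase II": 0.2,
    "Phase III": 0.7,
    "Phase IV": 1.0,  # only for approved indications
}

# Step 2: down-weighting by reason-to-stop class (values from the second table).
STOP_REASON_WEIGHT = {
    "Negative": 0.5,
    "Safety or side effects": 0.5,
}


def chembl_evidence_score(phase, stop_reason=None):
    """Phase score, down-weighted when the trial stopped for a listed reason.

    Assumptions: the weight multiplies the phase score, and classes not in
    the table (or stop_reason=None) use a weight of 1.
    """
    score = CLINICAL_PRECEDENCE_SCORE[phase]
    weight = STOP_REASON_WEIGHT.get(stop_reason, 1.0)
    return score * weight


print(chembl_evidence_score("Phase III"))
print(chembl_evidence_score("Phase III", stop_reason="Negative"))
```

Under these assumptions, a Phase III trial stopped for a negative outcome would contribute half the score of a Phase III trial that was not stopped early.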

Direction of Effect assessment:

Direction on Target (Gain of Function (GoF) / Loss of Function (LoF))
Direction on Trait (Risk/Protective)

Assumption of Protective

Reactome

The Reactome database manually curates and identifies reaction pathways that are affected by a disease. Reactome annotation includes information regarding the causal target–disease link either being a protein coding mutation or an altered expression.

In the Platform, any mutation or altered expression event affecting a different reaction is captured in a different target–disease evidence.

Data type: Pathways & systems biology

Evidence scoring: All manually curated evidence in Reactome has a score of 1.

CRISPR screens

One of the most powerful approaches to uncover gene function is the experimental perturbation of genes followed by the observation of related phenotypes. The perturbation of gene function in human cells has been greatly facilitated by developments in CRISPR technology.

We have linked cell types to diseases on the basis that these diseases are often characterised by abnormal phenotypes in those cell types. If knocking out a gene causes significant perturbation in the cell type, it might indicate a potential targeting strategy in the disease.

Data type: Pathways & systems biology

Evidence scoring: The Platform assigns a score using a linearised version of CRISPRbrain's assessment of statistical significance, including hits from both the upper and lower ends of the distribution.

Project Score

Project Score is a Wellcome Sanger Institute resource that aims to identify dependencies in cancer cell lines to guide precision medicine. The project combines gene fitness effects derived from whole-genome CRISPR-Cas9 synthetic lethality screenings with tractability data, genomic biomarkers, and various target annotations, enabling a systematic prioritisation of potential targets. The resulting inferences are then mapped from the cancer cell lines in which the experiment is performed to their corresponding tumours.

In the Platform, any Project Score prioritised target with a priority score of at least 36.0 is included as independent evidence; however, pan-cancer dependencies are excluded from the integration.

Data type: Pathways & systems biology

Evidence scoring: Project Score priority score divided by 100
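The inclusion rule and the scoring can be sketched together. The function name and the `is_pan_cancer` flag are hypothetical, and the 36.0 cut-off is read here as inclusive, which is an interpretation of "reaching".

```python
def project_score_evidence_score(priority_score, is_pan_cancer=False,
                                 threshold=36.0):
    """Return the evidence score, or None when the target is not integrated.

    Assumptions: the threshold is inclusive, and pan-cancer dependencies
    are identified by a boolean flag supplied by the caller.
    """
    if is_pan_cancer or priority_score < threshold:
        return None
    return priority_score / 100.0


print(project_score_evidence_score(72.5))
print(project_score_evidence_score(30.0))
print(project_score_evidence_score(72.5, is_pan_cancer=True))
```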

SLAPenrich

In the Platform, each pathway significantly enriched in tumour-occurring mutations constitutes an individual piece of evidence.

Data type: Pathways & systems biology

Evidence scoring: Scaled enrichment p-value from 0.5 (p = 1e-4) to 1 (p<1e-14).

Gene signatures

The Platform also provides information about key driver genes for specific diseases that have been curated from Systems Biology analysis. These publications present different disease gene signatures as potential key drivers or key regulators causing disease.

Data type: Pathways & systems biology

Evidence scoring: Scoring depends on whether the original data contains a score:

  • p-values and rank-based scores are normalised to the 0.5 - 1 range

  • If there is no score, a fixed value of 0.5 is used

PROGENy

In the Platform, PROGENy evidence is defined as any significantly regulated sample-level pathway activity inferred from matched normal vs. tumour samples.

Data type: Pathways & systems biology

Evidence scoring: Scaled p-value from 0.5 (p = 1e-4) to 1 (p<1e-14).

Expression Atlas

The EMBL-EBI Expression Atlas provides a differential expression pipeline aiming to identify genes that are differentially expressed in disease vs control samples. Only contrasts from studies with enough replicates and minimum quality criteria are included in the processing.

For a gene to be considered significantly regulated in a given contrast, all of the following criteria must be met:

  • Absolute log2 fold change > 1

  • Adjusted p-value <= 0.05

  • Maximum significant genes probes per contrast = 1000

In the Platform, each contrast from independent studies capturing differentially regulated genes constitutes independent evidence.

Data type: RNA expression

Evidence scoring: The Expression Atlas score is the product of:

  • Scaled p-value from 0 (p = 1) to 1 (p<1e-10)

  • Absolute log2 fold change divided by 10

  • Percentile rank divided by 100
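The significance rules and the three-factor product can be sketched as follows. The p-value term is assumed to be linear in -log10(p) between the two stated end points, which the text does not spell out; the function names are hypothetical, and the cap of 1000 probes per contrast is not modelled.

```python
import math


def is_significant(log2_fold_change, adjusted_p_value):
    """Apply the fold-change and adjusted p-value criteria for one gene."""
    return abs(log2_fold_change) > 1 and adjusted_p_value <= 0.05


def expression_atlas_score(p_value, log2_fold_change, percentile_rank):
    """Product of the three components listed above.

    Assumption: the p-value component is -log10(p) / 10, clamped to [0, 1],
    so that p = 1 maps to 0 and p <= 1e-10 maps to 1.
    """
    if p_value <= 0:
        p_component = 1.0
    else:
        p_component = min(max(-math.log10(p_value) / 10.0, 0.0), 1.0)
    fold_change_component = abs(log2_fold_change) / 10.0
    rank_component = percentile_rank / 100.0
    return p_component * fold_change_component * rank_component


# Made-up values: p = 1e-5, log2 fold change = 2, 90th percentile.
print(is_significant(2.0, 1e-5))
print(round(expression_atlas_score(1e-5, 2.0, 90), 4))
```

Because the three components are multiplied, a weak value in any one of them pulls the overall score down, even for a highly significant p-value.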

Cancer Gene Census

In the Platform, CGC evidence is aggregated at the target–disease level to provide a summary of all curated evidence supporting the involvement of a target with a particular cancer type.

Data type: Somatic mutations

| Modulator | Condition |
| --- | --- |
| -0.25 | Only 1 mutated sample |
| +0.25 | Gene mutated more frequently in particular disease compared to other diseases |
| +0.25 | Mutations in gene occur more frequently than in other genes of similar length in the same disease |

IntOGen

In the Platform, independent target–disease evidence is defined as any significant driver gene detected in any individual cohort. Information regarding the individual driver methods is also provided within each piece of evidence.

Data type: Somatic mutations

ClinVar (somatic)

ClinVar is an NIH public archive of reports of the relationships among human variations and phenotypes, with supporting evidence. The ClinVar (somatic) data source in the Open Targets Platform captures the subset of ClinVar that refers to somatic variants (as opposed to germline variants).

Information on variants is covered extensively for both single point and structural variants. When available, genomic coordinates are reported with RS numbers, or by following the CHROM_POS_REF_ALT and HGVS notations.

Each piece of evidence in the Platform aims to capture an individual RCV record in ClinVar.

Datatype: Somatic mutations

Evidence scoring: ClinVar evidence is scored in a 2-step process. In Step 1, a score is assigned to every piece of evidence based on the clinical significance:

| Clinical significance | Evidence score |
| --- | --- |
| association not found | 0 |
| benign | 0 |
| not provided | 0 |
| likely benign | 0 |
| conflicting interpretations of pathogenicity | 0.3 |
| other | 0.3 |
| uncertain significance | 0.3 |
| risk factor | 0.5 |
| affects | 0.5 |
| likely pathogenic | 0.7 |
| association | 0.9 |
| drug response | 0.9 |
| protective | 0.9 |
| pathogenic | 0.9 |

In Step 2, the score is modulated based on the ClinVar review status:

| Confidence | Evidence score modifier |
| --- | --- |
| no assertion provided | +0 |
| no assertion criteria provided | +0 |
| no assertion for the individual variant | +0 |
| criteria provided, single submitter | +0.02 |
| criteria provided, conflicting interpretations | +0.02 |
| criteria provided, multiple submitters, no conflicts | +0.05 |
| reviewed by expert panel | +0.07 |
| practice guideline | +0.1 |

Direction of Effect assessment:

Direction on Target (Gain of Function (GoF) / Loss of Function (LoF))
Direction on Trait (Risk/Protective)

LoF variants

Europe PMC

The EMBL-EBI's Europe PMC enables access to a worldwide collection of life science publications and preprints from trusted sources. The Europe PMC data source aims to identify target–disease co-occurrences in the literature and provide an assessment on the confidence of the relationship. This pipeline uses deep-learning based Named Entity Recognition (NER) to identify gene/proteins and diseases when mentioned in the text, to later normalise them to the target or disease/phenotype entities in the Platform. All co-occurrences of both types of entities in the same sentence are considered evidence.

In the Platform, a piece of Europe PMC evidence is the result of aggregating all co-occurrences of the same target and disease within the same publication.

Data type: Text mining

IMPC

The genotype–phenotype associations made available by the International Mouse Phenotyping Consortium (IMPC) are used to identify models of human disease based on phenotypic similarity scores.

The Wellcome Sanger Institute PhenoDigm is an algorithm aimed at capturing the similarity between a knockout mouse and the clinical manifestations (phenotype) of a human disease. The premise is that if a gene knockout causes an equivalent phenotype in mouse, the human counterpart is likely to be related to the cause of the disease.

It uses a semantic approach to map between clinical features observed in humans and mouse phenotype annotations. The phenotypic effects in mice are then mapped to phenotypes associated with human diseases. The matches are identified and a similarity score between a mouse model and a human disease is computed.

Data type: Animal models

Direction of Effect assessment:

| Direction on Target (Gain of Function (GoF) / Loss of Function (LoF)) | Direction on Trait (Risk/Protective) |
| --- | --- |
| Assumption of all variants LoF | Assumption of Risk |

Cancer Biomarkers

One of the aims of the Cancer Genome Interpreter is to identify how variations in the tumour genome may influence its response to anti-cancer therapies. The Cancer Biomarkers database features biomarkers of drug sensitivity, resistance, and toxicity for drugs targeting specific targets in cancer, curated by clinical and scientific experts in precision oncology, and classified by cancer type.

Data type: Pathways & systems biology

Evidence scoring: All manually curated evidence in Cancer Biomarkers has a score of 1

Additional details, sources, and references by data source

Gene Burden

The FinnGen (R12) gene-based burden test results from collapsing loss-of-function variants, based on genotyping data from the Finnish population. Find out more in their documentation.

Source: AstraZeneca PheWAS Portal, Genebass

References: Wang, Q. et al., 2021; Backman, J.D. et al., 2021; Karczewski, K.K. et al., 2022; Zhou, X. et al., 2022; Singh et al., 2022; Epi25 Collaborative, 2019; Satterstrom et al., 2020; Bomba et al., 2022; Akbari, P. et al., 2022; Makarious et al., 2022; Riveros-McKay et al., 2020; Soh et al., 2023

ClinVar

Source: ClinVar (via European Variation Archive)

References: Cezard, T. et al., 2021; Shen, A. et al., 2024; Landrum, M. et al., 2014; Landrum, M. et al., 2020

Direction on Trait: Pathogenic/Likely pathogenic = Risk; Protective = Protective

Genomics England (GEL) PanelApp

The Open Targets Platform includes "green" and "amber" genes from version 1+ panels along with their phenotypes, provided the latter can be mapped to a disease or phenotype ontology. As we standardise our evidence to EFO, some of the phenotypes cannot be mapped and included in the Platform; please visit the Genomics England PanelApp website for the full set.

Source: Genomics England PanelApp

References: Martin, A. et al., 2019

Gene2Phenotype

The data in Gene2Phenotype (G2P) is produced and curated from the literature by different sets of panels formed by consultant clinical geneticists. The G2P data is designed to facilitate the development, validation, curation, and distribution of large-scale, evidence-based datasets for use in diagnostic variant filtering. Each G2P entry associates an allelic requirement and a mutational consequence at a defined locus with a disease entity. A confidence level and evidence link are assigned to each entry. This confidence level follows the terminology described by GenCC for describing gene–disease validity.

Source: Gene2Phenotype

References: Thormann, A. et al., 2019

UniProt literature

Source: UniProt

References: The UniProt Consortium, 2021

UniProt variants

Source: UniProt

References: The UniProt Consortium, 2021

Orphanet

Source: Orphanet Genes Associated with Rare Diseases

References: Orphanet; Orphadata

ClinGen

Source: ClinGen Gene-Disease Validity

References: Strande, N. et al., 2017

ChEMBL

Source: ChEMBL

References: Mendez, D. et al., 2019

Direction on Target: Activators = GoF; Inhibitors = LoF

Reactome

Source: Reactome

References: Jassal, B. et al., 2020

CRISPR screens

CRISPRbrain is a database for functional genomics screens in differentiated human brain cell types. We have prioritised genome-wide CRISPRi/a/KO screens (healthy vs KO) for integration in the Platform to generate target–disease evidence.

Source: CRISPRbrain

Reference: Tian, R. et al., 2021

Project Score

Source: CRISPR (via Project Score)

References: Pacini et al., 2024

SLAPenrich

SLAPenrich (Sample-population Level Analysis of Pathway enrichments) is a novel statistical framework for the identification of significantly mutated pathways, at the sample population level, in large cohorts of cancer patients. SLAPenrich is based on a Poisson binomial model that takes into account the length of blocks of exons in genes within each pathway, and the background mutation rate of the analysed cohort of patients. SLAPenrich enrichment analysis is based on EMBL-EBI Reactome pathways and mutation data from The Cancer Genome Atlas (TCGA) cohort.

Source: SLAPenrich

References: Iorio, F. et al., 2018

Gene signatures

References: Peters, L. A. et al., 2017; Huan, T. et al., 2013; Zhang, B. et al., 2013; Mostafavi, S. et al., 2018

PROGENy

PROGENy (Pathway RespOnsive GENes) is a linear regression model that calculates pathway activity estimates based on consensus transcriptomic gene signatures obtained from perturbation experiments. PROGENy (Schubert et al.) provides a framework to systematically compare pathway activities between normal and primary samples from The Cancer Genome Atlas (TCGA).

Source: PROGENy

References: Schubert, M. et al., 2018

Expression Atlas

Source: Expression Atlas

References: Papatheodorou, I. et al., 2020

Cancer Gene Census

Cancer Gene Census (CGC) is part of the Wellcome Sanger Institute Catalogue of Somatic Mutations in Cancer (COSMIC). CGC is an effort to catalogue genes which contain mutations that have been causally implicated in cancer. The exhaustive curation of the CGC covers individual studies as well as pan-cancer sequencing efforts, including The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC) among others.

Evidence scoring: Scoring is based on the Cancer Gene Census tier system

Source: Cancer Gene Census

References: Sondka, Z. et al., 2018

IntOGen

IntOGen provides a framework to identify potential cancer driver genes using large-scale mutational data from sequenced tumour samples. By harmonising tumour sequencing data from the ICGC/TCGA Pan-Cancer Analysis of Whole Genomes (PCAWG) and other comprehensive efforts, IntOGen aims to provide a consensus assessment of cancer driver genes. Several state-of-the-art driver methodologies aiming to cover different approaches (e.g. dN/dS, Hotspots, etc.) are included to finally produce a consensus q-value for each driver gene in every tumour.

Evidence scoring: Scaled combined q-values from 0.25 (q = 0.1) to 1 (q < 1e-10)

Source: intOGen

References: Martínez-Jiménez, F. et al., 2020

ClinVar (somatic)

Source: ClinVar (via European Variation Archive)

References: Cezard, T. et al., 2021; Shen, A. et al., 2024; Landrum, M. et al., 2014; Landrum, M. et al., 2020

Direction on Trait: Pathogenic/Likely pathogenic = Risk; Protective = Protective

Europe PMC

Evidence scoring: Score based on weighted document sections, sentence locations, and title for full text articles and abstracts as described in Kafkas et al., 2017. The aggregated scores of each gene/disease co-occurrence in the publication are further normalised between 0 and 1.

Source: Europe PMC

References: The Europe PMC Consortium, 2015; Kafkas et al., 2017

IMPC

Evidence scoring: The evidence score indicates the degree of concordance between the mouse and disease phenotypes, as described by Smedley et al., 2013.

Source: IMPC

References: Smedley, D. et al., 2013

Cancer Biomarkers

Source: Cancer Genome Interpreter

References: Tamborero, D. et al., 2018

GWAS associations

Direction on Trait:

  • Beta values: < 0 = Protective; > 0 = Risk

  • Odds ratios: < 1 = Protective; > 1 = Risk