Variant

Common and rare variation in Open Targets Platform

Overview

In the Open Targets Platform, a variant refers to any human variation associated with a disease, trait or phenotype that has been reported in any of our sources. All variation is mapped to the GRCh38 build and enriched with functional annotation. The Platform currently captures single nucleotide polymorphisms (SNPs) and insertions/deletions (indels).

Variant identifier

Identifiers for SNPs and small indels are built from the genomic location and alleles, in the form chromosome_position_reference_alternate. For example, in 6_160589086_A_G, A is the reference allele at position 160,589,086 on chromosome 6 and G is the alternate allele. Consistent with gnomAD, positions use a 1-based coordinate system.

For longer insertions (200+) and deletions, where keeping the full length of the allele in the variant identifier is impractical, the allele string is hashed to create the identifier. When available, the identifier also contains the chromosome and position. Example: OTVAR_11_614383_9cc2ae367cc98c283cb510e8ea29c9f0
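The two identifier forms above can be told apart programmatically. The following is a minimal sketch, not Platform code: the names `VariantId`, `parse_variant_id` and `format_variant_id` are hypothetical, alleles are assumed to use only the letters A, C, G, T and N, and the layout of a hashed identifier without chromosome and position (prefix plus hash only) is an assumption. The hashing algorithm itself is not described here, so the sketch only reads hashed identifiers and does not create them.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed pattern for plain identifiers: chromosome_position_ref_alt.
SIMPLE_ID = re.compile(r"^([^_]+)_(\d+)_([ACGTN]+)_([ACGTN]+)$")
HASHED_PREFIX = "OTVAR_"


@dataclass(frozen=True)
class VariantId:
    """Parsed identifier (hypothetical container, not a Platform type)."""
    chromosome: Optional[str]
    position: Optional[int]      # 1-based, as stated in the text
    reference: Optional[str]
    alternate: Optional[str]
    allele_hash: Optional[str]   # set only for hashed identifiers


def format_variant_id(chromosome: str, position: int, ref: str, alt: str) -> str:
    """Build a plain identifier such as 6_160589086_A_G."""
    return f"{chromosome}_{position}_{ref}_{alt}"


def parse_variant_id(variant_id: str) -> VariantId:
    """Split either identifier form into its components."""
    if variant_id.startswith(HASHED_PREFIX):
        parts = variant_id.split("_")[1:]
        if len(parts) == 3 and parts[2]:
            # Chromosome and position are present alongside the hash.
            return VariantId(parts[0], int(parts[1]), None, None, parts[2])
        if len(parts) == 1 and parts[0]:
            # Assumed layout when chromosome and position are unavailable.
            return VariantId(None, None, None, None, parts[0])
        raise ValueError(f"Unrecognised hashed identifier: {variant_id!r}")
    match = SIMPLE_ID.match(variant_id)
    if match is None:
        raise ValueError(f"Unrecognised variant identifier: {variant_id!r}")
    chromosome, position, ref, alt = match.groups()
    return VariantId(chromosome, int(position), ref, alt, None)
```

For instance, `parse_variant_id("6_160589086_A_G")` yields chromosome `"6"`, position `160589086`, reference `"A"` and alternate `"G"`, while the hashed example keeps only the chromosome, the position and the hash.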

All variants shown in the Platform are reported in at least one of our variant-to-phenotype sources.

Population Allele Frequencies

Alternate allele frequencies from the gnomAD variation database are reported for all major populations when available.

Source: gnomAD 4.1
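Because the reported values are alternate allele frequencies, the alternate allele is not necessarily the minor allele in every population. A minimal sketch of deriving the minor allele frequency per population follows; the population labels and frequency values are made up for illustration, and the function names are hypothetical.

```python
from typing import Dict


def minor_allele_frequency(alt_af: float) -> float:
    """Return the frequency of the rarer of the two alleles at a biallelic site."""
    if not 0.0 <= alt_af <= 1.0:
        raise ValueError(f"Allele frequency must be in [0, 1], got {alt_af}")
    return min(alt_af, 1.0 - alt_af)


def minor_allele_frequencies(alt_afs: Dict[str, float]) -> Dict[str, float]:
    """Apply the conversion to every population in a mapping."""
    return {pop: minor_allele_frequency(af) for pop, af in alt_afs.items()}


# Illustrative values only; these are not gnomAD data.
example = {"population_a": 0.25, "population_b": 0.75, "population_c": 0.0}
print(minor_allele_frequencies(example))
```

In this made-up example the alternate allele is the minor allele in `population_a` but the major allele in `population_b`, so both populations have the same minor allele frequency.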

Variant effect

Variants are annotated with an integrated view of variant effects from multiple methods. Based on all predictions or annotations, we normalise the variant's likely deleteriousness to a common scale.

The methods currently integrated are:

  • AlphaMissense

  • FoldX

  • GERP

  • LOFTEE

  • SIFT

  • Ensembl VEP

To make the predicted variant effects comparable across methods, the raw predictions from each method are normalised to a unified scale ranging from likely benign, through uncertain, to likely deleterious.
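The idea of putting differently scaled predictors on one axis can be illustrated with a piecewise-linear rescaling. This is a sketch under stated assumptions and not the Platform's actual normalisation: the target range of -1 (likely benign) to +1 (likely deleterious), the function name `normalise`, and the anchor values in the usage lines are all hypothetical.

```python
def normalise(raw: float, benign_end: float, pivot: float, deleterious_end: float) -> float:
    """Map a raw score onto [-1, 1] by piecewise-linear interpolation.

    benign_end      -> -1.0 (most benign raw value, hypothetical anchor)
    pivot           ->  0.0 (raw value treated as uncertain)
    deleterious_end -> +1.0 (most deleterious raw value)

    Works for predictors where higher is worse and where lower is worse,
    as long as the pivot lies strictly between the two ends.
    """
    if (benign_end - pivot) * (deleterious_end - pivot) >= 0:
        raise ValueError("pivot must lie strictly between the two ends")
    if (raw - pivot) * (deleterious_end - pivot) >= 0:
        # Raw score is on the deleterious side of the pivot: scale to 0..1.
        score = (raw - pivot) / (deleterious_end - pivot)
    else:
        # Raw score is on the benign side of the pivot: scale to -1..0.
        score = -(raw - pivot) / (benign_end - pivot)
    # Clamp values that fall outside the anchors.
    return max(-1.0, min(1.0, score))


# Hypothetical predictor where higher raw scores are more deleterious.
print(normalise(0.75, benign_end=0.0, pivot=0.5, deleterious_end=1.0))
# Hypothetical predictor where lower raw scores are more deleterious.
print(normalise(0.0, benign_end=1.0, pivot=0.05, deleterious_end=0.0))
```

Once every method is expressed on the same axis, scores near the middle can be read as uncertain and scores near either end as likely benign or likely deleterious, whatever the original scale and direction of the method.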

Molecular Structure Viewer

For predicted missense variants, we have included a Molecular Structure viewer on the variant page that locates the variant in the AlphaFold structure. The feature includes the option to switch to a pathogenicity view, which shows the AlphaMissense pathogenicity for the substitution corresponding to the variant, and the average AlphaMissense pathogenicity score across all possible amino acid substitutions at other positions.

Source: AlphaFold DB

Transcript consequences

Every variant is annotated with the predicted consequence for all canonical transcripts in a ±500 kb window, which shows the likely effects on neighbouring coding and non-coding genes. For each variant-transcript pair in the region, this information includes:

  • Distance from transcription start site (TSS)

  • Distance to footprint

  • Predicted functional consequence based on Ensembl VEP

  • Amino-acid consequence relative to the UniProt reference protein

Source: Ensembl VEP
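The two distance annotations listed above can be sketched as follows. This is an illustration and not the Platform's pipeline: the `Transcript` class and function names are hypothetical, "footprint" is assumed to mean the genomic span from transcript start to end, and the ±500 kb window is assumed to be measured against that footprint. The transcript coordinates in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import List

WINDOW_BP = 500_000  # the ±500 kb window described in the text


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript record with 1-based inclusive coordinates."""
    transcript_id: str
    chromosome: str
    start: int
    end: int
    strand: int  # +1 for forward, -1 for reverse

    @property
    def tss(self) -> int:
        # Transcription starts at the lower coordinate on the forward strand
        # and at the higher coordinate on the reverse strand.
        return self.start if self.strand == 1 else self.end


def distance_from_tss(position: int, transcript: Transcript) -> int:
    """Absolute distance in base pairs between the variant and the TSS."""
    return abs(position - transcript.tss)


def distance_from_footprint(position: int, transcript: Transcript) -> int:
    """Zero inside the transcript span, otherwise distance to the nearer end."""
    if transcript.start <= position <= transcript.end:
        return 0
    return min(abs(position - transcript.start), abs(position - transcript.end))


def transcripts_in_window(chromosome: str, position: int,
                          transcripts: List[Transcript],
                          window: int = WINDOW_BP) -> List[Transcript]:
    """Keep transcripts on the same chromosome whose footprint is within the window."""
    return [t for t in transcripts
            if t.chromosome == chromosome
            and distance_from_footprint(position, t) <= window]


# Made-up transcripts around the example variant 6_160589086_A_G.
candidates = [
    Transcript("T1", "6", 160_500_000, 160_600_000, -1),
    Transcript("T2", "6", 161_000_000, 161_050_000, +1),
    Transcript("T3", "6", 162_000_000, 162_100_000, +1),
    Transcript("T4", "7", 160_500_000, 160_600_000, +1),
]
for t in transcripts_in_window("6", 160_589_086, candidates):
    print(t.transcript_id, distance_from_tss(160_589_086, t),
          distance_from_footprint(160_589_086, t))
```

Measuring the window against the TSS instead of the footprint would be an equally plausible reading; only the filter in `transcripts_in_window` would change.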

Variant-to-phenotype

The list of variant sources includes:

  • 95% GWAS credible sets

  • 95% Molecular QTL credible sets

  • ClinVar: Submitted variants at all clinical significances

  • UniProt: Literature-based curation of disease-associated variants

  • Pharmacogenetics: Variants corresponding to genotypes associated with drug responses



Variant effect method descriptions

AlphaMissense is a deep learning model that builds on the protein structure prediction tool AlphaFold2 to assess the effect of missense variants across the proteome.

FoldX is a computational tool that predicts the impact of mutations on protein stability and structure by calculating changes in free energy, helping to assess the potential functional consequences of missense variants. The data shown in the Platform were generated by an Open Targets project team, which used the FoldX algorithm to predict stability changes for protein variants based on all human AlphaFold2 predicted structures with confidence scores of pLDDT > 70.

GERP (Genomic Evolutionary Rate Profiling) scores identify regions of the genome that are evolutionarily conserved and therefore likely to be functionally important. Higher conservation indicates a greater potential for variants to be deleterious.

LOFTEE (Loss-Of-Function Transcript Effect Estimator) is a tool used to identify and annotate high-confidence loss-of-function variants in human genetic data, focusing on variants that likely disrupt gene function.

SIFT (Sorting Intolerant From Tolerant) predicts whether an amino acid substitution affects protein function based on sequence homology and the physical properties of amino acids.

Ensembl VEP provides a pathogenicity score derived from the most severe consequence term assigned by Ensembl's Variant Effect Predictor (VEP).
