Data pipeline
The Open Targets data pipeline is a complex process orchestrated in Apache Airflow. It is divided into three phases: data acquisition, transformation, and data output.
Introduction
The data pipeline is composed of multiple elements:
Data and evidence generation processes
Input stage
Transformation stage and ETL processes
Output stage
Gentropy-specific processes
Orchestration
GitHub repositories
Data and evidence
curation — Open Targets curation repository
evidence_datasource_parsers — internal pipelines used to generate evidence
json_schema — evidence object schema used for evidence and association scoring
OnToma — Python module to map disease or phenotype terms to EFO
Gentropy
gentropy — Open Targets' genomics toolkit
See the Gentropy documentation for more information on the Gentropy pipelines.
Orchestration
orchestration — Open Targets data pipelines orchestrator
See the orchestration documentation for details.
The Platform ETL (“extract, transform, and load”) and the Genetics ETL were previously separate processes, but they have now been merged into a single pipeline. As a result, the data produced by the Genetics ETL and by the Platform ETL are released at the same time. Herein, we refer to this joint pipeline as the "unified pipeline".
Orchestration runs on Apache Airflow, with Google Cloud as the cloud resource provider. The orchestration logic is built from steps, and combinations of steps form directed acyclic graphs (DAGs).
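The idea that steps combine into a directed acyclic graph can be illustrated with a minimal Python sketch. It uses the standard-library `graphlib` module rather than Airflow itself, and the step names and dependencies are hypothetical, chosen only to mirror the input, transformation and output stages described above; they are not the actual steps defined in the orchestration repository.

```python
from graphlib import CycleError, TopologicalSorter

# HYPOTHETICAL steps for illustration only; the real steps and their
# dependencies live in the Open Targets orchestration repository.
# Each key is a step; its value is the set of steps it depends on.
PIPELINE_STEPS: dict[str, set[str]] = {
    "input": set(),                    # data acquisition
    "evidence": {"input"},             # evidence generation (hypothetical)
    "gentropy": {"input"},             # genomics processing (hypothetical)
    "etl": {"evidence", "gentropy"},   # transformation into indices
    "output": {"etl"},                 # release outputs
}


def execution_order(steps: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which the steps can run.

    Every step appears after all of its dependencies. Raises
    graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(steps).static_order())


def is_dag(steps: dict[str, set[str]]) -> bool:
    """Return True if the step dependencies form a directed acyclic graph."""
    try:
        execution_order(steps)
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(execution_order(PIPELINE_STEPS))
    print(is_dag({"a": {"b"}, "b": {"a"}}))  # a cycle is not a valid DAG
```

Because "evidence" and "gentropy" depend only on "input", either may run first; an orchestrator such as Airflow can run steps like these in parallel, while a cycle between steps would make any valid execution order impossible.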
The unified pipeline uses many static assets, such as Open Targets-related data and the data needed to run the Genetics ETL.
Unified pipeline
otter — Open Targets' Task ExecutoR i.e. scripts that process and prepare data for our ETL pipelines
platform-etl-backend: ETL pipelines to generate associations, evidence, and entity indices
platform-etl-openfda-faers: ETL pipeline to process Open FDA adverse events data
platform-etl-literature: ETL pipeline to generate similar entities and publications
platform-output-support: scripts for infrastructure tasks and generating a Platform release
If you have further questions, please get in touch with us on the Open Targets Community.