Intervals

Intervals are ingested directly from the E2G consortium, which utilises a combination of epigenetic datasets to connect genomic intervals (regions defined by chromosome, start and end) to putative target genes. These intervals are likely to be regulatory elements that affect the transcriptional activity of the genes they are connected to.

The input datasets are based on publicly available resources from the ENCODE project. These include histone modification ChIP-seq, open chromatin assays (DNase-seq and ATAC-seq), and 3D chromatin conformation data from Hi-C. These inputs are normalised and used as features in the machine learning approach described in the original E2G publication. The final output connects potential regulatory regions to their target genes, together with a score between 0 and 1 that indicates the confidence of each assignment.
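To make the shape of the E2G output concrete, the sketch below models one interval-to-gene assignment as a small record. The class name `IntervalLink`, its field names and the `genes_by_region` helper are hypothetical illustrations, not the Open Targets schema; the only facts taken from the text are that each link has a region (chromosome, start, end), a target gene and a confidence score between 0 and 1.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalLink:
    """One interval-to-gene assignment (hypothetical field names)."""
    chromosome: str
    start: int
    end: int
    gene_id: str
    score: float  # E2G confidence, expected to lie between 0 and 1

    def __post_init__(self) -> None:
        # Reject malformed regions and out-of-range scores early.
        if self.start > self.end:
            raise ValueError("start must not exceed end")
        if not 0.0 <= self.score <= 1.0:
            raise ValueError("score must lie between 0 and 1")


def genes_by_region(links):
    """Group links so each (chromosome, start, end) region maps to its linked genes."""
    grouped = defaultdict(list)
    for link in links:
        grouped[(link.chromosome, link.start, link.end)].append(link.gene_id)
    return dict(grouped)
```

A single region may be linked to several genes, each with its own score, which is why grouping by region is a natural first view of the data.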

Intervals from E2G are ingested into the Open Targets ecosystem and can currently be browsed through the variant page, where the regulatory regions overlapping a given variant of interest are displayed in the interval widget. We have applied a stringent score threshold of 0.6 to the interval dataset to reduce computational cost and redundancy. This threshold was chosen based on an analysis of E2G interval overlaps with eQTL credible sets.
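The two operations described above, score filtering and finding the intervals that overlap a variant, can be sketched as follows. This is a minimal illustration, not the pipeline's implementation: the record layout is hypothetical, the inclusive comparison (`score >= 0.6`) is an assumption since the text does not say whether the cutoff is inclusive, and the overlap test assumes 1-based, closed coordinates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalLink:
    """Hypothetical interval-to-gene record."""
    chromosome: str
    start: int
    end: int
    gene_id: str
    score: float


# Threshold value from the text; treating it as inclusive is an assumption.
SCORE_THRESHOLD = 0.6


def filter_by_score(links, threshold=SCORE_THRESHOLD):
    """Keep only links whose score meets the threshold."""
    return [link for link in links if link.score >= threshold]


def overlapping_links(links, chromosome, position):
    """Return links whose region contains the variant position.

    Assumes 1-based, closed [start, end] coordinates.
    """
    return [
        link
        for link in links
        if link.chromosome == chromosome and link.start <= position <= link.end
    ]
```

For example, `overlapping_links(filter_by_score(links), "1", 150)` would list the retained gene links for a variant at position 150 on chromosome 1, roughly what the interval widget displays for that variant.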

Effect of interval-score filtering on gene prioritisation for eQTL credible sets. True positives (TPs) are defined as the eQTL target gene. Sensitivity (orange) is the recall of TPs, and FDR (blue) is 1 − precision among the retained credible set–gene pairs. Each point is a score threshold, labelled by its percentile (Px) of the interval score distribution. Moderate filtering removes a substantial fraction of raw intervals while retaining most TP assignments. FDR changes little until very aggressive cutoffs are applied, indicating that many non-TP gene links, some of which may still be of interest, persist.
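The metrics in the figure follow directly from their definitions: at a given threshold, sensitivity is the fraction of all TP credible set–gene pairs that survive filtering, and FDR is the fraction of surviving pairs that are not TPs. The sketch below computes both. It assumes each credible set–gene pair already carries a single interval score and a TP flag (how a pair's score is derived from its overlapping intervals is not described in the text), and it uses a nearest-rank percentile, which may differ from the method used in the original analysis. Function names are hypothetical.

```python
import math


def percentile_threshold(interval_scores, p):
    """Nearest-rank p-th percentile (0 < p <= 100) of the interval scores."""
    ordered = sorted(interval_scores)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def sensitivity_and_fdr(pairs, threshold):
    """Compute (sensitivity, FDR) for pairs retained at score >= threshold.

    pairs: iterable of (score, is_true_positive) per credible set-gene pair.
    Returns 0.0 for a metric whose denominator is zero.
    """
    pairs = list(pairs)
    retained = [is_tp for score, is_tp in pairs if score >= threshold]
    total_tp = sum(1 for _, is_tp in pairs if is_tp)
    retained_tp = sum(retained)  # True counts as 1
    sensitivity = retained_tp / total_tp if total_tp else 0.0
    fdr = (len(retained) - retained_tp) / len(retained) if retained else 0.0
    return sensitivity, fdr
```

Sweeping `p` over a range of percentiles and plotting both values against the threshold would reproduce the kind of curve summarised in the figure.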

Future work will involve adding the interval widget to the credible set pages, as well as using the interval dataset to construct additional features for the Locus-to-Gene (L2G) pipeline for gene prioritisation.
