About 95% of somatic mutations in most cancers lie in noncoding regions, and some of these mutations have been identified as putative drivers. The noncoding regions (~98% of the human genome) contain many functional elements, as revealed by one-dimensional (1D) epigenomic profiles such as those from ENCODE and Roadmap Epigenomics. Furthermore, integrating three-dimensional (3D) spatial long-range interaction data sets (such as Hi-C and ChIA-PET) makes it possible to define enhancer-promoter pairs more accurately. Integrating somatic noncoding mutations with 1D epigenomic profiles and 3D long-range interactions in a specific tissue or cell type is therefore a promising way to fine-map causal regulatory variants and to understand the regulatory mechanisms underlying human cancer development.
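As a rough illustration of the integration idea described above, the following sketch maps a mutation position to an overlapping enhancer (1D annotation) and then to the genes linked to that enhancer (3D interactions). It is a minimal sketch, not OncoBase's pipeline: the `Enhancer` record, the function names, the toy coordinates, and the placeholder gene names are all assumptions made for illustration, and enhancers on the same chromosome are assumed not to overlap.

```python
from bisect import bisect_right
from typing import NamedTuple


class Enhancer(NamedTuple):
    """Hypothetical record: an enhancer interval plus its linked target genes."""
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # exclusive
    targets: tuple  # gene names linked via long-range interactions


def build_index(enhancers):
    """Group enhancers by chromosome and sort them by start coordinate."""
    index = {}
    for enh in sorted(enhancers, key=lambda e: (e.chrom, e.start)):
        starts, records = index.setdefault(enh.chrom, ([], []))
        starts.append(enh.start)
        records.append(enh)
    return index


def targets_for_mutation(index, chrom, pos):
    """Return target genes of the enhancer containing pos, or an empty list.

    Assumes enhancers on one chromosome do not overlap, so a single
    binary search on the start coordinates is sufficient.
    """
    starts, records = index.get(chrom, ([], []))
    i = bisect_right(starts, pos) - 1
    if i >= 0 and records[i].start <= pos < records[i].end:
        return list(records[i].targets)
    return []


# Toy data: coordinates and gene names are placeholders, not real annotations.
toy_index = build_index([
    Enhancer("chr1", 100, 200, ("GENE_A",)),
    Enhancer("chr1", 500, 600, ("GENE_B", "GENE_C")),
])
```

Calling `targets_for_mutation(toy_index, "chr1", 150)` looks up the first toy enhancer and returns its linked genes. A real analysis would additionally restrict both annotations to the relevant tissue or cell type.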
Large cancer sequencing projects such as TCGA and ICGC have identified a large number of tumor somatic mutations, providing an unprecedented opportunity to identify noncoding driver mutations. In the current study, we built a database, OncoBase, which provides comprehensive annotation and prediction of regulatory somatic mutations by employing state-of-the-art methods for target prediction, gene or mutation prioritization, and function prediction. Specifically, OncoBase:
- Collects all of the somatic mutations identified by TCGA and ICGC, and somatic mutations deposited in COSMIC and ClinVar;
- Constructs more than 10 million enhancer-target interactions by combining predictions from multiple resources;
- Incorporates 127 tissue/cell type-specific epigenomic data sets from the ENCODE and Roadmap Epigenomics projects;
- Integrates motifs of 2817 transcriptional regulators from 4 public resources and predicts the effects of mutations on the binding motifs;
- Uniformly processes 3C/4C/5C/Hi-C/ChIA-PET data and generates significant interactions at high resolution across 80 tissues/cell types;
- Provides comprehensive functional annotation and prediction of the regulatory somatic mutations;
- Offers highly interactive visualization of mutation-target interactions;
- Includes multiple types of QTLs: eQTLs, hQTLs, mQTLs and dsQTLs;
- Prioritizes the gene targets of regulatory mutations by network diffusion;
- Establishes weighted gene co-expression networks for 36 tumor types from TCGA projects.
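The list above mentions predicting the effects of mutations on binding motifs. One common way to approximate such an effect is to compare log-odds scores of the reference and mutated sequences against a position weight matrix. The sketch below illustrates that general technique only; it is not confirmed to be the method OncoBase uses, and the function names, pseudocount, uniform background, and toy count matrix are assumptions.

```python
import math

BASES = "ACGT"


def log_odds_matrix(pfm, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    pfm is a list of dicts mapping each base to its count at that position.
    A uniform background frequency is assumed for simplicity.
    """
    pwm = []
    for col in pfm:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((col[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds scores for a sequence of motif length."""
    return sum(col[base] for col, base in zip(pwm, seq))


def motif_score_change(pwm, ref_seq, alt_seq):
    """Return score(alt) - score(ref); negative values suggest weaker binding."""
    if not (len(ref_seq) == len(alt_seq) == len(pwm)):
        raise ValueError("sequences must match the motif length")
    return score(pwm, alt_seq) - score(pwm, ref_seq)


# Toy 3-position motif strongly preferring "ACG" (counts are made up).
toy_pfm = [
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 8, "T": 0},
]
toy_pwm = log_odds_matrix(toy_pfm)
delta = motif_score_change(toy_pwm, "ACG", "ATG")  # C>T at the middle position
```

Here `delta` is negative because the mutation replaces the preferred base at the second position. In practice one would scan every window overlapping the mutation, on both strands, and compare against a score threshold per motif.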
Citation: OncoBase: a platform for decoding regulatory somatic mutations in human cancers. Nucleic Acids Research, 2018.
News and Updates
- 11 Nov. 2018
11:00 OncoBase is accepted by Nucleic Acids Research.
- 08 Aug. 2018
12:00 Version 1.1 of the database is online.
- 06 Jun. 2018
15:40 Version 1.0 of the database is online.
- 02 Feb. 2018