A. Cancer genomics, epigenomics, and public datasets
Without a doubt, our current understanding of cancer biology and its clinical implications has been heavily shaped by the advent of modern genomics. Genome-wide analytical tools (both sequencing- and array-based technologies) interrogate molecular attributes such as expression (mRNA, miRNA, lncRNA), copy number, CpG methylation, sequence variation (mutations, SNPs), and DNA-protein interactions. They have led to the discovery of: a) genes crucial to cancer progression; and b) molecular biomarkers of cancer progression, early detection, prognosis, predisposition, and therapeutic response, as well as new drug targets. Beyond enabling new discoveries, genomics data have also validated (and at times corrected) much of our prior knowledge about cancer, such as the functions, pathways, regulatory mechanisms, and diagnostic value associated with a given gene. Fortunately, tens of thousands of cancer-related genomic (as well as proteomic, metabolomic, and pharmacogenomic) datasets are now publicly available from sources such as TCGA, GEO, ENCODE, CCLE, GTEx, and DepMap. TCGA datasets are particularly valuable because their molecular profiling data are integrated and accompanied by comprehensive clinicopathological information. I strongly believe that much important, clinically relevant, yet-to-be-discovered knowledge about cancer (and other diseases) lies buried in these publicly available datasets.
B. Current projects and interests
Having previously been involved in a wide range of cancer-related research (chemical carcinogenesis and DNA repair, therapeutics and drug resistance, diagnostics, molecular genetics), I am currently focused on predictive approaches that can be applied to cancer and, more recently, to other diseases (e.g., neurodegenerative diseases and COVID-19). The eventual aim is for these bioinformatic observations to be experimentally validated and, where appropriate, clinically translated and protected as intellectual property. Below are some of my ongoing and recent projects.
1. Cancer biomarker prediction (with emphasis on molecular diagnostics)
● Bioinformatic identification of CpG methylation markers for blood-based detection of early and recurrent cancer, followed by development of highly sensitive assays based on technologies invented in Francis Barany’s group (Weill Cornell). Although markers have been identified for almost every cancer type, the assays under development are intended for early detection of colorectal, breast, and ovarian cancer. These projects have received support from Acuamark Diagnostics and Earlier.org.
● Identification of CpG markers that can serve as a surrogate for assessment of immune infiltration and may predict patient response to checkpoint inhibitors.
● Machine learning-based predictive models for circulating tumor DNA (ctDNA) methylation assays, aimed at detecting early and recurrent cancers and at predicting response to cancer therapeutics.
● Bioinformatic identification of other potential blood-based markers for early cancer detection beyond CpG methylation, such as miRNAs, lncRNAs, and secreted proteins.
● Prediction of molecular markers of metastasis (surface proteins, secreted proteins).
● Rationalization of the prognostic value of copy number-driven gene dysregulation in cancer.
● Conceptualization of how genomic autozygosity and somatic uniparental disomy may contribute to cancer predisposition and progression.
2. Immuno-Oncogenomics
● Defining the immune-related pathways that are dysregulated in various solid tumors. Bioinformatic analyses of public genomic datasets may also help elucidate the biology of immune infiltration in cancer.
● Bioinformatic modeling of the epigenetically driven transcription of genes crucial to T cell activation and proliferation (e.g., CTLA4, PD1, the CD3 genes, and PRF1).
3. Predictive Cancer Biology. Various questions in cancer biology are being addressed through integrated analyses of public genomic datasets. The bioinformatic and statistical approaches employed include virtual gene over-expression/repression, gene set enrichment analysis, and transcription/methylation correlation analysis. These bioinformatic results are then experimentally verified by collaborators at VCU (Paul Fisher’s group), MSKCC, and WCM. Current and recent topics of interest include the following:
● Bioinformatic approaches to predict the biology of prostate cancer metastasis.
● Molecular pathways and functions associated with the metastasis-promoting gene MDA9 (with particular focus on glioma and neuroblastoma).
● Modeling the epigenetic regulation of the metastasis-promoting gene MDA9.
● Predicting the mRNA transcripts (including oncogenes) regulated by hPNPase.
● Molecular pathways associated with IDH1 mutation in glioma.
● The interaction between the oncogenes AEG1 and AKT2 and its prognostic relevance in glioma.
4. Cancer therapeutics and drug resistance. Other ongoing projects apply bioinformatic analyses of public genomic datasets and drug databases to:
● Identification of candidate proteins that can be targeted through targeted protein degradation.
● Modeling the epigenetic regulation of MGMT, a DNA repair protein that is an important determinant of resistance to alkylating/methylating chemotherapy drugs.
● Identification of biomarkers of cancer drug efficacy and formulation of mechanistic models of drug resistance. These are accomplished through integrative analyses of publicly available pharmacological, transcriptional, gene dependency, proteomic, mutational, copy number, methylation, and metabolomic data for hundreds of cell lines exposed to thousands of known and potential anti-cancer drugs.
5. Beyond cancer. More recently, I have begun applying integrated analyses of publicly available molecular profiling datasets to other diseases. Ongoing projects include:
● Machine learning approaches to predict the onset of neurodegenerative diseases (e.g., Parkinson’s and Alzheimer’s disease) based on epigenetic markers in genomic DNA isolated from patient blood.
● Integration of transcriptional, methylation, and metabolomic data to investigate mathematical models and biological pathways associated with the aging process.
● Examination of the molecular processes that contribute to gender disparity in COVID-19-related morbidity.