Research Areas

Google Scholar PubMed

Defining cellular phenotypes with molecular patterns that transfer across single-cell multi-omics datasets

Biological interpretation of single-cell multi-omics datasets requires distilling measurements of thousands of genes and millions of cells into a lower dimensional set of molecular programs underlying cellular phenotypes and function. Our Bayesian non-negative matrix factorization model Coordinated Gene Activity in Pattern Sets (CoGAPS) provides the foundational tool to define these molecular patterns. We pioneered a transfer learning method, projectR, to map molecular patterns underlying cellular phenotypes and fate decisions between datasets. This approach empowers transfer of labels between datasets for automated annotation of cells, inference of regulatory programs from multi-omics integration, spatial mapping, and in silico validation of cell regulatory programs. Our lab has developed standardized notation and pipelines to empower biological interpretation of matrix factorization analyses for the bioinformatics community. Applications in translational oncology demonstrated that transfer learning empowers bi-directional bench to bedside research by quantitatively interrelating data of carcinogenesis and therapeutic response in biological models to cellular processes in human biospecimens.
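As a rough illustration of the idea of learning patterns in one dataset and transferring them to another, the sketch below factors a toy genes-by-cells matrix with plain multiplicative-update NMF and then estimates pattern weights for a second dataset by ordinary least squares. It is not the CoGAPS or projectR implementation: the Bayesian model is swapped for a simple non-Bayesian NMF, and the function names `nmf` and `project`, the toy matrices, and all parameter values are assumptions made for this example.

```python
import numpy as np


def nmf(X, k, n_iter=1000, seed=0, eps=1e-9):
    """Factor a non-negative genes x cells matrix X into A (genes x k) and P (k x cells)."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    A = rng.random((n_genes, k)) + eps
    P = rng.random((k, n_cells)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep both factors non-negative.
        P *= (A.T @ X) / (A.T @ A @ P + eps)
        A *= (X @ P.T) / (A @ P @ P.T + eps)
    return A, P


def project(A, X_new):
    """Estimate pattern weights for new cells by least squares on learned gene loadings A.

    Note: unconstrained least squares can return negative weights.
    """
    P_new, *_ = np.linalg.lstsq(A, X_new, rcond=None)
    return P_new


# Hypothetical toy data: 50 genes, 2 patterns with disjoint gene sets.
rng = np.random.default_rng(1)
A_true = np.zeros((50, 2))
A_true[:25, 0] = rng.random(25) + 0.5
A_true[25:, 1] = rng.random(25) + 0.5
X_source = A_true @ rng.random((2, 40))  # dataset used to learn patterns
X_target = A_true @ rng.random((2, 30))  # dataset that receives the patterns

A_learned, P_source = nmf(X_source, k=2)
P_target = project(A_learned, X_target)

# Relative reconstruction error of the target data under the transferred patterns.
rel_err = np.linalg.norm(X_target - A_learned @ P_target) / np.linalg.norm(X_target)
print(f"relative error on target dataset: {rel_err:.4f}")
```

Each column of `P_target` gives the weight of each learned pattern in one target cell, and weights of this kind are what label transfer or annotation would build on.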

Gene and cell regulatory network inference

My computational biology lab invents biologically-driven genomics algorithms to infer regulatory mechanisms. These algorithms have spanned multi-omics data modalities and are supported for the broader bioinformatics community in open-source software ecosystems such as Bioconductor. Our methods development has spanned biological scales and encompassed epigenetic regulation of transcription, tumor heterogeneity, cell-to-cell communication, and temporal regulation. We focus on developing analysis methods for new molecular measurement technologies or in anticipation of future technology development, ensuring that our algorithms have remained relevant from the bulk profiling era to today’s spatial and temporal multi-omics era.

Systems biology approaches to carcinogenesis, metastasis, and therapeutic resistance

To complement my computational research, my lab also leverages systems biology approaches for integrated computational-experimental strategies to infer the molecular and cellular pathways of dynamic processes in cancer, including carcinogenesis, therapeutic resistance, and progression to metastasis. My wet lab has pioneered temporally resolved multi-omics profiling techniques and spatial molecular profiling of human biospecimens for this spatiotemporal characterization. Close coupling of novel experimental design with computational algorithm development enables us to uncover the systems-level interactions of molecular, cellular, and tissue states in disease progression and therapeutic response.

Forecasting cellular networks and cancer progression by assimilating high-throughput data with mathematical modeling

Chaos theory has demonstrated that embedding prior knowledge of the regulatory mechanisms with high-throughput data can increase prediction accuracy, even when the mathematical models or data are incomplete. To overcome the lack of true temporal omics data in cancer, my lab builds a biological forecast system by integrating machine learning approaches that infer cellular regulatory networks from single-cell multi-omics data with mathematical models that simulate phenotypic changes resulting from these inferred cellular interactions. We are currently pioneering agent-based mathematical models, time course multi-omics analysis methods, and time-dependent artificial intelligence methods to enable temporal modeling of biological systems. We strive to integrate these approaches to combine data with computational models, informed by data assimilation methods for weather prediction.
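To make the notion of data assimilation concrete, the sketch below runs a forecast-then-correct cycle on a toy logistic growth model whose growth rate is deliberately wrong, standing in for an incomplete mathematical model. A scalar extended Kalman filter is used here as one simple assimilation scheme. It is an assumption of this example, not the lab's forecast system, and the function names, growth rates, and noise levels are all hypothetical.

```python
import numpy as np


def logistic_step(n, rate, capacity=1.0, dt=1.0):
    """Advance a logistic growth model by one time step."""
    return n + dt * rate * n * (1.0 - n / capacity)


def assimilate(observations, n0, rate, obs_var, model_var, p0=0.1,
               capacity=1.0, dt=1.0):
    """Scalar extended Kalman filter: forecast with the model, then correct with data."""
    n, p = n0, p0
    estimates = []
    for y in observations:
        # Forecast step: propagate the state and its variance through the model.
        jac = 1.0 + dt * rate * (1.0 - 2.0 * n / capacity)
        n = logistic_step(n, rate, capacity, dt)
        p = jac * p * jac + model_var
        # Analysis step: weight model and observation by their uncertainties.
        gain = p / (p + obs_var)
        n = n + gain * (y - n)
        p = (1.0 - gain) * p
        estimates.append(n)
    return np.array(estimates)


def simulate(n0, rate, n_steps):
    """Run the model alone, with no data correction."""
    values = [n0]
    for _ in range(n_steps):
        values.append(logistic_step(values[-1], rate))
    return np.array(values[1:])


# Hypothetical setup: the "true" system grows faster than the model assumes.
rng = np.random.default_rng(0)
n_steps, true_rate, model_rate, obs_sd = 30, 0.3, 0.15, 0.03
truth = simulate(0.05, true_rate, n_steps)
observations = truth + rng.normal(0.0, obs_sd, n_steps)

model_only = simulate(0.05, model_rate, n_steps)
assimilated = assimilate(observations, n0=0.05, rate=model_rate,
                         obs_var=obs_sd ** 2, model_var=1e-3)

rmse_model = np.sqrt(np.mean((model_only - truth) ** 2))
rmse_assim = np.sqrt(np.mean((assimilated - truth) ** 2))
print(f"model only RMSE: {rmse_model:.3f}, assimilated RMSE: {rmse_assim:.3f}")
```

Even with a misspecified growth rate, the repeated correction against noisy observations keeps the assimilated trajectory closer to the simulated truth than the model run on its own.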

Team science multi-omics and atlas studies

I led genomics analysis of datasets from numerous screening studies from diverse tumor types, biological systems, experimental conditions, and high-throughput data platforms as part of team science research. In addition to analyses performed in my lab, I have devised a training plan to empower biologists in my collaborators’ labs to perform their own bioinformatics analyses, leading to numerous co-authored publications and to their subsequent independent publications beyond the mentorship period. Leveraging the tools developed in my lab, many of the cancer multi-omics and spatial molecular studies performed in my lab have focused on transcriptional regulation of human cancers as well as single-cell and spatial multi-omics analysis across species and even in clinical trial samples. These collaborative analyses demonstrate my expertise in inferring biologically-relevant findings from both proprietary and public domain high-throughput datasets to advance collaborative research. These studies empower bi-directional clinical and computational research, motivating new computational research areas.

Accounting for inter-tumor heterogeneity in genomics

Genomic and epigenetic landscapes of tumors have higher variability than those of normal samples, and this variability increases in tumors with worsening prognosis. Statistical analyses that compare the variability of genomic measurements for each gene in tumor samples relative to normal samples can prioritize molecular alterations in individual tumors. Expression Variation Analysis (EVA) quantifies differential variability of pathways and splice variants.
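The idea of comparing variability between tumor and normal samples can be sketched as follows. This is a minimal illustration, not the EVA implementation: it uses the mean pairwise Euclidean distance among samples within a gene set as the variability measure and a plain label-permutation test for significance. The function names, the simulated data, and the number of permutations are assumptions made for this example.

```python
import numpy as np


def mean_pairwise_distance(X):
    """Mean Euclidean distance over all pairs of samples (rows) in X."""
    diffs = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(X.shape[0], k=1)  # each pair counted once
    return dist[upper].mean()


def differential_variability(tumor, normal, n_perm=1000, seed=0):
    """Permutation test for higher sample-to-sample variability in tumor than normal."""
    rng = np.random.default_rng(seed)
    observed = mean_pairwise_distance(tumor) - mean_pairwise_distance(normal)
    pooled = np.vstack([tumor, normal])
    n_tumor = tumor.shape[0]
    count = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        stat = (mean_pairwise_distance(pooled[idx[:n_tumor]])
                - mean_pairwise_distance(pooled[idx[n_tumor:]]))
        if stat >= observed:
            count += 1
    # Add-one correction avoids reporting a p-value of exactly zero.
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Hypothetical data: 20 samples per group, 10 genes in one pathway,
# with tumor expression simulated to be more variable than normal.
rng = np.random.default_rng(42)
normal = rng.normal(0.0, 1.0, size=(20, 10))
tumor = rng.normal(0.0, 3.0, size=(20, 10))

observed, p_value = differential_variability(tumor, normal)
print(f"difference in mean pairwise distance: {observed:.2f}, p = {p_value:.4f}")
```

A permutation test on pooled labels is sensitive to shifts in mean as well as spread, so in practice each gene set would usually be centered within group first, or a statistic designed for dispersion would be used instead.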