Complex graphical models for biological network science

PRIN-2022, Unifi
UNIFI Research Unit: Francesco C. Stingo


The project is led by the University of Florence (PI: Francesco C. Stingo), in collaboration with Università Cattolica del Sacro Cuore (Local PI: G. Consonni), the University of Padua (Local PI: A. Roverato), and the University of Palermo (Local PI: L. Augugliaro).

UniFI personnel:
F.C. Stingo (PI), M. Lupparelli, A. Gottard, A. Panzera, G. Poli, C. Busatto, L. Focardi Olmi

Brief description of the proposal

This project concerns the development of novel, principled statistical tools for the analysis of complex networks under non-standard experimental setups (e.g., when the i.i.d. assumption is relaxed).
The proposal targets the following methodological innovations:

  1. Development of multiple, paired, and covariate-dependent graphical models for heterogeneous networks, covering both continuous and discrete variables.
  2. Development of single and multiple graphical models for causal inference based on observational and interventional data.
  3. Development of graphical models for non-normal (e.g., continuous non-Gaussian or circular data) and censored random variables.

The proposed research is expected to provide a methodological foundation for novel types and classes of graphical models. Compared to existing approaches, ours offer several additional benefits: interpretability (such as similarity measures between groups for both graph structures and edge values); the ability to assimilate information from several dimensions while borrowing strength only between related groups and/or units; the ability to include prior information, such as known biological regulatory mechanisms; and interpretable measures of uncertainty for both single network structures and similarities between groups. We will develop both Bayesian and penalized likelihood approaches.

The proposed statistical models and computational algorithms are designed to be flexible and efficient quantitative tools for the analysis of dependence structures in biological networks, including co-expression, gene regulatory, mutation, and protein interaction networks.
Classical approaches to graphical models are not suited to capturing and modeling the heterogeneous, multi-dimensional data structures commonly observed in cancer genomics, nor can they account for interventional or missing data.
The methodology proposed in this application, along with the companion software, will provide medical researchers with a powerful new set of tools for determining the associations between a large number of genetic variables under a variety of complex data-generating mechanisms. The application of the proposed methodology will result in a better understanding of the biological mechanisms of cancer and other diseases.

Impact. The proposed methods are meant to advance knowledge, both theoretical and applied, in the broad area of multivariate models for the analysis of high-dimensional complex data. In particular, we will provide new methodology for graphical models in a variety of settings (non-standard experimental setups, non-Gaussian distributions, heterogeneous data, causal inference, and various regimes for biological networks), using both frequentist and Bayesian approaches. Our methods are broadly applicable and are often motivated by investigations in cancer genomics.