PRIN 2022
PI: Francesco Claudio Stingo
UNIFI personnel: Monia Lupparelli, Anna Gottard, Agnese Panzera, Giovanni Poli, Claudio Busatto, Lorenzo Focardi Olmi
Coordinator: Università degli Studi di Firenze (PI: Francesco Claudio Stingo)
Participants: Università Cattolica del Sacro Cuore (Local PI: Guido Consonni), Università degli Studi di Padova (Local PI: Alberto Roverato), Università degli Studi di Palermo (Local PI: Luigi Augugliaro)
Brief description of the proposal
This project concerns the development of novel, principled statistical tools for the analysis of complex networks under non-standard experimental setups (e.g., relaxing the i.i.d. assumption).
The methodological innovations that can be achieved with this proposal are as follows:
- Development of multiple, paired, and covariate-dependent graphical models for heterogeneous networks for both continuous and discrete variables
- Development of single and multiple graphical models for causal inference based on observational and interventional data
- Development of graphical models for non-normal random variables (e.g., continuous non-Gaussian or circular data) and for censored random variables
The proposed research is expected to provide a methodological foundation for novel types and classes of graphical models. Compared to existing approaches, our approaches offer several additional benefits: interpretability (such as similarity measures between groups for both graph structures and edge values); the ability to assimilate information from several dimensions and to borrow strength only between related groups and/or units; the ability to include prior information, such as known biological regulatory mechanisms; and interpretable measures of uncertainty, both for single network structures and for similarities between groups. We will develop both Bayesian and penalized likelihood approaches.
The proposed statistical models and computational algorithms are flexible and efficient quantitative tools for the analysis of dependence structures of biological networks, including co-expression, gene regulatory, mutation, and protein interaction networks.
Classical approaches to graphical models are not suited to capturing and modeling the heterogeneous multi-dimensional data structures commonly observed in cancer genomics, nor can they account for interventional or missing data.
The methodology proposed in this application, along with the companion software, will provide medical researchers with a powerful new set of tools for determining the associations between a large number of genetic variables under a variety of complex data-generating mechanisms. The application of the proposed methodology will result in a better understanding of the biological mechanisms of cancer and other disease types.
Impact. The proposed methods are meant to advance knowledge, both theoretical and applied, in the broad area of multivariate models for the analysis of high-dimensional complex data. In particular, we will provide new methodology for graphical models in a variety of settings, including non-standard experimental setups, non-Gaussian distributions, heterogeneous data, causal inference, and various regimes for biological networks, using both frequentist and Bayesian approaches. Our methods are broadly applicable and are often motivated by investigations in cancer genomics.