Invited Sessions | Cladag2021

Advances in Clustering

This session presents and discusses novel work in three important areas of clustering. The first is the clustering of textual data, in particular topic modeling, which attracts growing interest as the amount of available data increases. The second is adversarial clustering, which is used to detect illegal or malicious activities within large volumes of data, especially medical fraud. The third concerns the study of specific properties of the Gaussian mixture maximum likelihood estimator.

Organizer: Luca Frigau, Università degli Studi di Cagliari, ITALY

Tahir Ekin

Texas State University
USA

Christian Martin Hennig

Università degli Studi di Bologna
ITALY

Qiuyi Wu

University of Rochester
USA

Advances in mixture models for matrix-variate and tensor data

The session will include talks about recent advances in mixture models for matrix-variate and tensor data. This type of data has received increasing interest from researchers, especially within the finite mixture model literature. Typical examples of this data structure include spatial multivariate data, longitudinal data on multiple response variables, and spatio-temporal data. In all these cases the data can be arranged in a three-way or four-way array.

Organizer: Antonio Punzo, Università degli Studi di Catania, ITALY

Andriette Bekker

University of Pretoria
SOUTH AFRICA

Paul D. McNicholas

McMaster University
CANADA

Shuchismita Sarkar

Bowling Green State University
USA

Advances in parsimonious mixture modeling

Finite mixtures are a popular tool for modeling heterogeneity in data. The large number of parameters involved in multivariate and matrix mixture models can lead to overfitting and mixture order underestimation. One effective way of addressing this issue is to consider parsimonious models, with the objective of modeling and explaining the data with as few parameters as possible. The session is devoted to recent developments in this area.

Organizer: Volodymyr Melnykov, University of Alabama, USA

Michael P. B. Gallaugher

Baylor University
USA

Salvatore Daniele Tomarchio

Università degli Studi di Catania
ITALY

Xuwen Zhu

University of Alabama
USA

Advances in robust cluster analysis

The detrimental effect that even a few outlying observations can have on many statistical procedures is well known. Robust methods have been introduced to deal with outliers and departures from the assumed models. The same problem arises with many popular clustering procedures: different clusters can be joined incorrectly, and uninteresting clusters made up of a few outlying observations can be detected. There is therefore a clear need for robust clustering methods that can resist outliers and deviations from model assumptions. Robust clustering methods are also useful for detecting anomalous data patterns that can be further investigated if necessary. Some interesting robust clustering approaches and useful ideas in this framework will be presented in this session.

Organizers: Luis Angel García-Escudero, Universidad de Valladolid, SPAIN – Agustín Mayo-Iscar, Universidad de Valladolid, SPAIN

Francesca Greselin

Università degli Studi di Milano-Bicocca
ITALY

Valentin Todorov

United Nations Industrial Development Organization
AUSTRIA

Francesca Torti

Joint Research Center in Ispra
ITALY

Bayesian analysis of finite and infinite mixtures

Finite and infinite mixture models are important tools for flexible modeling of data and for Bayesian cluster analysis. They allow for a principled approach to include prior information and provide a more complete picture of model uncertainty when performing inference. This session highlights different aspects of Bayesian estimation of finite and infinite mixture models and discusses new developments in this area.

Organizer: Bettina Grün, Wirtschaftsuniversität Wien, AUSTRIA

Leonardo Egidi

Università degli Studi di Trieste
ITALY

Alessandra Guglielmi

Politecnico di Milano
ITALY

Keefe Murphy

Maynooth University
IRELAND

Bayesian nonparametric methods for classification

Bayesian nonparametrics is a well-established area of research, characterized by the inclusion of unknown functions without a prespecified form as part of the model specification. These quantities are estimated through Bayesian inference, allowing a rigorous quantification of uncertainty. In addition to a very large literature on the theoretical properties of these methods, many authors now use them to analyse real data, focusing on challenging regression, classification and unstructured problems.

Organizer: Bruno Scarpa, Università degli Studi di Padova, ITALY

Emanuele Aliverti

Università Ca’ Foscari Venezia
ITALY

Sally Paganin

University of California Berkeley
USA

Massimiliano Russo

Harvard University
USA

Co-clustering for temporal sequences and distributional data

Co-clustering methods have been extensively developed since Hartigan’s work in 1975. These techniques are based on a double partition of the rows and columns of a data matrix, aiming to discover hidden patterns: clusters of observations that look alike, together with the sets of variables most informative for those clusters. The resulting co-clusters provide interpretable block-clustering structures that give more information about the links between observations and variables in a matrix of reduced dimension. Co-clustering techniques have found several applications in the treatment of sparse and high-dimensional data in various fields, such as gene expression analysis, bioinformatics, text mining, web mining, image analysis, and network analysis. Particularly interesting is the extension of co-clustering methods to sequences of temporal data, also modelled as functional data, and to distributional data. The aim is to cluster high-dimensional data aggregated in the form of curves or distributions according to their similarity, and to identify temporal windows, or sets of consecutive time instants, that correspond to similar trends in the different clusters. Extensions of co-clustering methods to distributional data have recently been proposed. These are usually distance-based methods that use clustering algorithms built on suitable distances. The co-clusters are described by prototypes based on a linear combination of the variable distributions of similar individuals. Co-clustering methods for temporal sequences and distributional data find application in many fields where high data production calls for a simultaneous reduction of the observations and of the variables (e.g. energy consumption, environmental monitoring, stock exchange markets, image detection).

Organizer: Rosanna Verde, Università degli Studi della Campania “Luigi Vanvitelli”, ITALY

Antonio Balzanella

Università degli Studi della Campania “Luigi Vanvitelli”
ITALY

Nicoleta Rogovschi

Université de Paris Descartes & Université Sorbonne Paris Nord
FRANCE

Simone Vantini

Politecnico di Milano
ITALY

Copulas in time series analysis

The session is intended as an occasion to discuss the contributions of copula theory to time series analysis. Specifically, it focuses on recent developments in flexible copula-based models and techniques for the analysis of temporal dynamics, with particular attention to tail dependence. The main area of application is finance, with strong interest in tail risk and conditional volatility.

Organizers: Marta L. Di Lascio, Libera Università di Bolzano, ITALY – Roberta Pappadà, Università degli Studi di Trieste, ITALY

Anna Denkowska

Uniwersytet Ekonomiczny w Krakowie
POLAND

Yarema Okhrin

Universität Augsburg
GERMANY

Giorgia Rivieccio

Università degli Studi di Napoli Parthenope
ITALY

Flexible Bayesian mixture models for complex data

Bayesian mixture models with k components, with k either finite or infinite, are an extremely popular class of models that has been used successfully in many contexts and applications. Møller will present an MCMC scheme to compute posterior inference for repulsive mixtures in Bayesian cluster detection. Compared to more traditional mixture models, repulsive mixture models produce a smaller number of well-separated clusters. The mixture is finite, with a random number of components whose prior distribution is induced by a finite repulsive point process; Møller will show how difficult reversible jump MCMC computation may be avoided. Argiento will introduce a finite mixture model to cluster categorical attributes with no natural ordering. In the heuristic framework, the problem of clustering these data is tackled by introducing suitable distances, while in the Bayesian model-based framework, stable sampling models for categorical data are needed. To provide a probabilistic solution to clustering categorical data, Argiento will first define a probability mass function over a metric space defined by the Hamming distance, and then mixtures of this distribution. In applications where interest lies in large observations with missing data, the usual data imputation methods may fail to reproduce the heavy-tailed behaviour of the quantities involved. Recent literature has proposed the use of multivariate extreme value theory to predict an unobserved coordinate of a random vector given large observed values of the rest. Antoniano proposes to generalize this proposal via Bayesian mixture models based on constrained Bernstein polynomials.

Organizer: Alessandra Guglielmi, Politecnico di Milano, ITALY

Isadora Antoniano-Villalobos

Università Ca’ Foscari Venezia
ITALY

Raffaele Argiento

Università Cattolica del Sacro Cuore
ITALY

Jesper Møller

Aalborg Universitet
DENMARK

Issues in Directional Data Analysis

Directional data analysis deals with data constrained to lie on the surface of hyperspheres, which in turn can be described by unit vectors within a Euclidean space. This constraint makes directional data analysis a distinct field within statistics, one that calls for specific techniques to tackle many issues. The aim of this session is to present some of the most recent tools for dealing with relevant issues in the field.

Organizer: Giovanni Camillo Porzio, Università degli Studi di Cassino e del Lazio Meridionale, ITALY

Jayant Jha

Aix-Marseille Université
FRANCE

Stanislav Nagy

Univerzita Karlova
CZECH REPUBLIC

Paula Saavedra Nieves

Universidade de Santiago de Compostela
SPAIN

Latent variable mixture modeling in epidemiology

The session focuses on novel lines of research for modelling epidemiological data, whether generated by rating surveys or collected by epidemiologists working in applied public health. The methods considered should correctly analyse the uncertainty component of decisional processes and investigate the factors influencing the occurrence of disease. Topics to be discussed include extensions of latent variable models for complex epidemiological data, possibly from multiple data sources, and novel risk indicators for evaluating the grade of a disease.

Organizer: Maria Iannario, Università degli Studi di Napoli Federico II, ITALY

Federica Cugnata

Università Vita-Salute San Raffaele
ITALY

Xanthi Pedeli

Athens University of Economics and Business
GREECE

Marika Vezzoli

Università degli Studi di Brescia
ITALY

Latent variable models for constructing composite indices

Composite indices are widely used by several international organizations to measure latent multidimensional phenomena such as development, well-being, poverty, etc. Some research developments have focused on the construction of the composite index itself, i.e., on the choice of the most appropriate method to synthesize a set of indicators into a single index. Other developments concerned the analysis of the dependence relationships between several composite indices. Finally, some developments have focused on the analysis of hierarchical composite indices, which can be broken down into dimensions and subdimensions. The session will present both current research and applications.

Organizer: Rosaria Romano, Università degli Studi di Napoli Federico II, ITALY

Matteo Mazziotta

Istat – Istituto Nazionale di Statistica
ITALY

Florian Schuberth

Universiteit Twente
THE NETHERLANDS

Laura Trinchera

NEOMA Business School
FRANCE

Methods for inference from innovative or multiple data sources

Nowadays, data come from a variety of sources with different characteristics: from self-reported administrative datasets to social media and pervasive systems such as cellular networks, GPS devices, and WiFi hotspots, and from ad hoc probabilistic surveys to the non-probabilistic online surveys that are often replacing them. This poses new challenges for data analysis and motivates the development of new statistical methods to exploit multiple and innovative data sources. The session aims to promote discussion of this issue.

Organizers: Chiara Bocci, Università degli Studi di Firenze, ITALY – Emilia Rocco, Università degli Studi di Firenze, ITALY

Daniela Marella

Università Roma Tre
ITALY

Natalie Shlomo

University of Manchester
UK

Paul A. Smith

University of Southampton
UK

Modern likelihood methods for model-based clustering

Nowadays, data structures are often complex, owing for example to observations recorded at different times or occasions, the presence of several sources of heterogeneity, or different types of variables. All these features make standard models and methods inadequate. Such complex data, which can also be high-dimensional, are often analysed with latent variable models. In this session we focus on model-based clustering, a class of latent variable models that aims to capture unobserved heterogeneity. However, a model definition that accounts for the data complexity described above can make standard estimation methods unfeasible. This leads to the definition of surrogate functions or alternative efficient estimation methods.

Organizer: Monia Ranalli, Sapienza Università di Roma, ITALY

Francesco Bartolucci

Università degli Studi di Perugia
ITALY

Claire Gormley

University College Dublin
IRELAND

Matthieu Marbac

Université de Rennes & ENSAI
FRANCE

Network data analysis and applications

Powerful analytical, statistical and computational methods have been devised to model and analyze networks. They are currently used to gain more insight into real-world problems in fields ranging from the social sciences to economics and biology, to name a few. In this session different approaches will be discussed, providing a rich set of algorithms, models and applications.

Organizer: Mario R. Guarracino, Università degli Studi di Cassino e del Lazio Meridionale, ITALY

Silvia D’Angelo

University College Dublin
IRELAND

Panos Pardalos

University of Florida
USA

Maria Prosperina Vitale

Università degli Studi di Salerno
ITALY

New issues in univariate and multivariate quantile regression

Organizer: Lea Petrella, Sapienza Università di Roma, ITALY

Matteo Bottai

Karolinska Institutet
SWEDEN

Carlo Gaetan

Università Ca’ Foscari Venezia
ITALY

Luca Merlo

Sapienza Università di Roma
ITALY

Penalized techniques for data analysis

The aim of this session is to discuss sparse statistical modelling and classification. Nowadays, many applied fields need solutions to large-scale problems in which the goal is to predict an outcome or to build a classifier from the predictors, both for actual prediction with future data and to discover which predictors play an important role.

Organizer: Gianluca Sottile, Università degli Studi di Palermo, ITALY

Marcello Chiodi

Università degli Studi di Palermo
ITALY

Paolo Giudici

Università degli Studi di Pavia
ITALY

Ernst-Jan Camiel Wit

Università della Svizzera Italiana
SWITZERLAND

Recent advances in directional statistics

Directional data arise in many scientific fields where observations are recorded as directions or angles relative to a fixed reference point. In general, the space of all directions is the unit hypersphere. Classical examples of such data include the directions of winds, marine currents, Earth’s main magnetic field, and rock fractures. Because of the nonlinear nature of the manifold, statistical methods need to be adapted to deal with directional data. The aim of this session is to discuss some recent advances in directional data analysis.

Organizers: Stefania Fensore, Università degli Studi “G. d’Annunzio” di Chieti-Pescara, ITALY – Agnese Panzera, Università degli Studi di Firenze, ITALY

Shogo Kato

The Institute of Statistical Mathematics
JAPAN

John T. Kent

University of Leeds
UK

Giuseppe Pandolfo

Università degli Studi di Napoli Federico II
ITALY

Recent advances in dynamic clustering: Markov models and extensions

New methods to cluster time-series and longitudinal data are introduced to account for several data features, including serial heterogeneity, non-linear correlations in a high-dimensional setting, measurement error, missing values and dependence on external variables. The focus is on model-based clustering, where efficient algorithms are implemented to speed up the estimation of model parameters under both the frequentist and the Bayesian paradigm. Multivariate continuous and categorical data are modelled. The methods are motivated by real data applications from finance, medicine, and education.

Organizer: Antonello Maruotti, Libera Università Maria Ss. Assunta, ITALY

Geir Drage Berentsen

NHH Norges Handelshøyskole
NORWAY

Roberto Di Mari

Università degli Studi di Catania
ITALY

Sabrina Giordano

Università della Calabria
ITALY

Recent advances in item response theory models

The session aims to illustrate recent developments in IRT models concerning two different aspects. The first explores the assessment of multidimensional models in the presence of misfit associated with specific items and in the presence of sparse data; the second proposes improving the accuracy of IRT models by means of boosting methods. Simulation studies and applications to real data sets are illustrated in all the contributions.

Organizer: Silvia Cagnone, Università degli Studi di Bologna, ITALY

Michela Battauz

Università degli Studi di Udine
ITALY

Stefania Mignani

Università degli Studi di Bologna
ITALY

Mark Reiser

Arizona State University
USA

Recent developments in flexible regression – Methods and software

In regression modelling, researchers often seek statistical methods that can adapt to data features such as nonlinearity, skewness, and heavy-tailedness. This session aims to discuss recent developments in parametric, semiparametric and nonparametric regression, including additive, quantile, and mixed-effects models. The contributions will provide a balanced view of both methodological and applied aspects, especially those related to software implementation.

Organizer: Marco Geraci, Sapienza Università di Roma, ITALY & University of South Carolina, USA

Matteo Fasiolo

University of Bristol
UK

Javier Rubio

University College London
UK

Fabian Scheipl

Ludwig-Maximilians-Universität München
GERMANY

Recent Developments in Symbolic Data Analysis

Statisticians and data analysts are nowadays confronted with increasingly complex data, coming from areas such as genomics, natural language processing, security and fraud detection, finance, or the web and social media. The collection, representation, analysis and interpretation of such data present new challenges to researchers. Symbolic Data Analysis provides a framework for the representation and analysis of such complex data, which comprise inherent variability. These data take the form of sets of values, intervals or distributions, and either arise from the aggregation of large amounts of open, collected or generated data, or are directly available in a structured or unstructured form that takes variability into account. This session aims to present and discuss new approaches for the analysis of symbolic data, focusing on interval and distribution-valued data, with applications in different fields.

Organizer: Paula Brito, Universidade do Porto, PORTUGAL

Simona Korenjak Černe

Univerza v Ljubljani
SLOVENIA

M. Rosário Oliveira

Universidade de Lisboa
PORTUGAL

Rosanna Verde

Università degli Studi della Campania “Luigi Vanvitelli”
ITALY

Recent developments in the statistical analysis of categorical data

The use of specialized statistical methods for categorical data has increased dramatically in recent years. Indeed, scientists and practitioners have become increasingly aware that it is unnecessary and often inappropriate to use methods for continuous data with categorical responses. The session will be devoted to the presentation of recent findings from a methodological and applied point of view, stressing the relevance of scaling effects in categorical data analysis and discussing suitable risk indicators for prioritizing interventions.

Organizer: Claudia Tarantola, Università degli Studi di Pavia, ITALY

Silvia Facchinetti

Università Cattolica del Sacro Cuore
ITALY

Maria Iannario

Università degli Studi di Napoli Federico II
ITALY

Maria Kateri

RWTH Aachen University
GERMANY

Robust classification in action

Deviations from theoretical assumptions, together with the presence of a certain amount of outlying observations, are common in many practical statistical applications. This is also the case when applying cluster analysis methods, where these problems can lead to unsatisfactory clustering results. Robust clustering methods aim to avoid these unsatisfactory results. The purpose of this invited session is to show empirical and theoretical advances in robust clustering, with special emphasis on applications to real, complex clustering problems affected by multiple outliers.

Organizer: Marco Riani, Università degli Studi di Parma, ITALY

Claudio Agostinelli

Università degli Studi di Trento
ITALY

Fabrizio Laurini

Università degli Studi di Parma
ITALY

Agustín Mayo-Iscar

Universidad de Valladolid
SPAIN

Social inequalities

The aim of this session is to analyse inequalities in socio-economic outcomes such as income, occupation, education and health. Attention will be given to both the unidimensional and the multidimensional nature of inequalities, looking at the definitions and indicators used to measure them. The session will emphasize both methodological and applied aspects.

Organizer: Mariangela Zenga, Università degli Studi di Milano-Bicocca, ITALY

Carlotta Galeone

Università degli Studi di Milano
ITALY

Alina Jędrzejczak

Uniwersytet Łódzki
POLAND

Marcella Mazzoleni

Università degli Studi di Bergamo
ITALY

Time series clustering

Time series clustering is a solution for classifying large temporal databases when there is no prior knowledge about the classes. With the emergence over the last decade of novel concepts such as cloud computing and big data, along with their vast applications, research on unsupervised solutions such as clustering algorithms has increased significantly. Clustering time-series data is routinely used in many scientific areas, ranging from gene expression data in biology to stock market analysis in finance, to discover patterns that enable data analysts to extract valuable information from complex and massive datasets. In the case of huge temporal databases, clustering can play a crucial role as a solution on its own or as a preliminary step to other statistical analyses such as supervised classification or predictive modelling. The session is intended to bring together some prominent researchers in the area to present and discuss the most recent research advances on this topic.

Organizers: Pietro Coretto, Università degli Studi di Salerno, ITALY – Michele La Rocca, Università degli Studi di Salerno, ITALY

Andrés M. Alonso

Universidad Carlos III de Madrid
SPAIN

Claudio Conversano

Università degli Studi di Cagliari
ITALY

Vincenzina Vitale

Sapienza Università di Roma
ITALY