Advances in Clustering
The aim of this session is to present and discuss new developments in three important areas of clustering. The first area is clustering of textual data, in particular topic modeling, which is attracting growing interest owing to the increasing amount of available data. The second area is adversarial clustering, which is used to detect illegal or malicious activities within large volumes of data, especially medical fraud. The last area concerns the study of specific properties of the Gaussian mixture maximum likelihood estimator.
Organizer: Luca Frigau, Università degli Studi di Cagliari, ITALY
Tahir Ekin
Texas State University
USA
Christian Martin Hennig
Università degli Studi di Bologna
ITALY
Qiuyi Wu
University of Rochester
USA
Advances in mixture models for matrix-variate and tensor data
The session will include talks about recent advances in mixture models for matrix-variate and tensor data. This type of data has received increasing interest from researchers, especially within the finite mixture model literature. Typical examples of this data structure include multivariate spatial data, longitudinal data on multiple response variables, and spatio-temporal data. In all these cases the data can be arranged in a three-way or four-way array.
Organizer: Antonio Punzo, Università degli Studi di Catania, ITALY
Andriette Bekker
University of Pretoria
SOUTH AFRICA
Paul D. McNicholas
McMaster University
CANADA
Shuchismita Sarkar
Bowling Green State University
USA
Advances in parsimonious mixture modeling
Finite mixtures are a popular tool for modeling heterogeneity in data. The large number of parameters involved in multivariate and matrix mixture models can lead to overfitting and to underestimation of the mixture order. One effective way of addressing this issue is to consider parsimonious models, with the objective of modeling and explaining the data with as few parameters as possible. The session is devoted to recent developments in this area.
Organizer: Volodymyr Melnykov, University of Alabama, USA
Michael P. B. Gallaugher
Baylor University
USA
Salvatore Daniele Tomarchio
Università degli Studi di Catania
ITALY
Xuwen Zhu
University of Alabama
USA
Advances in robust cluster analysis
The detrimental effect that even a few outlying observations can have on many statistical procedures is well known. Robust methods have been introduced to deal with outliers and departures from the assumed models. This is also the case for many popular clustering procedures: outliers can cause distinct clusters to be incorrectly merged, and spurious clusters made up of a few outlying observations can be detected. There is therefore a clear need for robust clustering methods able to resist outliers and deviations from model assumptions. Moreover, robust clustering methods are also useful for detecting anomalous data patterns that can be investigated further if necessary. This session will present several robust clustering approaches and related ideas.
Organizers: Luis Angel García-Escudero, Universidad de Valladolid, SPAIN – Agustín Mayo-Iscar, Universidad de Valladolid, SPAIN
Francesca Greselin
Università degli Studi di Milano-Bicocca
ITALY
Valentin Todorov
United Nations Industrial Development Organization
AUSTRIA
Francesca Torti
Joint Research Centre, Ispra
ITALY
Bayesian analysis of finite and infinite mixtures
Finite and infinite mixture models are important tools for flexible modeling of data and for Bayesian cluster analysis. They allow prior information to be included in a principled way and provide a more complete picture of model uncertainty when performing inference. This session highlights different aspects of Bayesian estimation of finite and infinite mixture models and discusses new developments in this area.
Organizer: Bettina Grün, Wirtschaftsuniversität Wien, AUSTRIA
Leonardo Egidi
Università degli Studi di Trieste
ITALY
Alessandra Guglielmi
Politecnico di Milano
ITALY
Keefe Murphy
Maynooth University
IRELAND
Bayesian nonparametric methods for classification
Bayesian nonparametrics is a well-established area of research, characterized by the inclusion of unknown functions without a prespecified form as part of the model specification. These quantities are estimated through Bayesian inference, allowing a rigorous quantification of uncertainty. In addition to a very large literature on the theoretical properties of these methods, many authors now use them to analyse real data, focusing on challenging regression, classification and unstructured problems.
Organizer: Bruno Scarpa, Università degli Studi di Padova, ITALY
Emanuele Aliverti
Università Ca’ Foscari Venezia
ITALY
Sally Paganin
University of California, Berkeley
USA
Massimiliano Russo
Harvard University
USA
Co-clustering for temporal sequences and distributional data
Co-clustering methods have been extensively developed since Hartigan's work in 1975. These techniques simultaneously partition the rows and columns of a data matrix in order to discover hidden patterns: clusters of similar observations together with the sets of variables most informative for those clusters. The resulting co-clusters provide interpretable block structures that reveal the links between observations and variables in a reduced-dimensional matrix. Co-clustering techniques have found many applications to sparse and high-dimensional data in various fields, such as gene expression analysis, bioinformatics, text mining, web mining, image analysis and network analysis. Of particular interest is the extension of co-clustering methods to temporal sequences, possibly modelled as functional data, and to distributional data. The aim is to cluster high-dimensional data aggregated in the form of curves or distributions according to their similarity, and to identify temporal windows, i.e. sets of consecutive time instants, associated with similar trends in the different clusters. An extension of co-clustering methods to distributional data has recently been proposed. Such methods are usually distance-based, relying on clustering algorithms equipped with suitable distances. The co-clusters are described by prototypes obtained as linear combinations of the variable distributions of similar individuals. Co-clustering methods for temporal sequences and distributional data find application in many fields where large volumes of data call for a simultaneous reduction of observations and variables (e.g. energy consumption, environmental monitoring, stock exchange markets, image detection, …).
Organizer: Rosanna Verde, Università degli Studi della Campania “Luigi Vanvitelli”, ITALY
Antonio Balzanella
ITALY
Nicoleta Rogovschi
Université de Paris Descartes & Université Sorbonne Paris Nord
FRANCE
Simone Vantini
Politecnico di Milano
ITALY
Copulas in time series analysis
The session is intended as an occasion to discuss the contributions of copula theory to time series analysis. Specifically, the session focuses on recent developments in flexible copula-based models and techniques for the analysis of temporal dynamics, with particular attention to tail dependence. The main area of application is finance, with particular interest in tail risk and conditional volatility.
Organizers: Marta L. Di Lascio, Libera Università di Bolzano, ITALY – Roberta Pappadà, Università degli Studi di Trieste, ITALY
Anna Denkowska
Uniwersytet Ekonomiczny w Krakowie
POLAND
Yarema Okhrin
Universität Augsburg
GERMANY
Giorgia Rivieccio
Università degli Studi di Napoli Parthenope
ITALY
Flexible Bayesian mixture models for complex data
Bayesian mixture models with k components, where k is either finite or infinite, are an extremely popular class of models that have been used successfully in many contexts and applications. Møller will present an MCMC scheme for posterior inference in repulsive mixtures for Bayesian cluster detection. Compared to more traditional mixture models, repulsive mixture models produce a smaller number of well-separated clusters. The mixture is finite, with a random number of components whose prior distribution is induced by a finite repulsive point process; Møller will show how computationally demanding reversible jump MCMC can be avoided. Argiento will introduce a finite mixture model to cluster categorical attributes with no natural ordering. In the heuristic framework, the problem of clustering these data is tackled by introducing suitable distances, while in the Bayesian model-based framework, stable sampling models for categorical data are needed. To provide a probabilistic solution to clustering categorical data, Argiento will first define a probability mass function over the metric space induced by the Hamming distance, and then mixtures of this distribution. In applications focusing on large observations with missing data, standard imputation methods may fail to reproduce the heavy-tailed behaviour of the quantities involved. Recent literature has proposed the use of multivariate extreme value theory to predict an unobserved coordinate of a random vector given large observed values of the remaining coordinates. Antoniano proposes to generalize this approach via Bayesian mixture models based on constrained Bernstein polynomials.
Organizer: Alessandra Guglielmi, Politecnico di Milano, ITALY
Isadora Antoniano-Villalobos
Università Ca’ Foscari Venezia
ITALY
Raffaele Argiento
Università Cattolica del Sacro Cuore
ITALY
Jesper Møller
Aalborg Universitet
DENMARK
Issues in Directional Data Analysis
Directional data analysis deals with data constrained to lie on the surface of hyperspheres, which in turn can be described by unit vectors in a Euclidean space. This constraint makes directional data analysis a distinct field within statistics, requiring specific techniques to tackle many issues. The aim of this session is to present some of the most recent tools for dealing with relevant issues in the field.
Organizer: Giovanni Camillo Porzio, Università degli Studi di Cassino e del Lazio Meridionale, ITALY
Jayant Jha
Aix-Marseille Université
FRANCE
Stanislav Nagy
Univerzita Karlova
CZECH REPUBLIC
Paula Saavedra Nieves
Universidade de Santiago de Compostela
SPAIN
Latent variable mixture modeling in epidemiology
The session focuses on novel lines of research for modelling epidemiological data, generated by rating surveys or collected by epidemiologists working in applied public health, with methods able to correctly analyse the uncertainty component of decision processes and to investigate the factors influencing the occurrence of disease. Among the topics to be discussed are extensions of latent variable models for complex epidemiological data, possibly from multiple data sources, and novel risk indicators for evaluating the grade of a disease.
Organizer: Maria Iannario, Università degli Studi di Napoli Federico II, ITALY
Federica Cugnata
Università Vita-Salute San Raffaele
ITALY
Xanthi Pedeli
Athens University of Economics and Business
GREECE
Marika Vezzoli
Università degli Studi di Brescia
ITALY
Latent variable models for constructing composite indices
Composite indices are widely used by several international organizations to measure latent multidimensional phenomena such as development, well-being and poverty. Some research developments have focused on the construction of the composite index itself, i.e., on the choice of the most appropriate method to synthesize a set of indicators into a single index. Other developments have concerned the analysis of the dependence relationships between several composite indices. Finally, some developments have focused on the analysis of hierarchical composite indices, which can be broken down into dimensions and subdimensions. The session will present both current research and applications.
Organizer: Rosaria Romano, Università degli Studi di Napoli Federico II, ITALY
Matteo Mazziotta
Istat – Istituto Nazionale di Statistica
ITALY
Florian Schuberth
Universiteit Twente
THE NETHERLANDS
Laura Trinchera
NEOMA Business School
FRANCE
Methods for inference from innovative or multiple data sources
Nowadays, data come from a variety of sources with different characteristics: from self-reported administrative datasets to social media and pervasive systems such as cellular networks, GPS devices and WiFi hotspots; from ad hoc probabilistic surveys to non-probabilistic online surveys, which are increasingly replacing the former. This poses new challenges to data analysis and motivates the development of new statistical methods to exploit multiple and innovative data sources. The session aims to promote discussion of this issue.
Organizers: Chiara Bocci, Università degli Studi di Firenze, ITALY – Emilia Rocco, Università degli Studi di Firenze, ITALY
Daniela Marella
Università Roma Tre
ITALY
Natalie Shlomo
University of Manchester
UK
Paul A. Smith
University of Southampton
UK
Modern likelihood methods for model-based clustering
Nowadays, data often have a complex structure. This may be due, for example, to observations recorded at different times or occasions, the presence of several sources of heterogeneity, or different types of variables. All these features make standard models and methods inadequate. Such complex data, which can also be high-dimensional, are often analysed with latent variable models. In particular, this session focuses on model-based clustering, a class of latent variable models aimed at capturing unobserved heterogeneity. However, the models needed to account for the data complexity described above can make standard estimation methods infeasible. This motivates the definition of surrogate functions or of alternative efficient estimation methods.
Organizer: Monia Ranalli, Sapienza Università di Roma, ITALY
Francesco Bartolucci
Università degli Studi di Perugia
ITALY
Claire Gormley
University College Dublin
IRELAND
Matthieu Marbac
Université de Rennes & ENSAI
FRANCE
Network data analysis and applications
Powerful analytical, statistical and computational methods have been devised to model and analyze networks. They are currently used to gain insight into real-world problems in fields ranging from the social sciences to economics and biology, to name a few. In this session different approaches will be discussed, providing a rich set of algorithms, models and applications.
Organizer: Mario R. Guarracino, Università degli Studi di Cassino e del Lazio Meridionale, ITALY
Silvia D’Angelo
University College Dublin
IRELAND
Panos Pardalos
University of Florida
USA
Maria Prosperina Vitale
Università degli Studi di Salerno
ITALY
New issues in univariate and multivariate quantile regression
Organizer: Lea Petrella, Sapienza Università di Roma, ITALY
Matteo Bottai
Karolinska Institutet
SWEDEN
Carlo Gaetan
Università Ca’ Foscari Venezia
ITALY
Luca Merlo
Sapienza Università di Roma
ITALY
Penalized techniques for data analysis
The aim of this session is to discuss sparse statistical modelling and classification. Nowadays, many applied fields require solutions to large-scale problems in which the goal is to predict an outcome or build a classifier from a set of predictors, both for prediction on future data and to discover which predictors play an important role.
Organizer: Gianluca Sottile, Università degli Studi di Palermo, ITALY
Marcello Chiodi
Università degli Studi di Palermo
ITALY
Paolo Giudici
Università degli Studi di Pavia
ITALY
Ernst-Jan Camiel Wit
Università della Svizzera Italiana
SWITZERLAND
Recent advances in directional statistics
Directional data arise in many scientific fields where observations are recorded as directions or angles relative to a fixed reference point. In general, the space of all directions is the unit hypersphere. Classical examples of such data include wind directions, marine currents, the Earth's main magnetic field and rock fractures. Because of the nonlinear nature of the manifold, statistical methods must be adapted to deal with directional data. The aim of this session is to discuss some recent advances in directional data analysis.
Organizers: Stefania Fensore, Università degli Studi “G. d’Annunzio” di Chieti-Pescara, ITALY – Agnese Panzera, Università degli Studi di Firenze, ITALY
Shogo Kato
The Institute of Statistical Mathematics
JAPAN
John T. Kent
University of Leeds
UK
Giuseppe Pandolfo
Università degli Studi di Napoli Federico II
ITALY
Recent advances in dynamic clustering: Markov models and extensions
New methods for clustering time series and longitudinal data will be introduced to account for several data features, including serial heterogeneity, nonlinear correlations in a high-dimensional setting, measurement error, missing values and dependence on external variables. The focus is on model-based clustering, where efficient algorithms are implemented to speed up the estimation of model parameters under both the frequentist and the Bayesian paradigms. Multivariate continuous and categorical data are modelled. The methods are motivated by real data applications in finance, medicine and education.
Organizer: Antonello Maruotti, Libera Università Maria Ss. Assunta, ITALY
Geir Drage Berentsen
NHH Norges Handelshøyskole
NORWAY
Roberto Di Mari
Università degli Studi di Catania
ITALY
Sabrina Giordano
Università della Calabria
ITALY
Recent advances in item response theory models
The session illustrates recent developments in item response theory (IRT) models concerning two different aspects. The first is the assessment of multidimensional models in the presence of misfit associated with specific items and in the presence of sparse data; the second is the improvement of the accuracy of IRT models by means of boosting methods. All the contributions include simulation studies and applications to real data sets.
Organizer: Silvia Cagnone, Università degli Studi di Bologna, ITALY
Michela Battauz
Università degli Studi di Udine
ITALY
Stefania Mignani
Università degli Studi di Bologna
ITALY
Mark Reiser
Arizona State University
USA
Recent developments in flexible regression – Methods and software
In regression modelling, researchers often seek statistical methods that can adapt to data features such as nonlinearity, skewness and heavy-tailedness. This session aims to discuss recent developments in parametric, semiparametric and nonparametric regression, including additive, quantile and mixed-effects models. The contributions will provide a balanced view of both methodological and applied aspects, with particular attention to software implementation.
Organizer: Marco Geraci, Sapienza Università di Roma, ITALY & University of South Carolina, USA
Matteo Fasiolo
University of Bristol
UK
Javier Rubio
University College London
UK
Fabian Scheipl
Ludwig-Maximilians-Universität München
GERMANY
Recent Developments in Symbolic Data Analysis
Statisticians and data analysts are nowadays confronted with increasingly complex data, coming from areas such as genomics, natural language processing, security and fraud detection, finance, or the web and social media. The collection, representation, analysis and interpretation of such data present new challenges to researchers. Symbolic Data Analysis provides a framework for the representation and analysis of such complex data that takes their inherent variability into account. These data take the form of sets of values, intervals or distributions; they arise from the aggregation of large amounts of open, collected or generated data, or are directly available in structured or unstructured form. This session aims to present and discuss new approaches for the analysis of symbolic data, focusing on interval-valued and distribution-valued data, with applications in different fields.
Organizer: Paula Brito, Universidade do Porto, PORTUGAL
Simona Korenjak Černe
Univerza v Ljubljani
SLOVENIA
M. Rosário Oliveira
Universidade de Lisboa
PORTUGAL
Rosanna Verde
Università degli Studi della Campania “Luigi Vanvitelli”
ITALY
Recent developments in the statistical analysis of categorical data
The use of specialized statistical methods for categorical data has increased dramatically in recent years. Indeed, scientists and practitioners have become increasingly aware that it is unnecessary and often inappropriate to apply methods for continuous data to categorical responses. The session will present recent findings from both methodological and applied points of view, stressing the relevance of scaling effects in categorical data analysis and discussing suitable risk indicators for setting intervention priorities.
Organizer:
Silvia Facchinetti
Università Cattolica del Sacro Cuore
ITALY
Maria Iannario
Università degli Studi di Napoli Federico II
ITALY
Maria Kateri
RWTH Aachen University
GERMANY
Robust classification in action
Deviations from theoretical assumptions, together with the presence of a certain number of outlying observations, are common in many practical statistical applications. This is also the case when applying cluster analysis methods, where these problems can lead to unsatisfactory clustering results. Robust clustering methods aim to avoid such results. The purpose of this invited session is to show empirical and theoretical advances in robust clustering, with special emphasis on applications to complex real-data clustering problems affected by multiple outliers.
Organizer: Marco Riani, Università degli Studi di Parma, ITALY
Claudio Agostinelli
Università degli Studi di Trento
ITALY
Fabrizio Laurini
Università degli Studi di Parma
ITALY
Agustín Mayo-Iscar
Universidad de Valladolid
SPAIN
Social inequalities
The aim of this session is to analyse inequalities in socio-economic outcomes such as income, occupation, education and health. Attention will be given to both the unidimensional and the multidimensional nature of inequalities, looking at their definitions and at the indicators used to measure them. Both methodological and applied aspects will be considered.
Organizer: Mariangela Zenga, Università degli Studi di Milano-Bicocca, ITALY
Carlotta Galeone
Università degli Studi di Milano
ITALY
Alina Jędrzejczak
Uniwersytet Łódzki
POLAND
Marcella Mazzoleni
Università degli Studi di Bergamo
ITALY
Time series clustering
Time series clustering is a solution for classifying large temporal databases when there is no prior knowledge about the classes. With the emergence of new paradigms such as cloud computing and big data, and their wide range of applications, research on unsupervised methods such as clustering algorithms has increased significantly in the last decade. Time series clustering is routinely used in many scientific areas, ranging from gene expression data in biology to stock market analysis in finance, to discover patterns that help data analysts extract valuable information from complex and massive datasets. For huge temporal databases, clustering can play a crucial role either as a solution in its own right or as a preliminary step for other statistical analyses, such as supervised classification or predictive modelling. The session brings together prominent researchers in the area to present and discuss the most recent advances on this topic.
Organizers: Pietro Coretto, Università degli Studi di Salerno, ITALY – Michele La Rocca, Università degli Studi di Salerno, ITALY
Andrés M. Alonso
Universidad Carlos III de Madrid
SPAIN
Claudio Conversano
Università degli Studi di Cagliari
ITALY
Vincenzina Vitale
Sapienza Università di Roma
ITALY