Interview with Andrea Marino – Florence data science

Andrea Marino, researcher at the Department of Statistics, Computer Science, Applications “G. Parenti” at University of Florence, was our guest to the D2 seminar presenting his study entitled: “Approximating the Neighborhood Function of (Temporal) Graphs”.

When we talk about degrees of separation, some network can immediately came to our mind, social network like Facebook, based on “friendship” relation, or the Internet Move Database, an online archive full of informations about films, tv series, cast, crew and so on, who began like a fan operated database, based on collaboration. These are some average distance graphs, largely investigated, but what happens when the number of nodes involved grows up to millions or billions? Computing this measure requires huge time and space costs since for each node must be computed the so-called neighborhood function. In the presented research the networks analysed were temporal ones, in which each edge has some sort of label specifying their occurring times. They can be thought like timetables of public transportation, where the notion of distance is the earliest arrival time between two nodes. The novelty of his study is that it introduces a probabilistic algorithm to approximate sizes and neighborhood functions of temporal graphs, to avoid the computation to go on endlessly.

How would you explain your results to someone who doesn’t know what a temporal neighborhood function is but has certainly heard about degrees of separation as a social network user?

Given two individuals Alice and Bob, the degrees of separations correspond to the number of intermediaries between Alice and Bob, in the sense that, if Alice is separated by Bob with h degrees of separations, it means that Alice has a friend who knows a friend, who knows a friend, …, who knows Bob, where the number of these intermediate individuals is h. But what if we have many individuals and we want to compute the average degree of separation among all the pairs of individuals, not just between Alice and Bob? This is something very expensive to compute.

In order to do this, we can model friendships using a graph of friendships between individuals and study the distance between individuals in this graph. In particular for each individual, say Alice, and for each integer h, we are able to approximate the number of individuals x such that Alice needs less or equal than h intermediates to reach x, i.e. the number of individuals which are at distance within h from Alice (this is called the neighborhood function). This can be done in a recursive way, where an approximation for distance h can be obtained using the approximation for the distance h-1. The approximation makes use of probabilistic counting, where sets of friends within distance h are concisely and approximately represented in order to fasten the execution of the algorithm.

We have shown that the algorithm can be extended in the case of temporal graphs, where edges can be traversed only at specific times and the only valid paths to be considered are the ones respecting these time constraints. In this sense the temporal neighborhood of a vertex is composed by the vertices that can be reached not in h steps but within time t.

During your presentation you mentioned that your research creates a bridge between computer science and data science, could you explain what is the outlook of the study in this sense?

This research puts together graph theory and probabilistic counting. Probabilistic counting is a technique which allows us to approximate sets in order to estimate the number of distinct elements in them and in order to estimate the size of their unions. Probabilistic counting makes use of special sampling strategies which are very useful for designing randomized algorithms. While statistics takes care of the quality guarantees of these samples, i.e. unbiasedness and concentration, computer science takes care of the efficiency of computing samples, in this case focusing on the invariants to be maintained during the computation in order to maintain the guarantees.

Andrea Marino is also a member of our center. More information about his research can be found on his personal page.

You can access the recording of this as well as to other seminars at this link. (Registration needed)