Projects

Full publication list on Google Scholar

The privacy bounds of human mobility

We used 15 months of data from 1.5 million people to show that 4 points (approximate places and times) are enough to identify 95% of individuals in a mobility database. Our work shows that human behavior places fundamental natural constraints on the privacy of individuals, and that these constraints hold even when the resolution of the dataset is low: even coarse datasets provide little anonymity. We further developed a formula to estimate the uniqueness of human mobility traces. These findings have important implications for the design of frameworks and institutions dedicated to protecting the privacy of individuals.
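To make the idea of uniqueness concrete, here is a minimal sketch of how one might estimate it empirically: pick a user, reveal p random points of their trace, and count how many traces in the dataset contain all of them. The `traces` format and the `unicity` function are assumptions made for illustration, not the method or formula from the study.

```python
import random


def unicity(traces, p, trials=100, seed=0):
    """Estimate the share of sampled users singled out by p random points.

    traces: dict mapping a user id to a set of (place, hour) tuples.
            This layout is hypothetical, chosen only for the sketch.
    """
    rng = random.Random(seed)
    # Only users with at least p points can be tested.
    users = [u for u, pts in traces.items() if len(pts) >= p]
    if not users:
        return 0.0
    unique = 0
    for _ in range(trials):
        target = rng.choice(users)
        # Reveal p points of the target's trace (sorted for reproducibility).
        known = set(rng.sample(sorted(traces[target]), p))
        # Every user whose trace contains all revealed points is a candidate.
        matches = [u for u, pts in traces.items() if known <= pts]
        unique += (len(matches) == 1)
    return unique / trials


# Toy data: three users, two points each.
toy = {
    "a": {("A", 8), ("B", 9)},
    "b": {("A", 8), ("C", 9)},
    "c": {("D", 8), ("E", 9)},
}
print(unicity(toy, p=2))
```

On real data the estimate depends on the spatial and temporal resolution of the points, which is the dependence the paragraph above describes.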


In collaboration with César Hidalgo, Vincent Blondel, and Michel Verleysen

openPDS/SafeAnswers: Protecting the Privacy of Metadata

In a world where sensors, data storage, and processing power are too cheap to meter, how do you ensure that users can realize the full value of their data while protecting their privacy? openPDS is a field-tested personal metadata management framework that allows individuals to collect, store, and give third parties fine-grained access to their metadata. SafeAnswers is a new and practical way of protecting the privacy of metadata at the individual level. SafeAnswers turns a hard anonymization problem into a more tractable security one. Instead of trying to anonymize individuals' metadata, it allows services to ask questions whose answers are calculated against the metadata. Together, openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata.
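The question-and-answer idea can be sketched in a few lines: the raw metadata stays inside the store, and a service only receives the answer to a question the user has approved. The class and function names below are hypothetical and are not the openPDS API.

```python
from datetime import datetime


class PersonalDataStore:
    """Hypothetical stand-in for a personal data store (not the openPDS API)."""

    def __init__(self, records):
        self._records = list(records)  # raw metadata is kept inside the store
        self._approved = {}            # question name -> function approved by the user

    def approve(self, name, question):
        """The user approves a question that services may ask."""
        self._approved[name] = question

    def answer(self, name):
        """Run an approved question against the metadata and return only its answer."""
        if name not in self._approved:
            raise PermissionError(f"question {name!r} has not been approved")
        return self._approved[name](self._records)


def mostly_active_at_night(records):
    """Example question: do more than half of the events fall between 22:00 and 06:00?"""
    if not records:
        return False
    night = sum(1 for r in records if r["time"].hour >= 22 or r["time"].hour < 6)
    return night / len(records) > 0.5


store = PersonalDataStore([
    {"time": datetime(2013, 5, 1, 23, 15), "type": "call"},
    {"time": datetime(2013, 5, 2, 1, 40), "type": "text"},
    {"time": datetime(2013, 5, 2, 14, 5), "type": "call"},
])
store.approve("night_owl", mostly_active_at_night)
print(store.answer("night_owl"))  # the service sees a single boolean, not the records
```

The design choice illustrated here is that the privacy question becomes one of controlling which questions may run, rather than of anonymizing the records themselves.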


In collaboration with Samuel Wang, Erez Shmueli, Sandy Pentland, and the Harvard Berkman Center

What Can Your Phone Metadata Tell About You?

How much can others learn about your personality just by looking at the way you use your phone? We provide the first evidence that personality types (for example, neuroticism, extraversion, openness) can be predicted from standard mobile phone metadata. We have developed a set of novel psychology-informed indicators that can be computed from any set of mobile phone metadata. These fall into five categories and range from the time it took you to answer a text and the entropy of your contacts to your daily distance traveled and the percentage of text conversations you started. Using these 36 indicators, computed from metadata alone, we were able to predict people's personalities correctly up to 63% of the time, 1.7 times better than random.
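Two of the indicators named above can be sketched as follows. The record layout, the one-hour gap used to separate conversations, and the function names are assumptions made for illustration; the definitions used in the study may differ.

```python
import math
from collections import Counter


def contact_entropy(contacts):
    """Shannon entropy (in bits) of how interactions are spread across contacts.

    contacts: one entry per interaction, e.g. ["alice", "bob", "alice"].
    """
    counts = Counter(contacts)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def share_conversations_initiated(texts, gap=3600):
    """Fraction of text conversations started by the phone's owner.

    texts: (timestamp_seconds, contact, direction) tuples, direction "out" or "in".
    Assumption: a new conversation starts when the previous text exchanged
    with that contact is more than `gap` seconds old.
    """
    last_seen = {}
    started = total = 0
    for ts, contact, direction in sorted(texts):
        if contact not in last_seen or ts - last_seen[contact] > gap:
            total += 1
            started += (direction == "out")
        last_seen[contact] = ts
    return started / total if total else 0.0


print(contact_entropy(["alice", "bob", "alice", "carol"]))
print(share_conversations_initiated([
    (0, "alice", "out"), (60, "alice", "in"),
    (10_000, "alice", "in"), (10_050, "bob", "out"),
]))
```

A high contact entropy means interactions are spread evenly over many contacts; a low one means they are concentrated on a few.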


In collaboration with Jordi Quoidbach, Florent Robic, and Sandy Pentland

Using big data for effective marketing

Using big data for effective marketing is hard. As a consequence, 80% of marketing decisions are still based on gut feeling. This work shows how a principled approach to big data can improve customer segmentation. We ran a large-scale text-based experiment in an Asian country, comparing our data-driven approach to the company marketers' best practice. Our approach outperformed the marketers' by a factor of 13 in click-through rate for a data plan. It also showed a significantly better retention rate.


In collaboration with Pål Sundsøy, Johannes Bjelland, Asif Iqbal, and Sandy Pentland

Why networking at work... doesn't work

We show in this study that networking does not improve team performance in competitive environments. Only the participants' strongest ties had an actual effect on their performance, and the stronger the ties a team had, the better the team performed. None of the participants' weak instrumental (goal-oriented) or expressive (personal) networking ties significantly impacted the performance of their teams. The research further showed that a team's strongest ties predict performance better than the technical abilities of its members, what the members already knew about the topic, or their personality types.

When solving problems in a competitive environment, the study shows, it does not matter how many people someone knows or networks with; what really matters are the strongest ties in the network.


In collaboration with Arek Stopczynski, Erez Shmueli, Sandy Pentland, and Sune Lehmann

The limits of community detection in networks

What can really be inferred from communities uncovered by modularity-based algorithms? A broad and systematic characterization of the theoretical and practical performance of modularity contradicts the widely held assumption that the modularity function typically exhibits a clear global optimum. This implies that (i) modules identified via modularity maximization are not unique and should therefore be interpreted with extreme caution, and (ii) even moderate differences in modularity scores are meaningless.
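For readers unfamiliar with the score being discussed, the following minimal sketch computes the standard Newman modularity of a given partition, Q = sum over communities c of [e_c / m - (d_c / 2m)^2], where m is the number of edges, e_c the number of edges inside c, and d_c the total degree of c. The toy graph and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping each node to a community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)  # e_c: edges with both ends in community c
    degree = defaultdict(int)    # d_c: sum of node degrees in community c
    for u, v in edges:
        degree[community[u]] += 1
        degree[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
by_triangle = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
shifted = {0: "x", 1: "x", 2: "y", 3: "y", 4: "y", 5: "y"}
print(modularity(edges, by_triangle), modularity(edges, shifted))
```

Scoring several candidate partitions of the same graph this way is how one would check how close competing solutions are to each other.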


In collaboration with Aaron Clauset and Ben Good

Big Data for development

Quantifying the Stability of Society: Is there such a thing as a 'poverty trap'? Logistic classifiers applied to communication and census data point to a new mechanism for poverty that relates to the persistence of relationships.

Modeling the Dynamics of Urbanization on Social Support Networks: What is attracting migrants to urban areas within the developing world? Using 4 years of movement and communication data, it is possible to model the reinforcing social mechanisms that could explain the recent rapid growth of urban areas.


In collaboration with Nathan Eagle, Aaron Clauset, and Luís M.A. Bettencourt