Projects
Full publication list on Google Scholar
Unique in the Shopping Mall: Reidentifying credit card data
Large-scale data sets of human behavior have the potential to fundamentally transform the way we fight diseases, design cities, or perform research. Metadata, however, contain sensitive information. We study 3 months of credit card records for 1.1 million people and show that four spatiotemporal points are enough to uniquely reidentify 90% of individuals. We show that knowing the price of a transaction increases the risk of reidentification by 22%, on average. Finally, we show that even data sets that provide coarse information at any or all of the dimensions provide little anonymity and that women are more reidentifiable than men in credit card metadata.
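As a rough illustration of what "uniquely reidentify" means here, the sketch below estimates the fraction of people whose record is the only one matching a handful of known points. It is a minimal sketch, not the method used in the paper: the `unicity` function, the `traces` layout (user id mapped to a set of `(place, day)` points), and the toy data are all assumptions made for this example.

```python
import random


def unicity(traces, p, trials=1000, seed=0):
    """Estimate the share of users singled out by p points drawn from their own trace.

    traces: dict mapping a user id to a set of (place, day) points (hypothetical layout).
    """
    rng = random.Random(seed)
    # Only users with at least p points can be tested.
    users = [u for u, pts in traces.items() if len(pts) >= p]
    if not users:
        return 0.0
    unique = 0
    for _ in range(trials):
        target = rng.choice(users)
        known = set(rng.sample(sorted(traces[target]), p))
        # Count every user whose trace contains all of the known points.
        matches = sum(1 for pts in traces.values() if known <= pts)
        if matches == 1:
            unique += 1
    return unique / trials


# Toy data (made up): three users, two points each.
toy = {
    "u1": {("bakery", 1), ("cafe", 2)},
    "u2": {("bakery", 1), ("gym", 2)},
    "u3": {("bakery", 1), ("cafe", 3)},
}
print(unicity(toy, p=2))  # 1.0 on this toy data: two points single out every user
```

Coarsening the data would correspond to mapping each point to a larger area or a longer time window before calling the function, and adding the price would correspond to extending each point with a third field.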
In collaboration with Alex "Sandy" Pentland, Laura Radaelli, and Vivek K. Singh
- de Montjoye Y.-A., Radaelli L., Singh V. K., Pentland A. S., Unique in the shopping mall: On the reidentifiability of credit card metadata. Science 347 (6221), 536-539. DOI:10.1126/science.1256297 (2015).
- Artworks and visuals (.zip).
- New York Times, Wall Street Journal (1, 2), BBC, Harvard Business Review, Nature, Technology Review, PBS, Le Monde (FR), Die Zeit (DE), Der Spiegel (DE), El Pais (ES), RT, The Hill, Telegraph (UK), Les Echos (FR), Scientific American, New Scientist, Five Thirty Eight, Gizmodo, Fast Company, Computer World, ZDNet, Tom's guide, Popular Mechanics, Motherboard, TechTarget, US News, NBC, CNBC, Huffington Post (US), Pacific Standard, IEEE Spectrum, Phys.org, La Recherche - Entretien du mois (FR), Europe1, Radio Canada (FR), Le Vif (FR), Slate (FR), Trends-Tendance (FR), Science et vie (FR)
Related and selected press:
The privacy bounds of human mobility
We used 15 months of data from 1.5 million people to show that 4 points (approximate places and times) are enough to identify 95% of individuals in a mobility database. Our work shows that human behavior places fundamental natural constraints on the privacy of individuals, and that these constraints hold even when the resolution of the dataset is low: even coarse datasets provide little anonymity. We further developed a formula to estimate the uniqueness of human mobility traces. These findings have important implications for the design of frameworks and institutions dedicated to protecting the privacy of individuals.
In collaboration with César Hidalgo, Vincent Blondel, and Michel Verleysen
- de Montjoye, Y.-A., Hidalgo, C.A., Verleysen, M. & Blondel, V.D. Unique in the Crowd: The privacy bounds of human mobility. Scientific Reports 3, 1376; DOI:10.1038/srep01376 (2013).
- Artworks and visuals (.zip).
- In March 2013: BBC News, CNN, World Economic Forum, European Commission, Wall Street Journal, New York Times, Le Monde, Guardian, MIT Technology Review, Nature, United Nations - Global Pulse, Wired (US), Wired (UK), MIT News, Boing Boing, Fast Company, PopSci, Bruce Schneier, GigaOM, Phys.org, Slate, MIT Technology Review, Smithsonian, The Huffington Post, The Telegraph, The Inquirer, Spiegel, Die Welt (1, 2), BFM TV, IEEE Spectrum, Bundeszentrale für politische Bildung (DE)
- Our editorial in the Christian Science Monitor: Solution to NSA overreach -- put people in charge of their own data.
- Our editorial in Le Monde (FR): Il est temps de parler des métadonnées and in La revue des Centraliens (FR) Métadonnées, pour ou contre?.
- After: New York Times, Wall Street Journal CIO Journal, Foreign Policy, New Scientist, CS Monitor, MIT Technology Review, The Atlantic, Le Monde, National Journal, NBC, Mashable, Wired (JP), TechDirt, InternetActu (FR), Privacy Commissioner Ontario, Canada, Mediapart (FR), Orange Digital Society (FR), IEEE Spectrum, DefenseOne, Deloitte Reports, Significance, Royal Statistical Society, Le Monde Informatique (FR), Europe 1 (FR), Le Vif (FR)
Related and selected press:
openPDS/SafeAnswers: Protecting the Privacy of Metadata
In a world where sensors, data storage, and processing power are too cheap to meter, how do you ensure that users can realize the full value of their data while protecting their privacy? openPDS is a field-tested personal metadata management framework that allows individuals to collect, store, and give third parties fine-grained access to their metadata. SafeAnswers is a new and practical way of protecting the privacy of metadata at the individual level. It turns a hard anonymization problem into a more tractable security one: instead of trying to anonymize individuals' metadata, it allows services to ask questions whose answers are calculated against the metadata. Together, openPDS and SafeAnswers provide a new way of dynamically protecting personal metadata.
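The question-and-answer idea can be sketched in a few lines. The class and function names below (`PersonalDataStore`, `register`, `ask`, `most_visited_city`) are hypothetical and chosen for this illustration; they are not the openPDS API, and the records are made up. The only point is that the raw metadata stays inside the store and the service receives just the computed answer.

```python
from collections import Counter


class PersonalDataStore:
    """Hypothetical stand-in for a personal data store; not the openPDS API."""

    def __init__(self, records):
        self._records = list(records)  # raw metadata is kept inside the store
        self._questions = {}

    def register(self, name, func):
        """Approve a question; func maps the raw records to a small answer."""
        self._questions[name] = func

    def ask(self, name):
        """Run an approved question locally and return only its answer."""
        if name not in self._questions:
            raise PermissionError(f"question {name!r} is not approved")
        return self._questions[name](self._records)


def most_visited_city(records):
    # Reduce the raw location records to one coarse answer.
    return Counter(r["city"] for r in records).most_common(1)[0][0]


# Made-up records for illustration.
pds = PersonalDataStore([
    {"city": "Boston", "hour": 9},
    {"city": "Boston", "hour": 18},
    {"city": "Cambridge", "hour": 12},
])
pds.register("most_visited_city", most_visited_city)
print(pds.ask("most_visited_city"))  # Boston
```

In this sketch a service that asks an unapproved question gets a `PermissionError` rather than any data, which is the sense in which the problem becomes one of access control rather than anonymization.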
In collaboration with Samuel Wang, Erez Shmueli, Sandy Pentland, and the Harvard Berkman Center
- openPDS website
- de Montjoye Y.-A., Shmueli E., Wang S., Pentland A., openPDS: Protecting the Privacy of Metadata through SafeAnswers. PLoS One, 10.1371 (2014).
- de Montjoye Y.-A., Wang S., Pentland A., On the Trusted Use of Large-Scale Personal Data. IEEE Data Engineering Bulletin, 35-4 (2012).
- openPDS privacy settings - Visuals (.zip).
- View our YouTube video
- Press: BBC, New York Times, Wall Street Journal, Technology Review, World Economic Forum, Real Time with Bill Maher (HBO), Le Monde (FR), MIT News, Wired (UK), New Scientist, The Parliament Magazine (EU), Baratunde for Fast Company, GigaOM, Scientific American (1, 2), Fast Company, The Edge, Phys.org, Red Orbit, Science Daily, Cisco, CIO magazine, Radio Canada, Trends-Tendance, New Scientist, Vice.com, InternetActu (FR), Les Echos (FR), T3N (DE)
Related and selected press:
What Can Your Phone Metadata Tell About You?
How much can others learn about your personality just by looking at the way you use your phone? We provide the first evidence that personality types (for example, neuroticism, extraversion, openness) can be predicted from standard mobile phone metadata. We have developed a set of novel psychology-informed indicators that can be computed from any set of mobile phone metadata. These fall into five categories and range from the time it took you to answer a text and the entropy of your contacts to your daily distance traveled and the percentage of text conversations you started. Using these 36 indicators and only metadata, we were able to predict people's personalities correctly up to 63% of the time, 1.7 times better than random.
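Two of the indicators named above, the entropy of contacts and the percentage of text conversations started, can be sketched as follows. This is an illustrative sketch only: the function names, the input layouts, and the `initiated_by_user` field are assumptions made here, and the exact definitions used in the study may differ.

```python
import math
from collections import Counter


def contact_entropy(contacts):
    """Shannon entropy (in bits) of how interactions are spread across contacts.

    contacts: one entry per interaction, holding the contact's id (hypothetical layout).
    """
    counts = Counter(contacts)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1 / p) over contacts, with p = n / total.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def share_initiated(conversations):
    """Percentage of text conversations in which the user sent the first message."""
    if not conversations:
        return 0.0
    started = sum(1 for c in conversations if c["initiated_by_user"])
    return 100.0 * started / len(conversations)


print(contact_entropy(["ann", "bob", "cat", "dan"]))  # 2.0: four equally used contacts
print(contact_entropy(["ann", "ann", "ann"]))         # 0.0: a single contact
print(share_initiated([
    {"initiated_by_user": True},
    {"initiated_by_user": False},
    {"initiated_by_user": True},
    {"initiated_by_user": True},
]))                                                   # 75.0
```

A higher entropy means interactions are spread evenly over many contacts; a lower one means they are concentrated on a few.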
In collaboration with Jordi Quoidbach, Florent Robic, and Sandy Pentland
- Click here for an infographic describing our research and results
- de Montjoye, Y.-A.*, Quoidbach J.*, Robic F.*, Pentland A., Predicting personality using novel mobile phone-based metrics. International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction, Washington, USA (2013)
- Press: Boston Globe, Russia Today, Le Monde (FR), InternetActu (FR), Mediapart (FR), Agence Science Presse (FR), Radio Canada (FR), DefenseOne and Quartz
Related and press:
Using big data for effective marketing
Using big data for effective marketing is hard. As a consequence, 80% of marketing decisions are still based on gut feeling. This work shows how a principled approach to big data can improve customer segmentation. We ran a large-scale text-based experiment in an Asian country, comparing our data-driven approach to the company marketers' best practice. Our approach achieved a click-through rate for a data plan 13 times higher than the marketers'. It also showed a significantly better retention rate.
In collaboration with Pål Sundsøy, Johannes Bjelland, Asif Iqbal, and Sandy Pentland
- Click here for an infographic describing our research and results
- Sundsøy, P., Bjelland, J., Iqbal, A., Pentland, A., and de Montjoye, Y. A. (2014). Big Data-Driven Marketing: How Machine Learning Outperforms Marketers’ Gut-Feeling. In Social Computing, Behavioral-Cultural Modeling and Prediction (pp. 367-374). Springer International Publishing.
- Press: Gates Foundation
Related publications:
Why networking at work... doesn't work
We show in this study that networking does not improve team performance in competitive environments. Only the participants' strongest ties had an actual effect on their performance, and the stronger the ties a team had, the better the team performed. None of the participants' weak instrumental (goal-oriented) or expressive (personal) networking ties significantly impacted the performance of their teams. The research further showed that a team's strongest ties predict performance better than the technical abilities of its members, what the members already knew about the topic, or their personality types.
When solving problems in a competitive environment, the study shows, it does not matter how many people someone knows or networks with — what really matters are the strongest ties in the network.
In collaboration with Arek Stopczynski, Erez Shmueli, Sandy Pentland, and Sune Lehmann
- Click here for an infographic describing our research and results
- de Montjoye, Y. A., Stopczynski, A., Shmueli, E., Pentland, A., & Lehmann, S. (2014). The Strength of the Strongest Ties in Collaborative Problem Solving. Scientific Reports, 4.
- Press: MIT News, GigaOM, Express (NL), Computer World (DK), DR (DK), DTU (DK)
Related publications:
The limits of community detection in networks
What can really be inferred from communities uncovered by modularity-based algorithms? A broad and systematic characterization of the theoretical and practical performance of modularity contradicts the widely held assumption that the modularity function typically exhibits a clear global optimum. This implies that (i) modules identified via modularity maximization are not unique and should therefore be interpreted with extreme caution, and (ii) even moderate differences in modularity scores are meaningless.
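For readers unfamiliar with the quantity being maximized, the sketch below computes the standard Newman-Girvan modularity score of a given partition of an undirected, unweighted graph, so that the scores of alternative partitions can be compared. The function name, the input layout, and the toy graph are assumptions for this illustration; it does not reproduce the paper's analysis.

```python
def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community label.
    Q is the sum over communities of (internal edges / m) - (degree sum / 2m)^2.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}  # number of edges with both ends in the community
    degree = {}    # sum of node degrees in the community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


# Toy graph (made up): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {node: "a" for node in range(6)}

print(round(modularity(edges, two_triangles), 4))  # 0.3571
print(modularity(edges, one_block))                # 0.0
```

On a graph this small the best partition is obvious. The concern raised above is about larger networks, where many structurally different partitions can score close to the maximum.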
In collaboration with Aaron Clauset and Ben Good
- The performance of modularity maximization in practical contexts. Physical Review E 81, 046106 (2010).
Related:
Big Data for development
Quantifying the Stability of Society: Is there such a thing as a 'poverty trap'? Logistic classifiers applied to communication and census data point to a new mechanism for poverty that relates to the persistence of relationships.
Modeling the Dynamics of Urbanization on Social Support Networks: What is attracting migrants to urban areas within the developing world? Using 4 years of movement and communication data, it is possible to model the reinforcing social mechanisms that could explain the recent rapid growth of urban areas.
In collaboration with Nathan Eagle, Aaron Clauset, and Luís M.A. Bettencourt
- Stability in society: Parameters for the persistence of social networks, Master’s thesis, Université catholique de Louvain, 2009.
- Community Computing: Comparisons between Rural and Urban Societies using Mobile Phone Data. International Conference on Computational Science and Engineering. (2009)