Selected Research & Publications
Measuring memorization in language models via probabilistic extraction
Jamie Hayes, Marika Swanberg, Harsh Chaudhari, Itay Yona, Ilia Shumailov, Milad Nasr, Christopher A Choquette-Choo, Katherine Lee, A Feder Cooper
NAACL 2025
We introduce probabilistic discoverable extraction, a new definition of memorization that accounts for the sampling methods used by LLMs, and show that it provides a more nuanced quantification of training data memorization in a realistic adversarial setting than previous measures.
[Link to Paper]
Beyond the Worst Case: Extending Differential Privacy Guarantees to Realistic Adversaries
Marika Swanberg, Meenatchi Sundaram Muthu Selva Annamalai, Jamie Hayes, Borja Balle, Adam Smith
arXiv preprint
This work develops a framework for computing high-probability guarantees for DP mechanisms against realistic classes of attackers, rather than worst-case theoretical adversaries. In particular, it enables "canary-less" auditing of LLMs in a single run.
[Link to Paper]
Control, Confidentiality, and the Right to Be Forgotten
Aloni Cohen, Adam Smith, Marika Swanberg, Prashant Nalini Vasudevan
ACM SIGSAC Conference on Computer and Communications Security (CCS 2023)
Explores how data deletion should be formalized in complex systems, and critiques existing definitions of machine unlearning.
[Link to Paper]
Differentially Private Sampling from Distributions
Sofya Raskhodnikova, Satchit Sivakumar, Adam Smith, Marika Swanberg
NeurIPS 2022 and SIAM 2025
Investigates the complexity of sampling from a distribution while preserving privacy, and establishes new lower bounds for fundamental statistical tasks. Fun fact: the main lower-bound technique came from Satchit's and my final project for Sofya's Sublinear Algorithms course!
[Link to Journal Version]