My recent work centers on making large language models safer and more steerable. I develop methods for configurable preference tuning using synthetic data and rubric-guided generation, enabling fine-grained control over LLM behavior. A recurring theme is test-time adaptation: rather than relying solely on training, I explore how models can self-correct at inference time—refining safety specifications on the fly and mitigating reward hacking without retraining. I also investigate self-critique and model merging as defenses against jailbreak attacks, and more recently, how LLMs can distill user feedback into persistent memory to improve over successive interactions.
Full publication list → Google Scholar
PhD in Statistics & Operations Research, Universidad Complutense de Madrid, 2021
Thesis: Contributions to Large Scale Bayesian Inference and Adversarial Machine Learning
Supervised by David Ríos Insua (ICMAT, Royal Academy of Sciences) and David Gómez-Ullate (ICMAT, UCA)
MSc in Mathematical Engineering, UCM
Double Degree in Mathematics & Computer Science, UCM