Víctor Gallego

PhD in Stats & OR. Building applied AI at Komorebi AI. Teaching at IE University. Based in Madrid.

Now

Co-founder & Chief Research Officer, Komorebi AI (2019–present)
AI firm specializing in applied research and custom solutions.
Lecturer, IE University (2021–present)
Teaching machine learning and programming courses.

Research Interests

My recent work centers on making large language models safer and more steerable. I develop methods for configurable preference tuning using synthetic data and rubric-guided generation, enabling fine-grained control over LLM behavior. A recurring theme is test-time adaptation: rather than relying solely on training, I explore how models can self-correct at inference time—refining safety specifications on the fly and mitigating reward hacking without retraining. I also investigate self-critique and model merging as defenses against jailbreak attacks, and more recently, how LLMs can distill user feedback into persistent memory to improve over successive interactions.

Selected Publications

2026
Distilling Feedback into Memory-as-a-Tool
V Gallego. ICLR 2026 Workshop on Memory for LLM-Based Agentic Systems · arXiv
2025
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement
V Gallego. SCALR Workshop @ COLM 2025 · arXiv
Configurable Preference Tuning with Rubric-Guided Synthetic Data
V Gallego. ICML 2025 Workshop on Models of Human Feedback for AI Alignment · arXiv
MetaSC: Test-Time Safety Specification Optimization for Language Models
V Gallego. ICLR 2025 Workshop on Foundation Models in the Wild · arXiv
2024
Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs
V Gallego. LOD 2024: 10th International Conference on Machine Learning, Optimization, and Data Science · arXiv
Protecting Classifiers from Attacks
V Gallego, R Naveiro, A Redondo, D Ríos Insua, F Ruggeri. Statistical Science 39(3), 2024 · arXiv
Merging Improves Self-Critique Against Jailbreak Attacks
V Gallego. ICML 2024 Workshop on Foundation Models in the Wild · arXiv
Configurable Safety Tuning of Language Models with Synthetic Preference Data
V Gallego. 3rd Workshop on Practical Deep Learning: Towards Efficient and Reliable LLMs @ IEEE CAI 2024 · arXiv
2023
Adversarial Machine Learning: Bayesian Perspectives
D Ríos Insua, R Naveiro, V Gallego, J Poulos. Journal of the American Statistical Association 118(543), 2023 · 56 citations · arXiv
ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF
V Gallego. arXiv:2308.06385 · arXiv
2022
Personalizing Text-to-Image Generation via Aesthetic Gradients
V Gallego. NeurIPS 2022, Workshop on ML for Creativity and Design · arXiv
Current Advances in Neural Networks
V Gallego, D Ríos Insua. Annual Review of Statistics and Its Application 9, 2022 · 31 citations
Adversarial Risk Analysis: An Overview
D Banks, V Gallego, R Naveiro, D Ríos Insua. WIREs Computational Statistics 14(1), 2022 · 49 citations · arXiv
2021 & earlier
AI in Drug Development: A Multidisciplinary Perspective
V Gallego, R Naveiro, C Roca, D Ríos Insua, NE Campillo. Molecular Diversity 25(3), 2021 · 100 citations
Reinforcement Learning under Threats
V Gallego, R Naveiro, D Ríos Insua. AAAI 2019 · 42 citations · arXiv
Stochastic Gradient MCMC with Repulsive Forces
V Gallego, D Ríos Insua. NeurIPS 2018, Workshop on Bayesian Deep Learning · 46 citations · arXiv

Full list → Google Scholar

Education

PhD in Statistics & Operations Research, Universidad Complutense de Madrid, 2021
Contributions to Large Scale Bayesian Inference and Adversarial Machine Learning
Supervised by David Ríos Insua (ICMAT, Royal Academy of Sciences) and David Gómez-Ullate (ICMAT, UCA)


MSc in Mathematical Engineering, UCM

Double Degree in Mathematics & Computer Science, UCM