Víctor Gallego

PhD in Stats & OR. Building applied AI at Komorebi AI. Teaching at IE University. Based in Madrid.

Now

Co-founder & Chief Research Officer, Komorebi AI (2019–present)
AI firm specializing in applied research and custom solutions.
Lecturer, IE University (2021–present)
Teaching machine learning and programming courses.

Research Interests

Lately I've been treating LLMs as optimizers in algorithm space: instead of updating weights by gradient descent, a frozen model iteratively writes and refines artifacts—behavioral specifications, programmatic policies, even GPU kernels—guided by feedback from its own evaluations. The design of that feedback turns out to matter as much as the model: a core question is feedback engineering, ranging from a single bit of danger signal to dense social metrics that act as coordination signals between agents. This search loop is powerful but easy to game, so I pair it with lightweight oversight primitives—held-out evaluation gates, decoupled safety channels—that catch reward hacking and keep the optimization honest. I apply these ideas to multi-agent cooperation in sequential social dilemmas, and to autoresearch, where an outer-loop agent autonomously redesigns the very pipeline that drives the inner-loop synthesizer.

This builds on earlier work on making large language models safer and more steerable. I develop methods for configurable preference tuning using synthetic data and rubric-guided generation, enabling fine-grained control over LLM behavior. A recurring theme is test-time adaptation: rather than relying solely on training, I explore how models can self-correct at inference time—refining safety specifications on the fly and mitigating reward hacking without retraining. I also investigate self-critique and model merging as defenses against jailbreak attacks, and how LLMs can distill user feedback into persistent memory to improve over successive interactions.

Selected Publications

2026
Discovering Cooperative Pipelines: Autoresearch for Sequential Social Dilemmas
V Gallego. AI Agents for Discovery in the Wild (AID-Wild) Workshop @ ACM CAIS 2026 · arXiv
Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon
V Gallego. Preprint, 2026 · arXiv
Discovering Agentic Safety Specifications from 1-Bit Danger Signals
V Gallego. Adaptive and Learning Agents Workshop (ALA 2026) @ AAMAS 2026 · arXiv
Beyond Scalar Rewards: Dense Feedback for LLM Policy Synthesis in Sequential Social Dilemmas
V Gallego. NExT-Game 2026: New Frontiers in Game-Theoretic Learning @ ICML 2026 Workshop · arXiv
Distilling Feedback into Memory-as-a-Tool
V Gallego. ICLR 2026 Workshop on Memory for LLM-Based Agentic Systems · arXiv
2025
Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement
V Gallego. SCALR Workshop @ COLM 2025 · arXiv
Configurable Preference Tuning with Rubric-Guided Synthetic Data
V Gallego. ICML 2025 Workshop on Models of Human Feedback for AI Alignment · arXiv
MetaSC: Test-Time Safety Specification Optimization for Language Models
V Gallego. ICLR 2025 Workshop on Foundation Models in the Wild · arXiv
2024
Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs
V Gallego. LOD 2024: 10th International Conference on Machine Learning, Optimization, and Data Science · arXiv
Protecting Classifiers from Attacks
V Gallego, R Naveiro, A Redondo, D Ríos Insua, F Ruggeri. Statistical Science 39(3), 2024 · arXiv
Merging Improves Self-Critique Against Jailbreak Attacks
V Gallego. ICML 2024 Workshop on Foundation Models in the Wild · arXiv
Configurable Safety Tuning of Language Models with Synthetic Preference Data
V Gallego. 3rd Workshop on Practical Deep Learning: Towards Efficient and Reliable LLMs @ IEEE CAI 2024 · arXiv
2023
Adversarial Machine Learning: Bayesian Perspectives
D Ríos Insua, R Naveiro, V Gallego, J Poulos. Journal of the American Statistical Association 118(543), 2023 — 56 citations · arXiv
ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF
V Gallego. Preprint, 2023 (arXiv:2308.06385) · arXiv
2022
Personalizing Text-to-Image Generation via Aesthetic Gradients
V Gallego. NeurIPS 2022, Workshop on ML for Creativity and Design · arXiv
Current Advances in Neural Networks
V Gallego, D Ríos Insua. Annual Review of Statistics and Its Application 9, 2022 — 31 citations
Adversarial Risk Analysis: An Overview
D Banks, V Gallego, R Naveiro, D Ríos Insua. WIREs Computational Statistics 14(1), 2022 — 49 citations · arXiv
2021 & earlier
AI in Drug Development: A Multidisciplinary Perspective
V Gallego, R Naveiro, C Roca, D Ríos Insua, NE Campillo. Molecular Diversity 25(3), 2021 — 100 citations
Reinforcement Learning under Threats
V Gallego, R Naveiro, D Ríos Insua. AAAI 2019 — 42 citations · arXiv
Stochastic Gradient MCMC with Repulsive Forces
V Gallego, D Ríos Insua. NeurIPS 2018, Workshop on Bayesian Deep Learning — 46 citations · arXiv

Full list → Google Scholar

Education

PhD in Statistics & Operations Research, Universidad Complutense de Madrid, 2021
Contributions to Large Scale Bayesian Inference and Adversarial Machine Learning
Supervised by David Ríos Insua (ICMAT, Royal Academy of Sciences) and David Gómez-Ullate (ICMAT, UCA)


MSc in Mathematical Engineering, UCM

Double Degree in Mathematics & Computer Science, UCM