
I work on AI safety and robustness, focusing on building trustworthy AI systems that reason reliably, grounded in factuality and causality.
I am particularly interested in creating large reasoning models (LRMs) that can learn cause-and-effect relationships
from data and use them to perform strong, auditable reasoning in adversarial environments.
I am currently finishing my PhD at the University of Auckland, working at the intersection of deep learning and causality.
I have presented my work at top AI conferences and received the University of Auckland Best Student Published Paper in Computer Science award in 2023.
As part of my work, I conducted the first evaluation of large language models (LLMs) on abstract reasoning, highlighting their brittleness and limitations.
I developed a novel modular language model architecture based on causal principles for out-of-distribution reasoning, showed that causal models
can improve the learning of interpretable, robust and domain-invariant mechanisms, and built the first end-to-end framework for causal extraction and inference with LLM agents.
Latest Research

Counterfactual Causal Inference in Natural Language
We build the first causal extraction and counterfactual causal inference system for natural language, and propose a new direction for model oversight and strategic foresight.
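As a toy illustration of the counterfactual step only (the actual system operates over causal relations extracted from text by LLM agents), a counterfactual query on a one-equation structural causal model follows the usual abduction, action, prediction recipe; the variables and coefficients below are invented for the example.

```python
# Minimal sketch of counterfactual inference over a toy structural causal
# model (SCM); the variable names and coefficients are illustrative only,
# not the ones used in the actual system.

def predict_outcome(rainfall, noise_crop):
    """Structural equation: crop_yield := 2.0 * rainfall + noise."""
    return 2.0 * rainfall + noise_crop

# Observed (factual) world.
observed_rainfall, observed_yield = 3.0, 7.5

# Step 1 (abduction): recover the exogenous noise consistent with the observation.
noise_crop = observed_yield - 2.0 * observed_rainfall   # = 1.5

# Step 2 (action): intervene on the cause, do(rainfall = 1.0).
counterfactual_rainfall = 1.0

# Step 3 (prediction): re-run the structural equation with the same noise.
counterfactual_yield = predict_outcome(counterfactual_rainfall, noise_crop)
print(counterfactual_yield)  # 3.5 -> "yield would have been 3.5 had rainfall been 1.0"
```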

Independent Causal Language Models
We develop a novel modular language model architecture separating inference into independent causal modules, and show that it improves abstract reasoning performance and robustness in out-of-distribution settings.
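For intuition, here is a minimal sketch of what routing inference through independent modules can look like, assuming a simple soft router over separate MLP modules; this layout is illustrative, not the published architecture.

```python
# Sketch of a mixture of independent modules with a learned router;
# illustrative layout only, not the published architecture.
import torch
import torch.nn as nn

class ModularBlock(nn.Module):
    def __init__(self, dim: int, n_modules: int = 4):
        super().__init__()
        # Independent modules: no parameters are shared between them.
        self.causal_modules = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
            for _ in range(n_modules)
        ])
        self.router = nn.Linear(dim, n_modules)  # produces soft routing weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); each module processes the input independently.
        weights = torch.softmax(self.router(x), dim=-1)                     # (b, s, n)
        outputs = torch.stack([m(x) for m in self.causal_modules], dim=-1)  # (b, s, dim, n)
        return (outputs * weights.unsqueeze(-2)).sum(dim=-1)                # (b, s, dim)

x = torch.randn(2, 8, 32)
print(ModularBlock(32)(x).shape)  # torch.Size([2, 8, 32])
```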

Behaviour Modelling of Social Agents
We model the behaviour of interacting social agents (e.g. meerkats) using a combination of causal inference and graph neural networks, and demonstrate increased efficiency and interpretability compared to existing architectures.
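For intuition, a single message-passing step over an agent interaction graph is sketched below, using plain numpy and a hand-written adjacency matrix; it is a generic graph neural network update, not the model from the paper.

```python
# Sketch of one message-passing step over an agent interaction graph;
# generic GNN-style update, illustrative only.
import numpy as np

def message_passing_step(features: np.ndarray, adjacency: np.ndarray) -> np.ndarray:
    """Aggregate neighbour states for each agent (mean over neighbours)."""
    degree = adjacency.sum(axis=1, keepdims=True).clip(min=1)
    neighbour_mean = adjacency @ features / degree
    # Combine each agent's own state with the aggregated neighbour message.
    return np.tanh(features + neighbour_mean)

# Three agents (e.g. meerkats), 2-d behaviour state each; agent 0 interacts with 1 and 2.
features = np.array([[0.1, 0.5], [0.4, -0.2], [-0.3, 0.7]])
adjacency = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)
print(message_passing_step(features, adjacency))
```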

Evaluation of LLMs on Abstract Reasoning
We evaluate the performance of large language models on abstract reasoning tasks and show that they fail to adapt to unseen reasoning chains, highlighting a lack of generalization and robustness.

Disentanglement via Causal Interventions on a Quantized Latent Space
We propose a new approach to disentanglement based on hard causal interventions over a quantized latent space, and demonstrate its potential for improving the interpretability and robustness of generative models.
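A minimal sketch of what a hard intervention on a quantized latent space can look like, assuming a VQ-style codebook where each latent slot holds one discrete code; the sizes and stand-in decoder are illustrative, not the paper's setup.

```python
# Sketch of a hard intervention on a quantized latent space, assuming a
# VQ-style codebook; sizes and the stand-in decoder are illustrative.
import torch

torch.manual_seed(0)
codebook = torch.randn(8, 4)            # 8 discrete codes, 4-d embeddings
latent_codes = torch.tensor([2, 5, 1])  # quantized latent: one code index per slot

def decode(codes: torch.Tensor) -> torch.Tensor:
    # Stand-in decoder: look up the code embeddings and flatten them.
    return codebook[codes].reshape(-1)

factual = decode(latent_codes)

# Hard intervention: force the second slot to a different codebook entry,
# leaving every other slot untouched, then decode again.
intervened_codes = latent_codes.clone()
intervened_codes[1] = 7
counterfactual = decode(intervened_codes)

# If the representation is disentangled, only the dimensions driven by the
# intervened slot should change between the two decodings.
print((factual - counterfactual).reshape(3, 4))
```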
