Dario Fumarola


The true mystery of the world is the visible, not the invisible. - Oscar Wilde

Download CV
dariofumarola90@gmail.com
LinkedIn

Last updated: January 2026

About Me

Hey! I’m Dario, a Computer Science and Mathematics graduate from Washington and Lee University. Originally from a small village in Italy, I’m currently a Prototyping Architect at Amazon Web Services in New York City, where I build AI systems on AWS and deliver customer-facing solutions designed to work reliably at scale.

My research focuses on reinforcement learning, representation learning, and the geometry of neural systems. I’m especially interested in how graph structure and differential geometry can shape more robust, interpretable representations, and how RL agents can adapt their behavior and computation under uncertainty. Lately, I’ve been focusing on adaptive control mechanisms and multi-agent settings, with an emphasis on stability, long-tail performance, and clear behavioral knobs.

I’m actively seeking research collaborations to apply these directions to real-world problems in learning and decision-making - feel free to reach out if you’d like to discuss ideas or build something together :)

Research Interests

Reinforcement learning, representation learning, and the geometry of neural systems; graph structure and differential geometry for robust, interpretable representations; adaptive computation and control under uncertainty; cooperative multi-agent coordination.

Education

Washington and Lee University (2019 - 2023)
Computer Science and Mathematics - Davis Scholar

Relevant Coursework: Deep Learning, Machine Learning and Big Data, Real Analysis, Network Security, Differential Geometry, Differential Equations

Experience Highlights

Prototyping Architect, Amazon Web Services (New York City): building AI systems on AWS and delivering customer-facing solutions designed to work reliably at scale.

Professional Memberships:

Presented Work

Mood Swings: Neuromodulatory Control for Deep RL Agents (ICML 2025)

Mood Swings introduces a compact control interface for actor–critic agents using three global scalars: dopaminergic gain on TD error and two serotonergic coefficients controlling entropy drive and threat discounting. These scalars define a continuous "mood" manifold outside the network, enabling behavior shifts by writing three floats rather than retraining. Experiments in Pac-Mind and MiniHack trace smooth safety–performance frontiers, with higher dopamine accelerating learning but increasing collision risk, and higher serotonin improving survival while moderating returns.
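A minimal sketch of that three-scalar interface, assuming a standard PyTorch actor-critic loss; the class and argument names (Mood, threat_mask, the exact modulation points) are illustrative assumptions, not the paper's code.

from dataclasses import dataclass
import torch

@dataclass
class Mood:
    dopamine: float = 1.0     # gain on the TD error driving both heads
    ser_entropy: float = 0.01 # serotonergic coefficient on the entropy drive
    ser_threat: float = 0.0   # serotonergic discount on threat-flagged returns

def actor_critic_loss(mood, logits, values, actions, returns, threat_mask):
    # Serotonergic threat discounting: down-weight returns on threat-flagged steps.
    shaped = returns * (1.0 - mood.ser_threat * threat_mask)
    # Dopaminergic gain scales the TD error used by the update.
    td = mood.dopamine * (shaped - values)
    dist = torch.distributions.Categorical(logits=logits)
    policy_loss = -(dist.log_prob(actions) * td.detach()).mean()
    value_loss = td.pow(2).mean()
    entropy_bonus = mood.ser_entropy * dist.entropy().mean()
    return policy_loss + 0.5 * value_loss - entropy_bonus

Under this sketch, shifting behavior really is writing three floats, e.g. mood.dopamine = 1.5 at runtime, with no retraining of the network itself.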

Elastic State Models: Geometry-Aware Adaptive Compute (NeurIPS 2025)

Elastic State Models (ESM) add adaptive computation to a streaming state-space backbone by converting per-step error into an integer refinement depth under an explicit compute penalty. When activated, ESM performs latent-space updates using metric-preconditioned gradients with trust-region clipping to stabilize correction. The method concentrates compute on difficult timesteps, improving performance in maze navigation and protein loop repair while using lower average compute than Transformer baselines.
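A toy sketch of the refinement loop described above, assuming a scalar per-step error signal and a diagonal metric; the threshold rule, step size, and trust-region radius are illustrative assumptions.

import torch

def refinement_depth(err, lam=0.1, max_depth=4):
    # Convert per-step error into an integer depth under compute penalty lam:
    # more surprise buys more refinement steps, capped at max_depth.
    return min(max_depth, max(0, int(err / lam)))

def refine(z, loss_fn, metric_diag, depth, step=0.5, radius=1.0):
    # Latent-space correction: metric-preconditioned gradient steps with
    # trust-region clipping to keep each update inside a ball of given radius.
    for _ in range(depth):
        z = z.detach().requires_grad_(True)
        (grad,) = torch.autograd.grad(loss_fn(z), z)
        delta = -step * grad / metric_diag  # preconditioned descent direction
        norm = delta.norm()
        if norm > radius:                   # trust-region clipping
            delta = delta * (radius / norm)
        z = z + delta
    return z.detach()

Because depth is zero on easy timesteps, compute concentrates where the error signal says it is needed.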

Broadcast-Gain: Minimal Control Plane for Cooperative MARL (NY-RL 2025)

Broadcast-Gain (BG) is a fixed-rate, neighbor-only overlay that improves coordination in cooperative MARL under lossy communication. Each agent broadcasts two bytes per cycle (a signed residual and a meta tag) without modifying the base PPO+GAE learner; receivers compute a confidence-weighted consensus that gates a phase scheduler and applies a clipped, distance-decayed bias to the MOVE logit near junctions. At ~0.24 kbit/s per agent, BG reduces tail wait by ~5 steps and increases near-gate flow by +392 per 1k steps on hard evaluation cells.
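An illustrative sketch of the two-byte overlay under assumed encodings; the packing scheme and the gain/decay constants are stand-ins, and the base PPO+GAE learner is untouched, with only the MOVE logit receiving the bias.

import math

def encode(residual, meta):
    # Pack a signed residual in [-1, 1] and a small meta tag into two bytes.
    r = max(-1.0, min(1.0, residual))
    return bytes([int((r + 1.0) * 127.5) & 0xFF, meta & 0xFF])

def decode(msg):
    return msg[0] / 127.5 - 1.0, msg[1]

def consensus(residuals, confidences):
    # Confidence-weighted consensus over neighbor residuals; on a lossy
    # channel messages may simply be missing, so an empty set yields 0.
    if not residuals:
        return 0.0
    total = sum(confidences)
    return sum(r * c for r, c in zip(residuals, confidences)) / max(total, 1e-8)

def move_bias(consensus_val, dist_to_junction, gain=0.5, clip=1.0, decay=0.3):
    # Clipped, distance-decayed additive bias applied to the MOVE logit,
    # strongest near junctions and fading with distance.
    bias = gain * consensus_val * math.exp(-decay * dist_to_junction)
    return max(-clip, min(clip, bias))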

Selective Replication for Efficient k-NN Retrieval

We propose selective vector replication to improve k-nearest neighbor retrieval in clustered high-dimensional indexes. The method identifies boundary vectors likely to be relevant across neighboring clusters and replicates only those vectors into the adjacent partitions they connect. This targeted replication increases recall while reducing the number of vectors scanned per query, yielding better retrieval quality with bounded storage overhead.
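A toy sketch of the boundary-selection rule, assuming k-means-style centroids; the relative-margin threshold tau is an illustrative knob, not necessarily the paper's criterion.

import numpy as np

def replicate_boundary(vectors, centroids, tau=0.1):
    # Compare each vector's distance to its own centroid against the
    # runner-up centroid; a near-tie marks a boundary vector, which is
    # copied only into the runner-up partition it borders.
    extra = {c: [] for c in range(len(centroids))}
    d = np.linalg.norm(vectors[:, None, :] - centroids[None, :, :], axis=-1)
    for i, dists in enumerate(d):
        order = np.argsort(dists)
        own, runner = order[0], order[1]
        # Small relative margin => the vector sits near the cluster boundary.
        if (dists[runner] - dists[own]) / (dists[own] + 1e-8) < tau:
            extra[runner].append(i)
    return extra  # indices to replicate into each adjacent partition

Only these near-boundary vectors are duplicated, which is why recall improves while the per-query scan count and storage overhead stay bounded.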

Hierarchically Partitioned Cloud-Native Vector Search

This work adapts graph-based ANN search to object storage by combining hierarchical graph partitioning with parallel S3 reads. Large HNSW graphs are partitioned into size-bounded subgraphs optimized for object fetch and caching behavior, and queries retrieve only the subgraphs required for traversal, in parallel. The result is lower tail latency while maintaining high recall at billion-scale, with predictable storage and compute costs.
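A sketch of the parallel subgraph-fetch pattern, assuming a partition manifest mapping subgraph ids to object keys; fetch_subgraph and the storage layout are hypothetical stand-ins, not a real client API.

from concurrent.futures import ThreadPoolExecutor

def fetch_subgraph(bucket, key):
    # Hypothetical ranged GET against object storage; a real implementation
    # would use boto3's get_object, deserialize one size-bounded subgraph,
    # and consult a local cache first.
    return {"bucket": bucket, "key": key, "nodes": []}  # mock payload

def load_for_traversal(bucket, manifest, needed_ids, workers=8):
    # Retrieve only the subgraphs this query's traversal needs, in parallel,
    # so latency is bounded by the slowest required fetch rather than a
    # sequential walk over the whole index.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {sg: pool.submit(fetch_subgraph, bucket, manifest[sg])
                   for sg in needed_ids}
        return {sg: f.result() for sg, f in futures.items()}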

GeoGAT: Geometry-Aware Graph Attention for Molecular Property Prediction

GeoGAT integrates bonded connectivity, 3D geometry, and electronic descriptors in a sparse molecular graph attention model. A global context node captures molecule-level effects while invariant pair and angle features modulate attention to distinguish conformers and long-range interactions. Across tasks such as solubility, LogP, and binding-related prediction, GeoGAT improves accuracy and yields interpretable attention patterns that localize influential atoms and functional groups.
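A schematic of geometry-modulated attention on a sparse molecular graph, assuming invariant pair and angle features arrive as per-edge vectors that become additive attention biases; shapes, the fusion scheme, and the per-node softmax loop are simplifications (the global context node would be one extra node with edges to every atom, omitted here).

import torch
import torch.nn as nn

class GeoAttentionLayer(nn.Module):
    def __init__(self, dim, edge_dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Invariant geometry features (distances, angles) become additive
        # biases, so conformers with different 3D shapes score differently.
        self.edge_bias = nn.Linear(edge_dim, 1)

    def forward(self, h, edge_index, edge_feats):
        src, dst = edge_index  # sparse bonded / long-range edges
        score = (self.q(h)[dst] * self.k(h)[src]).sum(-1) / h.size(-1) ** 0.5
        score = score + self.edge_bias(edge_feats).squeeze(-1)
        # Softmax over each destination node's incoming edges (a real
        # implementation would use a vectorized scatter-softmax).
        alpha = torch.zeros_like(score)
        for node in dst.unique():
            mask = dst == node
            alpha[mask] = torch.softmax(score[mask], dim=0)
        out = torch.zeros_like(h)
        out.index_add_(0, dst, alpha.unsqueeze(-1) * self.v(h)[src])
        return out

Inspecting alpha per edge is what yields the interpretable attention patterns: high-weight edges localize the atoms and functional groups driving a prediction.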