Hi, I am Reza.
I am a research assistant (RA) at Mila, working with Aaron Courville and Pascal Vincent. I am interested in fundamental research in AI: I would like to understand why and when deep learning works, and how it can be improved. Currently, I am working on the connection between memorization and generalization in deep learning.
Contact me: reza.bayat [at] mila.quebec
Follow me on X: reza_byt
News
September 2024: I began an RA position at Mila under the supervision of Aaron Courville and Pascal Vincent.
AI Series

S1.E1 ∙ Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
TL;DR: Relaxed Recursive Transformers compress LLMs by repeatedly applying a single transformer block with shared parameters. An effective initialization and layer-wise LoRA modules relax the layer-tying constraint, allowing the compressed models to outperform similarly sized LLMs and approach the performance of larger ones. The paper also proposes Continuous Depth-wise Batching, an inference paradigm that improves throughput when combined with early-exiting algorithms. (A minimal sketch of the shared-block-plus-LoRA idea follows this episode's listing.)
November 28, 2024 | Presented by: Sangmin Bae | Watch TBD
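To make the parameter-sharing idea concrete, here is a minimal, hedged sketch (not the paper's implementation): one transformer block is reused at every depth, and each depth step gets its own small LoRA adapter so the tied layers are not forced to be identical. The class names, the residual placement of the adapter, and the hyperparameters are illustrative assumptions; the paper applies LoRA to the block's weight matrices and uses a careful initialization from the original model.

```python
import torch
import torch.nn as nn

class LoRA(nn.Module):
    """Low-rank adapter: a rank-r update B(A(x))."""
    def __init__(self, dim, rank=8):
        super().__init__()
        self.A = nn.Linear(dim, rank, bias=False)
        self.B = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.B.weight)  # adapter starts as a no-op

    def forward(self, x):
        return self.B(self.A(x))

class RelaxedRecursiveEncoder(nn.Module):
    """One shared transformer block applied num_loops times; each loop has
    its own LoRA adapter, so depths share weights but are not identical."""
    def __init__(self, dim, num_heads, num_loops, rank=8):
        super().__init__()
        self.shared_block = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True)
        self.loras = nn.ModuleList([LoRA(dim, rank) for _ in range(num_loops)])

    def forward(self, x):
        # Same block weights at every depth; only the small adapter differs.
        for lora in self.loras:
            x = self.shared_block(x) + lora(x)
        return x

# Example: 512-dim tokens, 8 heads, 4 recursive loops.
model = RelaxedRecursiveEncoder(dim=512, num_heads=8, num_loops=4)
out = model(torch.randn(2, 16, 512))  # (batch, seq, dim)
```

The adapters hold only a few percent of the shared block's parameters, which is why the relaxation costs little memory relative to untying the layers outright.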

S1.E2 ∙ Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation
TL;DR: Adaptive Inference-Time Compute introduces a generative reward model that lets an LLM predict, mid-generation, whether restarting would improve the response. This removes the need for an external reward model and can be used to decide whether to generate additional samples, prune unpromising samples early, or select the best sample. It is highly cost-efficient: the model generates a single predefined token and can reuse the existing KV cache. (A hedged self-assessment sketch follows this episode's listing.)
December 5, 2024 | Presented by: Rohin Manvi | Watch TBD
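Below is a hedged sketch of the single-token self-assessment idea, assuming a Hugging Face-style causal LM and tokenizer. The self-query text, the " Yes"/" No" token choice, and the threshold are illustrative assumptions, not the paper's exact recipe; the point is that the decision costs roughly one extra token and can reuse the generation's KV cache.

```python
import torch

@torch.no_grad()
def should_restart(model, tokenizer, partial_text, threshold=0.5):
    """Ask the model itself, mid-generation, whether a fresh attempt would
    likely do better, by reading the probability of one predefined token."""
    # Hypothetical self-query appended to the partial generation.
    query = partial_text + "\nWould starting over give a better answer? Answer Yes or No:"
    ids = tokenizer(query, return_tensors="pt").input_ids
    logits = model(ids).logits[0, -1]  # next-token logits
    yes_id = tokenizer(" Yes", add_special_tokens=False).input_ids[0]
    no_id = tokenizer(" No", add_special_tokens=False).input_ids[0]
    p_yes = torch.softmax(logits[[yes_id, no_id]], dim=-1)[0].item()
    return p_yes > threshold  # restart or sample again only if improvement is predicted

# Usage sketch: call periodically during decoding on the draft so far.
# if should_restart(model, tokenizer, draft): resample from the prompt.
```

In a real decoding loop the query tokens would be appended to the cached prefix rather than re-encoding the whole draft, which is what makes the check nearly free.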

S1.E3 ∙ Titans: Learning to Memorize at Test Time
TL;DR: Titans are a new family of architectures with a neural long-term memory that memorizes historical context at test time, helping the attention module use long-past information. They support fast, parallel training and inference, scale beyond 2M-token context windows, and outperform Transformers and modern RNNs on language modeling, common-sense reasoning, and time-series forecasting. (A minimal test-time-memorization sketch follows this episode's listing.)
February 12, 2025 | Presented by: Ali Behrouz | Watch TBD
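The following is a minimal sketch of the test-time-memorization idea only: a small MLP memory is updated online with a gradient step on a key-to-value reconstruction error (a "surprise" signal) and later queried to supply long-past context to attention. The module shape, learning rate, and single-step update are illustrative assumptions; Titans additionally use momentum, a forgetting mechanism, and parallelizable updates.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralLongTermMemory(nn.Module):
    """A small MLP memory whose weights are updated at test time so it can
    later be queried for information far outside the attention window."""
    def __init__(self, dim, lr=0.01):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.lr = lr

    def memorize(self, keys, values):
        # One inner gradient step on the key->value reconstruction error:
        # the more "surprising" the chunk, the larger the weight update.
        loss = F.mse_loss(self.net(keys), values)
        grads = torch.autograd.grad(loss, list(self.net.parameters()))
        with torch.no_grad():
            for p, g in zip(self.net.parameters(), grads):
                p -= self.lr * g

    def recall(self, queries):
        with torch.no_grad():
            return self.net(queries)  # retrieved context handed to the attention module

# Example: memorize one chunk of (key, value) pairs, then query it later.
mem = NeuralLongTermMemory(dim=64)
k, v = torch.randn(128, 64), torch.randn(128, 64)
mem.memorize(k, v)
context = mem.recall(torch.randn(4, 64))
```

Because the memory is a fixed-size set of weights rather than a growing cache, recall cost stays constant no matter how much history has been absorbed.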