Hi! I am Ankesh Anand, a Research Scientist in AI at Google DeepMind, London, working on large-scale multimodal models in the Gemini project. I also worked with the Blueshift team to improve mathematical reasoning in large language models like Minerva using RL planning methods.
I recently finished working as a PhD student at Mila with Aaron Courville, focusing on Representation Learning and Reinforcement Learning. I have also worked as a research intern at DeepMind, London with Jessica Hamrick, and at Microsoft Research, Montreal with Devon Hjelm and Philip Bachman.
Earlier, I graduated from IIT Kharagpur with a Bachelor's and Master's in Mathematics and Computing. I have also spent time at VISA, HackerEarth and Google Summer of Code.
Check out my recent blog post on how we should think about RL in the era of foundation models!