Hi! I am Ankesh Anand, a Research Scientist in AI at Google DeepMind, London, working on large-scale multimodal models in the Gemini project. I also worked with the Blueshift team to improve mathematical reasoning in large language models like Minerva using RL planning methods.
I recently finished working as a PhD student at Mila with Aaron Courville, focusing on Representation Learning and Reinforcement Learning. I have also worked as a research intern at DeepMind, London with Jessica Hamrick, and at Microsoft Research, Montreal with Devon Hjelm and Philip Bachman.
Earlier, I graduated from IIT Kharagpur with a Bachelor's and Master's in Mathematics and Computing. I have also spent time at VISA, HackerEarth and Google Summer of Code.
Check out my recent blog post on how we should think about RL in the era of foundation models!