Summary
I am an ML Engineer at Google Research, building ML models for healthcare applications. I graduated with an MS in Computer Science from the College of Information and Computer Sciences (CICS) at UMass Amherst. Before that, I was an undergraduate at IIT Delhi. You can find my resume here.
I am broadly interested in theoretical machine learning and NLP, and have worked on projects involving computer vision, natural language processing, and reinforcement learning. Some of my research areas are:
- Bayesian Calibration in the learning of neural networks
- Causal inference for medical diagnosis on multi-modal data
- Natural Language Understanding for smart assistants like Alexa
Major Projects and Internships
Faster Bayesian Parameter Estimation for Neural Nets
Currently working on speeding up MCMC sampling for Bayesian parameter estimation. Starting from the work on Bayesian Dark Knowledge, we want to develop estimators that are more robust to dataset uncertainty while keeping the inference time of the estimation process the same or better.
Causal Inference Models for Multi-modal Medical Diagnosis Data
Currently working on building hierarchical Bayesian diagnostic models for the MIMIC dataset. We are formalizing the problem as a causal inference problem and investigating the advantages such a formulation can have.
Building models for Noisy Conversational Question Answering
Worked with Alexa's Implicit Memory team on finding the best way to integrate noise robustness into text-based Q&A models using transformers and ELMo embeddings. We add noise to the CoQA dataset to create a noisy version, noisy-CoQA. Evaluating models like FlowQA on noisy-CoQA exposes how vulnerable modern text-based NLP models are to structured noise coming from speech recognition and ambient noise sources. We use stabilization of layers to obtain noise-robust embeddings and are continuing the work to build robustness into such models.
Natural Language Understanding through Intent Classification
Worked with Alexa's Spoken Language Understanding team on building a fast intent classifier. This will be integrated with the voice assistant as a tool to identify the intent of the utterance spoken by the user in a dialogue. We built an early-exit strategy that can be integrated with the LSTMs and affine neural networks traditionally used for such tasks. We show that much shallower networks perform reasonably well on the task while being significantly faster and having fewer parameters. Accepted for conference proceedings at ICASSP 2020. Link to paper.
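The general idea of early exiting can be sketched as follows. This is only an illustration under stated assumptions, not the method from the paper: the confidence-threshold exit rule, the function name `early_exit_predict`, and the toy layers and exit heads are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(x, layers, exit_heads, threshold=0.9):
    """Run layers in order and stop at the first confident exit head.

    Assumption: each layer has its own classifier head, and we exit as
    soon as the top class probability reaches `threshold`.
    Returns (predicted_label, number_of_layers_used).
    """
    if not layers:
        raise ValueError("at least one layer is required")
    h = x
    for depth, (layer, head) in enumerate(zip(layers, exit_heads), start=1):
        h = layer(h)
        probs = softmax(head(h))
        if max(probs) >= threshold:
            break  # confident enough, skip the remaining layers
    return probs.index(max(probs)), depth


# Hypothetical two-layer "network" over two intent classes.
toy_layers = [lambda h: [2 * v for v in h], lambda h: [v + 1 for v in h]]
toy_heads = [lambda h: h, lambda h: h]

easy = early_exit_predict([0.0, 3.0], toy_layers, toy_heads)
hard = early_exit_predict([0.0, 0.1], toy_layers, toy_heads)
```

In this toy setup the clear-cut input exits after the first layer, while the ambiguous one runs through both layers.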
Getting better at Game Playing by transfer of skill
Worked on transfer learning in the context of game playing. Transfer learning is used to learn policies from 10 of 11 Atari games and use them as policy initialisers for the last game.
A generative model is fit to the simulations of the first ten games and then fine-tuned by Joint Training and Feature Extraction for the eleventh game. Results show promising transfer of policy in the context of Atari games.
WordVector Based Moderation Pipeline for Amazon Marketplace
We developed an end-to-end distributed pipeline for scoring the risk factor of a specific advertisement campaign on the Amazon Marketplace. This uses the text description of the ad and the associated photograph.
Using these two, we developed a feature extractor to encode the risk of the advertisement and then used an ensemble of regressors to give the majority score of the advertisement. This was used in conjunction with the manual moderation team, making their work more efficient by having them score only those ads on which our pipeline wasn't confident enough.
Minimax Bot for playing the game of Tak
The bot uses Minimax to search over the set of possible game states and selects the best ones using a utility function. The utility function was trained by identifying which features affect the quality of gameplay the most. The code and a description of the strategy used can be found here.
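For readers unfamiliar with Minimax, here is a minimal depth-limited sketch. It is not the bot's code: Tak is replaced by a hypothetical take-1-or-2 stone game, and the hand-written `nim_utility` stands in for the trained utility function.

```python
def minimax(state, depth, maximizing, moves, apply_move, utility):
    """Depth-limited minimax. Returns (score, best_move)."""
    legal = moves(state)
    if depth == 0 or not legal:
        return utility(state, maximizing), None
    best_move = None
    best = float("-inf") if maximizing else float("inf")
    for move in legal:
        score, _ = minimax(apply_move(state, move), depth - 1,
                           not maximizing, moves, apply_move, utility)
        if (maximizing and score > best) or (not maximizing and score < best):
            best, best_move = score, move
    return best, best_move


# Hypothetical toy game: players alternately take 1 or 2 stones;
# whoever takes the last stone wins.
def nim_moves(pile):
    return [k for k in (1, 2) if k <= pile]


def nim_apply(pile, k):
    return pile - k


def nim_utility(pile, maximizing_to_move):
    if pile == 0:
        # The player to move has no stones left, so the other player won.
        return -1 if maximizing_to_move else 1
    return 0  # depth cutoff on a non-terminal state: neutral estimate


score, move = minimax(4, 10, True, nim_moves, nim_apply, nim_utility)
```

With a pile of 4 the maximizing player can force a win by taking one stone, leaving the opponent a pile of 3.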
Using Genetic Algorithms for approximately solving the Traveling Salesman Problem
A specific tour of the travelling salesman was modelled as a gene for the Genetic Algorithm (GA). Genes were then recombined using two different crossover techniques, CX and PMX, and both types of offspring were included in the population for the next generation.
The fitness of a gene was the Laplace-smoothed reciprocal of the total distance of the tour. The crossover process was parallelised, which allowed for very large starting populations and let the GA approach the maximum in a small number of iterations.
For more on the algorithm and the code go here.
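A minimal sketch of two of the pieces described above, the fitness function and PMX crossover. It assumes "Laplace-smoothed reciprocal" means 1 / (1 + distance), which may differ from the original code; the function names are illustrative, and CX and the parallelisation are omitted.

```python
import math


def tour_length(tour, coords):
    """Total length of the closed tour visiting cities in the given order."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def fitness(tour, coords):
    """Assumed form of the smoothed reciprocal: 1 / (1 + tour length)."""
    return 1.0 / (1.0 + tour_length(tour, coords))


def pmx(parent1, parent2, start, end):
    """Partially mapped crossover (PMX).

    Copies parent1[start:end] into the child, then places the displaced
    genes of parent2 by following the mapping between the two segments.
    """
    size = len(parent1)
    child = [None] * size
    child[start:end] = parent1[start:end]
    segment = set(parent1[start:end])
    for i in range(start, end):
        gene = parent2[i]
        if gene in segment:
            continue  # already present in the child
        pos = i
        while start <= pos < end:
            pos = parent2.index(parent1[pos])
        child[pos] = gene
    # Remaining positions are inherited directly from parent2.
    for i in range(size):
        if child[i] is None:
            child[i] = parent2[i]
    return child


square = {0: (0, 0), 1: (0, 1), 2: (1, 1), 3: (1, 0)}
child = pmx([1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 3, 7, 8, 2, 6, 5, 1, 4], 3, 7)
```

Because PMX always returns a valid permutation, every child is a legal tour and can be scored with `fitness` directly.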
Local Search Algorithms with Parallelization
Implemented a set of local search algorithms: HillClimbingWithRandomRestarts, HillClimbingwithTabu, BeamSearch, BeamSearchWithTabu and BeamSearchWithRandomRestarts. These are used to search for the global maximum of the cost of a resource allocation problem. Parallelization uses three threads that update a common max value, making the search more time-efficient. The code can be found here.
Echo State Networks for Stock Prediction
We use an Echo State Network (ESN), a reservoir RNN, to capture the movement of a specific stock. The network takes as input the stock price of the previous day together with its learned state, and outputs the predicted value for the current day. The code for the ESN can be found here.
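The recurrent update at the core of an ESN can be sketched as below. This is a generic textbook form in plain Python, not the linked code: the weights are tiny hand-picked values, and fitting the linear readout (commonly done by ridge regression) is left out.

```python
import math


def esn_step(state, u, w_in, w_res):
    """One reservoir update: x_t = tanh(w_in * u_t + W_res @ x_{t-1})."""
    return [
        math.tanh(w_in[i] * u + sum(w_res[i][j] * state[j]
                                    for j in range(len(state))))
        for i in range(len(state))
    ]


def readout(state, w_out):
    """Linear readout of the prediction from the reservoir state."""
    return sum(w * s for w, s in zip(w_out, state))


# Hypothetical 2-unit reservoir: unit 0 sees the input, unit 1 sees
# unit 0's previous activation, so the state "echoes" past inputs.
w_in = [1.0, 0.0]
w_res = [[0.0, 0.0], [1.0, 0.0]]

x1 = esn_step([0.0, 0.0], 0.5, w_in, w_res)  # feed yesterday's price (scaled)
x2 = esn_step(x1, 0.0, w_in, w_res)          # no new input: the echo remains
prediction = readout(x2, [0.0, 1.0])
```

In an actual ESN the reservoir weights are random and fixed, and only the readout weights are trained.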
Other Projects and Apps
Sparse Matrix Vector Multiplication
Built a parallelised module for multiplying a sparse matrix with a dense vector. This use case is commonly seen in deep learning libraries, where the underlying vector calculus makes many of the matrices in use sparse.
The matrix was represented in Compressed Sparse Row (CSR) format, and the dot-product calculations were then parallelised. Using CUDA allowed for device-level and global-level separation of data and functionality.
Code can be found here.
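For reference, the CSR layout and the per-row dot product can be sketched sequentially as below. This is plain Python rather than the CUDA module, and the function names are illustrative.

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix to CSR (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # where the next row starts
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for A in CSR form, one dot product per row."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


A = [[1, 0, 2],
     [0, 0, 0],
     [0, 3, 0]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_matvec(values, col_idx, row_ptr, [1, 2, 3])
```

Each iteration of the outer loop touches only its own row's slice of `values` and writes one output entry, which is why row-wise work maps naturally onto parallel threads; one common GPU scheme assigns a thread to each row.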
Physics Engine for a Simulator Game
Built a C# application for the Universal Windows environment. We designed a physics engine from scratch for a space simulator with gravity and multi-particle interactions. This was packaged as a game and was the runner-up at Microsoft's Code.Fun.Do hackathon.
Choosing to develop a customised physics engine over Unity gave us more speed and resulted in a smoother user experience. Code for this app can be found here.
You can also see a demo video here.
Android application for Moodle
We developed an Android app for Moodle, the learning management system used at IIT Delhi. This was done as an assignment, as a proof of concept for porting the Moodle system to a mobile-friendly environment. web2py was used for server-side handling of users, their registration, and modification of data. Code can be found here.
Teaching Experience
- Fall Semester 2017- Undergrad TA for ELL311: Communication Engineering
- Spring Semester 2018- Undergrad TA for MTL106: Probability and Stochastic Processes
- Spring Semester 2019- Grad TA for CS683: Artificial Intelligence
- Fall Semester 2019- Grad TA for CS589: Machine Learning