Vishakh Pillai

Hello, I'm Vish, an incoming Data Scientist at DataKind. This portfolio showcases work I completed in the Master of Information and Data Science (MIDS) program at UC Berkeley.

Based in New York City, I spend my free time tutoring underserved populations and improving my jazz piano skills. One of my passions is carrying on my family's legacy of cooking South Indian cuisine while adapting the recipes to my plant-based diet.

View My Resume

View My LinkedIn Profile

View My Recent Emissions Forecasting Project

View the Project on GitHub vishpillai123/vishpillai.io

My Projects


Leopard Re-Identification

GitHub | Website | Paper

Problem Statement

The scope of our project was to automate the tracking of animal populations and poaching in the wild. Using deep learning techniques, we wanted to test whether an automated approach can match the accuracy of manual surveys (79%) for leopards. We identify leopards based on distinctive features and flag untracked or out-of-distribution leopards in an unsupervised setting.
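As a rough illustration of what "identify known leopards and flag untracked ones" can look like, here is a minimal sketch. It is not the project's actual method: it assumes each image has already been turned into a non-zero feature vector by some model, and the `gallery` contents, the `identify` helper, and the `0.8` threshold are all hypothetical values chosen for the example.

```python
import math

def cosine_similarity(a, b):
    # Assumes both vectors are non-zero and of equal length.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(query, gallery, threshold=0.8):
    """Return (best matching ID, score), or (None, score) when no
    known individual is similar enough to the query."""
    best_id, best_score = None, -1.0
    for leopard_id, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_id, best_score = leopard_id, score
    if best_score < threshold:
        # Below the (hypothetical) threshold: treat as an untracked leopard.
        return None, best_score
    return best_id, best_score

# Toy embeddings standing in for model outputs (illustrative only).
gallery = {
    "leopard_A": [1.0, 0.0, 0.0],
    "leopard_B": [0.0, 1.0, 0.0],
}
known = identify([0.9, 0.1, 0.0], gallery)    # close to leopard_A
unknown = identify([0.0, 0.0, 1.0], gallery)  # unlike any known leopard
```

In this sketch the threshold controls the trade-off between wrongly matching a new leopard to a known one and wrongly flagging a known leopard as new, so in practice it would need to be tuned on held-out data.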

Methodology

Results


Automated Essay Scoring

GitHub | Paper

Problem Statement

Assessing essays is time-consuming and expensive because teachers spend hours grading each one individually. Automated essay scoring (AES) can reduce cost, save time, and mitigate potential grading bias.

Our project develops AES models using BERT-base and BERT-LSTM architectures to test whether a recurrent (RNN) layer is needed for an effective two-stage learning framework.

Methodology

Results


Forest Cover Prediction

GitHub | Presentation

Problem Statement

Natural resource managers need to predict how climate change will affect the composition of tree communities and the functionality of ecosystems. Forest dynamics such as tree species, age composition, and other ecosystem attributes can be studied to understand environmental disturbances and management activities.

Given a dataset of 15,120 training samples and 565,892 test samples, the task is to predict the forest cover type (one of 7 classes) for 30 m × 30 m sections of land based on 54 attributes (e.g., elevation, area, soil type, distance to water, and aspect).

Methodology

Results


COVID Regression Modeling

Paper

Problem Statement

In November 2020, our statistics team attempted to model and understand the spread of the global COVID-19 pandemic. Using state-level data for the 50 US states, we set out to build a regression describing part of the causal chain of virus spread and casualties. Although cases and deaths were clearly correlated, the lack of data on ICU capacity, healthcare access, and general population health made deaths difficult to study. Instead, we built a regression on the predictors of cases across the states.

Methodology

Results