Me

Hello! I'm a CS PhD student at Harvard (starting Fall '24), funded by the NSF GRFP. I'm interested in computer vision, human-robot interaction, and robotics. My mission is to use methods from these areas to develop tools to help people, particularly those who are disadvantaged.

I'm currently working with Prof. Patrick Slade at the Harvard Ability Lab to develop a smartphone-based navigation assistant for people who are blind or visually impaired.

Some of my past/current projects are shown below. To see my academic record, work experience, and more, check out my CV.


A Low-Cost and Accessible Navigation Tool for the Blind and Visually Impaired

I am working with Prof. Patrick Slade at the Harvard Ability Lab to develop a smartphone-based device to help blind and visually impaired people navigate. It is currently capable of full-body collision avoidance, turn-by-turn GPS navigation, and semantic segmentation of street scenes.
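
Not the actual system, but a minimal sketch of the semantic-segmentation piece using an off-the-shelf model (torchvision's DeepLabV3; the model choice and file name are illustrative assumptions):

```python
# Illustrative semantic segmentation of a street scene with an
# off-the-shelf model (not the system's actual network or pipeline).
import torch
from torchvision import transforms
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

model = deeplabv3_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("street_scene.jpg").convert("RGB")  # hypothetical input
with torch.no_grad():
    out = model(preprocess(img).unsqueeze(0))["out"]  # (1, classes, H, W)
labels = out.argmax(dim=1)  # per-pixel class IDs (e.g. person, bus)
```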

I've also had the pleasure of working with several undergrads to develop a motorized wheel that can be attached to a cane to better guide users.

Our overarching goal is to develop a system that (1) demonstrably improves mobility for blind users by tackling all major navigation challenges, and (2) is accessible to as many people as possible.

Exploring Generative Models in Hyperbolic Space

Hyperbolic space is a generally under-explored setting. Unlike Euclidean space, it has negative curvature, so space expands exponentially with distance, which makes it both more interesting and far less limiting to explore.
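
For intuition, a standard fact of hyperbolic geometry (not specific to this project): in the hyperbolic plane with curvature -1, a circle's circumference and area both grow exponentially in its radius r,

```latex
% Circle of radius r in the hyperbolic plane (curvature -1):
C(r) = 2\pi \sinh r, \qquad A(r) = 2\pi(\cosh r - 1)
% Both grow like e^r, versus 2\pi r and \pi r^2 in the Euclidean plane.
```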

During summer 2021 I worked with Prof. Ryan Adams to develop a system for visualizing and exploring generative models in simulated hyperbolic space. I implemented a model, a projection, and a square tiling system for hyperbolic space in OpenGL, and connected the system to a PGAN that generates images correlated by geodesic distance.
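
To give a concrete sense of the geometry, here is the standard geodesic-distance formula for the Poincaré disk model, sketched in Python (illustrative only, not code from the project):

```python
# Geodesic distance in the Poincare disk model (standard formula;
# illustrative only, not code from the project).
import numpy as np

def poincare_distance(u: np.ndarray, v: np.ndarray) -> float:
    """Distance between points u, v with ||u||, ||v|| < 1."""
    sq = np.sum((u - v) ** 2)
    denom = (1 - np.sum(u ** 2)) * (1 - np.sum(v ** 2))
    return float(np.arccosh(1 + 2 * sq / denom))

# Points near the disk boundary are far apart even when Euclidean-close:
print(poincare_distance(np.array([0.0, 0.0]), np.array([0.5, 0.0])))   # ~1.10
print(poincare_distance(np.array([0.9, 0.0]), np.array([0.95, 0.0])))  # ~0.72
```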

For my undergraduate senior thesis I extended this work by implementing it in Unity with a generalized tiling system and a text-to-image model (LAFITE). I also ran simulated experiments to gauge the effectiveness of my system for finding imagined outputs from a model.

The Unity WebGL app is playable here (although the server for the generative model is down).

Generated outputs in hyperbolic space; a (5,7) hyperbolic tiling

Convolutional Transformers for Inertial Navigation

For my junior independent work at Princeton, I implemented neural network architectures that improve on the existing state of the art for inertial navigation. This task uses measurements from inertial measurement units (IMUs), each containing an accelerometer and a gyroscope, to predict an object's position. IMUs are cheap, ubiquitous, energy-efficient, and used in a wide variety of applications.
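
For background on why learning helps here (a standard observation, not my model): the naive baseline is dead reckoning, which double-integrates acceleration, so tiny sensor biases compound into large position drift. A minimal sketch, assuming accelerations already rotated into the world frame:

```python
# Naive dead reckoning: double-integrate world-frame acceleration.
# Tiny sensor biases compound into large position drift, which is
# what learned inertial navigation methods aim to overcome.
import numpy as np

def dead_reckon(acc: np.ndarray, dt: float) -> np.ndarray:
    """acc: (T, 3) world-frame accelerations -> (T, 3) positions."""
    vel = np.cumsum(acc * dt, axis=0)   # integrate acceleration once
    return np.cumsum(vel * dt, axis=0)  # integrate velocity again

acc = np.zeros((1000, 3))
acc[:, 0] = 0.01  # a constant 0.01 m/s^2 accelerometer bias...
print(dead_reckon(acc, dt=0.01)[-1, 0])  # ...drifts ~0.5 m in just 10 s
```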

My models use a transformer encoder with convolutional layers to extract both global and local relationships in the data, and they achieve better results than the previous best method.
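
A minimal sketch of that combination in PyTorch (layer sizes and the pooling/output head are hypothetical, not the actual architecture):

```python
# Sketch of a convolutional transformer for IMU sequences
# (hypothetical sizes and head; not the actual architecture).
import torch
import torch.nn as nn

class ConvTransformer(nn.Module):
    def __init__(self, in_ch=6, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.conv = nn.Sequential(  # local feature extraction
            nn.Conv1d(in_ch, d_model, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(d_model, d_model, kernel_size=3, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)  # global context
        self.head = nn.Linear(d_model, 2)  # e.g. 2-D displacement

    def forward(self, imu):  # imu: (B, T, 6) accelerometer + gyroscope
        x = self.conv(imu.transpose(1, 2)).transpose(1, 2)  # (B, T, d_model)
        return self.head(self.encoder(x).mean(dim=1))       # pool over time

out = ConvTransformer()(torch.randn(8, 200, 6))  # -> (8, 2)
```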

Ground truth and predicted trajectories from various models.

OSCAR: Occluding Spatials, Category, And Region under discussion

For the final project for COS484 (Natural Language Processing) at Princeton, we reproduced a question-answering model and the ablations from this paper. We additionally performed several of our own ablations and experimented with different models for generating question embeddings.

The QCS+RuD model.

Pedestrian Detection and Interpretability

For the final project for COS429 (Computer Vision) at Princeton, we investigated whether CNNs trained for object detection rely on particular visual cues when detecting pedestrians. We trained a Faster R-CNN on the Caltech Pedestrian Dataset, evaluated the model on different categories of pedestrian images, and improved the model by up-weighting images from poorly performing categories during training.
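
A sketch of the up-weighting idea (categories, weights, and dataset wiring are hypothetical; torchvision's Faster R-CNN stands in for our setup):

```python
# Sketch of up-weighting poorly performing categories during training
# (hypothetical categories/weights; not our exact training code).
from torch.utils.data import WeightedRandomSampler
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(num_classes=2)  # background + pedestrian

# Suppose each training image is tagged with a difficulty category,
# and harder categories are sampled more often:
category_weight = {"occluded": 3.0, "small": 2.0, "typical": 1.0}
image_categories = ["typical", "occluded", "small", "typical"]  # toy example
weights = [category_weight[c] for c in image_categories]

sampler = WeightedRandomSampler(weights, num_samples=len(weights))
# loader = DataLoader(caltech_dataset, batch_size=2, sampler=sampler)
```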

Pedestrians

Crowdsourcing Datasets for Optical Flow

During summer 2020, I joined the Princeton Vision & Learning Lab to work on a visual learning project on optical flow. I helped develop and optimize a system for collecting human-annotated images.

These annotations are used to estimate ground-truth optical flow for various scenes and videos.

Frame 1; Frame 8

Ray Tracing

During a summer internship at Oregon State University I created a simple ray tracer in C++. I implemented and tested methods for improving rendering speed and making rendered images more realistic.

The top image is an output from an early version of the ray tracer that used simple methods such as antialiasing and reflection.

The bottom image is an output rendered with more advanced techniques, such as Monte Carlo path tracing, to create more realistic lighting.
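
The ray tracer itself was C++, but the core of Monte Carlo path tracing fits in a few lines; here is a cosine-weighted hemisphere sampler (a standard routine, sketched in Python for brevity):

```python
# Cosine-weighted hemisphere sampling, the heart of diffuse Monte
# Carlo path tracing (standard routine; Python sketch, C++ original).
import numpy as np

def sample_hemisphere(normal: np.ndarray) -> np.ndarray:
    """Random bounce direction about a unit surface normal."""
    u1, u2 = np.random.rand(2)
    r, phi = np.sqrt(u1), 2 * np.pi * u2
    local = np.array([r * np.cos(phi), r * np.sin(phi), np.sqrt(1 - u1)])
    # Build an orthonormal basis around the normal and rotate into it.
    t = np.cross(normal, [1.0, 0.0, 0.0])
    if np.linalg.norm(t) < 1e-6:  # normal was parallel to the x-axis
        t = np.cross(normal, [0.0, 1.0, 0.0])
    t /= np.linalg.norm(t)
    b = np.cross(normal, t)
    return local[0] * t + local[1] * b + local[2] * normal

# A path tracer averages many such randomly bounced paths per pixel.
print(sample_hemisphere(np.array([0.0, 0.0, 1.0])))
```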

I presented my work at the 2018 ASE Symposium (see p.25) at the University of Portland.

Spheres; path tracing

Explainable Neural Networks

During the summer of 2019 I worked on a project at Oregon State University on explainable neural networks. This project aimed to explain the decision-making of deep neural networks for image recognition in terms of human concepts.

The network was trained to identify images of birds and then analyzed to see whether it focused on semantically meaningful concepts, such as "Eye" or "Crown," as in the image shown here.

XNN

Dementia Diagnosis

Over the course of 2016 and early 2017 I helped develop a project that used deep convolutional neural nets to analyze MRI scans and attempt to diagnose various stages of dementia. My motivation for this project comes from my family's history of Alzheimer's.

By developing a technique to split each 3D MRI scan into hundreds of 2D "slices," as seen in the image, I was able to substantially improve the program's accuracy.
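
The slicing step itself is simple; a sketch with a hypothetical volume shape:

```python
# Splitting a 3D MRI volume into 2D slices for a 2D CNN
# (hypothetical shape; illustrative only).
import numpy as np

volume = np.random.rand(160, 192, 192)  # (depth, height, width)

# One 2D slice per index along each anatomical axis:
axial    = [volume[i, :, :] for i in range(volume.shape[0])]
coronal  = [volume[:, j, :] for j in range(volume.shape[1])]
sagittal = [volume[:, :, k] for k in range(volume.shape[2])]

print(len(axial) + len(coronal) + len(sagittal))  # hundreds of 2D images
```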

This project was submitted to the 2017 Central Western Oregon Science Expo and the subsequent Intel Northwest Science Expo, where it won several awards.

2D MRI scans

Other Websites

Aside from this website, I've developed several other websites and concepts for websites. One example is airinchina.github.io, which I created as an informative supplement for a high school project on air pollution in China.

A test website I created as a structure for my final project for Web Development (CS290) at OSU can be viewed here. The full version had a functional database, but the search bar on the test version still works!

Air in China

The most recent and largest-scale website I've worked on so far is TigerTools, which I built alongside two of my classmates and friends, Indu Panigrahi and Adam Rebei. The website connects a simple and intuitive interface with a variety of APIs, allowing Princeton students to quickly locate on-campus amenities such as printers, scanners, water filling stations, cafés, and vending machines.

The latest version of TigerTools is currently hosted on TigerApps. It requires a Princeton account to access.

TigerTools landing page; TigerTools interface

Another website I've developed is QTPod, which provides a web interface for listening to podcasts with advertisements automatically removed. The ads are detected with speech recognition and cut from the audio.
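
A sketch of that pipeline (Whisper and pydub stand in for whatever QTPod actually uses; the ad phrases are hypothetical):

```python
# Sketch of speech-recognition-based ad removal (Whisper and pydub are
# stand-ins for QTPod's actual stack; ad phrases are hypothetical).
import whisper
from pydub import AudioSegment

AD_PHRASES = ("promo code", "use code", "sponsored by")  # hypothetical

segments = whisper.load_model("base").transcribe("episode.mp3")["segments"]

audio = AudioSegment.from_file("episode.mp3")
clean = AudioSegment.empty()
for seg in segments:  # keep only segments that don't sound like ads
    if not any(p in seg["text"].lower() for p in AD_PHRASES):
        clean += audio[int(seg["start"] * 1000):int(seg["end"] * 1000)]
clean.export("episode_clean.mp3", format="mp3")
```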

QTPod popular page; podcast interface
