Image by Emily Morter

My Research

I am passionate about research at the intersection of science and computation, particularly AI. I started my research journey by investigating multi-modal approaches to forecasting Alzheimer’s, and then explored cheminformatics and computer vision methods applied to healthcare and environmental problems. These experiences have taught me that everything is connected. For example, my understanding of biology and chemistry has helped me develop better AI approaches to both healthcare and environmental problems, and I have been able to carry AI methods from one field into another. I have also seen how computational methods and infrastructure choices can greatly affect the viability of a solution, for example from a power consumption standpoint. Along the way, I have come to appreciate how challenging it is to find and work with real-world datasets, and how much a public dataset can do to encourage innovation. To this end, my projects over the past two years have also focused on creating new datasets to contribute to the community.

You can find the datasets I have created here, and my publications here.

I am still early in my research journey. My goals so far are to explore these connections further, and to see how a deep understanding of computational methods, biology and chemistry in particular can be used to model data, extract information from it, and create practical solutions to challenging environmental and bioinformatics problems. I expect that my goals will evolve as I learn more.

My projects are summarized below:

Mitigating Textile Pollution

The textile industry is one of the world’s largest environmental polluters. My work in this area began with the creation of a textile-optimized code that can be digitally printed on fabrics and used for traceability, using object detection methods to overcome the current challenges with QR codes and fabric tags. This work was published by ACM and presented at ACM COMPASS 2024. Next, I collaborated with SPECIM to generate the world’s first public dataset of Near Infrared Spectroscopy (NIRS) data for fabrics, and developed multiclass classification methods that I demonstrated can correctly classify challenging blends. My current work focuses on remediating textile dye effluent (the cause of 20% of all water pollution) using mycoremediation. In this work, I am using Large Language Models to create the world’s first dataset for machine learning modeling of mycoremediation, and I have built a mycoremediation simulator and AI models to forecast mycoremediation effectiveness. I am validating these models through physical experiments with Rit Dye and Trametes versicolor, matching simulation and ML projections. Furthermore, I have demonstrated the impact of dye effluent on four variants of lettuce, and am conducting experiments to assess the growth of plants in mycoremediated water. Part of this work has been selected for the 2024 NeurIPS Climate Workshop and will appear online in Climate Change AI. This work has created four new public datasets so far, including the world’s first public datasets suitable for NIRS classification, and I am in the process of creating the world’s first public dataset on mycoremediation. This is my own work, with guidance from Harker School teachers.

Mosquito Surveillance for Public Health

Mosquitoes carry malaria, dengue and other diseases that kill millions of people worldwide every year. Climate change has affected mosquito habitats and breeding patterns, causing new spreads of disease. This project aims to create a low-power sensor that can be placed in rural areas for long periods of time, gathering mosquito sounds and images, and using AI to classify and track mosquito populations and behaviors.

To date, notable baseline results from this work include (a) demonstration of low-power sound featurization and mosquito classification using librosa features and machine learning, (b) demonstration that even small sound snippets (0.1s-1s) can effectively distinguish mosquitoes from background noise, even when that noise includes other insects, and (c) use of DINO object labeling followed by YOLO object detection to correctly classify mosquitoes in the wild from natural images containing background noise. Extending these, we have demonstrated some of these techniques on a Raspberry Pi 4 and an ESP32, with the ESP32 able to execute the sound classification at extremely low power levels. These initial results suggest the feasibility of a low-power device that uses sound as the primary method to detect mosquito presence, followed by on-demand camera invocation for image-based mosquito classification. This is the focus of our current work. This is joint work with Professor Zhao (Computer Science - Arizona State University and the ASU SCENE Program) and Awani Gadre (Biology - Santa Clara University), with advice from Dr. Renu Wickramasinghe (Parasitology - Sri Jayawardenapura University - Sri Lanka). Our intent is to evaluate our eventual device prototype in Sri Lanka for tracking Dengue carrier mosquitoes. Parts of this research have been published at the IEEE Conference on AI and Big Data 2024 and SPIE Applications of Digital Image Processing 2024, and other parts were selected for poster presentation at NeurIPS WiML 2024. This work has generated two public datasets, one of featurized mosquito sounds and one of DINO bounding box labeled mosquito images.
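The sound-first, camera-on-demand idea can be sketched in a few lines. This is a minimal illustration and not the project's code: it swaps the librosa features and trained classifier for a simple band-energy check built on the Goertzel algorithm, which is cheap enough for a microcontroller. The function names, the 400-800 Hz band and the 0.5 threshold are all assumptions chosen for the example.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Squared spectral magnitude at one frequency (Goertzel recurrence)."""
    k = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + k * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - k * s_prev * s_prev2


def band_energy_ratio(samples, sample_rate, low_hz, high_hz):
    """Fraction of the window's energy that falls between low_hz and high_hz."""
    n = len(samples)
    total = sum(x * x for x in samples)
    if n == 0 or total == 0.0:
        return 0.0
    step = sample_rate / n  # frequency resolution of this window
    first = math.ceil(low_hz / step)
    last = math.floor(high_hz / step)
    band = sum(
        goertzel_power(samples, sample_rate, b * step)
        for b in range(first, last + 1)
    )
    # For a real signal, the positive-frequency bins sum to total * n / 2.
    return band / (total * n / 2.0)


def should_wake_camera(samples, sample_rate=8000, low_hz=400.0,
                       high_hz=800.0, threshold=0.5):
    """Hypothetical gate: wake the camera only if the window looks 'buzzy'.

    The band and threshold are illustrative placeholders, not measured values.
    """
    return band_energy_ratio(samples, sample_rate, low_hz, high_hz) >= threshold


def sine_window(freq, sample_rate=8000, duration_s=0.1):
    """Synthetic test tone standing in for a short microphone snippet."""
    n = int(sample_rate * duration_s)
    return [math.sin(2.0 * math.pi * freq * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    print(should_wake_camera(sine_window(600.0)))  # tone inside the band
    print(should_wake_camera(sine_window(150.0)))  # low hum outside the band
```

In a real device, the gate would be replaced by the trained sound classifier, and a positive result would power up the camera for image-based classification.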

Drug Side Effects and Personalized Medicine

I have explored ways to leverage computational models for drug management and personalized medicine, focusing on two areas. The first involved predicting drug side effects using multi-task classifiers on SMILES strings, and a knowledge graph that links drugs, side effects, indications and targets. Using datasets from SIDER, DrugBank and PubChem, and the software packages DeepChem, PyKEEN, scikit-learn and other Python libraries, we explored a range of techniques for predicting side effects, including an ensemble of a multi-task classifier and a knowledge graph model, which demonstrated promising results on the holdout set as well as on drug reports from the FDA’s FAERS database. This work was done with Dr. Bharath Ramsundar (CEO of Deep Forest Sciences and creator of DeepChem) and published in IEEE/ACM CHASE 2024.
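To make the ensemble idea concrete, here is a minimal sketch of one common way to combine two models: a weighted average of their per-side-effect probabilities (soft voting). The published work may combine the models differently; the function names, the weight and the example scores below are hypothetical and only illustrate the general technique.

```python
def ensemble_side_effect_scores(classifier_probs, graph_probs,
                                classifier_weight=0.5):
    """Blend two models' per-side-effect probabilities for one drug.

    Both inputs map a side-effect name to a probability in [0, 1].
    If only one model scores a side effect, that score is used as-is.
    """
    if not 0.0 <= classifier_weight <= 1.0:
        raise ValueError("classifier_weight must be between 0 and 1")
    blended = {}
    for effect in set(classifier_probs) | set(graph_probs):
        p_clf = classifier_probs.get(effect)
        p_kg = graph_probs.get(effect)
        if p_clf is None:
            blended[effect] = p_kg
        elif p_kg is None:
            blended[effect] = p_clf
        else:
            blended[effect] = (classifier_weight * p_clf
                               + (1.0 - classifier_weight) * p_kg)
    return blended


def predicted_side_effects(blended, threshold=0.5):
    """Side effects whose blended probability reaches the threshold."""
    return sorted(e for e, p in blended.items() if p >= threshold)


if __name__ == "__main__":
    # Made-up scores for a single hypothetical drug.
    from_classifier = {"nausea": 0.9, "headache": 0.2}
    from_graph = {"nausea": 0.5, "headache": 0.6, "rash": 0.8}
    scores = ensemble_side_effect_scores(from_classifier, from_graph)
    print(predicted_side_effects(scores))
```

The weight would normally be tuned on a validation set rather than fixed at 0.5.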

The second effort in this area was the result of my internship with PGxAI, where we developed AI algorithms to recommend targeted care for individuals based on drugs, genotype, RS SNPs (single nucleotide polymorphisms), genes and other information. This work, in collaboration with PGxAI scientists and others, led to PGxAI’s first released model (Sirius) and was published in IEEE COMPSAC (see publications list for full author details and affiliations).

Other Research Projects

I have done several other projects at the intersection of science, AI and applied topics in healthcare and environmental science. These include (a) experimenting with intermediate and long-term forecasting of reference evapotranspiration (ET) using the Prophet, SARIMA and Theta algorithms from the Kats library, with geographically diverse historical ET data from OpenET, (b) using the OASIS-3 dataset to build a multi-modal AI model combining clinical data, MRIs and FreeSurfer brain volumetrics, and (c) building an app that monitors CO2 levels and relates them to ventilation, helping businesses assess safe reopening after COVID-19, using machine learning to forecast CO2 from historical and real-time particulate data. See publications for more details.

©2022 by AI4Science.
