Medical Doctor (MBBS) & Data Scientist | Dedicated to Health Equity and Mental Wellness
Scalable Drug-Drug Interaction API - Enhancing Drug Safety with Machine Learning
Developed a proof-of-concept (POC) API using FastAPI, Retrieval-Augmented Generation (RAG), ChromaDB, and React to support real-time drug interaction queries for up to 1,000 concurrent users. Combined structured data retrieval with machine learning models to improve query precision. Ran unit and load tests to verify scalability and reliability across the API, the machine learning integration, and the user interface.
Tools: FastAPI, RAG, ChromaDB, React
Techniques: Machine Learning, API Development, UI Design, Scalability Testing
Outcome: The proof of concept returns precise, real-time interaction data at scale, which can support drug safety, patient care, and clinical decision-making.
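The retrieval step of a RAG pipeline like the one described above can be sketched as follows. This is a minimal illustration only, not the project's implementation: it swaps ChromaDB and learned embeddings for a toy in-memory bag-of-words index, and the `DOCS` entries, `embed`, `retrieve`, and `build_prompt` names are all hypothetical. The interaction texts are placeholders for illustration, not clinical guidance.

```python
import math
from collections import Counter

# Hypothetical toy corpus standing in for a vector store such as ChromaDB.
DOCS = {
    "warfarin-aspirin": "warfarin with aspirin increases bleeding risk",
    "simvastatin-clarithromycin": "clarithromycin raises simvastatin levels and myopathy risk",
    "metformin-contrast": "iodinated contrast with metformin may raise lactic acidosis risk",
}


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, k=2):
    """Return up to k document ids with non-zero similarity, best match first."""
    q = embed(query)
    scored = sorted(
        ((cosine(q, embed(text)), doc_id) for doc_id, text in DOCS.items()),
        reverse=True,
    )
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(query, k=2):
    """Assemble the retrieved passages and the question into one prompt string."""
    context = "\n".join(DOCS[doc_id] for doc_id in retrieve(query, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


print(build_prompt("warfarin aspirin bleeding"))
```

In a real deployment the prompt built here would be passed to a language model, and the similarity search would be delegated to the vector database rather than computed in a loop.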
Tableau Dashboard - Netflix Content Distribution
Created an interactive Tableau dashboard, built on the Kaggle Netflix dataset, to analyze Netflix's global content distribution and its shift from movies to TV shows. The dashboard lets users explore changes in Netflix's content strategy and identify key trends.
Tools: Python, Tableau
Techniques: Data Cleaning, Data Visualization
Outcome: The final Tableau dashboard offers an interactive way to explore Netflix's content strategy, showing how the platform's focus has shifted and reflecting broader viewing trends.
Used Car Market Analysis and Price Prediction
I led the development of a predictive model for used car prices, focusing on encoding categorical features and using regression techniques to predict 'price_log'. Using Python, I cleaned and analyzed the data, then built and validated the regression models.
Tools: Python
Libraries: pandas, NumPy, Matplotlib, seaborn, scikit-learn
Techniques: Data Cleansing, Exploratory Data Analysis, Categorical Feature Encoding, Train-Test Split, Regression Modeling & Analysis (Linear Regression, Ridge/Lasso Regression), Model Performance Evaluation (R2 Score, RMSE)
Outcome: The project identified key factors influencing used car prices. Comparing several regression models, including Linear Regression, Ridge, and Random Forest, showed Power and CarAge to be crucial predictors and improved predictive accuracy.
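The encoding and regression workflow described for this project can be sketched roughly as below. This is a hedged illustration, not the project's code: it uses NumPy only rather than scikit-learn, the rows are made-up values, the column names are hypothetical, and it assumes 'price_log' is the natural log of price. For brevity it scores on the training rows instead of a held-out test split.

```python
import numpy as np

# Hypothetical toy rows: (fuel_type, power, car_age_years, price)
rows = [
    ("Petrol", 80.0, 8, 3.5),
    ("Diesel", 120.0, 4, 9.0),
    ("Petrol", 100.0, 2, 8.0),
    ("Diesel", 140.0, 6, 10.5),
    ("CNG", 60.0, 10, 2.0),
    ("Petrol", 90.0, 5, 5.5),
]


def one_hot(values):
    """One-hot encode a list of category labels; columns follow sorted label order."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded, categories


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^-1 X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


fuel_encoded, categories = one_hot([r[0] for r in rows])
numeric = np.array([[r[1], r[2]] for r in rows])
X = np.hstack([fuel_encoded, numeric])

# Assumed definition of the target: natural log of price.
price_log = np.log(np.array([r[3] for r in rows]))

weights = fit_ridge(X, price_log, alpha=1.0)
pred = X @ weights

# Evaluation on the training rows (a real workflow would use a test split).
residual = price_log - pred
rmse = float(np.sqrt(np.mean(residual ** 2)))
r2 = float(1.0 - np.sum(residual ** 2) / np.sum((price_log - price_log.mean()) ** 2))
print(categories, round(rmse, 3), round(r2, 3))
```

The one-hot columns act as per-category intercepts here, so no separate bias term is added. Unlike scikit-learn's `Ridge`, this sketch also penalizes those columns.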
Analysis of Diabetes Prevalence in Pima Indians
In this project, I conducted an exploratory data analysis to identify the prevalence and predictors of diabetes among Pima Indian women. I used Python and its libraries to clean, analyze, and visualize the data.
Tools: Python
Libraries: NumPy, pandas, seaborn, Matplotlib
Techniques: Exploratory Data Analysis, Statistical Modeling, Data Visualization
Outcome: My analysis revealed significant predictors of diabetes, providing valuable insights into its distribution within the Pima Indian community, which could inform future public health strategies.
Movie Recommendation System - Google PaLM 2 API
This team project introduces MovieConnect, a personalized movie recommendation engine built on Google's PaLM 2 API. The team collaborated on algorithm development, system integration, and user interface design, with a focus on recommendation accuracy and user experience.
Tools: Google Generative AI (PaLM 2), Python, Streamlit
Libraries: pandas, scikit-learn, NumPy, Matplotlib, seaborn, IPython, ipywidgets
Techniques: Generative AI, Large Language Model (LLM) Integration, Interactive UI with IPython, Prompt Design and Testing, Web Application Development
Outcome: MovieConnect demonstrates how a large language model can be applied in entertainment to improve recommendation personalization and user interaction on digital content platforms.
Boston House Price Prediction Project
In this project, I developed a machine learning model to predict Boston housing prices. I used regression analysis with Python's scikit-learn library to train, test, and validate the model on historical housing data.
Tools: Python
Libraries: NumPy, pandas, scikit-learn, Matplotlib, seaborn
Techniques: Linear Regression, Decision Trees, Random Forest, Model Evaluation Metrics (MSE, RMSE, MAE)
Outcome: The model predicted house prices effectively and highlighted the importance of features such as the number of rooms and proximity to employment centers, offering insight into housing market dynamics.
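The evaluation metrics listed for this project (MSE, RMSE, MAE) can be computed as in the short sketch below. It uses only the Python standard library, and the `y_true` and `y_pred` values are hypothetical numbers chosen for illustration, not results from the project.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean absolute error: average of absolute residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual vs. predicted prices.
y_true = [20.0, 30.0, 25.0, 35.0]
y_pred = [21.0, 29.0, 27.0, 33.0]

print(mse(y_true, y_pred))   # 2.5
print(mae(y_true, y_pred))   # 1.5
print(round(rmse(y_true, y_pred), 3))  # 1.581
```

RMSE weights large errors more heavily than MAE, so reporting both gives a rough sense of whether a few large misses dominate the error.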