Building end-to-end clinical AI pipelines, from predictive modeling and NLP to LLM-driven agentic systems, that transform health data into actionable decisions.
I'm a Data Scientist with a Master's in Data Science from Illinois Tech, focused on predictive modeling, NLP, and LLM-driven agentic systems. I build end-to-end clinical AI pipelines on AWS, from post-discharge voice triage to hypertension adherence agents, applying robust, interpretable AI to real-world health system challenges.
Architected a three-layer AI pipeline for hypertension management across a full patient panel: a deterministic rule engine, weighted risk scoring (0-100), and LLM narrative generation with 11 validation checks. Built five nightly clinical detectors with patient-adaptive thresholds, an Ask ARIA chatbot with three-layer guardrails, and a full-stack system with a 12-table PostgreSQL schema and a Next.js clinician dashboard.
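As a rough illustration of the weighted risk scoring layer, the sketch below combines normalized signals into a 0-100 score. The signal names and weights are hypothetical placeholders chosen for the example, not the values used in the actual system.

```python
# Hypothetical signal weights (assumption: not the project's real configuration).
HYPOTHETICAL_WEIGHTS = {
    "bp_elevation": 40,        # how far recent readings sit above target
    "missed_doses": 35,        # share of recent doses not confirmed
    "days_since_reading": 25,  # staleness of the latest home reading
}


def risk_score(signals: dict[str, float]) -> int:
    """Combine signals (each expected in [0, 1]) into a 0-100 score."""
    total_weight = sum(HYPOTHETICAL_WEIGHTS.values())
    weighted_sum = 0.0
    for name, weight in HYPOTHETICAL_WEIGHTS.items():
        # Missing signals count as 0; out-of-range values are clamped.
        value = min(1.0, max(0.0, signals.get(name, 0.0)))
        weighted_sum += weight * value
    return round(100 * weighted_sum / total_weight)


if __name__ == "__main__":
    print(risk_score({"bp_elevation": 0.5, "missed_doses": 1.0}))
```

Keeping the score a plain weighted sum means each point can be traced back to a named signal, which suits a layer that sits between deterministic rules and LLM-generated narrative.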
Production-grade AI voice triage system that targets the post-discharge follow-up gap: it places outbound calls, conducts natural-speech clinical assessments, and delivers real-time SBAR reports to a nurse dashboard with tiered risk classification. Designed an 8-Lambda serverless architecture integrating Amazon Bedrock Nova Pro/Lite with LACE readmission risk scoring and SNS escalation for high-risk patients.
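For readers unfamiliar with LACE, here is a minimal sketch of the index as it is commonly published (Length of stay, Acuity of admission, Charlson comorbidity index, Emergency department visits in the prior six months; total range 0-19). The function names and the escalation cutoff of 10 are assumptions for illustration, and the actual escalation logic and SNS publishing in the project are not shown.

```python
def length_of_stay_points(days: int) -> int:
    """L: 0 for <1 day, 1-3 for 1-3 days, 4 for 4-6, 5 for 7-13, 7 for 14+."""
    if days < 1:
        return 0
    if days <= 3:
        return days
    if days <= 6:
        return 4
    if days <= 13:
        return 5
    return 7


def lace_score(los_days: int, emergent: bool,
               charlson_index: int, ed_visits_6mo: int) -> int:
    """Sum the four LACE components into a 0-19 score."""
    acuity = 3 if emergent else 0                                # A
    comorbidity = charlson_index if charlson_index <= 3 else 5   # C
    ed_visits = min(max(ed_visits_6mo, 0), 4)                    # E
    return length_of_stay_points(los_days) + acuity + comorbidity + ed_visits


# Assumption: 10 is a commonly used high-risk cutoff, not necessarily the project's.
HIGH_RISK_THRESHOLD = 10


def needs_escalation(score: int) -> bool:
    """Flag scores at or above the assumed high-risk cutoff."""
    return score >= HIGH_RISK_THRESHOLD


if __name__ == "__main__":
    score = lace_score(los_days=5, emergent=True, charlson_index=2, ed_visits_6mo=1)
    print(score, needs_escalation(score))
```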
AI-powered point-of-care antibiotic prescribing tool that combines facility antibiogram data, patient-specific lab values, and validated clinical scoring algorithms to reduce the ~50% rate of inappropriate antibiotic use prevalent in US hospitals. Integrates real-time medication safety analysis with local pathogen resistance patterns for EHR-integrated workflows.
A full-stack, Django-based platform for automated grading of subjective answers. It uses BERT-driven semantic similarity (Sentence Transformers, cosine scoring) and role-based dashboards to evaluate unstructured answers at scale, achieving a 0.87 F1 score while retaining educator control through a human-in-the-loop override.
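The cosine scoring step can be sketched in a few lines. In the platform the vectors would come from a Sentence Transformers model; here they are toy lists so the example stays dependency-free, and the `suggested_marks` mapping is a hypothetical illustration rather than the platform's grading rule.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def suggested_marks(similarity: float, max_marks: float) -> float:
    """Hypothetical mapping: scale clipped similarity linearly to marks."""
    clipped = max(0.0, min(1.0, similarity))
    return round(clipped * max_marks, 1)


if __name__ == "__main__":
    reference = [0.2, 0.7, 0.1]  # toy embedding of the model answer
    student = [0.1, 0.8, 0.3]    # toy embedding of a student answer
    sim = cosine_similarity(reference, student)
    print(suggested_marks(sim, max_marks=10))
```

A suggested mark like this is only a starting point; the human-in-the-loop override leaves the final decision with the educator.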
A context-aware conversational AI for preliminary symptom analysis that integrates LangChain and Gemini LLMs with a Pinecone vector database to perform high-speed semantic search across chunks of clinical text from the Gale Encyclopedia of Medicine.
Developed a reproducible, end-to-end ML pipeline for estimating biological age from DNA methylation data using interpretable models and a calibrated stacked ensemble, with cross-study validation demonstrating strong generalization on an external cohort.
Built an end-to-end, interpretable diabetes prediction pipeline on a 100K-record clinical dataset using Lasso-based feature selection and ensemble modeling, achieving a 0.97 AUC with a tuned XGBoost model while translating predictions into clinically actionable risk insights.
I'm currently open to new opportunities. Whether you have a question, a project proposal, or just want to say hi, I'll do my best to get back to you!