About

As a data scientist working at the intersection of business and predictive analytics, I am dedicated to using AI and machine learning to create real-world impact. With a strong foundation in natural language processing, image classification, and advanced language models such as Mistral, T5, and LLaMA, I am passionate about building solutions that support informed decision-making and deliver tangible value.

Throughout my career, I have gained hands-on experience on projects in defense, automotive retail, and healthcare, applying machine learning to deliver measurable outcomes. I thrive in environments where I can apply my skills to solve complex problems and drive strategic initiatives. I enjoy working where business and data science meet, staying technically hands-on while also contributing to decision-making.

I am currently pursuing a Master of Information and Data Science degree from the University of California, Berkeley (Expected Graduation: Dec 2025) and hold a Bachelor of Science in Applied Data Science from Pennsylvania State University. My educational and professional background has equipped me with a robust understanding of various machine learning algorithms, agile processes, data engineering methodologies, model scalability, and solutions architecture.

I am always open to connecting with professionals who share a passion for data-driven innovation and exploring potential collaborations.

  • Occupation: Data Scientist
  • Phone: (267) 467-1239
  • City: Washington, D.C.
  • Email: tanavthan@gmail.com

Skills

Software and Web Development

Artificial Intelligence Implementation

Data Visualization

Digital Transformation

Workflow Automation

Predictive Modeling

Web Scraping

Image Processing

Education

University of California, Berkeley

May 2024 - Present
  • Master of Information and Data Science

Pennsylvania State University

August 2019 - May 2023
  • BS in Applied Data Science (Cybersecurity Application Focus)
  • Security and Risk Analysis Minor
  • Information Sciences and Technology Certificate
  • National Security Agency Letter of Recognition

Online Certifications

Palantir: Foundry & AIP Builder Foundations

Databricks: Generative AI Fundamentals

McKinsey & Co: Forward Program

10-week online leadership and strategy bootcamp

Booz Allen Hamilton: AI Engineer Expert

Booz Allen Hamilton: AI Engineer Practitioner

Booz Allen Hamilton: Python Practitioner

Microsoft Azure Data Scientist Associate

Microsoft Azure Fundamentals

Decisions: Business Analyst

Experience

Booz Allen Hamilton

July 2023 - April 2024
Staff Data Scientist, Senior Consultant
Technology Stack: Python (Transformers, TensorFlow, Keras, PyTorch, Streamlit, NLTK, PySpark), Qlik, SQL, Amazon S3 Storage, Databricks
  • Supported ADVANA for the Chief Digital and AI Office (CDAO) under the Office of the Secretary of Defense (OSD), addressing data science and ML needs.
  • Led development of a Retrieval-Augmented Generation (RAG) pipeline for sensitive PDF processing.
  • Used VGGNet for deepfake AI image detection, achieving 89% accuracy, with Grad-CAM visual explanations and a custom Streamlit front end.
  • Implemented T5-small and GPT-2 for summarization/comparison to improve content verification by 18%.
  • Spearheaded the launch of a Gen AI challenge for the FDA, leveraging domain expertise and technical ML/AI knowledge to drive execution of high-impact use cases.
  • Designed a Q&A chatbot for client databases, evaluating models such as Mistral-7B, Dolly-12B, and Phi-2 to generate SQL queries, making data easier to retrieve and improving user interaction and data accessibility.
  • Led sentiment analysis and named entity recognition (NER) on 8,000+ social media comments for the DoD, improving accuracy by 14%.
  • Developed a tool with LLaMA 405B to evaluate LLM outputs and migrated it to Palantir AIP.
  • Conducted technical interviews with DoD clients to extract key ML/AI use cases.
  • Engineered a solution using the Google Gemini 1.5 Pro API, fuzzy matching, PDFMiner, and Streamlit to summarize RFP PDFs.

Lithia Motors Inc.

July 2021 - April 2023
Data Science Analyst
Technology Stack: Snowflake, Neo4J, Python, Microsoft SQL Server, Cypher, Azure Data Factory, SQL, Databricks, Git
  • Built a TensorFlow neural network to predict vehicle resale time, reducing inventory costs.
  • Created a Python-based database comparison algorithm that saved over $5,000 in migration validation costs.
  • Used NLP to compare vehicle trim text across databases, improving workflow and model accuracy.
  • Designed an image enrichment algorithm using Azure Databricks and AutoML to enhance e-commerce visuals.
  • Helped migrate and validate scripts from Azure Synapse to Snowflake.
  • Preprocessed large datasets for ML model training and NLP tasks.
  • Built dealership profiling knowledge graphs in Neo4J for customer and sales insights.

Apoio Clinica

March 2021 - November 2021
Machine Learning Engineer
Technology Stack: Python, R, Azure Cloud Services, IBM Cloud Pak for Data, IBM Natural Language Understanding (NLU), Git
  • IBM-sponsored mental health AI software project; runner-up in the Microsoft 2022 Imagine Cup and the Nittany AI Challenge.
  • Built an ML model using Azure Form Recognizer to extract risk factors from patient intake forms.
  • Wrangled and cleaned JSON data for patient insight modeling.
  • Used IBM Watson NLU to detect patient sentiment, risk factors, and behavioral patterns.

Nittany Data Labs

June 2020 - October 2020
Software Engineering Intern
Technology Stack: Python, JavaScript, HTML/Bootstrap, Git, Power BI, Microsoft SQL Server
  • Developed user interface using JavaScript, Python, HTML/Bootstrap, and Power BI.
  • Co-developed a Microsoft SQL Server database with the team.
  • Digitized manual paper processes for tracking volunteers, recipients, and inventory.
  • Gathered requirements and collaborated on system design and business solutions.

Projects

Chain-of-Thought Refinement

BeyondATS AI

OSINT Q&A Bot (RAG)

Robust Neural Networks for Noisy Sign Language Recognition

Stroke Risk Modeling: Convolutional Networks and XGBoost with REST API

Scalable Production LLM Inference API with Kubernetes and Redis Caching

Skills

Languages

  • Python
  • JavaScript
  • HTML/CSS
  • SQL
  • R
  • Cypher

Python Libraries

  • PySpark
  • TensorFlow
  • PyTorch
  • Keras
  • Transformers
  • LangChain
  • Pandas
  • NumPy
  • NLTK
  • spaCy
  • Streamlit
  • Seaborn
  • SciPy
  • Scikit-Learn
  • Statsmodels
  • Matplotlib
  • PyArrow
  • MLflow
  • Pydantic

Databases

  • MySQL
  • Neo4J
  • Redis
  • Microsoft SQL Server
  • Snowflake
  • MongoDB
  • Databricks Data Warehouse
  • PostgreSQL

Frameworks

  • Flask
  • Django
  • React

Tools

  • Git
  • AWS SageMaker
  • Google Cloud
  • Azure Form Recognizer
  • Azure Synapse Analytics
  • Jupyter
  • Power BI
  • Tableau
  • Qlik
  • Databricks
  • AWS EC2
  • Kubernetes
  • Docker
  • Palantir Foundry + AIP
  • AutoML

Contact

Email

tanavthan@gmail.com

Phone

(267) 467-1239