Predicting Vaccine Hesitancy
Built a predictive model and user segmentation. Won first place in the Humana-Mays Analytics Competition.
Team
Georgia Tech
Jia Shi, Siyan Cai, Sam Pang, Manqiu Liu
See details here
Method
Data Preprocessing
- Data type transformation
- Missing data imputation
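
A minimal sketch of these two preprocessing steps, assuming pandas. The column names, the toy rows, and the median/mode imputation rule are illustrative assumptions, not the team's actual choices.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Coerce numeric-looking text columns to numbers, then impute missing values."""
    out = df.copy()

    # Data type transformation: convert a text column only when every
    # non-missing value parses as a number.
    for col in out.columns:
        if not pd.api.types.is_numeric_dtype(out[col]):
            converted = pd.to_numeric(out[col], errors="coerce")
            if converted.notna().sum() == out[col].notna().sum():
                out[col] = converted

    # Missing data imputation (assumed rule): median for numeric columns,
    # most frequent value for everything else.
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])
    return out

# Hypothetical toy data: "age" arrives as text, both columns have gaps.
raw = pd.DataFrame({
    "age": ["67", "72", None, "80"],
    "sex": ["F", None, "F", "M"],
})
clean = preprocess(raw)
```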
Feature Encoding
- Binary labels
- Categorical to dummy
- Ordinal trend variable transformation
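
The three encodings might look like the sketch below, assuming pandas. The columns `sex`, `region`, and `cost_trend` and the integer codes chosen for the trend levels are hypothetical.

```python
import pandas as pd

# Hypothetical toy data with one column per encoding type.
df = pd.DataFrame({
    "sex": ["F", "M", "F"],
    "region": ["south", "west", "south"],
    "cost_trend": ["decreasing", "stable", "increasing"],
})

# Binary label: map a two-valued column to 0/1.
df["sex"] = df["sex"].map({"F": 0, "M": 1})

# Ordinal trend: keep the order decreasing < stable < increasing
# by mapping the levels to ordered integers (assumed codes).
trend_order = {"decreasing": -1, "stable": 0, "increasing": 1}
df["cost_trend"] = df["cost_trend"].map(trend_order)

# Nominal categorical: one dummy (indicator) column per level.
encoded = pd.get_dummies(df, columns=["region"], dtype=int)
```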
Feature Engineering
- External data: regional vaccination rate and social vulnerability index
- Binning for Age
- Combining variables: based on feature category and meaning
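
As an illustration of the three feature engineering steps, here is a small pandas sketch. The join key `county`, the age bin edges, and the claim columns being summed are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical member-level data.
members = pd.DataFrame({
    "member_id": [1, 2, 3],
    "age": [45, 67, 83],
    "county": ["A", "B", "A"],
    "rx_claims": [2, 0, 5],
    "medical_claims": [1, 3, 4],
})

# Hypothetical external data: one row per region.
external = pd.DataFrame({
    "county": ["A", "B"],
    "vaccination_rate": [0.62, 0.48],
    "svi": [0.30, 0.75],  # social vulnerability index
})

# External data: a left join keeps every member; validate guards
# against duplicate regions in the external table.
features = members.merge(external, on="county", how="left", validate="many_to_one")

# Binning for age: right-closed intervals (0, 50], (50, 65], (65, 75], (75, 120].
features["age_group"] = pd.cut(
    features["age"],
    bins=[0, 50, 65, 75, 120],
    labels=["<=50", "51-65", "66-75", "76+"],
)

# Combining variables: sum columns that belong to the same category.
features["total_claims"] = features["rx_claims"] + features["medical_claims"]
```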
Feature Selection
- XGBoost
- Gini Importance
- Random Forest with entropy
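
One way to combine importance-based rankings is sketched below with scikit-learn. It covers only the Gini and entropy random forests (the XGBoost ranking is left out to keep the example dependency-light), and the synthetic data, the averaging rule, and `top_k` are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3,
    n_redundant=0, shuffle=False, random_state=0,
)

def importances(criterion: str) -> np.ndarray:
    """Impurity-based feature importances from a random forest."""
    model = RandomForestClassifier(
        n_estimators=100, criterion=criterion, random_state=0
    )
    model.fit(X, y)
    return model.feature_importances_

# Average the Gini and entropy importances (assumed combination rule),
# then keep the indices of the top_k features, most important first.
avg_importance = (importances("gini") + importances("entropy")) / 2
top_k = 3
selected = np.argsort(avg_importance)[::-1][:top_k]
```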
Model Building
Two-stage process with train, validation, and test splits, comparing six models:
- Logistic Regression
- Random Forest
- Neural Networks
- GBDT
- LightGBM
- XGBoost
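
A rough sketch of the comparison step with scikit-learn follows. It reads the process as "fit on train, rank by validation AUC, hold the test set back", which is an assumption, and it uses synthetic data and only three of the six model families (scikit-learn's `GradientBoostingClassifier` stands in for GBDT).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 60/20/20 split (assumed proportions): carve out the test set first,
# then split the remainder into train and validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=0
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gbdt": GradientBoostingClassifier(random_state=0),
}

# Fit each candidate on train and score it on validation only;
# the test set is not touched during model comparison.
val_auc = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_auc[name] = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

best_name = max(val_auc, key=val_auc.get)
```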
Final Model
- Fine-tuned XGBoost Classifier with Randomized Search
- Test AUC: 0.6839
- 90% precision for the hesitant class
- Disparity Score: 0.99
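
The tuning and evaluation steps could be sketched as below. To stay runnable with scikit-learn alone, `GradientBoostingClassifier` is swapped in for the XGBoost classifier; the search ranges, synthetic data, and the choice of class 1 as the hesitant class are assumptions. The disparity score is not computed because its definition is not given here.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, roc_auc_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data; class 1 plays the role of "hesitant".
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed search space. scipy's uniform(loc, scale) samples from
# [loc, loc + scale]; randint(low, high) excludes high.
param_distributions = {
    "n_estimators": randint(50, 150),
    "max_depth": randint(2, 5),
    "learning_rate": uniform(0.01, 0.2),
    "subsample": uniform(0.6, 0.4),
}

# Randomized search samples n_iter parameter settings and scores each
# with cross-validated AUC, then refits the best one on all training data.
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions,
    n_iter=5,
    scoring="roc_auc",
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate the refit best model once on the held-out test set.
test_auc = roc_auc_score(y_test, search.predict_proba(X_test)[:, 1])
hesitant_precision = precision_score(y_test, search.predict(X_test), pos_label=1)
```

Precision here is computed at the default 0.5 decision threshold; a different threshold would trade precision against recall for the hesitant class.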