Predicting High-Cost health insurance members through boosted trees and oversampling : an application using the HCCI database

Electronic resource
MARC record
Tag 1 2 Value
LDR  00000cab a2200000 4500
001  MAP20210010781
003  MAP
005  20210405201151.0
008  210331e20210301esp|||p |0|||b|spa d
040  $aMAP$bspa$dMAP
084  $a6
1001 $0MAPA20130016856$aHartman, Brian M.
24510$aPredicting High-Cost health insurance members through boosted trees and oversampling$b: an application using the HCCI database$cBrian Hartman, Rebecca Owen, Zoe Gibbs
520  $aUsing the Health Care Cost Institute data (approximately 47 million members over seven years), we examine how to best predict which members will be high-cost next year. We find that cost history, age, and prescription drug coverage all predict high costs, with cost history being by far the most predictive. We also compare the predictive accuracy of logistic regression to extreme gradient boosting (XGBoost) and find that the added flexibility of the extreme gradient boosting improves the predictive power. Finally, we show that with extremely unbalanced classes (because high-cost members are so rare), oversampling the minority class provides a better XGBoost predictive model than undersampling the majority class or using the training data as is. Logistic regression performance seems unaffected by the method of sampling.
650 4$0MAPA20080602437$aMatemática del seguro
650 4$0MAPA20130012056$aGastos médicos
650 4$0MAPA20120011137$aPredicciones estadísticas
651 1$0MAPA20080638337$aEstados Unidos
7001 $0MAPA20210005367$aOwen, Rebecca
7001 $0MAPA20210005374$aGibbs, Zoe
7730 $wMAP20077000239$tNorth American actuarial journal$dSchaumburg : Society of Actuaries, 1997-$x1092-0277$g01/03/2021 Tomo 25 Número 1 - 2021 , p. 53-61