<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
<record>
<leader>00000cab a2200000 4500</leader>
<controlfield tag="001">MAP20210010781</controlfield>
<controlfield tag="003">MAP</controlfield>
<controlfield tag="005">20210405201151.0</controlfield>
<controlfield tag="008">210331e20210301esp|||p |0|||b|spa d</controlfield>
<datafield tag="040" ind1=" " ind2=" ">
<subfield code="a">MAP</subfield>
<subfield code="b">spa</subfield>
<subfield code="d">MAP</subfield>
</datafield>
<datafield tag="084" ind1=" " ind2=" ">
<subfield code="a">6</subfield>
</datafield>
<datafield tag="100" ind1="1" ind2=" ">
<subfield code="0">MAPA20130016856</subfield>
<subfield code="a">Hartman, Brian M.</subfield>
</datafield>
<datafield tag="245" ind1="1" ind2="0">
<subfield code="a">Predicting High-Cost health insurance members through boosted trees and oversampling</subfield>
<subfield code="b">: an application using the HCCI database</subfield>
<subfield code="c">Brian Hartman, Rebecca Owen, Zoe Gibbs</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a">Using the Health Care Cost Institute data (approximately 47 million members over seven years), we examine how to best predict which members will be high-cost next year. We find that cost history, age, and prescription drug coverage all predict high costs, with cost history being by far the most predictive. We also compare the predictive accuracy of logistic regression to extreme gradient boosting (XGBoost) and find that the added flexibility of the extreme gradient boosting improves the predictive power. Finally, we show that with extremely unbalanced classes (because high-cost members are so rare), oversampling the minority class provides a better XGBoost predictive model than undersampling the majority class or using the training data as is. Logistic regression performance seems unaffected by the method of sampling.</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="4">
<subfield code="0">MAPA20080602437</subfield>
<subfield code="a">Matemática del seguro</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="4">
<subfield code="0">MAPA20130012056</subfield>
<subfield code="a">Gastos médicos</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="4">
<subfield code="0">MAPA20120011137</subfield>
<subfield code="a">Predicciones estadísticas</subfield>
</datafield>
<datafield tag="651" ind1=" " ind2="1">
<subfield code="0">MAPA20080638337</subfield>
<subfield code="a">Estados Unidos</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="0">MAPA20210005367</subfield>
<subfield code="a">Owen, Rebecca</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="0">MAPA20210005374</subfield>
<subfield code="a">Gibbs, Zoe</subfield>
</datafield>
<datafield tag="773" ind1="0" ind2=" ">
<subfield code="w">MAP20077000239</subfield>
<subfield code="t">North American actuarial journal</subfield>
<subfield code="d">Schaumburg : Society of Actuaries, 1997-</subfield>
<subfield code="x">1092-0277</subfield>
<subfield code="g">01/03/2021 Tomo 25 Número 1 - 2021 , p. 53-61</subfield>
</datafield>
</record>
</collection>
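
The 520 abstract above outlines the paper's method: train both logistic regression and XGBoost to flag rare high-cost members, and compare training on the data as is against oversampling the minority class. The sketch below is a minimal, hypothetical illustration of that workflow, not the authors' code: the HCCI database is proprietary, so the data here is synthetic, and the feature names (prior-year cost, age, prescription drug coverage) are stand-ins for the predictors the abstract mentions.

# Minimal sketch of the workflow described in the 520 abstract: compare
# logistic regression with XGBoost on a heavily imbalanced binary target,
# fitting once on the training data as is and once with the minority class
# oversampled to a 1:1 ratio. All data is synthetic (the HCCI database is
# proprietary); coefficients and features are illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
n = 100_000

# Synthetic members: prior-year cost, age, prescription drug coverage flag.
prior_cost = rng.lognormal(mean=7.0, sigma=1.5, size=n)
age = rng.integers(0, 65, size=n)
rx_coverage = rng.integers(0, 2, size=n)
X = np.column_stack([prior_cost, age, rx_coverage])

# Rare "high-cost next year" label, driven mostly by cost history,
# mirroring the abstract's finding that cost history dominates.
logit = -7.0 + 0.6 * np.log(prior_cost) + 0.01 * age + 0.3 * rx_coverage
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Oversample the minority (high-cost) class in the training set only,
# so the test set keeps its natural class balance.
minority = X_tr[y_tr == 1]
majority = X_tr[y_tr == 0]
minority_up = resample(minority, replace=True, n_samples=len(majority),
                       random_state=0)
X_over = np.vstack([majority, minority_up])
y_over = np.concatenate([np.zeros(len(majority), dtype=int),
                         np.ones(len(majority), dtype=int)])

for name, X_fit, y_fit in [("as is", X_tr, y_tr),
                           ("oversampled", X_over, y_over)]:
    lr = LogisticRegression(max_iter=1000).fit(X_fit, y_fit)
    xgb = XGBClassifier(n_estimators=200, max_depth=4,
                        eval_metric="logloss").fit(X_fit, y_fit)
    lr_auc = roc_auc_score(y_te, lr.predict_proba(X_te)[:, 1])
    xgb_auc = roc_auc_score(y_te, xgb.predict_proba(X_te)[:, 1])
    print(f"{name:12s} logistic AUC={lr_auc:.3f}  XGBoost AUC={xgb_auc:.3f}")

On the paper's actual data, the abstract reports that oversampling improves the XGBoost model while leaving logistic regression roughly unchanged; this synthetic toy data will not necessarily reproduce that gap.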