Using machine learning to better model long-term care insurance claims

<?xml version="1.0" encoding="UTF-8"?><collection xmlns="" xmlns:xsi="" xsi:schemaLocation="">
    <leader>00000cab a2200000   4500</leader>
    <controlfield tag="001">MAP20220023795</controlfield>
    <controlfield tag="003">MAP</controlfield>
    <controlfield tag="005">20220916085518.0</controlfield>
    <controlfield tag="008">220916e20220912esp|||p      |0|||b|spa d</controlfield>
    <datafield tag="040" ind1=" " ind2=" ">
      <subfield code="a">MAP</subfield>
      <subfield code="b">spa</subfield>
      <subfield code="d">MAP</subfield>
    </datafield>
    <datafield tag="084" ind1=" " ind2=" ">
      <subfield code="a">6</subfield>
    </datafield>
    <datafield tag="100" ind1="1" ind2=" ">
      <subfield code="0">MAPA20220008242</subfield>
      <subfield code="a">Cummings, Jared</subfield>
    </datafield>
    <datafield tag="245" ind1="1" ind2="0">
      <subfield code="a">Using machine learning to better model long-term care insurance claims</subfield>
      <subfield code="c">Jared Cummings</subfield>
    </datafield>
    <datafield tag="520" ind1=" " ind2=" ">
      <subfield code="a">Long-term care insurance (LTCI) should be an essential part of a family's financial plan. It can protect assets from the expensive and relatively common risk of needing disability assistance, yet LTCI purchase rates are lower than expected. Though there are multiple reasons for this trend, it is partially due to the difficulty insurers have in operating profitably as LTCI providers. If LTCI providers were better able to forecast claim rates, they would have less difficulty maintaining profitability. In this article, we develop several models to improve upon those used by insurers to forecast claim rates. We find that standard logistic regression is outperformed by tree-based and neural network models, while a neighbor-based model yields more modest improvements. Of all the models tested, the random forest models were consistently the top performers. Additionally, simple sampling techniques influence the performance of each model. This is especially true for the deep neural network, which improves drastically under oversampling. The effects of sampling vary with the size of the available data. To better understand this relationship, we thoroughly examine, as case studies, three states with varying amounts of available data.</subfield>
    </datafield>

    <datafield tag="650" ind1=" " ind2="4">
      <subfield code="0">MAPA20170005476</subfield>
      <subfield code="a">Machine learning</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="4">
      <subfield code="0">MAPA20080567118</subfield>
      <subfield code="a">Claims</subfield>
    </datafield>
    <datafield tag="650" ind1=" " ind2="4">
      <subfield code="0">MAPA20080573867</subfield>
      <subfield code="a">Health insurance</subfield>
    </datafield>
    <datafield tag="773" ind1="0" ind2=" ">
      <subfield code="w">MAP20077000239</subfield>
      <subfield code="g">12/09/2022 Volume 26 Number 3 - 2022, p. 470-483</subfield>
      <subfield code="x">1092-0277</subfield>
      <subfield code="t">North American actuarial journal</subfield>
      <subfield code="d">Schaumburg : Society of Actuaries, 1997-</subfield>
    </datafield>