<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.loc.gov/MARC21/slim http://www.loc.gov/standards/marcxml/schema/MARC21slim.xsd">
<record>
<leader>00000cab a2200000 4500</leader>
<controlfield tag="001">MAP20260001968</controlfield>
<controlfield tag="003">MAP</controlfield>
<controlfield tag="005">20260205101750.0</controlfield>
<controlfield tag="008">260202e20260115bel|||p |0|||b|eng d</controlfield>
<datafield tag="040" ind1=" " ind2=" ">
<subfield code="a">MAP</subfield>
<subfield code="b">spa</subfield>
<subfield code="d">MAP</subfield>
</datafield>
<datafield tag="084" ind1=" " ind2=" ">
<subfield code="a">6</subfield>
</datafield>
<datafield tag="245" ind1="1" ind2="2">
<subfield code="a">A Nonzero-sum game with reinforcement learning under mean-variance framework</subfield>
<subfield code="c">Junyi Guo...[et al.]</subfield>
</datafield>
<datafield tag="520" ind1=" " ind2=" ">
<subfield code="a">This paper examines a competitive setting in which two agents invest in a risk-free and a risky asset while considering both their own wealth and their wealth gap relative to each other. With market parameters partially or fully unknown, the problem is formulated as a nonzero-sum differential game within a reinforcement learning framework. Each agent seeks to optimize a Choquet-regularized, time-inconsistent mean-variance objective. Using dynamic programming, the authors derive a time-consistent Nash equilibrium in an incomplete market. Under a Gaussian mean-return assumption, they obtain an explicit analytical solution that enables the construction of a practical reinforcement learning algorithm. The algorithm shows uniform convergence, despite the absence of a traditional policy-improvement guarantee, and numerical experiments confirm its robustness and effectiveness.</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="4">
<subfield code="0">MAPA20080597641</subfield>
<subfield code="a">Mercados financieros</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="4">
<subfield code="0">MAPA20250003316</subfield>
<subfield code="a">Gestión de riesgos</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="4">
<subfield code="0">MAPA20080586447</subfield>
<subfield code="a">Modelo estocástico</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="4">
<subfield code="0">MAPA20080576790</subfield>
<subfield code="a">Modelo Gaussiano</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="4">
<subfield code="0">MAPA20080579258</subfield>
<subfield code="a">Cálculo actuarial</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="4">
<subfield code="0">MAPA20080602437</subfield>
<subfield code="a">Matemática del seguro</subfield>
</datafield>
<datafield tag="650" ind1=" " ind2="4">
<subfield code="0">MAPA20080592042</subfield>
<subfield code="a">Modelos matemáticos</subfield>
</datafield>
<datafield tag="700" ind1="1" ind2=" ">
<subfield code="0">MAPA20080649876</subfield>
<subfield code="a">Guo, Junyi</subfield>
</datafield>
<datafield tag="710" ind1="2" ind2=" ">
<subfield code="0">MAPA20100017661</subfield>
<subfield code="a">International Actuarial Association</subfield>
</datafield>
<datafield tag="773" ind1="0" ind2=" ">
<subfield code="w">MAP20077000420</subfield>
<subfield code="g">19/01/2026 Volume 56 Issue 1 - January 2026, p. 154 - 180</subfield>
<subfield code="x">0515-0361</subfield>
<subfield code="t">Astin bulletin</subfield>
<subfield code="d">Belgium : ASTIN and AFIR Sections of the International Actuarial Association</subfield>
</datafield>
</record>
</collection>