[ Pobierz całość w formacie PDF ]
even though maximal exploitation is unlikely, fast op-
by the learner during the learning phase. The strate-
ponent modelling may still yield significant benefits.
gies with larger ³ values are clearly stronger, more ef-
fectively exploring the opponent s strategy during the
Acknowledgements
learning phase. This advantage is typical of Nash strate-
Thanks to the Natural Sciences and Engineering Re-
gies with ³ > 0.7 across all opponents we tried.
search Council of Canada and the Alberta Ingenuity
Centre for Machine Learning for project funding, and
Learning Method Comparison
the University of Alberta poker group for their insights.
Figure 7 directly compares strategy and parameter
learning (both balanced and Nash exploration (³ = 1)),
References
all against a single opponent. Balanced parameter learn-
Auer, P.; Cesa-Bianchi, N.; Freund, Y.; and Schapire,
ing outperforms strategy learning substantially for this
R. E. 1995. Gambling in a rigged casino: the adversar-
opponent. Over all opponents, either the balanced or the
ial multi-armed bandit problem. In Proc. of the 36th
Nash parameter learner is the best, and strategy learning
Annual Symp. on Foundations of Comp. Sci., 322 331.
is worst in all but one case.
Billings, D.; Davidson, A.; Schauenberg, T.; Burch,
N.; Bowling, M.; Holte, R.; Schaeffer, J.; and Szafron,
Conclusions
D. Game Tree Search with Adaptation in Stochas-
This work shows that learning to maximally exploit an
tic Imperfect Information Games. In Computers and
opponent, even a stationary one in a game as small as
Games 04.
Kuhn poker, is not generally feasible in a small num-
Billings, D.; Burch, N.; Davidson, A.; Holte, R.; Scha-
ber of hands. However, the learning methods explored
effer, J.; Schauenberg, T.; and Szafron, D. 2003. Ap-
are capable of showing positive results in as few as
proximating game-theoretic optimal strategies for full-
50 hands, so that learning to exploit is typically bet-
scale poker. In 18th Intl. Joint Conf. on Artificial In-
ter than adopting a pessimistic Nash equilibrium strat-
telligence (IJCAI 2003).
egy. Furthermore, this 50 hand switching point is ro-
Koller, D., and Pfeffer, A. 1997. Representations and
bust to game length and opponent. Future work includes
non-stationary opponents, a wider exploration of learn- solutions for game-theoretic problems. Artificial Intel-
ligence 94(1):167 215.
ing strategies, and larger games. Both approaches can
scale up, provided the number of parameters or experts
Korb, K., and Nicholson, A. 1999. Bayesian poker. In
is kept small (abstraction can reduce parameters and
Uncertainty in Artificial Intelligence, 343 350.
small sets of experts can be carefully selected). Also,
Kuhn, H. W. 1950. A simplified two-person poker.
the exploration differences amongst equal valued strate-
Contributions to the Theory of Games 1:97 103.
gies (e.g. Nash) deserves more attention. It may be pos-
AAAI-05 / 788
Expected Total Winnings
Expected Total Winnings
[ Pobierz całość w formacie PDF ]