Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte-Carlo rollouts.
In each position s_t, a Monte-Carlo tree search (MCTS) α_θ is executed (see Figure 2) using the latest neural network f_θ. Moves are selected according to the search probabilities computed by the MCTS, a_t ~ π_t.
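To make the move-selection step concrete, here is a minimal sketch of the two pieces described above: a PUCT-style child selection (Q + U) and the search probabilities π derived from visit counts, from which a_t ~ π_t is sampled. The network stub `f_theta`, the `Node` fields, and all parameter values are illustrative assumptions, not the paper's implementation.

```python
import math

def f_theta(state, legal_moves):
    # Hypothetical stand-in for the single network f_theta: returns a prior
    # policy over moves and a value estimate (uniform priors, value 0 here).
    p = 1.0 / len(legal_moves)
    return {m: p for m in legal_moves}, 0.0

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a) from the network
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a)
        self.children = {}      # move -> Node

    def q(self):
        # Mean action value Q(s, a) = W / N.
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node, c_puct=1.0):
    # Pick the child maximizing Q(s, a) + U(s, a), where U grows with the
    # prior and shrinks with the child's visit count.
    total = sum(ch.visits for ch in node.children.values())
    def score(item):
        _, ch = item
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return ch.q() + u
    return max(node.children.items(), key=score)

def search_probabilities(root, temperature=1.0):
    # pi(a) proportional to N(s, a)^(1/T); the move a_t is then sampled
    # from this distribution.
    counts = {m: ch.visits ** (1.0 / temperature)
              for m, ch in root.children.items()}
    z = sum(counts.values())
    return {m: c / z for m, c in counts.items()}
```

With a temperature of 1, π simply normalizes the visit counts, so the most-visited move is the most likely to be played.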
Figure 2: Monte-Carlo tree search in AlphaGo Zero.
Monte-Carlo tree search (MCTS) may also be viewed as a form of self-play reinforcement learning.
MCTS programs have previously achieved strong amateur level in Go, but used substantial domain expertise: a fast rollout policy, based on handcrafted features, that evaluates positions by running simulations until the end of the game; and a tree policy, also based on handcrafted features, that selects moves within the search tree.
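The fast rollout policy described above can be sketched in a few lines: play cheap policy moves (uniform random here, in place of the handcrafted-feature policy) until the game ends, and score the terminal position. The game-interface functions `legal_moves`, `apply_move`, `is_terminal`, and `outcome` are hypothetical placeholders, not from any particular program.

```python
import random

def rollout(state, legal_moves, apply_move, is_terminal, outcome, rng=random):
    # Evaluate `state` by simulating to the end of the game with a fast
    # policy, then returning the game result from the root player's view.
    while not is_terminal(state):
        move = rng.choice(legal_moves(state))
        state = apply_move(state, move)
    return outcome(state)  # e.g. +1 for a win, -1 for a loss
```

AlphaGo Zero removes exactly this component: instead of noisy Monte-Carlo rollouts, leaf positions are evaluated directly by the value head of the single network.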
Let me pull out the passages from the paper that relate to rollouts.