Finally, it uses a simpler tree search that relies upon this single neural network to evaluate positions and sample moves, without performing any Monte-Carlo rollouts.
In each position s_t, a Monte-Carlo tree search (MCTS) α_θ is executed (see Figure 2) using the latest neural network f_θ. Moves are selected according to the search probabilities computed by the MCTS, a_t ~ π_t.
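To make the move-selection step concrete, here is a minimal sketch of the two pieces described above: a PUCT-style child selection (Q + U) and the search probabilities π derived from visit counts, from which a_t ~ π_t is sampled. The network stub `f_theta`, the `Node` fields, and all parameter values are illustrative assumptions, not the paper's implementation.

```python
import math

def f_theta(state, legal_moves):
    # Hypothetical stand-in for the single network f_theta: returns a prior
    # policy over moves and a value estimate (uniform priors, value 0 here).
    p = 1.0 / len(legal_moves)
    return {m: p for m in legal_moves}, 0.0

class Node:
    def __init__(self, prior):
        self.prior = prior      # P(s, a) from the network
        self.visits = 0         # N(s, a)
        self.value_sum = 0.0    # W(s, a)
        self.children = {}      # move -> Node

    def q(self):
        # Mean action value Q(s, a) = W / N.
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(node, c_puct=1.0):
    # Pick the child maximizing Q(s, a) + U(s, a), where U grows with the
    # prior and shrinks with the child's visit count.
    total = sum(ch.visits for ch in node.children.values())
    def score(item):
        _, ch = item
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return ch.q() + u
    return max(node.children.items(), key=score)

def search_probabilities(root, temperature=1.0):
    # pi(a) proportional to N(s, a)^(1/T); the move a_t is then sampled
    # from this distribution.
    counts = {m: ch.visits ** (1.0 / temperature)
              for m, ch in root.children.items()}
    z = sum(counts.values())
    return {m: c / z for m, c in counts.items()}
```

With a temperature of 1, π simply normalizes the visit counts, so the most-visited move is the most likely to be played.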
Figure 2: Monte-Carlo tree search in AlphaGo Zero.
Monte-Carlo tree search (MCTS) may also be viewed as a form of self-play reinforcement learning.
MCTS programs have previously achieved strong amateur level in Go, but used substantial domain expertise: a fast rollout policy, based on handcrafted features, that evaluates positions by running simulations until the end of the game; and a tree policy, also based on handcrafted features, that selects moves within the search tree.
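The fast rollout policy described above can be sketched in a few lines: play cheap policy moves (uniform random here, in place of the handcrafted-feature policy) until the game ends, and score the terminal position. The game-interface functions `legal_moves`, `apply_move`, `is_terminal`, and `outcome` are hypothetical placeholders, not from any particular program.

```python
import random

def rollout(state, legal_moves, apply_move, is_terminal, outcome, rng=random):
    # Evaluate `state` by simulating to the end of the game with a fast
    # policy, then returning the game result from the root player's view.
    while not is_terminal(state):
        move = rng.choice(legal_moves(state))
        state = apply_move(state, move)
    return outcome(state)  # e.g. +1 for a win, -1 for a loss
```

AlphaGo Zero removes exactly this component: instead of noisy Monte-Carlo rollouts, leaf positions are evaluated directly by the value head of the single network.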
Let me pull out the passages from the paper that relate to rollouts.