2020-12-03 02:32

# Try minimax eval

(that's an idea for much later)

I tried changing `Q` from the subtree average to simple minimax and ran a few games. Although the minimax variant mostly lost, it played quite decently, so it's an idea worth exploring.

The problem with "subtree Q" is that if there is a long tactical losing line, Lc0 doesn't drop the move immediately; it needs many visits for Q to slowly drop, and then many more visits for the N of another edge to exceed the N of the "bad node".

The problem with minimax is the opposite: when some tactic is found, the low score may propagate quite far up the tree, and that node is temporarily out of consideration.

A possible solution would be to have a mix: for `theta` between 0 and 1, `Q = theta*minimax_Q + (1-theta)*subtree_avg_Q`.

A hacky patch for minimax `Q`, if someone wants to play around with it:

``````diff
diff --git a/src/mcts/node.cc b/src/mcts/node.cc
--- a/src/mcts/node.cc
+++ b/src/mcts/node.cc
@@ -178,10 +178,13 @@ void Node::CancelScoreUpdate() { --n_in_flight_; }
 
 void Node::FinalizeScoreUpdate(float v, float gamma, float beta) {
   // Recompute Q.
-  if (gamma == 1.0f && beta == 1.0f) {
-    q_ += (v - q_) / (n_ + 1);
+  if (n_ == 0) {
+    q_ = v;
   } else {
-    q_ += (v - q_) / (std::pow(static_cast<float>(n_), gamma) * beta + 1);
+    q_ = 1.0f;
+    for (Node* child : ChildNodes()) {
+      q_ = std::min(q_, -child->q_);
+    }
   }
   // Increment N.
   ++n_;
``````


#### 29 replies

• Cleaned up PR: https://github.com/LeelaChessZero/lc0/pull/243

• https://www.remi-coulom.fr/CG2006/CG2006.pdf — this paper might be of interest even though it's for Go. It analyzes the advantages/disadvantages of backing up the mean instead of the max value.

Key paragraph for me: "These data clearly demonstrate what was suggested intuitively in the beginning of this section: the mean operator under-estimates the node value, whereas the max operator over-estimates it. Also, the mean operator is more accurate when the number of simulations is low, and the max operator is more accurate when the number of simulations is high."

• Thanks, I didn't know about that paper; it's exactly what I was looking for.

• So should we be using something like max minus stderr? This would behave like the mean at low node counts, but would converge to the max as the number of nodes increased.

• that would be the mean plus the standard deviation?

• No, mean+std would not approach the max as we got more visits. Subtracting the standard error would make it behave like the mean at low visits, but since the standard error scales as 1/sqrt(n), it would approach the max at higher visit counts.

• Sorry, I don't follow: subtracting the standard error of `X` from `Y`? I don't know what `X` and `Y` are. We need a function that takes as input the (Q, N) values of each child and outputs a Q value that is more accurate than the plain weighted average.

点赞 评论 复制链接分享
• Q_current = max Q_child - standard error of Q_child
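
A sketch of that backup rule, under the assumption that each node tracks a sample variance of its backed-up values (all names here are illustrative, not Lc0 code). The penalty `sd/sqrt(n)` is large at low visit counts (mean-like caution) and vanishes as `n` grows, so the result converges to the plain max:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Per-child statistics assumed available at backup time.
struct ChildStats {
  float q;    // child's Q from the parent's perspective
  float var;  // sample variance of values backed up through this child
  int n;      // visit count
};

// "Max minus standard error" backup: take the best child after penalizing
// each child's Q by its standard error sd / sqrt(n).
float BackupQ(const std::vector<ChildStats>& children) {
  float best = -1.0f;  // worst possible Q
  for (const ChildStats& c : children) {
    float stderr_q = std::sqrt(c.var / static_cast<float>(c.n));
    best = std::max(best, c.q - stderr_q);
  }
  return best;
}
```

For example, a child with `q = 0.5` and `var = 0.04` is backed up as `0.4` after 4 visits but as `0.49` after 400.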

• In my experimental build, I use a similar (but not identical) scheme: I calculate the real variance for each node and add a half-normal noise term with that variance, scaled by a parameter, to q in the PUCT formula. It yields a minor Elo gain.

In general, I am not quite certain I understand the tests above. Am I correct in saying that at default cpuct the scheme was a net Elo loss, and that with a cpuct of 3.0 or 4.0 it starts to gain Elo? Is that "higher" cpuct value for both opponents or just for the minimax version?

In general, we need a much larger number of games to get out of the noise for a change of this magnitude.
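
For reference, a minimal sketch of the noise scheme described above (the function and parameter names are assumptions; this is not the experimental build's actual code):

```cpp
#include <cmath>
#include <random>

// Add a half-normal exploration bonus to Q before it enters the PUCT formula:
// scale * |N(0, sd)|, where sd comes from the node's observed variance.
// The bonus is always non-negative, so noisier nodes get an optimistic nudge.
float QWithHalfNormalNoise(float q, float variance, float scale,
                           std::mt19937& rng) {
  std::normal_distribution<float> dist(0.0f, std::sqrt(variance));
  return q + scale * std::fabs(dist(rng));  // |normal sample| is half-normal
}
```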

• https://github.com/Zeta36/leela-fish

I made a chess engine based on the Stockfish tree search, but using the LCZero value head as the evaluation function. So in this project we are just using the Stockfish code, with the human-developed evaluation function replaced by the neural network value head from the LeelaChess Zero project.

This is a kind of experiment to figure out whether the results are good even without the MCTS that LCZero uses.

Could anybody with a good GPU please help me test the project's performance?

I did some tests on my local machine, and the results are very promising: at a 1:1 ratio (when the number of nodes used by the original Stockfish or LCZero is forced to equal the number of nodes used by LeelaFish), our development is able to beat both SF and LCZero. We used the LCZero test net 10510 for these tests.

One thing is clear: the value head of the network is as good as the original manually programmed evaluation function of SF.

• Obsolete.

• I tried this and similar schemes some months back and could not make them work; they always lost a substantial amount of Elo. But any new data point on this is welcome. One idea I have not followed through on yet is to use alpha-beta bounds not for backpropagating value, but just to prune branches by penalizing policy for visiting out-of-bound branches... so many ideas, so little time...

• Question about the averaging idea: won't that cause the non-minimax component of the parent node's `Q` to be "contaminated" with its children's minimax-mixed values?

• Also, something else to take into consideration: when there are no more child moves, it could be that we checkmated the opposing king, or that the position is a stalemate, so the score could be `0.0f` in that case, right?
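
That's the standard distinction: with no legal moves, the side to move is either checkmated (a loss from its perspective) or stalemated (a draw). A one-line sketch of the terminal score, with hypothetical names:

```cpp
// Terminal score from the side-to-move's perspective: checkmate is a loss
// (Q = -1), stalemate is a draw (Q = 0). `in_check` is a hypothetical flag
// meaning the side to move has no legal moves while in check.
float TerminalQ(bool in_check) {
  return in_check ? -1.0f : 0.0f;
}
```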

• The snippet above doesn't take into account the parent's eval if there is at least one extended child:

*(tree diagram garbled in extraction; the nodes were labelled Q=0.5)*

I tried to change it to only omit the parent's eval when all edges are extended (which does sound more reasonable to me):

*(tree diagram garbled in extraction)*

But it resulted in worse play.

• IMHO there is no need to do that; AB engines have really low branching factors and they still use minimax value propagation.

• Another idea would be to derive the "minmax" component from some other quantity, like depth, or extended vs. non-extended children/nodes, etc.

• Please try this! But I have some reservations about whether such a scheme can work in the current framework. The problem is that any minimax component obtained by sampling (even if deterministic) will have an error, and if we further assume that extreme evals are subject to the largest errors, a minimax backup (even if weighted) will on average propagate the largest errors to the root. At first glance I dismissed this quote by DM as an off-hand remark to justify their one-size-fits-all approach (A0):

> This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree. In contrast, alpha-beta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree.

After examining MCTS more deeply, I think they are spot on. Two obvious questions arise, though:

(1) How can a/b work while propagating large approximation errors? The answer is simple: we look at many more nodes by exhaustively searching parts of the tree (remember that a/b, while more efficient, yields results identical to pure minimax). So the largest errors will be much smaller.

(2) So how could this work in, for example, Scorpio? AFAIK (at least early versions of) Scorpio does an a/b search at the MCTS leaf nodes, and it seems likely that, as in (1), this reduces error. Also, if Scorpio uses vanilla MCTS (not the A0 flavour), then it might not use FPU, which would also reduce the problem, as it is less selective.

Maybe we should ask the author of Scorpio, who seems to have done some work on this, to get more insight? I also think that an approach combining a/b and MCTS will eventually prevail in "trappy" games like chess, so Scorpio is on the right track here.

• As per: `final score of pr221 --minmax-search-component= 0 vs 0.5 (default): +21 -7 =53. Colab disconnected at this point.`

• This was with tc=10/10+.5. Faster test with tc=40/10+.1: +59 -46 =95 Elo difference: 22.62 +/- 34.96

• Longer time control tc=10/40+.5 --minmax-search-component= 0 vs 0.5: +10 -2 =12 until colab disconnection

• Thanks for testing. Looks like it is weaker, right? (I don't have access to `bayeselo` right now.) It's an interesting experiment though. Let me know if there is any variation you guys want to try, and we can use PR221 and Google Colab to try to figure out whether the idea has potential.

• Maybe found something:

``````
Score of ID10469 vs ID2: 13 - 33 - 54  [0.400] 100
Elo difference: -70.44 +/- 46.21
Finished match
``````

(In favour of minmax). This is as per commit https://github.com/DanielUranga/lc0/commit/3b0d9c1ac2142b7f5b14cf5efcffd97df5e4f302 with `cpuct=3.0` at fixed 800 nodes. Going to try with fixed 1600 nodes now.

• At 1600 nodes, `cpuct=3.0` for minmax:

``````
Score of ID10469 vs ID2: 24 - 30 - 46  [0.470] 100
Elo difference: -20.87 +/- 50.33
``````

(in favour of minmax). Going to test 1600 nodes + test net id 10480 for both, with `cpuct=4.0` for minmax.

• 1600 nodes both, `cpuct=4.0` for minmax.

``````
Score of ID10480 vs ID10480-minmax: 18 - 17 - 65  [0.505] 100
Elo difference: 3.47 +/- 40.41
Finished match
``````
• 800 nodes both, `cpuct=3.0` for minmax

``````
Score of ID10480 vs ID10480-minmax: 9 - 41 - 50  [0.340] 100
Elo difference: -115.23 +/- 48.17
Finished match
``````
• 800 nodes for both, `cpuct=3.0` for both

``````
Score of ID10480 vs ID10480-minmax: 19 - 32 - 49  [0.435] 100
Elo difference: -45.42 +/- 48.91
Finished match
``````
• From now on testing https://github.com/LeelaChessZero/lc0/pull/221/commits/06709c0019b63bd04707eaa67a4b7364947fd568 Both with fixed 200 nodes, `cpuct=3.0`:

``````
Score of ID10480 vs ID10480-minmax: 21 - 39 - 40  [0.410] 100
Elo difference: -63.23 +/- 53.40
Finished match
``````
• 800 nodes `cpuct=3.0`

``````
Score of ID10480-minmax vs ID10480: 25 - 12 - 63  [0.565] 100
Elo difference: 45.42 +/- 41.30
Finished match
``````