weixin_39958100
2020-12-03 02:32

Try minimax eval

(that's an idea for much later)

I've tried replacing Q from being the subtree average with a simple minimax and ran a few games. Although the minimax variant was mostly losing, it played quite decently, so it's an idea worth exploring.

The problem with "subtree Q" is that if there is a long tactical losing line, Lc0 doesn't drop the move immediately but it needs many visits for Q to slowly drop, and then many visits for N of other edge to exceed N of "bad node".

The problem with minimax is the opposite: when some tactic is found, the low score may propagate quite far up the tree, and that node is temporarily out of consideration.

A possible solution would be to have a mix: for theta between 0 and 1, Q = theta*minimax_Q + (1-theta)*subtree_avg_Q.
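A minimal sketch of what that blended backup could look like, reusing the fields and the sign convention from the hacky patch below; the `theta` parameter and this exact placement inside `FinalizeScoreUpdate` are assumptions, not actual lc0 code:

```cpp
#include <algorithm>

// Sketch only: blend the running subtree average with a minimax backup.
// Assumes q_, n_ and ChildNodes() behave as in the patch below; `theta`
// is a hypothetical tuning parameter (0 = pure averaging, 1 = pure minimax).
void Node::FinalizeScoreUpdate(float v, float theta) {
  // Subtree average, as in the original code.
  const float avg_q = q_ + (v - q_) / (n_ + 1);

  // Minimax over already-extended children, negating as in the patch.
  float minimax_q = 1.0f;
  bool has_children = false;
  for (Node* child : ChildNodes()) {
    minimax_q = std::min(minimax_q, -child->q_);
    has_children = true;
  }
  if (!has_children) minimax_q = v;  // Leaf: fall back to the raw eval.

  q_ = theta * minimax_q + (1.0f - theta) * avg_q;
  ++n_;
}
```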

Here is a hacky patch for minimax Q, if someone wants to play around with it.

```diff
diff --git a/src/mcts/node.cc b/src/mcts/node.cc
index adde34a..8ba0b73 100644
--- a/src/mcts/node.cc
+++ b/src/mcts/node.cc
@@ -178,10 +178,13 @@ void Node::CancelScoreUpdate() { --n_in_flight_; }
 
 void Node::FinalizeScoreUpdate(float v, float gamma, float beta) {
   // Recompute Q.
-  if (gamma == 1.0f && beta == 1.0f) {
-    q_ += (v - q_) / (n_ + 1);
+  if (n_ == 0) {
+    q_ = v;
   } else {
-    q_ += (v - q_) / (std::pow(static_cast<float>(n_), gamma) * beta + 1);
+    q_ = 1.0f;
+    for (Node* child : ChildNodes()) {
+      q_ = std::min(q_, -child->q_);
+    }
   }
   // Increment N.
   ++n_;
```

This question comes from the open-source project: LeelaChessZero/lc0


29 replies

  • weixin_39960145 5 months ago

    Cleaned up PR: https://github.com/LeelaChessZero/lc0/pull/243
  • weixin_39558521 5 months ago

    https://www.remi-coulom.fr/CG2006/CG2006.pdf might be of interest even if it's for Go. It analyzes the advantages/disadvantages of backing up the mean instead of the max value.

    Key paragraph for me: "These data clearly demonstrate what was suggested intuitively in the beginning of this section: the mean operator under-estimates the node value, whereas the max operator over-estimates it. Also, the mean operator is more accurate when the number of simulations is low, and the max operator is more accurate when the number of simulations is high."
  • weixin_39960145 5 months ago

    Thanks, I didn't know about that paper, and it's exactly what I was looking for.
  • weixin_39614146 5 months ago

    So should we be using something like max-stderr? This would behave like the mean at low node counts, but would converge to the max as the number of nodes increased.
  • weixin_39960145 5 months ago

    That would be the mean plus the standard deviation?
  • weixin_39614146 5 months ago

    No, mean+std would not approach the max as we get more visits. Subtracting the standard error would cause it to behave like the mean at low visits, but since the standard error shrinks as the visit count grows, it would approach the max at higher visit counts.
  • weixin_39960145 5 months ago

    Sorry, I don't follow: subtracting the standard error of X from Y? I don't know what X and Y are. We need a function that takes as input the (Q, N) values of each child and outputs a Q value that is more accurate than the plain weighted average.
  • weixin_39614146 5 months ago

    Q_current = max Q_child - standard error of Q_child
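A minimal sketch of one way to read that rule, interpreting it as "over all visited children, take the maximum of Q minus that child's standard error"; the `ChildStats` struct and the per-child variance tracking are assumptions made for illustration, not lc0 code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Per-child statistics, assumed to be tracked during search.
struct ChildStats {
  float q;    // Mean backed-up value, from the parent's point of view.
  float var;  // Sample variance of the backed-up values.
  int n;      // Visit count.
};

// Max-minus-standard-error backup: behaves like a pessimistic max that
// relaxes towards the true max as visit counts grow (stderr ~ 1/sqrt(n)).
float MaxMinusStdErr(const std::vector<ChildStats>& children, float leaf_eval) {
  float best = leaf_eval;  // Fallback when no child has been visited yet.
  bool found = false;
  for (const ChildStats& c : children) {
    if (c.n == 0) continue;
    const float std_err = std::sqrt(c.var / static_cast<float>(c.n));
    const float candidate = c.q - std_err;
    best = found ? std::max(best, candidate) : candidate;
    found = true;
  }
  return best;
}
```

At low visit counts the penalty is large, so the backed-up value stays cautious; as n grows the penalty vanishes and the backup approaches a pure max, which matches the behavior described in the comment above.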
  • weixin_39815879 5 months ago

    In my experimental build I use a similar (but not identical) scheme: I calculate the real variance for each node and add a half-normal noise term with that variance, scaled by a parameter, to Q in the PUCT formula. It yields a minor Elo gain.

    In general, I am not quite certain I understand the tests above. Am I correct in saying that at the default cpuct the scheme was a net Elo loss, and that with a cpuct of 3.0 or 4.0 it starts to gain Elo? Is that "higher" cpuct value for both opponents or just for the minimax version?

    In any case, we need a much larger number of games to get out of the noise for a change of this magnitude.
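A rough sketch of how such a noise term could be added on the Q side of PUCT selection; the function name, the `noise_scale` parameter, and the way the per-child variance is obtained are assumptions, not the actual experimental build:

```cpp
#include <algorithm>
#include <cmath>
#include <random>

// Sketch: add a half-normal exploration bonus to a child's Q before the
// usual PUCT U-term is applied. |N(0, var)| follows a half-normal
// distribution, so the bonus is always non-negative and grows with the
// child's value uncertainty.
float QWithHalfNormalNoise(float child_q, float child_var, float noise_scale,
                           std::mt19937& rng) {
  const float sigma = std::max(std::sqrt(child_var), 1e-6f);  // Avoid sigma == 0.
  std::normal_distribution<float> gauss(0.0f, sigma);
  const float half_normal = std::fabs(gauss(rng));
  return child_q + noise_scale * half_normal;
}
```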
  • weixin_39836860 5 months ago

    https://github.com/Zeta36/leela-fish

    I made a chess engine based on the Stockfish tree search, but using the LCZero value head as the evaluation function. So in this project we are just using the Stockfish code, replacing the human-developed evaluation function with the neural-network value head of the LeelaChessZero project.

    This is a kind of experiment in which we try to figure out whether the results are good even without the MCTS that LCZero uses.

    Please, can anybody with a good GPU help me test the project's performance?

    I did some tests on my local machine and the results were very promising: at a 1:1 ratio (when the number of nodes used by the original Stockfish or LCZero is forced to be equal to the number of nodes used by LeelaFish), our development is able to beat both SF and LCZero. We used the LCZero test net 10510 for these tests.

    One thing is clear: the value head of the network is as good as the original manually programmed evaluation function of SF.
  • weixin_39958100 5 months ago

    Obsolete.
  • weixin_39815879 5 months ago

    I tried this and similar schemes some months back and could not make them work; they always lost a substantial amount of Elo. But any new data point on this is welcome. One idea I have not followed through on yet is to use alpha-beta bounds not for backpropagating value, but just to prune branches by penalizing the policy for visiting out-of-bound branches... so many ideas, so little time...
  • weixin_39960145 5 months ago

    Question about the averaging idea: won't that cause the non-minimax component of the parent node's Q to be "contaminated" by its children's minimax-mixed values?
  • weixin_39960145 5 months ago

    Also something else to take into consideration: when there are no child moves left, it could be that we checkmated the opposing king, or that it is stalemated, so the score could be 0.0f in that case, right?
  • weixin_39958100 5 months ago

    The snippet above doesn't take the parent's own eval into account if there is at least one extended child:

    (ASCII tree diagram with example nodes at Q=0.5; the original formatting was lost)

    I tried to change it to only omit the parent's eval when all edges have been extended (which does sound more reasonable to me):

    (second ASCII tree diagram with example nodes at Q=0.5; the original formatting was lost)

    But it resulted in worse play.
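A minimal sketch of the variant described in the comment above, keeping the sign convention of the original patch; `num_edges_` (the total number of legal moves at the node) and the exact way the node's own eval is mixed in are assumptions:

```cpp
#include <algorithm>

// Sketch: pure minimax over children only once every edge has been extended;
// otherwise keep the node's own eval `v` in the minimax as a stand-in for
// the still-unexplored moves. Not actual lc0 code.
void Node::FinalizeScoreUpdate(float v) {
  float minimax_q = 1.0f;
  int extended = 0;
  for (Node* child : ChildNodes()) {
    minimax_q = std::min(minimax_q, -child->q_);
    ++extended;
  }
  if (extended == 0) {
    q_ = v;                       // Leaf: only the raw eval is available.
  } else if (extended < num_edges_) {
    q_ = std::min(minimax_q, v);  // Some moves unexplored: keep own eval.
  } else {
    q_ = minimax_q;               // All edges extended: pure child minimax.
  }
  ++n_;
}
```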
  • weixin_39960145 5 months ago

    IMHO there is no need to do that; AB engines have really low effective branching factors and they still use minimax value propagation.
  • weixin_39960145 5 months ago

    Another idea would be to derive the "minmax" component from some other value, like the depth, or the ratio of extended vs. non-extended children/nodes, etc.
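Two illustrative ways such a blending weight could be derived, as floated in the comment above; both formulas and constants are guesses for the sake of example, not anything that has been tested:

```cpp
#include <algorithm>

// Weight the minimax component by how much of the node has been explored:
// trust minimax more once most edges have been extended at least once.
float MinmaxWeightFromChildren(int extended_children, int total_edges) {
  if (total_edges == 0) return 0.0f;
  return static_cast<float>(extended_children) / static_cast<float>(total_edges);
}

// Alternatively, trust minimax more for nodes deeper in the tree.
float MinmaxWeightFromDepth(int depth, float scale = 0.05f) {
  return std::min(1.0f, scale * static_cast<float>(depth));
}
```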
  • weixin_39815879 5 months ago

    Please try this! But I have some reservations about whether such a scheme can work in the current framework. The problem is that any minimax component obtained by sampling (even if deterministic) will have an error, and if we further assume that extreme evals are subject to the largest errors, a minimax backup (even a weighted one) will, on average, propagate the largest errors to the root. At first glance I dismissed this quote by DM as an off-hand remark to justify their one-size-fits-all approach (A0):

    "This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree. In contrast, alpha-beta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree."

    After examining MCTS more deeply, I think they are spot-on. Two obvious questions arise though:

    (1) How can a/b work while propagating large approximation errors? The answer is simple: we look at many more nodes by exhaustively searching parts of the tree (remember that a/b, while more efficient, yields results identical to pure minimax), so the largest errors will be much smaller.

    (2) So how could this work in, for example, Scorpio? AFAIK (at least early versions of) Scorpio does an a/b search at the MCTS leaf nodes, and it seems likely that, as in (1), this reduces the error. Also, if Scorpio uses vanilla MCTS (not the A0 flavour), then it might not use FPU, which would also reduce the problem, as it is less selective.

    Maybe we should ask the author of Scorpio, as he seems to have done some work on this, to get more insight?! I also think that eventually an approach combining a/b and MCTS will prevail in "trappy" games like chess, so Scorpio is on the right track here.
  • weixin_39960145 5 months ago

    As per : final score of PR221, --minmax-search-component=0 vs 0.5 (default): +21 -7 =53. Colab disconnected at this point.
  • weixin_39944146 5 months ago

    This was with tc=10/10+.5. Faster test with tc=40/10+.1: +59 -46 =95, Elo difference: 22.62 +/- 34.96.
  • weixin_39944146 5 months ago

    Longer time control, tc=10/40+.5, --minmax-search-component=0 vs 0.5: +10 -2 =12 until Colab disconnection.
  • weixin_39960145 5 months ago

    Thanks for testing. Looks like it is weaker, right? (I don't have access to BayesElo right now.) It's an interesting experiment though. Let me know if there is any variation you guys want to try, and we can use PR221 and Google Colab to try to understand whether the idea has potential.
  • weixin_39960145 5 months ago

    Maybe found something:

    Score of ID10469 vs ID2: 13 - 33 - 54  [0.400] 100
    Elo difference: -70.44 +/- 46.21
    Finished match

    (In favour of minmax.) This is as per commit https://github.com/DanielUranga/lc0/commit/3b0d9c1ac2142b7f5b14cf5efcffd97df5e4f302 with cpuct=3.0 at a fixed 800 nodes. Going to try with a fixed 1600 nodes now.
  • weixin_39960145 5 months ago

    At 1600 nodes, cpuct=3.0 for minmax:

    Score of ID10469 vs ID2: 24 - 30 - 46  [0.470] 100
    Elo difference: -20.87 +/- 50.33

    (In favour of minmax.) Going to test 1600 nodes + test net ID 10480 for both, with cpuct=4.0 for minmax.
  • weixin_39960145 5 months ago

    1600 nodes for both, cpuct=4.0 for minmax:

    Score of ID10480 vs ID10480-minmax: 18 - 17 - 65  [0.505] 100
    Elo difference: 3.47 +/- 40.41
    Finished match
  • weixin_39960145 5 months ago

    800 nodes for both, cpuct=3.0 for minmax:

    Score of ID10480 vs ID10480-minmax: 9 - 41 - 50  [0.340] 100
    Elo difference: -115.23 +/- 48.17
    Finished match
  • weixin_39960145 5 months ago

    800 nodes for both, cpuct=3.0 for both:

    Score of ID10480 vs ID10480-minmax: 19 - 32 - 49  [0.435] 100
    Elo difference: -45.42 +/- 48.91
    Finished match
  • weixin_39960145 5 months ago

    From now on testing https://github.com/LeelaChessZero/lc0/pull/221/commits/06709c0019b63bd04707eaa67a4b7364947fd568. Both at a fixed 200 nodes, cpuct=3.0:

    Score of ID10480 vs ID10480-minmax: 21 - 39 - 40  [0.410] 100
    Elo difference: -63.23 +/- 53.40
    Finished match
  • weixin_39960145 5 months ago

    800 nodes, cpuct=3.0:

    Score of ID10480-minmax vs ID10480: 25 - 12 - 63  [0.565] 100
    Elo difference: 45.42 +/- 41.30
    Finished match
