Cleaned up PR: https://github.com/LeelaChessZero/lc0/pull/243
Try minimax eval
(that's an idea for much later)
I've tried replacing Q from being a subtree average with a simple minimax and ran a few games. Although the minimax variant mostly lost, it played quite decently, so it's an idea worth exploring.
The problem with "subtree Q" is that if there is a long tactical losing line, Lc0 doesn't drop the move immediately: it needs many visits for Q to slowly drop, and then many more visits for the N of another edge to exceed the N of the "bad node".
The problem with minimax is the opposite: when a tactic is found, the low score may propagate quite far up the tree, and that node is temporarily out of consideration.
A possible solution would be to have a mix: for theta between 0 and 1,
Q = theta*minimax_Q + (1 - theta)*subtree_avg_Q
Here is a hacky patch for minimax Q, if someone wants to play around:
```diff
diff --git a/src/mcts/node.cc b/src/mcts/node.cc
index adde34a..8ba0b73 100644
--- a/src/mcts/node.cc
+++ b/src/mcts/node.cc
@@ -178,10 +178,13 @@ void Node::CancelScoreUpdate() { --n_in_flight_; }
 void Node::FinalizeScoreUpdate(float v, float gamma, float beta) {
   // Recompute Q.
-  if (gamma == 1.0f && beta == 1.0f) {
-    q_ += (v - q_) / (n_ + 1);
+  if (n_ == 0) {
+    q_ = v;
   } else {
-    q_ += (v - q_) / (std::pow(static_cast<float>(n_), gamma) * beta + 1);
+    q_ = 1.0f;
+    for (Node* child : ChildNodes()) {
+      q_ = std::min(q_, child->q_);
+    }
   }
   // Increment N.
   ++n_;
```
(This question originates from the open source project LeelaChessZero/lc0.)
29 replies

https://www.remicoulom.fr/CG2006/CG2006.pdf this paper might be of interest even if it's for Go. It analyzes the advantages/disadvantages of backing up the mean instead of the max value.
Key paragraph for me: "These data clearly demonstrate what was suggested intuitively in the beginning of this section: the mean operator underestimates the node value, whereas the max operator overestimates it. Also, the mean operator is more accurate when the number of simulations is low, and the max operator is more accurate when the number of simulations is high."
Thanks, I didn't know about that paper, and it's exactly what I was looking for.
So should we be using something like max - stderr? This would behave like the mean at low node counts, but would converge to the max as the number of nodes increased.
That would be the mean plus the standard deviation?
No, mean + std would not approach the max as we got more visits. Subtracting the standard error would cause it to behave like the mean at low visits, but since the standard error shrinks as n grows, it would approach the max at higher visit counts.
Sorry, I don't follow: subtracting the standard error of X from Y? I don't know what X and Y are here. We need a function that takes as input the (Q, N) values of each child and outputs a Q value that is more accurate than the actual weighted average.
Q_current = max Q_child - standard error of Q_child
In my experimental build, I use a similar (but not identical) scheme: I calculate the real variance for each node and add a half-normal noise term with that variance, scaled by a parameter, to q in the PUCT formula. It yields a minor Elo gain.
In general, I am not quite certain I understand the tests above. Am I correct in saying that at default cpuct the scheme was a net Elo loss, and that with a cpuct of 3.0 or 4.0 it starts to gain Elo? Is that "higher" cpuct value for both opponents or just for the minimax version?
In general we need a much larger number of games to get out of the noise for a change of this magnitude.
https://github.com/Zeta36/leelafish
I made a chess engine based on the Stockfish tree search, but using the LCZero value head as the evaluation function. So in this project we are just using the Stockfish code but replacing the hand-crafted evaluation function with the neural network value head of the LeelaChessZero project.
This is a kind of experiment in which we try to figure out whether the results are good even without the MCTS that LCZero does.
Can anybody with a good GPU please help me test the project's performance?
I did some tests on my local machine, and the results are very promising: at a 1:1 ratio (when the number of nodes used by the original Stockfish or LCZero is forced to be equal to the number of nodes used by LeelaFish), our development is able to beat both SF and LCZero. We used the LCZero test net 10510 for these tests.
One thing is clear: the value head of the network is as good as the original hand-written evaluation function of SF.
Obsolete.
I tried this and similar schemes some months back and could not make them work; it always lost a substantial amount of Elo. But any new data point on this is welcome. One idea I did not follow through on yet is to use alpha-beta bounds not for backpropagating value but just to prune branches, by penalizing policy for visiting out-of-bound branches... so many ideas, so little time...
Question about the averaging idea: won't that cause the non-minimax component of Q of the parent node to be "contaminated" by its children's minimax-mixed values?
Also something else to take into consideration: when there are no more child moves, it could be that we checkmated the opposing king, or that we stalemated it, so the score could be 0.0f in that case, right?
The snippet above doesn't take the parent's eval into account if there is at least one extended child:
 Q=0.5
 Q=0.5
I tried to change it to only omit the parent's eval if all edges are there (which does sound more reasonable to me):
 Q=0.5
 Q=0.5
But it resulted in worse play.
IMHO there is no need to do that; alpha-beta engines have really low branching factors and they still use minimax value propagation.
Another idea would be to derive the "minimax" component from some other value, like depth, extended vs. non-extended children, node counts, etc.
Please try this! But I have some reservations about whether such a scheme can work in the current framework. The problem is that any minimax component obtained by sampling (even if deterministic) will have an error, and if we further assume that extreme evals are subject to the largest errors, a minimax backup (even a weighted one) will on average propagate the largest errors to the root. At first glance I dismissed this quote by DM as an offhand remark to justify their one-size-fits-all approach (A0):
"This provides a much more powerful representation, but may also introduce spurious approximation errors. MCTS averages over these approximation errors, which therefore tend to cancel out when evaluating a large subtree. In contrast, alpha-beta search computes an explicit minimax, which propagates the biggest approximation errors to the root of the subtree."
After examining MCTS more deeply, I think they are spot-on. Two obvious questions arise, though:
(1) How can a/b work while propagating large approximation errors? The answer is simple: we look at many more nodes by exhaustively searching parts of the tree (remember, a/b, while more efficient, yields identical results to pure minimax), so the largest errors will be much smaller.
(2) So how can this work in, for example, Scorpio? AFAIK at least early versions of Scorpio do an a/b search at the MCTS leaf nodes, and it seems likely that, as in (1), this reduces error. Also, if Scorpio uses vanilla MCTS (not the A0 flavour), then it might not use FPU, which would also reduce the problem, as it is less selective.
Maybe we should ask the author of Scorpio, who seems to have done some work on this, to get more insight. I also think that eventually an approach combining a/b and MCTS will prevail in "trappy" games like chess, so Scorpio is on the right track here.
As per :
Final score of PR 221, minmax-search-component = 0 vs 0.5 (default): +21 -7 =53. Colab disconnected at this point.
This was with tc=10/10+.5. Faster test with tc=40/10+.1: +59 -46 =95, Elo difference: 22.62 +/- 34.96.
Longer time control, tc=10/40+.5, minmax-search-component = 0 vs 0.5: +10 -2 =12 until the Colab disconnected.
Thanks for testing.
Looks like it is weaker, right? (I don't have access to bayeselo now.) It's an interesting experiment, though. Let me know if there is any variation you guys want to try, and we can use PR 221 and Google Colab to try to understand if the idea has potential.
Maybe found something:
Score of ID10469 vs ID2: 13 - 33 - 54 [0.400] 100. Elo difference: -70.44 +/- 46.21. Finished match.
(In favour of minmax.) This is as per commit https://github.com/DanielUranga/lc0/commit/3b0d9c1ac2142b7f5b14cf5efcffd97df5e4f302 with cpuct=3.0 at fixed 800 nodes. Going to try with fixed 1600 nodes now.
At 1600 nodes, cpuct=3.0 for minmax: Score of ID10469 vs ID2: 24 - 30 - 46 [0.470] 100. Elo difference: -20.87 +/- 50.33 (in favour of minmax). Going to test 1600 nodes + test net id 10480 for both, with cpuct=4.0 for minmax.
1600 nodes for both, cpuct=4.0 for minmax: Score of ID10480 vs ID10480-minmax: 18 - 17 - 65 [0.505] 100. Elo difference: 3.47 +/- 40.41. Finished match.
800 nodes for both, cpuct=3.0 for minmax: Score of ID10480 vs ID10480-minmax: 9 - 41 - 50 [0.340] 100. Elo difference: -115.23 +/- 48.17. Finished match.
800 nodes for both, cpuct=3.0 for both: Score of ID10480 vs ID10480-minmax: 19 - 32 - 49 [0.435] 100. Elo difference: -45.42 +/- 48.91. Finished match.
From now on testing https://github.com/LeelaChessZero/lc0/pull/221/commits/06709c0019b63bd04707eaa67a4b7364947fd568. Both with fixed 200 nodes, cpuct=3.0: Score of ID10480 vs ID10480-minmax: 21 - 39 - 40 [0.410] 100. Elo difference: -63.23 +/- 53.40. Finished match.
800 nodes, cpuct=3.0: Score of ID10480-minmax vs ID10480: 25 - 12 - 63 [0.565] 100. Elo difference: 45.42 +/- 41.30. Finished match.