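Regarding the subject of this post: in the attached image, which I generated with export_graphviz, why does the samples count in a node not match the sum of the numbers in its value list?
When I searched English-language sites, I found the following answer: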
I guess number of samples and values do not match because of bootstrapping approach in which samples are drawn with replacement. Number of "values" give the number of samples from each class after bootstrapping, and "samples" give the distinct sample number.
There are actually 50 samples from each class. Value should be 50 50 50 when bootstrap is false, but it is 47, 41, 62 now. In the documentation of scikit it says "The sub-sample size is always the same as the original input sample size but the samples are drawn with replacement if bootstrap=True (default).", so tree always work on 150 samples, but they are not distinct. Two-third is not mentioned in the document.
Is this explanation actually correct?
Thank you in advance.