前提

遺伝的プログラミングGPで選択を行う際にルーレット選択を行いたいです。
添付しているソースコードのpopulation変数には、1000個の個体(数式)が格納されています。
populationに格納されているそれぞれの個体はTreeクラスによって作られたインスタンスで、tree.fitnessという変数を持ちます。tree.fitnessは各個体の適応度を表しています。

コード中のmain関数でselectという関数を用いて、選択という操作を行っています。
このコードは見た感じトーナメント選択という手法をとっています。
トーナメント選択とは、populationの中からランダムにM個(指定した分)選び、その中でtree.fitnessが最も小さいものを新たにpopulationに格納、といった操作を行っています。
この部分をルーレット選択で実装したいのですが、うまくできず困っています。

ルーレット選択はpopulationに格納されている各個体の適応度tree.fitnessに注目し、
個体i(population[i])の選ばれる確率 = 個体iの適応度 / 全個体の適応度の和
とし、各個体が選択される確率を与えます。その後、与えられた確率に応じて個体を選択し、新たにpopulationに格納、という手順をとります。

実現したいこと

コード中のトーナメント選択をルーレット選択に変えて実装したい。

コードの補足説明

〇main関数
N: 個体数（population）に格納される数
M: トーナメント選択数
initial_population: 初期個体（数式。Treeインスタンス）をランダムに1000個生成し、populationに格納。
各個体はfitnessという変数をもっており、適応度を表す。

実際に今回使う関数はこれくらいだと思うので、ほかの部分は無視していただいて結構です。
わからない箇所あれば補足いたします。
よろしくお願いいたします。

補足情報（FW/ツールのバージョンなど）

参考

該当のソースコード

Python
1import operator
2import math
3import random
4import numpy as np
5from copy import deepcopy
6import matplotlib.pyplot as plt
7
8
9class Tree(list):
10    def __init__(self,content):
11        list.__init__(self,content)
12        self.fitness = None
13    @property
14    def height(self):
15        stack = [0]
16        max_depth = 0
17        for elem in self:
18            depth = stack.pop()
19            max_depth = max(max_depth, depth)
20            stack.extend([depth+1] * elem[2])
21        return max_depth
22    def searchSubtree(self, begin):
23        end = begin + 1
24        total = self[begin][2]
25        while total > 0:
26            total += self[end][2] - 1
27            end += 1
28        return slice(begin, end)
29    def __str__(self):
30        start = 0
31        str_arr = [""]
32        for node in self:
33            for i in range(len(str_arr)):
34                if str_arr[i] == "":
35                    name,func,arity = node
36                    if arity != 0:
37                        tmp_arr = [name,"(",""]
38                        for x in range(arity-1):
39                            tmp_arr += [",",""]
40                        tmp_arr.append(")")
41                        str_arr = str_arr[:i] + tmp_arr + str_arr[i+1:]
42                    else:
43                        str_arr = str_arr[:i] + [name] + str_arr[i+1:]
44                    break
45        return "".join(str_arr)
46
47def makePrimeNumbers(len_numbers):
48    primes = [2, 3]
49    n = 5
50    while len(primes) < len_numbers:
51        isprime = True
52        for i in range(1, len(primes)):
53            if primes[i] ** 2 > n:
54                break
55            if n % primes[i] == 0:
56                isprime = False
57                break
58        if isprime:
59            primes.append(n)
60        n += 2
61    return primes
62
63def protectedDiv(left, right):
64    with np.errstate(divide='ignore',invalid='ignore'):
65        y = np.divide(left, right)
66        if isinstance(y, np.ndarray):
67            y[np.isinf(y)] = 1
68            y[np.isnan(y)] = 1
69        elif np.isinf(y) or np.isnan(y):
70            y = 1
71    return y
72
73def protectedPower(left,right):
74    with np.errstate(over='ignore',invalid='ignore'):
75        y = np.power(left, right)
76        if isinstance(y, np.ndarray):
77            y[np.isinf(y)] = 1
78            y[np.isnan(y)] = 1
79        elif np.isinf(y) or np.isnan(y):
80            y = 1
81    return y
82
83def makeNodeSet():
84    nodeSet = []
85    leafSet = []
86    nodeSet.append(("add",np.add,2))
87    nodeSet.append(("sub",np.subtract,2))
88    nodeSet.append(("mul",np.multiply,2))
89    nodeSet.append(("div",protectedDiv,2))
90    #nodeSet.append(("pow",np.power,2))
91    nodeSet.append(("sin",np.sin,1))
92    nodeSet.append(("cos",np.cos,1))
93    nodeSet.append(("tan",np.tan,1))
94    # nodeSet.append(("log",np.log,1))
95    # nodeSet.append(("exp",np.exp,1))
96    # nodeSet.append(("sqrt",np.sqrt,1))
97
98    leafSet.append(("-1",1,0))
99    leafSet.append(("0",0,0))
100    leafSet.append(("1",1,0))
101    leafSet.append(("0.5",0.5,0))
102    leafSet.append(("0.1",0.1,0))
103    leafSet.append(("pi",np.pi,0))
104    leafSet.append(("e",np.e,0))
105    leafSet.append(("x",None,0))
106
107    return nodeSet,leafSet
108
109def randomTreeMake(nodeSet,leafSet,_min,_max):
110    height = random.randint(_min,_max)
111    items = []
112    stack = [0]
113    while len(stack) != 0:
114        depth = stack.pop()
115        if depth == height:
116            node = random.choice(leafSet)
117        else:
118            node = random.choice(nodeSet)
119            stack.extend([depth+1]*node[2])
120        items.append(node)
121    tree = Tree(items)
122    return tree
123
124def get_y(tree,nodeSet,leafSet,x):
125    stack = []
126    for node in tree[::-1]:
127        name,func,arity = node
128        if name == "x":
129            output = x
130        else:
131            if arity == 0:
132                output = func*np.ones(x.shape[0])
133            if arity == 1:
134                output = func(stack.pop())
135            if arity == 2:
136                output = func(stack.pop(),stack.pop())
137        stack.append(output)
138    return stack.pop()
139
140def mate(tree1,tree2,limit,limMode):
141    save1 = deepcopy(tree1)
142    save2 = deepcopy(tree2)
143    while True:
144        index1 = random.randint(1,len(tree1)-1)
145        index2 = random.randint(1,len(tree2)-1)
146        slice1 = tree1.searchSubtree(index1)
147        slice2 = tree2.searchSubtree(index2)
148        tree1[slice1],tree2[slice2] = tree2[slice2],tree1[slice1]
149        if limMode == "LENGTH":
150            if len(tree1) <= limit and len(tree2) <= limit:
151                return tree1,tree2
152        elif limMode == "HEIGHT":
153            if tree1.height <= limit and tree2.height <= limit:
154                return tree1,tree2
155        else:
156            return tree1,tree2
157        tree1,tree2 = deepcopy(save1),deepcopy(save2)
158
159def mutate(tree,NodeSet,leafSet,limit,limMode):
160    save = deepcopy(tree)
161    while True:
162        index = random.randrange(len(tree))
163        slice_ = tree.searchSubtree(index)
164        tree[slice_] = randomTreeMake(NodeSet,leafSet,1,2)
165        if limMode == "LENGTH":
166            if len(tree) <= limit:
167                return tree
168        elif limMode == "HEIGHT":
169            if tree.height <= limit:
170                return tree
171        else:
172            return tree
173        tree = deepcopy(save)
174
175def initial_population(N,nodeSet,leafSet,_min,_max):
176    population = []
177    for n in range(N):
178        tree = randomTreeMake(nodeSet,leafSet,_min,_max)
179        population.append(tree)
180    return population
181
182def evaluate(population,nodeSet,leafSet,x,y):
183    for tree in population:
184        if tree.fitness == None:
185            _y = get_y(tree,nodeSet,leafSet,x)
186            fitness = np.sum(np.abs(_y - y))
187            tree.fitness = fitness
188
189def select(population,M):
190    new_population = []
191    for i in range(len(population)):
192        sub_population = random.sample(population,M)
193        best = min(sub_population,key=lambda tree:tree.fitness)
194        new_population.append(deepcopy(best))
195    return new_population
196
197def crossover(population,Pcx):
198    random.shuffle(population)
199    for i in range(int(len(population)/2)):
200        if random.random() < Pcx:
201            childA = population[2*i]
202            childB = population[2*i+1]
203            childA,childB = mate(childA,childB,17,"HEIGHT")
204            population[2*i] = deepcopy(childA)
205            population[2*i+1] = deepcopy(childB)
206            population[2*i].fitness = None
207            population[2*i+1].fitness = None
208    return population
209
210def mutation(population,Pmut,nodeSet,leafSet):
211    random.shuffle(population)
212    for i in range(len(population)):
213        if random.random() < Pmut:
214            child = population[i]
215            child = mutate(child,nodeSet,leafSet,17,"HEIGHT")
216            population[i] = deepcopy(child)
217            population[i].fitness = None
218
219    return population
220
221def printLog(population,g):
222    key=lambda tree:tree.fitness
223    worst = max(population,key=key)
224    best = min(population,key=key)
225    ave = sum([tree.fitness for tree in population])/len(population)
226    key=lambda tree:len(tree)
227    max_node = len(max(population,key=key))
228    min_node = len(min(population,key=key))
229    ave_node = sum([len(tree) for tree in population])/len(population)
230    print("===========",g,"genelation===========")
231    print("worst fitness : ",worst.fitness,"  max nodes : ",max_node)
232    print("ave   fitness : ",ave          ,"  ave nodes : ",ave_node)
233    print("best  fitness : ",best.fitness, "  min nodes : ",min_node)
234
235def makeGraph(tree,x,y,nodeSet,leafSet):
236    _y = get_y(tree,nodeSet,leafSet,x)
237    plt.plot(x,y,label="true prime number")
238    plt.plot(x,_y,label="gp prime number")
239    plt.legend(loc="upper left")
240    plt.savefig("test.pdf")
241
242def main():
243    N = 1000
244    M = 7
245    maxG = 500
246    Pcx = 0.5
247    Pmut = 0.1
248    nodeSet,leafSet = makeNodeSet()
249    x = np.array(range(1,10))
250    y = np.array(makePrimeNumbers(x.shape[0]))
251
252    population = initial_population(N,nodeSet,leafSet,1,2)
253    g = 0
254    while True:
255        evaluate(population,nodeSet,leafSet,x,y)
256        printLog(population,g)
257        if g == maxG:
258            break
259        population = select(population,M)  # 変更したい部分
260        population = crossover(population,Pcx)
261        population = mutation(population,Pmut,nodeSet,leafSet)
262        g+=1
263    best = min(population,key=lambda tree:tree.fitness)
264    print(best)
265    makeGraph(best,x,y,nodeSet,leafSet)
266
267main()
268

行動規範の内容に同意します

回答1件

ベストアンサー

重み付き乱数を使用してみてはいかがでしょうか。
https://docs.python.org/ja/3.10/library/random.html#random.choices
random.choices()関数の引数 weights に各要素の重みを指定することで、相対的な重みに基づいて要素が選ばれます。

py
1sub_population = random.choices(population, weights=(p.fitness for p in population), k=M)

fitnessが小さいものを優先するのであれば下記のようになるかと

py
1sub_population = random.choices(population, weights=(1/p.fitness for p in population), k=M)

追記

M個を重複なしで、かつfitness に応じて優先的に選出したい（fitness大ならば選出確率大）という場合、
random.sample の重み付き版である numpy.random.choice(replace=false, p=重みリスト)を使ってみてはいかがでしょうか。
（非復元重み付きランダムサンプリング）

python
1def select(population,M):
2    new_population = []
3    # probeの総和を1にするためにfitnessの合計値を求めておく
4    s = sum(u.fitness for u in population)
5    # fitnessから求めた population各個体の選出確率のリスト（重みリスト）
6    prob = [p.fitness/s for p in population]
7    for i in range(len(population)):
8        sub_population = np.random.choice(population,size=M,replace=False,p=prob)
9        best = min(sub_population,key=lambda tree:tree.fitness)
10        new_population.append(deepcopy(best))
11    return new_population

numpyを使わない場合は下記のような形で。

python
1def select(population,M):
2    new_population = []
3    for i in range(len(population)):
4        sub_population = []
5        # 選出用populationのインデックスとfitnessのペアを格納したプール
6        pool = [[idx, p.fitness] for idx,p in enumerate(population)]
7
8        # M個選出するためにM回繰り返す
9        for _ in range(M):
10            # 現在のプールの重みリスト
11            w = [p[1] for p in pool]
12            # 重み付き確率に応じて選出プールから1個選出する
13            selected = random.choices(pool, weights=w, k=1)[0]
14            sub_population.append(population[selected[0]])
15            # 選出された個体の重みをゼロにして選出対象外とする
16            selected[1]=0
17
18        best = min(sub_population,key=lambda tree:tree.fitness)
19        new_population.append(deepcopy(best))
20    return new_population
21

※いずれにしても計算は重くなります。

投稿2022/11/25 11:08

編集2022/11/26 01:06

退会済みユーザー

総合スコア0

AI_engineer

2022/11/25 11:35

ご回答ありがとうございます。 URL見ましたが、いまいち理解できませんでした... ルーレット選択では、（個体の選ばれる確率）＝　（個体の適応度）/ （全個体の適応度の和）で表されます。相対的な重みに基づいてとありますが、この重み付き乱数でも上式と同じ確率で選択されるのでしょうか。

AI_engineer

2022/11/25 13:04

書いていただいたコードを試してみましたが、sub_populationに選ばられる個体が毎回同じになってしまいます...

退会済みユーザー

2022/11/25 13:10

はい。たとえば下記のコードの結果をみていただければ適応度とみなした「重み」配列の確率で選択されていることがお分かりいただけるかと思います。 import random numbers = list(range(0,10)) #背番号0～9 w = [1, 3, 5, 7, 9, 9, 7, 5, 3, 1] # index0～9：各背番号0～9に応じた重み　（＝適応度に相当） samples = random.choices(numbers, k = 10000, weights = w) print([samples.count(i) for i in numbers])

退会済みユーザー

2022/11/25 13:49 編集

>sub_populationに選ばられる個体が毎回同じになってしまいます... そもそももともとのコードでは、設定されるfitness が、世代を経るごとに大幅な偏りが出るつくりになっているためではないでしょうか。「全体個体のfitness総和に対する、各個体のfitnessに応じて選出されるようにする」というのが要件と伺っておりますので、仮にfitness が突出して**大きい**個体が存在する場合、サンプリングの過程でその個体が優先的に選ばれるのは当然です。現状の**select関数内で**計算毎に使用される、各個体のfitnessが、どのような値であるか、確認されましたでしょうか？ (場合によっては、質問者さんが想定しない値になっている可能性があるかもしれませんが）（もしかしてfitnessが小さい方を優先的に選ぶ、という要件がありましたでしょうか・であれば回答に追記した形のように逆数にしてみてください。）（fitnessに応じた確率での選出は無視して）1回あたりのサンプリングではとりあえずM個重複なしで選出したい、というのであれば、コードはまた別の形になります。ただしこれは非復元重み付きサンプリングになりますので必ずしも直感的に期待される分布にはなりません。

退会済みユーザー

2022/11/27 08:18 編集

重複なしで選出する場合のコードを追記しました。

AI_engineer

2022/11/29 06:43

ご回答ありがとうございます。コードを見直してみると適合度が小さいものほどよいっぽいですね。コードの方も勉強になりました。ありがとうございました。

行動規範の内容に同意します