三目並べの局面探索をlru_cacheで高速化するには

Question

三目並べを実装しています。局面探索に使うminimax法派生のアルゴリズムは再帰関数のため、`lru_cache`で高速化できるだろうと考えました。ですが予想に反してcProfileによる結果を見る限り、探索関数に`@lru_cache`ラッパーを付けても変化は見られません。どこをどう書き換えればキャッシュが機能するのか、アドバイスをいただければ幸いです。よろしくお願いします。以下のコードとWandboxへのリンクは同じ内容です。 [Tic Tac Toe - [Wandbox]三へ( へ՞ਊ ՞)へﾊｯﾊｯ](https://wandbox.org/permlink/IsDlGopkexRkCNkU) ``` # Tic Tac Toe from __future__ import annotations import random from functools import lru_cache from typing import List class TicTacToeGame: __slots__ = ["state", "search_count"] def __init__(self) -> None: self.state = TicTacToeState() self.search_count = 0 def play(self) -> None: while not self.state.is_settled: choice = self._nega_scout_strategy(self.state.copy()) self.state.turn(choice) def reset(self): self.state = TicTacToeState() self.search_count = 0 def _evaluate(self, state: TicTacToeState, depth: int) -> int: if state.is_win: return 10 - depth elif state.is_loss: return -10 + depth return 0 # この探索関数がボトルネックなので、高速化したい @lru_cache def _nega_scout_strategy(self, state: TicTacToeState, depth: int = 0, alpha: int = -(1 << 30), beta: int = 1 << 30) -> int: self.search_count += 1 if state.is_settled: return self._evaluate(state, depth) best_hand = -1 b = beta available_hands = state.available_hands random.shuffle(available_hands) for i, hand in enumerate(available_hands): child_state = state.copy() child_state.turn(hand) child_value = -self._nega_scout_strategy(child_state, depth + 1, -b, -alpha) if alpha < child_value < beta and i: # re-search child_value = -self._nega_scout_strategy(child_state, depth + 1, -beta, -alpha) if child_value > alpha: # update alpha = child_value best_hand = hand if alpha >= beta: # beta pruning return alpha b = alpha + 1 # set new null window if depth: return alpha # in nodes of search tree return best_hand # in root of search tree class TicTacToeState: _winnable_conditions = \ {0b111000000, 0b000111000, 0b000000111, 0b100100100, 0b010010010, 0b001001001, 0b100010001, 0b001010100} __slots__ = ["_player", "_opponent"] def __init__(self, player: int = 0b000000000, opponent: int = 0b000000000) -> None: self._player = player self._opponent = opponent @staticmethod def _popcount(player: int) -> int: return bin(player).count("1") @property def is_settled(self) -> bool: return self.is_win or self.is_loss or self.is_draw @property def is_draw(self) -> bool: return self.field == 0b111111111 @property def is_win(self) -> bool: return any(self._player & condition == condition for condition in self._winnable_conditions) @property def is_loss(self) -> bool: return any(self._opponent & condition == condition for condition in self._winnable_conditions) @property def field(self) -> int: return self._player | self._opponent @property def available_hands(self) -> List[int]: return [i for i in range(9) if not (self.field & (1 << i))] def judge(self) -> None: if self._popcount(self._player) == self._popcount(self._opponent): player, opponent = "Player", "Opponent" else: player, opponent = "Opponent", "Player" if self.is_win: print(f"{player} win") elif self.is_loss: print(f"{opponent} win") else: print(f"Draw") def display(self) -> None: visible_field = [] player_color, opponent_color = "ox" if self._popcount(self._player) == self._popcount(self._opponent) else "xo" for i in range(9): if self._player & (1 << i): visible_field.append(player_color) elif self._opponent & (1 << i): visible_field.append(opponent_color) else: visible_field.append(".") for row in zip(*[iter(visible_field)] * 3): print(*row, sep="|") print() def turn(self, choice: int) -> None: self._player |= (1 << choice) self._player, self._opponent = self._opponent, self._player def copy(self) -> TicTacToeState: return TicTacToeState(self._player, self._opponent) def main(): """ 各局面での合法手はランダムな順番で探索するので、複数回実行して平均を取った。 game.search_count は1ゲーム(9ターン)で探索関数が呼び出された回数を指す。 lru_cacheを使えば過去のターンで探索済みのものはキャッシュを参照して省略されることを期待していた。 """ time = 10 game = TicTacToeGame() for i in range(1, time + 1): game.play() print(f"{i:02}: {game.search_count=}") game.reset() if __name__ == "__main__": main() ```

Accepted Answer

検証の準備
---
**1**
乱数を用いると毎度結果が変わり面倒なので、シード値を固定します。
```Python
random.seed(0)
```

**2**
試行回数10回は多いので、一発勝負にします。
```Python
def main():
    game = TicTacToeGame()
    game.play()
    print(f"{game.search_count=}")
    game.reset()
```

**3**
キャッシュサイズに依って結果が変わり得るので、とりあえず様子見で無制限にします。
```Python
@lru_cache
def ...

↓

@lru_cache(maxsize=None)
def ...
```

**4**
lru_cacheにラップされた関数オブジェクトは属性cache_infoを持ちます。
これを用いてヒット率を検証します。
```Python
@lru_cache
def func(a): pass

func(1)
func(2)
func(1)
print(func.cache_info())   # CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
```

検証結果
---
一回もヒットしていないですね。
キャッシュを確認する手間がかかる分、むしろ低速化している恐れがあります。

```plain
game.search_count=21881
CacheInfo(hits=0, misses=21881, maxsize=None, currsize=21881)
```

[Wandbox](https://wandbox.org/permlink/tUkDsiUZ9ud7Ccl9)

原因
---
メソッドの引数のうち、**stateがキャッシュのキーとして万全に機能していません。**
次のように特殊メソッドを定義してやります。
```Python
class TicTacToeState:
    ...
        
    def __eq__(self, other):
        return self._player == other._player      \
           and self._opponent == other._opponent
        
    def __hash__(self):
        return hash((self._player, self._opponent))
```

実行結果
---
ヒット率が大幅に改善したことが分かります。

```plain
game.search_count=6198
CacheInfo(hits=2517, misses=6198, maxsize=None, currsize=6198)
```

[Wandbox](https://wandbox.org/permlink/yzrlTGKq4VAPvndO)

検証の準備

検証結果

原因

実行結果

関連した質問