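What I want to achieve
I want to do reinforcement learning with Unity ML-Agents.
An agent on a field heads for a goal while avoiding falling off the field and avoiding contact with obstacles that are placed at random positions and angles.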
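The problem
Even after training starts, the agent does not move at all, and training does not progress.
I can move the agent manually with the WASD keys, but it does not move in training mode.
I suspect that the code that moves the agent (the MoveAgent method below) is wrong, but I cannot tell what is wrong with it.
A different training task in the same environment worked fine, so I do not think the environment is the cause.
Any guidance would be appreciated.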
Source code
```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

public class FWAgent : Agent
{
    Rigidbody rBody;
    public Transform Target;
    public GameObject RhwystrPlace;

    void Start()
    {
        rBody = GetComponent<Rigidbody>();
    }

    public override void OnEpisodeBegin()
    {
        this.rBody.angularVelocity = Vector3.zero;
        this.rBody.velocity = Vector3.zero;
        this.transform.localPosition = new Vector3(5.2f, 0.5f, -5.2f);
        this.transform.localRotation = Quaternion.identity;

        GameObject[] objects;
        objects = GameObject.FindGameObjectsWithTag("Rhwystr");
        for(int d = 0; d < objects.Length; d++)
        {
            Destroy(objects[d].gameObject);
        }
        RhwystrPlace.GetComponent<RhwyGenerater>().PutRhwy();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(Target.localPosition);
        sensor.AddObservation(this.transform.localPosition);

        sensor.AddObservation(rBody.velocity.x);
        sensor.AddObservation(rBody.velocity.z);
    }

    public void MoveAgent(ActionSegment<int> act)
    {
        var dirToGo = Vector3.zero;
        var rotateDir = Vector3.zero;

        var action = act[0];
        switch (action)
        {
            case 1:
                dirToGo = transform.position += transform.forward * 1.0f * Time.deltaTime;
                break;
            case 2:
                dirToGo = transform.position -= transform.forward * 1.0f * Time.deltaTime;
                break;
            case 3:
                rotateDir = transform.up * 0.5f;
                break;
            case 4:
                rotateDir = transform.up * -0.5f;
                break;
        }
        transform.Rotate(rotateDir, Time.deltaTime * 100f);
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.name == "Target")
        {
            AddReward(1.0f);
            EndEpisode();
        }
        else
        {
            AddReward(-0.2f);
            EndEpisode();
        }
    }

    private void FixedUpdate()
    {
        AddReward(-0.001f);
    }

    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        MoveAgent(actionBuffers.DiscreteActions);

        Vector3 AgentPos = this.transform.localPosition;
        Vector3 TargetPos = Target.localPosition;
        float AgentFall = this.transform.localPosition.y;
        float distanceToTarget = Vector3.Distance(AgentPos, TargetPos);

        if (AgentFall < 0.45)
        {
            AddReward(-0.2f);
            EndEpisode();
        }
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        var discreteActionsOut = actionsOut.DiscreteActions;
        discreteActionsOut[0] = 0;
        if (Input.GetKey(KeyCode.W))
        {
            discreteActionsOut[0] = 1;
        }
        else if (Input.GetKey(KeyCode.S))
        {
            discreteActionsOut[0] = 2;
        }
        else if (Input.GetKey(KeyCode.D))
        {
            discreteActionsOut[0] = 3;
        }
        else if (Input.GetKey(KeyCode.A))
        {
            discreteActionsOut[0] = 4;
        }
    }
}
```
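One way to narrow down whether MoveAgent is really at fault might be to count which discrete action values actually arrive during training. The following is only a minimal sketch under stated assumptions: `ActionLoggingAgent`, `actionCounts`, and `totalActions` are hypothetical names that do not appear in the project, and the array size of 5 assumes the action values 0 to 4 used by the Heuristic method above.

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Actuators;

// Hypothetical debugging agent (not part of the original project).
// It only counts the discrete actions it receives and logs a summary.
public class ActionLoggingAgent : Agent
{
    // Assumption: one discrete branch with values 0..4, as in Heuristic().
    private readonly int[] actionCounts = new int[5];
    private int totalActions;

    public override void OnActionReceived(ActionBuffers actionBuffers)
    {
        int action = actionBuffers.DiscreteActions[0];

        // Ignore values outside the assumed 0..4 range instead of throwing.
        if (action >= 0 && action < actionCounts.Length)
        {
            actionCounts[action]++;
        }
        totalActions++;

        // Log every 500 received actions to avoid flooding the console.
        if (totalActions % 500 == 0)
        {
            Debug.Log("Action counts (0..4): " + string.Join(", ", actionCounts));
        }
    }
}
```

The same counting could be placed at the top of `FWAgent.OnActionReceived` instead of a separate class. If only the value 0 is ever counted during training, the cause may lie outside MoveAgent, for example in how the discrete action space is configured for the agent. If values 1 to 4 do arrive and the agent still stays put, MoveAgent becomes the more likely suspect.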
YAML file
```yaml
behaviors:
  FWAgent:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 3.0e-4
      beta: 5.0e-4
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 128
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 128
    summary_freq: 10000
```
Terminal output
```
2021-07-12 00:13:30 INFO [stats.py:139] FWAgent. Step: 10000. Time Elapsed: 47.074 s. Mean Reward: -0.200. Std of Reward: 0.000. Training.
2021-07-12 00:14:12 INFO [stats.py:139] FWAgent. Step: 20000. Time Elapsed: 89.638 s. No episode was completed since last summary. Training.
2021-07-12 00:14:58 INFO [stats.py:139] FWAgent. Step: 30000. Time Elapsed: 135.142 s. No episode was completed since last summary. Training.
2021-07-12 00:15:46 INFO [stats.py:139] FWAgent. Step: 40000. Time Elapsed: 183.530 s. No episode was completed since last summary. Training.
2021-07-12 00:16:35 INFO [stats.py:139] FWAgent. Step: 50000. Time Elapsed: 231.954 s. No episode was completed since last summary. Training.
2021-07-12 00:17:24 INFO [stats.py:139] FWAgent. Step: 60000. Time Elapsed: 281.103 s. No episode was completed since last summary. Training.
2021-07-12 00:18:10 INFO [stats.py:139] FWAgent. Step: 70000. Time Elapsed: 327.051 s. No episode was completed since last summary. Training.
2021-07-12 00:18:52 INFO [stats.py:139] FWAgent. Step: 80000. Time Elapsed: 369.530 s. No episode was completed since last summary. Training.
2021-07-12 00:19:35 INFO [stats.py:139] FWAgent. Step: 90000. Time Elapsed: 412.841 s. No episode was completed since last summary. Training.
2021-07-12 00:20:25 INFO [stats.py:139] FWAgent. Step: 100000. Time Elapsed: 462.921 s. No episode was completed since last summary. Training.
```
Environment
OS: macOS 10.14.6
Unity: Version 2020.3.13f1 Personal
ML-Agents: Release 12 (Python Package 0.23.0)
Python: 3.7.6