Creating Intelligent Agents in Unity Using Reinforcement Learning 🎮🧠

by Klajdi Beqiraj · August 2024


Have you ever wondered how video game characters could learn and adapt like real players? 🤔 In this project, I dove deep into the fascinating world of Machine Learning (ML) to create autonomous agents using Unity, focusing on Reinforcement Learning (RL). Let me take you on a journey through the process, from conceptualization to simulation, and reveal how these virtual agents learn to perform tasks in dynamic environments.

Reinforcement Learning is like training a pet 🐾: you reward good behaviors and discourage bad ones. In RL, an agent learns to make decisions by interacting with its environment. It goes through a cycle of Observations (collecting data from its environment), Decisions (choosing an action), Actions (performing the chosen action), and Rewards (receiving feedback).

In Unity, I used the ML-Agents toolkit, which is designed for creating intelligent agents. The toolkit splits the agent’s functionality into two main parts (a minimal code skeleton follows the list):

  1. Agent: This is the entity that perceives the environment, makes decisions, and takes actions.
  2. Behavior: This governs how the agent processes observations and decides on actions. Its key parameters, configured on the Behavior Parameters component, are:
  • Space Size: the dimensionality of the observation vector the agent collects.
  • Stacked Vectors: how many consecutive observations the agent considers at once, which is crucial for understanding motion.
  • Behavior Type: whether the agent is being trained (learning), running an already-trained model (inference), or controlled manually through a heuristic.
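To make that split concrete, here is a minimal agent skeleton in C#. It is a sketch rather than the project’s actual code: the class name, movement axes, and speed are placeholders, and the Behavior parameters listed above are set in the Inspector, so the code only has to stay consistent with them.

```csharp
using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

// Minimal ML-Agents skeleton (illustrative names, not the project's exact code).
// Space Size, Stacked Vectors and Behavior Type live on the Behavior Parameters
// component in the Inspector; the code below only has to stay consistent with them.
public class SimpleAgent : Agent
{
    public override void CollectObservations(VectorSensor sensor)
    {
        // Observations: the number of floats added here must match Space Size.
        sensor.AddObservation(transform.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // Decisions arrive here as continuous and/or discrete actions.
        float moveX = actions.ContinuousActions[0];
        float moveY = actions.ContinuousActions[1];
        transform.localPosition += new Vector3(moveX, moveY, 0f) * Time.deltaTime;
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Used when Behavior Type is Heuristic Only: manual control for testing.
        var continuous = actionsOut.ContinuousActions;
        continuous[0] = Input.GetAxis("Horizontal");
        continuous[1] = Input.GetAxis("Vertical");
    }
}
```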

In the first experiment, the goal was simple yet challenging: teach the agent to reach a ball. The agent starts at a random position, and its task is to move toward the ball while avoiding going out of bounds. The setup (sketched in code after this list) was:

  • Initialization: The agent is spawned randomly in the environment.
  • Observations: The agent continuously measures the distance to the goal.
  • Actions: The agent can move along the X and Y axes.
  • Rewards: A reward of 1 is given for reaching the goal, and a penalty of -1 is given for leaving the environment.
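Mapping those bullets onto code, the experiment-specific parts of such an agent class might look roughly like this. The ball reference, tag names, and arena bounds are assumptions that mirror the bullets; only the +1/−1 values come from the description above.

```csharp
// Experiment-specific pieces of an Agent subclass like the skeleton above.
// ballTransform, the tag names and the arena bounds are assumed, not project code.
[SerializeField] private Transform ballTransform;

public override void OnEpisodeBegin()
{
    // Spawn the agent at a random position inside the arena each episode.
    transform.localPosition = new Vector3(Random.Range(-4f, 4f), Random.Range(-4f, 4f), 0f);
}

public override void CollectObservations(VectorSensor sensor)
{
    // The agent observes its offset to the ball, i.e. the distance to the goal.
    sensor.AddObservation(ballTransform.localPosition - transform.localPosition);
}

private void OnTriggerEnter(Collider other)
{
    if (other.CompareTag("Goal"))        // reached the ball: reward of 1, end the episode
    {
        SetReward(1f);
        EndEpisode();
    }
    else if (other.CompareTag("Wall"))   // left the environment: penalty of -1
    {
        SetReward(-1f);
        EndEpisode();
    }
}
```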

Training involved configuring the neural network and the trainer’s hyperparameters, such as the learning rate, batch size, and beta (which controls the strength of the entropy bonus). Using TensorBoard, I monitored metrics like cumulative reward, episode length, and policy loss to track the agent’s progress.
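In ML-Agents these settings live in a YAML trainer configuration passed to the mlagents-learn command. The snippet below is a representative PPO configuration rather than the exact values used in this project, and the behavior name is a placeholder:

```yaml
behaviors:
  MoveToGoal:                 # must match the Behavior Name set on the agent
    trainer_type: ppo
    hyperparameters:
      learning_rate: 3.0e-4
      batch_size: 1024
      buffer_size: 10240
      beta: 5.0e-3            # entropy regularization strength
    network_settings:
      hidden_units: 128
      num_layers: 2
    max_steps: 500000
```

Training is then launched with something like mlagents-learn config.yaml --run-id=MoveToGoal, and TensorBoard reads the metrics from the resulting run folder.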

After training, the agent learned to efficiently reach the goal. The training graphs showed a decrease in episode length and policy loss over time, indicating that the agent was becoming more skilled.

In this second, more involved experiment, the agent had to first press a button to spawn a food item and then reach it. This added a layer of complexity, requiring the agent to learn a sequence of actions (sketched in code after the following list).

  • Initialization: The agent is randomly placed in the environment.
  • Observations: Initially, the agent observes the distance to the button; after pressing it, the distance to the food.
  • Actions: In addition to movement, the agent can press the button.
  • Rewards: The agent receives rewards for pressing the button and reaching the food, along with a small penalty for each step to encourage efficiency.
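A rough sketch of how that two-stage logic could sit inside the agent class; the button and food references, the press radius, and the intermediate reward values are assumptions that mirror the list above.

```csharp
// Two-stage task sketch: press the button, then reach the food.
// buttonTransform, foodTransform, the press radius and reward values are assumed;
// buttonPressed would be reset in OnEpisodeBegin (not shown).
[SerializeField] private Transform buttonTransform;
[SerializeField] private Transform foodTransform;
private bool buttonPressed;

public override void CollectObservations(VectorSensor sensor)
{
    sensor.AddObservation(buttonPressed);
    // Before the press, the relevant target is the button; afterwards, the food.
    Vector3 target = buttonPressed ? foodTransform.localPosition : buttonTransform.localPosition;
    sensor.AddObservation(target - transform.localPosition);
}

public override void OnActionReceived(ActionBuffers actions)
{
    // Continuous movement as before, plus one discrete branch for pressing the button.
    float moveX = actions.ContinuousActions[0];
    float moveY = actions.ContinuousActions[1];
    transform.localPosition += new Vector3(moveX, moveY, 0f) * Time.deltaTime;

    if (actions.DiscreteActions[0] == 1 && !buttonPressed &&
        Vector3.Distance(transform.localPosition, buttonTransform.localPosition) < 1f)
    {
        buttonPressed = true;
        foodTransform.gameObject.SetActive(true);  // "spawn" the food
        AddReward(0.5f);                           // sub-goal reward for pressing the button
    }

    AddReward(-0.001f);  // small per-step penalty to encourage efficiency
}

private void OnTriggerEnter(Collider other)
{
    if (buttonPressed && other.CompareTag("Food"))
    {
        AddReward(1f);   // final reward for reaching the food
        EndEpisode();
    }
}
```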

I experimented with both RL and Imitation Learning (IL). In IL, the agent learns by observing a demonstration provided by the programmer. While RL led the agent to discover the correct sequence through trial and error, IL provided a shortcut by showing the desired behavior directly.
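I won’t go into the wiring in detail, but in ML-Agents the usual route for IL is to record a demonstration with the Demonstration Recorder component and then point the trainer at the resulting .demo file, through behavioral cloning and/or a GAIL reward signal. The snippet below sketches that route; the behavior name, paths, and strengths are placeholders rather than the values from this project.

```yaml
behaviors:
  ButtonFood:
    trainer_type: ppo
    # hyperparameters and network_settings as in the earlier config
    behavioral_cloning:
      demo_path: Demos/ButtonFood.demo   # demonstration recorded by the programmer
      strength: 0.5
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      gail:
        demo_path: Demos/ButtonFood.demo
        strength: 0.1
```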

The training phase showed significant oscillations in the reward graphs due to the increased complexity. However, the agent successfully learned the task, demonstrating the power of combining RL with IL.

The final and most ambitious experiment involved two agents: one trying to hide, the other trying to seek. The challenge was to teach the agents their respective roles and optimize their strategies (a sketch of the reward scheme follows the list below).

  • Initialization: Both agents are placed in random positions.
  • Observations: The seeker receives data on the distance to the hider, while the hider knows the distance to potential hiding spots.
  • Actions: The agents move around the environment and interact with obstacles.
  • Rewards: The seeker is penalized for failing to detect the hider and rewarded when successful, while the hider is penalized for being found.
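One way to express that opposing reward scheme is a small controller that watches both agents. The distance-based detection check and the per-step penalty below are stand-ins for the project’s actual visibility logic, so treat the field names and values as assumptions.

```csharp
using UnityEngine;
using Unity.MLAgents;

// Sketch of the opposing reward scheme for hide-and-seek (illustrative values).
public class HideAndSeekController : MonoBehaviour
{
    [SerializeField] private Agent seeker;
    [SerializeField] private Agent hider;
    [SerializeField] private float detectionRadius = 2f;   // stand-in for a raycast/visibility check

    private void FixedUpdate()
    {
        bool found = Vector3.Distance(seeker.transform.position, hider.transform.position)
                     < detectionRadius;

        if (found)
        {
            seeker.AddReward(1f);    // seeker rewarded when successful
            hider.AddReward(-1f);    // hider penalized for being found
            seeker.EndEpisode();
            hider.EndEpisode();
        }
        else
        {
            seeker.AddReward(-0.001f);  // small penalty while the hider stays undetected
        }
    }
}
```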

I introduced a block that the hider could use to obscure the seeker’s view. This added a strategic element, but unfortunately, despite extensive training, the hider struggled to use the block effectively.

While the simulation didn’t achieve the desired outcome, it highlighted the importance of hyperparameter tuning and the potential need for more advanced techniques or longer training times.

This project was an exciting exploration into the world of autonomous agents using Reinforcement Learning in Unity. Each experiment provided valuable insights into how agents can learn and adapt to their environments, even in complex, dynamic scenarios.

Whether you’re a game developer or a machine learning enthusiast, I hope this journey into the world of intelligent agents inspires you to explore the possibilities of RL in your own projects. The potential is limitless! 🌟

Feel free to share your thoughts and experiences — I’d love to hear how you’re using ML in your own projects! 💬

Happy coding! 👾
