Reinforcement Learning

Hana M April 28, 2023 | 10:00 AM Technology

Reinforcement learning is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives feedback in the form of rewards or penalties based on its actions, and it learns to make better decisions over time by maximizing its expected long-term reward.

The goal of reinforcement learning is to find an optimal policy, which is a mapping from states to actions that maximizes the expected cumulative reward. This is often done using a trial-and-error approach, where the agent explores the environment by taking actions and receives feedback from the environment in the form of rewards or penalties.
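To make "cumulative reward" concrete, here is a minimal sketch of how the reward collected over one episode could be added up. It assumes the common convention of weighting later rewards by a discount factor (the one mentioned under Value later in this article). The function name, the reward list, and the value of gamma are illustrative assumptions, not something taken from the cited sources.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum the rewards of one episode, weighting step t by gamma ** t."""
    total = 0.0
    # Work backwards so that each step computes G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episode: two good steps, one penalty, then the final reward.
episode_rewards = [1.0, 1.0, -1.0, 10.0]
print(discounted_return(episode_rewards, gamma=0.9))
print(discounted_return(episode_rewards, gamma=1.0))  # plain, undiscounted sum
```

With gamma set to 1.0 the function reduces to a plain sum of rewards. Values below 1.0 make the agent care more about rewards that arrive sooner.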

Example:

The problem is as follows: we have an agent and a reward, with many hurdles in between. The agent must find the best possible path to reach the reward. Figure 1 illustrates the problem. [1]

Figure 1. Example of Reinforcement Learning. [1]

Figure 1 is an example of reinforcement learning. The image shows a robot, a diamond, and fire. The robot's goal is to reach the reward (the diamond) while avoiding the hurdles (the fire). The robot learns by trying the possible paths and then choosing the path that reaches the reward with the fewest hurdles. Each right step adds to the robot's reward and each wrong step subtracts from it. The total reward is calculated when the robot reaches the final reward, the diamond. [1]
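The scoring described above can be sketched in a few lines. The grid layout and the reward values below are assumptions made for illustration. They are not read off Figure 1, and the function does no bounds checking, so the paths must stay inside the grid.

```python
# Hypothetical 3x4 grid: S = start (robot), F = fire, D = diamond, . = free cell.
GRID = [
    "S..F",
    ".F..",
    "...D",
]
# Assumed rewards: a safe step earns 1, fire costs 5, the diamond earns 10.
STEP_REWARD = {".": 1, "S": 1, "F": -5, "D": 10}
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def path_reward(moves, start=(0, 0)):
    """Follow a list of moves and return the total reward collected."""
    row, col = start
    total = 0
    for move in moves:
        d_row, d_col = MOVES[move]
        row, col = row + d_row, col + d_col
        cell = GRID[row][col]
        total += STEP_REWARD[cell]
        if cell == "D":  # the episode ends at the diamond
            break
    return total


safe_path = ["down", "down", "right", "right", "right"]
risky_path = ["right", "down", "down", "right", "right"]  # crosses a fire cell
print(path_reward(safe_path), path_reward(risky_path))
```

Both paths reach the diamond in five moves, but the path that steps through fire ends with a lower total, which is why the robot would come to prefer the safe one.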

Terms used in Reinforcement Learning

Agent: An entity that can perceive/explore the environment and act upon it. [2]

Environment: The situation in which an agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature. [2]

Action: The moves taken by an agent within the environment. [2]

State: The situation returned by the environment after each action taken by the agent. [2]

Reward: Feedback returned to the agent from the environment to evaluate the agent's action. [2]

Policy: The strategy the agent applies to choose the next action based on the current state. [2]

Value: The expected long-term return with the discount factor, as opposed to the short-term reward. [2]

Q-value: Mostly similar to the value, but it takes one additional parameter, the current action (a). [2]
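The relationship between policy, value, and Q-value can be sketched with a small lookup table. The state names, action names, and numbers below are made up for illustration, and the sketch assumes a greedy policy, which is only one of several possible choices.

```python
# Hypothetical Q-values: q_table[state][action] is the expected long-term
# return of taking that action in that state.
q_table = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.1},
}


def greedy_policy(state):
    """Policy: choose the action with the highest Q-value in this state."""
    actions = q_table[state]
    return max(actions, key=actions.get)


def state_value(state):
    """Value of a state under the greedy policy: its best Q-value."""
    return max(q_table[state].values())


for state in q_table:
    print(state, greedy_policy(state), state_value(state))
```

The Q-value is indexed by both state and action, while the value is indexed by state alone. Under a greedy policy, the value of a state is simply the largest Q-value available there.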

Reinforcement learning has been successfully applied in a wide range of applications, such as game playing, robotics, and autonomous driving. Some popular algorithms in reinforcement learning include Q-learning, SARSA, and policy gradient methods.
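As an illustration of one of these algorithms, below is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. The environment is a hypothetical five-cell corridor with the reward in the last cell. The step function, the hyperparameter values, and the number of episodes are all assumptions chosen to keep the example small.

```python
import random

N_STATES = 5        # corridor cells 0..4; cell 4 holds the reward
ACTIONS = [-1, +1]  # index 0 moves left, index 1 moves right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # assumed hyperparameters


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore sometimes (and on ties), otherwise exploit.
            if rng.random() < EPSILON or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move Q(s, a) toward r + gamma * max Q(s', .).
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q = train()
# Greedy policy for the non-terminal cells 0..3.
policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(policy)
```

After training, the learned policy should favor moving right in every non-terminal cell, since that is the shortest route to the reward. SARSA differs mainly in the update target: it uses the Q-value of the action actually taken next rather than the maximum over actions.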

References:

  1. https://www.geeksforgeeks.org/what-is-reinforcement-learning/
  2. https://www.javatpoint.com/reinforcement-learning

Cite this article:

Hana M (2023), Reinforcement Learning, AnaTechmaz, pp.215
