Q-LEARNING GRID AGENT

THIS IS A SIMPLE REINFORCEMENT LEARNING DEMO USING Q-LEARNING IN A GRID-WORLD ENVIRONMENT BUILT WITH PYTHON AND TKINTER. THE AGENT LEARNS TO REACH A RANDOMLY PLACED GOAL USING EPSILON-GREEDY ACTION SELECTION AND Q-VALUE UPDATES.


VIDEO DEMO

2025-09-17.21-17-28.mp4

STATE, ACTION, REWARD

  • STATE: REPRESENTED AS (X, Y) COORDINATES OF THE AGENT.
  • ACTION:
    • 0 = UP
    • 1 = DOWN
    • 2 = LEFT
    • 3 = RIGHT
  • REWARD:
    • +10 IF AGENT REACHES THE GOAL
    • -1 FOR EVERY OTHER MOVE
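The state/action/reward scheme above can be sketched in Python like this. Names such as `step`, `GRID_SIZE`, and `ACTIONS` are illustrative assumptions, not the repo's actual identifiers:

```python
# Illustrative sketch of the state/action/reward scheme; the grid size and
# function names are assumptions, not taken from the repository's code.
GRID_SIZE = 10
ACTIONS = {0: (0, -1),   # up
           1: (0, 1),    # down
           2: (-1, 0),   # left
           3: (1, 0)}    # right

def step(state, action, goal):
    """Apply an action, clamp the agent to the grid, return (next_state, reward)."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), GRID_SIZE - 1)
    y = min(max(state[1] + dy, 0), GRID_SIZE - 1)
    next_state = (x, y)
    reward = 10 if next_state == goal else -1
    return next_state, reward
```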

Q-TABLE

  • Q-TABLE IS A DICTIONARY THAT MAPS (STATE, ACTION) TO A Q-VALUE.
  • INITIAL Q-VALUE FOR ALL STATE-ACTION PAIRS IS 0.
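One common way to get this "dictionary with a default of 0" behavior is `collections.defaultdict` — a sketch, not necessarily how the repo implements it:

```python
from collections import defaultdict

# Q-table keyed by (state, action); unseen pairs default to 0.0, matching the
# "initial Q-value for all state-action pairs is 0" rule above.
Q = defaultdict(float)

before = Q[((0, 0), 2)]   # unseen pair, defaults to 0.0
Q[((0, 0), 2)] = 1.5      # overwritten after a learning step
```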

ALGORITHM

  1. INITIALIZE Q-TABLE WITH 0 FOR ALL STATE-ACTION PAIRS.
  2. FOR EACH EPISODE:
    1. PLACE GOAL AT RANDOM LOCATION
    2. RESET AGENT TO (0,0)
    3. WHILE AGENT HAS NOT REACHED GOAL:
      1. CHOOSE ACTION USING EPSILON-GREEDY + HEURISTIC
      2. MOVE AGENT AND UPDATE DISPLAY
      3. MARK CURRENT CELL AS EXPLORED
      4. TAKE ACTION AND OBSERVE NEXT STATE AND REWARD
      5. UPDATE Q-TABLE USING Q-LEARNING FORMULA
      6. DECAY EPSILON
      7. IF AGENT REACHES GOAL:
        • CLEAR EXPLORED MARKERS
        • END EPISODE
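The episode loop above can be sketched as follows. This assumes Manhattan distance as the heuristic and per-step epsilon decay; the exact heuristic, hyperparameters, and structure in the repo may differ:

```python
import random
from collections import defaultdict

random.seed(0)  # for reproducibility of this sketch

GRID_SIZE = 5
ACTIONS = {0: (0, -1), 1: (0, 1), 2: (-1, 0), 3: (1, 0)}  # up/down/left/right
ALPHA, GAMMA = 0.1, 0.9                 # learning rate, discount factor
EPS_DECAY, EPS_MIN = 0.995, 0.05

Q = defaultdict(float)

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def move(state, action):
    dx, dy = ACTIONS[action]
    return (min(max(state[0] + dx, 0), GRID_SIZE - 1),
            min(max(state[1] + dy, 0), GRID_SIZE - 1))

def choose_action(state, goal, epsilon):
    """Epsilon-greedy with a distance heuristic: when exploring, prefer the
    action that shrinks the Manhattan distance to the goal; otherwise exploit
    the highest Q-value."""
    if random.random() < epsilon:
        return min(ACTIONS, key=lambda a: manhattan(move(state, a), goal))
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def run_episode(goal, epsilon):
    state = (0, 0)                       # reset agent to the origin
    while state != goal:
        action = choose_action(state, goal, epsilon)
        nxt = move(state, action)
        reward = 10 if nxt == goal else -1
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt
        epsilon = max(EPS_MIN, epsilon * EPS_DECAY)  # decay epsilon each step
    return epsilon

epsilon = 1.0
for _ in range(30):
    goal = (random.randrange(GRID_SIZE), random.randrange(GRID_SIZE))
    epsilon = run_episode(goal, epsilon)
```

The display updates and explored-cell markers from steps 2, 3, and 7 are omitted here; they belong to the tkinter layer, not the learning logic.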

VISUALIZATION

  • AGENT IS RED
  • GOAL IS GREEN
  • EXPLORED CELLS ARE BLUE (SMALL RECTANGLES INSIDE CELLS)

REQUIREMENTS

  • PYTHON 3
  • TKINTER

HOW TO RUN

python3 python+heuristic+elipsongreedy/gui.py

WEB-BASED DEMO

YOU CAN TRY THE LIVE DEMO OF THIS Q-LEARNING GRID AGENT IN YOUR BROWSER HERE (MOBILE IMPLEMENTATION NOT DONE YET :( ):
https://samtime101.github.io/Q-Learning-distance-heuristic/

About

Agent which follows its goal using a Q-learning algorithm fed with a distance heuristic
