This is a simple reinforcement learning demo using Q-learning with a grid-world environment, written in Python with Tkinter. The agent learns to reach a randomly generated goal using epsilon-greedy action selection and Q-value updates.
Demo video: `2025-09-17.21-17-28.mp4`
- State: represented as the `(x, y)` coordinates of the agent.
- Actions: `0` = up, `1` = down, `2` = left, `3` = right.
- Reward: `+10` if the agent reaches the goal, `-1` for every other move.
- Q-table: a dictionary that maps `(state, action)` to a Q-value. The initial Q-value for all state-action pairs is `0`.
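The environment described above can be sketched in Python roughly as follows. The grid size and the exact movement/clamping rules are assumptions for illustration, not taken from the demo's source; the reward scheme (`+10` at the goal, `-1` otherwise) matches the description.

```python
from collections import defaultdict

GRID_SIZE = 5           # assumed grid dimensions; the demo's actual size may differ
ACTIONS = [0, 1, 2, 3]  # 0 = up, 1 = down, 2 = left, 3 = right

# Q-table: maps (state, action) -> Q-value, defaulting to 0 for unseen pairs.
Q = defaultdict(float)

def step(state, action, goal):
    """Apply an action, clamp the agent inside the grid, return (next_state, reward)."""
    x, y = state
    if action == 0:
        y -= 1  # up
    elif action == 1:
        y += 1  # down
    elif action == 2:
        x -= 1  # left
    elif action == 3:
        x += 1  # right
    x = max(0, min(GRID_SIZE - 1, x))
    y = max(0, min(GRID_SIZE - 1, y))
    next_state = (x, y)
    reward = 10 if next_state == goal else -1
    return next_state, reward
```

Using a `defaultdict` means every state-action pair implicitly starts at `0`, matching the initialization described above.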
- Initialize the Q-table with `0` for all state-action pairs.
- For each episode:
  - Place the goal at a random location.
  - Reset the agent to `(0, 0)`.
  - While the agent has not reached the goal:
    - Choose an action using epsilon-greedy selection plus a heuristic.
    - Move the agent and update the display.
    - Mark the current cell as explored.
    - Take the action and observe the next state and reward.
    - Update the Q-table using the Q-learning formula.
  - Decay epsilon.
  - When the agent reaches the goal, clear the explored markers and end the episode.
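The training loop above can be sketched, GUI aside, as the following minimal Python program. The grid size, hyperparameters, and the exact form of the distance heuristic are assumptions; the demo's actual code may differ. Here the heuristic biases exploration toward the action that most reduces Manhattan distance to the goal.

```python
import random
from collections import defaultdict

GRID_SIZE = 5
ACTIONS = [0, 1, 2, 3]          # 0 = up, 1 = down, 2 = left, 3 = right
ALPHA, GAMMA = 0.1, 0.9         # learning rate and discount factor (assumed values)
EPSILON, EPSILON_DECAY = 1.0, 0.99

Q = defaultdict(float)          # (state, action) -> Q-value, initially 0

def step(state, action, goal):
    """Apply an action, clamp to the grid, return (next_state, reward)."""
    dx, dy = [(0, -1), (0, 1), (-1, 0), (1, 0)][action]
    nx = max(0, min(GRID_SIZE - 1, state[0] + dx))
    ny = max(0, min(GRID_SIZE - 1, state[1] + dy))
    nxt = (nx, ny)
    return nxt, (10 if nxt == goal else -1)

def choose_action(state, goal, epsilon):
    """Epsilon-greedy with a heuristic: with probability epsilon, explore by
    picking the action that most reduces Manhattan distance to the goal;
    otherwise exploit the Q-table."""
    if random.random() < epsilon:
        def dist_after(a):
            nxt, _ = step(state, a, goal)
            return abs(nxt[0] - goal[0]) + abs(nxt[1] - goal[1])
        return min(ACTIONS, key=dist_after)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

for episode in range(200):
    goal = (random.randrange(GRID_SIZE), random.randrange(GRID_SIZE))
    state = (0, 0)
    while state != goal:
        action = choose_action(state, goal, EPSILON)
        next_state, reward = step(state, action, goal)
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state
    EPSILON *= EPSILON_DECAY    # decay exploration after each episode
```

Because the heuristic makes every exploration step strictly reduce the distance to the goal, episodes terminate even before the Q-table has learned anything useful.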
- The agent is red.
- The goal is green.
- Explored cells are blue (small rectangles inside cells).
- Python 3
- Tkinter
Run the demo with:

```shell
python3 python+heuristic+elipsongreedy/gui.py
```

You can try the live demo of this Q-learning grid agent in your browser here (mobile implementation not done yet :( ):

https://samtime101.github.io/Q-Learning-distance-heuristic/