
Commit 3c25041

Explain Pytorch installation
1 parent 9d2853d commit 3c25041

1 file changed

Lines changed: 79 additions & 4 deletions

File tree

src/ppo/main.clj

```diff
@@ -3,7 +3,7 @@
 :external-requirements []
 :quarto {:author [:janwedekind]
          :draft true
-         :description "A Clojure port of XinJingHao's PPO implementation using Pytorch and Quil"
+         :description "A Clojure port of XinJingHao's PPO implementation using libpython-clj2, Pytorch, and Quil"
          :image "pendulum.png"
          :type :post
          :date "2026-04-18"
```
```diff
@@ -15,9 +15,15 @@
 [clojure.core.async :as async]
 [quil.core :as q]
 [quil.middleware :as m]
-[libpython-clj2.require :refer (require-python)]))
+[libpython-clj2.require :refer (require-python)]
+[libpython-clj2.python :refer (py.) :as py]))

-(require-python '[torch :as torch])
+(require-python '[builtins :as python]
+                '[torch :as torch]
+                '[torch.nn :as nn]
+                '[torch.nn.functional :as F]
+                '[torch.optim :as optim]
+                '[torch.distributions :refer (Beta)])

 ;; ## Motivation
 ;;
```
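The bindings added above can be sanity-checked at a REPL. The following is a hedged sketch, not part of the commit; it assumes the Python environment makes `torch` importable:

```clojure
;; Hypothetical REPL session exercising require-python and py.
(require '[libpython-clj2.require :refer [require-python]]
         '[libpython-clj2.python :refer [py.]])
(require-python '[torch :as torch])

(def t (torch/ones [2 3]))  ; create a 2x3 tensor filled with ones
(py. t sum)                 ; call the tensor's sum method -> tensor(6.)
(py. t size)                ; -> torch.Size([2, 3])
```

Here `py.` invokes a method on a Python object, while `require-python` makes Python modules available as Clojure namespaces.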
```diff
@@ -26,8 +32,9 @@
 ;; However I had stability issues.
 ;; The algorithm would learn a strategy and then suddenly diverge again.
 ;;
-;; More recently (2017) the Proximal Policy Optimization (PPO) algorithm was published and it has gained in popularity.
+;; More recently (2017) the [Proximal Policy Optimization (PPO) algorithm was published](https://arxiv.org/abs/1707.06347) and it has gained in popularity.
 ;; PPO is inspired by Trust Region Policy Optimization (TRPO) but is much easier to implement.
+;; Most importantly, PPO can handle continuous observation and action spaces.
 ;; The [Stable Baselines3](https://github.com/DLR-RM/stable-baselines3) Python library has an implementation of PPO, TRPO, and other reinforcement learning algorithms.
 ;; However I came across [XinJingHao's PPO implementation](https://github.com/XinJingHao/PPO-Continuous-Pytorch/), which I found easier to follow.
 ;;
```
```diff
@@ -129,6 +136,8 @@
 (observation {:angle 0.0 :velocity 0.5} config)
 (observation {:angle (/ PI 2) :velocity 0.0} config)

+;; Note that the observation needs to capture all information required for achieving the objective, because it is the only information available to the policy for deciding on the next action.
+
 ;; ### Action
 ;;
 ;; The action of a pendulum is a vector with one element between 0 and 1.
```
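To make the note about complete observations concrete, here is a hypothetical sketch of such an observation function (the commit's actual `observation` function is defined earlier in the file and may differ):

```clojure
;; Hypothetical observation function: both angle and velocity are needed,
;; because a policy seeing only the angle could not tell whether the
;; pendulum is currently swinging up or down.
(defn observation [{:keys [angle velocity]} _config]
  [(Math/cos angle) (Math/sin angle) velocity])
```

Encoding the angle as cosine and sine avoids a discontinuity when the angle wraps around.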
```diff
@@ -257,3 +266,69 @@
 :on-close (fn [& _] (async/close! done-chan)))
 (async/<!! done-chan))
 (System/exit 0))
+
+;; ## Neural networks
+;;
+;; PPO is a reinforcement learning technique that uses backpropagation to learn the parameters of two neural networks.
+;;
+;; * The **actor** network takes an observation as an input and outputs the parameters of a probability distribution for sampling the next action to take.
+;; * The **critic** network takes an observation as an input and outputs the expected cumulative reward for the current state.
+;;
+;; ### Pytorch
+;;
+;; For implementing the neural networks and backpropagation, I am using the Python-Clojure bridge [libpython-clj2](https://github.com/clj-python/libpython-clj) and [Pytorch](https://pytorch.org/).
+;; The Pytorch library is quite comprehensive, it is free software, and there is plenty of documentation on how to use it.
+;; The default version of [Pytorch on pypi.org](https://pypi.org/project/torch/) comes with CUDA (Nvidia) GPU support.
+;; There is also a [Pytorch wheel on AMD's website](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/3rd-party/pytorch-install.html#use-a-wheels-package) which comes with [ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/quick-start.html) support.
+;; Here we are going to use the CPU-only version of Pytorch, which is a much smaller install.
+;;
+;; You need to install [Python 3.10](https://www.python.org/) or later.
+;; For package management we are going to use the [uv](https://docs.astral.sh/uv/) package manager.
+;; The following *pyproject.toml* file is used to install Pytorch and NumPy.
+;;
+;; ```toml
+;; [project]
+;; name = "ppo"
+;; version = "0.1.0"
+;; description = "Proximal Policy Optimization"
+;; authors = [{ name="Jan Wedekind", email="jan@wedesoft.de" }]
+;; requires-python = ">=3.10.0"
+;; dependencies = [
+;;     "numpy",
+;;     "torch",
+;; ]
+;;
+;; [tool.uv]
+;; python-preference = "only-system"
+;;
+;; [tool.uv.sources]
+;; torch = { index = "pytorch" }
+;; numpy = { index = "pytorch" }
+;;
+;; [[tool.uv.index]]
+;; name = "pytorch"
+;; url = "https://download.pytorch.org/whl/cpu"
+;;
+;; [build-system]
+;; requires = ["setuptools", "wheel"]
+;; build-backend = "setuptools.build_meta"
+;; ```
+;;
+;; Note that we are specifying a custom repository index to get the CPU-only version of Pytorch.
+;; Also we are using the system version of Python to prevent *uv* from trying to install its own version, which lacks the *\_cython* module.
+;; To freeze the dependencies and create a *uv.lock* file, you need to run
+;;
+;; ```bash
+;; uv lock
+;; ```
+;;
+;; You can install the dependencies using
+;; ```bash
+;; uv sync
+;; ```
+;;
+;; In order to access Pytorch from Clojure you need to run the `clj` command via `uv`:
+;;
+;; ```bash
+;; uv run clj
+;; ```
```
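The actor and critic described in this commit could be expressed with the `torch.nn` bindings it imports. The following is an illustrative sketch, not the commit's actual architecture; the observation size of 3, hidden width of 64, and Softplus activation are all assumptions:

```clojure
;; Hedged sketch of the two networks, assuming a 3-element observation
;; vector and a 1-dimensional action sampled from a Beta distribution.
(def actor
  (nn/Sequential
    (nn/Linear 3 64) (nn/Tanh)
    (nn/Linear 64 2) (nn/Softplus)))  ; two positive outputs: the alpha and
                                      ; beta parameters of a Beta distribution

(def critic
  (nn/Sequential
    (nn/Linear 3 64) (nn/Tanh)
    (nn/Linear 64 1)))                ; scalar estimate of cumulative reward

;; Sampling an action; obs stands for a hypothetical observation tensor.
(let [params (actor obs)              ; forward pass through the actor
      dist   (Beta (py/get-item params 0) (py/get-item params 1))]
  (py. dist sample))                  ; action in the interval [0, 1]
```

A Beta distribution is a natural choice here because its support matches the action range of 0 to 1 mentioned earlier.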
