Accelerated on-device training pipeline with an inference-only edge AI co-processor (Hailo). The project evaluates the proposed workflow and compares it with general-purpose edge AI hardware, such as a Raspberry Pi or a GPU-based NVIDIA Jetson. The repository covers model preparation, conversion, on-device training, and runtime profiling.
To simplify reproduction, Python wheels for ONNX Runtime and PyTorch are available at chmura.put.poznan.pl or in the project artifacts repository on Zenodo.
- JetPack: 6.2
- CUDA: 12.6
- Python 3.12.6
- Source: https://github.com/microsoft/onnxruntime
- Branch: v1.22.1
Build command:
./build.sh --config Release --update --build --build_shared_lib --no_kleidiai --skip_tests --enable_training --build_wheel --use_cuda --cuda_home /usr/local/cuda --cuda_version=12.6 --cudnn_home /usr/lib/aarch64-linux-gnu --parallel
- Hailo accelerator
- HailoRT: 4.23.0 (pre-built packages, documentation, and installation steps are available at https://hailo.ai/developer-zone/)
- Python 3.12.6
- Source: https://github.com/MatPiech/onnxruntime
- Branch: v1.22.1-hailo
Build command:
./build.sh --config Release --update --build --build_shared_lib --no_kleidiai --skip_tests --enable_pybind --build_wheel --use_hailo --enable_training --parallel --cmake_extra_defines CMAKE_POLICY_VERSION_MINIMUM=3.5
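After installing a wheel produced by either build, it can be useful to confirm that the expected execution provider was compiled in. The sketch below is an illustration rather than part of the repository: `missing_providers` and `check_installed_wheel` are hypothetical helpers, `CUDAExecutionProvider` is the standard ONNX Runtime name, and `HailoExecutionProvider` (mentioned in the note after the code) is an assumption about the name the Hailo fork registers.

```python
"""Check which ONNX Runtime execution providers an installed wheel exposes."""


def missing_providers(available, required):
    """Return the required provider names that are absent, in the given order."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


def check_installed_wheel(required):
    """Query the installed onnxruntime package; return missing provider names."""
    # Imported lazily so missing_providers stays usable without onnxruntime.
    import onnxruntime as ort

    return missing_providers(ort.get_available_providers(), required)
```

On a Jetson, `check_installed_wheel(["CUDAExecutionProvider"])` should return an empty list if the CUDA build succeeded. On the Hailo platform, the same call with the assumed name `HailoExecutionProvider` may need adjusting to whatever name the fork actually reports.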
To avoid dependency conflicts, it is recommended to create two virtualenvs, e.g., hailo (for conversion from ONNX to the Hailo HAR and HEF formats) and ort (for conversion from PyTorch to ONNX and for generating ONNX Runtime training graphs).
hailo virtualenv:
- Use Python 3.10 due to Hailo Dataflow Compiler Python bindings requirements.
- Install HailoRT (4.23.0) and Hailo Dataflow Compiler (3.31.0) - follow the instructions at https://hailo.ai/developer-zone/.
- Install repository dependencies:
pip install -e .
ort virtualenv:
- Keep the same Python version (3.10) for compatibility with the hailo virtualenv.
- Install repository dependencies:
pip install -e .
- Build or install onnxruntime-training with HailoRT provider support to enable conversion of ONNX models with HailoOp to training graphs compatible with ONNX Runtime training APIs:
pip install onnxruntime_training-1.22.1+hailo*.whl
Model conversion relies on the notebooks in notebooks/ and the helper script scripts/change_model_output_features.py.
- Create a PyTorch model (from PyTorch Image Models - timm) and export to ONNX: notebooks/create_onnx_model.ipynb (ort virtualenv).
- Convert ONNX for Hailo (quantization/optimization): notebooks/create_onnx_hailoop_model.ipynb (hailo virtualenv).
- Generate training artifacts for ONNX Runtime: notebooks/generate_training_graph.ipynb (ort virtualenv).
The conversion flow typically produces artifacts under artifacts/<model_name>/... used by the runtime scripts.
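The training-graph step can be sketched with the `onnxruntime.training.artifacts.generate_artifacts` API. This is a minimal illustration under assumptions, not the notebook's actual code: the helper names and the `trainable_prefixes` argument are hypothetical, the choice of cross-entropy loss and AdamW is assumed, and the file names checked are the defaults that `generate_artifacts` writes when no prefix is given.

```python
from pathlib import Path

# Default file names written by generate_artifacts when no prefix is passed.
EXPECTED_ARTIFACTS = (
    "training_model.onnx",
    "eval_model.onnx",
    "optimizer_model.onnx",
    "checkpoint",
)


def split_parameters(param_names, trainable_prefixes):
    """Split parameter names into (requires_grad, frozen) by name prefix."""
    prefixes = tuple(trainable_prefixes)
    trainable = [n for n in param_names if n.startswith(prefixes)]
    frozen = [n for n in param_names if not n.startswith(prefixes)]
    return trainable, frozen


def missing_artifacts(model_dir):
    """Return the expected artifact names that are not present in model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).exists()]


def generate_training_artifacts(onnx_path, trainable_prefixes, out_dir):
    """Generate training artifacts, training only the selected parameters."""
    # Lazy imports: these need the onnx and onnxruntime-training packages.
    import onnx
    from onnxruntime.training import artifacts

    model = onnx.load(onnx_path)
    # Assumption: every initializer is a parameter; real models may need filtering.
    names = [init.name for init in model.graph.initializer]
    trainable, frozen = split_parameters(names, trainable_prefixes)
    artifacts.generate_artifacts(
        model,
        requires_grad=trainable,
        frozen_params=frozen,
        loss=artifacts.LossType.CrossEntropyLoss,  # assumed loss
        optimizer=artifacts.OptimType.AdamW,  # assumed optimizer
        artifact_directory=out_dir,
    )
    return missing_artifacts(out_dir)
```

Training only a classifier head while freezing the backbone is one plausible use of the prefix split; `missing_artifacts` can also be run on its own against an existing model directory before starting a training run.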
- Use Python 3.12 with pip (preferably in a virtualenv).
- Install project dependencies:
pip install -e .[jetson]
- Build or install onnxruntime-training with CUDA support.
pip install onnxruntime_training-1.22.1+cu12*.whl
- Use Python 3.12 with pip (preferably in a virtualenv).
- Install HailoRT and its Python bindings following the instructions at https://hailo.ai/developer-zone/.
- Install project dependencies:
pip install -e .[rpi]
- Build or install the onnxruntime-training package for the target hardware (CPU or Hailo, depending on the platform):
pip install onnxruntime_training-1.22.1+*.whl
Run on-device training using ONNX Runtime training APIs:
python scripts/onnx_runtime_training.py \
--model-dir <model-dir> \
--data-path <dataset-path> \
--device <cpu / cuda / hailo> \
--epochs 1 \
--batch-size 1 \
--train
Use --device cuda on a GPU-based Jetson, or --device hailo with the Hailo provider when available.
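A training run of this kind can be sketched with the `onnxruntime.training.api` classes. The following is an illustration under assumptions, not the contents of scripts/onnx_runtime_training.py: the function names are hypothetical, the artifact file names are the `generate_artifacts` defaults, and the `device` string is passed straight to `Module`, for which upstream ONNX Runtime documents `cpu` and `cuda`. How the Hailo build selects its provider is not shown here.

```python
from pathlib import Path


def iter_batches(samples, labels, batch_size):
    """Yield (samples, labels) slices of at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size], labels[start:start + batch_size]


def train(model_dir, samples, labels, device="cpu", epochs=1, batch_size=1):
    """Run a basic training loop and return the mean loss of each epoch."""
    # Lazy imports: these need numpy and an onnxruntime-training wheel.
    import numpy as np
    from onnxruntime.training.api import CheckpointState, Module, Optimizer

    root = Path(model_dir)
    state = CheckpointState.load_checkpoint(str(root / "checkpoint"))
    module = Module(
        str(root / "training_model.onnx"),
        state,
        str(root / "eval_model.onnx"),
        device=device,
    )
    optimizer = Optimizer(str(root / "optimizer_model.onnx"), module)

    epoch_losses = []
    for _ in range(epochs):
        module.train()
        losses = []
        for x, y in iter_batches(samples, labels, batch_size):
            # Forward and backward pass; assumes the loss is the only graph output.
            loss = module(np.asarray(x, dtype=np.float32), np.asarray(y, dtype=np.int64))
            optimizer.step()  # apply the accumulated gradients
            module.lazy_reset_grad()  # clear gradients before the next batch
            losses.append(float(np.asarray(loss).mean()))
        epoch_losses.append(sum(losses) / max(len(losses), 1))
    return epoch_losses
```

The batching helper is kept separate from the runtime calls so it can be checked without any accelerator present; the input dtypes (float32 images, int64 labels) are assumptions that fit a cross-entropy training graph.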
For native PyTorch training on CPU or CUDA:
python scripts/pytorch_training.py \
--model-path <model-file>.pth \
--data-path <dataset-path> \
--device <cpu / cuda> \
--epochs 1 \
--batch-size 1 \
--train
Training accuracy vs. time curves for CIFAR-100 and Oxford-IIIT Pet across four models, comparing the Raspberry Pi CPU, Jetson Orin Nano, and Raspberry Pi + Hailo-8L.
Bar charts comparing CIFAR-100 and Oxford-IIIT Pet test accuracy across models, showing how different Hailo int8 quantization and performance recovery strategies restore accuracy relative to the Raspberry Pi fp32 baseline.
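Accuracy-versus-time curves like those described above can be collected by timestamping an evaluation after each epoch. The sketch below is a hypothetical illustration, not the repository's profiling code: `record_curve`, `train_epoch`, and `evaluate` are assumed names, and the clock is injectable so the logic can be checked without real training.

```python
import time


def record_curve(train_epoch, evaluate, epochs, clock=time.perf_counter):
    """Return [(elapsed_seconds, accuracy), ...] with one point per epoch.

    Only training time is accumulated, so evaluation cost does not skew the curve.
    """
    points = []
    elapsed = 0.0
    for epoch in range(epochs):
        start = clock()
        train_epoch(epoch)
        elapsed += clock() - start
        points.append((elapsed, evaluate()))
    return points
```

Excluding evaluation time from the x-axis is a design choice made here for illustration; a wall-clock curve that includes evaluation would simply take one timestamp at the start of the run instead.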



