Commit e2618b0

Merge branch 'master' into copilot/implement-gpu-support
2 parents: 87ee4ed + d2e6535

11 files changed: 780 additions & 86 deletions

docs/musicalgestures/_flow.md

Lines changed: 4 additions & 4 deletions
````diff
@@ -75,15 +75,15 @@ Renders a dense optical flow video of the input video file using `cv2.calcOptica
 
 ### Flow().get_acceleration
 
-[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_flow.py#L249)
+[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_flow.py#L252)
 
 ```python
 def get_acceleration(velocity, fps):
 ```
 
 ### Flow().get_velocity
 
-[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_flow.py#L259)
+[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_flow.py#L262)
 
 ```python
 def get_velocity(
@@ -99,7 +99,7 @@ def get_velocity(
 
 ### Flow().sparse
 
-[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_flow.py#L274)
+[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_flow.py#L277)
 
 ```python
 def sparse(
@@ -137,7 +137,7 @@ Renders a sparse optical flow video of the input video file using `cv2.calcOptic
 
 ### Flow().velocity_meters_per_second
 
-[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_flow.py#L267)
+[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_flow.py#L270)
 
 ```python
 def velocity_meters_per_second(
````
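These _flow.md edits only shift the [[find in source code]] line anchors to match the moved definitions in _flow.py; the documented signatures are unchanged. As a minimal sketch of what a `get_acceleration(velocity, fps)`-style helper computes, assuming a 1-D per-frame velocity array (illustrative only, not the library's actual implementation):

```python
import numpy as np

def get_acceleration_sketch(velocity, fps):
    """Differentiate a per-frame velocity series with respect to time.

    Consecutive frames are 1/fps seconds apart, so the finite
    difference of velocity multiplied by fps approximates
    acceleration in (velocity units) per second.
    """
    return np.diff(velocity) * fps

vel = np.array([1.0, 1.0, 1.0, 2.0])          # constant, then a jump
acc = get_acceleration_sketch(vel, fps=25.0)  # -> [0., 0., 25.]
```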

docs/musicalgestures/_pose.md

Lines changed: 11 additions & 6 deletions
````diff
@@ -18,7 +18,7 @@ Helper function to automatically download model (.caffemodel) files.
 
 ## pose
 
-[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_pose.py#L14)
+[[find in source code]](https://github.com/fourMs/MGT-python/blob/master/musicalgestures/_pose.py#L30)
 
 ```python
 def pose(
@@ -37,15 +37,20 @@ def pose(
 ```
 
 Renders a video with the pose estimation (aka. "keypoint detection" or "skeleton tracking") overlaid on it.
-Outputs the predictions in a text file containing the normalized x and y coordinates of each keypoints
-(default format is csv). Uses models from the [openpose](https://github.com/CMU-Perceptual-Computing-Lab/openpose) project.
+Outputs the predictions in a text file containing the normalized x and y coordinates of each keypoint
+(default format is csv).
+
+Supports two backends:
+
+- **MediaPipe** (`model='mediapipe'`): Uses Google's MediaPipe Pose which detects 33 landmarks entirely on CPU. Requires the optional `mediapipe` package (`pip install musicalgestures[pose]`). The model file (~8–28 MB) is auto-downloaded on first use and cached in `musicalgestures/models/`.
+- **OpenPose** (`model='body_25'`, `'coco'`, or `'mpi'`): Uses Caffe-based OpenPose models. Model weights (~200 MB) are downloaded on first use.
 
 #### Arguments
 
-- `model` *str, optional* - 'body_25' loads the model trained on the BODY_25 dataset, 'mpi' loads the model trained on the Multi-Person Dataset (MPII), 'coco' loads one trained on the COCO dataset. The BODY_25 model outputs 25 points, the MPII model outputs 15 points, while the COCO model produces 18 points. Defaults to 'body_25'.
-- `device` *str, optional* - Sets the backend to use for the neural network ('cpu' or 'gpu'). Defaults to 'gpu'.
+- `model` *str, optional* - Pose model to use. `'mediapipe'` uses MediaPipe Pose (33 landmarks, model auto-downloaded on first use). `'body_25'` loads the OpenPose BODY_25 model (25 keypoints), `'mpi'` loads the MPII model (15 keypoints), `'coco'` loads the COCO model (18 keypoints). Defaults to 'body_25'.
+- `device` *str, optional* - Sets the backend to use for the neural network ('cpu' or 'gpu'). Ignored when `model='mediapipe'` (MediaPipe always runs on CPU). Defaults to 'gpu'.
 - `threshold` *float, optional* - The normalized confidence threshold that decides whether we keep or discard a predicted point. Discarded points get substituted with (0, 0) in the output data. Defaults to 0.1.
-- `downsampling_factor` *int, optional* - Decides how much we downsample the video before we pass it to the neural network. For example `downsampling_factor=4` means that the input to the network is one-fourth the resolution of the source video. Heaviver downsampling reduces rendering time but produces lower quality pose estimation. Defaults to 2.
+- `downsampling_factor` *int, optional* - Decides how much we downsample the video before we pass it to the neural network. Ignored when `model='mediapipe'`. Defaults to 2.
 - `save_data` *bool, optional* - Whether we save the predicted pose data to a file. Defaults to True.
 - `data_format` *str, optional* - Specifies format of pose-data. Accepted values are 'csv', 'tsv' and 'txt'. For multiple output formats, use list, eg. ['csv', 'txt']. Defaults to 'csv'.
 - `save_video` *bool, optional* - Whether we save the video with the estimated pose overlaid on it. Defaults to True.
````
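A hedged usage sketch of the two documented backends. It assumes `MgVideo` as the package's usual entry point and a hypothetical input file `dance.avi`; only keyword arguments from the argument list above are used:

```python
import musicalgestures

# Hypothetical input file; MgVideo is assumed as the entry point here.
video = musicalgestures.MgVideo('dance.avi')

# MediaPipe backend: 33 landmarks, CPU-only. `device` and
# `downsampling_factor` are ignored for this backend.
video.pose(model='mediapipe', save_data=True, data_format='csv')

# OpenPose backend: Caffe BODY_25 model (25 keypoints), ~200 MB of
# weights downloaded on first use; runs on 'cpu' or 'gpu'.
video.pose(model='body_25', device='cpu', downsampling_factor=2)
```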

docs/musicalgestures/_pose_estimator.md

Lines changed: 9 additions & 9 deletions
````diff
@@ -26,7 +26,7 @@ This module provides:
 * class `PoseEstimator` – an abstract base class (ABC) defining the common
   interface that all pose backends must implement.
 * class `MediaPipePoseEstimator` – a concrete backend powered by Google
-  MediaPipe Pose (33 landmarks, CPU-friendly, zero model download).
+  MediaPipe Pose (33 landmarks, CPU-friendly, auto-downloads model on first use).
 * class `OpenPosePoseEstimator` – a thin wrapper around the legacy OpenPose /
   Caffe-model implementation already present in :mod:[Pose](_pose.md#pose).
 
@@ -56,30 +56,30 @@ class MediaPipePoseEstimator(PoseEstimator):
     model_complexity: int = 1,
     min_detection_confidence: float = 0.5,
     min_tracking_confidence: float = 0.5,
-    static_image_mode: bool = False,
 ) -> None:
 ```
 
-Pose estimator backed by Google MediaPipe Pose.
+Pose estimator backed by Google MediaPipe Pose (Tasks API).
 
-Requires the optional ``mediapipe`` package
+Requires the optional ``mediapipe>=0.10`` package
 
 ```python
 pip install musicalgestures[pose]
 ```
 
+The first time you use a given complexity level the corresponding
+`.task` model file (~8–28 MB) is downloaded from Google's model
+storage and cached in `musicalgestures/models/`.
+
 Parameters
 ----------
 model_complexity:
-    MediaPipe model complexity (0, 1, or 2). Higher = more accurate
-    but slower. Default: 1.
+    MediaPipe model complexity (0 = lite, 1 = full, 2 = heavy).
+    Higher values are more accurate but slower. Default: 1.
 min_detection_confidence:
     Minimum confidence for initial body detection. Default: 0.5.
 min_tracking_confidence:
     Minimum confidence for landmark tracking. Default: 0.5.
-static_image_mode:
-    If *True*, treat every frame as a static image (no tracking).
-    Default: False.
 
 Examples
 --------
````
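The Examples section is cut off in this diff. A minimal construction sketch based solely on the signature shown above (how frames are then fed to the estimator depends on the `PoseEstimator` interface, which is not shown here):

```python
from musicalgestures._pose_estimator import MediaPipePoseEstimator

# Constructor arguments follow the signature shown in the diff above.
# On first use, the .task model for the chosen complexity (~8-28 MB)
# is fetched and cached in musicalgestures/models/.
estimator = MediaPipePoseEstimator(
    model_complexity=1,            # 0 = lite, 1 = full, 2 = heavy
    min_detection_confidence=0.5,  # initial body detection threshold
    min_tracking_confidence=0.5,   # landmark tracking threshold
)
```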

musicalgestures/__init__.py

Lines changed: 1 addition & 0 deletions
````diff
@@ -15,6 +15,7 @@
     get_length,
     generate_outfilename,
     get_cuda_device_count,
+    show_progress,
 )
 from musicalgestures._mglist import MgList
 
````
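This one-line change re-exports `show_progress` alongside its sibling helpers, making it importable from the package top level. Assuming the import block shown pulls from the internal utils module (where `get_length` and `get_cuda_device_count` also live), the effect is:

```python
# New after this commit: the top-level import works.
from musicalgestures import show_progress

# Previously the helper was only reachable via the internal module
# (assuming the block above imports from musicalgestures._utils).
from musicalgestures._utils import show_progress
```
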
musicalgestures/_cropvideo.py

Lines changed: 31 additions & 27 deletions
````diff
@@ -197,28 +197,37 @@ def mg_cropvideo_ffmpeg(
 
     if crop_movement.lower() == 'manual':
         if not in_colab():
-
-            # scale_ratio = get_box_video_ratio(filename)
-            # width, height = get_widthheight(filename)
-            # scaled_width, scaled_height = [int(elem * scale_ratio) for elem in [width, height]]
-            # first_frame_as_image = get_first_frame_as_image(filename, pict_format='.jpg')
-
-            # Cropping UI moved to another subprocess to avoid cv2.waitKey crashing Python with segmentation fault on Linux in Terminal
-            import threading
-            import queue
-
-            que = queue.Queue()
-            t = threading.Thread(target=lambda q, arg1:q.put(cropping_window(arg1)), args=(que, filename))
-
-            t.start()
-            t.join()
-
-            w, h, x, y = que.get()
-
-            # x = threading.Thread(target=run_cropping_window, args=(first_frame_as_image, scale_ratio, scaled_width, scaled_height))
-            # run_cropping_window(first_frame_as_image, scale_ratio, scaled_width, scaled_height)
-            # x.start()
-            # x.join()
+            import sys
+            import subprocess
+            import musicalgestures
+
+            scale_ratio = get_box_video_ratio(filename)
+            width, height = get_widthheight(filename)
+            scaled_width, scaled_height = [int(elem * scale_ratio) for elem in [width, height]]
+            first_frame_as_image = get_first_frame_as_image(filename, pict_format='.jpg')
+
+            module_path = os.path.abspath(os.path.dirname(musicalgestures.__file__))
+            pyfile = os.path.join(module_path, '_cropping_window.py')
+
+            result = subprocess.run(
+                [sys.executable, pyfile, first_frame_as_image, str(scale_ratio), str(scaled_width), str(scaled_height)],
+                stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
+            )
+
+            os.remove(first_frame_as_image)
+
+            if result.returncode != 0:
+                raise RuntimeError(
+                    f"Cropping window subprocess failed (exit code {result.returncode}):\n{result.stderr}"
+                )
+
+            res = result.stdout.strip()
+            res_array = res.split(' ')
+            if len(res_array) != 4:
+                raise RuntimeError(
+                    f"Unexpected output from cropping window: '{res}'"
+                )
+            w, h, x, y = [int(elem) for elem in res_array]
 
         else:
             x, y, w, h = manual_text_input()
@@ -228,11 +237,6 @@ def mg_cropvideo_ffmpeg(
 
     cropped_video = crop_ffmpeg(filename, w, h, x, y, target_name=target_name, overwrite=overwrite)
 
-    # if crop_movement.lower() == 'manual':
-    #     cv2.destroyAllWindows()
-    #     if not in_colab():
-    #         os.remove(first_frame_as_image)
-
     return cropped_video
 
 
````
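The parent now launches `_cropping_window.py` in a separate interpreter and parses exactly four space-separated integers (`w h x y`) from its stdout, which sidesteps the `cv2.waitKey` segfault the old comment mentions. The script itself is not part of this diff; the following is only a sketch of the contract it must honor, assuming it uses OpenCV's built-in ROI selector:

```python
# _cropping_window.py (contract sketch). The real script ships with the
# package; this only illustrates the argv/stdout protocol the parent
# expects: read image path + scaling info, print "w h x y", exit 0.
import sys
import cv2

image_path = sys.argv[1]
scale_ratio = float(sys.argv[2])
scaled_width, scaled_height = int(sys.argv[3]), int(sys.argv[4])

# Show the (downscaled) first frame and let the user draw a box.
frame = cv2.imread(image_path)
frame = cv2.resize(frame, (scaled_width, scaled_height))
x, y, w, h = cv2.selectROI('Select crop area', frame)  # blocks until Enter
cv2.destroyAllWindows()

# Map the selection back to source resolution; the real script
# presumably does something like this with scale_ratio.
w, h, x, y = [int(round(v / scale_ratio)) for v in (w, h, x, y)]

# The parent parses exactly four space-separated integers: w h x y.
print(f'{w} {h} {x} {y}')
```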
