Add chunk infer #13
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d7a7519040
            chunk_list, batch_first=True
        ).transpose(1, 2)

        feature_lens_after_cnn = self._compute_downsampled_length(chunk_lengths)
Validate chunk window against downsample rounding
When `n_window` is changed so that `chunk_frames` is not a multiple of 8 (the combined factor of the three stride-2 convs), for example `n_window=201`, which produces 402-frame chunks, computing downsampled lengths per chunk here makes the encoder emit `sum(ceil(chunk_len / 8))` audio tokens. The processor still creates placeholders with `MossAudioProcessor._conv3_downsample_len(raw_len)` over the whole audio. An 804-frame input therefore yields 2 × ceil(402 / 8) = 102 embeds but only ceil(804 / 8) = 101 `audio_input_mask` slots, which triggers the later "Audio token count mismatch" error. Either constrain/validate the chunk size or make placeholder counting use the same chunked length calculation.
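The arithmetic above can be reproduced with a small standalone sketch. It assumes each stride-2 conv maps a length `n` to `ceil(n / 2)`, which matches the `ceil(chunk_len / 8)` behaviour described in this comment. The helper names `conv3_downsample_len`, `chunked_downsample_len`, and `validate_chunk_frames` are hypothetical stand-ins, not the repository's actual functions.

```python
def conv3_downsample_len(n_frames: int) -> int:
    """Length after three stride-2 convs, assuming each maps n -> ceil(n / 2)."""
    for _ in range(3):
        n_frames = (n_frames + 1) // 2  # integer ceil(n / 2)
    return n_frames


def chunked_downsample_len(n_frames: int, chunk_frames: int) -> int:
    """Sum of per-chunk downsampled lengths (the chunked encoder's token count)."""
    total = 0
    for start in range(0, n_frames, chunk_frames):
        chunk_len = min(chunk_frames, n_frames - start)
        total += conv3_downsample_len(chunk_len)
    return total


def validate_chunk_frames(chunk_frames: int, factor: int = 8) -> None:
    """Hypothetical guard: reject chunk sizes not aligned with the downsample factor."""
    if chunk_frames % factor != 0:
        raise ValueError(
            f"chunk_frames={chunk_frames} must be a multiple of {factor} so that "
            "per-chunk and whole-audio token counts agree"
        )


if __name__ == "__main__":
    # Misaligned case from this comment: 402-frame chunks over an 804-frame input.
    print(chunked_downsample_len(804, 402), conv3_downsample_len(804))
    # Aligned case: 400-frame chunks over an 800-frame input.
    print(chunked_downsample_len(800, 400), conv3_downsample_len(800))
```

Under these assumptions, either fix suggested here works: call something like `validate_chunk_frames` when the chunk size is configured, or have the processor count placeholders with the chunked calculation instead of the whole-audio one.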