Add MultiViewLightGBM model #430
df5be1f to 61e15af
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## development #430 +/- ##
===============================================
+ Coverage 80.34% 80.92% +0.58%
===============================================
Files 101 103 +2
Lines 8171 8336 +165
===============================================
+ Hits 6565 6746 +181
+ Misses 1606 1590 -16
max_depth:
  - 10
num_leaves:
  - 63
  - 127
subsample:
  - 0.8
colsample_bytree:
  - 0.6
  - 0.8
reg_alpha:
  - 0
  - 1
Comment by Claude because I'm not an expert on LightGBM and its standard parameters, but maybe worth looking into:
num_leaves vs max_depth: max_depth is effectively inactive
This is the main issue. LightGBM grows trees leaf-wise, so num_leaves is the primary complexity control; max_depth is secondary and mainly serves as a guardrail. The rule of thumb is:
num_leaves < 2^max_depth
With max_depth: 10, a tree could have up to 2^10 = 1024 leaves. Your num_leaves values of 63 and 127 are well below that, so there's no hard conflict. But if you intend max_depth: 10 to constrain complexity, it's effectively doing nothing here, since num_leaves is already much smaller. You'd be better off either:
- Dropping max_depth and just tuning num_leaves, or
- Setting max_depth to something tighter like 6 or 7 if you actually want it to bite
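The rule of thumb above can be checked with a few lines of arithmetic. This is only a sketch: `leaf_cap` and `depth_caps_leaf_count` are hypothetical helper names, not part of LightGBM or drevalpy, and the check looks only at the leaf count (a depth limit can still restrict the shape of a leaf-wise tree even when the count fits).

```python
def leaf_cap(max_depth: int) -> int:
    """Maximum number of leaves a binary tree of the given depth can hold."""
    return 2 ** max_depth


def depth_caps_leaf_count(num_leaves: int, max_depth: int) -> bool:
    """Return True when the depth limit, not num_leaves, bounds the leaf count."""
    return num_leaves > leaf_cap(max_depth)


# num_leaves values come from the grid under review; max_depth=6 is the
# tighter alternative suggested in the comment.
for max_depth in (10, 6):
    for num_leaves in (63, 127):
        capped = depth_caps_leaf_count(num_leaves, max_depth)
        status = "caps" if capped else "does not cap"
        print(
            f"max_depth={max_depth} (cap {leaf_cap(max_depth)}), "
            f"num_leaves={num_leaves}: max_depth {status} the leaf count"
        )
```

With max_depth: 10 neither num_leaves value reaches the cap of 1024, while max_depth: 6 (cap 64) would limit the num_leaves: 127 setting.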
num_leaves: 127 can be aggressive
127 leaves is fairly complex: fine for large datasets, but on a small or medium dataset it may give the model too much capacity. Worth adding a smaller value like 31 (the LightGBM default) to the search.
reg_lambda: 0.1 only
You're searching reg_alpha (L1) over two values but reg_lambda (L2) over only one. LightGBM defaults reg_lambda to 0.0, so 0.1 already adds regularization, but it's worth also trying 0 and maybe 1 here, for symmetry with how you're treating reg_alpha.
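To get a feel for what the suggested additions would cost, here is a rough sketch that counts grid combinations. It assumes the grid is searched exhaustively and uses only the values visible in the diff hunk plus the reg_lambda: 0.1 mentioned above; the real hyperparameters.yaml entry may contain more parameters. `grid_size` is a hypothetical helper.

```python
from itertools import product

# Values visible in the reviewed hunk, plus reg_lambda: 0.1 from the comment.
current_grid = {
    "max_depth": [10],
    "num_leaves": [63, 127],
    "subsample": [0.8],
    "colsample_bytree": [0.6, 0.8],
    "reg_alpha": [0, 1],
    "reg_lambda": [0.1],
}

# Hypothetical grid with the suggested extra values added.
suggested_grid = {
    **current_grid,
    "num_leaves": [31, 63, 127],
    "reg_lambda": [0, 0.1, 1],
}


def grid_size(grid: dict) -> int:
    """Count the combinations an exhaustive grid search would evaluate."""
    return len(list(product(*grid.values())))


print(grid_size(current_grid))    # 8
print(grid_size(suggested_grid))  # 36
```

Under these assumptions the suggestions grow the search from 8 to 36 combinations, so dropping max_depth or one reg_lambda value may be worth it if training time matters.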
xgboost = { version = "^3.2.0", optional = true }
lightgbm = { version = "^4.0.0", optional = true }
typer = ">=0.26,<0.27"
rich = "^15.0.0"
gseapy = { version = "^1.1.0", optional = true }
Just seeing this now: any particular reason why the xgboost, lightgbm, and gseapy requirements HAVE to be this version? I'd prefer >= for easier maintenance.
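For context on the question above: assuming the project uses Poetry's caret semantics, a constraint like `^3.2.0` is not an exact pin but a range with an upper bound at the next major version. The sketch below expands such constraints; `caret_to_range` is a hypothetical helper written for illustration, not a Poetry API, and it only handles full MAJOR.MINOR.PATCH versions.

```python
def caret_to_range(spec: str) -> str:
    """Expand a caret constraint such as "^3.2.0" into explicit bounds.

    The upper bound bumps the left-most non-zero component.
    """
    if not spec.startswith("^"):
        raise ValueError(f"not a caret constraint: {spec!r}")
    major, minor, patch = (int(part) for part in spec[1:].split("."))
    if major > 0:
        upper = f"{major + 1}.0.0"
    elif minor > 0:
        upper = f"0.{minor + 1}.0"
    else:
        upper = f"0.0.{patch + 1}"
    return f">={major}.{minor}.{patch},<{upper}"


# Constraints taken from the diff hunk under review.
for name, spec in [("xgboost", "^3.2.0"), ("lightgbm", "^4.0.0"), ("gseapy", "^1.1.0")]:
    print(f"{name}: {spec} -> {caret_to_range(spec)}")
```

So switching to a plain `>=` would keep the lower bound and drop the `<next major` cap.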
61e15af to 89f0cab
Thanks! Could you also run the model and add it to the leaderboard?
Description
Add MultiViewLightGBM model. Supports gene expression, methylation, mutations, copy number variation, and proteomics, as well as drug fingerprints.
Changes
New features
- MultiViewLightGBM model
- drevalpy/models/__init__.py
- hyperparameters.yaml
- lightgbm as optional dependency in pyproject.toml
- lightgbm to nox test session extras in noxfile.py
- MultiViewLightGBM to baseline tests