Update spatialdata #130
…_ist_preprocessing into update_spatialdata
- type: python
pypi: [squidpy, rasterio]
github: [theislab/txsim@dev]
# 1. remove pyarrow when https://github.com/scverse/spatialdata/issues/1007 is fixed.
Somehow this comment ended up here, right?
It doesn't seem very related anymore, though: the zarr issues are fixed by this PR, and I don't see a pyarrow install.
- type: boolean
name: --keep_files
required: true
default: true
I added this argument for development purposes. I hadn't planned to make true the default, so that files aren't left lying around when the loader runs somewhere else. But it's not really important, I guess.
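The trade-off behind the default can be sketched in Python. This is a hypothetical loader (`load_dataset`, the scratch directory, and the dummy file are illustrative, not the component's real code); it only shows that when `keep_files` defaults to false, intermediate files are removed unless a developer explicitly asks to keep them. It reflects the default described in the comment, not the `default: true` in the diff.

```python
import shutil
import tempfile
from pathlib import Path


def load_dataset(keep_files: bool = False) -> Path:
    """Hypothetical loader step: stage downloads in a scratch directory and
    delete them afterwards unless keep_files is set."""
    work_dir = Path(tempfile.mkdtemp(prefix="ist_loader_"))
    try:
        # Placeholder for the real download/extraction; writes one dummy file.
        (work_dir / "raw_download.bin").write_bytes(b"\x00" * 16)
        # ... convert the staged files and write the final output here ...
    finally:
        # Runs even if a step above raises, so failed runs don't leak files.
        if not keep_files:
            shutil.rmtree(work_dir, ignore_errors=True)
    return work_dir
```

With `keep_files=True` the returned directory still holds the staged files for inspection; with the default it no longer exists after the call.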
- name: Inputs
arguments:
- type: string
- type: file
I had major problems in the past when developing this component with the type set to file.
I don't recall exactly what the problem was, but I think things then happen in the background via Nextflow, where I have no insight for debugging, and this was combined with very long download/access times for the files.
del sdata.tables[key]

# raw_ist.zarr stores the metadata table as 'table'; rename to match the output spec
if 'table' in sdata.tables and 'metadata' not in sdata.tables:
I wonder whether we should still assume that 'table' can exist at this stage.
Do I understand correctly that the previous occurrences of 'table' were all renamed to 'metadata' directly in the data processing script, so that we have 'metadata' from the beginning of the pipeline? Or is another 'table' generated in later steps? If the latter, then this is fine.
Otherwise, I guess this fix is only needed because the test data hasn't been updated? In that case, I think it would be better to update the test data.
Ah okay, I see it now! E.g., binning does generate a 'table', so all good then.
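The guard discussed in this thread (`'table' in sdata.tables and 'metadata' not in sdata.tables`) can be isolated as a small helper. This is a sketch written against a generic mutable mapping so it runs on a plain `dict`; `rename_table` is a hypothetical name, and applying it to a real `SpatialData` object assumes `sdata.tables` supports item assignment and `del` (the diff itself already uses `del sdata.tables[key]`).

```python
from collections.abc import MutableMapping


def rename_table(tables: MutableMapping, old: str = "table", new: str = "metadata") -> bool:
    """Move tables[old] to tables[new], but only if old exists and new does not.

    Returns True if a rename happened, False otherwise.
    """
    if old in tables and new not in tables:
        # Insert under the new key first, then drop the old key, so the
        # table is never missing from the mapping if the assignment fails.
        tables[new] = tables[old]
        del tables[old]
        return True
    return False
```

Because the helper is a no-op when 'metadata' already exists, it is safe to call both on data that was renamed upstream and on outputs of steps such as binning that still produce a 'table'.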
transcripts_df = sdata_transcripts["transcripts"].compute()
transcripts_assigned = transcripts_df[transcripts_df["cell_id"] != 0]
cell_shapes = transcripts_assigned.groupby("cell_id")[["x", "y"]].apply(
lambda g: MultiPoint(list(zip(g["x"], g["y"]))).convex_hull
Just out of interest, was this tested with a large number of cells, i.e., does this implementation scale well? (Was it taken from sopa or similar?)
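To make the scaling question concrete, here is a dependency-free sketch of the same per-cell convex hull computation. It swaps shapely's `MultiPoint(...).convex_hull` for Andrew's monotone chain algorithm, so it illustrates the cost structure rather than reproducing the PR's code. Grouping is linear in the number of transcripts N, and a cell with k transcripts costs O(k log k), so the total is O(N log N). In the pandas/shapely version, the per-group Python lambda and the `list(zip(...))` construction likely add a per-cell constant overhead on top of that. The sketch assumes transcripts arrive as `(x, y, cell_id)` tuples and, like the diff, treats `cell_id == 0` as unassigned; `convex_hull` and `cell_hulls` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]


def _cross(o: Point, a: Point, b: Point) -> float:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Iterable[Point]) -> List[Point]:
    """Andrew's monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))  # O(k log k); also drops duplicate points
    if len(pts) <= 2:
        return pts  # degenerate "hull": a single point or a segment
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain
    return lower[:-1] + upper[:-1]


def cell_hulls(transcripts: Iterable[Tuple[float, float, int]]) -> Dict[int, List[Point]]:
    """Group transcripts by cell and compute one hull per assigned cell."""
    by_cell: Dict[int, List[Point]] = defaultdict(list)
    for x, y, cell_id in transcripts:
        if cell_id != 0:  # as in the diff, cell_id 0 means "not assigned"
            by_cell[cell_id].append((x, y))
    return {cid: convex_hull(pts) for cid, pts in by_cell.items()}
```

Cells with fewer than three distinct points yield one or two vertices, mirroring how shapely returns a `Point` or `LineString` for degenerate hulls.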
Describe your changes
Upgrade spatialdata and zarr
Checklist before requesting a review
I have performed a self-review of my code
Check the correct box. Does this PR contain:
Proposed changes are described in the CHANGELOG.md
CI Tests succeed and look good!