Skip to content

Update spatialdata#130

Open
dariarom94 wants to merge 19 commits into
mainfrom
update_spatialdata
Open

Update spatialdata#130
dariarom94 wants to merge 19 commits into
mainfrom
update_spatialdata

Conversation

@dariarom94
Copy link
Copy Markdown
Contributor

Describe your changes

Upgrade spatialdata and zarr

Checklist before requesting a review

  • I have performed a self-review of my code

  • Check the correct box. Does this PR contain:

    • [ x] Breaking changes
    • New functionality
    • Major changes
    • Minor changes
    • Bug fixes
  • Proposed changes are described in the CHANGELOG.md

  • CI Tests succeed and look good!

- type: python
pypi: [squidpy, rasterio]
github: [theislab/txsim@dev]
# 1. remove pyarrow when https://github.com/scverse/spatialdata/issues/1007 is fixed.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow this comment moved here, right?
But it's not super related anymore? The zarr things are fixed with this PR and pyarrow install I don't see

- type: boolean
name: --keep_files
required: true
default: true
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This argument I brought in for development purposes. Didn't think about setting it to true as default, to not have files laying around when running the loader somewhere else. But it's not really important I guess

- name: Inputs
arguments:
- type: string
- type: file
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had huge problems in the past when developing this component when setting type to file.
I don't recall exactly what was the problem, but I think it was that things then happen in the background via nextflow where I don't have insights to debug, and this was combined with very long download/access times of files

del sdata.tables[key]

# raw_ist.zarr stores the metadata table as 'table'; rename to match the output spec
if 'table' in sdata.tables and 'metadata' not in sdata.tables:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should still assume that 'table' could exist at this stage?
Do I understand correctly that the previous occurrences of 'table' were all renamed to 'metadata' mainly directly in the data processing script, so we have it from the beginning of the pipeline? Or is there another 'table' generated in other steps? If the latter is the case, then fine.
But Otherwise I guess this fix here is because the test data hasn't been updated? Think it would be better to update the test data then

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, I see it now! E.g. in binning we do generate a 'table' - all good then

transcripts_df = sdata_transcripts["transcripts"].compute()
transcripts_assigned = transcripts_df[transcripts_df["cell_id"] != 0]
cell_shapes = transcripts_assigned.groupby("cell_id")[["x", "y"]].apply(
lambda g: MultiPoint(list(zip(g["x"], g["y"]))).convex_hull
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of interest, was this tested with a lot of cells? I.e. does this implementation scale well? (was this taken from sopa or so?)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants