
Save grid hash and path information when generating FK Tables #247

Open
kamillaurent wants to merge 3 commits into main from hash_theory

@kamillaurent kamillaurent commented Apr 28, 2026

When an FK Table is generated, the hash and path of the pineappl grid used for it are saved in the metadata of the FK Table.

This information can be read using the pineappl CLI, for example:

pineappl read --show ATLAS_WZ_TOT_13TEV-ATLASWZTOT13TEV81PB_Z_tot.pineappl.lz4

prints, among other information, this line:

grid_files: {"ATLAS_WZ_TOT_13TEV-ATLASWZTOT13TEV81PB_Z_tot.pineappl": {"hash": "f1110fd0bb8ccddf28bf722f9f9c3dd5", "path": "/data/theorie/klaurent/FKTables/pineko/data/grids/41000000/ATLAS_WZ_TOT_13TEV-ATLASWZTOT13TEV81PB_Z_tot.pineappl.lz4"}}
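As a rough illustration of how the `grid_files` entry could be consumed downstream, the sketch below decodes the value printed above with Python's standard `json` module. It assumes the value is stored as a JSON string exactly as shown; the helper name `parse_grid_files` is hypothetical and is not part of pineko or pineappl.

```python
from __future__ import annotations

import json

# Value of the `grid_files` metadata key, copied from the output above.
GRID_FILES_VALUE = (
    '{"ATLAS_WZ_TOT_13TEV-ATLASWZTOT13TEV81PB_Z_tot.pineappl": '
    '{"hash": "f1110fd0bb8ccddf28bf722f9f9c3dd5", '
    '"path": "/data/theorie/klaurent/FKTables/pineko/data/grids/41000000/'
    'ATLAS_WZ_TOT_13TEV-ATLASWZTOT13TEV81PB_Z_tot.pineappl.lz4"}}'
)


def parse_grid_files(value: str) -> dict[str, dict[str, str]]:
    """Hypothetical helper: decode the JSON string stored under `grid_files`."""
    return json.loads(value)


if __name__ == "__main__":
    # One entry per grid: the key is the grid name, the value holds hash and path.
    for name, info in parse_grid_files(GRID_FILES_VALUE).items():
        print(name, info["hash"], info["path"])
```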

This PR is related to issue #225

Member

@scarlehoff scarlehoff left a comment

LGTM.
(haven't tested it, but I trust the OP is a real-life test)

Comment thread src/pineko/evolve.py Outdated
if grid_path is not None:
    grid_path_obj = pathlib.Path(grid_path)
    grid_hash = hashlib.md5(grid_path_obj.read_bytes()).hexdigest()
    grid_files = {grid_path_obj.stem: {"hash": grid_hash, "path": str(grid_path_obj.resolve())}}
Member

Suggested change
-    grid_files = {grid_path_obj.stem: {"hash": grid_hash, "path": str(grid_path_obj.resolve())}}
+    grid_files = {grid_path_obj.stem: {"hash": grid_hash, "theory_folder": grid_path_obj.parent.name}}

Out of privacy concerns, I would just store the path starting from the theory_slim folder.

My suggestion is, since you have the grid name as the key, save just the theory folder name.

Contributor

Agreed on the privacy, but I would keep the resolve(), because I think this way we can resolve symbolic links, right? I.e. .resolve().parent (or similar)
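To make the symlink point concrete, here is a minimal sketch using only the standard library. It builds a throwaway directory layout (the folder name `links` and the helper `theory_folder` are made up for illustration) and compares the parent folder of a symlinked grid before and after `resolve()`. It assumes a platform where creating symlinks is permitted.

```python
from __future__ import annotations

import pathlib
import tempfile


def theory_folder(grid_path: str | pathlib.Path) -> str:
    """Hypothetical helper: name of the folder that holds the resolved grid."""
    return pathlib.Path(grid_path).resolve().parent.name


def demo() -> tuple[str, str]:
    """Return the parent folder name without and with symlink resolution."""
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        # The real grid lives in a theory folder.
        real = root / "41000000" / "grid.pineappl.lz4"
        real.parent.mkdir()
        real.write_bytes(b"dummy")
        # A symlink to the grid lives in an unrelated folder.
        link = root / "links" / "grid.pineappl.lz4"
        link.parent.mkdir()
        link.symlink_to(real)
        return link.parent.name, theory_folder(link)


if __name__ == "__main__":
    print(demo())  # ('links', '41000000')
```

Without `resolve()`, the parent of the symlink is reported; with it, the folder of the actual grid file is used, which is what the theory name should reflect.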

@scarlehoff
Member

Hi @kamillaurent, please make the small change requested and fix the issues found by pre-commit (this usually just means you didn't run pre-commit; if you make the changes, then git add the evolve.py file and run pre-commit, it will be fixed).

Contributor

@felixhekhorn felixhekhorn left a comment

Remember the other things we discussed:

  • split in two simple keys "grid_path" and "grid_hash"
  • address FONLL magic

@scarlehoff
Member

@kamillaurent please address the changes and run pre-commit so that we can merge this

@scarlehoff
Member

There are a few conflicts due to having merged @Radonirinaunimi's PR, but they are in the documentation of the functions, so they should be quick to fix. It would be good to have and use this PR for the next batch of FK tables so we have this information in 😅

Kamil Laurent added 2 commits May 26, 2026 09:57
@kamillaurent
Author

I have now made the information in the metadata more readable. For example, using the command
pineappl read --show ATLAS_WZ_TOT_13TEV-ATLASWZTOT13TEV81PB_Z_tot.pineappl.lz4 one can find the following entries:

...
grid_hash: f1110fd0bb8ccddf28bf722f9f9c3dd5
grid_path: /data/grids/41000000/ATLAS_WZ_TOT_13TEV-ATLASWZTOT13TEV81PB_Z_tot.pineappl.lz4
grid_theory: 41000000
...

When we use the FONLL procedure, many FK tables get merged into one in the last step. I am working on retaining the hash and theory information of the original FK tables in the merged one.

Comment thread src/pineko/evolve.py
Comment on lines +117 to +131
if grid_path is not None:
    grid_path_obj = pathlib.Path(grid_path).resolve()
    grid_hash = hashlib.md5(grid_path_obj.read_bytes()).hexdigest()
    grid_path_parts = grid_path_obj.parts
    if "pineko" in grid_path_parts:
        pineko_idx = grid_path_parts.index("pineko")
        display_path = str(pathlib.Path("/", *grid_path_parts[pineko_idx + 1 :]))
    elif "data" in grid_path_parts:
        data_idx = grid_path_parts.index("data")
        display_path = str(pathlib.Path("/", *grid_path_parts[data_idx:]))
    else:
        display_path = grid_path_obj.name
    fktable.set_metadata("grid_hash", grid_hash)
    fktable.set_metadata("grid_theory", grid_path_obj.parent.name)
    fktable.set_metadata("grid_path", display_path)
Contributor

  1. This is far too much code duplication - this needs to be encapsulated
  2. do we really need grid_path? this looks fairly complicated ...

Member

Yes, the grid path part is a bit more complicated than needed; data/grids are not important.

Once you have the grid path, the "path of the grid itself" is grid_path.name and the theory folder grid_path.parent.name. You don't need anything else.
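One way the two review comments could be combined is sketched below: a single helper that encapsulates the hashing and reduces the path handling to `name` and `parent.name`. This is only an illustration, not the code that ended up in the PR. The helper name `grid_metadata` is hypothetical, and the key names simply reuse `grid_hash`, `grid_path` and `grid_theory` from the diff above.

```python
from __future__ import annotations

import hashlib
import pathlib
import tempfile


def grid_metadata(grid_path: str | pathlib.Path) -> dict[str, str]:
    """Hypothetical helper: collect the metadata entries for one grid file."""
    # Resolve first so that symlinked grids report their real theory folder.
    path = pathlib.Path(grid_path).resolve()
    return {
        "grid_hash": hashlib.md5(path.read_bytes()).hexdigest(),
        "grid_path": path.name,  # file name only, no user-specific prefix
        "grid_theory": path.parent.name,  # theory folder, e.g. "41000000"
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        grid = pathlib.Path(tmp) / "41000000" / "grid.pineappl.lz4"
        grid.parent.mkdir()
        grid.write_bytes(b"dummy")
        for key, value in grid_metadata(grid).items():
            # In evolve.py each pair would presumably be passed to
            # fktable.set_metadata(key, value), as in the diff above.
            print(f"{key}: {value}")
```

Keeping the logic in one function means every call site that writes an FK table can share it, which addresses the duplication concern without the `pineko`/`data` path-part lookup.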

3 participants