Save hash and path grid informations when generating FK Tables#247
Save hash and path grid informations when generating FK Tables#247kamillaurent wants to merge 3 commits into
Conversation
scarlehoff
left a comment
There was a problem hiding this comment.
LGTM.
(haven't tested it, but I trust the OP is a real-life test)
| if grid_path is not None: | ||
| grid_path_obj = pathlib.Path(grid_path) | ||
| grid_hash = hashlib.md5(grid_path_obj.read_bytes()).hexdigest() | ||
| grid_files = {grid_path_obj.stem: {"hash": grid_hash, "path": str(grid_path_obj.resolve())}} |
There was a problem hiding this comment.
| grid_files = {grid_path_obj.stem: {"hash": grid_hash, "path": str(grid_path_obj.resolve())}} | |
| grid_files = {grid_path_obj.stem: {"hash": grid_hash, "theory_folder": grid_path_obj.parent.name}} |
Out of privacy concerns, I would just store the path starting from the theory_slim folder.
My suggestion is, since you have the grid name as the key, save just the theory folder name.
There was a problem hiding this comment.
agreed on the privacy, but I think I would keep the resolve(), because I think this way we can resolve symbolic links, right? i.e. .resolve().parent (or similar)
|
Hi @kamillaurent please do the small change requested and fix the issues found by pre-commit (usually just means you didn't run pre-commit, if you do the changes then git add the |
felixhekhorn
left a comment
There was a problem hiding this comment.
Remember the other things we discussed:
- split in two simple keys "grid_path" and "grid_hash"
- address FONLL magic
| if grid_path is not None: | ||
| grid_path_obj = pathlib.Path(grid_path) | ||
| grid_hash = hashlib.md5(grid_path_obj.read_bytes()).hexdigest() | ||
| grid_files = {grid_path_obj.stem: {"hash": grid_hash, "path": str(grid_path_obj.resolve())}} |
There was a problem hiding this comment.
agreed on the privacy, but I think I would keep the resolve(), because I think this way we can resolve symbolic links, right? i.e. .resolve().parent (or similar)
|
@kamillaurent please address the changes and run pre-commit so that we can merge this |
|
There's a few conflicts due to having merge @Radonirinaunimi's PR but they are in the documentation of the functions so it should be quick to fix. It would be good to have and use this PR for the next batch of fk tables so we have this information in 😅 |
…etadata, keeping path to grids from pineko
|
Now I made the informations in the metadata more readable. For example, using the command When we use the FONLL procedure, in the last step many FK-tables get merged into one. I am working on retaining the info about hash and theory of the original FKs into the merged one. |
| if grid_path is not None: | ||
| grid_path_obj = pathlib.Path(grid_path).resolve() | ||
| grid_hash = hashlib.md5(grid_path_obj.read_bytes()).hexdigest() | ||
| grid_path_parts = grid_path_obj.parts | ||
| if "pineko" in grid_path_parts: | ||
| pineko_idx = grid_path_parts.index("pineko") | ||
| display_path = str(pathlib.Path("/", *grid_path_parts[pineko_idx + 1 :])) | ||
| elif "data" in grid_path_parts: | ||
| data_idx = grid_path_parts.index("data") | ||
| display_path = str(pathlib.Path("/", *grid_path_parts[data_idx:])) | ||
| else: | ||
| display_path = grid_path_obj.name | ||
| fktable.set_metadata("grid_hash", grid_hash) | ||
| fktable.set_metadata("grid_theory", grid_path_obj.parent.name) | ||
| fktable.set_metadata("grid_path", display_path) |
There was a problem hiding this comment.
- This is far too much code duplication - this needs to be encapsulated
- do we really need
grid_path? this looks fairly complicated ...
There was a problem hiding this comment.
Yes, the grids path part is a bit more complicated than needed, data/grids are not important.
Once you have the grid path, the "path of the grid itself" is grid_path.name and the theory folder grid_path.parent.name. You don't need anything else.
When a FK Table is generated, the hash information and path of the
pineapplgrid used for it are saved in the metadata of the FK Table.These information can be read using the
pinealppl-cli, for example:pineappl read --show ATLAS_WZ_TOT_13TEV-ATLASWZTOT13TEV81PB_Z_tot.pineappl.lz4outputs, between the other informations, this lines:
This PR is related to issue #225