Skip to content

refactor: ♻️ update shared primary keys#106

Open
martonvago wants to merge 6 commits into
mainfrom
refactor/primary-key
Open

refactor: ♻️ update shared primary keys#106
martonvago wants to merge 6 commits into
mainfrom
refactor/primary-key

Conversation

@martonvago

@martonvago martonvago commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Description

This PR updates the primary keys common to all resources.
Based on my understanding of the situation and our discussions, but very open to comments.
I tried to include all my assumptions.

Closes #95

This PR needs an in-depth review.

Checklist

  • Ran just run-all

description="The unique identifier of the participant.",
constraints=sp.ConstraintsProperties(required=True),
)
event_field = sp.FieldProperties(

@martonvago martonvago Jun 10, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still the unique name of the event that we get from the raw data. There is also a numeric event ID, but we would need to look that up separately using another API request.

As for the relationship between events and visits, my understanding is that some events belong to a visit and some don't. So I don't know if adding a visit_id field to all resources makes sense because sometimes this will be empty. There are also forms with consult or week in the name, so that could be equivalent to visit. Maybe event is a broader category (any interaction with the participant) and visit is a particular type of interaction.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A visit ID should never be empty tho 🤔 Or at least in very few exceptions, like things collected between visits like the adverse events, but that will still have date fields that can line up with the visits. At the least, there should be a visit date.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What field does this relate to? Would we rename from another field? If so, would be good to put as a comment above the field variable.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from the raw redcap_event_name field, which is given for each row of the raw data. Event is the concept REDCap forms are organised around, visit is one kind of event in this study.

Visit ID can be missing if the row doesn't come from an instrument/form that is associated with a visit. I found about 24 forms like this. I looked at the fields for each to see if there is a date. There are date fields in each (e.g., contact_date, date_dietary_deviation_1, date_diet_consult_1), and some are also clearly associated with a study week or with another type of event (e.g., a consultation or a group meeting).

So we could pull out a date from each form, but it could be a bit faffy because we would need to check which field has the date in each. And then for forms that are associated with a visit, we would need to find the correct visit (based on the form name, event name, or field names) and then find the date for that visit. What do you think?

)
default_fields = [event_field, center_field]
primary_key = ["event"]
submission_field = sp.FieldProperties(

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is renamed from redcap_repeat_instance. Not all forms can repeat (= be submitted multiple times for the same participant and the same event), but when they do repeat, this number tells apart the submissions. I put it as a string to match the other IDs.

We can either add this everywhere and set it to 1 where the form doesn't repeat, or add it only for forms that can repeat.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think only for forms that repeat.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I made it like that!
I included downloading the list of repeating instruments in the metadata download, so I can check which ones repeat.

Comment thread scripts/redcap_dict_to_properties.py Outdated
constraints=sp.ConstraintsProperties(required=True),
)
default_fields = [participant_field, event_field, submission_field, center_field]
primary_key = ["participant_id", "event_id", "submission_id"]

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't add an equivalent to redcap_repeat_instrument here because that should be the resource itself. When we transform the resources (like with VAS), we derive an extra PK component from the original instrument/form name, so that should maintain the distinction between rows coming from different instruments/forms.

@martonvago martonvago moved this from Todo to In review in Data development Jun 10, 2026
@martonvago martonvago marked this pull request as ready for review June 10, 2026 16:30
@martonvago martonvago requested a review from a team as a code owner June 10, 2026 16:30

@lwjohnst86 lwjohnst86 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Some questions ☺️

Comment on lines 225 to 243
@@ -234,8 +241,15 @@ def _form_to_resource(
enum=["Copenhagen", "Aarhus", "Odense"],
),
)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field still needs to be more clear what it actually is. How does it relate to visit? Could you try to expand the description to make it clearer what it means? Maybe we can also rename it something clearer 🤔

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that an event is a collection of forms/instruments filled in at the same time. I think an event usually corresponds to a patient interaction (e.g., visit, consultation, group meeting, phone call), but it can also be more abstract (e.g., randomisation). So all visits are events but not all events are visits. I think this is why we initially decided not to rename event to visit, but there may be a better name.

@K-Beicher, is this correct?

description="The unique identifier of the participant.",
constraints=sp.ConstraintsProperties(required=True),
)
event_field = sp.FieldProperties(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A visit ID should never be empty tho 🤔 Or at least in very few exceptions, like things collected between visits like the adverse events, but that will still have date fields that can line up with the visits. At the least, there should be a visit date.

)
default_fields = [event_field, center_field]
primary_key = ["event"]
submission_field = sp.FieldProperties(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think only for forms that repeat.

Comment thread scripts/redcap_dict_to_properties.py Outdated
description="The unique identifier of the participant.",
constraints=sp.ConstraintsProperties(required=True),
)
event_field = sp.FieldProperties(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What field does this relate to? Would we rename from another field? If so, would be good to put as a comment above the field variable.

Comment thread scripts/redcap_dict_to_properties.py
@github-project-automation github-project-automation Bot moved this from In review to In progress in Data development Jun 11, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated all metadata when I downloaded the list of repeating instruments

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we add more requests, it will probably be worth generalising some logic here.

@martonvago martonvago moved this from In progress to In review in Data development Jun 11, 2026
@martonvago martonvago requested a review from lwjohnst86 June 11, 2026 14:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: In review

Development

Successfully merging this pull request may close these issues.

Correct primary key in resources?

2 participants