Load pk data #540
…provides a standardized workflow to load, classify, clean, and preprocess pharmacokinetic (PK) data from multiple file formats prior to NCA analysis.
Key Features
Automatic detection of concentration and dose datasets
Flexible column mapping via regex patterns
Support for both separate and combined datasets
BLQ handling with interpolation (linear pre-Cmax, log-linear post-Cmax)
Decimal precision standardization
Minimal assumptions about input data structure
| cowplot, | ||
| ggplot2, | ||
| ggtibble, | ||
| haven, |
Can it work just importing rio and not haven?
| withr | ||
| withr, | ||
| rio, | ||
| zoo, |
I'd like to avoid more than the required number of imports. If possible, please omit zoo and janitor.
| #' Streamlines loading, cleaning, and standardisation of PK data from multiple | ||
| #' file formats (XPT, XLSX, XLS, CSV, TXT, SAS7BDAT). | ||
| #' | ||
| #' @param path Character. Directory containing PK files. Default: \code{getwd()}. |
Please do not default this to getwd(), as that would not typically be reproducible.
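One way to address this is to require the directory explicitly. The following is a minimal sketch in base R, not the code under review: the function name `list_pk_files` is hypothetical, and only the extension list is taken from the documented defaults.

```r
# Hypothetical helper (not part of the PR): `path` has no default, so the
# caller must always state which directory is read.
list_pk_files <- function(path,
                          extensions = c("xpt", "xlsx", "xls", "csv",
                                         "txt", "sas7bdat")) {
  if (missing(path)) {
    stop("`path` must be supplied; no default directory is assumed.")
  }
  if (!dir.exists(path)) {
    stop("Directory does not exist: ", path)
  }
  # Build a pattern such as "\\.(xpt|xlsx|...)$" and match it
  # case-insensitively against the file names in `path`.
  pattern <- paste0("\\.(", paste(extensions, collapse = "|"), ")$")
  list.files(path, pattern = pattern, full.names = TRUE, ignore.case = TRUE)
}
```

Calling `list_pk_files()` with no argument then fails immediately instead of silently reading whatever the working directory happens to be.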
| #' \code{c("xpt","xlsx","xls","csv","txt","sas7bdat")}. | ||
| #' @param patterns Named list. Regex patterns for PK column roles. | ||
| #' See \code{\link{get_pk_patterns}}. | ||
| #' @param decimal_control Logical. Apply smart decimal formatting? Default \code{TRUE}. |
Please define what "smart decimal formatting" means.
| #' @param patterns Named list. Regex patterns for PK column roles. | ||
| #' See \code{\link{get_pk_patterns}}. | ||
| #' @param decimal_control Logical. Apply smart decimal formatting? Default \code{TRUE}. | ||
| #' @param blq_handling Logical. Apply BLQ interpolation? Default \code{TRUE}. |
blq_handling should occur entirely within PKNCA. The only part that may happen before PKNCA is conversion (e.g., the text "BLQ", "BLOQ", "BQL", etc. may be converted to 0 for the user). If that is the goal, please change "interpolation" to "conversion" and give a more detailed example of what it means in the details section of this code.
| #' detected. Warns if duplicate time values suggest multiple subjects. | ||
| #' | ||
| #' @keywords internal | ||
| auto_create_subject_id <- function(df, verbose = FALSE) { |
No subject column should be needed (or created) if no subject is given. Please remove this.
| time_col <- get_mapped_column(data, "time") | ||
| conc_col <- get_mapped_column(data, "conc") | ||
| blq_strings <- c("blq", "bloq", "bql", "lloq", "na", "nr", "", |
"na", "nr", "", and "nd" are no-data indicators, not BLQ values. They should be set to NA and not to 0.
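The distinction between BLQ text and no-data text could be sketched as below. This is an illustration in base R under stated assumptions, not the PR's implementation: the function name `convert_blq_text` and its argument names are hypothetical, and the string lists simply split the quoted `blq_strings` vector along the lines requested in this comment.

```r
# Hypothetical helper (not part of the PR): convert a text concentration
# column to numeric, mapping BLQ markers to 0 and no-data markers to NA.
convert_blq_text <- function(x,
                             blq_strings = c("blq", "bloq", "bql", "lloq"),
                             na_strings = c("na", "nr", "nd", "")) {
  # Normalise case and surrounding whitespace before matching.
  key <- tolower(trimws(as.character(x)))
  # Numeric text parses normally; everything else becomes NA for now.
  out <- suppressWarnings(as.numeric(key))
  # BLQ markers become 0 so that PKNCA can apply its own BLQ rules later.
  out[key %in% blq_strings] <- 0
  # No-data markers (and values that were already missing) stay NA.
  out[is.na(x) | key %in% na_strings] <- NA_real_
  out
}

convert_blq_text(c("1.5", "BLQ", " bloq ", "NA", "nd", "", "2"))
```

Note that any unrecognised text also ends up as NA in this sketch; a real loader would probably want to warn about such values rather than drop them silently.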
| #' Interpolate BLQ Values for a Single Subject | ||
| #' | ||
| #' @keywords internal | ||
| interpolate_subject <- function(sub_df, time_col, conc_col, verbose) { |
Please do not interpolate BLQ values. They are important for the PKNCA calculations and are used within the calculation methods based on user preferences there. Please keep the data loading scripts focused on loading the data only.
| #' ensuring the metadata actually persists. | ||
| #' | ||
| #' @keywords internal | ||
| decimal_formatter <- function(df, col_max_map, verbose) { |
Please do not reformat the columns, only load the data and categorize the type of data.
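Categorizing a dataset without modifying it might look like the following base R sketch. The function name `classify_pk_dataset` and the regex patterns are assumptions made for illustration; the actual patterns returned by `get_pk_patterns` are not shown in this review.

```r
# Hypothetical helper (not part of the PR): label a data frame by the column
# roles its names match, leaving the data itself untouched.
classify_pk_dataset <- function(df,
                                patterns = list(
                                  conc = "^(conc|aval|dv)",  # assumed pattern
                                  dose = "^(dose|amt)"       # assumed pattern
                                )) {
  # For each role, check whether any column name matches its pattern.
  has_role <- vapply(
    patterns,
    function(p) any(grepl(p, names(df), ignore.case = TRUE)),
    logical(1)
  )
  if (has_role[["conc"]] && has_role[["dose"]]) {
    "combined"
  } else if (has_role[["conc"]]) {
    "concentration"
  } else if (has_role[["dose"]]) {
    "dose"
  } else {
    "unknown"
  }
}

classify_pk_dataset(data.frame(ID = 1, TIME = 0, CONC = 1.2))
```

Because the function only inspects `names(df)`, column values, types, and decimal precision are returned to the caller exactly as loaded.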
| # ============================================================================= | ||
| # 10. Usage Example (wrapped in if (FALSE) so it never auto-runs) | ||
| # ============================================================================= | ||
| if (FALSE) { |
Please do not put examples in the code. Please put this into a vignette.