| Title: | 'REDCap' Data Management |
|---|---|
| Description: | REDCap Data Management - 'REDCap' (Research Electronic Data CAPture; <https://projectredcap.org>) is a web application developed at Vanderbilt University, designed for creating and managing online surveys and databases and the REDCap API is an interface that allows external applications to connect to REDCap remotely, and is used to programmatically retrieve or modify project data or settings within REDCap, such as importing or exporting data. REDCapDM is an R package that allows users to manage data exported directly from REDCap or using an API connection. This package includes several functions designed for pre-processing data, generating reports of queries such as outliers or missing values, and following up on previously identified queries. |
| Authors: | João Carmezim [aut, cre], Pau Satorra [aut], Judith Peñafiel [aut], Esther García [aut], Natàlia Pallarès [aut], Cristian Tebé [aut] |
| Maintainer: | João Carmezim <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 1.0.1 |
| Built: | 2026-06-03 08:48:28 UTC |
| Source: | https://github.com/bruigtp/redcapdm |
This helper function processes the project argument in the several other functions.
It extracts data, dictionary, event form, and results from the project object while handling
potential duplication and providing warnings when arguments are provided redundantly.
check_proj(project, data = NULL, dic = NULL, event_form = NULL)check_proj(project, data = NULL, dic = NULL, event_form = NULL)
project |
A list or object containing |
data |
A data frame (optional) that may be overridden if provided in the |
dic |
A data dictionary (optional) that may be overridden if provided in the |
event_form |
An optional event-form object that may be overridden if provided in the |
Compare an older query report (old) with a newer one (new) and classify each
query into one of four statuses:
Pending — the same query is present in both reports (no change detected),
Solved — the query was present in the old report but is absent from the new report,
New — the query appears in the new report but was not present in the old report,
Miscorrected — a special case where a query in the new report is marked as New
but shares the same Identifier and Description as an existing record (suggesting a
re-issued or modified query for the same identifier).
The function returns a detailed comparison table of queries with a Modification factor (one of the four statuses) and an HTML summary table showing counts per status.
check_queries(old, new, report_title = NULL, return_viewer = TRUE)check_queries(old, new, report_title = NULL, return_viewer = TRUE)
old |
Data frame containing the previous (older) query report. Must include
|
new |
Data frame containing the newer query report. Must include
|
report_title |
Optional single string used as the caption for the HTML summary table.
Defaults to |
return_viewer |
Logical; if |
Requirements:
Both old and new must be data frames.
Both data frames must contain at least the following character columns:
Identifier, Description, and Query.
A Code column is optional; if present it will be preserved and considered
for sorting and output. If Code exists in new, it is removed at the
beginning of the routine to avoid conflicts with re-assigned codes.
The function merges the two reports, constructs composite keys used for comparison, classifies each row into a modification state, detects and re-labels Miscorrected cases, reassigns a Code per Identifier to keep codes consistent, and returns a detailed dataset plus an optional HTML summary viewer.
A list with two elements:
queriesA data frame containing all queries present in either old or new.
A factor column Modification indicates the state for each row (levels: Pending,
Solved, Miscorrected, New). The function also reassigns Code
values so codes are consistent per Identifier.
resultsIf return_viewer = TRUE, an HTML knitr::kable (styled with
kableExtra) summarising totals per state. If return_viewer = FALSE, this is NULL.
Column types: If Identifier, Description or Query are
factors, they will be used in the comparison — it is recommended to convert them to
character prior to calling check_queries() to avoid factor-level mismatches.
Sorting: When Identifier values contain a dash (e.g. "100-20"),
the function attempts to split into numeric center and id parts for
logical ordering. Otherwise Identifier is coerced to numeric for ordering.
Miscorrected detection: A Miscorrected label is assigned when more
than one row shares the same Identifier + Description composite and a row is
otherwise classified as New — this signals a likely re-issued or modified query
for an existing identifier.
# Minimal reproducible example old <- data.frame( Identifier = c("100-1", "100-2", "200-1"), Description = c("age check", "weight check", "lab miss"), Query = c("is.na(age)", "is.na(weight)", "missing lab result"), Code = c("100-1-1", "100-2-1", "200-1-1"), stringsAsFactors = FALSE ) new <- data.frame( Identifier = c("100-1", "200-1", "300-1"), Description = c("age check", "lab miss", "new query"), Query = c("is.na(age)", "missing lab result (clarify)", "is.na(x)"), stringsAsFactors = FALSE ) res <- check_queries(old = old, new = new, report_title = "My Query Comparison") # detailed table head(res$queries) # HTML summary (if in an RMarkdown or interactive viewer) res$results# Minimal reproducible example old <- data.frame( Identifier = c("100-1", "100-2", "200-1"), Description = c("age check", "weight check", "lab miss"), Query = c("is.na(age)", "is.na(weight)", "missing lab result"), Code = c("100-1-1", "100-2-1", "200-1-1"), stringsAsFactors = FALSE ) new <- data.frame( Identifier = c("100-1", "200-1", "300-1"), Description = c("age check", "lab miss", "new query"), Query = c("is.na(age)", "missing lab result (clarify)", "is.na(x)"), stringsAsFactors = FALSE ) res <- check_queries(old = old, new = new, report_title = "My Query Comparison") # detailed table head(res$queries) # HTML summary (if in an RMarkdown or interactive viewer) res$results
This function updates the names of checkboxes in the dataset and dictionary to reflect the names of their options.
checkbox_names(data, dic, labels, checkbox_labels = c("No", "Yes"))checkbox_names(data, dic, labels, checkbox_labels = c("No", "Yes"))
data |
Dataset containing the REDCap data. |
dic |
Dataset containing the REDCap dictionary. |
labels |
Named character vector with the names of the variables in the data and their corresponding REDCap labels. |
checkbox_labels |
Character vector specifying the names for the two options of each checkbox variable. The default is |
A random sample of the COVICAN study. An international, multicentre cohort study of cancer patients with COVID-19 to describe the epidemiology, risk factors, and clinical outcomes of co-infections and superinfections in onco-haematological patients with COVID-19.
data(covican)data(covican)
A data frame with 342 rows and 56 columns
Identifier of each record. This information does not match the real data.
Auto-generated name of the events
Auto-generated name of each center. This information does not match the real data.
Inclusion criteria of 'Patients older than 18 years' (0 = No ; 1 = Yes)
Inclusion criteria of 'Cancer patients' (0 = No ; 1 = Yes)
Inclusion criteria of 'Diagnosed of COVID-19' (0 = No ; 1 = Yes)
Exclusion criteria of 'Solid tumour remission >1 year' (0 = No ; 1 = Yes)
Indicator of non-compliance with inclusion and exclusion criteria (0 = compliance ; 1 = non-compliance)
Date of birth (y-m-d). This date does not correspond to the original.
Date of first visit (y-m-d). This date does not correspond to the original.
Age in years
Indicator of diabetes (0 = No ; 1 = Yes)
Type of diabetes (1 = No complications ; 2 = End-organ diabetes-related disease)
Indicator of chronic obstructive pulmonary disease (0 = No ; 1 = Yes)
Fraction of inspired oxygen in percentage
Indicator of blood test available (0 = No ; 1 = Yes)
Potassium in mmol/L
Respiratory rate in bpm
Indicator of leukemia or lymphoma (0 = No ; 1 = Yes)
Indicator of acute leukemia (0 = No ; 1 = Yes)
Checkbox with the type of underlying disease (0 = Haematological cancer ; 1 = Solid tumour)
Checkbox with the type of underlying disease (1 = Acute myeloid leukemia ; 2 = Myelodysplastic syndrome ; 3 = Chronic myeloid leukaemia ; 4 = Acute lymphoblastic leukaemia ; 5 = Hodgkin lymphoma ; 6 = Non Hodgkin lymphoma ; 7 = Multiple myeloma ; 8 = Myelofibrosis ; 9 = Aplastic anaemia ; 10 = Chronic lymphocytic leukaemia ; 11 = Amyloidosis ; 12 = Other)
Indicator of urine culture: (0 = Not done ; 1 = Done)
Labels of the different variables
List with three data frames: the first one with the data, the second one with the dictionary (codebook) of the REDCap project and the last one with the instrument-event mappings of the REDCap project.
Gudiol, C., Durà-Miralles, X., Aguilar-Company, J., Hernández-Jiménez, P., Martínez-Cutillas, M., Fernandez-Avilés, F., Machado, M., Vázquez, L., Martín-Dávila, P., de Castro, N., Abdala, E., Sorli, L., Andermann, T. M., Márquez-Gómez, I., Morales, H., Gabilán, F., Ayaz, C. M., Kayaaslan, B., Aguilar-Guisado, M., Herrera, F. Royo-Cebrecos C, Peghin M, González-Rico C, Goikoetxea J, Salgueira S, Silva-Pinto A, Gutiérrez-Gutiérrez B, Cuellar S, Haidar G, Maluquer C, Marin M, Pallarès N, Carratalà J. (2021). Co-infections and superinfections complicating COVID-19 in cancer patients: A multicentre, international study. The Journal of infection, 83(3), 306–313. https://doi.org/10.1016/j.jinf.2021.07.014
This function fills all rows in the dataset with the value of a particular variable in a specified event. It is an auxiliary function used in the rd_rlogic function.
fill_data(which_event, which_var, data)fill_data(which_event, which_var, data)
which_event |
String specifying the name of the event. |
which_var |
String specifying the name of the variable. |
data |
Dataset containing the REDCap data. |
This function is used to process checkbox variables in a REDCap dataset. By default, it changes their default categories ("Unchecked" and "Checked") to new ones ("No" and "Yes). Optionally, the function can also evaluate the branching logic for checkbox fields and adjust the data and dictionary accordingly.
rd_checkbox( project = NULL, data = NULL, dic = NULL, event_form = NULL, checkbox_labels = c("No", "Yes"), checkbox_names = TRUE, na_logic = "missing" )rd_checkbox( project = NULL, data = NULL, dic = NULL, event_form = NULL, checkbox_labels = c("No", "Yes"), checkbox_names = TRUE, na_logic = "missing" )
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
checkbox_labels |
Character vector of length 2 for labels of unchecked/checked values. Default: |
checkbox_names |
Logical. If |
na_logic |
Controls how missing values are set based on branching logic. Must be one of |
Checkbox columns are expected in REDCap wide format (field___code).
Branching logic evaluation requires event_form for longitudinal projects.
Names are cleaned and truncated to 60 characters; uniqueness is enforced.
Fields that cannot be evaluated are listed in results.
A list with:
Transformed dataset with checkbox fields as factors and optionally renamed.
Updated dictionary with checkbox fields expanded and optionally renamed.
The event_form passed in (if applicable).
Summary of transformations and any fields needing review.
# Basic usage with a project object res <- rd_checkbox(covican) # With custom labels res <- rd_checkbox(data = covican$data, dic = covican$dictionary, checkbox_labels = c("Not present", "Present")) # Keep original checkbox names res <- rd_checkbox(covican, checkbox_names = FALSE) # Longitudinal project with NA logic res <- rd_checkbox(data = covican$data, dic = covican$dictionary, event_form = covican$event_form,, na_logic = "eval")# Basic usage with a project object res <- rd_checkbox(covican) # With custom labels res <- rd_checkbox(data = covican$data, dic = covican$dictionary, checkbox_labels = c("Not present", "Present")) # Keep original checkbox names res <- rd_checkbox(covican, checkbox_names = FALSE) # Longitudinal project with NA logic res <- rd_checkbox(data = covican$data, dic = covican$dictionary, event_form = covican$event_form,, na_logic = "eval")
Converts date and datetime fields in a REDCap dataset to appropriate R classes.
rd_dates(project = NULL, data = NULL, dic = NULL, event_form = NULL)rd_dates(project = NULL, data = NULL, dic = NULL, event_form = NULL)
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
The function performs the following tasks:
Detects date and datetime fields from the REDCap dictionary (date_* and datetime_* validation types).
Converts date fields to Date class.
Converts datetime fields to POSIXct class, treating empty strings as NA.
A list with the following elements:
The transformed dataset with date and datetime fields formatted as Date and POSIXct.
The original REDCap dictionary passed to the function.
The original event-form mapping (if applicable).
A summary of the transformations performed.
result <- rd_dates(covican) transformed_data <- result$dataresult <- rd_dates(covican) transformed_data <- result$data
Deletes selected variables from a REDCap dataset and its dictionary, keeping them consistent and preserving variable labels.
rd_delete_vars( project = NULL, data = NULL, dic = NULL, event_form = NULL, vars = NULL, pattern = NULL )rd_delete_vars( project = NULL, data = NULL, dic = NULL, event_form = NULL, vars = NULL, pattern = NULL )
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
vars |
Optional. A character vector of variable names to remove from both the dataset and dictionary. |
pattern |
Optional. A character vector of regular expression patterns. Variables matching these patterns will be removed from the dataset and dictionary. |
Ensure that at least one of vars or pattern is specified.
Removes specified variables and their factor versions (e.g., variable.factor) from the dataset.
Removes matching variables from the dictionary.
Warns about factor versions of variables matching patterns, recommending use of rd_factor() if necessary.
A list containing:
The updated dataset with specified variables removed.
The updated REDCap dictionary.
The original event-form mapping (if applicable).
A summary message describing the variable removal operation.
# Delete specific variables by name result <- rd_delete_vars( project = covican, vars = c("potassium", "leuk_lymph") ) # Delete variables matching patterns result <- rd_delete_vars( data = covican$data, dic = covican$dictionary, pattern = c("_complete$", "_other$") )# Delete specific variables by name result <- rd_delete_vars( project = covican, vars = c("potassium", "leuk_lymph") ) # Delete variables matching patterns result <- rd_delete_vars( data = covican$data, dic = covican$dictionary, pattern = c("_complete$", "_other$") )
Updates a REDCap data dictionary by converting branching logic and calculation expressions into valid R expressions. This ensures that conditional display rules and calculated fields in the dictionary can be programmatically evaluated with the dataset. Any variables that cannot be converted are reported in the results.
rd_dictionary(project = NULL, data = NULL, dic = NULL, event_form = NULL)rd_dictionary(project = NULL, data = NULL, dic = NULL, event_form = NULL)
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
The function performs the following tasks:
Evaluates and transforms branching logic expressions into valid R expressions using rd_rlogic.
Evaluates and converts calculation expressions for fields of type calc.
Generates a results summary table listing any variables that could not be converted.
A list with the following elements:
The original dataset passed to the function.
The updated data dictionary with modified branching logic and calculations.
The original event-form mapping (if applicable).
A summary of the transformations, including any variables with unconverted branching logic.
## Not run: result <- rd_dictionary(covican) print(result$results) updated_dic <- result$dictionary ## End(Not run)## Not run: result <- rd_dictionary(covican) print(result$results) updated_dic <- result$dictionary ## End(Not run)
Helps identify records in a REDCap longitudinal project that are missing one or more specified events. Because REDCap typically omits empty events from exports, an event that contains no data for a record will not appear. This function finds those absent events and returns a per-record query table and a summarized HTML report.
rd_event( project = NULL, data = NULL, dic = NULL, event_form = NULL, event, filter = NA, query_name = NA, addTo = NA, report_title = NA, report_zeros = FALSE, link = list() )rd_event( project = NULL, data = NULL, dic = NULL, event_form = NULL, event, filter = NA, query_name = NA, addTo = NA, report_title = NA, report_zeros = FALSE, link = list() )
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
event |
Character vector with one or more REDCap event names to check for missing records. |
filter |
Optional. A single character string containing a filter expression to subset the dataset before checking for missing events. Example: |
query_name |
Optional character vector describing each query. Defaults to a standard format: |
addTo |
Optional data frame from a previous query report to which the new results can be appended. |
report_title |
Optional string specifying the title of the final report. Defaults to |
report_zeros |
Logical, include variables with zero queries in the report. Default is |
link |
Optional list containing project information ( |
A named list with two elements:
queriesA data frame listing records missing the specified events.
Columns: Identifier, DAG, Event, Instrument,
Field, Repetition, Description, Query, Code,
and optionally Link. If no queries are found this will be an empty
data frame with the expected columns.
resultsAn HTML table (knitr::kable styled with kableExtra) summarising
the number of missing records per event. Returned as knitr::kable (HTML).
# Minimal reproducible example data0 <- data.frame( record_id = c("100-1", "100-2", "200-1"), redcap_event_name = c("baseline_arm_1", "baseline_arm_1", "follow_up_arm_1"), redcap_event_name.factor = factor(c("Baseline", "Baseline", "Follow-up")), stringsAsFactors = FALSE ) # Suppose we want to check that every record has the follow-up event res <- rd_event( data = data0, dic = data.frame(), # placeholder dictionary event = "follow_up_arm_1", report_zeros = TRUE ) # Records missing the event: res$queries # HTML summary (in RMarkdown or Viewer) res$results# Minimal reproducible example data0 <- data.frame( record_id = c("100-1", "100-2", "200-1"), redcap_event_name = c("baseline_arm_1", "baseline_arm_1", "follow_up_arm_1"), redcap_event_name.factor = factor(c("Baseline", "Baseline", "Follow-up")), stringsAsFactors = FALSE ) # Suppose we want to check that every record has the follow-up event res <- rd_event( data = data0, dic = data.frame(), # placeholder dictionary event = "follow_up_arm_1", report_zeros = TRUE ) # Records missing the event: res$queries # HTML summary (in RMarkdown or Viewer) res$results
Export a query dataset (e.g., from rd_query or rd_event) to an .xlsx file. The function can optionally convert a column of URLs into Excel hyperlinks and apply password protection to the worksheet.
rd_export( project = NULL, queries = NULL, column = NULL, sheet_name = NULL, path = NULL, password = NULL )rd_export( project = NULL, queries = NULL, column = NULL, sheet_name = NULL, path = NULL, password = NULL )
project |
A list containing the dataframe of queries and results (expected |
queries |
A data frame of identified queries. |
column |
Name of the column containing URLs to convert into hyperlinks. If |
sheet_name |
Name of the Excel sheet in the resulting |
path |
File path for saving the |
password |
Optional password to protect the worksheet from edits. |
An .xlsx file written to the specified path.
## Not run: rd_export( queries = my_queries, column = "Link", sheet_name = "My Queries", path = "queries.xlsx" ) ## End(Not run)## Not run: rd_export( queries = my_queries, column = "Link", sheet_name = "My Queries", path = "queries.xlsx" ) ## End(Not run)
Converts variables in a REDCap dataset with associated .factor columns into actual factor variables, while allowing the exclusion of specific variables. Ensures consistency with the dataset and preserves variable labels.
rd_factor( project = NULL, data = NULL, dic = NULL, event_form = NULL, exclude = NULL )rd_factor( project = NULL, data = NULL, dic = NULL, event_form = NULL, exclude = NULL )
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
exclude |
Optional character vector of variable names (use original names without the |
The function looks for columns ending in .factor and replaces the original variable values with those .factor values (converted to factors). It preserves variable labels. The exclude argument must contain base variable names (no .factor suffix); if any .factor names are passed to exclude the function will throw an informative error. The columns redcap_event_name, redcap_repeat_instrument and redcap_data_access_group (and their .factor counterparts) are handled specially to avoid altering event or access-group data.
A list with the following elements:
The transformed dataset with .factor columns applied as factors.
The dictionary used (unchanged).
The event-form mapping used (if applicable).
A brief text summary of the transformation.
## Not run: result <- rd_factor(covican) result <- rd_factor(covican, exclude = c("available_analytics", "urine_culture")) transformed_data <- result$data ## End(Not run)## Not run: result <- rd_factor(covican) result <- rd_factor(covican, exclude = c("available_analytics", "urine_culture")) transformed_data <- result$data ## End(Not run)
Sets selected variables to NA when a filter condition is satisfied. Useful for managing checkboxes or other fields without explicit gatekeeper questions.
rd_insert_na( project = NULL, data = NULL, dic = NULL, event_form = NULL, vars, filter )rd_insert_na( project = NULL, data = NULL, dic = NULL, event_form = NULL, vars, filter )
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
vars |
Character vector of variable names to set to |
filter |
A single logical expression (as string). Rows where the filter evaluates to |
Each variable is only updated in rows/events where both the variable and filter are present.
For longitudinal projects, event_form must be provided for proper event-level filtering.
Only one filter expression is allowed.
Variables and filter columns must exist in both data and dictionary.
A list with:
The dataset with NA inserted where the filter applies.
The unchanged dictionary.
The event_form passed in (if applicable).
Summary message of the changes applied.
# Set 'potassium' to NA where age < 65 ## Not run: data <- rd_insert_na( data = covican$data, dic = covican$dictionary, vars = "potassium", filter = "age < 65" ) table(data$potassium) ## End(Not run)# Set 'potassium' to NA where age < 65 ## Not run: data <- rd_insert_na( data = covican$data, dic = covican$dictionary, vars = "potassium", filter = "age < 65" ) table(data$potassium) ## End(Not run)
Detects and summarizes queries in a REDCap dataset based on specified expressions, filters, or a defined branching logic. Useful for identifying missing values, out-of-range values, or values that do not meet predefined criteria.
rd_query( project = NULL, variables = NA, expression = NA, negate = FALSE, event = NA, filter = NA, addTo = NA, variables_names = NA, query_name = NA, instrument = NA, report_title = NA, report_zeros = FALSE, by_dag = FALSE, link = list(), data = NULL, dic = NULL, event_form = NULL )rd_query( project = NULL, variables = NA, expression = NA, negate = FALSE, event = NA, filter = NA, addTo = NA, variables_names = NA, query_name = NA, instrument = NA, report_title = NA, report_zeros = FALSE, by_dag = FALSE, link = list(), data = NULL, dic = NULL, event_form = NULL )
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
variables |
Character vector of variable names to check for queries. |
expression |
Character vector of R expressions to evaluate for each variable. |
negate |
Logical, if |
event |
Required for longitudinal projects to avoid overestimation. REDCap event(s) to analyze. |
filter |
Optional string of filters to apply to the dataset, such as the branching logic of a variable. |
addTo |
Optional data frame from a previous query report to which the new results can be appended. |
variables_names |
Optional character vector of descriptions for each variable. Defaults to the variables labels in the dictionary. |
query_name |
Optional character vector describing each query. Defaults to a standard format: |
instrument |
Optional REDCap instrument(s) for each variable. Defaults to the instrument reported in the dictionary. |
report_title |
Optional string specifying the title of the final report. Defaults to |
report_zeros |
Logical, include variables with zero queries in the report. Default is |
by_dag |
Logical, split results by Data Access Group (DAG). Default is |
link |
Optional list containing project information ( |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
The function performs the following steps:
Applies user-specified expressions to the selected variables to detect queries.
Optionally negates the expressions to find values that do not satisfy the condition.
Handles REDCap branching logic, converting it into R-compatible expressions for evaluation.
Applies additional user-specified filters before identifying queries.
Generates structured query results with metadata including:
Identifier (record_id)
DAG (if present)
Event and Instrument
Field, Repetition, Description, Query statement
Optional link to REDCap entry
Optionally combines results with previous query outputs using addTo.
Produces a summarized report, optionally including variables with zero queries.
Provides warnings for variables with branching logic that could not be automatically evaluated.
A list containing:
A data frame or a list of data frames (if by_dag = TRUE) with detailed query information for each record.
A formatted report (HTML table) summarizing total queries per variable, event, and DAG if applicable.
## Not run: # Identify missing values for multiple variables result <- rd_query(covican, variables = c("copd", "age"), expression = c("is.na(x)", "x %in% NA"), event = "baseline_visit_arm_1" ) result$results # Identify values exceeding a threshold result <- rd_query(covican, variables = "age", expression = "x > 20", event = "baseline_visit_arm_1" ) # Apply a filter to select subset of data result <- rd_query(covican, variables = "potassium", expression = "is.na(x)", event = "baseline_visit_arm_1", filter = "available_analytics == '1'" ) ## End(Not run)## Not run: # Identify missing values for multiple variables result <- rd_query(covican, variables = c("copd", "age"), expression = c("is.na(x)", "x %in% NA"), event = "baseline_visit_arm_1" ) result$results # Identify values exceeding a threshold result <- rd_query(covican, variables = "age", expression = "x > 20", event = "baseline_visit_arm_1" ) # Apply a filter to select subset of data result <- rd_query(covican, variables = "potassium", expression = "is.na(x)", event = "baseline_visit_arm_1", filter = "available_analytics == '1'" ) ## End(Not run)
Recalculates fields marked as calculated in the REDCap dictionary, compares them with the original values, and reports discrepancies. If any differences are found, new recalculated fields are added to the dataset and dictionary with _recalc appended to the names.
rd_recalculate( project = NULL, data = NULL, dic = NULL, event_form = NULL, exclude = NULL )rd_recalculate( project = NULL, data = NULL, dic = NULL, event_form = NULL, exclude = NULL )
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
exclude |
Optional. Character vector of field names to exclude from recalculation. |
Fields of type calc in the dictionary are recalculated.
Recalculated values are compared with the original values.
If differences exist, new fields [field_name]_recalc are added to the dataset and dictionary.
Works for single-event projects; for longitudinal projects, event_form must be provided.
Fields with incomplete branching logic or smart variables may fail to recalculate.
A list with:
The dataset with new _recalc fields for any differing calculated fields.
Updated dictionary including the new _recalc fields.
The event_form passed in (if applicable).
Summary report of the recalculation process, including tables of discrepancies.
# Recalculate all calculated fields ## Not run: results <- rd_recalculate( data = covican$data, dic = covican$dictionary, event_form = covican$event_form ) ## End(Not run) # Recalculate but exclude some variables ## Not run: results <- rd_recalculate( project = covican, exclude = c("age", "screening_fail_crit") ) ## End(Not run)# Recalculate all calculated fields ## Not run: results <- rd_recalculate( data = covican$data, dic = covican$dictionary, event_form = covican$event_form ) ## End(Not run) # Recalculate but exclude some variables ## Not run: results <- rd_recalculate( project = covican, exclude = c("age", "screening_fail_crit") ) ## End(Not run)
Converts a REDCap logic expression into R-compatible logic. Processes one logic expression (logic) for one target variable (var) at a time. Supports common REDCap operators (and, or, =, <, >, etc.) and handles event-specific logic in longitudinal projects. Logic involving smart variables or repeated instruments may require manual review.
rd_rlogic( project = NULL, data = NULL, dic = NULL, event_form = NULL, logic, var )rd_rlogic( project = NULL, data = NULL, dic = NULL, event_form = NULL, logic, var )
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
logic |
A single REDCap logic string (e.g., |
var |
A single string specifying the target variable the logic applies to. |
Translates REDCap operators and functions into R equivalents:
= → ==, <> → !=, and → &, or → |.
Converts functions like if(), rounddown(), datediff(), sum() to R equivalents.
Handles date transformations and empty strings ('') → NA.
Adjusts logic for longitudinal data using event_form if provided.
Evaluates the translated R logic against the dataset and returns the results.
Logic with repeated instruments, smart variables, or multiple events per variable may require manual inspection.
A list with:
The translated R-compatible logic as a string.
The evaluation of the translated logic on the provided dataset, filtered by event if applicable.
# Translate a single REDCap logic expression for one variable covican |> rd_rlogic( logic = "if([exc_1]='1' or [inc_1]='0' or [inc_2]='0' or [inc_3]='0', 1, 0)", var = "screening_fail_crit" )# Translate a single REDCap logic expression for one variable covican |> rd_rlogic( logic = "if([exc_1]='1' or [inc_1]='0' or [inc_2]='0' or [inc_3]='0', 1, 0)", var = "screening_fail_crit" )
Splits a REDCap dataset into separate datasets by form or event using the data dictionary. Supports both longitudinal and non-longitudinal projects and can return wide or long formats for repeated measures.
rd_split( project = NULL, data = NULL, dic = NULL, event_form = NULL, which = NULL, by = "form", wide = FALSE )rd_split( project = NULL, data = NULL, dic = NULL, event_form = NULL, which = NULL, by = "form", wide = FALSE )
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
which |
Optional. A single form or event to extract. If not provided, all forms or events are returned. |
by |
Character. Criteria to split the dataset: |
wide |
Logical. If |
Handles checkbox variables and REDCap default variables (_complete, _timestamp) appropriately.
For form-based splits in longitudinal projects, uses event_form to map variables to events.
Wide format expands repeated instances into multiple columns per record.
Filtering by which allows extracting a single form or event.
Projects with repeated instruments are handled by filtering on the redcap_repeat_instrument variable.
Depending on which and wide:
A data.frame or a list of data.frames representing the split datasets.
The original REDCap dictionary.
The original event-form mapping (if applicable).
A summary message of the splitting operation.
# Split by form and return wide format result <- covican |> rd_split(by = "form", wide = TRUE) print(result) # Split by event (long format) result <- covican |> rd_split(by = "event") print(result)# Split by form and return wide format result <- covican |> rd_split(by = "form", wide = TRUE) print(result) # Split by event (long format) result <- covican |> rd_split(by = "event") print(result)
Transforms the raw REDCap data read by the redcap_data function. The function runs in one-step pipeline all functions dedicated to processing the data and returns the transformed data and dictionary, along with a summary of each step done.
rd_transform( project = NULL, data = NULL, dic = NULL, event_form = NULL, checkbox_labels = c("No", "Yes"), na_logic = "none", exclude_recalc = NULL, exclude_factor = NULL, delete_vars = NULL, delete_pattern = NULL, final_format = "raw", which_event = NULL, which_form = NULL, wide = NULL )rd_transform( project = NULL, data = NULL, dic = NULL, event_form = NULL, checkbox_labels = c("No", "Yes"), na_logic = "none", exclude_recalc = NULL, exclude_factor = NULL, delete_vars = NULL, delete_pattern = NULL, final_format = "raw", which_event = NULL, which_form = NULL, wide = NULL )
project |
A list containing the REDCap data, dictionary, and event mapping (expected |
data |
A |
dic |
A |
event_form |
Only applicable for longitudinal projects (presence of events). Event-to-form mapping for longitudinal projects. |
checkbox_labels |
Character vector of length 2 for labels of unchecked/checked values. Default: |
na_logic |
Controls how missing values are set based on the branching logic of a checkbox. Must be one of |
exclude_recalc |
Optional. Character vector of field names to exclude from recalculation. |
exclude_factor |
Optional character vector of variable names (use original names without the |
delete_vars |
Optional. A character vector of variable names to remove from both the dataset and dictionary. |
delete_pattern |
Optional. A character vector of regular expression patterns. Variables matching these patterns will be removed from the dataset and dictionary. |
final_format |
Character string indicating the final format of the data. Options are |
which_event |
Character. If |
which_form |
Character. If |
wide |
Logical. If |
A list with elements:
Transformed data (data.frame or nested list when split).
Updated dictionary data.frame.
Event–form mapping (if applicable).
Character summary of transformation steps performed.
# Minimal usage (project object or data + dictionary) trans <- rd_transform(covican) # Custom checkbox labels trans <- rd_transform(covican, checkbox_labels = c("Not present", "Present")) # Return only a single form (wide) trans <- rd_transform(covican, final_format = "by_form", which_form = "laboratory_findings", wide = TRUE)# Minimal usage (project object or data + dictionary) trans <- rd_transform(covican) # Custom checkbox labels trans <- rd_transform(covican, checkbox_labels = c("Not present", "Present")) # Return only a single form (wide) trans <- rd_transform(covican, final_format = "by_form", which_form = "laboratory_findings", wide = TRUE)
This function recalculates each calculated field if the logic can be transcribed to R. Note that calculated fields containing smart-variables or variables from other events cannot be transcribed.
The function returns the dataset and dictionary with the recalculated variables appended (named as the original field plus _recalc), along with a summary table of the recalculation results.
recalculate(data, dic, event_form = NULL, exclude_recalc = NULL)recalculate(data, dic, event_form = NULL, exclude_recalc = NULL)
data |
Data frame containing data from REDCap. |
dic |
Data frame containing the dictionary read from REDCap. |
event_form |
Data frame containing the correspondence of each event with each form. |
exclude_recalc |
Character vector with the names of the variables that should not be recalculated. Useful for projects with time-consuming recalculations for certain calculated fields. |
Import REDCap data into R either from REDCap's exported R file or directly via the REDCap API. The function returns a list with the dataset, the project dictionary (metadata) and, for longitudinal projects, the instrument–event mapping (event_form) when available.
redcap_data( data_path = NA, dic_path = NA, event_path = NA, uri = NA, token = NA, sep = ",", filter_field = NULL, survey_fields = FALSE )redcap_data( data_path = NA, dic_path = NA, event_path = NA, uri = NA, token = NA, sep = ",", filter_field = NULL, survey_fields = FALSE )
data_path |
Path to exported R file (use with |
dic_path |
Path to the dictionary file (CSV or XLSX; use with |
event_path |
Path to the event-form mapping file (CSV or XLSX) for longitudinal projects (downloadable via the |
uri |
REDCap API base URI (use with |
token |
REDCap API token (use with |
sep |
Character string specifying the field separator for the exported dictionary CSV file. Defaul |
filter_field |
Optional character vector of field names to request from the API. |
survey_fields |
Logical; include survey-related fields when pulling via API. Default |
Two import modes are supported:
Exported files — use data_path (REDCap R export) and dic_path (dictionary CSV/XLSX).
API — use uri and token to pull data and metadata directly from REDCap.
If the project is longitudinal, provide event_path (instrument–event mapping)
or the function will attempt to fetch it from the API when using API mode.
Steps for using exported data in REDCap:
Use the REDCap Export Data function and choose R Statistical Software format.
REDCap generates:
A CSV file with observations.
An R script to format variables for import.
Ensure the exported files, dictionary, and event mapping (if any) are in the same directory.
A list with:
data: Imported dataset.
dictionary: Variable dictionary (project metadata).
event_form: Event-form mapping for longitudinal projects (if applicable).
To use other package functions effectively, include the dic_path argument to load the project dictionary.
Use either exported-files mode (data_path + dic_path) or API mode (uri + token) — not both.
For exported files, REDCap's R export is required for data_path. Dictionary and event files must be CSV or XLSX.
## Not run: # From exported files out <- redcap_data( data_path = "project_export.r", dic_path = "project_dictionary.csv", event_path = "instrument_event_map.csv" ) # From API out_api <- redcap_data( uri = "https://redcap.example.org/api/", token = "REPLACE_WITH_TOKEN" ) ## End(Not run)## Not run: # From exported files out <- redcap_data( data_path = "project_export.r", dic_path = "project_dictionary.csv", event_path = "instrument_event_map.csv" ) # From API out_api <- redcap_data( uri = "https://redcap.example.org/api/", token = "REPLACE_WITH_TOKEN" ) ## End(Not run)
This function rounds numeric values to the specified number of decimal digits,
mimicking the behavior of the base R round() function but implemented manually.
round(x, digits)round(x, digits)
x |
A numeric vector to be rounded. |
digits |
Integer indicating the number of decimal places to round to. |
A numeric vector rounded to the specified number of digits.
round(3.14159, 2) round(c(-2.718, 3.14159), 1)round(3.14159, 2) round(c(-2.718, 3.14159), 1)
This function generates a nested dataset filtered by each event, containing only the variables associated with each event. It uses the provided data, dictionary, and event-form mapping. You can choose to return data for a specific event.
split_event(data, dic, event_form, which = NULL)split_event(data, dic, event_form, which = NULL)
data |
Data frame containing data from REDCap. |
dic |
Data frame containing the dictionary read from REDCap. |
event_form |
Data frame containing the correspondence of each event with each form. |
which |
Character string specifying an event if only data for that event is desired. |
This function generates a nested dataset containing only the variables associated with each form, using the provided data, dictionary, and event-form mapping. You can choose to return data for a specific form.
split_form(data, dic, event_form = NULL, which = NULL, wide = FALSE)split_form(data, dic, event_form = NULL, which = NULL, wide = FALSE)
data |
Data frame containing data from REDCap. |
dic |
Data frame containing the dictionary read from REDCap. |
event_form |
Data frame containing the correspondence of each event with each form. |
which |
Character string specifying a form if only data for that form is desired. |
wide |
Logical indicating if the dataset should be returned in a wide format ( |
This function converts all variables in the dataset to factors, except those specified in the exclude parameter.
to_factor(data, dic, exclude = NULL)to_factor(data, dic, exclude = NULL)
data |
Data frame containing the REDCap data. |
dic |
Data frame containing the REDCap dictionary. |
exclude |
Character vector specifying the names of variables that should not be converted to factors. If |
This function inspects all the checkboxes in the study to determine if they have a branching logic. If a branching logic is present and its result is missing, the function will input a missing value into the checkbox. If checkbox_na is TRUE, the function will additionally input a missing value when the branching logic isn't satisfied, not just when it is missing. If a branching logic cannot be found or the logic cannot be transcribed due to the presence of smart variables, the variable is added to a list of reviewable variables that will be printed.
The function returns the dataset with the transformed checkboxes and a table summarizing the results.
transform_checkboxes(data, dic, event_form = NULL, checkbox_na = FALSE)transform_checkboxes(data, dic, event_form = NULL, checkbox_na = FALSE)
data |
Data frame containing data from REDCap. |
dic |
Data frame containing the dictionary read from REDCap. |
event_form |
Data frame containing the correspondence of each event with each form. |
checkbox_na |
Logical indicating if values of checkboxes with branching logic should be set to missing only when the branching logic is missing ( |