---
title: "REDCapDM - Queries"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 5
    number_sections: true
vignette: >
  %\VignetteIndexEntry{REDCapDM - Queries}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  chunk_output_type: inline
---
```{r message=FALSE, warning=FALSE, include=FALSE}
rm(list = ls())
library(REDCapDM)
library(kableExtra)
library(knitr)
library(dplyr)
library(magrittr)
library(purrr)
covican_transformed <- rd_transform(covican)
```
This vignette summarises the simplest and most common uses of [REDCapDM](https://github.com/bruigtp/REDCapDM) to identify discrepancies in [REDCap](https://www.project-redcap.org/) data imported into R.
# **Queries**
Queries are crucial for the accuracy and reliability of a [REDCap](https://www.project-redcap.org/) dataset. They help identify missing values, inconsistencies, and potential errors in the collected data. The [`rd_query()`](https://bruigtp.github.io/REDCapDM/reference/rd_query.html) function allows you to generate queries using a specific expression.
To identify missing values in certain variables, pass the variable names to the `variables` argument and the condition to the `expression` argument. In this scenario, the expression is `is.na(x)`, where `x` stands for the variable itself:
```{r echo=TRUE, message=FALSE, warning=FALSE}
example <- rd_query(covican_transformed,
variables = "copd",
expression = "is.na(x)")
```
Note: For variables with branching logic, the function automatically applies the associated branching logic where it can; otherwise, it reports the branching logic so that you can review it.
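
To make the role of the `x` placeholder concrete, the following is a minimal base R sketch of how a string expression can be evaluated against a single column. It is an illustration only and not the internal implementation of `rd_query()`; `toy_data` and `flag_rows()` are hypothetical names invented for this example.

```{r message=FALSE, warning=FALSE, comment=NA}
# Toy data: `toy_data` and `flag_rows()` are hypothetical, used only in this sketch
toy_data <- data.frame(
  record_id = c("1", "2", "3", "4"),
  copd      = c("No", NA, "Yes", NA),
  age       = c(45, 82, NA, 67),
  stringsAsFactors = FALSE
)

# Evaluate a string expression in which `x` stands for the chosen column
flag_rows <- function(data, variable, expression) {
  hit <- eval(parse(text = expression), envir = list(x = data[[variable]]))
  # which() drops NA results, so a comparison with a missing value is not flagged
  data[which(hit), c("record_id", variable)]
}

flag_rows(toy_data, "copd", "is.na(x)") # rows where copd is missing
flag_rows(toy_data, "age", "x > 80")    # rows where age exceeds 80
```

In this sketch a comparison such as `x > 80` does not flag missing values, so missingness would need its own `is.na(x)` expression.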
Alternatively, to identify outliers or observations that meet a certain condition (for example, values falling within a given range), provide one expression per variable:
```{r message=FALSE, warning=TRUE, comment=NA}
example <- rd_query(covican_transformed,
variables = c("age", "potassium"),
expression = c("x > 80", "x > 4.2 & x < 4.3"),
event = "baseline_visit_arm_1")
```
In both cases, the function returns a list containing a data frame designed to help you locate each query in the [REDCap](https://www.project-redcap.org/) project:
```{r echo=TRUE, message=FALSE, warning=FALSE, comment=NA, results='hide'}
example$queries
```
```{r echo=FALSE, message=FALSE, warning=FALSE, comment=NA}
kable(head(example$queries, 2)) %>%
kableExtra::row_spec(0, bold = TRUE) %>%
kableExtra::kable_styling()
```
The list also contains a summary of the number of queries generated for each specified variable and applied expression:
```{r echo=TRUE, message=FALSE, warning=FALSE, comment=NA}
example$results
```
For longitudinal projects, the [`rd_event()`](https://bruigtp.github.io/REDCapDM/reference/rd_event.html) function allows you to check whether a particular event is missing from a record in the exported data. This happens when no data have been collected for a record in a particular event, because REDCap does not export the corresponding row. To identify these cases, you can use the following code:
```{r message=FALSE, warning=FALSE, comment=NA}
example <- rd_event(covican_transformed,
event = "follow_up_visit_da_arm_1")
```
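
The underlying idea can be sketched in base R on a toy long-format export: a record lacks an event when no row exists for that record and event combination. This is only an illustration of the concept, not how `rd_event()` is implemented; `toy_long`, `target_event`, `with_event` and `missing_event` are hypothetical names, and the sketch assumes the event is stored in a `redcap_event_name` column.

```{r message=FALSE, warning=FALSE, comment=NA}
# Toy long-format export: one row per record and event with collected data
toy_long <- data.frame(
  record_id         = c("1", "1", "2", "3", "3"),
  redcap_event_name = c("baseline_visit_arm_1", "follow_up_visit_da_arm_1",
                        "baseline_visit_arm_1",
                        "baseline_visit_arm_1", "follow_up_visit_da_arm_1"),
  stringsAsFactors  = FALSE
)

# Records that have no row at all for the event of interest
target_event  <- "follow_up_visit_da_arm_1"
with_event    <- toy_long$record_id[toy_long$redcap_event_name == target_event]
missing_event <- setdiff(unique(toy_long$record_id), with_event)
missing_event
```

Here only record "2" has no follow-up row, so it is the only record returned.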
# **Control**
After identifying queries, it is common practice to correct the original dataset in [REDCap](https://www.project-redcap.org/) and re-run the query process for a new query dataset.
The [`check_queries()`](https://bruigtp.github.io/REDCapDM/reference/check_queries.html) function allows you to compare the previous query dataset with the new one:
```{r message=FALSE, warning=FALSE, include=FALSE}
example <- rd_query(covican_transformed,
variables = c("copd", "age"),
expression = c("is.na(x)", "is.na(x)"),
event = "baseline_visit_arm_1")
new_example <- example
new_example$queries <- as.data.frame(new_example$queries)
new_example$queries <- new_example$queries[c(1:5, 10:11),] # We take only some of the previously created queries
new_example$queries[nrow(new_example$queries) + 1,] <- c("100-79", "Hospital 11", "Baseline visit", "Comorbidities", "copd", "-", "Chronic obstructive pulmonary disease", "The value is NA and it should not be missing", "100-79-4") # we create a new query
new_example$queries[nrow(new_example$queries) + 1, ] <- c("105-56", "Hospital 5", "Baseline visit", "Demographics", "age", "-", "Age", "The value is 80 and it should not be >70", "105-56-2")
```
```{r message=FALSE, warning=FALSE, comment=NA}
check <- check_queries(old = example$queries,
new = new_example$queries)
```
The output, in addition to the query data frame, now includes a summary with the number of new, miscorrected, solved and pending queries:
```{r message=FALSE, warning=FALSE, comment=NA}
# Print results
check$results
```
Note: The "Miscorrected" category includes queries that belong to the same combination of record identifier and variable in both the old and new reports, but with a different reason. For instance, if a variable had a missing value in the old report, but in the new report shows a value outside the established range, it would be classified as "Miscorrected".
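
The four categories can be illustrated with a small base R sketch that compares two toy reports by record and variable. This is not the implementation of `check_queries()`. Only "Miscorrected" is defined in the note above; the definitions used here for "Pending" (same reason in both reports), "Solved" (only in the old report) and "New" (only in the new report) are assumptions made for illustration, and `toy_old`, `toy_new` and `both` are hypothetical names.

```{r message=FALSE, warning=FALSE, comment=NA}
# Toy reports keyed by record (`id`) and variable (`field`)
toy_old <- data.frame(
  id     = c("1", "2", "3"),
  field  = c("copd", "age", "age"),
  reason = c("missing", "missing", "out of range"),
  stringsAsFactors = FALSE
)
toy_new <- data.frame(
  id     = c("1", "2", "4"),
  field  = c("copd", "age", "copd"),
  reason = c("missing", "out of range", "missing"),
  stringsAsFactors = FALSE
)

# A full join keeps queries that appear in either report
both <- merge(toy_old, toy_new, by = c("id", "field"),
              all = TRUE, suffixes = c("_old", "_new"))

# Assumed definitions: absent from new = Solved, absent from old = New,
# same reason = Pending, different reason = Miscorrected
both$status <- ifelse(is.na(both$reason_new), "Solved",
               ifelse(is.na(both$reason_old), "New",
               ifelse(both$reason_old == both$reason_new,
                      "Pending", "Miscorrected")))

both[, c("id", "field", "status")]
```

Record "2" is the "Miscorrected" case: its `age` query was a missing value in the old report and is an out-of-range value in the new one.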
# **Export**
With the help of the `rd_export()` function, you can export the identified queries to a `.xlsx` file of your choice:
```{r message=FALSE, warning=FALSE, comment=NA, include=FALSE}
example <- rd_query(covican_transformed,
variables = c("copd", "age"),
expression = c("is.na(x)", "is.na(x)"),
event = "baseline_visit_arm_1")
```
```{r message=FALSE, warning=FALSE, comment=NA, eval=FALSE}
rd_export(example)
```
This is the simplest way to use the function and will create a file named "example.xlsx" in your current working directory, but you can customise the exported file:
```{r message=FALSE, warning=FALSE, comment=NA, eval=FALSE}
rd_export(queries = example$queries,
column = "Link",
sheet_name = "Queries - Project",
path = "C:/User/Desktop/queries.xlsx",
password = "123")
```
In both cases, a message will be generated in the console informing you that the file has been created and where it is located.
**For more information, consult the complete vignette available at: https://bruigtp.github.io/REDCapDM/articles/REDCapDM.html**