Title: | Checks for Exclusion Criteria in Online Data |
Version: | 0.5.2 |
Description: | Data that are collected through online sources such as Mechanical Turk may require excluding rows because of IP address duplication, geolocation, or completion duration. This package facilitates exclusion of these data for Qualtrics datasets. |
License: | GPL (≥ 3) |
URL: | https://docs.ropensci.org/excluder/, https://github.com/ropensci/excluder/ |
BugReports: | https://github.com/ropensci/excluder/issues/ |
Depends: | R (≥ 3.5.0) |
Imports: | cli, curl, dplyr, ipaddress, janitor, lubridate, magrittr, maps, rlang, stringr, tidyr, tidyselect |
Suggests: | covr, knitr, lifecycle, readr, rmarkdown, testthat (≥ 3.0.0), withr |
VignetteBuilder: | knitr |
Config/testthat/edition: | 3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
NeedsCompilation: | no |
Packaged: | 2025-05-15 19:51:54 UTC; jstevens |
Author: | Jeffrey R. Stevens |
Maintainer: | Jeffrey R. Stevens <jeffrey.r.stevens@protonmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-05-15 20:10:02 UTC |
excluder: Checks for Exclusion Criteria in Online Data
Description
Data that are collected through online sources such as Mechanical Turk may require excluding rows because of IP address duplication, geolocation, or completion duration. This package facilitates exclusion of these data for Qualtrics datasets.
Author(s)
Maintainer: Jeffrey R. Stevens jeffrey.r.stevens@protonmail.com (ORCID) [copyright holder]
Other contributors:
Joseph O'Brien (ORCID) [reviewer]
Julia Silge julia.silge@gmail.com (ORCID) [reviewer]
See Also
Useful links:
Report bugs at https://github.com/ropensci/excluder/issues/
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Check for duplicate IP addresses and/or locations
Description
The check_duplicates() function subsets rows of data, retaining rows that have the same IP address and/or same latitude and longitude. The function is written to work with data from Qualtrics surveys.
Usage
check_duplicates(
x,
id_col = "ResponseId",
ip_col = "IPAddress",
location_col = c("LocationLatitude", "LocationLongitude"),
rename = TRUE,
dupl_ip = TRUE,
dupl_location = TRUE,
include_na = FALSE,
keep = FALSE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
location_col |
Two element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
dupl_ip |
Logical indicating whether to check IP addresses. |
dupl_location |
Logical indicating whether to check latitude and longitude. |
include_na |
Logical indicating whether to include rows with NAs for IP address and location as potentially excluded rows. |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
By default, IP address and location are both checked, but they can be checked separately with the dupl_ip and dupl_location arguments.
The function outputs to console separate messages about the number of rows with duplicate IP addresses and rows with duplicate locations. These counts are computed independently, so rows may be counted for both types of duplicates.
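The independent counting described above can be illustrated without the package. The following is a minimal base R sketch, not the package's implementation: the toy data frame, its values, and the helper name flag_dupes are all hypothetical.

```r
# Toy data with Qualtrics-style column names (all values are made up)
responses <- data.frame(
  ResponseId = c("R_1", "R_2", "R_3", "R_4"),
  IPAddress = c("192.0.2.1", "192.0.2.1", "192.0.2.2", "192.0.2.3"),
  LocationLatitude = c(40.8, 40.8, 40.8, 41.2),
  LocationLongitude = c(-96.7, -96.7, -96.7, -95.9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for every row whose key occurs more than once
flag_dupes <- function(key) {
  duplicated(key) | duplicated(key, fromLast = TRUE)
}

dupe_ip <- flag_dupes(responses$IPAddress)
dupe_loc <- flag_dupes(
  paste(responses$LocationLatitude, responses$LocationLongitude)
)

# The two counts are computed independently, so a row can count toward both
n_dupe_ip <- sum(dupe_ip)
n_dupe_loc <- sum(dupe_loc)

# Rows flagged by either criterion
flagged <- responses[dupe_ip | dupe_loc, ]
```

In this toy data, rows R_1 and R_2 share both an IP address and a location, so they contribute to both counts, while R_3 contributes only to the location count.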
Value
An object of the same type as x that includes the rows with duplicate IP addresses and/or locations. This includes a column called dupe_count that returns the number of duplicates.
For a function that marks these rows, use mark_duplicates().
For a function that excludes these rows, use exclude_duplicates().
See Also
Other duplicates functions: exclude_duplicates(), mark_duplicates()
Other check functions: check_duration(), check_ip(), check_location(), check_preview(), check_progress(), check_resolution()
Examples
# Check for duplicate IP addresses and locations
data(qualtrics_text)
check_duplicates(qualtrics_text)
# Check only for duplicate locations
qualtrics_text %>%
  check_duplicates(dupl_ip = FALSE)
# Do not print rows to console
qualtrics_text %>%
check_duplicates(print = FALSE)
# Do not print message to console
qualtrics_text %>%
check_duplicates(quiet = TRUE)
Check for minimum or maximum durations
Description
The check_duration() function subsets rows of data, retaining rows that have durations that are too fast or too slow. The function is written to work with data from Qualtrics surveys.
Usage
check_duration(
x,
min_duration = 10,
max_duration = NULL,
id_col = "ResponseId",
duration_col = "Duration (in seconds)",
rename = TRUE,
keep = FALSE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_duration |
Minimum duration that is too fast in seconds. |
max_duration |
Maximum duration that is too slow in seconds. |
id_col |
Column name for unique row ID (e.g., participant). |
duration_col |
Column name for durations. |
rename |
Logical indicating whether to rename columns (using |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
Default column names are set based on output from qualtRics::fetch_survey().
By default, minimum durations of 10 seconds are checked, but either minima or maxima can be checked with the min_duration and max_duration arguments. The function outputs to console separate messages about the number of rows that are too fast or too slow.
This function returns the fast and slow rows.
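As a rough illustration of how the two thresholds interact, here is a minimal base R sketch. It is not the package's code: the helper flag_duration is hypothetical, and the use of strict inequalities at the thresholds is an assumption.

```r
# Hypothetical helper mirroring the min_duration/max_duration defaults above.
# Assumption: durations strictly below the minimum are "too fast" and
# durations strictly above the maximum are "too slow".
flag_duration <- function(duration, min_duration = 10, max_duration = NULL) {
  too_fast <- rep(FALSE, length(duration))
  too_slow <- rep(FALSE, length(duration))
  # A NULL threshold means that side is not checked
  if (!is.null(min_duration)) too_fast <- duration < min_duration
  if (!is.null(max_duration)) too_slow <- duration > max_duration
  data.frame(duration = duration, too_fast = too_fast, too_slow = too_slow)
}

durations <- c(5, 120, 450, 900) # made-up completion times in seconds

# Check both ends
flags <- flag_duration(durations, min_duration = 100, max_duration = 800)
flagged <- flags[flags$too_fast | flags$too_slow, ]

# With the defaults, only the minimum of 10 seconds is checked
default_flags <- flag_duration(durations)
```

Setting either threshold to NULL in this sketch skips that side of the check, which corresponds to checking only minima or only maxima.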
Value
An object of the same type as x that includes the rows with fast and/or slow duration.
For a function that marks these rows, use mark_duration().
For a function that excludes these rows, use exclude_duration().
See Also
Other duration functions: exclude_duration(), mark_duration()
Other check functions: check_duplicates(), check_ip(), check_location(), check_preview(), check_progress(), check_resolution()
Examples
# Check for durations faster than 100 seconds
data(qualtrics_text)
check_duration(qualtrics_text, min_duration = 100)
# Remove preview data first
qualtrics_text %>%
exclude_preview() %>%
check_duration(min_duration = 100)
# Check only for durations slower than 800 seconds
qualtrics_text %>%
exclude_preview() %>%
check_duration(max_duration = 800)
# Do not print rows to console
qualtrics_text %>%
exclude_preview() %>%
check_duration(min_duration = 100, print = FALSE)
# Do not print message to console
qualtrics_text %>%
exclude_preview() %>%
check_duration(min_duration = 100, quiet = TRUE)
Check for IP addresses from outside of a specified country.
Description
The check_ip() function subsets rows of data, retaining rows that have IP addresses from outside the specified country. The function is written to work with data from Qualtrics surveys.
Usage
check_ip(
x,
id_col = "ResponseId",
ip_col = "IPAddress",
rename = TRUE,
country = "US",
include_na = FALSE,
keep = FALSE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame or tibble (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
rename |
Logical indicating whether to rename columns (using |
country |
Two-letter abbreviation of country to check (default is "US"). |
include_na |
Logical indicating whether to include rows with NA in IP address column in the output list of potentially excluded data. |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
The function uses ipaddress::country_networks() to assign IP addresses to specific countries using ISO 3166-1 alpha-2 country codes.
The function outputs to console a message about the number of rows with IP addresses outside of the specified country. If there are NAs for IP addresses (likely due to including preview data; see check_preview()), it will print a message alerting to the number of rows with NAs.
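The idea of testing whether an address falls inside a country's networks can be sketched in base R. This is only an illustration under stated assumptions: it handles IPv4 only, the helpers ipv4_to_num and in_cidr are hypothetical, and the "country" network list uses address ranges reserved for documentation rather than real country data. The package itself relies on ipaddress::country_networks() for this step.

```r
# Hypothetical helper: convert a dotted IPv4 string to a number
ipv4_to_num <- function(ip) {
  octets <- as.numeric(strsplit(ip, ".", fixed = TRUE)[[1]])
  sum(octets * 256^(3:0))
}

# Hypothetical helper: is `ip` inside the network written in CIDR notation?
# Assumes the part before "/" is the network's first address.
in_cidr <- function(ip, cidr) {
  parts <- strsplit(cidr, "/", fixed = TRUE)[[1]]
  first <- ipv4_to_num(parts[1])
  size <- 2^(32 - as.numeric(parts[2]))
  ip_num <- ipv4_to_num(ip)
  ip_num >= first && ip_num < first + size
}

# Made-up stand-in for one country's networks (documentation-only ranges)
country_nets <- c("192.0.2.0/24", "198.51.100.0/24")
ips <- c("192.0.2.17", "203.0.113.5")

# An address is "outside" if it falls in none of the country's networks
outside <- !vapply(
  ips,
  function(ip) any(vapply(country_nets, function(n) in_cidr(ip, n), logical(1))),
  logical(1)
)
```

Here the first address falls inside the first network and the second falls in neither, so only the second is flagged as outside.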
Value
An object of the same type as x that includes the rows with IP addresses outside of the specified country.
For a function that marks these rows, use mark_ip().
For a function that excludes these rows, use exclude_ip().
Note
This function requires internet connectivity as it uses the ipaddress::country_networks() function, which pulls daily updated data from https://www.iwik.org/ipcountry/. It only updates the data once per session, as it caches the results for future work during the session.
See Also
Other ip functions: exclude_ip(), mark_ip()
Other check functions: check_duplicates(), check_duration(), check_location(), check_preview(), check_progress(), check_resolution()
Examples
# Check for IP addresses outside of the US
data(qualtrics_text)
check_ip(qualtrics_text)
# Remove preview data first
qualtrics_text %>%
exclude_preview() %>%
check_ip()
# Check for IP addresses outside of Germany
qualtrics_text %>%
exclude_preview() %>%
check_ip(country = "DE")
# Do not print rows to console
qualtrics_text %>%
exclude_preview() %>%
check_ip(print = FALSE)
# Do not print message to console
qualtrics_text %>%
exclude_preview() %>%
check_ip(quiet = TRUE)
Check for locations outside of the US
Description
The check_location() function subsets rows of data, retaining rows that have locations outside of the US. The function is written to work with data from Qualtrics surveys.
Usage
check_location(
x,
id_col = "ResponseId",
location_col = c("LocationLatitude", "LocationLongitude"),
rename = TRUE,
include_na = FALSE,
keep = FALSE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
location_col |
Two element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
include_na |
Logical indicating whether to include rows with NA in latitude and longitude columns in the output list of potentially excluded data. |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
The function only works for the United States. It uses maps::map.where() to determine if latitude and longitude are inside the US.
The function outputs to console a message about the number of rows with locations outside of the US.
Value
The output is a data frame of the rows that are located outside of the US and (if include_na == TRUE) rows with no location information.
For a function that marks these rows, use mark_location().
For a function that excludes these rows, use exclude_location().
See Also
Other location functions: exclude_location(), mark_location()
Other check functions: check_duplicates(), check_duration(), check_ip(), check_preview(), check_progress(), check_resolution()
Examples
# Check for locations outside of the US
data(qualtrics_text)
check_location(qualtrics_text)
# Remove preview data first
qualtrics_text %>%
exclude_preview() %>%
check_location()
# Do not print rows to console
qualtrics_text %>%
exclude_preview() %>%
check_location(print = FALSE)
# Do not print message to console
qualtrics_text %>%
exclude_preview() %>%
check_location(quiet = TRUE)
Check for survey previews
Description
The check_preview() function subsets rows of data, retaining rows that are survey previews. The function is written to work with data from Qualtrics surveys.
Usage
check_preview(
x,
id_col = "ResponseId",
preview_col = "Status",
rename = TRUE,
keep = FALSE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
preview_col |
Column name for survey preview. |
rename |
Logical indicating whether to rename columns (using |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
Default column names are set based on output from qualtRics::fetch_survey().
The preview column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
The function outputs to console a message about the number of rows that are survey previews.
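Handling both export formats can be sketched as follows. This is a minimal base R illustration rather than the package's code: the helper is_preview is hypothetical, and the specific values used to mark a preview (the text "Survey Preview" and the numeric code 1) are assumptions about the Qualtrics export.

```r
# Hypothetical helper: detect preview rows for either export format.
# Assumption: previews are "Survey Preview" in choice-text exports
# and 1 in numeric exports.
is_preview <- function(status) {
  if (is.numeric(status)) {
    status == 1
  } else {
    status == "Survey Preview"
  }
}

status_text <- c("IP Address", "Survey Preview", "IP Address") # made up
status_numeric <- c(0, 1, 0) # made up

preview_text <- is_preview(status_text)
preview_numeric <- is_preview(status_numeric)
```

Both calls flag the same second row, which is the behavior the paragraph above describes for the two export types.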
Value
The output is a data frame of the rows that are survey previews.
For a function that marks these rows, use mark_preview().
For a function that excludes these rows, use exclude_preview().
See Also
Other preview functions: exclude_preview(), mark_preview()
Other check functions: check_duplicates(), check_duration(), check_ip(), check_location(), check_progress(), check_resolution()
Examples
# Check for survey previews
data(qualtrics_text)
check_preview(qualtrics_text)
# Works for Qualtrics data exported as numeric values, too
qualtrics_numeric %>%
check_preview()
# Do not print rows to console
qualtrics_text %>%
check_preview(print = FALSE)
# Do not print message to console
qualtrics_text %>%
check_preview(quiet = TRUE)
Check for survey progress
Description
The check_progress() function subsets rows of data, retaining rows that have incomplete progress. The function is written to work with data from Qualtrics surveys.
Usage
check_progress(
x,
min_progress = 100,
id_col = "ResponseId",
finished_col = "Finished",
progress_col = "Progress",
rename = TRUE,
keep = FALSE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_progress |
Amount of progress considered acceptable to include. |
id_col |
Column name for unique row ID (e.g., participant). |
finished_col |
Column name for whether survey was completed. |
progress_col |
Column name for percentage of survey completed. |
rename |
Logical indicating whether to rename columns (using |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
Default column names are set based on output from qualtRics::fetch_survey().
The default requires 100% completion, but lower levels of completion may be acceptable and can be allowed by specifying the min_progress argument.
The finished column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
The function outputs to console a message about the number of rows that have incomplete progress.
Value
The output is a data frame of the rows that have incomplete progress.
For a function that marks these rows, use mark_progress().
For a function that excludes these rows, use exclude_progress().
See Also
Other progress functions: exclude_progress(), mark_progress()
Other check functions: check_duplicates(), check_duration(), check_ip(), check_location(), check_preview(), check_resolution()
Examples
# Check for rows with incomplete progress
data(qualtrics_text)
check_progress(qualtrics_text)
# Remove preview data first
qualtrics_text %>%
exclude_preview() %>%
check_progress()
# Include a lower acceptable completion percentage
qualtrics_numeric %>%
exclude_preview() %>%
check_progress(min_progress = 98)
# Do not print rows to console
qualtrics_text %>%
exclude_preview() %>%
check_progress(print = FALSE)
# Do not print message to console
qualtrics_text %>%
exclude_preview() %>%
check_progress(quiet = TRUE)
Check screen resolution
Description
The check_resolution() function subsets rows of data, retaining rows that have unacceptable screen resolution. This can be used, for example, to determine data collected via phones when desktop monitors are required. The function is written to work with data from Qualtrics surveys.
Usage
check_resolution(
x,
res_min = 1000,
width_min = 0,
height_min = 0,
id_col = "ResponseId",
res_col = "Resolution",
rename = TRUE,
keep = FALSE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
res_min |
Minimum acceptable screen resolution (width and height). |
width_min |
Minimum acceptable screen width. |
height_min |
Minimum acceptable screen height. |
id_col |
Column name for unique row ID (e.g., participant). |
res_col |
Column name for screen resolution (in format widthxheight). |
rename |
Logical indicating whether to rename columns (using |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
To record this information in your Qualtrics survey, you must insert a meta info question.
Default column names are set based on output from qualtRics::fetch_survey().
The function outputs to console a message about the number of rows with unacceptable screen resolution.
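Splitting a widthxheight string into separate width and height values can be sketched in base R. This is an illustration only, not the package's implementation: the helper parse_resolution, the sample values, and the thresholds are hypothetical, and only separate width and height minimums are shown.

```r
# Hypothetical helper: split "widthxheight" strings into numeric columns
parse_resolution <- function(res) {
  parts <- strsplit(res, "x", fixed = TRUE)
  data.frame(
    width = vapply(parts, function(p) as.numeric(p[1]), numeric(1)),
    height = vapply(parts, function(p) as.numeric(p[2]), numeric(1))
  )
}

resolution <- c("1920x1080", "375x812", "1366x768") # made-up values
dims <- parse_resolution(resolution)

# Illustrative thresholds (not the package defaults)
width_min <- 1000
height_min <- 700

# Assumption: a row is unacceptable if either dimension is below its minimum
too_small <- dims$width < width_min | dims$height < height_min
```

In this toy data only the phone-sized 375x812 entry falls below the width threshold.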
Value
The output is a data frame of the rows that have unacceptable screen resolutions. This includes new columns for resolution width and height.
For a function that marks these rows, use mark_resolution().
For a function that excludes these rows, use exclude_resolution().
See Also
Other resolution functions: exclude_resolution(), mark_resolution()
Other check functions: check_duplicates(), check_duration(), check_ip(), check_location(), check_preview(), check_progress()
Examples
# Check for unacceptable screen resolutions
data(qualtrics_text)
check_resolution(qualtrics_text)
# Remove preview data first
qualtrics_text %>%
exclude_preview() %>%
check_resolution()
# Do not print rows to console
qualtrics_text %>%
exclude_preview() %>%
check_resolution(print = FALSE)
# Do not print message to console
qualtrics_text %>%
exclude_preview() %>%
check_resolution(quiet = TRUE)
Unite multiple exclusion columns into single column
Description
collapse_exclusions() was renamed to unite_exclusions() to create a more consistent API with tidyverse's unite() function; please use unite_exclusions().
Usage
collapse_exclusions(
x,
exclusion_types = c("duplicates", "duration", "ip", "location", "preview", "progress",
"resolution"),
separator = ",",
remove = TRUE
)
Remove columns that could include identifiable information
Description
The deidentify() function removes columns from Qualtrics surveys that may include identifiable information such as IP address, location, or computer characteristics.
Usage
deidentify(x, strict = TRUE)
Arguments
x |
Data frame (downloaded from Qualtrics). |
strict |
Logical indicating whether to use strict or non-strict level of deidentification. Strict removes computer information columns in addition to IP address and location. |
Details
The function offers two levels of deidentification. The default strict level removes columns associated with IP address and location and computer information (browser type and version, operating system, and screen resolution). The non-strict level removes only columns associated with IP address and location.
Typically, deidentification should be used at the end of a processing pipeline so that these columns can be used to exclude rows.
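The ordering advice and the two levels can be illustrated with a small base R sketch. It is not the package's implementation: the toy data and the column groupings (ip_cols, computer_cols) are hypothetical stand-ins for the columns the function actually removes.

```r
# Toy data with a few Qualtrics-style columns (all values are made up)
responses <- data.frame(
  ResponseId = c("R_1", "R_2", "R_3"),
  IPAddress = c("192.0.2.1", "192.0.2.1", "192.0.2.9"),
  Browser = c("Firefox", "Chrome", "Safari"),
  Q1 = c(4, 5, 3),
  stringsAsFactors = FALSE
)

# Hypothetical column groupings for the two levels
ip_cols <- c("IPAddress")
computer_cols <- c("Browser")

# Step 1: apply exclusions while the identifying columns still exist
dupe_ip <- duplicated(responses$IPAddress) |
  duplicated(responses$IPAddress, fromLast = TRUE)
kept <- responses[!dupe_ip, ]

# Step 2: remove identifying columns at the end of the pipeline
strict <- kept[, setdiff(names(kept), c(ip_cols, computer_cols)), drop = FALSE]
non_strict <- kept[, setdiff(names(kept), ip_cols), drop = FALSE]
```

Reversing the two steps would fail in this sketch, because the IP address column needed for the exclusion would already be gone.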
Value
An object of the same type as x that excludes Qualtrics columns with identifiable information.
Examples
names(qualtrics_numeric)
# Remove IP address, location, and computer information columns
deid <- deidentify(qualtrics_numeric)
names(deid)
# Remove only IP address and location columns
deid2 <- deidentify(qualtrics_numeric, strict = FALSE)
names(deid2)
Exclude rows with duplicate IP addresses and/or locations
Description
The exclude_duplicates() function removes rows of data that have the same IP address and/or same latitude and longitude. The function is written to work with data from Qualtrics surveys.
Usage
exclude_duplicates(
x,
id_col = "ResponseId",
ip_col = "IPAddress",
location_col = c("LocationLatitude", "LocationLongitude"),
rename = TRUE,
dupl_ip = TRUE,
dupl_location = TRUE,
include_na = FALSE,
quiet = TRUE,
print = TRUE,
silent = FALSE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
location_col |
Two element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
dupl_ip |
Logical indicating whether to check IP addresses. |
dupl_location |
Logical indicating whether to check latitude and longitude. |
include_na |
Logical indicating whether to include rows with NAs for IP address and location as potentially excluded rows. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note this argument controls the exclude message not the check message. |
Details
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
By default, IP address and location are both checked, but they can be checked separately with the dupl_ip and dupl_location arguments.
The function outputs to console separate messages about the number of rows with duplicate IP addresses and rows with duplicate locations. These counts are computed independently, so rows may be counted for both types of duplicates.
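How excluding relates to checking and marking can be sketched in base R. This is a conceptual illustration, not the package's code: the toy data and the flag column are hypothetical, and only duplicate IP addresses are considered.

```r
# Toy data (all values are made up)
responses <- data.frame(
  ResponseId = c("R_1", "R_2", "R_3", "R_4"),
  IPAddress = c("192.0.2.1", "192.0.2.1", "192.0.2.7", "192.0.2.8"),
  stringsAsFactors = FALSE
)

# Rows whose IP address occurs more than once
dupe <- duplicated(responses$IPAddress) |
  duplicated(responses$IPAddress, fromLast = TRUE)

# "Check": return only the rows that meet the exclusion criterion
checked <- responses[dupe, ]

# "Mark": keep every row and record the criterion in a new column
# (the column name `flag` is hypothetical)
marked <- transform(responses, flag = ifelse(dupe, "duplicates", ""))

# "Exclude": return the rows that do not meet the criterion
excluded <- responses[!dupe, ]
```

In this sketch the checked and excluded rows partition the original data, and the marked version keeps all rows with the criterion recorded alongside.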
Value
An object of the same type as x that excludes rows with duplicate IP addresses and/or locations.
For a function that just checks for and returns duplicate rows, use check_duplicates(). For a function that marks these rows, use mark_duplicates().
See Also
Other duplicates functions: check_duplicates(), mark_duplicates()
Other exclude functions: exclude_duration(), exclude_ip(), exclude_location(), exclude_preview(), exclude_progress(), exclude_resolution()
Examples
# Exclude duplicate IP addresses and locations
data(qualtrics_text)
df <- exclude_duplicates(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
exclude_duplicates()
# Exclude only for duplicate locations
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_duplicates(dupl_ip = FALSE)
Exclude rows with minimum or maximum durations
Description
The exclude_duration() function removes rows of data that have durations that are too fast or too slow. The function is written to work with data from Qualtrics surveys.
Usage
exclude_duration(
x,
min_duration = 10,
max_duration = NULL,
id_col = "ResponseId",
duration_col = "Duration (in seconds)",
rename = TRUE,
quiet = TRUE,
print = TRUE,
silent = FALSE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_duration |
Minimum duration that is too fast in seconds. |
max_duration |
Maximum duration that is too slow in seconds. |
id_col |
Column name for unique row ID (e.g., participant). |
duration_col |
Column name for durations. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note this argument controls the exclude message not the check message. |
Details
Default column names are set based on output from qualtRics::fetch_survey().
By default, minimum durations of 10 seconds are checked, but either minima or maxima can be checked with the min_duration and max_duration arguments. The function outputs to console separate messages about the number of rows that are too fast or too slow.
This function removes the fast and slow rows.
Value
An object of the same type as x that excludes rows with fast and/or slow duration.
For a function that checks for these rows, use check_duration().
For a function that marks these rows, use mark_duration().
See Also
Other duration functions: check_duration(), mark_duration()
Other exclude functions: exclude_duplicates(), exclude_ip(), exclude_location(), exclude_preview(), exclude_progress(), exclude_resolution()
Examples
# Exclude durations faster than 100 seconds
data(qualtrics_text)
df <- exclude_duration(qualtrics_text, min_duration = 100)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
exclude_duration()
# Exclude only for durations slower than 800 seconds
df <- qualtrics_text %>%
exclude_preview() %>%
exclude_duration(max_duration = 800)
Exclude IP addresses from outside of a specified country.
Description
The exclude_ip() function removes rows of data that have IP addresses from outside the specified country. The function is written to work with data from Qualtrics surveys.
Usage
exclude_ip(
x,
id_col = "ResponseId",
ip_col = "IPAddress",
rename = TRUE,
country = "US",
include_na = FALSE,
quiet = TRUE,
print = TRUE,
silent = FALSE
)
Arguments
x |
Data frame or tibble (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
rename |
Logical indicating whether to rename columns (using |
country |
Two-letter abbreviation of country to check (default is "US"). |
include_na |
Logical indicating whether to include rows with NA in IP address column in the output list of potentially excluded data. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note this argument controls the exclude message not the check message. |
Details
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
The function uses ipaddress::country_networks() to assign IP addresses to specific countries using ISO 3166-1 alpha-2 country codes.
The function outputs to console a message about the number of rows with IP addresses outside of the specified country. If there are NAs for IP addresses (likely due to including preview data; see check_preview()), it will print a message alerting to the number of rows with NAs.
Value
An object of the same type as x that excludes rows with IP addresses outside of the specified country.
For a function that checks these rows, use check_ip().
For a function that marks these rows, use mark_ip().
Note
This function requires internet connectivity as it uses the ipaddress::country_networks() function, which pulls daily updated data from http://www.iwik.org/ipcountry/. It only updates the data once per session, as it caches the results for future work during the session.
See Also
Other ip functions: check_ip(), mark_ip()
Other exclude functions: exclude_duplicates(), exclude_duration(), exclude_location(), exclude_preview(), exclude_progress(), exclude_resolution()
Examples
# Exclude IP addresses outside of the US
data(qualtrics_text)
df <- exclude_ip(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
exclude_ip()
# Exclude IP addresses outside of Germany
df <- qualtrics_text %>%
exclude_preview() %>%
exclude_ip(country = "DE")
Exclude locations outside of the US
Description
The exclude_location() function removes rows that have locations outside of the US. The function is written to work with data from Qualtrics surveys.
Usage
exclude_location(
x,
id_col = "ResponseId",
location_col = c("LocationLatitude", "LocationLongitude"),
rename = TRUE,
include_na = FALSE,
quiet = TRUE,
print = TRUE,
silent = FALSE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
location_col |
Two element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
include_na |
Logical indicating whether to include rows with NA in latitude and longitude columns in the output list of potentially excluded data. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note this argument controls the exclude message not the check message. |
Details
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
The function only works for the United States. It uses maps::map.where() to determine if latitude and longitude are inside the US.
The function outputs to console a message about the number of rows with locations outside of the US.
Value
An object of the same type as x that excludes rows that are located outside of the US and (if include_na == TRUE) rows with no location information.
For a function that checks for these rows, use check_location().
For a function that marks these rows, use mark_location().
See Also
Other location functions: check_location(), mark_location()
Other exclude functions: exclude_duplicates(), exclude_duration(), exclude_ip(), exclude_preview(), exclude_progress(), exclude_resolution()
Examples
# Exclude locations outside of the US
data(qualtrics_text)
df <- exclude_location(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
exclude_location()
Exclude survey previews
Description
The exclude_preview() function removes rows that are survey previews. The function is written to work with data from Qualtrics surveys.
Usage
exclude_preview(
x,
id_col = "ResponseId",
preview_col = "Status",
rename = TRUE,
quiet = TRUE,
print = TRUE,
silent = FALSE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
preview_col |
Column name for survey preview. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note this argument controls the exclude message not the check message. |
Details
Default column names are set based on output from qualtRics::fetch_survey().
The preview column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
The function outputs to console a message about the number of rows that are survey previews.
Value
An object of the same type as x
that excludes rows
that are survey previews.
For a function that checks for these rows, use check_preview()
.
For a function that marks these rows, use mark_preview()
.
See Also
Other preview functions:
check_preview()
,
mark_preview()
Other exclude functions:
exclude_duplicates()
,
exclude_duration()
,
exclude_ip()
,
exclude_location()
,
exclude_progress()
,
exclude_resolution()
Examples
# Exclude survey previews
data(qualtrics_text)
df <- exclude_preview(qualtrics_text)
# Works for Qualtrics data exported as numeric values, too
df <- qualtrics_numeric %>%
exclude_preview()
# Do not print rows to console
df <- qualtrics_text %>%
exclude_preview(print = FALSE)
Exclude survey progress
Description
The exclude_progress()
function removes
rows that have incomplete progress.
The function is written to work with data from
Qualtrics surveys.
Usage
exclude_progress(
x,
min_progress = 100,
id_col = "ResponseId",
finished_col = "Finished",
progress_col = "Progress",
rename = TRUE,
quiet = TRUE,
print = TRUE,
silent = FALSE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_progress |
Amount of progress considered acceptable to include. |
id_col |
Column name for unique row ID (e.g., participant). |
finished_col |
Column name for whether survey was completed. |
progress_col |
Column name for percentage of survey completed. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note this argument controls the exclude message not the check message. |
Details
Default column names are set based on output from the
qualtRics::fetch_survey()
.
The default requires 100% completion, but lower levels of completion
may be acceptable and can be allowed by specifying the min_progress
argument.
The finished column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
The function outputs to console a message about the number of rows that have incomplete progress.
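The effect of min_progress can be illustrated with a small base R sketch. This is not the package's implementation: the toy data and the keep_complete() helper are hypothetical, and the sketch assumes a row counts as incomplete when its Progress value is below min_progress.

```r
# Hypothetical toy data using the default Qualtrics column names
survey <- data.frame(
  ResponseId = c("R_1", "R_2", "R_3", "R_4"),
  Progress = c(100, 99, 45, 100),
  stringsAsFactors = FALSE
)

# Hypothetical helper (assumption: rows with Progress below
# min_progress are the ones treated as incomplete)
keep_complete <- function(x, min_progress = 100, progress_col = "Progress") {
  x[x[[progress_col]] >= min_progress, , drop = FALSE]
}

strict <- keep_complete(survey)                      # requires 100% progress
lenient <- keep_complete(survey, min_progress = 98)  # also accepts 99%
nrow(strict)
nrow(lenient)
```

Lowering the threshold keeps rows that are nearly complete, which may suit surveys where the final page carries no questions.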
Value
An object of the same type as x
that excludes rows
that have incomplete progress.
For a function that checks for these rows, use check_progress()
.
For a function that marks these rows, use mark_progress()
.
See Also
Other progress functions:
check_progress()
,
mark_progress()
Other exclude functions:
exclude_duplicates()
,
exclude_duration()
,
exclude_ip()
,
exclude_location()
,
exclude_preview()
,
exclude_resolution()
Examples
# Exclude rows with incomplete progress
data(qualtrics_text)
df <- exclude_progress(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
exclude_progress()
# Include a lower acceptable completion percentage
df <- qualtrics_numeric %>%
exclude_preview() %>%
exclude_progress(min_progress = 98)
# Do not print rows to console
df <- qualtrics_text %>%
exclude_preview() %>%
exclude_progress(print = FALSE)
Exclude unacceptable screen resolution
Description
The exclude_resolution()
function removes
rows that have unacceptable screen resolution.
The function is written to work with data from
Qualtrics surveys.
Usage
exclude_resolution(
x,
res_min = 1000,
width_min = 0,
height_min = 0,
id_col = "ResponseId",
res_col = "Resolution",
rename = TRUE,
quiet = TRUE,
print = TRUE,
silent = FALSE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
res_min |
Minimum acceptable screen resolution (width and height). |
width_min |
Minimum acceptable screen width. |
height_min |
Minimum acceptable screen height. |
id_col |
Column name for unique row ID (e.g., participant). |
res_col |
Column name for screen resolution (in format widthxheight). |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note this argument controls the exclude message not the check message. |
Details
To record this information in your Qualtrics survey, you must insert a meta info question.
Default column names are set based on output from the
qualtRics::fetch_survey()
.
The function outputs to console a message about the number of rows with unacceptable screen resolution.
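As a rough illustration of how a resolution string in the widthxheight format might be compared against minimum values, the base R sketch below splits the string and applies separate width and height thresholds. The data and threshold values are hypothetical, and the comparison rule is an assumption rather than the package's own code.

```r
# Hypothetical resolution strings in the "widthxheight" format
resolution <- c("1920x1080", "800x600", "1366x768", NA)

# Split each string at the "x" and convert both parts to numbers
parts <- strsplit(resolution, "x", fixed = TRUE)
width <- as.numeric(vapply(parts, function(p) p[1], character(1)))
height <- as.numeric(vapply(parts, function(p) p[2], character(1)))

# Hypothetical thresholds (assumption: a screen is unacceptable when
# either dimension falls below its minimum; missing values are skipped)
width_min <- 1000
height_min <- 700
too_small <- !is.na(width) & (width < width_min | height < height_min)
resolution[too_small]
```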
Value
An object of the same type as x
that excludes rows
that have unacceptable screen resolutions.
For a function that checks for these rows, use check_resolution()
.
For a function that marks these rows, use mark_resolution()
.
See Also
Other resolution functions:
check_resolution()
,
mark_resolution()
Other exclude functions:
exclude_duplicates()
,
exclude_duration()
,
exclude_ip()
,
exclude_location()
,
exclude_preview()
,
exclude_progress()
Examples
# Exclude low screen resolutions
data(qualtrics_text)
df <- exclude_resolution(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
exclude_resolution()
Keep column with marked rows
Description
For check_*() functions, keep the column that has marked rows and move it to
the first column, or remove the column, depending on the keep
flag.
This function is not exported.
Usage
keep_marked_column(x, column, keep)
Arguments
x |
Data set. |
column |
Name of exclusion column. |
keep |
Logical indicating whether to keep or remove exclusion column. |
Mark duplicate IP addresses and/or locations
Description
The mark_duplicates()
function creates a column labeling
rows of data that have the same IP address and/or same latitude and
longitude. The function is written to work with data from
Qualtrics surveys.
Usage
mark_duplicates(
x,
id_col = "ResponseId",
ip_col = "IPAddress",
location_col = c("LocationLatitude", "LocationLongitude"),
rename = TRUE,
dupl_ip = TRUE,
dupl_location = TRUE,
include_na = FALSE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
location_col |
Two element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
dupl_ip |
Logical indicating whether to check IP addresses. |
dupl_location |
Logical indicating whether to check latitude and longitude. |
include_na |
Logical indicating whether to include rows with NAs for IP address and location as potentially excluded rows. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from the
qualtRics::fetch_survey()
.
By default, IP address and location are both checked, but they can be
checked separately with the dupl_ip
and dupl_location
arguments.
The function outputs to console separate messages about the number of rows with duplicate IP addresses and rows with duplicate locations. These counts are computed independently, so rows may be counted for both types of duplicates.
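The independence of the two counts can be seen in a small base R sketch. The toy data and the all_duplicated() helper are hypothetical, and flagging every member of a duplicated group is an assumption about the intended behavior, not a description of the package's internals.

```r
# Hypothetical toy data using the default Qualtrics column names
survey <- data.frame(
  ResponseId = c("R_1", "R_2", "R_3", "R_4"),
  IPAddress = c("10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3"),
  LocationLatitude = c(40.8, 40.8, 41.3, 41.3),
  LocationLongitude = c(-96.7, -96.7, -96.0, -96.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: flag every member of a duplicated group,
# not only the later copies
all_duplicated <- function(v) duplicated(v) | duplicated(v, fromLast = TRUE)

dup_ip <- all_duplicated(survey$IPAddress)
dup_location <- all_duplicated(
  paste(survey$LocationLatitude, survey$LocationLongitude)
)

# The two counts are computed independently, so they can overlap
sum(dup_ip)
sum(dup_location)
sum(dup_ip & dup_location)
```

In this toy data, two rows share an IP address and all four rows share a location with another row, so the first two rows are counted under both criteria.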
Value
An object of the same type as x
that includes a column marking rows
with duplicate IP addresses and/or locations.
For a function that just checks for and returns duplicate rows,
use check_duplicates()
. For a function that excludes these rows,
use exclude_duplicates()
.
See Also
Other duplicates functions:
check_duplicates()
,
exclude_duplicates()
Other mark functions:
mark_duration()
,
mark_ip()
,
mark_location()
,
mark_preview()
,
mark_progress()
,
mark_resolution()
Examples
# Mark duplicate IP addresses and locations
data(qualtrics_text)
df <- mark_duplicates(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
mark_duplicates()
# Mark only duplicate IP addresses (location check turned off)
df <- qualtrics_text %>%
exclude_preview() %>%
mark_duplicates(dupl_location = FALSE)
Mark minimum or maximum durations
Description
The mark_duration()
function creates a column labeling
rows with fast and/or slow duration.
The function is written to work with data from
Qualtrics surveys.
Usage
mark_duration(
x,
min_duration = 10,
max_duration = NULL,
id_col = "ResponseId",
duration_col = "Duration (in seconds)",
rename = TRUE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_duration |
Minimum duration that is too fast in seconds. |
max_duration |
Maximum duration that is too slow in seconds. |
id_col |
Column name for unique row ID (e.g., participant). |
duration_col |
Column name for durations. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
Default column names are set based on output from the
qualtRics::fetch_survey()
.
By default, minimum durations of 10 seconds are checked, but either
minima or maxima can be checked with the min_duration
and
max_duration
arguments. The function outputs to console separate
messages about the number of rows that are too fast or too slow.
This function marks the fast and slow rows.
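A minimal base R sketch of how minimum and maximum thresholds could label rows is given here for illustration. The mark_speed() helper and its labels are hypothetical, and the sketch assumes that "too fast" means strictly below min_duration and "too slow" means strictly above max_duration.

```r
# Hypothetical helper that labels durations (in seconds).
# Assumption: strict inequalities define "too fast" and "too slow".
mark_speed <- function(duration, min_duration = 10, max_duration = NULL) {
  label <- rep("", length(duration))
  if (!is.null(min_duration)) label[duration < min_duration] <- "too_fast"
  if (!is.null(max_duration)) label[duration > max_duration] <- "too_slow"
  label
}

durations <- c(5, 42, 350, 1200)
fast_only <- mark_speed(durations)                      # only the minimum is checked
both <- mark_speed(durations, max_duration = 800)       # minimum and maximum
fast_only
both
```

Leaving max_duration as NULL skips the upper check entirely, which mirrors the default shown in the usage above.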
Value
An object of the same type as x
that includes a column marking rows
with fast and/or slow durations.
For a function that checks for these rows, use check_duration()
.
For a function that excludes these rows, use exclude_duration()
.
See Also
Other duration functions:
check_duration()
,
exclude_duration()
Other mark functions:
mark_duplicates()
,
mark_ip()
,
mark_location()
,
mark_preview()
,
mark_progress()
,
mark_resolution()
Examples
# Mark durations faster than 100 seconds
data(qualtrics_text)
df <- mark_duration(qualtrics_text, min_duration = 100)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
mark_duration()
# Mark only for durations slower than 800 seconds
df <- qualtrics_text %>%
exclude_preview() %>%
mark_duration(max_duration = 800)
Mark IP addresses from outside of a specified country
Description
The mark_ip()
function creates a column labeling
rows of data that have IP addresses from outside the specified country.
The function is written to work with data from
Qualtrics surveys.
Usage
mark_ip(
x,
id_col = "ResponseId",
ip_col = "IPAddress",
rename = TRUE,
country = "US",
include_na = FALSE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame or tibble (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
rename |
Logical indicating whether to rename columns (using |
country |
Two-letter abbreviation of country to check (default is "US"). |
include_na |
Logical indicating whether to include rows with NA in IP address column in the output list of potentially excluded data. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from the
qualtRics::fetch_survey()
.
The function uses ipaddress::country_networks()
to assign IP addresses to
specific countries using
ISO 3166-1 alpha-2 country codes.
The function outputs to console a message about the number of rows
with IP addresses outside of the specified country. If there are NAs for IP
addresses (likely due to including preview data; see check_preview()), it
will print a message alerting to the number of rows with NAs.
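To make the idea of assigning an IP address to a network concrete, the following base R sketch tests whether IPv4 addresses fall inside a CIDR block. It is a conceptual illustration only: the helpers ip_to_number() and in_network() are hypothetical, the block used is a documentation-only range rather than a real country network, and the package itself relies on the ipaddress package for this work.

```r
# Hypothetical helper: convert dotted-quad IPv4 strings to numeric values
ip_to_number <- function(ip) {
  vapply(strsplit(ip, ".", fixed = TRUE), function(octets) {
    sum(as.numeric(octets) * 256^(3:0))
  }, numeric(1))
}

# Hypothetical helper: test whether addresses fall inside a CIDR block
in_network <- function(ip, cidr) {
  pieces <- strsplit(cidr, "/", fixed = TRUE)[[1]]
  size <- 2^(32 - as.integer(pieces[2]))   # number of addresses in the block
  start <- floor(ip_to_number(pieces[1]) / size) * size
  value <- ip_to_number(ip)
  value >= start & value < start + size
}

# 192.0.2.0/24 is a documentation-only range used here as a stand-in
inside <- in_network(c("192.0.2.17", "198.51.100.4", NA), "192.0.2.0/24")
inside
```

Missing addresses propagate as NA, which is why rows without an IP address need separate handling through an argument such as include_na.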
Value
An object of the same type as x
that includes a column marking rows
with IP addresses outside of the specified country.
For a function that checks these rows, use check_ip()
.
For a function that excludes these rows, use exclude_ip()
.
Note
This function requires internet connectivity as it uses the
ipaddress::country_networks()
function, which pulls daily updated data
from https://www.iwik.org/ipcountry/. It only updates the data once
per session, as it caches the results for future work during the session.
See Also
Other ip functions:
check_ip()
,
exclude_ip()
Other mark functions:
mark_duplicates()
,
mark_duration()
,
mark_location()
,
mark_preview()
,
mark_progress()
,
mark_resolution()
Examples
# Mark IP addresses outside of the US
data(qualtrics_text)
df <- mark_ip(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
mark_ip()
# Mark IP addresses outside of Germany
df <- qualtrics_text %>%
exclude_preview() %>%
mark_ip(country = "DE")
Mark locations outside of US
Description
The mark_location()
function creates a column labeling
rows that have locations outside of the US.
The function is written to work with data from
Qualtrics surveys.
Usage
mark_location(
x,
id_col = "ResponseId",
location_col = c("LocationLatitude", "LocationLongitude"),
rename = TRUE,
include_na = FALSE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
location_col |
Two element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
include_na |
Logical indicating whether to include rows with NA in latitude and longitude columns in the output list of potentially excluded data. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from the
qualtRics::fetch_survey()
.
The function only works for the United States.
It uses maps::map.where()
to determine if latitude and longitude
are inside the US.
The function outputs to console a message about the number of rows with locations outside of the US.
Value
An object of the same type as x
that includes a column marking rows
that are located outside of the US and (if include_na == FALSE
) rows with
no location information.
For a function that checks for these rows, use check_location()
.
For a function that excludes these rows, use exclude_location()
.
See Also
Other location functions:
check_location()
,
exclude_location()
Other mark functions:
mark_duplicates()
,
mark_duration()
,
mark_ip()
,
mark_preview()
,
mark_progress()
,
mark_resolution()
Examples
# Mark locations outside of the US
data(qualtrics_text)
df <- mark_location(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
mark_location()
Mark survey previews
Description
The mark_preview()
function creates a column labeling
rows that are survey previews.
The function is written to work with data from
Qualtrics surveys.
Usage
mark_preview(
x,
id_col = "ResponseId",
preview_col = "Status",
rename = TRUE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
preview_col |
Column name for survey preview. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
Default column names are set based on output from the
qualtRics::fetch_survey()
.
The preview column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
The function outputs to console a message about the number of rows that are survey previews.
Value
An object of the same type as x
that includes a column marking rows
that are survey previews.
For a function that checks for these rows, use check_preview()
.
For a function that excludes these rows, use exclude_preview()
.
See Also
Other preview functions:
check_preview()
,
exclude_preview()
Other mark functions:
mark_duplicates()
,
mark_duration()
,
mark_ip()
,
mark_location()
,
mark_progress()
,
mark_resolution()
Examples
# Mark survey previews
data(qualtrics_text)
df <- mark_preview(qualtrics_text)
# Works for Qualtrics data exported as numeric values, too
df <- qualtrics_numeric %>%
mark_preview()
Mark survey progress
Description
The mark_progress()
function creates a column labeling
rows that have incomplete progress.
The function is written to work with data from
Qualtrics surveys.
Usage
mark_progress(
x,
min_progress = 100,
id_col = "ResponseId",
finished_col = "Finished",
progress_col = "Progress",
rename = TRUE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_progress |
Amount of progress considered acceptable to include. |
id_col |
Column name for unique row ID (e.g., participant). |
finished_col |
Column name for whether survey was completed. |
progress_col |
Column name for percentage of survey completed. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
Default column names are set based on output from the
qualtRics::fetch_survey()
.
The default requires 100% completion, but lower levels of completion
may be acceptable and can be allowed by specifying the min_progress
argument.
The finished column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
The function outputs to console a message about the number of rows that have incomplete progress.
Value
An object of the same type as x
that includes a column marking rows
that have incomplete progress.
For a function that checks for these rows, use check_progress()
.
For a function that excludes these rows, use exclude_progress()
.
See Also
Other progress functions:
check_progress()
,
exclude_progress()
Other mark functions:
mark_duplicates()
,
mark_duration()
,
mark_ip()
,
mark_location()
,
mark_preview()
,
mark_resolution()
Examples
# Mark rows with incomplete progress
data(qualtrics_text)
df <- mark_progress(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
mark_progress()
# Include a lower acceptable completion percentage
df <- qualtrics_numeric %>%
exclude_preview() %>%
mark_progress(min_progress = 98)
Mark unacceptable screen resolution
Description
The mark_resolution()
function creates a column labeling
rows that have unacceptable screen resolution.
The function is written to work with data from
Qualtrics surveys.
Usage
mark_resolution(
x,
res_min = 1000,
width_min = 0,
height_min = 0,
id_col = "ResponseId",
res_col = "Resolution",
rename = TRUE,
quiet = FALSE,
print = TRUE
)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
res_min |
Minimum acceptable screen resolution (width and height). |
width_min |
Minimum acceptable screen width. |
height_min |
Minimum acceptable screen height. |
id_col |
Column name for unique row ID (e.g., participant). |
res_col |
Column name for screen resolution (in format widthxheight). |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Details
To record this information in your Qualtrics survey, you must insert a meta info question.
Default column names are set based on output from the
qualtRics::fetch_survey()
.
The function outputs to console a message about the number of rows with unacceptable screen resolution.
Value
An object of the same type as x
that includes a column marking rows
that have unacceptable screen resolutions.
For a function that checks for these rows, use check_resolution()
.
For a function that excludes these rows, use exclude_resolution()
.
See Also
Other resolution functions:
check_resolution()
,
exclude_resolution()
Other mark functions:
mark_duplicates()
,
mark_duration()
,
mark_ip()
,
mark_location()
,
mark_preview()
,
mark_progress()
Examples
# Mark low screen resolutions
data(qualtrics_text)
df <- mark_resolution(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
exclude_preview() %>%
mark_resolution()
Return marked rows
Description
Create new column marking rows that meet exclusion criteria. This function is not exported.
Usage
mark_rows(x, filtered_data, id_col, exclusion_type)
Arguments
x |
Original data. |
filtered_data |
Data to be excluded. |
id_col |
Column name for unique row ID (e.g., participant). |
exclusion_type |
Column name for exclusion column. |
Print data to console
Description
Prints the data to the console. This function is not exported.
Usage
print_data(x, print)
Arguments
x |
Data set to print or not |
print |
Logical indicating whether to print returned tibble to console. |
Print number of excluded rows
Description
Prints a message to the console with the number of excluded rows. This function is not exported.
Usage
print_exclusion(remaining_data, x, msg)
Arguments
remaining_data |
Data after removing exclusions. |
x |
Original data before removing exclusions. |
msg |
Text to describe what types of rows were excluded. |
Example numeric metadata imported with qualtRics::fetch_survey() from simulated Qualtrics study
Description
A dataset containing the metadata from a standard Qualtrics survey with
browser metadata collected and exported with "Use numeric values". The data
were imported using qualtRics::fetch_survey().
These data were randomly generated using iptools::ip_random() and
rgeolocate::ip2location() functions.
Usage
qualtrics_fetch
Format
A data frame with 100 rows and 17 variables:
- StartDate
date and time data collection started, in ISO 8601 format
- EndDate
date and time data collection ended, in ISO 8601 format
- Status
numeric flag for preview (1) vs. implemented survey (0) entries
- IPAddress
participant IP address (truncated for anonymity)
- Progress
percentage of survey completed
- Duration (in seconds)
duration of time required to complete survey, in seconds
- Finished
numeric flag for whether survey was completed (1) or progress was < 100 (0)
- RecordedDate
date and time survey was recorded, in ISO 8601 format
- ResponseId
random ID for participants
- LocationLatitude
latitude geolocated from IP address
- LocationLongitude
longitude geolocated from IP address
- UserLanguage
language set in Qualtrics
- Q1_Browser
user web browser type
- Q1_Version
user web browser version
- Q1_Operating System
user operating system
- Q1_Resolution
user screen resolution
- Q2
response to question about whether the user liked the survey (1 = Yes, 0 = No)
See Also
Other data:
qualtrics_fetch2
,
qualtrics_numeric
,
qualtrics_raw
,
qualtrics_text
Example numeric metadata imported with qualtRics::fetch_survey() from simulated Qualtrics study but with labels included as column names
Description
A dataset containing the metadata from a standard Qualtrics survey with
browser metadata collected and exported with "Use numeric values". The data
were imported using qualtRics::fetch_survey(),
and then the secondary labels were assigned as column names with
sjlabelled::get_label().
These data were randomly generated using iptools::ip_random() and
rgeolocate::ip2location() functions.
Usage
qualtrics_fetch2
Format
A data frame with 100 rows and 17 variables:
- Start Date
date and time data collection started, in ISO 8601 format
- End Date
date and time data collection ended, in ISO 8601 format
- Response Type
numeric flag for preview (1) vs. implemented survey (0) entries
- IP Address
participant IP address (truncated for anonymity)
- Progress
percentage of survey completed
- Duration (in seconds)
duration of time required to complete survey, in seconds
- Finished
numeric flag for whether survey was completed (1) or progress was < 100 (0)
- Recorded Date
date and time survey was recorded, in ISO 8601 format
- Response ID
random ID for participants
- Location Latitude
latitude geolocated from IP address
- Location Longitude
longitude geolocated from IP address
- User Language
language set in Qualtrics
- Click to write the question text - Browser
user web browser type
- Click to write the question text - Version
user web browser version
- Click to write the question text - Operating System
user operating system
- Click to write the question text - Resolution
user screen resolution
- like
response to question about whether the user liked the survey (1 = Yes, 0 = No)
See Also
Other data:
qualtrics_fetch
,
qualtrics_numeric
,
qualtrics_raw
,
qualtrics_text
Example numeric metadata from simulated Qualtrics study
Description
A dataset containing the metadata from a standard Qualtrics survey with browser metadata collected and exported with "Use numeric values". These data were randomly generated using iptools::ip_random() and rgeolocate::ip2location() functions.
Usage
qualtrics_numeric
Format
A data frame with 100 rows and 16 variables:
- StartDate
date and time data collection started, in ISO 8601 format
- EndDate
date and time data collection ended, in ISO 8601 format
- Status
numeric flag for preview (1) vs. implemented survey (0) entries
- IPAddress
participant IP address (truncated for anonymity)
- Progress
percentage of survey completed
- Duration (in seconds)
duration of time required to complete survey, in seconds
- Finished
numeric flag for whether survey was completed (1) or progress was < 100 (0)
- RecordedDate
date and time survey was recorded, in ISO 8601 format
- ResponseId
random ID for participants
- LocationLatitude
latitude geolocated from IP address
- LocationLongitude
longitude geolocated from IP address
- UserLanguage
language set in Qualtrics
- Browser
user web browser type
- Version
user web browser version
- Operating System
user operating system
- Resolution
user screen resolution
See Also
Other data:
qualtrics_fetch
,
qualtrics_fetch2
,
qualtrics_raw
,
qualtrics_text
Example text-based metadata from simulated Qualtrics study
Description
A dataset containing the metadata from a standard Qualtrics survey with browser metadata collected and exported with "Use choice text". These data were randomly generated using iptools::ip_random() and rgeolocate::ip2location() functions. This dataset includes the two header rows of column information that are exported by Qualtrics.
Usage
qualtrics_raw
Format
A data frame with 102 rows and 16 variables:
- StartDate
date and time data collection started, in ISO 8601 format
- EndDate
date and time data collection ended, in ISO 8601 format
- Status
flag for preview (Survey Preview) vs. implemented survey (IP Address) entries
- IPAddress
participant IP address (truncated for anonymity)
- Progress
percentage of survey completed
- Duration (in seconds)
duration of time required to complete survey, in seconds
- Finished
logical for whether survey was completed (TRUE) or progress was < 100 (FALSE)
- RecordedDate
date and time survey was recorded, in ISO 8601 format
- ResponseId
random ID for participants
- LocationLatitude
latitude geolocated from IP address
- LocationLongitude
longitude geolocated from IP address
- UserLanguage
language set in Qualtrics
- Browser
user web browser type
- Version
user web browser version
- Operating System
user operating system
- Resolution
user screen resolution
See Also
Other data:
qualtrics_fetch
,
qualtrics_fetch2
,
qualtrics_numeric
,
qualtrics_text
Example text-based metadata from simulated Qualtrics study
Description
A dataset containing the metadata from a standard Qualtrics survey with browser metadata collected and exported with "Use choice text". These data were randomly generated using iptools::ip_random() and rgeolocate::ip2location() functions.
Usage
qualtrics_text
Format
A data frame with 100 rows and 16 variables:
- StartDate
date and time data collection started, in ISO 8601 format
- EndDate
date and time data collection ended, in ISO 8601 format
- Status
flag for preview (Survey Preview) vs. implemented survey (IP Address) entries
- IPAddress
participant IP address (truncated for anonymity)
- Progress
percentage of survey completed
- Duration (in seconds)
duration of time required to complete survey, in seconds
- Finished
logical for whether survey was completed (TRUE) or progress was < 100 (FALSE)
- RecordedDate
date and time survey was recorded, in ISO 8601 format
- ResponseId
random ID for participants
- LocationLatitude
latitude geolocated from IP address
- LocationLongitude
longitude geolocated from IP address
- UserLanguage
language set in Qualtrics
- Browser
user web browser type
- Version
user web browser version
- Operating System
user operating system
- Resolution
user screen resolution
See Also
Other data:
qualtrics_fetch
,
qualtrics_fetch2
,
qualtrics_numeric
,
qualtrics_raw
Remove two initial rows created in Qualtrics data
Description
The remove_label_rows()
function filters out the initial label rows from
datasets downloaded from Qualtrics surveys.
Usage
remove_label_rows(x, convert = TRUE, rename = FALSE)
Arguments
x |
Data frame (downloaded from Qualtrics). |
convert |
Logical indicating whether to convert/coerce date, logical and numeric columns from the metadata. |
rename |
Logical indicating whether to rename columns based on first row of data. |
Details
The function (1) checks if the data set uses Qualtrics column names, (2) checks if label rows are already used as column names, (3) removes label rows if present, and (4) converts date, logical, and numeric metadata columns to proper data type. Datasets imported using qualtRics::fetch_survey() should not need this function.
The convert
argument only converts the StartDate, EndDate,
RecordedDate, Progress, Finished, Duration (in seconds),
LocationLatitude, and LocationLongitude columns. To convert other data
columns, see dplyr::mutate()
.
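The removal and conversion steps can be sketched in base R as follows. The toy data frame and the contents of its two label rows are hypothetical stand-ins for a raw export, and the sketch is an illustration of the idea rather than the function's actual code.

```r
# Hypothetical raw export: two label rows precede the responses,
# so every column is read in as character
raw <- data.frame(
  Progress = c("Progress", "{\"ImportId\":\"progress\"}", "100", "45"),
  Finished = c("Finished", "{\"ImportId\":\"finished\"}", "TRUE", "FALSE"),
  stringsAsFactors = FALSE
)

# Drop the two label rows and reset the row names
clean <- raw[-(1:2), , drop = FALSE]
rownames(clean) <- NULL

# Coerce the metadata columns to their proper types
clean$Progress <- as.numeric(clean$Progress)
clean$Finished <- as.logical(clean$Finished)
str(clean)
```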
Value
An object of the same type as x
that excludes Qualtrics label rows and
with date, logical, and numeric metadata columns converted to the correct
data class.
Examples
# Remove label rows
data(qualtrics_raw)
df <- remove_label_rows(qualtrics_raw)
Rename columns to match standard Qualtrics names
Description
The rename_columns()
function renames the metadata columns to match
standard Qualtrics names.
Usage
rename_columns(x, alert = TRUE)
Arguments
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
alert |
Logical indicating whether to alert user to the fact that the columns do not match the secondary labels and therefore cannot be renamed. |
Details
When importing Qualtrics data using qualtRics::fetch_survey(),
labels entered in Qualtrics questions are saved as 'subtitles' for column
names. Using sjlabelled::get_label() can make these secondary labels be the
primary column names. However, this results in a different set of names for
the metadata columns than is used in all of the mark_*(), check_*(), and
exclude_*() functions. This function renames these columns to match the
standard Qualtrics names.
Value
An object of the same type as x
that has column names that match standard
Qualtrics names.
See Also
Other column name functions:
use_labels()
Examples
# Rename columns
data(qualtrics_fetch)
qualtrics_renamed <- qualtrics_fetch %>%
rename_columns()
names(qualtrics_fetch)
names(qualtrics_renamed)
# Alerts when columns cannot be renamed
data(qualtrics_numeric)
rename_columns(qualtrics_numeric)
# Turn off alert
rename_columns(qualtrics_numeric, alert = FALSE)
Unite multiple exclusion columns into single column
Description
Each of the mark_*()
functions appends a new column to the data.
The unite_exclusions()
function unites all of those columns in a
single column that can be used to filter any or all exclusions downstream.
Rows with multiple exclusions are concatenated with commas.
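The concatenation behavior can be sketched in base R on toy data. This is not the package's implementation. It assumes that each exclusion_* column holds the exclusion type as a string for marked rows and an empty string otherwise, and the name of the united column (exclusions) is likewise an assumption.

```r
# Toy data with two hypothetical exclusion_* columns
marked <- data.frame(
  id = 1:3,
  exclusion_duplicates = c("duplicates", "", ""),
  exclusion_duration = c("duration", "duration", ""),
  stringsAsFactors = FALSE
)

exclusion_cols <- c("exclusion_duplicates", "exclusion_duration")

# Join the non-empty labels in one row with the separator
unite_row <- function(labels, separator = ",") {
  paste(labels[nzchar(labels)], collapse = separator)
}
marked$exclusions <- unname(apply(marked[exclusion_cols], 1, unite_row))

# Drop the original columns, mirroring remove = TRUE
united <- marked[setdiff(names(marked), exclusion_cols)]

united
```

In this toy example, the first row is marked for both exclusion types, the second for one, and the third for none.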
Usage
unite_exclusions(
  x,
  exclusion_types = c("duplicates", "duration", "ip", "location", "preview", "progress",
    "resolution"),
  separator = ",",
  remove = TRUE
)
Arguments
x |
Data frame or tibble (preferably exported from Qualtrics). |
exclusion_types |
Vector of types of exclusions to unite. |
separator |
Character string specifying which character to use to separate multiple exclusion types. |
remove |
Logical specifying whether to remove the united columns (default = TRUE) or leave them in the data frame (FALSE). |
Value
An object of the same type as x that includes all of the same rows but with a single exclusion column replacing all of the specified exclusion_* columns.
Examples
# Unite all exclusion types
df <- qualtrics_text %>%
  mark_duplicates() %>%
  mark_duration(min_duration = 100) %>%
  mark_ip() %>%
  mark_location() %>%
  mark_preview() %>%
  mark_progress() %>%
  mark_resolution()
df2 <- df %>%
  unite_exclusions()
# Unite subset of exclusion types
df2 <- df %>%
  unite_exclusions(exclusion_types = c("duplicates", "duration", "ip"))
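Once the exclusion columns are united, the single column can drive downstream filtering. The base R sketch below uses toy data rather than real package output. The column name exclusions and the use of an empty string for rows without exclusions are assumptions, so check the actual output before relying on them.

```r
# Toy data standing in for the output of unite_exclusions()
united <- data.frame(
  id = 1:3,
  exclusions = c("duplicates,duration", "duration", ""),
  stringsAsFactors = FALSE
)

# Keep only rows with no exclusions of any type
kept <- united[!nzchar(united$exclusions), ]

# Select rows flagged for one specific exclusion type
flagged_duplicates <- united[grepl("duplicates", united$exclusions, fixed = TRUE), ]

kept$id
flagged_duplicates$id
```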
Use Qualtrics labels as column names
Description
The use_labels() function renames the columns using the labels generated in Qualtrics. Data must be imported using qualtRics::fetch_survey().
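The general idea of promoting labels to column names can be sketched in base R. This is an illustration under an assumption, not the package's code: it supposes that each column carries its Qualtrics label in a "label" attribute, and the toy data frame and label strings are hypothetical.

```r
# Toy data frame with a hypothetical "label" attribute on each column
df <- data.frame(StartDate = "2020-12-11", Q1 = 5)
attr(df$StartDate, "label") <- "Start Date"
attr(df$Q1, "label") <- "How old are you?"

# Collect each column's label and use the labels as the new column names
labels <- vapply(df, function(col) attr(col, "label"), character(1))
names(df) <- unname(labels)

names(df)
```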
Usage
use_labels(x)
Arguments
x |
Data frame imported using qualtRics::fetch_survey(). |
Value
An object of the same type as x
that has column names using the labels
generated in Qualtrics.
See Also
Other column name functions:
rename_columns()
Examples
# Rename columns
data(qualtrics_fetch)
qualtrics_renamed <- qualtrics_fetch %>%
  use_labels()
names(qualtrics_fetch)
names(qualtrics_renamed)
Check number, names, and type of columns
Description
Determines whether the correct number and names of columns were specified as arguments to the functions. This function is not exported.
Usage
validate_columns(x, column)
Arguments
x |
Data set. |
column |
Name of column argument to check. |