Type: | Package |
Title: | Assesses the Quality of Estimates Made by Complex Sample Designs |
Version: | 0.8.1 |
Description: | Assesses the quality of estimates made by complex sample designs, following the methodology developed by the National Institute of Statistics Chile (Household Survey Standard 2020, https://www.ine.cl/docs/default-source/institucionalidad/buenas-pr%C3%A1cticas/clasificaciones-y-estandares/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-publicaci%C3%B3n-27022020.pdf), (Economics Survey Standard 2024, https://www.ine.gob.cl/docs/default-source/buenas-practicas/directrices-metodologicas/estandares/documentos/est%C3%A1ndar-evaluaci%C3%B3n-de-calidad-de-estimaciones-econ%C3%B3micas.pdf?sfvrsn=201fbeb9_2) and by the Economic Commission for Latin America and the Caribbean (2020, https://repositorio.cepal.org/bitstream/handle/11362/45681/1/S2000293_es.pdf), (2024, https://repositorio.cepal.org/server/api/core/bitstreams/f04569e6-4f38-42e7-a32b-e0b298e0ab9c/content). |
License: | GPL-3 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.2 |
Suggests: | knitr, rmarkdown, testthat, srvyr |
VignetteBuilder: | knitr |
Depends: | R (≥ 3.5.0) |
Imports: | rlang, dplyr, purrr, survey, kableExtra, stringr, haven |
NeedsCompilation: | no |
Packaged: | 2025-06-02 18:56:39 UTC; klehmannm |
Author: | Klaus Lehmann [aut, cre], Ricardo Pizarro [aut], Ignacio Agloni [ctb], Andrea López [ctb], Javiera Preuss [ctb] |
Maintainer: | Klaus Lehmann <klehmann@fen.uchile.cl> |
Repository: | CRAN |
Date/Publication: | 2025-06-02 19:30:05 UTC |
Encuesta Longitudinal de Empresas
Description
ELE data for the year 2022. Contains only a few variables.
Usage
ELE7
Format
dataframe with 6,592 rows and 13 columns
- rol_ficticio
Company ID
- cod_actividad
Economic activity
- cod_tamano
Company size by sales
- tramo
Inclusion range
- fe_transversal
Cross-sectional weights
- fe_longitudinal
Longitudinal weights
- panel
Panel sample
- estrato
Strata
- pob
Finite population correction
- VA_2022
Value added 2022, difference between production value and intermediate consumption
- VA_2022f
Adjusted version of VA_2022: negative values are replaced with 0 and non-negative values are left unchanged.
- EMP
Total personnel employed and hired by the company on a monthly basis
- REMP_TOTAL
Total gross remuneration of personnel hired by the company
Source
Examples
data(ELE7)
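The adjustment described for VA_2022f (negative values set to 0, everything else unchanged) can be sketched in base R. The values below are hypothetical stand-ins for the VA_2022 column, not data from ELE7, and the sketch is not necessarily how the column was built.

```r
# Hypothetical values standing in for the VA_2022 column
va_2022 <- c(-150, 0, 320.5, -0.1, 980)

# Element-wise maximum against 0: negatives become 0, the rest is untouched
va_2022f <- pmax(va_2022, 0)

print(data.frame(va_2022, va_2022f))
```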
Target sample size, Encuesta Longitudinal de Empresas
Description
Target cross-sectional sample sizes for the ELE, year 2022.
Usage
ELE7_n_obj
Format
dataframe with 59 rows and 4 columns
- cod_tamano
Company size by sales
- cod_actividad_letra
Economic activity
- cod_actividad
Economic activity ID
- n_obj
Target sample size
Source
Examples
data(ELE7_n_obj)
Assess the quality of mean estimations
Description
assess evaluates the quality of mean estimations using the methodology created by INE Chile, which considers sample size, degrees of freedom, and coefficient of variation.
Usage
assess(
table,
publish = FALSE,
scheme = c("chile", "eclac_2020", "eclac_2023", "chile_economics"),
domain_info = FALSE,
low_df_justified = FALSE,
table_n_obj = NULL,
ratio_between_0_1 = TRUE,
...
)
Arguments
table |
|
publish |
|
scheme |
|
domain_info |
Logical. If |
low_df_justified |
Logical. If |
table_n_obj |
Default |
ratio_between_0_1 |
|
... |
additional parameters for the evaluation. The complete list of parameters is:
1. General Parameters
2. chile Parameters
3. CEPAL 2020 Parameters
4. CEPAL 2023 Parameters
5. Chile Economic Survey Standard Parameters
|
Value
dataframe with all the columns included in the input table, plus a new column containing a label indicating the evaluation of each estimation: reliable, bit reliable, or unreliable.
Examples
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
assess(create_mean("gastot_hd", domains = "zona+sexo", design = dc))
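The scheme argument in the usage above selects the standard applied. The sketch below runs the same estimate under the "eclac_2020" scheme. It assumes that the ECLAC schemes need the extra inputs that eclac_input = TRUE switches on in create_mean; that pairing is an assumption based on the argument names, not something this page states.

```r
library(survey)
library(calidad)

dc <- svydesign(ids = ~varunit, strata = ~varstrat,
                data = epf_personas, weights = ~fe)

# Assumption: eclac_input = TRUE adds the inputs the ECLAC schemes rely on
inputs <- create_mean("gastot_hd", domains = "zona+sexo",
                      design = dc, eclac_input = TRUE)

# Evaluate the same table under the ECLAC 2020 scheme instead of the default
res <- assess(inputs, scheme = "eclac_2020")
print(res)
```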
Encuesta de Caracterización Socioeconómica Nacional 2020 - CASEN en Pandemia 2020
Description
CASEN data for the year 2020. Contains only a few variables.
Usage
casen
Format
dataframe with 185,437 rows and 6 columns
- folio
household id
- sexo
1 = man; 2 = woman
- edad
age
- activ
Economic activity status
- ing_aut_hog
Household Income
- pobreza
poverty status: 1 = extreme poverty, 2 = non-extreme poverty, 3 = non-poverty
- expr
regional sample weights
- estrato
strata
- cod_upm
PSU
Source
http://observatorio.ministeriodesarrollosocial.gob.cl/encuesta-casen-en-pandemia-2020
Examples
data(casen)
Create html table with the results of the evaluation
Description
Create html table with the results of the evaluation
Usage
create_html(table)
Arguments
table |
|
Value
html table
Examples
library(survey)
library(dplyr)
hogar <- epf_personas %>%
group_by(folio) %>%
slice(1)
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = hogar, weights = ~fe)
table <- assess(create_prop("ocupado", domains = "zona+sexo", design = dc))
create_html(table)
Create the inputs to evaluate the quality of mean estimations
Description
create_mean generates a dataframe with the following elements: mean, degrees of freedom, sample size, and coefficient of variation. The function allows grouping in several domains.
Usage
create_mean(
var,
domains = NULL,
subpop = NULL,
design,
ci = FALSE,
ess = FALSE,
ajuste_ene = FALSE,
standard_eval = FALSE,
rm.na = FALSE,
deff = FALSE,
rel_error = FALSE,
unweighted = FALSE,
eclac_input = FALSE
)
Arguments
var |
numeric variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe. |
design |
complex design created by |
ci |
|
ess |
|
ajuste_ene |
|
standard_eval |
|
rm.na |
|
deff |
|
rel_error |
|
unweighted |
|
eclac_input |
|
Value
dataframe that contains the inputs and all domains to be evaluated.
Examples
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
create_mean("gastot_hd", "zona+sexo", design = dc)
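For orientation, several of the quantities listed above (mean, sample size, coefficient of variation) can be computed directly with the survey package. The sketch below is not the implementation of create_mean; it uses the api example data shipped with survey, and the output column names (stat, se, cv, n) are chosen here for illustration only.

```r
library(survey)

data(api)  # example data from the survey package; provides apiclus1
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Domain means with design-based standard errors
est <- svyby(~api00, ~stype, dclus1, svymean)

out <- data.frame(
  stype = est$stype,
  stat  = as.numeric(coef(est)),   # estimated mean per domain
  se    = as.numeric(SE(est)),     # standard error
  cv    = as.numeric(cv(est)),     # coefficient of variation = se / stat
  n     = as.numeric(table(apiclus1$stype)[as.character(est$stype)])
)
print(out)
```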
Create the inputs to evaluate the quality of proportion estimations
Description
create_prop generates a dataframe with the following elements: proportion, degrees of freedom, sample size, standard error, and coefficient of variation. The function allows grouping in several domains.
Usage
create_prop(
var,
denominator = NULL,
domains = NULL,
subpop = NULL,
design,
ci = FALSE,
deff = FALSE,
ess = FALSE,
ajuste_ene = FALSE,
rel_error = FALSE,
log_cv = FALSE,
unweighted = FALSE,
standard_eval = FALSE,
eclac_input = FALSE,
ci_logit = FALSE,
scheme = c("eclac_2020", "eclac_2023")
)
Arguments
var |
numeric variable within the |
denominator |
numeric variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe. |
design |
complex design created by |
ci |
|
deff |
|
ess |
|
ajuste_ene |
|
rel_error |
|
log_cv |
|
unweighted |
|
standard_eval |
|
eclac_input |
|
ci_logit |
|
scheme |
|
Value
dataframe that contains the inputs and all domains to be evaluated.
Examples
library(survey)
library(dplyr)
epf <- mutate(epf_personas, gasto_zona1 = if_else(zona == 1, gastot_hd, 0))
dc <- svydesign(ids = ~varunit, strata = ~varstrat, data = epf, weights = ~fe)
old_options <- options()
options(survey.lonely.psu = "certainty")
create_prop(var = "gasto_zona1", denominator = "gastot_hd", design = dc)
enusc <- filter(enusc, Kish == 1)
dc <- svydesign(ids = ~Conglomerado, strata = ~VarStrat, data = enusc, weights = ~Fact_Pers)
options(survey.lonely.psu = "certainty")
create_prop(var = "VP_DC", denominator = "hom_insg_taxi", design = dc)
options(old_options)
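The examples above snapshot every option with options() and restore them all at the end. A narrower alternative, sketched here, relies on the fact that base R's options() returns the previous values of the options it changes, so only survey.lonely.psu is saved and restored. In the survey package, the "certainty" setting makes a stratum with a single PSU contribute nothing to the variance.

```r
before <- getOption("survey.lonely.psu")

# options() returns the old values of the options being set
old <- options(survey.lonely.psu = "certainty")
current <- getOption("survey.lonely.psu")

# ... estimation calls such as create_prop() would go here ...

# Restore only what was changed
options(old)
restored <- getOption("survey.lonely.psu")
```

Inside a function, the same restore step is usually registered with on.exit(options(old), add = TRUE) so that it runs even if an error occurs.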
Internal function to calculate proportion estimations
Description
Internal function to calculate proportion estimations
Usage
create_prop_internal(
var,
domains = NULL,
subpop = NULL,
disenio,
ci = FALSE,
deff = FALSE,
ess = FALSE,
ajuste_ene = FALSE,
rel_error = FALSE,
log_cv = FALSE,
unweighted = FALSE,
standard_eval = TRUE,
rm.na = FALSE,
env = parent.frame(),
ci_logit = FALSE
)
Arguments
var |
integer dummy variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe |
disenio |
complex design created by |
ci |
|
deff |
|
ess |
|
ajuste_ene |
|
rel_error |
|
log_cv |
|
unweighted |
|
standard_eval |
|
rm.na |
|
env |
parent environment to get some variables |
ci_logit |
|
Value
dataframe that contains the inputs and all domains to be evaluated
Internal function to calculate ratio estimations
Description
Internal function to calculate ratio estimations
Usage
create_ratio_internal(
var,
denominator,
domains = NULL,
subpop = NULL,
disenio,
ci = FALSE,
deff = FALSE,
ess = FALSE,
ajuste_ene = FALSE,
unweighted = FALSE,
rel_error = FALSE,
log_cv = FALSE,
rm.na = FALSE
)
Arguments
var |
numeric variable within the |
denominator |
numeric variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe |
disenio |
complex design created by |
ci |
|
deff |
|
ess |
|
ajuste_ene |
|
unweighted |
|
rel_error |
|
log_cv |
|
rm.na |
|
Value
dataframe that contains the inputs and all domains to be evaluated
Create the inputs to evaluate the quality of total estimations
Description
create_size generates a dataframe with the following elements: sum, degrees of freedom, sample size, and coefficient of variation. The function allows grouping in several domains.
Usage
create_size(
var,
domains = NULL,
subpop = NULL,
design,
ci = FALSE,
ess = FALSE,
ajuste_ene = FALSE,
standard_eval = FALSE,
rm.na = FALSE,
deff = FALSE,
rel_error = FALSE,
unweighted = FALSE,
df_type = c("chile", "eclac"),
eclac_input = FALSE
)
Arguments
var |
numeric variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe. |
design |
complex design created by |
ci |
|
ess |
|
ajuste_ene |
|
standard_eval |
|
rm.na |
|
deff |
|
rel_error |
|
unweighted |
|
df_type |
|
eclac_input |
|
Value
dataframe that contains the inputs and all domains to be evaluated.
Examples
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
create_size("ocupado", "zona+sexo", design = dc)
Create the inputs to evaluate the quality of the sum of continuous variables
Description
create_total generates a dataframe with the following elements: sum, degrees of freedom, sample size, and coefficient of variation. The function allows grouping in several domains.
Usage
create_total(
var,
domains = NULL,
subpop = NULL,
design,
ci = FALSE,
ess = FALSE,
ajuste_ene = FALSE,
standard_eval = FALSE,
rm.na = FALSE,
deff = FALSE,
rel_error = FALSE,
unweighted = FALSE,
eclac_input = FALSE
)
Arguments
var |
numeric variable within the |
domains |
domains to be estimated separated by the + character. |
subpop |
integer dummy variable to filter the dataframe. |
design |
complex design created by |
ci |
|
ess |
|
ajuste_ene |
|
standard_eval |
|
rm.na |
|
deff |
|
rel_error |
|
unweighted |
|
eclac_input |
|
Value
dataframe that contains the inputs and all domains to be evaluated.
Examples
dc <- survey::svydesign(ids = ~varunit, strata = ~varstrat, data = epf_personas, weights = ~fe)
create_total("gastot_hd", "zona+sexo", subpop = "ocupado", design = dc)
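The subpop argument restricts the estimate to rows where a 0/1 variable equals 1. With the survey package, the usual way to do this is to subset the design object rather than the raw data, so that the design information is kept for variance estimation. The sketch below illustrates that idea with the api example data from survey; whether create_total does exactly this internally is an assumption, and the wide column is a hypothetical dummy created for the example.

```r
library(survey)

data(api)  # example data from the survey package; provides apiclus1
dclus1 <- svydesign(ids = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)

# Hypothetical 0/1 filter column, analogous to a subpop dummy
dclus1 <- update(dclus1, wide = as.integer(sch.wide == "Yes"))

# Subset the design object (not the data frame) to keep the design information
d_sub <- subset(dclus1, wide == 1)

total_all <- svytotal(~enroll, dclus1)
total_sub <- svytotal(~enroll, d_sub)

print(total_all)
print(total_sub)
```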
Turn on all the indicators needed for the eclac standard
Description
This function activates the appropriate indicators based on the selected eclac standard and whether proportion indicators are needed.
Usage
eclac_standard(eclac, env = parent.frame(), proportion = FALSE)
Arguments
eclac |
A logical value indicating the eclac standard. |
env |
The environment from which to retrieve the existing indicator values. Defaults to the parent frame. |
proportion |
A logical value indicating whether proportion indicators should be turned on. Defaults to FALSE. |
Value
A list of logical values indicating which indicators are turned on.
Encuesta Nacional de Empleo - ENE. 2020-efm
Description
Reduced version of the ENE database. Contains some sociodemographic variables and the information needed to work with the complex design.
Usage
ene
Format
dataframe with 87,842 rows and 7 columns
- sexo
1 = man; 2 = woman
- region
region
- cae_especifico
Economic activity status
- fe
sample weights
- varunit
PSU
- varstrat
strata
- fdt
Indicates whether the person belongs to the labour force: 1 = yes; 0 = no
- ocupado
1 = employed; 0 = non-employed
- desocupado
1 = non-employed; 0 = employed
Source
https://www.ine.cl/estadisticas/sociales/mercado-laboral/ocupacion-y-desocupacion
Examples
data(ene)
Encuesta Nacional Urbana de Seguridad Ciudadana 2019 - ENUSC 2019
Description
ENUSC data for the year 2019. Contains only a few variables.
Usage
enusc
Format
dataframe with 24,465 rows and 22 columns
- rph_sexo
1 = man; 2 = woman
- region
16 regions
- Fact_Pers
person sample weights
- Fact_Hog
household sample weights
- Conglomerado
PSU
- VarStrat
strata
- VP_DC
Individual victimization. Use with the Fact_Pers weights.
- VA_DC
Household victimization. Use with the Fact_Hog weights.
- rph_edad
age
- P3_1_1
Perception of increased crime in the country. Use with the Fact_Pers weights.
- P8_1_1
Cause of increased crime in the neighborhood. Use with the Fact_Pers weights.
- muj_insg_taxi
Female perception of insecurity inside taxis. Derived from P9 and rph_sexo. Use with the Fact_Pers weights.
- hom_insg_taxi
Male perception of insecurity inside taxis. Derived from P9 and rph_sexo. Use with the Fact_Pers weights.
- muj_insg_micro
Female perception of insecurity inside buses. Derived from P9 and rph_sexo. Use with the Fact_Pers weights.
- hom_insg_micro
Male perception of insecurity inside buses. Derived from P9 and rph_sexo. Use with the Fact_Pers weights.
- muj_insg_centr.com
Female perception of insecurity inside malls. Derived from P9 and rph_sexo. Use with the Fact_Pers weights.
- hom_insg_centr.com
Male perception of insecurity inside malls. Derived from P9 and rph_sexo. Use with the Fact_Pers weights.
- muj_insg_loc.col
Female perception of insecurity in public transport. Derived from P9 and rph_sexo. Use with the Fact_Pers weights.
- hom_insg_loc.col
Male perception of insecurity in public transport. Derived from P9 and rph_sexo. Use with the Fact_Pers weights.
- muj_insg_barrio
Female perception of insecurity in the neighborhood. Derived from P9 and rph_sexo. Use with the Fact_Pers weights.
- hom_insg_barrio
Male perception of insecurity in the neighborhood. Derived from P9 and rph_sexo. Use with the Fact_Pers weights.
Source
Examples
data(enusc)
Encuesta Nacional Urbana de Seguridad Ciudadana 2023 - ENUSC 2023
Description
ENUSC data for the year 2023. Contains only a few variables.
Usage
enusc_2023
Format
dataframe with 49,813 rows and 15 columns
- enc_region
16 regions
- enc_rpc
Code of region, province and commune
- Fact_Pers_Reg
Person sample weights at region level
- Fact_Pers_Com
Person sample weights at commune level
- Fact_Hog_Reg
Household sample weights at region level
- Fact_Hog_Com
Household sample weights at commune level
- VarStrat
Strata
- Conglomerado
PSU
- VH_DV
Households victimized by violent crimes. Use with the Fact_Hog_* weights.
- VH_DC
Household victimization. Use with the Fact_Hog_* weights.
- VP_DV
People victimized by violent crimes. Use with the Fact_Pers_* weights.
- VP_DC
Individual victimization. Use with the Fact_Pers_* weights.
- PAD
Perception of increased crime in the country. Use with the Fact_Pers_* weights.
- rph_sexo
1 = man; 2 = woman
- rph_edad
Age
Source
Examples
data(enusc_2023)
VIII Encuesta de Presupuestos Familiares
Description
Reduced version of the VIII EPF database. Contains some sociodemographic variables and the information needed to work with the complex design.
Usage
epf_personas
Format
dataframe with 48,308 observations and 8 variables
- sexo
1 = male; 2 = female
- zona
1 = metropolitan area; 2 = rest of the regional capitals
- ecivil
marital status
- fe
sample weights
- varunit
PSU
- varstrat
strata
- gastot_hd
household expenditure
- ocupado
1 = employed; 0 = non-employed
Source
https://www.ine.cl/estadisticas/sociales/ingresos-y-gastos/encuesta-de-presupuestos-familiares
Examples
data(epf_personas)
Get the coefficient of variation
Description
Receives a table created with survey and returns the coefficient of variation for each cell.
Usage
get_cv(table, design, domains, type_est = "all", env = parent.frame())
Arguments
table |
|
design |
design |
domains |
|
type_est |
type of estimation: all or size. |
env |
parent environment |
Value
dataframe with results including the CV
Get degrees of freedom
Description
Receives data and domains, and returns a data frame with the PSUs, strata, and degrees of freedom (df) for each cell.
Usage
get_df(data, domains, df_type = "eclac")
Arguments
data |
|
domains |
|
df_type |
|
Value
dataframe with results including degrees of freedom
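A common convention for design-based degrees of freedom is the number of PSUs minus the number of strata, counted within each cell. The base R sketch below applies that convention to a small hypothetical data set. It is an illustration only: the exact rules that get_df applies under each df_type are not described on this page, and the column names domain, strata and psu are made up for the example.

```r
# Hypothetical sample: one row per respondent
toy <- data.frame(
  domain = c("a", "a", "a", "a", "b", "b", "b"),
  strata = c(1, 1, 2, 2, 1, 1, 2),
  psu    = c(11, 12, 21, 22, 11, 13, 21)
)

count_unique <- function(x) length(unique(x))

# Count distinct PSUs (within strata) and distinct strata per domain
df_by_domain <- do.call(rbind, lapply(split(toy, toy$domain), function(d) {
  data.frame(
    domain   = d$domain[1],
    n_psu    = count_unique(paste(d$strata, d$psu)),
    n_strata = count_unique(d$strata)
  )
}))

# Textbook convention: df = number of PSUs - number of strata
df_by_domain$df <- df_by_domain$n_psu - df_by_domain$n_strata
print(df_by_domain)
```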
Calculates multiple estimations. Internal wrapper for survey package
Description
Generates a table with estimates for a given aggregation
Usage
get_survey_table(
var,
domains,
complex_design,
estimation = "mean",
env = parent.frame(),
fun,
denom = NULL,
type_est = "all"
)
Arguments
var |
|
domains |
|
complex_design |
design from |
estimation |
|
env |
parent environment |
fun |
function required regarding the estimation |
denom |
denominator. This parameter works for the ratio estimation |
type_est |
type of estimation: all or size |
Value
dataframe containing main results from survey
Calculates the value of a quadratic function
Description
quadratic returns the output of a particular function created by INE Chile, evaluated at the proportion estimated from a sample. If the standard error is higher than the output of the function, it is interpreted as a signal that the estimation is not reliable.
Usage
quadratic(p)
Arguments
p |
numeric vector with the values of the estimations for proportions |
Value
numeric vector
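The comparison described above can be sketched as follows. The functional form used here, p^(2/3)/9 for p below 0.5 and (1 - p)^(2/3)/9 otherwise, is an assumption based on a reading of the INE Chile household survey standard; it is not stated on this page. The helper is named quadratic_sketch to make clear that it is not the package's own quadratic, and the proportions and standard errors are hypothetical.

```r
# Assumed form of the threshold; not taken from the package source
quadratic_sketch <- function(p) {
  ifelse(p < 0.5, p^(2 / 3) / 9, (1 - p)^(2 / 3) / 9)
}

# Hypothetical estimated proportions and their standard errors
p  <- c(0.05, 0.20, 0.50, 0.80)
se <- c(0.02, 0.01, 0.09, 0.03)

threshold <- quadratic_sketch(p)

# A standard error above the threshold is read as a warning sign
flag <- se > threshold
print(data.frame(p, se, threshold, flag))
```

Under this assumed form the threshold is symmetric around 0.5, so a proportion of 0.2 and one of 0.8 are held to the same tolerance.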
Standardize and sort column names
Description
Receives the raw survey table and sorts it.
Usage
standardize_columns(data, var, denom)
Arguments
data |
|
var |
|
denom |
denominator |
Value
dataframe with standardized data
Standardize the name of design variables
Description
Renames the design variables so they can be used later.
Usage
standardize_design_variables(design)
Arguments
design |
|
Value
survey design