Type: Package
Title: Download Data from Brazil's Population Census
Version: 0.5.0
Description: Easy access to data from Brazil's population censuses. The package provides a simple and efficient way to download and read the data sets and the documentation of all the population censuses taken in and after 1960 in the country. The package is built on top of the 'Arrow' platform <https://arrow.apache.org/docs/r/>, which allows users to work with larger-than-memory census data using familiar 'dplyr' functions <https://arrow.apache.org/docs/r/articles/arrow.html#analyzing-arrow-data-with-dplyr>.
License: MIT + file LICENSE
URL: https://github.com/ipeaGIT/censobr, https://ipeagit.github.io/censobr/
BugReports: https://github.com/ipeaGIT/censobr/issues
Depends: R (≥ 4.1.0)
Imports: arrow (≥ 15.0.1), checkmate, cli, curl (≥ 5.0.0), dplyr, duckdb, fs, glue, rlang, tools
Suggests: covr, DBI, dbplyr, geobr, ggplot2 (≥ 3.3.1), rmarkdown, kableExtra, knitr, scales, testthat
VignetteBuilder: knitr
Encoding: UTF-8
RoxygenNote: 7.3.2
NeedsCompilation: no
Packaged: 2025-07-07 02:16:11 UTC; rafap
Author: Rafael H. M. Pereira
Maintainer: Rafael H. M. Pereira <rafa.pereira.br@gmail.com>
Repository: CRAN
Date/Publication: 2025-07-07 02:30:02 UTC
censobr: Download Data from Brazil's Population Census
Description
Download data from Brazil's population census.
Usage
Please check the vignettes and data documentation on the website.
Author(s)
Maintainer: Rafael H. M. Pereira rafa.pereira.br@gmail.com (ORCID)
Authors:
Rogério J. Barbosa antrologos@gmail.com (ORCID)
Other contributors:
Diego Rabatone Oliveira diraol@diraol.eng.br [contributor]
Neal Richardson neal.p.richardson@gmail.com [contributor]
Ipea - Institute for Applied Economic Research [copyright holder, funder]
See Also
Useful links:
Report bugs at https://github.com/ipeaGIT/censobr/issues
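As an illustration of the workflow mentioned in the package description, the sketch below runs 'dplyr' verbs on an Arrow table and only brings the result into R with collect(). The small in-memory table and its column names (state, age, weight) are made up for this example and merely stand in for the arrow Dataset returned by the read_*() functions; the real censobr column names differ.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in for a census data set (made-up columns and values)
census <- arrow_table(
  state  = c("RJ", "RJ", "SP", "SP", "SP"),
  age    = c(34L, 70L, 12L, 45L, 66L),
  weight = c(10.5, 9.5, 12, 11, 7)
)

# The verbs below build a lazy query; the result is only computed
# and returned as a data frame when collect() is called
adults_by_state <- census |>
  filter(age >= 18) |>
  group_by(state) |>
  summarise(adults = sum(weight)) |>
  arrange(state) |>
  collect()

adults_by_state
```

With a real census data set the same pattern applies: filter and aggregate first, and call collect() last so that only the (much smaller) result is loaded into memory.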
Safely use arrow to open a Parquet file
Description
This function handles some failure modes, including if the Parquet file is corrupted.
Usage
arrow_open_dataset(filename)
Arguments
filename: A local Parquet file.
Value
An arrow::Dataset
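A minimal sketch of the general idea, assuming the failure is handled by wrapping arrow::open_dataset() in tryCatch(). The helper name safe_open_parquet() is hypothetical, and what the package's own function does after a failure is not documented here, so returning NULL is only an assumption made for the example.

```r
library(arrow)

# Hypothetical helper, not the package's own implementation
safe_open_parquet <- function(filename) {
  tryCatch(
    open_dataset(filename),
    error = function(e) {
      message("Could not open '", filename, "': the file may be corrupted.")
      NULL
    }
  )
}

# A file with a .parquet extension but invalid contents
bad_file <- tempfile(fileext = ".parquet")
writeLines("this is not a parquet file", bad_file)
bad_result <- safe_open_parquet(bad_file)  # prints a message instead of failing

# A valid Parquet file opens as usual
good_file <- tempfile(fileext = ".parquet")
write_parquet(data.frame(x = 1:3), good_file)
good_result <- safe_open_parquet(good_file)
```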
Message when caching file
Description
Message when caching file
Usage
cache_message(
local_file = parent.frame()$local_file,
cache = parent.frame()$cache,
verbose = parent.frame()$verbose
)
Arguments
local_file: The address of a file passed from the download_file function.
cache: Logical. Whether the cached data should be used.
verbose: Logical. Whether the message should be printed.
Value
A message
Manage cached files from the censobr package
Description
Manage cached files from the censobr package
Usage
censobr_cache(
list_files = TRUE,
print_tree = FALSE,
delete_file = NULL,
verbose = TRUE
)
Arguments
list_files: Logical. Whether to print a message with the address of all censobr data sets cached locally. Defaults to TRUE.
print_tree: Logical. Whether the cache files should be printed in a tree-like format. This parameter only works if list_files = TRUE. Defaults to FALSE.
delete_file: String. The file name or a string pattern that matches the file path of a file cached locally and which should be deleted. Defaults to NULL.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
A message indicating which files exist and/or which ones have been deleted from the local cache directory.
See Also
Other Cache data:
get_censobr_cache_dir(), set_censobr_cache_dir()
Examples
# list all files cached
censobr_cache(list_files = TRUE)
# delete particular file
censobr_cache(delete_file = '2010_deaths')
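The delete_file argument accepts either a file name or a pattern matched against the file path. The base R sketch below shows one way such matching could work, assuming the pattern is treated as a fixed substring; how censobr matches files internally is not stated here, and the directory and file names are made up.

```r
# Hypothetical cache directory with made-up file names
cache_dir <- file.path(tempdir(), "censobr_cache_demo")
dir.create(cache_dir, showWarnings = FALSE)
file.create(file.path(cache_dir, c("2010_deaths.parquet",
                                   "2010_population.parquet",
                                   "2000_families.parquet")))

# List cached files and keep those whose path contains the pattern
cached  <- list.files(cache_dir, full.names = TRUE)
matches <- cached[grepl("2010_deaths", cached, fixed = TRUE)]

# Delete the matching files and check what is left
unlink(matches)
remaining <- basename(list.files(cache_dir))
remaining
```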
Data dictionary of Brazil's census data
Description
Opens the data dictionary of Brazil's census data in a browser.
Usage
data_dictionary(
year,
dataset,
showProgress = TRUE,
cache = TRUE,
verbose = TRUE
)
Arguments
year: Numeric. Year of reference in the format YYYY.
dataset: Character. The data set whose data dictionary should be opened. Options include 'population', 'households' and 'tracts', among others.
showProgress: Logical. Whether to display a download progress bar. Defaults to TRUE.
cache: Logical. Whether the function should read the data cached locally, which is much faster. Defaults to TRUE.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
Returns NULL and opens an .html, .pdf or Excel file.
See Also
Other Census documentation:
interview_manual()
Examples
# Open data dictionary
data_dictionary(year = 2010,
dataset = 'population',
showProgress = FALSE)
data_dictionary(year = 2022,
dataset = 'tracts',
showProgress = FALSE)
data_dictionary(year = 1980,
dataset = 'households',
showProgress = FALSE)
Download file from url
Description
Download file from url
Usage
download_file(
file_url = parent.frame()$file_url,
showProgress = parent.frame()$showProgress,
cache = parent.frame()$cache,
verbose = parent.frame()$verbose
)
Arguments
file_url: String. A URL.
showProgress: Logical.
cache: Logical.
verbose: Logical.
Value
A string to the address of the file
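The Usage signature above takes its defaults from parent.frame(), so the helper picks up the values of file_url, showProgress, cache and verbose from the function that calls it. A small base R sketch of this idiom follows; both function names are hypothetical and only illustrate the mechanism.

```r
# Hypothetical helper: each default is looked up in the caller's frame
show_settings <- function(cache = parent.frame()$cache,
                          verbose = parent.frame()$verbose) {
  force(cache)
  force(verbose)
  list(cache = cache, verbose = verbose)
}

# Hypothetical user-facing function: it calls the helper without arguments,
# so the helper reads `cache` and `verbose` from this function's environment
read_something <- function(cache = TRUE, verbose = TRUE) {
  show_settings()
}

settings <- read_something(cache = FALSE)
str(settings)
```

This keeps internal calls short, at the cost of making the helper depend on the variable names used by its caller.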
Error missing data sets
Description
Error missing data sets
Usage
error_missing_datasets(d)
Arguments
d: Vector with the data sets available.
Value
An informative error
Error missing years
Description
Error missing years
Usage
error_missing_years(y)
Arguments
y: Vector with the years available.
Value
An informative error
Get path to cache directory for censobr files
Description
Get the path to the cache directory currently being used for the censobr files.
Usage
get_censobr_cache_dir()
Value
Path to cache dir
See Also
Other Cache data:
censobr_cache(), set_censobr_cache_dir()
Examples
# get path to cache directory
get_censobr_cache_dir()
Interview manual of the data collection of Brazil's censuses
Description
Opens the interview manual of the data collection of Brazil's censuses in a browser.
Usage
interview_manual(
year = NULL,
showProgress = TRUE,
cache = TRUE,
verbose = TRUE
)
Arguments
year: Numeric. Year of reference in the format YYYY.
showProgress: Logical. Whether to display a download progress bar. Defaults to TRUE.
cache: Logical. Whether the function should read the data cached locally, which is much faster. Defaults to TRUE.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
Opens a .pdf file in the browser.
See Also
Other Census documentation:
data_dictionary()
Examples
# Open interview manual on the browser
interview_manual(
year = 2010,
showProgress = FALSE
)
Add household variables to the data set
Description
Add household variables to the data set
Usage
merge_household_var(
df,
year = parent.frame()$year,
add_labels = parent.frame()$add_labels,
showProgress = parent.frame()$showProgress,
verbose = parent.frame()$verbose
)
Arguments
df: An arrow Dataset.
year: Numeric. Passed from the calling function.
add_labels: Character. Passed from the calling function.
showProgress: Logical. Passed from the calling function.
verbose: Logical. Passed from the calling function.
Value
An arrow Dataset with additional household variables.
Questionnaires used in the data collection of Brazil's censuses
Description
Opens the questionnaire used in the data collection of Brazil's censuses in a browser.
Usage
questionnaire(
year = 2010,
type = NULL,
showProgress = TRUE,
cache = TRUE,
verbose = TRUE
)
Arguments
year: Numeric. Year of reference in the format YYYY. Defaults to 2010.
type: Character. The type of questionnaire used in the survey (for example, 'long'). Defaults to NULL.
showProgress: Logical. Whether to display a download progress bar. Defaults to TRUE.
cache: Logical. Whether the function should read the data cached locally, which is much faster. Defaults to TRUE.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
Opens a .pdf file in the browser.
Examples
library(censobr)
# Open questionnaire on browser
questionnaire(year = 2010, type = 'long', showProgress = FALSE)
Download microdata of emigration records from Brazil's census
Description
Download microdata of emigration records from Brazil's census. Data collected in the sample component of the questionnaire.
Usage
read_emigration(
year,
columns = NULL,
add_labels = NULL,
merge_households = FALSE,
as_data_frame = FALSE,
showProgress = TRUE,
cache = TRUE,
verbose = TRUE
)
Arguments
year: Numeric. Year of reference in the format YYYY.
columns: String. A vector of column names to keep. The rest of the columns are not read. Defaults to NULL, which reads all columns.
add_labels: Character. Whether the function should add labels to the responses of categorical variables. Defaults to NULL.
merge_households: Logical. Indicates whether the function should merge household variables to the output data. Defaults to FALSE.
as_data_frame: Logical. When FALSE (the default), the function returns an arrow Dataset; when TRUE, it returns a data.frame.
showProgress: Logical. Whether to display a download progress bar. Defaults to TRUE.
cache: Logical. Whether the function should read the data cached locally, which is much faster. Defaults to TRUE.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
An arrow Dataset or a "data.frame" object.
See Also
Other Microdata:
read_families(), read_households(), read_mortality(), read_population()
Examples
# return data as arrow Dataset
df <- read_emigration(
year = 2010,
showProgress = FALSE
)
# return data as data.frame
df <- read_emigration(
year = 2010,
as_data_frame = TRUE,
showProgress = FALSE
)
Download microdata of family records from Brazil's census
Description
Download microdata of family records from Brazil's census. Data collected in the sample component of the questionnaire.
Usage
read_families(
year,
columns = NULL,
add_labels = NULL,
as_data_frame = FALSE,
showProgress = TRUE,
cache = TRUE,
verbose = TRUE
)
Arguments
year: Numeric. Year of reference in the format YYYY.
columns: String. A vector of column names to keep. The rest of the columns are not read. Defaults to NULL, which reads all columns.
add_labels: Character. Whether the function should add labels to the responses of categorical variables. Defaults to NULL.
as_data_frame: Logical. When FALSE (the default), the function returns an arrow Dataset; when TRUE, it returns a data.frame.
showProgress: Logical. Whether to display a download progress bar. Defaults to TRUE.
cache: Logical. Whether the function should read the data cached locally, which is much faster. Defaults to TRUE.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
An arrow Dataset or a "data.frame" object.
See Also
Other Microdata:
read_emigration(), read_households(), read_mortality(), read_population()
Examples
# return data as arrow Dataset
df <- read_families(
year = 2000,
showProgress = FALSE
)
Download microdata of household records from Brazil's census
Description
Download microdata of household records from Brazil's census. Data collected in the sample component of the questionnaire.
Usage
read_households(
year,
columns = NULL,
add_labels = NULL,
as_data_frame = FALSE,
showProgress = TRUE,
cache = TRUE,
verbose = TRUE
)
Arguments
year: Numeric. Year of reference in the format YYYY.
columns: String. A vector of column names to keep. The rest of the columns are not read. Defaults to NULL, which reads all columns.
add_labels: Character. Whether the function should add labels to the responses of categorical variables. Defaults to NULL.
as_data_frame: Logical. When FALSE (the default), the function returns an arrow Dataset; when TRUE, it returns a data.frame.
showProgress: Logical. Whether to display a download progress bar. Defaults to TRUE.
cache: Logical. Whether the function should read the data cached locally, which is much faster. Defaults to TRUE.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
An arrow Dataset or a "data.frame" object.
1960 Census
The 1960 microdata available in censobr combines two versions of the Demographic Census sample. The 25% sample data from the 1960 Census was never fully processed by IBGE: several states did not have their questionnaires digitized. Currently, this dataset only has data from 16 states of the Federation (and from a contested border region between Minas Gerais and Espírito Santo called Serra dos Aimorés). Information is missing for the states of the former Northern Region, Maranhão, Piauí, Guanabara, Santa Catarina, and Espírito Santo. In 1965, IBGE decided to draw a probabilistic sub-sample of approximately 1.27% of the population, including all units of the federation. With this data, IBGE produced several official reports at the time. The data in censobr is the combination of these two datasets.
We pre-processed the 1.27% sample data to ensure data consistency, given that the original data was partially corrupted. We also created a sample weight variable to correct for unbalanced data and to expand the sample to the total population. For the data from the 25% sample, the weights expand to the municipal totals. Meanwhile, for the data from the 1.27% sample, the weights expand to the state totals. Additionally, we constructed a few variables that allow for the approximate incorporation of the complex sample design, enabling the proper calculation of standard errors and confidence intervals.
You can read more about the 1960 Census and find thorough documentation of how this dataset was processed at https://github.com/antrologos/ConsistenciaCenso1960Br.
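To make the role of the sample weights concrete: summing the weights of the sampled records estimates the size of the population they represent. The base R sketch below uses made-up records, and the column names state and weight are assumptions for illustration, not the variable names used in the censobr data.

```r
# Made-up sample: each record represents `weight` people in the population
sample_records <- data.frame(
  state  = c("BA", "BA", "BA", "MG", "MG"),
  weight = c(80, 75, 79, 4, 4.5)
)

# Estimated population per state = sum of the weights in that state
pop_by_state <- tapply(sample_records$weight, sample_records$state, sum)
pop_by_state
```

Because the weights of the two samples expand to different totals (municipal for the 25% sample, state for the 1.27% sample), the state is the finest level at which totals are comparable across both.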
See Also
Other Microdata:
read_emigration(), read_families(), read_mortality(), read_population()
Examples
# return data as arrow Dataset
df <- read_households(
year = 2010,
showProgress = FALSE
)
Download microdata of death records from Brazil's census
Description
Download microdata of death records from Brazil's census. Data collected in the sample component of the questionnaire.
Usage
read_mortality(
year,
columns = NULL,
add_labels = NULL,
merge_households = FALSE,
as_data_frame = FALSE,
showProgress = TRUE,
cache = TRUE,
verbose = TRUE
)
Arguments
year: Numeric. Year of reference in the format YYYY.
columns: String. A vector of column names to keep. The rest of the columns are not read. Defaults to NULL, which reads all columns.
add_labels: Character. Whether the function should add labels to the responses of categorical variables. Defaults to NULL.
merge_households: Logical. Indicates whether the function should merge household variables to the output data. Defaults to FALSE.
as_data_frame: Logical. When FALSE (the default), the function returns an arrow Dataset; when TRUE, it returns a data.frame.
showProgress: Logical. Whether to display a download progress bar. Defaults to TRUE.
cache: Logical. Whether the function should read the data cached locally, which is much faster. Defaults to TRUE.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
An arrow Dataset or a "data.frame" object.
See Also
Other Microdata:
read_emigration(), read_families(), read_households(), read_population()
Examples
library(censobr)
# return data as arrow Dataset
df <- read_mortality(
year = 2010,
showProgress = FALSE
)
# dplyr::glimpse(df)
# return data as data.frame
df <- read_mortality(
year = 2010,
as_data_frame = TRUE,
showProgress = FALSE
)
# dplyr::glimpse(df)
Download microdata of population records from Brazil's census
Description
Download microdata of population records from Brazil's census. Data collected in the sample component of the questionnaire.
Usage
read_population(
year,
columns = NULL,
add_labels = NULL,
as_data_frame = FALSE,
showProgress = TRUE,
cache = TRUE,
verbose = TRUE
)
Arguments
year: Numeric. Year of reference in the format YYYY.
columns: String. A vector of column names to keep. The rest of the columns are not read. Defaults to NULL, which reads all columns.
add_labels: Character. Whether the function should add labels to the responses of categorical variables. Defaults to NULL.
as_data_frame: Logical. When FALSE (the default), the function returns an arrow Dataset; when TRUE, it returns a data.frame.
showProgress: Logical. Whether to display a download progress bar. Defaults to TRUE.
cache: Logical. Whether the function should read the data cached locally, which is much faster. Defaults to TRUE.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
An arrow Dataset or a "data.frame" object.
1960 Census
The 1960 microdata available in censobr combines two versions of the Demographic Census sample. The 25% sample data from the 1960 Census was never fully processed by IBGE: several states did not have their questionnaires digitized. Currently, this dataset only has data from 16 states of the Federation (and from a contested border region between Minas Gerais and Espírito Santo called Serra dos Aimorés). Information is missing for the states of the former Northern Region, Maranhão, Piauí, Guanabara, Santa Catarina, and Espírito Santo. In 1965, IBGE decided to draw a probabilistic sub-sample of approximately 1.27% of the population, including all units of the federation. With this data, IBGE produced several official reports at the time. The data in censobr is the combination of these two datasets.
We pre-processed the 1.27% sample data to ensure data consistency, given that the original data was partially corrupted. We also created a sample weight variable to correct for unbalanced data and to expand the sample to the total population. For the data from the 25% sample, the weights expand to the municipal totals. Meanwhile, for the data from the 1.27% sample, the weights expand to the state totals. Additionally, we constructed a few variables that allow for the approximate incorporation of the complex sample design, enabling the proper calculation of standard errors and confidence intervals.
You can read more about the 1960 Census and find thorough documentation of how this dataset was processed at https://github.com/antrologos/ConsistenciaCenso1960Br.
See Also
Other Microdata:
read_emigration(), read_families(), read_households(), read_mortality()
Examples
# return data as arrow Dataset
df <- read_population(
year = 2010,
showProgress = FALSE
)
Download census tract-level data from Brazil's censuses
Description
Download census tract-level aggregate data from Brazil's censuses.
Usage
read_tracts(
year,
dataset,
as_data_frame = FALSE,
showProgress = TRUE,
cache = TRUE,
verbose = TRUE
)
Arguments
year: Numeric. Year of reference in the format YYYY.
dataset: Character. The dataset to be opened. The options available differ for each edition of the census (2000, 2010 and 2022). For a complete description of the datasets, themes, and variables, check the data dictionary (see data_dictionary()).
as_data_frame: Logical. When FALSE (the default), the function returns an arrow Dataset; when TRUE, it returns a data.frame.
showProgress: Logical. Whether to display a download progress bar. Defaults to TRUE.
cache: Logical. Whether the function should read the data cached locally, which is much faster. Defaults to TRUE.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
An arrow Dataset or a "data.frame" object.
Examples
library(censobr)
# return data as arrow Dataset
df <- read_tracts(
year = 2022,
dataset = 'Domicilio',
showProgress = FALSE
)
# return data as data.frame
df <- read_tracts(
year = 2010,
dataset = 'Basico',
as_data_frame = TRUE,
showProgress = FALSE
)
Set custom cache directory for censobr files
Description
Set a custom directory for caching files from the censobr package. The user only needs to run this function once, because the chosen directory persists across R sessions.
Usage
set_censobr_cache_dir(path, verbose = TRUE)
Arguments
path: String. The path to an existing directory. Use path = NULL to go back to the default cache directory.
verbose: Logical. Whether the function should print informative messages. Defaults to TRUE.
Value
A message pointing to the directory where censobr files are cached.
See Also
Other Cache data:
censobr_cache(), get_censobr_cache_dir()
Examples
# Set custom cache directory
tempd <- tempdir()
set_censobr_cache_dir(path = tempd)
# back to default path
set_censobr_cache_dir(path = NULL)
Check if user is using the default cache dir of censobr
Description
Check if user is using the default cache dir of censobr
Usage
using_default_censobr_cache_dir()
Value
TRUE or FALSE