Type: Package
Title: Tools for Eurostat Open Data
Version: 4.0.0
Date: 2023-12-19
Description: Tools to download data from the Eurostat database https://ec.europa.eu/eurostat together with search and manipulation utilities.
License: BSD_2_clause + file LICENSE
URL: https://ropengov.github.io/eurostat/, https://github.com/rOpenGov/eurostat
BugReports: https://github.com/rOpenGov/eurostat/issues
Depends: R (≥ 3.6.0)
Imports: classInt, countrycode, curl, digest, dplyr, httr2 (≥ 0.2.3), ISOweek, jsonlite, lubridate, rappdirs, readr, RefManageR, regions, rlang, stringi, stringr, tibble, tidyr (≥ 1.0.0), xml2, data.table (≥ 1.14.8)
Suggests: giscoR, knitr, rmarkdown, sf, testthat (≥ 3.0.0)
VignetteBuilder: knitr
Config/Needs/website: ggplot2, tmap, styler, sessioninfo, ropengov/rogtemplate, ragg
Config/testthat/edition: 3
Config/testthat/parallel: false
Encoding: UTF-8
LazyData: true
MailingList: rOpenGov <ropengov-forum@googlegroups.com>
NeedsCompilation: no
Repository: CRAN
RoxygenNote: 7.2.3
X-schema.org-isPartOf: http://ropengov.org/
X-schema.org-keywords: ropengov
Packaged: 2023-12-19 20:11:33 UTC; leo
Author: Leo Lahti [aut, cre], Janne Huovari [aut], Markus Kainu [aut], Przemyslaw Biecek [aut], Daniel Antal [ctb], Diego Hernangomez [ctb], Joona Lehtomaki [ctb], Francois Briatte [ctb], Reto Stauffer [ctb], Paul Rougieux [ctb], Anna Vasylytsya [ctb], Oliver Reiter [ctb], Pyry Kantanen [ctb], Enrico Spinielli [ctb]
Maintainer: Leo Lahti <leo.lahti@iki.fi>
Date/Publication: 2023-12-19 20:30:02 UTC

R Tools for Eurostat open data

Description

Tools to download data from the Eurostat database https://ec.europa.eu/eurostat together with search and manipulation utilities.

Details

Package eurostat
Type Package
Version 4.0.0
Date 2014-2023
License BSD_2_clause + file LICENSE
LazyLoad yes

Eurostat

Eurostat website: https://ec.europa.eu/eurostat

Eurostat database: https://ec.europa.eu/eurostat/web/main/data/database

Information about the data update schedule from Eurostat: "Eurostat datasets are updated twice a day at 11:00 and 23:00 CET, if newer data is available or for structural changes, for example for the dimensions in the dataset.

The Eurostat database always contains the latest version of the datasets, meaning that there is no versioning or documentation of past versions of the data."

Data source: Eurostat SDMX 2.1 Dissemination API

Data is downloaded from Eurostat SDMX 2.1 API endpoint as compressed TSV files that are transformed into tabular format. See Eurostat documentation for more information: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+query

The new dissemination API replaces the old bulk download facility that was used by Eurostat before October 2023 and by the eurostat R package versions before 4.0.0. See Eurostat documentation about the transition from Bulk Download to API for more information about the differences between the old bulk download facility and the data provided by the new API connection: https://wikis.ec.europa.eu/display/EUROSTATHELP/Transition+-+from+Eurostat+Bulk+Download+to+API

See especially the document Migrating_to_API_TSV.pdf that describes the changes in TSV file format in new applications.

For more information about SDMX 2.1, see SDMX standards: Section 7: Guidelines for the use of web services, Version 2.1: https://sdmx.org/wp-content/uploads/SDMX_2-1_SECTION_7_WebServicesGuidelines.pdf

Disclaimer: Availability of filtering functionalities

Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although Eurostat technically also supports filtering datasets downloaded through the SDMX Dissemination API. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering

Data source: Eurostat API Statistics (JSON API)

Data is downloaded from Eurostat API Statistics. See Eurostat documentation for more information about data queries in API Statistics https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query

This replaces the old JSON Web Services that was used by Eurostat before February 2023 and by the eurostat R package versions before 3.7.13. See Eurostat documentation about the migration from JSON web service to API Statistics for more information about the differences between the old and the new service: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+migrating+from+JSON+web+service+to+API+Statistics

For easily viewing which filtering options are available - in addition to the default ones, time and language - Eurostat Web services Query builder tool may be useful: https://ec.europa.eu/eurostat/web/query-builder

Filtering datasets

When using Eurostat API Statistics (JSON API), datasets can be filtered before they are downloaded and saved in local memory. The general format for filter parameters is <DIMENSION_CODE>=<VALUE>.

Filter parameters are optional but the used dimension codes must be present in the data product that is being queried. Dimension codes can vary between different data products so it may be useful to examine new datasets in Eurostat data browser beforehand. However, most if not all Eurostat datasets concern European countries and contain information that was gathered at some point in time, so geo and time dimension codes can usually be used.

⁠<DIMENSION_CODE>⁠ and ⁠<VALUE>⁠ are case-insensitive and they can be written in lowercase or uppercase in the query.

Parameters are passed onto the eurostat package functions get_eurostat() and get_eurostat_json() as a list item. If an individual item contains multiple items, as it often can be in the case of geo parameters and other optional items, they must be in the form of a vector: c("FI", "SE"). For examples on how to use these parameters, see function examples below.
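As a rough illustration of the list format described above, the sketch below builds a filter list with two multi-valued items. The wrapper fetch_filtered() is a hypothetical name introduced here for illustration; it is only defined, not called, because the download needs the eurostat package and network access.

```r
# Filters are a named list: names are dimension codes, values are the
# requested items. Several items for one dimension go in a vector.
filters <- list(
  geo = c("FI", "SE"),
  time = 2015:2018
)

# Hypothetical wrapper (not part of the eurostat package). It is defined
# but not called here, since it requires the eurostat package and a
# working connection to Eurostat.
fetch_filtered <- function(id, filters) {
  eurostat::get_eurostat_json(id, filters = filters)
}

# Inspect the structure that would be passed on
str(filters)
```

With a working connection, a call such as fetch_filtered("nama_10_gdp", filters) would pass the list on unchanged.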

Time parameters

time and time_period address the same TIME_PERIOD dimension in the dataset and can be used interchangeably. The Eurostat documentation states that "Using more than one Time parameter in the same query is not accepted", but in practice the Eurostat API accepts multiple time parameters in the same query. This makes it possible to use the R colon operator when writing queries, so time = c(2015:2018) translates to &time=2015&time=2016&time=2017&time=2018.

The only exception to this is when the queried dataset contains e.g. quarterly data and TIME_PERIOD is saved as 2015-Q1, 2015-Q2 etc. Then it is possible to use time=2015-Q1&time=2015-Q2 style in the query URL, but this makes it unfeasible to use the colon operator and requires a lot of manual typing.

Because of this, it is useful to know about other time parameters as well:

Using both untilTimePeriod and sinceTimePeriod parameters in the same query is allowed, making the usage of the R colon operator unnecessary. In the case of quarterly data, using untilTimePeriod and sinceTimePeriod parameters also works, as opposed to the colon operator, so it is generally safer to use them as well.
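The difference between the two styles can be sketched with a small query-string builder. build_query() is a hypothetical helper written for this illustration and is not how the eurostat package constructs its requests internally; it only mirrors the translation described above.

```r
# Hypothetical helper, not part of the eurostat package: expand a named
# list of filters into a query string, repeating the name once per value.
build_query <- function(filters) {
  parts <- unlist(lapply(names(filters), function(nm) {
    paste0(nm, "=", filters[[nm]])
  }))
  paste(parts, collapse = "&")
}

# Annual data: the colon operator yields one time parameter per year.
annual <- build_query(list(time = 2015:2018))

# Quarterly data: a since/until pair covers the range without listing
# every quarter by hand.
quarterly <- build_query(list(
  sinceTimePeriod = "2015-Q1",
  untilTimePeriod = "2016-Q4"
))

print(annual)
print(quarterly)
```

The since/until form stays the same length however long the period is, which is why it suits quarterly and other sub-annual data.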

Other dimensions

In get_eurostat_json() examples nama_10_gdp dataset is filtered with two additional filter parameters:

Filters like these are most likely unique to the nama_10_gdp dataset (or other datasets within the same domain) and should not be used with other datasets without user discretion. By using label_eurostat() we know that "B1GQ" stands for "Gross domestic product at market prices" and "CLV_I10" means "Chain linked volumes, index 2010=100".

Different dimension codes can be translated to a natural language by using the get_eurostat_dic() function, which returns labels for individual dimension items such as na_item and unit, as opposed to label_eurostat() which does it for whole datasets. For example, the parameter na_item stands for "National accounts indicator (ESA 2010)" and unit stands for "Unit of measure".

Language

All datasets have metadata available in English, French and German. If no parameter is given, the labels are returned in English.

Example:

More information

For more information about data filtering see Eurostat documentation on API Statistics: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query#APIStatisticsdataquery-TheparametersdefinedintheRESTrequest

Data source: Eurostat Table of Contents

The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de

See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC
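The three language variants differ only in the lang query parameter, which can be sketched as follows. toc_url() is a hypothetical helper used for illustration; it only builds the address and does not download anything.

```r
# Hypothetical helper: build the Table of Contents URL for one of the
# three languages listed above. match.arg() rejects anything else.
toc_url <- function(lang = c("en", "fr", "de")) {
  lang <- match.arg(lang)
  paste0(
    "https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=",
    lang
  )
}

print(toc_url())      # English is the default
print(toc_url("de"))
```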

Data source: GISCO - General Copyright

"Eurostat's general copyright notice and licence policy is applicable and can be consulted here: https://ec.europa.eu/eurostat/about-us/policies/copyright

Please also be aware of the European Commission's general conditions: https://commission.europa.eu/legal-notice_en

Moreover, there are specific provisions applicable to some of the following datasets available for downloading. The download and usage of these data is subject to their acceptance:

Of the abovementioned datasets, Administrative Units / Statistical Units is applicable if the user wants to draw maps with borders provided by GISCO / EuroGeographics.

Data source: GISCO - Administrative Units / Statistical Units

The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the GISCO website: GISCO: Geographical information and maps - Administrative units/statistical units

"In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:

  1. The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").

  2. The permission to use the data is granted on condition that:

    1. the data will not be used for commercial purposes;

    2. the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page."

Copyright notice

When data downloaded from this page is used in any printed or electronic publication, in addition to any other provisions applicable to the whole Eurostat website, data source will have to be acknowledged in the legend of the map and in the introductory page of the publication with the following copyright notice:

EN: © EuroGeographics for the administrative boundaries

FR: © EuroGeographics pour les limites administratives

DE: © EuroGeographics bezüglich der Verwaltungsgrenzen

For publications in languages other than English, French or German, the translation of the copyright notice in the language of the publication shall be used.

If you intend to use the data commercially, please contact EuroGeographics for information regarding their licence agreements."

Eurostat: Copyright notice and free re-use of data

The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright

"(c) European Union, 1995 - today

Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:

For exceptions to the abovementioned principles see Eurostat website

Citing Eurostat data

For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.

When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:

It should be avoided to associate different entities (e.g. Eurostat, National Statistical Offices, other data providers) to the same dataset or indicator without specifying the role of each of them in the treatment of data.

See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.

Strategies for handling large datasets more efficiently

Most Eurostat datasets are relatively manageable, at least on a machine with 16 GB of RAM. The largest dataset in Eurostat database, at the time of writing this, had 148362539 (148 million) values, which results in an object with 148 million rows in tidy data (long) format. The test machine with 16 GB of RAM was able to handle the second largest dataset in the database with 91 million values (rows).
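For a back-of-the-envelope sense of scale, the memory taken by a single numeric column of the largest dataset can be estimated from the row count quoted above. This is only a lower bound under the assumption of 8 bytes per double-precision value; the dimension columns and any intermediate copies made while reading and tidying the data come on top of it.

```r
# Row count of the largest dataset, as quoted in the text
n_values <- 148362539

# R stores a numeric (double) value in 8 bytes
bytes_per_double <- 8

# Approximate size of one numeric column in GiB
gib_per_numeric_column <- n_values * bytes_per_double / 1024^3
print(round(gib_per_numeric_column, 2))
```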

There are still some methods to make data fetching functions perform faster:

regions functions

For working with sub-national statistics, the basic functions of the regions package are imported; see https://regions.dataobservatory.eu/.

Author(s)

Leo Lahti, Janne Huovari, Markus Kainu, Przemyslaw Biecek

References

See citation("eurostat"):

Kindly cite the eurostat R package as follows:

  Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and
  analysis of Eurostat open data with the eurostat package. The R
  Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019

  Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D.,
  and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data
  [Computer software]. R package version 4.0.0.
  https://github.com/rOpenGov/eurostat

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.

See Also

help("regions"), https://regions.dataobservatory.eu/

Examples

library(eurostat)

Add the statistical aggregation level to data frame

Description

Eurostat regional statistics contain country-level and various regional-level information. In many cases, for example when mapping, it is useful to filter out national-level data from NUTS2-level regional data.

This function will be deprecated. Use the more comprehensive regions::validate_nuts_regions() instead.

Usage

add_nuts_level(dat, geo_labels = "geo")

Arguments

dat

A data frame or tibble returned by get_eurostat().

geo_labels

A geographical label, defaults to geo.

Details

Deprecated functions are kept for backward compatibility. They give a warning and call the appropriate regions functions.

Value

a new numeric variable nuts_level with the numeric value of NUTS level 0 (country), 1 (greater region), 2 (region), 3 (small region).

Author(s)

Daniel Antal

See Also

regions::validate_nuts_regions()

Other regions functions: harmonize_geo_code(), recode_to_nuts_2013(), recode_to_nuts_2016(), reexports

Examples


dat <- data.frame(
  geo    = c("FR", "IE04", "DEB1C"),
  values = c(1000, 23, 12)
)

add_nuts_level(dat)
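The level numbering in the Value section follows the length of the geo code: two characters for a country and one more character per level. The sketch below applies that rule directly. nuts_level_naive() is a hypothetical stand-in, not the package's implementation, and unlike regions::validate_nuts_regions() it does not check that the codes are valid NUTS codes.

```r
# Hypothetical helper: derive the NUTS level from the code length alone.
# 2 characters = level 0 (country), 3 = level 1, 4 = level 2, 5 = level 3.
nuts_level_naive <- function(geo) {
  nchar(as.character(geo)) - 2L
}

dat <- data.frame(
  geo    = c("FR", "IE04", "DEB1C"),
  values = c(1000, 23, 12)
)

dat$nuts_level <- nuts_level_naive(dat$geo)
print(dat)
```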

Check access to ec.europa.eu

Description

Check if R has access to resources at http://ec.europa.eu

Usage

check_access_to_data()

Value

a logical.

Author(s)

Markus Kainu markus.kainu@kapsi.fi

Examples


check_access_to_data()



Clean Eurostat Cache

Description

Delete all .rds files from the eurostat cache directory. See get_eurostat() for more on cache.

Usage

clean_eurostat_cache(cache_dir = NULL, config = FALSE)

Arguments

cache_dir

A path to cache directory. If NULL (default) tries to clean default temporary cache directory.

config

Logical TRUE/FALSE. Should the cached path be deleted?

Author(s)

Przemyslaw Biecek, Leo Lahti, Janne Huovari, Markus Kainu and Diego Hernangómez

See Also

Other cache utilities: set_eurostat_cache_dir()

Examples

## Not run: 
clean_eurostat_cache()

## End(Not run)
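What "delete all .rds files" amounts to can be sketched in base R on a throwaway directory. The directory and file names below are made up for the demonstration, and this is not the package's own code.

```r
# Set up a throwaway directory with one cached object and one other file
cache_dir <- file.path(tempdir(), "eurostat_demo_cache")
dir.create(cache_dir, showWarnings = FALSE)
saveRDS(data.frame(x = 1), file.path(cache_dir, "demo.rds"))
writeLines("keep me", file.path(cache_dir, "notes.txt"))

# Remove only files ending in .rds; everything else is left alone
rds_files <- list.files(cache_dir, pattern = "\\.rds$", full.names = TRUE)
removed <- file.remove(rds_files)

remaining <- list.files(cache_dir)
print(remaining)
```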

Time Column Conversions for data from new dissemination API

Description

Internal function to convert time column.

Usage

convert_time_col(x, time_format)

Arguments

x

A time column (vector) from a downloaded dataset

time_format

one of the following: date, date_last, or num. See tidy_eurostat() for more information.


Cuts the Values Column into Classes and Polishes the Labels

Description

Categorises a numeric vector into automatic or manually defined categories and polishes the labels ready for use in mapping with ggplot2.

Usage

cut_to_classes(
  x,
  n = 5,
  style = "equal",
  manual = FALSE,
  manual_breaks = NULL,
  decimals = 0,
  nodata_label = "No data"
)

Arguments

x

A numeric vector, e.g. the values variable in data returned by get_eurostat().

n

A numeric. number of classes/categories

style

chosen style: one of "fixed", "sd", "equal", "pretty", "quantile", "kmeans", "hclust", "bclust", "fisher", "jenks", "dpih", "headtails", "maximum", or "box"

manual

Logical. If manual breaks are being used

manual_breaks

Numeric vector with manual threshold values

decimals

Number of decimals to include with labels

nodata_label

String. Text label for NA category.

Value

a factor.

Author(s)

Markus Kainu markuskainu@gmail.com

See Also

classInt::classIntervals()

Other helpers: dic_order(), eurotime2date(), eurotime2num(), harmonize_country_code(), label_eurostat()

Examples



# lp <- get_eurostat("nama_aux_lp")
lp <- get_eurostat("nama_10_lp_ulc")
lp$class <- cut_to_classes(lp$values, n = 5, style = "equal", decimals = 1)
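For readers who want to see the idea without downloading data, the sketch below reproduces equal-width classes and a separate "No data" class in base R. It is an approximation for illustration only: cut_to_classes() relies on classInt::classIntervals() and formats its labels differently.

```r
x <- c(1, 4, 7, 10, NA)
n <- 3

# Equal-width breaks between the minimum and maximum of the data
breaks <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)

# Assign each value to a class; include.lowest keeps the minimum in class 1
cls <- cut(x, breaks = breaks, include.lowest = TRUE)

# Turn missing values into an explicit, labelled class
cls <- addNA(cls)
levels(cls)[is.na(levels(cls))] <- "No data"

print(table(cls))
```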



Order of Variable Levels from Eurostat Dictionary.

Description

Orders the factor levels.

Usage

dic_order(x, dic, type)

Arguments

x

a variable (code or labelled) to get order for.

dic

a name of the dictionary. Corresponds to a variable name in the data_frame from get_eurostat(). Can also be a data_frame from get_eurostat_dic().

type

a type of the x. Could be code or label.

Details

Some variables, like classifications, have a logical or conventional ordering. Eurostat data tables are not necessarily ordered this way. The function dic_order() gets the ordering from Eurostat classification dictionaries. The function label_eurostat() can also order factor levels of labels with argument eu_order = TRUE.

Value

A numeric vector of orders.

Author(s)

Przemyslaw Biecek, Leo Lahti, Janne Huovari and Markus Kainu

See Also

Other helpers: cut_to_classes(), eurotime2date(), eurotime2num(), harmonize_country_code(), label_eurostat()
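The ordering idea can be sketched in base R as a position lookup in a dictionary. The codes below are invented for the illustration and do not come from a Eurostat dictionary, and the lookup is a simplified stand-in for what dic_order() does.

```r
# Hypothetical dictionary codes in their conventional order
dic_codes <- c("LOW", "MEDIUM", "HIGH")

# Codes as they might appear in a data table, in arbitrary order
x <- c("HIGH", "LOW", "MEDIUM", "LOW")

# Position of each value in the dictionary gives its order
ord <- match(x, dic_codes)
print(ord)

# The same order can be used for factor levels
x_factor <- factor(x, levels = dic_codes)
print(levels(x_factor))
```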


Countries and Country Codes

Description

Countries and country codes in EU, Euro area, EFTA and EU candidate countries.

Usage

eu_countries

ea_countries

efta_countries

eu_candidate_countries

Format

A data_frame:

An object of class tbl_df (inherits from tbl, data.frame) with 19 rows and 3 columns.

An object of class tbl_df (inherits from tbl, data.frame) with 4 rows and 3 columns.

An object of class tbl_df (inherits from tbl, data.frame) with 7 rows and 3 columns.

Source

https://ec.europa.eu/eurostat/statistics-explained/index.php/Tutorial:Country_codes_and_protocol_order, https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Euro_area

See Also

Other datasets: eurostat_geodata_60_2016, tgs00026


Defunct functions in eurostat

Description

This list of defunct functions is maintained to document changes to eurostat functions in a transparent manner.

Usage

grepEurostatTOC(...)

Arguments

...

Generic representation of old arguments

Details

The following functions are defunct:


Geospatial data of Europe from GISCO in 1:60 million scale from year 2016

Description

Geospatial data of Europe from GISCO in 1:60 million scale from year 2016

Format

sf object

Details

The dataset contains 2016 observations (rows) and 12 variables (columns).

The object contains the following columns:

Dataset updated: 2023-06-29. For a more recent version, please use giscoR::gisco_get_nuts() function.

Source

Data source: Eurostat via giscoR::gisco_get_nuts().

© EuroGeographics for the administrative boundaries

Data downloaded from: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units

References

The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: GISCO: Geographical information and maps - Administrative units/statistical units

"In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:

  1. The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").

  2. The permission to use the data is granted on condition that:

    1. the data will not be used for commercial purposes;

    2. the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page.

Copyright notice

When data downloaded from this page is used in any printed or electronic publication, in addition to any other provisions applicable to the whole Eurostat website, data source will have to be acknowledged in the legend of the map and in the introductory page of the publication with the following copyright notice:

EN: © EuroGeographics for the administrative boundaries

FR: © EuroGeographics pour les limites administratives

DE: © EuroGeographics bezüglich der Verwaltungsgrenzen

For publications in languages other than English, French or German, the translation of the copyright notice in the language of the publication shall be used.

If you intend to use the data commercially, please contact EuroGeographics for information regarding their licence agreements."

See Also

giscoR::gisco_get_nuts() and Eurostat. (2019). Methodological manual on territorial typologies – 2018 edition. Manuals and guidelines.

Other datasets: eu_countries, tgs00026

Other geospatial: get_eurostat_geospatial()

Examples


eurostat_geodata_60_2016 <- eurostat::eurostat_geodata_60_2016

# Manipulate and plot
if (require(sf)) {
  library(sf)
  # Filter NUTS3 from select countries like in a regular data frame
  example_nuts <- subset(eurostat_geodata_60_2016, LEVL_CODE == 3 &
    CNTR_CODE %in% c("DK", "DE", "PL"))

  plot(example_nuts["CNTR_CODE"])
}


Date Conversion from New Eurostat Time Format

Description

Date conversion from Eurostat time format. A function to convert Eurostat time values to objects of class Date() representing calendar dates.

Usage

eurotime2date(x, last = FALSE)

Arguments

x

a character string with time information in Eurostat time format.

last

a logical. If FALSE (default) the date is the first date of the period (month, quarter or year). If TRUE the date is the last date of the period.

Details

Available patterns are YYYY (year), YYYY-SN (semester), YYYY-QN (quarter), YYYY-MM (month), YYYY-WNN (week) and YYYY-MM-DD (day).

Value

an object of class Date().

Author(s)

Janne Huovari janne.huovari@ptt.fi

References

See citation("eurostat"):

# Kindly cite the eurostat R package as follows:
# 
#   Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and
#   analysis of Eurostat open data with the eurostat package. The R
#   Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
# 
#   Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D.,
#   and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data
#   [Computer software]. R package version 4.0.0.
#   https://github.com/rOpenGov/eurostat
# 
# To see these entries in BibTeX format, use 'print(<citation>,
# bibtex=TRUE)', 'toBibtex(.)', or set
# 'options(citation.bibtex.max=999)'.

See Also

lubridate::ymd()

Other helpers: cut_to_classes(), dic_order(), eurotime2num(), harmonize_country_code(), label_eurostat()

Examples



na_q <- get_eurostat("namq_10_pc", time_format = "raw")
na_q$TIME_PERIOD <- eurotime2date(x = na_q$TIME_PERIOD)
unique(na_q$TIME_PERIOD)


## Not run: 
# Test for weekly data
get_eurostat(
  id = "lfsi_abs_w",
  select_time = c("W"),
  time_format = "date"
  )

## End(Not run)
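To make the YYYY-QN pattern concrete, the sketch below converts quarterly labels to the first day of the quarter in base R. quarter_to_date() is a hypothetical helper covering only this one pattern and only the last = FALSE behaviour; it is not the package's implementation.

```r
# Hypothetical helper: convert "YYYY-QN" labels to the first day of the
# quarter. Other patterns (year, semester, month, week, day) are not handled.
quarter_to_date <- function(x) {
  year <- as.integer(substr(x, 1, 4))
  quarter <- as.integer(substr(x, 7, 7))
  first_month <- (quarter - 1L) * 3L + 1L
  as.Date(sprintf("%d-%02d-01", year, first_month))
}

dates <- quarter_to_date(c("2015-Q1", "2015-Q2", "2015-Q4"))
print(dates)
```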


Conversion of Eurostat Time Format to Numeric

Description

A conversion of a Eurostat time format to numeric.

Usage

eurotime2num(x)

Arguments

x

a character string with time information in Eurostat time format.

Details

Bi-annual (semester), quarterly, monthly and weekly data can be presented as a fraction of the year at the beginning of the period. Conversion of daily data is not supported.

Value

see as.numeric().

Author(s)

Janne Huovari janne.huovari@ptt.fi, Pyry Kantanen

See Also

Other helpers: cut_to_classes(), dic_order(), eurotime2date(), harmonize_country_code(), label_eurostat()

Examples



na_q <- get_eurostat("namq_10_pc", time_format = "raw")
na_q$TIME_PERIOD <- eurotime2num(x = na_q$TIME_PERIOD)

unique(na_q$TIME_PERIOD)
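The "fraction of the year at the beginning of the period" can be illustrated for quarterly labels. The sketch assumes that the quarters start at 0, 0.25, 0.5 and 0.75 of the year; quarter_to_num() is a hypothetical helper and the exact values returned by eurotime2num() may differ.

```r
# Hypothetical helper: express "YYYY-QN" as year plus the fraction of the
# year that has passed at the start of the quarter (an assumption).
quarter_to_num <- function(x) {
  year <- as.integer(substr(x, 1, 4))
  quarter <- as.integer(substr(x, 7, 7))
  year + (quarter - 1) / 4
}

nums <- quarter_to_num(c("2015-Q1", "2015-Q2", "2015-Q4"))
print(nums)
```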



Calculate a fixity checksum for an object

Description

Uses a hash function (md5) on an object and calculates a digest of the object in the form of a character string.

Usage

fixity_checksum(data_object, algorithm = "md5")

Arguments

data_object

A dataset downloaded with some eurostat package function.

algorithm

Algorithm to use when calculating a checksum for a dataset. Default is 'md5', but can be any supported algorithm in digest function.

Details

“Fixity, in the preservation sense, means the assurance that a digital file has remained unchanged, i.e. fixed.” (Bailey, 2014). In practice, fixity can most easily be established by calculating a checksum for the data object that changes if anything in the data object has changed. What we use as a checksum here is by default calculated with md5 hash algorithm. It is possible to use other algorithms supported by the imported digest function, see function documentation.

In the case of big objects with millions of rows of data, calculating a checksum can take a bit longer and requires some amount of RAM to be available. Selecting another algorithm might perform faster and/or more efficiently. Whichever algorithm you use, please report it in your work to ensure transparency and replicability.

This function takes the whole data object as an input, meaning that everything counts when calculating the fixity checksum. If the dataset column names are labeled, if the data itself is labeled, if stringsAsFactors is TRUE, if flags are removed or kept, if data is somehow edited... all these affect the calculated checksum. It is advisable to calculate the checksum immediately after downloading the data, before adding any labels or doing other mutating operations. If you are using other arguments than the default ones when downloading data, it is also good to report the exact arguments used.

This implementation fulfills the level 1 requirement of National Digital Stewardship Alliance (NDSA) preservation levels by creating "fixity info if it wasn’t provided with the content". In the current version of the package, fixity information has to be created manually and is at the responsibility of the user.

Source

https://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums

See Also

digest::digest()
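The behaviour described in the Details, where any change to the data changes the checksum, can be demonstrated with base R alone. The sketch below is a stand-in rather than the package's method: it hashes a CSV serialisation of the data with tools::md5sum(), whereas fixity_checksum() hashes the R object itself through the digest package, so the two will not produce the same strings. file_checksum() is a hypothetical name.

```r
# Hypothetical helper: write the data to a temporary CSV file and return
# the MD5 checksum of that file.
file_checksum <- function(df) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path))
  write.csv(df, path, row.names = FALSE)
  unname(tools::md5sum(path))
}

original <- data.frame(geo = c("FI", "SE"), values = c(1.5, 2.5))

# A single edited value is enough to change the checksum
edited <- original
edited$values[2] <- 2.6

print(file_checksum(original))
print(file_checksum(edited))
```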


Create A Data Bibliography

Description

Creates a bibliography from selected Eurostat data files, including last Eurostat update, URL access date, and optional keywords set by the user.

Usage

get_bibentry(code, keywords = NULL, format = "Biblatex", lang = "en")

Arguments

code

A Eurostat data code or a vector of Eurostat data codes as character or factor.

keywords

A list of keywords to be added to the entries. Defaults to NULL.

format

Default is 'Biblatex', alternatives are 'bibentry' or 'Bibtex' (not case sensitive)

lang

2-letter language code, default is "en" (English), other options are "fr" (French) and "de" (German). Used for labeling datasets.

Value

a bibentry, Bibtex or Biblatex object.

Citing Eurostat data

For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.

When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:

It should be avoided to associate different entities (e.g. Eurostat, National Statistical Offices, other data providers) to the same dataset or indicator without specifying the role of each of them in the treatment of data.

See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.

Author(s)

Daniel Antal, Przemyslaw Biecek

See Also

utils::bibentry, RefManageR::toBiblatex

Examples


## Not run: 
  my_bibliography <- get_bibentry(
    code = c("tran_hv_frtra", "tec00001"),
    keywords = list(
      c("transport", "freight", "multimodal data", "GDP"),
      c("economy and finance", "annual", "national accounts", "GDP")
    ),
    format = "Biblatex"
  )
  my_bibliography

## End(Not run)


Get Eurostat Data

Description

Download data sets from Eurostat https://ec.europa.eu/eurostat

Usage

get_eurostat(
  id,
  time_format = "date",
  filters = NULL,
  type = "code",
  select_time = NULL,
  lang = "en",
  cache = TRUE,
  update_cache = FALSE,
  cache_dir = NULL,
  compress_file = TRUE,
  stringsAsFactors = FALSE,
  keepFlags = FALSE,
  use.data.table = FALSE,
  ...
)

Arguments

id

A unique identifier / code for the dataset of interest. If the code is not known, the search_eurostat() function can be used to search the Eurostat table of contents.

time_format

a string giving the type of conversion of the time column from the Eurostat format. The default argument "date" converts to a Date() class with the date being the first day of the period. A "date_last" argument converts the dataset date to a Date() class object with the difference that the exact date is the last date of the period. Period can be year, semester (half year), quarter, month, or week (see eurotime2date() for more information). Argument "num" converts the date into a numeric value, meaning that the first day of the year 2000 is close to 2000.01 and the last day of the year is close to 2000.99 (see eurotime2num() for more information). Using the argument "raw" preserves the dates as they were in the original Eurostat data.

filters

A named list of filters. Names of list objects are Eurostat variable codes and values are vectors of observation codes. If NULL (default) the whole dataset is returned. See details for more information on filters and limitations per query.

type

A type of variables, "code" (default), "label" or "both". The parameter "both" will return a data_frame with named vectors, labels as values and codes as names.

select_time

a character symbol for a time frequency or NULL, which is used by default as most datasets have just one time frequency. For datasets with multiple time frequencies, select one or more of the desired frequencies with: "Y" (or "A") = annual, "S" = semi-annual / semester, "Q" = quarterly, "M" = monthly, "W" = weekly. For all frequencies in same data frame time_format = "raw" should be used.

lang

2-letter language code, default is "en" (English), other options are "fr" (French) and "de" (German). Used for labeling datasets.

cache

a logical whether to do caching. Default is TRUE.

update_cache

a logical whether to update cache. Can be set also with options(eurostat_update = TRUE)

cache_dir

a path to a cache directory. NULL (default) creates and uses a 'eurostat' directory in the temporary directory defined by the base R tempdir() function. The user can set the cache directory to an existing directory by using this argument. The cache directory can also be set with the set_eurostat_cache_dir() function.

compress_file

a logical whether to compress the RDS-file in caching. Default is TRUE.

stringsAsFactors

if TRUE (the default) variables are converted to factors in the original Eurostat order. If FALSE they are returned as strings.

keepFlags

a logical whether the flags (e.g. "confidential", "provisional") should be kept in a separate column or if they can be removed. Default is FALSE. For flag values see: https://ec.europa.eu/eurostat/data/database/information. A possible non-real zero "0n" is also indicated in the flags column. Flags are not available from the Eurostat JSON API, so keepFlags cannot be used with filters.

use.data.table

Use faster data.table functions? Default is FALSE. On Windows requires that RTools is installed.

...

Arguments passed on to get_eurostat_json

proxy

Use proxy, TRUE or FALSE (default).
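As a rough illustration of the time_format options described above, the following base R sketch shows what "date" and "date_last" mean for a quarterly period. The helper period_bounds() is hypothetical and is not part of the eurostat package; see eurotime2date() for the actual conversion.

```r
# Hypothetical helper, for illustration only: returns the first and
# last day of a given quarter, mirroring the idea behind
# time_format = "date" and time_format = "date_last".
period_bounds <- function(year, quarter) {
  first_month <- (quarter - 1) * 3 + 1
  first <- as.Date(sprintf("%d-%02d-01", year, first_month))
  # First day of the following quarter, minus one day
  last <- seq(first, by = "3 months", length.out = 2)[2] - 1
  list(date = first, date_last = last)
}

b <- period_bounds(2000, 2)
b$date       # first day of 2000-Q2
b$date_last  # last day of 2000-Q2
```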

Details

Datasets are downloaded from the Eurostat SDMX 2.1 API in TSV format or from the Eurostat API Statistics (JSON API). If only the table id is given, the whole table is downloaded from the SDMX API. If any filters are given, the JSON API is used instead.

The bulk download facility is the fastest method to download whole datasets. It is also often the only way, as the JSON API has a limit of at most 50 sub-indicators at a time and whole datasets usually exceed that. Also, it seems that multi-frequency datasets can only be retrieved via the bulk download facility, and select_time is not available for the JSON API method.

If your connection is through a proxy, you may have to set proxy parameters to use JSON API, see get_eurostat_json().

By default datasets are cached to reduce load on Eurostat services and because some datasets can be quite large. Cache files are stored in a temporary directory by default or in a named directory (See set_eurostat_cache_dir()). The cache can be emptied with clean_eurostat_cache().

The id (a code) for the dataset can be searched with search_eurostat() or looked up in the Eurostat database https://ec.europa.eu/eurostat/data/database. The Eurostat database gives codes in the Data Navigation Tree after every dataset in parentheses.

Value

a tibble.

One column for each dimension in the data, the time column for a time dimension and the values column for numerical values. Eurostat data does not include all missing values, and the treatment of missing values depends on the source. In the bulk download facility, missing values are dropped if all dimensions are missing at a particular time. In the JSON API, missing values are dropped only if all dimensions are missing at all times. The data from the bulk download facility can be completed, for example, with tidyr::complete().
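A minimal sketch of completing dropped combinations with tidyr::complete(), using a small made-up table instead of a real download. The column names geo, time and values follow the description above; the numbers are invented for illustration.

```r
library(tidyr)

# Toy data: the combination geo = "SE", time = 2019 is absent,
# as it could be when missing values are dropped at the source.
toy <- data.frame(
  geo = c("FI", "FI", "SE"),
  time = c(2019, 2020, 2020),
  values = c(1.0, 1.2, 2.3)
)

# complete() adds the absent geo/time combinations with NA values
completed <- complete(toy, geo, time)
completed
```

After completion every geo/time pair is present, and the previously absent rows carry NA in the values column.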

Eurostat: Copyright notice and free re-use of data

The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright

"(c) European Union, 1995 - today

Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:

For exceptions to the abovementioned principles see Eurostat website

Filtering datasets

When using Eurostat API Statistics (JSON API), datasets can be filtered before they are downloaded and saved in local memory. The general format for filter parameters is <DIMENSION_CODE>=<VALUE>.

Filter parameters are optional but the used dimension codes must be present in the data product that is being queried. Dimension codes can vary between different data products so it may be useful to examine new datasets in Eurostat data browser beforehand. However, most if not all Eurostat datasets concern European countries and contain information that was gathered at some point in time, so geo and time dimension codes can usually be used.

<DIMENSION_CODE> and <VALUE> are case-insensitive and they can be written in lowercase or uppercase in the query.

Parameters are passed on to the eurostat package functions get_eurostat() and get_eurostat_json() as list items. If an individual item contains multiple values, as is often the case with geo parameters and other optional items, they must be in the form of a vector: c("FI", "SE"). For examples on how to use these parameters, see function examples below.

Time parameters

time and time_period address the same TIME_PERIOD dimension in the dataset and can be used interchangeably. In the Eurostat documentation it is stated that "Using more than one Time parameter in the same query is not accepted", but practice has shown that the Eurostat API actually allows multiple time parameters in the same query. This makes it possible to use the R colon operator when writing queries, so time = c(2015:2018) translates to &time=2015&time=2016&time=2017&time=2018.

The only exception to this is when the queried dataset contains e.g. quarterly data and TIME_PERIOD is saved as 2015-Q1, 2015-Q2 etc. Then it is possible to use time=2015-Q1&time=2015-Q2 style in the query URL, but this makes it unfeasible to use the colon operator and requires a lot of manual typing.

Because of this, it is useful to know about other time parameters as well:

Using both untilTimePeriod and sinceTimePeriod parameters in the same query is allowed, making the usage of the R colon operator unnecessary. In the case of quarterly data, using untilTimePeriod and sinceTimePeriod parameters also works, as opposed to the colon operator, so it is generally safer to use them as well.
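The mapping from R filter lists to query parameters described in this section can be sketched as follows. The helper filters_to_query() is hypothetical and only illustrates the idea; it is not the package's internal implementation, and the "2015-Q1" value format for sinceTimePeriod and untilTimePeriod is assumed from the TIME_PERIOD format mentioned above.

```r
# Hypothetical helper: turns a named list of filters into a URL query
# string, repeating the parameter name for each value in a vector.
filters_to_query <- function(filters) {
  parts <- lapply(names(filters), function(nm) {
    paste0(nm, "=", filters[[nm]])
  })
  paste(unlist(parts), collapse = "&")
}

# Annual data: the colon operator expands to one parameter per year
annual_query <- filters_to_query(list(time = 2015:2018))

# Quarterly data: a start/end pair avoids typing every period by hand
quarterly_filters <- list(
  sinceTimePeriod = "2015-Q1",
  untilTimePeriod = "2016-Q4"
)
quarterly_query <- filters_to_query(quarterly_filters)
```

A list such as quarterly_filters could then be passed as the filters argument of get_eurostat() or get_eurostat_json().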

Other dimensions

In get_eurostat_json() examples nama_10_gdp dataset is filtered with two additional filter parameters:

Filters like these are most likely unique to the nama_10_gdp dataset (or other datasets within the same domain) and should not be used with other datasets without user discretion. By using label_eurostat() we know that "B1GQ" stands for "Gross domestic product at market prices" and "CLV_I10" means "Chain linked volumes, index 2010=100".

Different dimension codes can be translated to a natural language by using the get_eurostat_dic() function, which returns labels for individual dimension items such as na_item and unit, as opposed to label_eurostat() which does it for whole datasets. For example, the parameter na_item stands for "National accounts indicator (ESA 2010)" and unit stands for "Unit of measure".

Language

All datasets have metadata available in English, French and German. If no parameter is given, the labels are returned in English.

Example:

More information

For more information about data filtering see Eurostat documentation on API Statistics: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query#APIStatisticsdataquery-TheparametersdefinedintheRESTrequest

Citing Eurostat data

For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.

When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:

Avoid associating different entities (e.g. Eurostat, National Statistical Offices, other data providers) with the same dataset or indicator without specifying the role of each of them in the treatment of data.

See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.

Disclaimer: Availability of filtering functionalities

Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although technically filtering datasets downloaded through the SDMX Dissemination API is also supported by Eurostat. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering

Strategies for handling large datasets more efficiently

Most Eurostat datasets are relatively manageable, at least on a machine with 16 GB of RAM. The largest dataset in Eurostat database, at the time of writing this, had 148362539 (148 million) values, which results in an object with 148 million rows in tidy data (long) format. The test machine with 16 GB of RAM was able to handle the second largest dataset in the database with 91 million values (rows).
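To get a feeling for these sizes, a back-of-the-envelope memory estimate can be sketched in base R. It assumes 4 bytes per factor (integer) cell and 8 bytes per numeric or Date cell, and ignores factor levels and other overhead; the column counts used below are illustrative assumptions, not properties of any particular dataset.

```r
# Rough estimate of the in-memory size of a long-format table in GiB.
estimate_gib <- function(n_rows, n_factor_cols, n_double_cols) {
  bytes <- n_rows * (n_factor_cols * 4 + n_double_cols * 8)
  bytes / 2^30
}

# 148 million rows with, say, five dimension columns stored as factors
# plus a time column and a values column stored as doubles
est <- estimate_gib(148362539, n_factor_cols = 5, n_double_cols = 2)
est
```

Under these assumptions the object alone would take roughly 5 GiB, before any copies made during processing.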

There are still some methods to make data fetching functions perform faster:

Author(s)

Przemyslaw Biecek, Leo Lahti, Janne Huovari, Markus Kainu and Pyry Kantanen

References

See citation("eurostat"):

Kindly cite the eurostat R package as follows:

  Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and
  analysis of Eurostat open data with the eurostat package. The R
  Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019

  Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D.,
  and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data
  [Computer software]. R package version 4.0.0.
  https://github.com/rOpenGov/eurostat

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.

See Also

search_eurostat(), label_eurostat()

Examples


## Not run: 
k <- get_eurostat("nama_10_lp_ulc")
k <- get_eurostat("nama_10_lp_ulc", time_format = "num")
k <- get_eurostat("nama_10_lp_ulc", update_cache = TRUE)

k <- get_eurostat("nama_10_lp_ulc",
  cache_dir = file.path(tempdir(), "r_cache")
)
options(eurostat_update = TRUE)
k <- get_eurostat("nama_10_lp_ulc")
options(eurostat_update = FALSE)

set_eurostat_cache_dir(file.path(tempdir(), "r_cache2"))
k <- get_eurostat("nama_10_lp_ulc")
k <- get_eurostat("nama_10_lp_ulc", cache = FALSE)
k <- get_eurostat("avia_gonc", select_time = "Y", cache = FALSE)

dd <- get_eurostat("nama_10_gdp",
  filters = list(
    geo = "FI",
    na_item = "B1GQ",
    unit = "CLV_I10"
  )
)

# A dataset with multiple time series in one
dd2 <- get_eurostat("AVIA_GOR_ME",
  select_time = c("A", "M", "Q"),
  time_format = "date_last"
)

# An example of downloading whole dataset from JSON API
dd3 <- get_eurostat("AVIA_GOR_ME",
  filters = list()
)

# Filtering a dataset from a local file
dd3_filter <- get_eurostat("AVIA_GOR_ME",
  filters = list(
    tra_meas = "FRM_BRD"
  )
)


## End(Not run)


Download Eurostat Dictionary

Description

Download a Eurostat dictionary.

Usage

get_eurostat_dic(dictname, lang = "en")

Arguments

dictname

A character, dictionary for the variable to be downloaded.

lang

A character, language code. Options: "en" (default), "fr", "de".

Details

Downloads the dictionary for a given coded variable from Eurostat https://ec.europa.eu/eurostat/. The dictionaries link codes with human-readable labels. To translate codes to labels, use label_eurostat().

Value

tibble with two columns: code names and full names.
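The following sketch shows how a two-column dictionary of this kind can be used to look up labels for codes. The table is built by hand, and the column names code_name and full_name are assumptions made for this example. For brevity it mixes one na_item code and one unit code, which in practice would come from two separate dictionaries.

```r
# Hand-made stand-in for a downloaded dictionary
dic <- data.frame(
  code_name = c("B1GQ", "CLV_I10"),
  full_name = c(
    "Gross domestic product at market prices",
    "Chain linked volumes, index 2010=100"
  ),
  stringsAsFactors = FALSE
)

# Look up labels for a vector of codes; unknown codes give NA
codes <- c("CLV_I10", "B1GQ", "UNKNOWN")
labels <- dic$full_name[match(codes, dic$code_name)]
labels
```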

Author(s)

Przemyslaw Biecek and Leo Lahti leo.lahti@iki.fi. Thanks to Wietse Dol for contributions. Updated by Pyry Kantanen to support XML codelists.

References

See citation("eurostat"):

# Kindly cite the eurostat R package as follows:
# 
#   Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and
#   analysis of Eurostat open data with the eurostat package. The R
#   Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
# 
#   Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D.,
#   and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data
#   [Computer software]. R package version 4.0.0.
#   https://github.com/rOpenGov/eurostat
# 
# To see these entries in BibTeX format, use 'print(<citation>,
# bibtex=TRUE)', 'toBibtex(.)', or set
# 'options(citation.bibtex.max=999)'.

See Also

label_eurostat(), get_eurostat(), search_eurostat().

Examples



get_eurostat_dic("crop_pro")

# Try another language
get_eurostat_dic("crop_pro", lang = "fr")



Get all datasets in a folder

Description

Loops over all files in a Eurostat database folder, downloads the data and assigns the datasets to an environment.

Usage

get_eurostat_folder(code, env = .EurostatEnv)

Arguments

code

Folder code from Eurostat Table of Contents.

env

Name of the environment where downloaded datasets are assigned. Default is .EurostatEnv. If NULL, datasets are returned as a list object.

Details

The datasets are assigned into .EurostatEnv by default, using dataset codes as object names. The datasets are downloaded from SDMX API as TSV files, meaning that they are returned without filtering. No filters can be provided using this function.

Please do not attempt to download too many datasets or the whole database at once. The number of datasets that can be downloaded at once is hardcoded to 20. The function also asks the user for confirmation if the number of datasets in a folder is more than 10. This is by design, to discourage straining the Eurostat API.
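The limits described above can be pictured with the following sketch. The function check_folder_download() is hypothetical and is not the package's code. In particular, what get_eurostat_folder() does when the limit of 20 is exceeded is not specified here, so the sketch only reports that case.

```r
# Hypothetical guard illustrating the documented thresholds:
# more than 10 datasets triggers a confirmation, and 20 is the maximum.
check_folder_download <- function(n_datasets,
                                  max_datasets = 20,
                                  confirm_above = 10) {
  if (n_datasets > max_datasets) {
    return("over_limit")
  }
  if (n_datasets > confirm_above) {
    return("ask")
  }
  "proceed"
}

check_folder_download(5)
check_folder_download(15)
check_folder_download(25)
```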

Data source: Eurostat Table of Contents

The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de

See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC

Data source: Eurostat SDMX 2.1 Dissemination API

Data is downloaded from Eurostat SDMX 2.1 API endpoint as compressed TSV files that are transformed into tabular format. See Eurostat documentation for more information: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+query

The new dissemination API replaces the old bulk download facility that was used by Eurostat before October 2023 and by the eurostat R package versions before 4.0.0. See Eurostat documentation about the transition from Bulk Download to API for more information about the differences between the old bulk download facility and the data provided by the new API connection: https://wikis.ec.europa.eu/display/EUROSTATHELP/Transition+-+from+Eurostat+Bulk+Download+to+API

See especially the document Migrating_to_API_TSV.pdf that describes the changes in TSV file format in new applications.

For more information about SDMX 2.1, see SDMX standards: Section 7: Guidelines for the use of web services, Version 2.1: https://sdmx.org/wp-content/uploads/SDMX_2-1_SECTION_7_WebServicesGuidelines.pdf

Author(s)

Pyry Kantanen

See Also

get_eurostat_toc(), toc_count_children(), toc_determine_hierarchy(), toc_list_children(), toc_count_whitespace()


Download Geospatial Data from GISCO

Description

Downloads either a simple features (sf) object or a data_frame of NUTS regions. This function is a wrapper of giscoR::gisco_get_nuts(). It requires the packages sf and giscoR to be installed.
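Since sf and giscoR are only suggested packages, it may be useful to check for them before calling this function. A minimal sketch with base R follows; the helper name has_geo_deps() is made up for this example.

```r
# Returns a named logical vector telling which of the required
# packages can be loaded in the current R installation.
has_geo_deps <- function(pkgs = c("sf", "giscoR")) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

deps <- has_geo_deps()
if (!all(deps)) {
  message("Missing packages: ", paste(names(deps)[!deps], collapse = ", "))
}
```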

Usage

get_eurostat_geospatial(
  output_class = "sf",
  resolution = "60",
  nuts_level = "all",
  year = "2016",
  cache = TRUE,
  update_cache = FALSE,
  cache_dir = NULL,
  crs = "4326",
  make_valid = "DEPRECATED",
  ...
)

Arguments

output_class

Class of object returned, either sf (simple features) or df (data_frame). spdf output has been soft-deprecated; the function would switch to sf.

resolution

Resolution of the geospatial data. One of

  • "60" (1:60million),

  • "20" (1:20million)

  • "10" (1:10million)

  • "03" (1:3million) or

  • "01" (1:1million).

nuts_level

Level of NUTS classification of the geospatial data. One of "0", "1", "2", "3" or "all" (mimics the original behaviour)

year

NUTS release year. One of "2003", "2006", "2010", "2013", "2016" or "2021"

cache

a logical whether to do caching. Default is TRUE.

update_cache

a logical whether to update cache. Can be set also with options(eurostat_update = TRUE)

cache_dir

a path to a cache directory. See set_eurostat_cache_dir(). If NULL and the cache dir has not been set globally the file would be stored in the tempdir().

crs

projection of the map: 4-digit EPSG code. One of:

  • "4326" - WGS84

  • "3035" - ETRS89 / ETRS-LAEA

  • "3857" - Pseudo-Mercator

make_valid

Deprecated

...

Arguments passed on to giscoR::gisco_get_nuts

verbose

Logical, displays information. Useful for debugging, default is FALSE.

spatialtype

Type of geometry to be returned:

  • "BN": Boundaries - LINESTRING object.

  • "LB": Labels - POINT object.

  • "RG": Regions - MULTIPOLYGON/POLYGON object.

country

Optional. A character vector of country codes. It could be either a vector of country names, a vector of ISO3 country codes or a vector of Eurostat country codes. Mixed types (as c("Turkey","US","FRA")) would not work. See also countrycode::countrycode().

nuts_id

Optional. A character vector of NUTS IDs.

Details

The objects downloaded from GISCO should contain all or some of the following variable columns:

Value

a sf or data_frame

Eurostat: Copyright notice and free re-use of data

The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright

"(c) European Union, 1995 - today

Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:

For exceptions to the abovementioned principles see Eurostat website

Data source: GISCO - General Copyright

"Eurostat's general copyright notice and licence policy is applicable and can be consulted here: https://ec.europa.eu/eurostat/about-us/policies/copyright

Please also be aware of the European Commission's general conditions: https://commission.europa.eu/legal-notice_en

Moreover, there are specific provisions applicable to some of the following datasets available for downloading. The download and usage of these data is subject to their acceptance:

Of the abovementioned datasets, Administrative Units / Statistical Units is applicable if the user wants to draw maps with borders provided by GISCO / EuroGeographics.

Data source: GISCO - Administrative Units / Statistical Units

The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the GISCO website: GISCO: Geographical information and maps - Administrative units/statistical units

"In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:

  1. The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").

  2. The permission to use the data is granted on condition that:

    1. the data will not be used for commercial purposes;

    2. the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page."

Copyright notice

When data downloaded from this page is used in any printed or electronic publication, in addition to any other provisions applicable to the whole Eurostat website, data source will have to be acknowledged in the legend of the map and in the introductory page of the publication with the following copyright notice:

EN: © EuroGeographics for the administrative boundaries

FR: © EuroGeographics pour les limites administratives

DE: © EuroGeographics bezüglich der Verwaltungsgrenzen

For publications in languages other than English, French or German, the translation of the copyright notice in the language of the publication shall be used.

If you intend to use the data commercially, please contact EuroGeographics for information regarding their licence agreements."

Author(s)

Markus Kainu markuskainu@gmail.com, Diego Hernangomez https://github.com/dieghernan/

Source

Data source: Eurostat

© EuroGeographics for the administrative boundaries

Data downloaded using giscoR

See Also

giscoR::gisco_get_nuts()

Other geospatial: eurostat_geodata_60_2016

Examples


# Uses cached dataset
sf <- get_eurostat_geospatial(
  output_class = "sf",
  resolution = "60",
  nuts_level = "all"
)
# Downloads dataset from server
sf2 <- get_eurostat_geospatial(
  output_class = "sf",
  resolution = "20",
  nuts_level = "all"
)
df <- get_eurostat_geospatial(
  output_class = "df",
  nuts_level = "0"
)



Get Eurostat data interactive

Description

A simple interactive helper function to go through the steps of downloading and/or finding suitable eurostat datasets.

Usage

get_eurostat_interactive(code = NULL)

Arguments

code

A unique identifier / code for the dataset of interest. If the code is not known, the search_eurostat() function can be used to search the Eurostat table of contents.

Details

This function is intended to enable easy exploration of different eurostat package functionalities and functions. In order to not drown the end user in endless menus this function does not allow for setting all possible get_eurostat() function arguments. It is possible to set time_format, type, lang, stringsAsFactors, keepFlags, and use.data.table in the interactive menus.

In some datasets setting these parameters may result in an "Error in label_eurostat" error, for example: "labels for XXXXXX includes duplicated labels in the Eurostat dictionary". In these cases, and with other more complex queries, please use the get_eurostat() function directly.

See Also

get_eurostat()


Get Data from Eurostat API in JSON

Description

Retrieve data from Eurostat API in JSON format.

Usage

get_eurostat_json(
  id,
  filters = NULL,
  type = "code",
  lang = "en",
  stringsAsFactors = FALSE,
  proxy = FALSE,
  ...
)

Arguments

id

A unique identifier / code for the dataset of interest. If the code is not known, the search_eurostat() function can be used to search the Eurostat table of contents.

filters

A named list of filters. Names of list objects are Eurostat variable codes and values are vectors of observation codes. If NULL (default) the whole dataset is returned. See details for more information on filters and limitations per query.

type

A type of variables, "code" (default), "label" or "both". The parameter "both" will return a data_frame with named vectors, labels as values and codes as names.

lang

2-letter language code, default is "en" (English), other options are "fr" (French) and "de" (German). Used for labeling datasets.

stringsAsFactors

if TRUE, variables are converted to factors in the original Eurostat order. If FALSE (the default in the usage above), they are returned as strings.

proxy

Use proxy, TRUE or FALSE (default).

...

Arguments passed on to httr2::req_proxy

req

A request.

url,port

Location of proxy.

username,password

Login details for proxy, if needed.

auth

Type of HTTP authentication to use. Should be one of the following: basic, digest, digest_ie, gssnegotiate, ntlm, any.
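To illustrate what type = "both" means in the arguments above ("named vectors, labels as values and codes as names"), here is a small hand-made example in base R. It does not call the API; the vector is constructed manually.

```r
# A named character vector: codes as names, labels as values
geo <- c(FI = "Finland", SE = "Sweden")

names(geo)   # the codes
unname(geo)  # the labels
geo[["FI"]]  # label for a single code
```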

Details

Data to retrieve from the Eurostat web services can be specified with filters. Normally, it is better to make JSON queries through get_eurostat() than to use get_eurostat_json() directly.

Queries are limited to 50 sub-indicators at a time. Time can be filtered with a fixed "time" filter or with the "sinceTimePeriod" and "lastTimePeriod" filters. sinceTimePeriod = 2000 returns observations from 2000 to the last available one. lastTimePeriod = 10 returns the 10 last observations. See the "Filtering datasets" section below for more detailed information about filters.

To use a proxy to connect, proxy arguments can be passed to httr2::req_perform() via httr2::req_proxy() - see latter function documentation for parameter names that can be passed with .... A non-functional example: get_eurostat_json(id, filters, proxy = TRUE, url = "127.0.0.1", port = 80).

When retrieving data from the Eurostat JSON API the user may encounter errors. For end user convenience, we have provided a ready-made internal dataset sdmx_http_errors that contains descriptive labels and descriptions about the possible interpretation or cause of each error. These messages are returned if the API returns a status indicating an HTTP error (400 or greater).

The Eurostat implementation seems to be based on SDMX 2.1, which is the reason we've used SDMX Standards guidelines as a supplementary source that we have included in the dataset. What this means in practice is that the dataset contains error codes and their mappings that are not mentioned in the Eurostat website. We hope you never encounter them.
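The status check described above can be sketched as a simple lookup. The table below is a hypothetical stand-in built from standard HTTP status names; the real sdmx_http_errors dataset may use different columns and wording.

```r
# Hypothetical lookup table of HTTP error statuses
http_errors <- data.frame(
  status = c(400L, 404L, 500L),
  label = c("Bad Request", "Not Found", "Internal Server Error"),
  stringsAsFactors = FALSE
)

# Returns NA for non-error statuses, a label for known errors and a
# generic message for error statuses missing from the table.
describe_status <- function(status) {
  if (status < 400) {
    return(NA_character_)
  }
  hit <- http_errors$label[match(status, http_errors$status)]
  if (is.na(hit)) "Unknown error" else hit
}

describe_status(404L)
describe_status(200L)
```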

Value

A dataset as an object of data.frame class.

Data source: Eurostat API Statistics (JSON API)

Data is downloaded from Eurostat API Statistics. See Eurostat documentation for more information about data queries in API Statistics https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query

This replaces the old JSON Web Services that was used by Eurostat before February 2023 and by the eurostat R package versions before 3.7.13. See Eurostat documentation about the migration from JSON web service to API Statistics for more information about the differences between the old and the new service: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+migrating+from+JSON+web+service+to+API+Statistics

For easily viewing which filtering options are available - in addition to the default ones, time and language - Eurostat Web services Query builder tool may be useful: https://ec.europa.eu/eurostat/web/query-builder

Filtering datasets

When using Eurostat API Statistics (JSON API), datasets can be filtered before they are downloaded and saved in local memory. The general format for filter parameters is <DIMENSION_CODE>=<VALUE>.

Filter parameters are optional but the used dimension codes must be present in the data product that is being queried. Dimension codes can vary between different data products so it may be useful to examine new datasets in Eurostat data browser beforehand. However, most if not all Eurostat datasets concern European countries and contain information that was gathered at some point in time, so geo and time dimension codes can usually be used.

<DIMENSION_CODE> and <VALUE> are case-insensitive and they can be written in lowercase or uppercase in the query.

Parameters are passed on to the eurostat package functions get_eurostat() and get_eurostat_json() as list items. If an individual item contains multiple values, as is often the case with geo parameters and other optional items, they must be in the form of a vector: c("FI", "SE"). For examples on how to use these parameters, see function examples below.

Time parameters

time and time_period address the same TIME_PERIOD dimension in the dataset and can be used interchangeably. In the Eurostat documentation it is stated that "Using more than one Time parameter in the same query is not accepted", but practice has shown that the Eurostat API actually allows multiple time parameters in the same query. This makes it possible to use the R colon operator when writing queries, so time = c(2015:2018) translates to &time=2015&time=2016&time=2017&time=2018.

The only exception to this is when the queried dataset contains e.g. quarterly data and TIME_PERIOD is saved as 2015-Q1, 2015-Q2 etc. Then it is possible to use time=2015-Q1&time=2015-Q2 style in the query URL, but this makes it unfeasible to use the colon operator and requires a lot of manual typing.

Because of this, it is useful to know about other time parameters as well:

Using both untilTimePeriod and sinceTimePeriod parameters in the same query is allowed, making the usage of the R colon operator unnecessary. In the case of quarterly data, using untilTimePeriod and sinceTimePeriod parameters also works, as opposed to the colon operator, so it is generally safer to use them as well.

Other dimensions

In get_eurostat_json() examples nama_10_gdp dataset is filtered with two additional filter parameters:

Filters like these are most likely unique to the nama_10_gdp dataset (or other datasets within the same domain) and should not be used with other datasets without user discretion. By using label_eurostat() we know that "B1GQ" stands for "Gross domestic product at market prices" and "CLV_I10" means "Chain linked volumes, index 2010=100".

Different dimension codes can be translated to a natural language by using the get_eurostat_dic() function, which returns labels for individual dimension items such as na_item and unit, as opposed to label_eurostat() which does it for whole datasets. For example, the parameter na_item stands for "National accounts indicator (ESA 2010)" and unit stands for "Unit of measure".

Language

All datasets have metadata available in English, French and German. If no parameter is given, the labels are returned in English.

Example:

More information

For more information about data filtering see Eurostat documentation on API Statistics: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query#APIStatisticsdataquery-TheparametersdefinedintheRESTrequest

Eurostat: Copyright notice and free re-use of data

The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright

"(c) European Union, 1995 - today

Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:

For exceptions to the abovementioned principles see Eurostat website

Citing Eurostat data

For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.

When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:

Avoid associating different entities (e.g. Eurostat, National Statistical Offices, other data providers) with the same dataset or indicator without specifying the role of each of them in the treatment of data.

See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.

Disclaimer: Availability of filtering functionalities

Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although technically filtering datasets downloaded through the SDMX Dissemination API is also supported by Eurostat. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering

Author(s)

Przemyslaw Biecek, Leo Lahti, Janne Huovari, Markus Kainu and Pyry Kantanen

References

See citation("eurostat"):

Kindly cite the eurostat R package as follows:

  Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and
  analysis of Eurostat open data with the eurostat package. The R
  Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019

  Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D.,
  and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data
  [Computer software]. R package version 4.0.0.
  https://github.com/rOpenGov/eurostat

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.

See Also

httr2::req_proxy()

Examples

## Not run: 
# Generally speaking these queries would be done through get_eurostat
tmp <- get_eurostat_json("nama_10_gdp")
yy <- get_eurostat_json("nama_10_gdp", filters = list(
  geo = c("FI", "SE", "EU28"),
  time = c(2015:2023),
  lang = "FR",
  na_item = "B1GQ",
  unit = "CLV_I10"
))

# TIME_PERIOD filter works also with the new JSON API
yy2 <- get_eurostat_json("nama_10_gdp", filters = list(
   geo = c("FI", "SE", "EU28"),
   TIME_PERIOD = c(2015:2023),
   lang = "FR",
   na_item = "B1GQ",
   unit = "CLV_I10"
))

# An example from get_eurostat
dd <- get_eurostat("nama_10_gdp",
  filters = list(
  geo = "FI",
  na_item = "B1GQ",
  unit = "CLV_I10"
))

## End(Not run)

Download Data from Eurostat Dissemination API

Description

Download data from the eurostat database through the new dissemination API.

Usage

get_eurostat_raw(id, use.data.table = FALSE)

Arguments

id

A unique identifier / code for the dataset of interest. If the code is not known, the search_eurostat() function can be used to search the Eurostat table of contents.

use.data.table

Use faster data.table functions? Default is FALSE. On Windows, this requires that RTools is installed.

Value

A dataset in tibble format. The first column contains comma-separated codes of cases. Other columns usually correspond to years, and column names are years with a preceding X. Data is in character format as it contains values together with Eurostat flags for data.

Data source: Eurostat SDMX 2.1 Dissemination API

Data is downloaded from Eurostat SDMX 2.1 API endpoint as compressed TSV files that are transformed into tabular format. See Eurostat documentation for more information: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+query

The new dissemination API replaces the old bulk download facility that was used by Eurostat before October 2023 and by the eurostat R package versions before 4.0.0. See Eurostat documentation about the transition from Bulk Download to API for more information about the differences between the old bulk download facility and the data provided by the new API connection: https://wikis.ec.europa.eu/display/EUROSTATHELP/Transition+-+from+Eurostat+Bulk+Download+to+API

See especially the document Migrating_to_API_TSV.pdf that describes the changes in TSV file format in new applications.

For more information about SDMX 2.1, see SDMX standards: Section 7: Guidelines for the use of web services, Version 2.1: https://sdmx.org/wp-content/uploads/SDMX_2-1_SECTION_7_WebServicesGuidelines.pdf

Eurostat: Copyright notice and free re-use of data

The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright

"(c) European Union, 1995 - today

Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:

For exceptions to the abovementioned principles see Eurostat website

Citing Eurostat data

For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.

When using Eurostat data in contexts other than academic publications that rely on in-text citations or footnotes/endnotes, the following guidelines may be helpful:

Avoid associating different entities (e.g. Eurostat, National Statistical Offices, other data providers) with the same dataset or indicator without specifying the role of each of them in the treatment of the data.

See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.

Disclaimer: Availability of filtering functionalities

Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although Eurostat technically also supports filtering datasets downloaded through the SDMX Dissemination API. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering

Author(s)

Przemyslaw Biecek, Leo Lahti, Janne Huovari and Pyry Kantanen

References

See citation("eurostat"):

Kindly cite the eurostat R package as follows:

  Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and
  analysis of Eurostat open data with the eurostat package. The R
  Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019

  Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D.,
  and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data
  [Computer software]. R package version 4.0.0.
  https://github.com/rOpenGov/eurostat

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

See Also

get_eurostat()

Examples



eurostat:::get_eurostat_raw("educ_iste")
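
The shape of the raw output described in the Value section can be illustrated offline with a small base R sketch. The table below is made up for illustration and only imitates the documented layout (one column of comma-separated codes, then character columns holding values together with flags). The column names, the dimension names freq, unit and geo, and the helper split_cell() are assumptions and are not part of the eurostat package.

```r
# Made-up table imitating the documented raw layout (not real Eurostat data)
raw <- data.frame(
  codes = c("A,PC,FI", "A,PC,SE"),
  X2020 = c("1.5 p", ": c"),
  X2021 = c("2.0", ":"),
  stringsAsFactors = FALSE
)

# Split the comma-separated codes into one column per dimension
dims <- do.call(rbind, strsplit(raw$codes, ",", fixed = TRUE))
colnames(dims) <- c("freq", "unit", "geo") # assumed dimension names

# Hypothetical helper: separate the value from the flag in each cell
split_cell <- function(cell) {
  value_part <- sub(" .*$", "", cell)     # text before the first space
  value_part[value_part == ":"] <- NA     # ":" marks a missing value
  flag_part <- trimws(sub("^[^ ]*", "", cell)) # whatever follows the value
  data.frame(
    values = as.numeric(value_part),
    flags = flag_part,
    stringsAsFactors = FALSE
  )
}

parsed_2020 <- cbind(
  as.data.frame(dims, stringsAsFactors = FALSE),
  split_cell(raw$X2020)
)
parsed_2020
```

In practice tidy_eurostat() performs the transformation into the tidy format; the sketch is only meant to make the description of the raw layout concrete.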



Download Table of Contents of Eurostat Data Sets

Description

Download table of contents (TOC) of eurostat datasets.

Usage

get_eurostat_toc(lang = "en")

Arguments

lang

2-letter language code, default is "en" (English), other options are "fr" (French) and "de" (German). Used for labeling datasets.

Details

In the downloaded Eurostat Table of Contents, the 'code' column values correspond to the 'id' argument that is used in certain functions when downloading datasets.

Value

A tibble with nine columns:

title

Dataset title in English (default)

code

Each item (dataset, table and folder) of the TOC has a unique code which allows it to be identified in the API. Used in the get_eurostat() and get_eurostat_raw() functions to retrieve datasets.

type

dataset, folder or table

last.update.of.data

Date, indicates the last time the dataset/table was updated (format DD.MM.YYYY or %d.%m.%Y)

last.table.structure.change

Date, indicates the last time the dataset/table structure was modified (format DD.MM.YYYY or %d.%m.%Y)

data.start

Date of the oldest value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)

data.end

Date of the most recent value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)

values

Number of actual values included in the dataset

hierarchy

Hierarchy of the data navigation tree, represented in the original txt file by a 4-spaces indentation prefix in the title

Data source: Eurostat Table of Contents

The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de

See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC

Author(s)

Przemyslaw Biecek, Leo Lahti and Pyry Kantanen ropengov-forum@googlegroups.com

References

See citation("eurostat"):

Kindly cite the eurostat R package as follows:

  Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and
  analysis of Eurostat open data with the eurostat package. The R
  Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019

  Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D.,
  and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data
  [Computer software]. R package version 4.0.0.
  https://github.com/rOpenGov/eurostat

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.

See Also

get_eurostat(), search_eurostat()

Examples



tmp <- get_eurostat_toc()
head(tmp)

# Convert columns containing dates as character into Date class
# Last update of data
tmp[[4]] <- as.Date(tmp[[4]], format = c("%d.%m.%Y"))
# Last table structure change
tmp[[5]] <- as.Date(tmp[[5]], format = c("%d.%m.%Y"))
# Data start, contains several formats (date, week, month, quarter, semester)
# Unfortunately semesters are not directly supported so they need to be
# changed into quarters
tmp$data.start <- gsub("S2", "Q3", tmp$data.start)
tmp$data.start <- lubridate::as_date(
 x = tmp$data.start, 
 format = c("%Y", "%Y-Q%q", "%Y-W%W", "%Y-S%q", "%Y-%m-%d", "%Y-%m")
 )
# Data end, same as data start
tmp$data.end <- gsub("S2", "Q3", tmp$data.end)
tmp$data.end <- lubridate::as_date(
 x = tmp$data.end, 
 format = c("%Y", "%Y-Q%q", "%Y-W%W", "%Y-S%q", "%Y-%m-%d", "%Y-%m")
 )



Harmonize Country Code

Description

The European Commission and Eurostat generally use ISO 3166-1 alpha-2 codes with two exceptions: EL (not GR) is used to represent Greece, and UK (not GB) is used to represent the United Kingdom. This function converts country codes to ISO 3166-1 alpha-2.

Usage

harmonize_country_code(x)

Arguments

x

A character or a factor vector of Eurostat country codes.

Value

a vector.

Author(s)

Janne Huovari janne.huovari@ptt.fi

See Also

Other helpers: cut_to_classes(), dic_order(), eurotime2date(), eurotime2num(), label_eurostat()

Examples



lp <- get_eurostat("nama_10_lp_ulc")
lp$geo <- harmonize_country_code(lp$geo)
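
The recoding described in the Description can also be tried without a network connection. The following is a minimal base R sketch under the assumption that only the two documented exceptions (EL and UK) need to be replaced. The helper name recode_el_uk() is hypothetical and is not part of the eurostat package, and harmonize_country_code() may handle inputs differently.

```r
# Hypothetical helper mirroring the two documented exceptions
recode_el_uk <- function(x) {
  x <- as.character(x)      # accept character or factor input
  x[x %in% "EL"] <- "GR"    # Greece: EL in Eurostat, GR in ISO 3166-1 alpha-2
  x[x %in% "UK"] <- "GB"    # United Kingdom: UK in Eurostat, GB in ISO
  x
}

codes <- c("FI", "EL", "UK", "EU28")
recode_el_uk(codes)
```

Codes that are not affected by the two exceptions, including aggregates such as EU28, pass through this sketch unchanged.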



Harmonize NUTS region codes that changed with the NUTS2016 definition

Description

Eurostat mixes NUTS2013 and NUTS2016 geographic label codes in the 'geo' column, which creates comparability issues over time. This deprecated function checks whether your data is affected by this problem and gives information on what to do.

This function is deprecated, and a more general function was moved to regions::validate_nuts_regions().

Usage

harmonize_geo_code(dat)

Arguments

dat

A Eurostat data frame downloaded with get_eurostat()

Value

An augmented data frame that explains potential problems and possible solutions.

Author(s)

Daniel Antal

See Also

regions::validate_nuts_regions()

Other regions functions: add_nuts_level(), recode_to_nuts_2013(), recode_to_nuts_2016(), reexports

Examples

dat <- eurostat::tgs00026
regions::validate_nuts_regions(dat)

Get Eurostat Codes for data downloaded from new dissemination API

Description

Get definitions for Eurostat codes from Eurostat dictionaries.

Usage

label_eurostat(
  x,
  dic = NULL,
  code = NULL,
  eu_order = FALSE,
  lang = "en",
  countrycode = NULL,
  countrycode_nomatch = NULL,
  custom_dic = NULL,
  fix_duplicated = FALSE
)

label_eurostat_vars(x = NULL, id, lang = "en")

label_eurostat_tables(x, lang = "en")

Arguments

x

A character or a factor vector or a data_frame.

dic

A string (vector) naming the eurostat dictionary or dictionaries. If NULL (default), dictionary names are taken from the column names of the data_frame.

code

For data_frames, names of the columns for which code columns should also be retained. The suffix "_code" is added to code column names.

eu_order

Logical. Should Eurostat ordering be used for label levels? Affects only factors.

lang

2-letter language code, default is "en" (English), other options are "fr" (French) and "de" (German). Used for labeling datasets.

countrycode

NULL or the name of a coding scheme for countrycode::countrycode(), used to label the "geo" variable with the countrycode package. It can be used to convert to short and long country names in many different languages. If NULL (default), the eurostat dictionary is used instead.

countrycode_nomatch

What to do when countrycode is used to label "geo" and fails to find a match, for example for codes other than country codes, such as EU28. With NULL (default) the original code is used, with "eurostat" the eurostat dictionary label is used, and with NA the value NA is used.

custom_dic

A named vector or a named list of named vectors that supplies a custom dictionary for (part of) the codes. Names of the vector should be codes and values should be labels. A list can be used to specify dictionaries; the list names should then be dictionary codes.

fix_duplicated

A logical. If TRUE, the code is added to the duplicated label values. If FALSE (default), an error is given if labeling produces duplicates.

id

A unique identifier / code for the dataset of interest. If the code is not known, the search_eurostat() function can be used to search the Eurostat table of contents.

Details

A character or a factor vector of codes returns a corresponding vector of definitions. label_eurostat() also labels data_frames from get_eurostat(). For vectors, a dictionary name has to be supplied. For data_frames, dictionary names are taken from column names. The "time" and "values" columns are returned as they were, so you can supply a data_frame from get_eurostat() and get a data_frame with definitions instead of codes.

Some Eurostat dictionaries include duplicated labels. By default duplicated labels cause an error, but they can be fixed automatically with fix_duplicated = TRUE.
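
The lookup and the handling of duplicated labels described above can be sketched in base R. The dictionary below is a made-up named vector in the same shape as custom_dic (names are codes, values are labels), and label_codes() is a hypothetical helper, not the package's implementation. Keeping unknown codes unchanged and appending the code after a space are assumptions made for this illustration.

```r
# Made-up dictionary: names are codes, values are labels
dic <- c(
  FI = "Finland",
  DE = "Germany",
  EU28 = "European Union",
  EU27_2007 = "European Union" # deliberately duplicated label
)

# Hypothetical helper illustrating lookup and duplicate handling
label_codes <- function(x, dic, fix_duplicated = FALSE) {
  x <- as.character(x)
  labels <- unname(dic[x])
  # Assumption: codes missing from the dictionary are kept unchanged
  not_found <- is.na(labels)
  labels[not_found] <- x[not_found]
  # Find labels shared by more than one distinct code
  pairs <- unique(data.frame(code = x, label = labels, stringsAsFactors = FALSE))
  dup_labels <- unique(pairs$label[duplicated(pairs$label)])
  if (length(dup_labels) > 0) {
    if (!fix_duplicated) {
      stop("Labels are not unique: ", paste(dup_labels, collapse = ", "))
    }
    # Append the code so that each label becomes unique again
    is_dup <- labels %in% dup_labels
    labels[is_dup] <- paste(labels[is_dup], x[is_dup])
  }
  labels
}

label_codes(c("FI", "DE"), dic)
label_codes(c("EU28", "EU27_2007", "FI"), dic, fix_duplicated = TRUE)
```

With fix_duplicated = FALSE the second call would stop with an error, which corresponds to the default behaviour described for label_eurostat().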

Value

a vector or a data_frame.

Author(s)

Janne Huovari janne.huovari@ptt.fi

See Also

countrycode::countrycode()

Other helpers: cut_to_classes(), dic_order(), eurotime2date(), eurotime2num(), harmonize_country_code()

Examples

## Not run: 
lp <- get_eurostat("nama_10_lp_ulc")
lpl <- label_eurostat(lp)
str(lpl)
lpl_order <- label_eurostat(lp, eu_order = TRUE)
lpl_code <- label_eurostat(lp, code = "unit")
# Note that the dataset id must be provided in label_eurostat_vars
label_eurostat_vars(id = "nama_10_lp_ulc", x = "geo", lang = "en")
label_eurostat_tables("nama_10_lp_ulc")
label_eurostat(c("FI", "DE", "EU28"), dic = "geo")
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo",
  custom_dic = c(DE = "Germany")
)
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo", countrycode = "country.name",
  custom_dic = c(EU28 = "EU")
)
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo",
  countrycode = "country.name"
)
# In Finnish
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo",
  countrycode = "cldr.short.fi"
)

## End(Not run)


Output cache information as data.frame

Description

Parses the cache_list.json file and returns a data.frame.

Usage

list_eurostat_cache_items(cache_dir = NULL)

Arguments

cache_dir

A path to a cache directory. NULL (default) creates and uses an 'eurostat' directory in the temporary directory defined by the base R tempdir() function. The user can set the cache directory to an existing directory by using this argument. The cache directory can also be set with the set_eurostat_cache_dir() function.

Value

A data.frame object with 3 columns: dataset code, download date and query md5 hash


Recode geo labels and rename regions from NUTS2016 to NUTS2013

Description

Eurostat mixes NUTS2013 and NUTS2016 geographic label codes in the 'geo' column, which creates comparability issues over time.

This function is deprecated, and a more general function was moved to regions::recode_nuts().

Usage

recode_to_nuts_2013(dat)

Arguments

dat

A Eurostat data frame downloaded with get_eurostat().

Value

An augmented and potentially relabelled data frame which contains all formerly 'NUTS2013' definition geo labels in the 'NUTS2016' vocabulary when only the code changed, but the boundary did not. It also contains some information on other geo labels that cannot be brought to the current 'NUTS2013' definition. Furthermore, when the official name of the region changed, it will use the new name (if the region boundary otherwise did not change). If not called before, the function will use the helper function harmonize_geo_code().

Author(s)

Daniel Antal

See Also

regions::recode_nuts()

Other regions functions: add_nuts_level(), harmonize_geo_code(), recode_to_nuts_2016(), reexports

Examples

test_regional_codes <- data.frame(
  geo = c("FRB", "FRE", "UKN02", "IE022", "FR243", "FRB03"),
  time = c(rep(as.Date("2014-01-01"), 5), as.Date("2015-01-01")),
  values = c(1:6),
  control = c(
    "Changed from NUTS2 to NUTS1",
    "New region NUTS2016 only",
    "Discontinued region NUTS2013",
    "Boundary shift NUTS2013",
    "Recoded in NUTS2013",
    "Recoded in NUTS2016"
  )
)

recode_to_nuts_2013(test_regional_codes)

Recode geo labels and rename regions from NUTS2013 to NUTS2016

Description

Eurostat mixes NUTS2013 and NUTS2016 geographic label codes in the 'geo' column, which creates comparability issues over time.

This function is deprecated, and a more general function was moved to regions::recode_nuts().

Usage

recode_to_nuts_2016(dat)

Arguments

dat

A Eurostat data frame downloaded with get_eurostat().

Value

An augmented and potentially relabelled data frame which contains all formerly 'NUTS2013' definition geo labels in the 'NUTS2016' vocabulary when only the code changed, but the boundary did not. It also contains some information on other geo labels that cannot be brought to the current 'NUTS2016' definition. Furthermore, when the official name of the region changed, it will use the new name (if the region boundary otherwise did not change). If not called before, the function will use the helper function harmonize_geo_code().

Author(s)

Daniel Antal

See Also

regions::recode_nuts()

Other regions functions: add_nuts_level(), harmonize_geo_code(), recode_to_nuts_2013(), reexports

Examples

test_regional_codes <- data.frame(
  geo = c("FRB", "FRE", "UKN02", "IE022", "FR243", "FRB03"),
  time = c(rep(as.Date("2014-01-01"), 5), as.Date("2015-01-01")),
  values = c(1:6),
  control = c(
    "Changed from NUTS2 to NUTS1",
    "New region NUTS2016 only",
    "Discontinued region NUTS2013",
    "Boundary shift NUTS2013",
    "Recoded in NUTS2013",
    "Recoded in NUTS2016"
  )
)

recode_to_nuts_2016(test_regional_codes)

Recode Region Codes From Source To Target NUTS Typology

Description

These objects are imported from other packages. Follow the links below to see their documentation.

regions

recode_nuts, validate_geo_code, validate_nuts_regions

Arguments

dat

A data frame with a 3-5 character geo_var variable to be validated.

geo_var

Defaults to "geo". The variable that contains the 3-5 character geo codes to be validated.

geo

A vector of geographical code to validate.

nuts_year

A valid NUTS edition year.

Details

While country codes are technically not part of the NUTS typologies, Eurostat de facto uses a NUTS0 typology to identify countries. This de facto typology has three exceptions, which are handled by the validate_nuts_countries function.

NUTS typologies have different versions, therefore conformity is validated against one specific version, which can be any of these: 1999, 2003, 2006, 2010, 2013, the currently used 2016 and the already announced and defined 2021.

The NUTS typology was codified with the NUTS2003, and the pre-1999 NUTS typologies may confuse programmatic data processing, given that some NUTS1 regions were identified with country codes in smaller countries that had no NUTS1 divisions.

Currently the 2016 version is used by Eurostat, but many datasets still contain 2013 and sometimes earlier metadata.

Value

The original data frame with a 'geo_var' column is extended with a 'typology' column that states in which typology the 'geo_var' value is a valid code. For invalid codes, the function looks up potential reasons for invalidity and adds them to the 'typology_change' column. Finally, it adds a character column containing the desired codes in the target typology, for example, in the NUTS2013 typology.

Returns the original dat data frame with a column that specifies the conformity with the NUTS definition of the year nuts_year.

A character list with the valid typology, or 'invalid' in the cases when the geo coding is not valid.

See Also

Other regions functions: add_nuts_level(), harmonize_geo_code(), recode_to_nuts_2013(), recode_to_nuts_2016()

Examples

foo <- data.frame(
  geo = c("FR", "DEE32", "UKI3", "HU12", "DED", "FRK"),
  values = runif(6, 0, 100),
  stringsAsFactors = FALSE
)

recode_nuts(foo, nuts_year = 2013)

my_reg_data <- data.frame(
  geo = c(
    "BE1", "HU102", "FR1",
    "DED", "FR7", "TR", "DED2",
    "EL", "XK", "GB"
  ),
  values = runif(10)
)

validate_nuts_regions(my_reg_data)

validate_nuts_regions(my_reg_data, nuts_year = 2013)

validate_nuts_regions(my_reg_data, nuts_year = 2003)


my_reg_data <- data.frame(
  geo = c(
    "BE1", "HU102", "FR1",
    "DED", "FR7", "TR", "DED2",
    "EL", "XK", "GB"
  ),
  values = runif(10)
)

validate_geo_code(my_reg_data$geo)


Grep Datasets Titles from Eurostat

Description

Lists datasets from the Eurostat table of contents whose item titles contain a particular pattern.

Usage

search_eurostat(
  pattern,
  type = "dataset",
  column = "title",
  fixed = TRUE,
  lang = "en"
)

Arguments

pattern

Text string that is used to search from dataset, folder or table titles, depending on the type argument.

type

Selection for types of datasets to be searched. Default is dataset, other possible options are table, folder and all for all types.

column

Selection for the column of TOC where search is done. Default is title, other possible option is code.

fixed

logical. If TRUE (default), pattern is a string to be matched as is. See grep() documentation for more information.

lang

2-letter language code, default is "en" (English), other options are "fr" (French) and "de" (German). Used for labeling datasets.

Details

Downloads the list of all datasets available on Eurostat and returns those whose description contains a particular pattern. E.g. all datasets related to education of teaching.

If you wish to perform searches on other fields than item title, you can download the Eurostat Table of Contents manually using get_eurostat_toc() function and use grep() function normally. The data browser on Eurostat website may also return useful results.
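
The manual approach mentioned above can be sketched offline with grepl() on a small stand-in for the table of contents. The data frame below is made up (titles, codes and types are illustrative only) and the helper search_toc() is hypothetical. With real data the table would come from get_eurostat_toc().

```r
# Made-up stand-in for the table of contents (illustrative values only)
toc <- data.frame(
  title = c(
    "Live births (total) by NUTS 3 region",
    "Pupils and students enrolled",
    "Education and training"
  ),
  code = c("code_a", "code_b", "code_c"),
  type = c("dataset", "dataset", "folder"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: filter by item type, then match the pattern
search_toc <- function(toc, pattern, type = "dataset",
                       column = "title", fixed = TRUE) {
  if (type != "all") {
    toc <- toc[toc$type == type, , drop = FALSE]
  }
  toc[grepl(pattern, toc[[column]], fixed = fixed), , drop = FALSE]
}

# Literal match: the parentheses are matched as they are
search_toc(toc, "births (total)")
# Regular expression: the unescaped parentheses form a group, so no match
search_toc(toc, "births (total)", fixed = FALSE)
# Search all item types in a different column
search_toc(toc, "code_c", type = "all", column = "code")
```

The second call shows why fixed = TRUE is a convenient default for titles that contain characters with a special meaning in regular expressions.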

Value

A tibble with nine columns:

title

Dataset title in English (default)

code

Each item (dataset, table and folder) of the TOC has a unique code which allows it to be identified in the API. Used in the get_eurostat() and get_eurostat_raw() functions to retrieve datasets.

type

dataset, folder or table

last.update.of.data

Date, indicates the last time the dataset/table was updated (format DD.MM.YYYY or %d.%m.%Y)

last.table.structure.change

Date, indicates the last time the dataset/table structure was modified (format DD.MM.YYYY or %d.%m.%Y)

data.start

Date of the oldest value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)

data.end

Date of the most recent value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)

values

Number of actual values included in the dataset

hierarchy

Hierarchy of the data navigation tree, represented in the original txt file by a 4-spaces indentation prefix in the title

Data source: Eurostat Table of Contents

The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de

See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC

Author(s)

Przemyslaw Biecek and Leo Lahti ropengov-forum@googlegroups.com

References

See citation("eurostat"):

Kindly cite the eurostat R package as follows:

  Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and
  analysis of Eurostat open data with the eurostat package. The R
  Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019

  Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D.,
  and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data
  [Computer software]. R package version 4.0.0.
  https://github.com/rOpenGov/eurostat

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.

See Also

get_eurostat(), search_eurostat()

Examples



tmp <- search_eurostat("education")
head(tmp)
# Use "fixed = TRUE" when pattern has characters that would need escaping.
# Here, parentheses would normally need to be escaped in regex
tmp <- search_eurostat("Live births (total) by NUTS 3 region", fixed = TRUE)



Set Eurostat Cache

Description

This function stores your cache_dir path on your local machine and loads it in future sessions. Type Sys.getenv("EUROSTAT_CACHE_DIR") to find your cached path.

Alternatively, you can store the cache_dir manually with the following options:

Usage

set_eurostat_cache_dir(
  cache_dir,
  overwrite = FALSE,
  install = FALSE,
  verbose = TRUE
)

Arguments

cache_dir

A path to a cache directory. If missing, the function stores the cached files in a temporary directory (see base::tempdir()).

overwrite

If this is set to TRUE, it will overwrite an existing EUROSTAT_CACHE_DIR that you already have on your local machine.

install

if TRUE, will install the key in your local machine for use in future sessions. Defaults to FALSE. If cache_dir is FALSE this parameter is set to FALSE automatically.

verbose

Logical, displays information. Useful for debugging, default is TRUE.

Value

An (invisible) character with the path to your cache_dir.

Author(s)

Diego Hernangómez

See Also

rappdirs::user_config_dir()

Other cache utilities: clean_eurostat_cache()

Examples


# Don't run this! It would modify your current state
## Not run: 
set_eurostat_cache_dir(verbose = TRUE)

## End(Not run)

Sys.getenv("EUROSTAT_CACHE_DIR")
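
How a cache path could be resolved from the pieces mentioned in this manual (an explicit cache_dir, the EUROSTAT_CACHE_DIR environment variable, and an 'eurostat' directory under tempdir()) can be sketched as follows. The helper resolve_cache_dir() is hypothetical and the order of precedence is an assumption; the package's internal logic may differ.

```r
# Hypothetical helper; the precedence order is an assumption
resolve_cache_dir <- function(cache_dir = NULL) {
  # 1. An explicitly supplied directory wins
  if (!is.null(cache_dir)) {
    return(cache_dir)
  }
  # 2. Otherwise use the environment variable if it is set and non-empty
  env_dir <- Sys.getenv("EUROSTAT_CACHE_DIR", unset = "")
  if (nzchar(env_dir)) {
    return(env_dir)
  }
  # 3. Fall back to an 'eurostat' directory inside the session temp dir
  file.path(tempdir(), "eurostat")
}

resolve_cache_dir()
resolve_cache_dir("my_cache")
```

Unlike set_eurostat_cache_dir(), this sketch neither creates directories nor writes anything to disk.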

Set Eurostat TOC

Description

Internal function.

Usage

set_eurostat_toc(lang = "en")

Arguments

lang

2-letter language code, default is "en" (English), other options are "fr" (French) and "de" (German). Used for labeling datasets.

Value

Empty element

Author(s)

Przemyslaw Biecek and Leo Lahti ropengov-forum@googlegroups.com

References

See citation("eurostat")

See Also

get_eurostat_toc(), toc_count_children(), toc_determine_hierarchy(), toc_list_children(), toc_count_whitespace()


Auxiliary Data

Description

Auxiliary Data Sets

Usage

tgs00026

Format

data_frame

Details

Disposable income of private households by NUTS 2 regions.

Retrieved with: tgs00026 <- get_eurostat("tgs00026", time_format = "raw")

Data retrieval date: 2022-06-27

See Also

Other datasets: eu_countries, eurostat_geodata_60_2016


Transform Data into Row-Column-Value Format

Description

Transform raw Eurostat data table downloaded from the API into a tidy row-column-value format (RCV).

Usage

tidy_eurostat(
  dat,
  time_format = "date",
  select_time = NULL,
  stringsAsFactors = FALSE,
  keepFlags = FALSE,
  use.data.table = FALSE
)

Arguments

dat

a data_frame from get_eurostat_raw().

time_format

a string giving a type of the conversion of the time column from the eurostat format. The default argument "date" converts to a Date() class with the date being the first day of the period. A "date_last" argument converts the dataset date to a Date() class object with the difference that the exact date is the last date of the period. Period can be year, semester (half year), quarter, month, or week (See eurotime2date() for more information). Argument "num" converts the date into a numeric (integer) meaning that the first day of the year 2000 is close to 2000.01 and the last day of the year is close to 2000.99 (see eurotime2num() for more information). Using the argument "raw" preserves the dates as they were in the original Eurostat data.

select_time

A character symbol for a time frequency, or NULL, which is used by default as most datasets have just one time frequency. For datasets with multiple time frequencies, select one or more of the desired frequencies with: "Y" (or "A") = annual, "S" = semi-annual / semester, "Q" = quarterly, "M" = monthly, "W" = weekly. To keep all frequencies in the same data frame, time_format = "raw" should be used.

stringsAsFactors

If TRUE, variables are converted to factors in the original Eurostat order. If FALSE (the default), they are returned as strings.

keepFlags

A logical indicating whether the flags (e.g. "confidential", "provisional") should be kept in a separate column or whether they can be removed. Default is FALSE. For flag values see: https://ec.europa.eu/eurostat/data/database/information. A possible non-real zero "0n" is also indicated in the flags column. Flags are not available for the eurostat API, so keepFlags cannot be used with filters.

use.data.table

Use faster data.table functions? Default is FALSE. On Windows, this requires that RTools is installed.

Value

tibble in the melted format with the last column 'values'.

Author(s)

Przemyslaw Biecek, Leo Lahti, Janne Huovari and Pyry Kantanen

References

See citation("eurostat"):

Kindly cite the eurostat R package as follows:

  Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and
  analysis of Eurostat open data with the eurostat package. The R
  Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019

  Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D.,
  and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data
  [Computer software]. R package version 4.0.0.
  https://github.com/rOpenGov/eurostat

To see these entries in BibTeX format, use 'print(<citation>,
bibtex=TRUE)', 'toBibtex(.)', or set
'options(citation.bibtex.max=999)'.

When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.

See Also

get_eurostat(), convert_time_col(), eurotime2date()

Examples

## Not run: 
# Example of a dataset with multiple time series
get_eurostat("AVIA_GOR_ME",
  time_format = "date_last",
  cache = FALSE
)

## End(Not run)
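
The idea behind time_format = "date" (the first day of the period) can be illustrated offline with a minimal base R sketch. It assumes period codes of the forms YYYY, YYYY-SN, YYYY-QN and YYYY-MM and does not cover weeks or days. The helper period_start() is hypothetical; the package itself provides eurotime2date() for this conversion.

```r
# Hypothetical helper: first day of an annual, semester, quarter or month period
period_start <- function(x) {
  year <- substr(x, 1, 4)
  rest <- substr(x, 6, nchar(x)) # part after "YYYY-", empty for annual codes
  month <- rep(NA_integer_, length(x))
  month[nchar(x) == 4] <- 1L # annual: January
  is_s <- grepl("^S[1-2]$", rest)
  month[is_s] <- (as.integer(substr(rest[is_s], 2, 2)) - 1L) * 6L + 1L
  is_q <- grepl("^Q[1-4]$", rest)
  month[is_q] <- (as.integer(substr(rest[is_q], 2, 2)) - 1L) * 3L + 1L
  is_m <- grepl("^(0[1-9]|1[0-2])$", rest)
  month[is_m] <- as.integer(rest[is_m])
  # Unsupported codes end up as NA because the string cannot be parsed
  as.Date(sprintf("%s-%02d-01", year, month), format = "%Y-%m-%d")
}

period_start(c("2020", "2020-S2", "2020-Q2", "2020-03"))
```

A "date_last" style conversion would instead return the last day of each period, which needs the length of the period in addition to its start.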


Count number of children

Description

Determine how many children a certain TOC item (usually a folder) has.

Usage

toc_count_children(code)

Arguments

code

Eurostat TOC item code (folder, dataset, table)

Author(s)

Pyry Kantanen

See Also

get_eurostat_toc(), toc_count_children(), toc_determine_hierarchy(), toc_list_children(), toc_count_whitespace()


Count white space at the start of the title

Description

Counts the number of white space characters at the start of the string.

Usage

toc_count_whitespace(input_string)

Arguments

input_string

A string containing Eurostat TOC titles

Details

Used by toc_determine_hierarchy() to determine hierarchy. In Eurostat .txt format TOC files, hierarchy is encoded by the number of leading white space characters, in steps of four. For example, "    Foo" (4 white space characters) is one level higher than "        Bar" (8 white space characters). "Database by themes" (0 white space characters before the first alphanumeric character) is highest in the hierarchy.

The function issues a warning if the number of leading white space characters is not a multiple of 4: 0, 4, 8, ... are acceptable, but 3, 6, 10, ... are not.
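
The counting rule described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper name count_leading_ws() and the warning text are assumptions made for this example.

```r
# Hypothetical helper (not part of eurostat): count leading white space
count_leading_ws <- function(x) {
  # Length before minus length after stripping leading white space
  n <- nchar(x) - nchar(sub("^\\s+", "", x))
  # Warn when any count is not a multiple of 4
  if (any(n %% 4 != 0)) {
    warning("Leading white space is not a multiple of 4")
  }
  n
}

count_leading_ws(c("    Foo", "        Bar", "Database by themes"))
# 4 8 0
```

An input such as "  cdf" (2 white space characters) still returns its count, but with a warning.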

Value

Numeric (number of white space characters)

Author(s)

Pyry Kantanen

See Also

get_eurostat_toc(), toc_count_children(), toc_determine_hierarchy(), toc_list_children(), toc_count_whitespace()

Examples

strings <- c("    abc", "  cdf", "no_spaces")
for (string in strings) {
 whitespace_count <- eurostat:::toc_count_whitespace(string)
 cat("String:", string, "\tWhitespace Count:", whitespace_count, "\n")
}


Determine level in hierarchy

Description

Divides the number of spaces before the first alphanumeric character by 4 and uses the result to determine the hierarchy level. The top level is 0.

Usage

toc_determine_hierarchy(input_string)

Arguments

input_string

A string containing Eurostat TOC titles

Details

In Eurostat .txt format TOC files, hierarchy is encoded by the number of leading white space characters, in steps of four. For example, "    Foo" (4 white space characters) is one level higher than "        Bar" (8 white space characters). "Database by themes" (0 white space characters before the first alphanumeric character) is highest in the hierarchy.

The function issues a warning if the number of leading white space characters is not a multiple of 4: 0, 4, 8, ... are acceptable, but 3, 6, 10, ... are not.
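
As a worked illustration of the arithmetic, the level is the leading white space count divided by 4. The snippet below is a base R sketch under that assumption and does not call the package function; the variable names are chosen for this example only.

```r
# Titles as they might appear in a .txt TOC file (illustrative values)
titles <- c("Database by themes", "    Foo", "        Bar")

# Count leading white space, then divide by 4 to get the level
leading <- nchar(titles) - nchar(sub("^\\s+", "", titles))
level <- leading / 4
level
# 0 1 2
```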

Value

Numeric

Author(s)

Pyry Kantanen

See Also

get_eurostat_toc(), toc_count_children(), toc_determine_hierarchy(), toc_list_children(), toc_count_whitespace()

Examples

strings <- c("        abc", "    cdf", "no_spaces")
eurostat:::toc_determine_hierarchy(strings)


List children

Description

List children of a specific folder.

Usage

toc_list_children(code)

Arguments

code

Eurostat TOC item code (folder, dataset, table)
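
One way to picture the operation is on a toy table of contents held in file order with a hierarchy level per row. The sketch below is an assumption-laden illustration, not the package's method: the data frame, its column names, the codes, and the helper list_children_sketch() are all hypothetical, and "children" is taken to mean direct children only.

```r
# Toy TOC in file order; codes and levels are made up for illustration
toc <- data.frame(
  code = c("root", "a", "a1", "a2", "b", "b1"),
  hierarchy = c(0, 1, 2, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: direct children of the item identified by `code`
list_children_sketch <- function(toc, code) {
  i <- match(code, toc$code)
  if (is.na(i)) stop("Unknown code: ", code)
  parent_level <- toc$hierarchy[i]
  rows <- seq_len(nrow(toc))
  after <- rows > i
  # The subtree ends at the next row at the same or a shallower level
  end <- which(after & toc$hierarchy <= parent_level)
  last <- if (length(end) > 0) end[1] - 1 else nrow(toc)
  in_subtree <- after & rows <= last
  # Keep only rows exactly one level below the parent
  toc[in_subtree & toc$hierarchy == parent_level + 1, , drop = FALSE]
}

list_children_sketch(toc, "a")$code      # "a1" "a2"
nrow(list_children_sketch(toc, "root"))  # 2
```

Counting children then reduces to taking the number of rows in the result.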

Author(s)

Pyry Kantanen

See Also

get_eurostat_toc(), toc_count_children(), toc_determine_hierarchy(), toc_list_children(), toc_count_whitespace()