Type: | Package |
Title: | Tools for Eurostat Open Data |
Version: | 4.0.0 |
Date: | 2023-12-19 |
Description: | Tools to download data from the Eurostat database https://ec.europa.eu/eurostat together with search and manipulation utilities. |
License: | BSD_2_clause + file LICENSE |
URL: | https://ropengov.github.io/eurostat/, https://github.com/rOpenGov/eurostat |
BugReports: | https://github.com/rOpenGov/eurostat/issues |
Depends: | R (≥ 3.6.0) |
Imports: | classInt, countrycode, curl, digest, dplyr, httr2 (≥ 0.2.3), ISOweek, jsonlite, lubridate, rappdirs, readr, RefManageR, regions, rlang, stringi, stringr, tibble, tidyr (≥ 1.0.0), xml2, data.table (≥ 1.14.8) |
Suggests: | giscoR, knitr, rmarkdown, sf, testthat (≥ 3.0.0) |
VignetteBuilder: | knitr |
Config/Needs/website: | ggplot2, tmap, styler, sessioninfo, ropengov/rogtemplate, ragg |
Config/testthat/edition: | 3 |
Config/testthat/parallel: | false |
Encoding: | UTF-8 |
LazyData: | true |
MailingList: | rOpenGov <ropengov-forum@googlegroups.com> |
NeedsCompilation: | no |
Repository: | CRAN |
RoxygenNote: | 7.2.3 |
X-schema.org-isPartOf: | http://ropengov.org/ |
X-schema.org-keywords: | ropengov |
Packaged: | 2023-12-19 20:11:33 UTC; leo |
Author: | Leo Lahti |
Maintainer: | Leo Lahti <leo.lahti@iki.fi> |
Date/Publication: | 2023-12-19 20:30:02 UTC |
R Tools for Eurostat open data
Description
Tools to download data from the Eurostat database https://ec.europa.eu/eurostat together with search and manipulation utilities.
Details
Package | eurostat |
Type | Package |
Version | 4.0.0 |
Date | 2014-2023 |
License | BSD_2_clause + file LICENSE |
LazyLoad | yes |
Eurostat
Eurostat website: https://ec.europa.eu/eurostat Eurostat database: https://ec.europa.eu/eurostat/web/main/data/database
Information about the data update schedule from Eurostat: "Eurostat datasets are updated twice a day at 11:00 and 23:00 CET, if newer data is available or for structural changes, for example for the dimensions in the dataset.
The Eurostat database always contains the latest version of the datasets, meaning that there is no versioning or documentation of past versions of the data."
Data source: Eurostat SDMX 2.1 Dissemination API
Data is downloaded from Eurostat SDMX 2.1 API endpoint as compressed TSV files that are transformed into tabular format. See Eurostat documentation for more information: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+query
The new dissemination API replaces the old bulk download facility that was used by Eurostat before October 2023 and by the eurostat R package versions before 4.0.0. See Eurostat documentation about the transition from Bulk Download to API for more information about the differences between the old bulk download facility and the data provided by the new API connection: https://wikis.ec.europa.eu/display/EUROSTATHELP/Transition+-+from+Eurostat+Bulk+Download+to+API
See especially the document Migrating_to_API_TSV.pdf that describes the changes in TSV file format in new applications.
For more information about SDMX 2.1, see SDMX standards: Section 7: Guidelines for the use of web services, Version 2.1: https://sdmx.org/wp-content/uploads/SDMX_2-1_SECTION_7_WebServicesGuidelines.pdf
Disclaimer: Availability of filtering functionalities
Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although technically filtering datasets downloaded through the SDMX Dissemination API is also supported by Eurostat. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering
Data source: Eurostat API Statistics (JSON API)
Data is downloaded from Eurostat API Statistics. See Eurostat documentation for more information about data queries in API Statistics https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query
This replaces the old JSON Web Services that was used by Eurostat before February 2023 and by the eurostat R package versions before 3.7.13. See Eurostat documentation about the migration from JSON web service to API Statistics for more information about the differences between the old and the new service: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+migrating+from+JSON+web+service+to+API+Statistics
For easily viewing which filtering options are available - in addition to the default ones, time and language - Eurostat Web services Query builder tool may be useful: https://ec.europa.eu/eurostat/web/query-builder
Filtering datasets
When using Eurostat API Statistics (JSON API), datasets can be filtered before they are downloaded and saved in local memory. The general format for filter parameters is <DIMENSION_CODE>=<VALUE>.
Filter parameters are optional, but the dimension codes used must be present in the data product that is being queried. Dimension codes can vary between data products, so it may be useful to examine new datasets in the Eurostat data browser beforehand. However, most if not all Eurostat datasets concern European countries and contain information that was gathered at some point in time, so the geo and time dimension codes can usually be used.
<DIMENSION_CODE> and <VALUE> are case-insensitive and can be written in lowercase or uppercase in the query.
Parameters are passed to the eurostat package functions get_eurostat() and get_eurostat_json() as list items. If an individual item contains multiple values, as is often the case with geo parameters and other optional items, they must be given as a vector: c("FI", "SE").
For examples of how to use these parameters, see the function examples below.
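As a minimal sketch of the list format described above, the following builds a filter list for the nama_10_gdp dataset and passes it to get_eurostat() through its filters argument. The run_query switch is an assumption added here so that the snippet does not contact Eurostat unless asked to; it is not part of the package.

```r
# Names are dimension codes; values are vectors of observation codes.
filters <- list(
  geo = c("FI", "SE"),   # several values go in one vector
  na_item = "B1GQ",
  unit = "CLV_I10",
  time = 2015:2018       # the colon operator expands to 2015, 2016, 2017, 2018
)

# Illustrative switch (not a package option): set to TRUE to query Eurostat.
run_query <- FALSE
if (run_query && requireNamespace("eurostat", quietly = TRUE)) {
  gdp <- eurostat::get_eurostat("nama_10_gdp", filters = filters)
  head(gdp)
}
```

Keeping the filters in a named list makes it easy to inspect or reuse them before any data is downloaded.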
Time parameters
time and time_period address the same TIME_PERIOD dimension in the dataset and can be used interchangeably. The Eurostat documentation states that "Using more than one Time parameter in the same query is not accepted", but in practice the Eurostat API allows multiple time parameters in the same query. This makes it possible to use the R colon operator when writing queries, so time = c(2015:2018) translates to &time=2015&time=2016&time=2017&time=2018.
The only exception to this is when the queried dataset contains e.g. quarterly data and TIME_PERIOD is saved as 2015-Q1, 2015-Q2 etc. Then it is possible to use the time=2015-Q1&time=2015-Q2 style in the query URL, but this makes it infeasible to use the colon operator and requires a lot of manual typing. Because of this, it is useful to know about other time parameters as well:
- untilTimePeriod: return dataset items from the oldest record up until the set time, for example "all data until 2000": untilTimePeriod = 2000
- sinceTimePeriod: return dataset items starting from the set time, for example "all data starting from 2008": sinceTimePeriod = 2008
- lastTimePeriod: starting from the most recent time period, how many preceding time periods should be returned? For example, the 10 most recent observations: lastTimePeriod = 10
Using both untilTimePeriod and sinceTimePeriod parameters in the same query is allowed, making the use of the R colon operator unnecessary. In the case of quarterly data, untilTimePeriod and sinceTimePeriod also work, unlike the colon operator, so it is generally safer to use them.
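To make the translation from R vectors to query parameters concrete, here is a small sketch. The helper filters_to_query() is hypothetical and written only for illustration; it is not a eurostat function, and the package's real URL construction may differ (for example in encoding or parameter order).

```r
# Hypothetical helper: expand a named filter list into a query string,
# repeating the parameter name once per value.
filters_to_query <- function(filters) {
  pairs <- unlist(lapply(names(filters), function(name) {
    paste0(name, "=", filters[[name]])
  }))
  paste(pairs, collapse = "&")
}

# Annual data: the colon operator gives one time parameter per year.
annual <- filters_to_query(list(time = 2015:2018))

# Quarterly data: a since/until pair avoids typing every quarter by hand.
quarterly <- filters_to_query(
  list(sinceTimePeriod = "2015-Q1", untilTimePeriod = "2016-Q4")
)

print(annual)
print(quarterly)
```

The quarterly case needs only two parameters regardless of how many quarters fall inside the range.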
Other dimensions
In the get_eurostat_json() examples, the nama_10_gdp dataset is filtered with two additional filter parameters:
- na_item = "B1GQ"
- unit = "CLV_I10"
Filters like these are most likely unique to the nama_10_gdp dataset (or other datasets within the same domain) and should not be used with other datasets without user discretion. By using label_eurostat() we know that "B1GQ" stands for "Gross domestic product at market prices" and "CLV_I10" means "Chain linked volumes, index 2010=100".
Different dimension codes can be translated into natural language by using the get_eurostat_dic() function, which returns labels for individual dimension items such as na_item and unit, as opposed to label_eurostat(), which does it for whole datasets. For example, the parameter na_item stands for "National accounts indicator (ESA 2010)" and unit stands for "Unit of measure".
Language
All datasets have metadata available in English, French and German. If no parameter is given, the labels are returned in English.
Example: lang = "fr"
More information
For more information about data filtering see Eurostat documentation on API Statistics: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query#APIStatisticsdataquery-TheparametersdefinedintheRESTrequest
Data source: Eurostat Table of Contents
The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de
See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC
Data source: GISCO - General Copyright
"Eurostat's general copyright notice and licence policy is applicable and can be consulted here: https://ec.europa.eu/eurostat/about-us/policies/copyright
Please also be aware of the European Commission's general conditions: https://commission.europa.eu/legal-notice_en
Moreover, there are specific provisions applicable to some of the following datasets available for downloading. The download and usage of these data is subject to their acceptance:
Administrative Units / Statistical Units
Population distribution / Demography
Transport Networks
Land Cover
Elevation (DEM)"
Of the abovementioned datasets, Administrative Units / Statistical Units is applicable if the user wants to draw maps with borders provided by GISCO / EuroGeographics.
Data source: GISCO - Administrative Units / Statistical Units
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the GISCO website: GISCO: Geographical information and maps - Administrative units/statistical units
"In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:
The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").
The permission to use the data is granted on condition that:
the data will not be used for commercial purposes;
the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page."
Copyright notice
When data downloaded from this page is used in any printed or electronic publication, in addition to any other provisions applicable to the whole Eurostat website, data source will have to be acknowledged in the legend of the map and in the introductory page of the publication with the following copyright notice:
EN: © EuroGeographics for the administrative boundaries
FR: © EuroGeographics pour les limites administratives
DE: © EuroGeographics bezüglich der Verwaltungsgrenzen
For publications in languages other than English, French or German, the translation of the copyright notice in the language of the publication shall be used.
If you intend to use the data commercially, please contact EuroGeographics for information regarding their licence agreements."
Eurostat: Copyright notice and free re-use of data
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright
"(c) European Union, 1995 - today
Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:
the source is indicated as Eurostat;
when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information."
For exceptions to the abovementioned principles see Eurostat website
Citing Eurostat data
For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.
When using Eurostat data in contexts other than academic publications that rely on in-text citations or footnotes/endnotes, the following guidelines may be helpful:
The origin of the data should always be mentioned as "Source: Eurostat".
The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: "Source: Eurostat (online data code: namq_10_gdp)"
Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.
Avoid associating different entities (e.g. Eurostat, National Statistical Offices, other data providers) with the same dataset or indicator without specifying the role of each of them in the treatment of data.
See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.
Strategies for handling large datasets more efficiently
Most Eurostat datasets are relatively manageable, at least on a machine with 16 GB of RAM. The largest dataset in Eurostat database, at the time of writing this, had 148362539 (148 million) values, which results in an object with 148 million rows in tidy data (long) format. The test machine with 16 GB of RAM was able to handle the second largest dataset in the database with 91 million values (rows).
There are still some methods to make data fetching functions perform faster:
turn caching off:
get_eurostat(cache = FALSE)
turn cache compression off (may result in rather large cache files!):
get_eurostat(compress_file = FALSE)
if you want faster caching with manageable file sizes, use stringsAsFactors:
get_eurostat(cache = TRUE, compress_file = TRUE, stringsAsFactors = TRUE)
Use faster data.table functions:
get_eurostat(use.data.table = TRUE)
Keep column processing to a minimum:
get_eurostat(time_format = "raw", type = "code") etc.
Read the get_eurostat() function documentation carefully so you understand what the different arguments do.
Filter the dataset so that you fetch only the parts you need!
regions functions
For working with sub-national statistics the basic functions of the regions package are imported https://regions.dataobservatory.eu/.
Author(s)
Leo Lahti, Janne Huovari, Markus Kainu, Przemyslaw Biecek
References
See citation("eurostat"):
Kindly cite the eurostat R package as follows:
Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat
To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.
When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.
See Also
help("regions"), https://regions.dataobservatory.eu/
Examples
library(eurostat)
Add the statistical aggregation level to data frame
Description
Eurostat regional statistics contain country-level and various regional-level information. In many cases, for example when mapping, it is useful to filter out national level data from NUTS2 level regional data.
This function will be deprecated. Use the more comprehensive regions::validate_nuts_regions() instead.
Usage
add_nuts_level(dat, geo_labels = "geo")
Arguments
dat |
A data frame or tibble returned by |
geo_labels |
A geographical label, defaults to |
Details
Deprecated functions for backward compatibility. The functions give a warning and call the appropriate regions functions.
Value
a new numeric variable nuts_level with the numeric value of NUTS level 0 (country), 1 (greater region), 2 (region), 3 (small region).
Author(s)
Daniel Antal
See Also
regions::validate_nuts_regions()
Other regions functions: harmonize_geo_code(), recode_to_nuts_2013(), recode_to_nuts_2016(), reexports
Examples
dat <- data.frame(
geo = c("FR", "IE04", "DEB1C"),
values = c(1000, 23, 12)
)
add_nuts_level(dat)
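The numeric levels returned by the function follow the structure of NUTS codes: a two-letter country code followed by one character per regional level. The sketch below derives the level from the code length alone. The helper nuts_level_from_code() is hypothetical and is not how the package or regions::validate_nuts_regions() works; in particular it does not check that a code is a valid NUTS code.

```r
# Hypothetical helper: NUTS level = number of characters after the
# two-letter country prefix (0 = country, 3 = small region).
nuts_level_from_code <- function(geo) {
  nchar(as.character(geo)) - 2L
}

dat <- data.frame(
  geo = c("FR", "IE04", "DEB1C"),
  values = c(1000, 23, 12)
)
dat$nuts_level <- nuts_level_from_code(dat$geo)
print(dat)
```

For real analyses the validating function from the regions package is the safer choice, since code length says nothing about whether a code exists in a given NUTS edition.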
Check access to ec.europa.eu
Description
Check if R has access to resources at http://ec.europa.eu
Usage
check_access_to_data()
Value
a logical.
Author(s)
Markus Kainu markus.kainu@kapsi.fi
Examples
check_access_to_data()
Clean Eurostat Cache
Description
Delete all .rds files from the eurostat cache directory. See get_eurostat() for more on cache.
Usage
clean_eurostat_cache(cache_dir = NULL, config = FALSE)
Arguments
cache_dir |
A path to cache directory. If |
config |
Logical |
Author(s)
Przemyslaw Biecek, Leo Lahti, Janne Huovari, Markus Kainu and Diego Hernangómez
See Also
Other cache utilities:
set_eurostat_cache_dir()
Examples
## Not run:
clean_eurostat_cache()
## End(Not run)
Time Column Conversions for data from new dissemination API
Description
Internal function to convert time column.
Usage
convert_time_col(x, time_format)
Arguments
x |
A time column (vector) from a downloaded dataset |
time_format |
one of the following: |
Cuts the Values Column into Classes and Polishes the Labels
Description
Categorises a numeric vector into automatic or manually defined categories and polishes the labels so that they are ready for use in mapping with ggplot2.
Usage
cut_to_classes(
x,
n = 5,
style = "equal",
manual = FALSE,
manual_breaks = NULL,
decimals = 0,
nodata_label = "No data"
)
Arguments
x |
A numeric vector, eg. |
n |
A numeric. number of classes/categories |
style |
chosen style: one of "fixed", "sd", "equal", "pretty", "quantile", "kmeans", "hclust", "bclust", "fisher", "jenks", "dpih", "headtails", "maximum", or "box" |
manual |
Logical. If manual breaks are being used |
manual_breaks |
Numeric vector with manual threshold values |
decimals |
Number of decimals to include with labels |
nodata_label |
String. Text label for NA category. |
Value
a factor.
Author(s)
Markus Kainu markuskainu@gmail.com
See Also
Other helpers: dic_order(), eurotime2date(), eurotime2num(), harmonize_country_code(), label_eurostat()
Examples
# lp <- get_eurostat("nama_aux_lp")
lp <- get_eurostat("nama_10_lp_ulc")
lp$class <- cut_to_classes(lp$values, n = 5, style = "equal", decimals = 1)
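For readers who want to see the idea behind the "equal" style without downloading data, the following sketch uses only base R. It is an approximation written for illustration, not the implementation of cut_to_classes(): the label formatting differs, and the input vector is made up.

```r
# Made-up values with one missing observation.
x <- c(3, 12, 18, NA, 40)
n <- 5

# Equal-width breaks spanning the observed range.
breaks <- seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n + 1)

# Cut into classes; include.lowest keeps the minimum in the first class.
classes <- cut(x, breaks = breaks, include.lowest = TRUE)

# Turn NA into an explicit level and give it a readable label.
classes <- addNA(classes)
levels(classes)[is.na(levels(classes))] <- "No data"

print(table(classes))
```

Making the missing-value class an explicit factor level means it shows up in a plot legend instead of being silently dropped.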
Order of Variable Levels from Eurostat Dictionary.
Description
Orders the factor levels.
Usage
dic_order(x, dic, type)
Arguments
x |
a variable (code or labelled) to get order for. |
dic |
a name of the dictionary. Correspond a variable name in the
data_frame from |
type |
a type of the x. Could be |
Details
Some variables, like classifications, have a logical or conventional ordering. Eurostat data tables are not necessarily sorted in this order. The function dic_order() gets the ordering from Eurostat classification dictionaries. The function label_eurostat() can also order factor levels of labels with the argument eu_order = TRUE.
Value
A numeric vector of orders.
Author(s)
Przemyslaw Biecek, Leo Lahti, Janne Huovari and Markus Kainu
See Also
Other helpers: cut_to_classes(), eurotime2date(), eurotime2num(), harmonize_country_code(), label_eurostat()
Countries and Country Codes
Description
Countries and country codes in EU, Euro area, EFTA and EU candidate countries.
Usage
eu_countries
ea_countries
efta_countries
eu_candidate_countries
Format
A data_frame:
- code: Country code in the Eurostat database.
- name: Country name in English.
- label: Country name in the Eurostat database.
An object of class tbl_df (inherits from tbl, data.frame) with 19 rows and 3 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 4 rows and 3 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 7 rows and 3 columns.
Source
https://ec.europa.eu/eurostat/statistics-explained/index.php/Tutorial:Country_codes_and_protocol_order, https://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Euro_area
See Also
Other datasets: eurostat_geodata_60_2016, tgs00026
Defunct functions in eurostat
Description
This list of defunct functions is maintained to document changes to eurostat functions in a transparent manner.
Usage
grepEurostatTOC(...)
Arguments
... |
Generic representation of old arguments |
Details
The following functions are defunct:
- grepEurostatTOC: Use search_eurostat instead
Geospatial data of Europe from GISCO in 1:60 million scale from year 2016
Description
Geospatial data of Europe from GISCO in 1:60 million scale from year 2016
Format
sf object
Details
The dataset contains 2016 observations (rows) and 12 variables (columns).
The object contains the following columns:
- id: JSON id code, the same as NUTS_ID. See NUTS_ID below for further clarification.
- LEVL_CODE: NUTS level code: 0 (national level), 1 (major socio-economic regions), 2 (basic regions for the application of regional policies) or 3 (small regions).
- NUTS_ID: NUTS ID code, consisting of country code and numbers (1 for NUTS 1, 2 for NUTS 2 and 3 for NUTS 3)
- CNTR_CODE: Country code: two-letter ISO code (ISO 3166 alpha-2), except in the case of Greece (EL).
- NAME_LATN: NUTS name in local language, transliterated to Latin script
- NUTS_NAME: NUTS name in local language, in local script.
- MOUNT_TYPE: Mountain typology for NUTS 3 regions.
1: "where more than 50 % of the surface is covered by topographic mountain areas"
2: "in which more than 50 % of the regional population lives in topographic mountain areas"
3: "where more than 50 % of the surface is covered by topographic mountain areas and where more than 50 % of the regional population lives in these mountain areas"
4: non-mountain region / other region
0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 and non-EU countries)
- URBN_TYPE: Urban-rural typology for NUTS 3 regions.
1: predominantly urban region
2: intermediate region
3: predominantly rural region
0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 regions)
- COAST_TYPE: Coastal typology for NUTS 3 regions.
1: coastal (on coast)
2: coastal (>= 50% of population living within 50km of the coastline)
3: non-coastal region
0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 regions)
- FID: Same as NUTS_ID.
- geo: Same as NUTS_ID, added for easier joins with dplyr. However, it is recommended to use other identical fields for this purpose.
- geometry: geospatial information.
Dataset updated: 2023-06-29. For a more recent version, please use the giscoR::gisco_get_nuts() function.
Source
Data source: Eurostat via giscoR::gisco_get_nuts().
© EuroGeographics for the administrative boundaries
Data downloaded from: https://ec.europa.eu/eurostat/web/gisco/geodata/reference-data/administrative-units-statistical-units
References
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: GISCO: Geographical information and maps - Administrative units/statistical units
"In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:
The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").
The permission to use the data is granted on condition that:
the data will not be used for commercial purposes;
the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page.
Copyright notice
When data downloaded from this page is used in any printed or electronic publication, in addition to any other provisions applicable to the whole Eurostat website, data source will have to be acknowledged in the legend of the map and in the introductory page of the publication with the following copyright notice:
EN: © EuroGeographics for the administrative boundaries
FR: © EuroGeographics pour les limites administratives
DE: © EuroGeographics bezüglich der Verwaltungsgrenzen
For publications in languages other than English, French or German, the translation of the copyright notice in the language of the publication shall be used.
If you intend to use the data commercially, please contact EuroGeographics for information regarding their licence agreements."
See Also
giscoR::gisco_get_nuts() and Eurostat. (2019). Methodological manual on territorial typologies – 2018 edition. Manuals and guidelines.
Other datasets: eu_countries, tgs00026
Other geospatial: get_eurostat_geospatial()
Examples
eurostat_geodata_60_2016 <- eurostat::eurostat_geodata_60_2016
# Manipulate and plot
if (require(sf)) {
library(sf)
# Filter NUTS3 from select countries like in a regular data frame
example_nuts <- subset(eurostat_geodata_60_2016, LEVL_CODE == 3 &
CNTR_CODE %in% c("DK", "DE", "PL"))
plot(example_nuts["CNTR_CODE"])
}
Date Conversion from New Eurostat Time Format
Description
Date conversion from Eurostat time format. A function to convert Eurostat time values to objects of class Date() representing calendar dates.
Usage
eurotime2date(x, last = FALSE)
Arguments
x |
a character string with time information in Eurostat time format. |
last |
a logical. If |
Details
Available patterns are YYYY (year), YYYY-SN (semester), YYYY-QN (quarter), YYYY-MM (month), YYYY-WNN (week) and YYYY-MM-DD (day).
Value
an object of class Date().
Author(s)
Janne Huovari janne.huovari@ptt.fi
References
See citation("eurostat"):
Kindly cite the eurostat R package as follows:
Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat
To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.
See Also
Other helpers: cut_to_classes(), dic_order(), eurotime2num(), harmonize_country_code(), label_eurostat()
Examples
na_q <- get_eurostat("namq_10_pc", time_format = "raw")
na_q$TIME_PERIOD <- eurotime2date(x = na_q$TIME_PERIOD)
unique(na_q$TIME_PERIOD)
## Not run:
# Test for weekly data
get_eurostat(
id = "lfsi_abs_w",
select_time = c("W"),
time_format = "date"
)
## End(Not run)
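The YYYY-QN pattern mentioned in the details can be illustrated offline. The sketch below converts quarter strings to the first day of each quarter. The helper quarter_to_date() is hypothetical and handles only the quarterly pattern; it is not the implementation of eurotime2date(), and choosing the first day of the period is an assumption made for this example.

```r
# Hypothetical helper: "2015-Q2" -> Date of the first day of that quarter.
quarter_to_date <- function(x) {
  year <- as.integer(substr(x, 1, 4))
  quarter <- as.integer(substr(x, 7, 7))
  first_month <- (quarter - 1L) * 3L + 1L   # Q1 -> 1, Q2 -> 4, Q3 -> 7, Q4 -> 10
  as.Date(sprintf("%04d-%02d-01", year, first_month))
}

quarter_dates <- quarter_to_date(c("2015-Q1", "2015-Q2", "2015-Q4"))
print(quarter_dates)
```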
Conversion of Eurostat Time Format to Numeric
Description
A conversion of a Eurostat time format to numeric.
Usage
eurotime2num(x)
Arguments
x |
a character string with time information in Eurostat time format. |
Details
Bi-annual (semester), quarterly, monthly and weekly data can be presented as a fraction of the year at the beginning of the period. Conversion of daily data is not supported.
Value
see as.numeric().
Author(s)
Janne Huovari janne.huovari@ptt.fi, Pyry Kantanen
See Also
Other helpers: cut_to_classes(), dic_order(), eurotime2date(), harmonize_country_code(), label_eurostat()
Examples
na_q <- get_eurostat("namq_10_pc", time_format = "raw")
na_q$TIME_PERIOD <- eurotime2num(x = na_q$TIME_PERIOD)
unique(na_q$TIME_PERIOD)
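The "fraction of the year at the beginning of the period" idea can be sketched for quarterly values without any download. The helper quarter_to_num() is hypothetical and covers only the YYYY-QN pattern; it is an illustration of the arithmetic, not the package's implementation.

```r
# Hypothetical helper: a quarter starts (quarter - 1) / 4 of the way into the year.
quarter_to_num <- function(x) {
  year <- as.integer(substr(x, 1, 4))
  quarter <- as.integer(substr(x, 7, 7))
  year + (quarter - 1) / 4
}

quarter_nums <- quarter_to_num(c("2015-Q1", "2015-Q2", "2015-Q4"))
print(quarter_nums)
```

A numeric time axis like this is convenient for plotting or regression, where Date objects are not always accepted.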
Calculate a fixity checksum for an object
Description
Uses a hash function (md5) on an object and calculates a digest of the object in the form of a character string.
Usage
fixity_checksum(data_object, algorithm = "md5")
Arguments
data_object |
A dataset downloaded with some eurostat package function. |
algorithm |
Algorithm to use when calculating a checksum for a dataset. Default is 'md5', but can be any supported algorithm in digest function. |
Details
“Fixity, in the preservation sense, means the assurance that a digital file has remained unchanged, i.e. fixed.” (Bailey, 2014). In practice, fixity can most easily be established by calculating a checksum for the data object that changes if anything in the data object has changed. What we use as a checksum here is by default calculated with md5 hash algorithm. It is possible to use other algorithms supported by the imported digest function, see function documentation.
In the case of big objects with millions of rows of data, calculating a checksum can take a bit longer and require some amount of RAM to be available. Selecting another algorithm might perform faster and/or more efficiently. Whichever algorithm you use, please report it in your work to ensure transparency and replicability.
This function takes the whole data object as an input, meaning that everything counts when calculating the fixity checksum. If the dataset column names are labeled, if the data itself is labeled, if stringsAsFactors is TRUE, if flags are removed or kept, if data is somehow edited... all these affect the calculated checksum. It is advisable to calculate the checksum immediately after downloading the data, before adding any labels or doing other mutating operations. If you are using other arguments than the default ones when downloading data, it is also good to report the exact arguments used.
This implementation fulfills the level 1 requirement of National Digital Stewardship Alliance (NDSA) preservation levels by creating "fixity info if it wasn’t provided with the content". In the current version of the package, fixity information has to be created manually and is at the responsibility of the user.
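The point that "everything counts" can be demonstrated with base R alone. The sketch below serializes an object to a temporary file and hashes it with tools::md5sum(). The helper checksum_of() is hypothetical, and its values will not match those produced by fixity_checksum(), which relies on the digest package; the example only shows that relabelling the data changes the checksum.

```r
# Hypothetical helper: md5 of the serialized object, computed via a temp file.
checksum_of <- function(obj) {
  path <- tempfile(fileext = ".bin")
  on.exit(unlink(path))
  writeBin(serialize(obj, connection = NULL), path)
  unname(tools::md5sum(path))
}

# Made-up data standing in for a freshly downloaded dataset.
raw_data <- data.frame(geo = c("FI", "SE"), values = c(1, 2))

# The same data after replacing codes with labels.
labelled <- raw_data
labelled$geo <- c("Finland", "Sweden")

same_object <- checksum_of(raw_data) == checksum_of(raw_data)
after_labelling <- checksum_of(raw_data) == checksum_of(labelled)
print(c(same_object = same_object, after_labelling = after_labelling))
```

This is why it is advisable to compute the checksum right after download, before any labelling or other transformation.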
Source
https://www.dpconline.org/handbook/technical-solutions-and-tools/fixity-and-checksums
See Also
Create A Data Bibliography
Description
Creates a bibliography from selected Eurostat data files, including last Eurostat update, URL access data, and optional keywords set by the user.
Usage
get_bibentry(code, keywords = NULL, format = "Biblatex", lang = "en")
Arguments
code |
A Eurostat data code or a vector of Eurostat data codes as character or factor. |
keywords |
A list of keywords to be added to the entries. Defaults
to |
format |
Default is |
lang |
2-letter language code, default is " |
Value
a bibentry, Bibtex or Biblatex object.
Citing Eurostat data
For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.
When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:
The origin of the data should always be mentioned as "Source: Eurostat".
The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: "Source: Eurostat (online data code: namq_10_gdp)"
Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.
Avoid associating different entities (e.g. Eurostat, National Statistical Offices, other data providers) with the same dataset or indicator without specifying the role of each of them in the treatment of data.
See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.
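For non-academic outputs, the source note recommended in the guidelines can be generated rather than typed by hand. The following is a minimal sketch. eurostat_source_note() is a hypothetical helper, not a function of the eurostat package, and the plural wording used for several codes is an assumption.

```r
# Hypothetical helper (not part of eurostat): build a plain-text source note
eurostat_source_note <- function(codes) {
  codes <- as.character(codes)
  # Assumption: use a plural label when more than one code is given
  label <- if (length(codes) > 1) "online data codes" else "online data code"
  sprintf("Source: Eurostat (%s: %s)", label, paste(codes, collapse = ", "))
}

note_single <- eurostat_source_note("namq_10_gdp")
note_single
```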
Author(s)
Daniel Antal, Przemyslaw Biecek
See Also
utils::bibentry RefManageR::toBiblatex
Examples
## Not run:
my_bibliography <- get_bibentry(
  code = c("tran_hv_frtra", "tec00001"),
  keywords = list(
    c("transport", "freight", "multimodal data", "GDP"),
    c("economy and finance", "annual", "national accounts", "GDP")
  ),
  format = "Biblatex"
)
my_bibliography
## End(Not run)
Get Eurostat Data
Description
Download data sets from Eurostat https://ec.europa.eu/eurostat
Usage
get_eurostat(
id,
time_format = "date",
filters = NULL,
type = "code",
select_time = NULL,
lang = "en",
cache = TRUE,
update_cache = FALSE,
cache_dir = NULL,
compress_file = TRUE,
stringsAsFactors = FALSE,
keepFlags = FALSE,
use.data.table = FALSE,
...
)
Arguments
id |
A unique identifier / code for the dataset of interest. If code is not
known |
time_format |
a string giving a type of the conversion of the time column from the
eurostat format. The default argument " |
filters |
A named list of filters. Names of list objects are Eurostat
variable codes and values are vectors of observation codes. If |
type |
A type of variables, " |
select_time |
a character symbol for a time frequency or |
lang |
2-letter language code, default is " |
cache |
a logical whether to do caching. Default is |
update_cache |
a logical whether to update cache. Can be set also with
|
cache_dir |
a path to a cache directory. |
compress_file |
a logical whether to compress the RDS-file in caching. Default is |
stringsAsFactors |
if |
keepFlags |
a logical whether the flags (e.g. "confidential",
"provisional") should be kept in a separate column or if they
can be removed. Default is |
use.data.table |
Use faster data.table functions? Default is FALSE. On Windows requires that RTools is installed. |
... |
Arguments passed on to
|
Details
Datasets are downloaded from the Eurostat SDMX 2.1 API in TSV format or from the Eurostat API Statistics JSON API. If only the table id is given, the whole table is downloaded from the SDMX API. If any filters are given, the JSON API is used instead.
The bulk download facility is the fastest method to download whole datasets. It is also often the only way, as the JSON API has a limit of 50 sub-indicators at a time and whole datasets usually exceed that. Also, it seems that multi-frequency datasets can only be retrieved via the bulk download facility, and select_time is not available for the JSON API method.
If your connection is through a proxy, you may have to set proxy parameters to use the JSON API; see get_eurostat_json().
By default datasets are cached to reduce load on Eurostat services and because some datasets can be quite large. Cache files are stored in a temporary directory by default or in a named directory (see set_eurostat_cache_dir()). The cache can be emptied with clean_eurostat_cache().
The id, a code for the dataset, can be searched for with search_eurostat() or looked up in the Eurostat database https://ec.europa.eu/eurostat/data/database. The Eurostat database gives codes in the Data Navigation Tree after every dataset in parentheses.
Value
a tibble.
One column for each dimension in the data, the time column for a time dimension and the values column for numerical values. Eurostat data does not include all missing values, and the treatment of missing values depends on the source. In the bulk download facility, missing values are dropped if all dimensions are missing at a particular time. In the JSON API, missing values are dropped only if all dimensions are missing at all times. The data from the bulk download facility can be completed, for example, with tidyr::complete().
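What completing a dataset means can be shown with a minimal sketch on made-up data. To keep it free of dependencies, it uses base R's expand.grid() and merge() instead of tidyr. The tidyr call that should give an equivalent result is noted in a comment.

```r
# Toy long-format data: the SE / 2021 observation is absent, not NA
observed <- data.frame(
  geo = c("FI", "FI", "SE"),
  time = c(2020, 2021, 2020),
  values = c(10, 11, 20),
  stringsAsFactors = FALSE
)

# Every geo-time combination that could exist
full_grid <- expand.grid(
  geo = unique(observed$geo),
  time = unique(observed$time),
  stringsAsFactors = FALSE
)

# A left join onto the full grid turns the implicit gap into an explicit NA
completed <- merge(full_grid, observed, by = c("geo", "time"), all.x = TRUE)
completed <- completed[order(completed$geo, completed$time), ]
completed

# With tidyr installed, the equivalent call would be:
# completed <- tidyr::complete(observed, geo, time)
```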
Eurostat: Copyright notice and free re-use of data
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright
"(c) European Union, 1995 - today
Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:
the source is indicated as Eurostat;
when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information."
For exceptions to the abovementioned principles see Eurostat website
Filtering datasets
When using Eurostat API Statistics (JSON API), datasets can be filtered before they are downloaded and saved in local memory. The general format for filter parameters is <DIMENSION_CODE>=<VALUE>.
Filter parameters are optional but the used dimension codes must be present in the data product that is being queried. Dimension codes can vary between different data products so it may be useful to examine new datasets in Eurostat data browser beforehand. However, most if not all Eurostat datasets concern European countries and contain information that was gathered at some point in time, so geo and time dimension codes can usually be used.
<DIMENSION_CODE> and <VALUE> are case-insensitive and they can be written in lowercase or uppercase in the query.
Parameters are passed onto the eurostat package functions get_eurostat() and get_eurostat_json() as a list item. If an individual item contains multiple items, as it often can be in the case of geo parameters and other optional items, they must be in the form of a vector: c("FI", "SE").
For examples on how to use these parameters, see function examples below.
Time parameters
time and time_period address the same TIME_PERIOD dimension in the dataset and can be used interchangeably. In the Eurostat documentation it is stated that "Using more than one Time parameter in the same query is not accepted", but practice has shown that the Eurostat API actually allows multiple time parameters in the same query. This makes it possible to use the R colon operator when writing queries, so time = c(2015:2018) translates to &time=2015&time=2016&time=2017&time=2018.
The only exception to this is when the queried dataset contains e.g. quarterly data and TIME_PERIOD is saved as 2015-Q1, 2015-Q2 etc. Then it is possible to use time=2015-Q1&time=2015-Q2 style in the query URL, but this makes it unfeasible to use the colon operator and requires a lot of manual typing.
Because of this, it is useful to know about other time parameters as well:
- untilTimePeriod: return dataset items from the oldest record up until the set time, for example "all data until 2000": untilTimePeriod = 2000
- sinceTimePeriod: return dataset items starting from the set time, for example "all data starting from 2008": sinceTimePeriod = 2008
- lastTimePeriod: starting from the most recent time period, how many preceding time periods should be returned? For example 10 most recent observations: lastTimePeriod = 10
Using both untilTimePeriod and sinceTimePeriod parameters in the same query is allowed, making the usage of the R colon operator unnecessary. In the case of quarterly data, using untilTimePeriod and sinceTimePeriod parameters also works, as opposed to the colon operator, so it is generally safer to use them as well.
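How a list of filters expands into query parameters can be sketched in a few lines of base R. filters_to_query() is a hypothetical helper written for this illustration. It is not the package's internal implementation and it performs no URL encoding.

```r
# Hypothetical helper (not part of eurostat): flatten a named list of
# filters into "name=value" pairs joined by "&"
filters_to_query <- function(filters) {
  pairs <- unlist(
    lapply(names(filters), function(name) paste0(name, "=", filters[[name]])),
    use.names = FALSE
  )
  paste(pairs, collapse = "&")
}

# The colon operator expands to one time parameter per year
yearly_query <- filters_to_query(list(time = 2015:2018))
yearly_query

# A range expressed with since/until needs only two time parameters
range_query <- filters_to_query(
  list(geo = c("FI", "SE"), sinceTimePeriod = 2015, untilTimePeriod = 2018)
)
range_query
```

The second form stays the same length however long the requested period is, which is one practical reason to prefer sinceTimePeriod and untilTimePeriod for long ranges.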
Other dimensions
In get_eurostat_json() examples the nama_10_gdp dataset is filtered with two additional filter parameters:
- na_item = "B1GQ"
- unit = "CLV_I10"
Filters like these are most likely unique to the nama_10_gdp dataset (or other datasets within the same domain) and should not be used with other datasets without user discretion. By using label_eurostat() we know that "B1GQ" stands for "Gross domestic product at market prices" and "CLV_I10" means "Chain linked volumes, index 2010=100".
Different dimension codes can be translated to a natural language by using the get_eurostat_dic() function, which returns labels for individual dimension items such as na_item and unit, as opposed to label_eurostat() which does it for whole datasets. For example, the parameter na_item stands for "National accounts indicator (ESA 2010)" and unit stands for "Unit of measure".
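The code-to-label translation can be sketched without a network connection by using a toy dictionary. The two labels are the ones quoted in the text. The column names code_name and full_name, the helper label_codes() and the mixing of two dimensions in one table are assumptions made for illustration only.

```r
# Toy dictionary; column names are assumed, labels are the ones quoted above
toy_dic <- data.frame(
  code_name = c("B1GQ", "CLV_I10"),
  full_name = c(
    "Gross domestic product at market prices",
    "Chain linked volumes, index 2010=100"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: translate codes to labels, NA for unknown codes
label_codes <- function(codes, dic) {
  dic$full_name[match(codes, dic$code_name)]
}

labels <- label_codes(c("CLV_I10", "B1GQ", "XYZ"), toy_dic)
labels
```

Codes that are missing from the dictionary come back as NA, which makes it easy to spot filters that do not belong to the dataset in question.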
Language
All datasets have metadata available in English, French and German. If no parameter is given, the labels are returned in English.
Example:
- lang = "fr"
More information
For more information about data filtering see Eurostat documentation on API Statistics: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query#APIStatisticsdataquery-TheparametersdefinedintheRESTrequest
Citing Eurostat data
For citing datasets, use get_bibentry() to build a bibliography that is suitable for your reference manager of choice.
When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:
The origin of the data should always be mentioned as "Source: Eurostat".
The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: "Source: Eurostat (online data code: namq_10_gdp)"
Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.
Avoid associating different entities (e.g. Eurostat, National Statistical Offices, other data providers) with the same dataset or indicator without specifying the role of each of them in the treatment of data.
See also section "Eurostat: Copyright notice and free re-use of data" in get_eurostat() documentation.
Disclaimer: Availability of filtering functionalities
Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although technically filtering datasets downloaded through the SDMX Dissemination API is also supported by Eurostat. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering
Strategies for handling large datasets more efficiently
Most Eurostat datasets are relatively manageable, at least on a machine with 16 GB of RAM. The largest dataset in Eurostat database, at the time of writing this, had 148362539 (148 million) values, which results in an object with 148 million rows in tidy data (long) format. The test machine with 16 GB of RAM was able to handle the second largest dataset in the database with 91 million values (rows).
There are still some methods to make data fetching functions perform faster:
turn caching off:
get_eurostat(cache = FALSE)
turn cache compression off (may result in rather large cache files!):
get_eurostat(compress_file = FALSE)
if you want faster caching with manageable file sizes, use stringsAsFactors:
get_eurostat(cache = TRUE, compress_file = TRUE, stringsAsFactors = TRUE)
Use faster data.table functions:
get_eurostat(use.data.table = TRUE)
Keep column processing to a minimum:
get_eurostat(time_format = "raw", type = "code") etc.
Read the get_eurostat() function documentation carefully so you understand what different arguments do.
Filter the dataset so that you fetch only the parts you need!
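Why factors can help with cache size can be illustrated with a minimal sketch on made-up data. It compares the uncompressed serialized size of a repeated code column stored as character and as factor. Actual cache files are RDS files that may be compressed, so the real difference will vary.

```r
# One hundred thousand repeated country codes, as character and as factor
set.seed(1)
geo_chr <- sample(c("FI", "SE", "DE"), size = 1e5, replace = TRUE)
geo_fct <- factor(geo_chr)

# Size in bytes of each representation when serialized without compression
bytes_chr <- length(serialize(geo_chr, connection = NULL))
bytes_fct <- length(serialize(geo_fct, connection = NULL))

# The factor stores one integer per row plus a short vector of levels
bytes_fct < bytes_chr
```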
Author(s)
Przemyslaw Biecek, Leo Lahti, Janne Huovari, Markus Kainu and Pyry Kantanen
References
See citation("eurostat"):
Kindly cite the eurostat R package as follows:
Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat
To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.
When citing data downloaded from Eurostat, see section "Citing Eurostat data" in get_eurostat() documentation.
See Also
search_eurostat(), label_eurostat()
Examples
## Not run:
k <- get_eurostat("nama_10_lp_ulc")
k <- get_eurostat("nama_10_lp_ulc", time_format = "num")
k <- get_eurostat("nama_10_lp_ulc", update_cache = TRUE)
k <- get_eurostat("nama_10_lp_ulc",
  cache_dir = file.path(tempdir(), "r_cache")
)
options(eurostat_update = TRUE)
k <- get_eurostat("nama_10_lp_ulc")
options(eurostat_update = FALSE)
set_eurostat_cache_dir(file.path(tempdir(), "r_cache2"))
k <- get_eurostat("nama_10_lp_ulc")
k <- get_eurostat("nama_10_lp_ulc", cache = FALSE)
k <- get_eurostat("avia_gonc", select_time = "Y", cache = FALSE)
dd <- get_eurostat("nama_10_gdp",
  filters = list(
    geo = "FI",
    na_item = "B1GQ",
    unit = "CLV_I10"
  )
)
# A dataset with multiple time series in one
dd2 <- get_eurostat("AVIA_GOR_ME",
  select_time = c("A", "M", "Q"),
  time_format = "date_last"
)
# An example of downloading whole dataset from JSON API
dd3 <- get_eurostat("AVIA_GOR_ME",
  filters = list()
)
# Filtering a dataset from a local file
dd3_filter <- get_eurostat("AVIA_GOR_ME",
  filters = list(
    tra_meas = "FRM_BRD"
  )
)
## End(Not run)
Download Eurostat Dictionary
Description
Download a Eurostat dictionary.
Usage
get_eurostat_dic(dictname, lang = "en")
Arguments
dictname |
A character, dictionary for the variable to be downloaded. |
lang |
A character, language code. Options: "en" (default), "fr", "de". |
Details
Downloads the dictionary for a given coded variable from Eurostat https://ec.europa.eu/eurostat/. The dictionaries link codes with human-readable labels. To translate codes to labels, use label_eurostat().
Value
tibble with two columns: code names and full names.
Author(s)
Przemyslaw Biecek and Leo Lahti leo.lahti@iki.fi. Thanks to Wietse Dol for contributions. Updated by Pyry Kantanen to support XML codelists.
References
See citation("eurostat"):
Kindly cite the eurostat R package as follows:
Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019
Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat
To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.
See Also
label_eurostat(), get_eurostat(), search_eurostat().
Examples
get_eurostat_dic("crop_pro")
# Try another language
get_eurostat_dic("crop_pro", lang = "fr")
Get all datasets in a folder
Description
Loops over all files in a Eurostat database folder, downloads the data and assigns the datasets to an environment.
Usage
get_eurostat_folder(code, env = .EurostatEnv)
Arguments
code |
Folder code from Eurostat Table of Contents. |
env |
Name of the environment where downloaded datasets are assigned. Default is .EurostatEnv. If NULL, datasets are returned as a list object. |
Details
The datasets are assigned into .EurostatEnv by default, using dataset codes as object names. The datasets are downloaded from SDMX API as TSV files, meaning that they are returned without filtering. No filters can be provided using this function.
Please do not attempt to download too many datasets or the whole database at once. The number of datasets that can be downloaded at once is hardcoded to 20. The function also asks the user for confirmation if the number of datasets in a folder is more than 10. This is by design to discourage straining the Eurostat API.
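Working with datasets that have been assigned into an environment can be sketched with base R alone. The environment toy_env and the dataset codes below are made up and merely stand in for what get_eurostat_folder() would create. No data is downloaded.

```r
# Stand-in for the environment that get_eurostat_folder() would fill;
# the dataset codes and contents below are made up
toy_env <- new.env()
assign("toy_code_a", data.frame(geo = "FI", values = 1), envir = toy_env)
assign("toy_code_b", data.frame(geo = "SE", values = 2), envir = toy_env)

# List the dataset codes stored in the environment
stored_codes <- sort(ls(toy_env))

# Retrieve one dataset by its code
one_dataset <- get("toy_code_a", envir = toy_env)

# Or collect everything into a named list
all_datasets <- mget(stored_codes, envir = toy_env)
names(all_datasets)
```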
Data source: Eurostat Table of Contents
The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de
See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC
Data source: Eurostat SDMX 2.1 Dissemination API
Data is downloaded from Eurostat SDMX 2.1 API endpoint as compressed TSV files that are transformed into tabular format. See Eurostat documentation for more information: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+query
The new dissemination API replaces the old bulk download facility that was used by Eurostat before October 2023 and by the eurostat R package versions before 4.0.0. See Eurostat documentation about the transition from Bulk Download to API for more information about the differences between the old bulk download facility and the data provided by the new API connection: https://wikis.ec.europa.eu/display/EUROSTATHELP/Transition+-+from+Eurostat+Bulk+Download+to+API
See especially the document Migrating_to_API_TSV.pdf that describes the changes in TSV file format in new applications.
For more information about SDMX 2.1, see SDMX standards: Section 7: Guidelines for the use of web services, Version 2.1: https://sdmx.org/wp-content/uploads/SDMX_2-1_SECTION_7_WebServicesGuidelines.pdf
Author(s)
Pyry Kantanen
See Also
get_eurostat_toc()
toc_count_children()
toc_determine_hierarchy()
toc_list_children()
toc_count_whitespace()
Download Geospatial Data from GISCO
Description
Downloads either a simple features (sf) object or a data_frame of NUTS regions. This function is a wrapper of giscoR::gisco_get_nuts().
This function requires the sf and giscoR packages to be installed.
Usage
get_eurostat_geospatial(
output_class = "sf",
resolution = "60",
nuts_level = "all",
year = "2016",
cache = TRUE,
update_cache = FALSE,
cache_dir = NULL,
crs = "4326",
make_valid = "DEPRECATED",
...
)
Arguments
output_class |
Class of object returned,
either |
resolution |
Resolution of the geospatial data. One of
|
nuts_level |
Level of NUTS classification of the geospatial data. One of "0", "1", "2", "3" or "all" (mimics the original behaviour) |
year |
NUTS release year. One of "2003", "2006", "2010", "2013", "2016" or "2021" |
cache |
a logical whether to do caching. Default is |
update_cache |
a logical whether to update cache. Can be set also with
|
cache_dir |
a path to a cache directory. See
|
crs |
projection of the map: 4-digit EPSG code. One of:
|
make_valid |
Deprecated |
... |
Arguments passed on to
|
Details
The objects downloaded from GISCO should contain all or some of the following variable columns:
- id: JSON id code, the same as NUTS_ID. See NUTS_ID below for further clarification.
- LEVL_CODE: NUTS level code: 0 (national level), 1 (major socio-economic regions), 2 (basic regions for the application of regional policies) or 3 (small regions).
- NUTS_ID: NUTS ID code, consisting of country code and numbers (1 for NUTS 1, 2 for NUTS 2 and 3 for NUTS 3)
- CNTR_CODE: Country code: two-letter ISO code (ISO 3166 alpha-2), except in the case of Greece (EL).
- NAME_LATN: NUTS name in local language, transliterated to Latin script
- NUTS_NAME: NUTS name in local language, in local script.
- MOUNT_TYPE: Mountain typology for NUTS 3 regions.
  1: "where more than 50 % of the surface is covered by topographic mountain areas"
  2: "in which more than 50 % of the regional population lives in topographic mountain areas"
  3: "where more than 50 % of the surface is covered by topographic mountain areas and where more than 50 % of the regional population lives in these mountain areas"
  4: non-mountain region / other region
  0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 and non-EU countries)
- URBN_TYPE: Urban-rural typology for NUTS 3 regions.
  1: predominantly urban region
  2: intermediate region
  3: predominantly rural region
  0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 regions)
- COAST_TYPE: Coastal typology for NUTS 3 regions.
  1: coastal (on coast)
  2: coastal (>= 50% of population living within 50km of the coastline)
  3: non-coastal region
  0: no classification provided (e.g. in the case of NUTS 1 and NUTS 2 regions)
- FID: Same as NUTS_ID.
- geo: Same as NUTS_ID, added for easier joins with dplyr. Consider the status of this column "questioning" and use other columns for joins when possible.
- geometry: geospatial information.
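Joining statistics to the map attributes by NUTS_ID, as recommended for the geo column, can be sketched with toy data frames and base R's merge(). No geometry is involved and nothing is downloaded. The rule that the NUTS level equals the number of characters after the two-letter country code is an assumption based on the NUTS_ID description.

```r
# Toy attribute table with the NUTS_ID column (no geometry)
regions_df <- data.frame(
  NUTS_ID = c("FI", "FI1", "FI19", "FI193"),
  stringsAsFactors = FALSE
)

# Assumption: NUTS level = number of characters after the country code
regions_df$level_from_id <- nchar(regions_df$NUTS_ID) - 2L

# Toy statistics keyed by a geo column, as in Eurostat data tables
stats_df <- data.frame(
  geo = c("FI19", "FI193"),
  values = c(100, 40),
  stringsAsFactors = FALSE
)

# Join on NUTS_ID rather than on the geo column of the map data
joined <- merge(regions_df, stats_df, by.x = "NUTS_ID", by.y = "geo", all.x = TRUE)
joined
```

Regions without a matching observation keep an NA in the values column, so they can still be drawn on a map as missing.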
Value
an sf object or a data_frame
Eurostat: Copyright notice and free re-use of data
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright
"(c) European Union, 1995 - today
Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:
the source is indicated as Eurostat;
when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information."
For exceptions to the abovementioned principles see Eurostat website
Data source: GISCO - General Copyright
"Eurostat's general copyright notice and licence policy is applicable and can be consulted here: https://ec.europa.eu/eurostat/about-us/policies/copyright
Please also be aware of the European Commission's general conditions: https://commission.europa.eu/legal-notice_en
Moreover, there are specific provisions applicable to some of the following datasets available for downloading. The download and usage of these data is subject to their acceptance:
Administrative Units / Statistical Units
Population distribution / Demography
Transport Networks
Land Cover
Elevation (DEM)"
Of the abovementioned datasets, Administrative Units / Statistical Units is applicable if the user wants to draw maps with borders provided by GISCO / EuroGeographics.
Data source: GISCO - Administrative Units / Statistical Units
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the GISCO website: GISCO: Geographical information and maps - Administrative units/statistical units
"In addition to the general copyright and licence policy applicable to the whole Eurostat website, the following specific provisions apply to the datasets you are downloading. The download and usage of these data is subject to the acceptance of the following clauses:
The Commission agrees to grant the non-exclusive and not transferable right to use and process the Eurostat/GISCO geographical data downloaded from this page (the "data").
The permission to use the data is granted on condition that:
the data will not be used for commercial purposes;
the source will be acknowledged. A copyright notice, as specified below, will have to be visible on any printed or electronic publication using the data downloaded from this page."
Copyright notice
When data downloaded from this page is used in any printed or electronic publication, in addition to any other provisions applicable to the whole Eurostat website, data source will have to be acknowledged in the legend of the map and in the introductory page of the publication with the following copyright notice:
EN: © EuroGeographics for the administrative boundaries
FR: © EuroGeographics pour les limites administratives
DE: © EuroGeographics bezüglich der Verwaltungsgrenzen
For publications in languages other than English, French or German, the translation of the copyright notice in the language of the publication shall be used.
If you intend to use the data commercially, please contact EuroGeographics for information regarding their licence agreements."
Author(s)
Markus Kainu markuskainu@gmail.com, Diego Hernangomez https://github.com/dieghernan/
Source
Data source: Eurostat
© EuroGeographics for the administrative boundaries
Data downloaded using giscoR
See Also
Other geospatial:
eurostat_geodata_60_2016
Examples
# Uses cached dataset
sf <- get_eurostat_geospatial(
  output_class = "sf",
  resolution = "60",
  nuts_level = "all"
)
# Downloads dataset from server
sf2 <- get_eurostat_geospatial(
  output_class = "sf",
  resolution = "20",
  nuts_level = "all"
)
df <- get_eurostat_geospatial(
  output_class = "df",
  nuts_level = "0"
)
Get Eurostat data interactive
Description
A simple interactive helper function to go through the steps of downloading and/or finding suitable eurostat datasets.
Usage
get_eurostat_interactive(code = NULL)
Arguments
code |
A unique identifier / code for the dataset of interest. If code is not
known |
Details
This function is intended to enable easy exploration of different eurostat package functionalities and functions. In order to not drown the end user in endless menus this function does not allow for setting all possible get_eurostat() function arguments. It is possible to set time_format, type, lang, stringsAsFactors, keepFlags, and use.data.table in the interactive menus.
In some datasets setting these parameters may result in an "Error in label_eurostat" error, for example: "labels for XXXXXX includes duplicated labels in the Eurostat dictionary". In these cases, and with other more complex queries, please use the get_eurostat() function directly.
See Also
Get Data from Eurostat API in JSON
Description
Retrieve data from Eurostat API in JSON format.
Usage
get_eurostat_json(
id,
filters = NULL,
type = "code",
lang = "en",
stringsAsFactors = FALSE,
proxy = FALSE,
...
)
Arguments
id |
A unique identifier / code for the dataset of interest. If code is not
known |
filters |
A named list of filters. Names of list objects are Eurostat
variable codes and values are vectors of observation codes. If |
type |
A type of variables, " |
lang |
2-letter language code, default is " |
stringsAsFactors |
if |
proxy |
Use proxy, TRUE or FALSE (default). |
... |
Arguments passed on to
|
Details
Data to retrieve from The Eurostat Web Services can be specified with filters. Normally, it is better to use a JSON query through get_eurostat() than to use get_eurostat_json() directly.
Queries are limited to 50 sub-indicators at a time. Time can be filtered with a fixed "time" filter or with "sinceTimePeriod" and "lastTimePeriod" filters. sinceTimePeriod = 2000 returns observations from 2000 to the last available one. lastTimePeriod = 10 returns the 10 most recent observations. See the "Filtering datasets" section below for more detailed information about filters.
To use a proxy to connect, proxy arguments can be passed to httr2::req_perform() via httr2::req_proxy(); see the latter function's documentation for parameter names that can be passed with ....
A non-functional example: get_eurostat_json(id, filters, proxy = TRUE, url = "127.0.0.1", port = 80).
When retrieving data from the Eurostat JSON API the user may encounter errors. For end user convenience, we have provided a ready-made internal dataset sdmx_http_errors that contains descriptive labels and descriptions about the possible interpretation or cause of each error. These messages are returned if the API returns a status indicating an HTTP error (400 or greater).
The Eurostat implementation seems to be based on SDMX 2.1, which is the reason we've used SDMX Standards guidelines as a supplementary source that we have included in the dataset. What this means in practice is that the dataset contains error codes and their mappings that are not mentioned in the Eurostat website. We hope you never encounter them.
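The idea of mapping a status code to a descriptive message can be sketched as follows. The lookup table toy_http_errors and the helper describe_status() are hypothetical. The structure and contents of the real sdmx_http_errors dataset are not shown here and may differ.

```r
# Toy lookup table; the real sdmx_http_errors dataset may be structured differently
toy_http_errors <- data.frame(
  status_code = c(400L, 404L, 500L),
  label = c("Bad Request", "Not Found", "Internal Server Error"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: describe an error status, or return NULL for non-errors
describe_status <- function(status, errors = toy_http_errors) {
  # Only statuses of 400 or greater are treated as errors
  if (status < 400) {
    return(NULL)
  }
  label <- errors$label[match(status, errors$status_code)]
  if (is.na(label)) {
    label <- "Unknown error"
  }
  sprintf("HTTP %d: %s", as.integer(status), label)
}

describe_status(200)
describe_status(404)
```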
Value
A dataset as an object of data.frame class.
Data source: Eurostat API Statistics (JSON API)
Data is downloaded from Eurostat API Statistics. See Eurostat documentation for more information about data queries in API Statistics https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query
This replaces the old JSON Web Services that was used by Eurostat before February 2023 and by the eurostat R package versions before 3.7.13. See Eurostat documentation about the migration from JSON web service to API Statistics for more information about the differences between the old and the new service: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+migrating+from+JSON+web+service+to+API+Statistics
For easily viewing which filtering options are available - in addition to the default ones, time and language - Eurostat Web services Query builder tool may be useful: https://ec.europa.eu/eurostat/web/query-builder
Filtering datasets
When using Eurostat API Statistics (JSON API), datasets can be filtered before they are downloaded and saved in local memory. The general format for filter parameters is <DIMENSION_CODE>=<VALUE>.
Filter parameters are optional but the used dimension codes must be present in the data product that is being queried. Dimension codes can vary between different data products so it may be useful to examine new datasets in Eurostat data browser beforehand. However, most if not all Eurostat datasets concern European countries and contain information that was gathered at some point in time, so geo and time dimension codes can usually be used.
<DIMENSION_CODE> and <VALUE> are case-insensitive and they can be written in lowercase or uppercase in the query.
Parameters are passed onto the eurostat package functions get_eurostat() and get_eurostat_json() as a list item. If an individual item contains multiple items, as it often can be in the case of geo parameters and other optional items, they must be in the form of a vector: c("FI", "SE").
For examples on how to use these parameters, see function examples below.
Time parameters
time and time_period address the same TIME_PERIOD dimension in the dataset and can be used interchangeably. In the Eurostat documentation it is stated that "Using more than one Time parameter in the same query is not accepted", but practice has shown that the Eurostat API actually allows multiple time parameters in the same query. This makes it possible to use the R colon operator when writing queries, so time = c(2015:2018) translates to &time=2015&time=2016&time=2017&time=2018.
The only exception to this is when the queried dataset contains e.g. quarterly data and TIME_PERIOD is saved as 2015-Q1, 2015-Q2 etc. Then it is possible to use time=2015-Q1&time=2015-Q2 style in the query URL, but this makes it unfeasible to use the colon operator and requires a lot of manual typing.
Because of this, it is useful to know about other time parameters as well:
- untilTimePeriod: return dataset items from the oldest record up until the set time, for example "all data until 2000": untilTimePeriod = 2000
- sinceTimePeriod: return dataset items starting from the set time, for example "all data starting from 2008": sinceTimePeriod = 2008
- lastTimePeriod: starting from the most recent time period, how many preceding time periods should be returned? For example 10 most recent observations: lastTimePeriod = 10
Using both untilTimePeriod and sinceTimePeriod parameters in the same query is allowed, making the usage of the R colon operator unnecessary. In the case of quarterly data, using untilTimePeriod and sinceTimePeriod parameters also works, as opposed to the colon operator, so it is generally safer to use them as well.
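If explicit quarterly period codes are still preferred over sinceTimePeriod and untilTimePeriod, they can be generated instead of typed by hand. The following is a minimal sketch. quarter_codes() is a hypothetical helper, and whether a given dataset accepts the resulting codes as a time filter is an assumption to verify against that dataset.

```r
# Hypothetical helper (not part of eurostat): all quarters of the given years
quarter_codes <- function(years) {
  # Each year is repeated four times and paired with quarters 1 to 4
  paste0(rep(years, each = 4), "-Q", 1:4)
}

quarters <- quarter_codes(2015:2016)
quarters

# The vector could then be supplied as a filter, for example:
# filters = list(time = quarters)
```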
Other dimensions
In get_eurostat_json() examples the nama_10_gdp dataset is filtered with two additional filter parameters:
- na_item = "B1GQ"
- unit = "CLV_I10"
Filters like these are most likely unique to the nama_10_gdp dataset (or other datasets within the same domain) and should not be used with other datasets without user discretion. By using label_eurostat() we know that "B1GQ" stands for "Gross domestic product at market prices" and "CLV_I10" means "Chain linked volumes, index 2010=100".
Different dimension codes can be translated to a natural language by using the get_eurostat_dic() function, which returns labels for individual dimension items such as na_item and unit, as opposed to label_eurostat() which does it for whole datasets. For example, the parameter na_item stands for "National accounts indicator (ESA 2010)" and unit stands for "Unit of measure".
Language
All datasets have metadata available in English, French and German. If no parameter is given, the labels are returned in English.
Example:
-
lang = "fr"
More information
For more information about data filtering see Eurostat documentation on API Statistics: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+Statistics+-+data+query#APIStatisticsdataquery-TheparametersdefinedintheRESTrequest
Eurostat: Copyright notice and free re-use of data
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright
"(c) European Union, 1995 - today
Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:
the source is indicated as Eurostat;
when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information."
For exceptions to the abovementioned principles see Eurostat website
Citing Eurostat data
For citing datasets, use get_bibentry()
to build a bibliography that
is suitable for your reference manager of choice.
When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:
The origin of the data should always be mentioned as "Source: Eurostat".
The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: "Source: Eurostat (online data code: namq_10_gdp)"
Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.
It should be avoided to associate different entities (e.g. Eurostat, National Statistical Offices, other data providers) to the same dataset or indicator without specifying the role of each of them in the treatment of data.
See also section "Eurostat: Copyright notice and free re-use of data"
in get_eurostat()
documentation.
Disclaimer: Availability of filtering functionalities
Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although filtering datasets downloaded through the SDMX Dissemination API is also technically supported by Eurostat. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation:
https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering
Author(s)
Przemyslaw Biecek, Leo Lahti, Janne Huovari, Markus Kainu and Pyry Kantanen
References
See citation("eurostat")
:
Kindly cite the eurostat R package as follows: Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019 Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.
When citing data downloaded from Eurostat, see section "Citing Eurostat data"
in get_eurostat()
documentation.
See Also
Examples
## Not run:
# Generally speaking these queries would be done through get_eurostat
tmp <- get_eurostat_json("nama_10_gdp")
yy <- get_eurostat_json("nama_10_gdp", filters = list(
  geo = c("FI", "SE", "EU28"),
  time = c(2015:2023),
  lang = "FR",
  na_item = "B1GQ",
  unit = "CLV_I10"
))
# TIME_PERIOD filter works also with the new JSON API
yy2 <- get_eurostat_json("nama_10_gdp", filters = list(
  geo = c("FI", "SE", "EU28"),
  TIME_PERIOD = c(2015:2023),
  lang = "FR",
  na_item = "B1GQ",
  unit = "CLV_I10"
))
# An example from get_eurostat
dd <- get_eurostat("nama_10_gdp",
  filters = list(
    geo = "FI",
    na_item = "B1GQ",
    unit = "CLV_I10"
  )
)
## End(Not run)
Download Data from Eurostat Dissemination API
Description
Download data from the eurostat database through the new dissemination API.
Usage
get_eurostat_raw(id, use.data.table = FALSE)
Arguments
id |
A unique identifier / code for the dataset of interest. If code is not
known |
use.data.table |
Use faster data.table functions? Default is FALSE. On Windows requires that RTools is installed. |
Value
A dataset in tibble format. The first column contains comma-separated codes of cases. The other columns usually correspond to years, and their names are years with a preceding X. Data is in character format as it contains values together with Eurostat flags for data.
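To make the shape of this raw output more concrete, the sketch below splits a comma-separated key column into separate dimensions and separates numeric values from trailing flags. The toy data, the dimension names, the ":" missing-value marker and the "p" flag are assumptions made for illustration; in the package this kind of processing is handled by tidy_eurostat(), whose implementation may differ.

```r
# Made-up raw rows in the spirit of the description above (not real data)
raw <- data.frame(
  key = c("A,B1GQ,FI", "A,B1GQ,SE"),
  X2020 = c("100.5 p", ": "),
  stringsAsFactors = FALSE
)

# Split the comma-separated codes into one column per dimension
dims <- do.call(rbind, strsplit(raw$key, ",", fixed = TRUE))
colnames(dims) <- c("freq", "na_item", "geo") # assumed dimension names

# Strip trailing non-numeric characters (flags, spaces, missing markers)
values <- as.numeric(sub("[^0-9.-]+$", "", raw$X2020))

# Keep whatever is left after removing digits, dots, colons and hyphens
flags <- trimws(gsub("[0-9.:-]", "", raw$X2020))

print(data.frame(dims, values = values, flags = flags))
```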
Data source: Eurostat SDMX 2.1 Dissemination API
Data is downloaded from Eurostat SDMX 2.1 API endpoint as compressed TSV files that are transformed into tabular format. See Eurostat documentation for more information: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+query
The new dissemination API replaces the old bulk download facility that was used by Eurostat before October 2023 and by the eurostat R package versions before 4.0.0. See Eurostat documentation about the transition from Bulk Download to API for more information about the differences between the old bulk download facility and the data provided by the new API connection: https://wikis.ec.europa.eu/display/EUROSTATHELP/Transition+-+from+Eurostat+Bulk+Download+to+API
See especially the document Migrating_to_API_TSV.pdf that describes the changes in TSV file format in new applications.
For more information about SDMX 2.1, see SDMX standards: Section 7: Guidelines for the use of web services, Version 2.1: https://sdmx.org/wp-content/uploads/SDMX_2-1_SECTION_7_WebServicesGuidelines.pdf
Eurostat: Copyright notice and free re-use of data
The following copyright notice is provided for end user convenience. Please check up-to-date copyright information from the eurostat website: https://ec.europa.eu/eurostat/about-us/policies/copyright
"(c) European Union, 1995 - today
Eurostat has a policy of encouraging free re-use of its data, both for non-commercial and commercial purposes. All statistical data, metadata, content of web pages or other dissemination tools, official publications and other documents published on its website, with the exceptions listed below, can be reused without any payment or written licence provided that:
the source is indicated as Eurostat;
when re-use involves modifications to the data or text, this must be stated clearly to the end user of the information."
For exceptions to the abovementioned principles see Eurostat website
Citing Eurostat data
For citing datasets, use get_bibentry()
to build a bibliography that
is suitable for your reference manager of choice.
When using Eurostat data in contexts other than academic publications that use in-text citations or footnotes/endnotes, the following guidelines may be helpful:
The origin of the data should always be mentioned as "Source: Eurostat".
The online dataset code(s) should also be provided in order to ensure transparency and facilitate access to the Eurostat data and related methodological information. For example: "Source: Eurostat (online data code: namq_10_gdp)"
Online publications (e.g. web pages, PDF) should include a clickable link to the dataset using the bookmark functionality available in the Eurostat data browser.
It should be avoided to associate different entities (e.g. Eurostat, National Statistical Offices, other data providers) to the same dataset or indicator without specifying the role of each of them in the treatment of data.
See also section "Eurostat: Copyright notice and free re-use of data"
in get_eurostat()
documentation.
Disclaimer: Availability of filtering functionalities
Currently it is only possible to download filtered data through API Statistics (JSON API) when using the eurostat package, although filtering datasets downloaded through the SDMX Dissemination API is also technically supported by Eurostat. We may support this feature in the future. In the meantime, if you are interested in filtering Dissemination API data queries manually, please consult the following Eurostat documentation:
https://wikis.ec.europa.eu/display/EUROSTATHELP/API+SDMX+2.1+-+data+filtering
Author(s)
Przemyslaw Biecek, Leo Lahti, Janne Huovari and Pyry Kantanen
References
See citation("eurostat")
:
Kindly cite the eurostat R package as follows: Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019 Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.
See Also
Examples
eurostat:::get_eurostat_raw("educ_iste")
Download Table of Contents of Eurostat Data Sets
Description
Download table of contents (TOC) of eurostat datasets.
Usage
get_eurostat_toc(lang = "en")
Arguments
lang |
2-letter language code, default is " |
Details
In the downloaded Eurostat Table of Contents, the 'code' column values refer to the 'id' that is used as an argument in certain functions when downloading datasets.
Value
A tibble with nine columns:
- title: Dataset title in English (default)
- code: Each item (dataset, table and folder) of the TOC has a unique code which allows it to be identified in the API. Used in the get_eurostat() and get_eurostat_raw() functions to retrieve datasets.
- type: dataset, folder or table
- last.update.of.data: Date, indicates the last time the dataset/table was updated (format DD.MM.YYYY or %d.%m.%Y)
- last.table.structure.change: Date, indicates the last time the dataset/table structure was modified (format DD.MM.YYYY or %d.%m.%Y)
- data.start: Date of the oldest value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)
- data.end: Date of the most recent value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)
- values: Number of actual values included in the dataset
- hierarchy: Hierarchy of the data navigation tree, represented in the original txt file by a 4-spaces indentation prefix in the title
Data source: Eurostat Table of Contents
The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de
See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC
Author(s)
Przemyslaw Biecek, Leo Lahti and Pyry Kantanen ropengov-forum@googlegroups.com
References
See citation("eurostat")
:
Kindly cite the eurostat R package as follows: Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019 Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.
When citing data downloaded from Eurostat, see section "Citing Eurostat data"
in get_eurostat()
documentation.
See Also
get_eurostat()
, search_eurostat()
Examples
tmp <- get_eurostat_toc()
head(tmp)
# Convert columns containing dates as character into Date class
# Last update of data
tmp[[4]] <- as.Date(tmp[[4]], format = c("%d.%m.%Y"))
# Last table structure change
tmp[[5]] <- as.Date(tmp[[5]], format = c("%d.%m.%Y"))
# Data start, contains several formats (date, week, month, quarter, semester)
# Unfortunately semesters are not directly supported so they need to be
# changed into quarters
tmp$data.start <- gsub("S2", "Q3", tmp$data.start)
tmp$data.start <- lubridate::as_date(
  x = tmp$data.start,
  format = c("%Y", "%Y-Q%q", "%Y-W%W", "%Y-S%q", "%Y-%m-%d", "%Y-%m")
)
# Data end, same as data start
tmp$data.end <- gsub("S2", "Q3", tmp$data.end)
tmp$data.end <- lubridate::as_date(
  x = tmp$data.end,
  format = c("%Y", "%Y-Q%q", "%Y-W%W", "%Y-S%q", "%Y-%m-%d", "%Y-%m")
)
Harmonize Country Code
Description
The European Commission and Eurostat generally use ISO 3166-1 alpha-2 codes with two exceptions: EL (not GR) is used to represent Greece, and UK (not GB) is used to represent the United Kingdom. This function turns country codes into ISO 3166-1 alpha-2.
Usage
harmonize_country_code(x)
Arguments
x |
A character or a factor vector of eurostat country codes. |
Value
a vector.
Author(s)
Janne Huovari janne.huovari@ptt.fi
See Also
Other helpers:
cut_to_classes()
,
dic_order()
,
eurotime2date()
,
eurotime2num()
,
label_eurostat()
Examples
lp <- get_eurostat("nama_10_lp_ulc")
lp$geo <- harmonize_country_code(lp$geo)
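The recoding described above can be sketched in base R without downloading any data. The function harmonize_sketch() below is a hypothetical stand-in written for illustration; it is not the package's implementation, which may handle factors and other details differently.

```r
# Hypothetical stand-in (not package code): replace the two Eurostat-specific
# country codes with their ISO 3166-1 alpha-2 equivalents.
harmonize_sketch <- function(x) {
  x <- as.character(x)
  x[x %in% "EL"] <- "GR" # Greece
  x[x %in% "UK"] <- "GB" # United Kingdom
  x
}

codes <- harmonize_sketch(c("FI", "EL", "UK", "SE"))
print(codes)
```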
Harmonize NUTS region codes that changed with the NUTS2016 definition
Description
Eurostat mixes NUTS2013 and NUTS2016 geographic label codes in the 'geo' column, which creates comparability issues over time. This deprecated function checked whether your data is affected by this problem and gave information on what to do.
This function is deprecated, and a more general function was moved to regions::validate_nuts_regions().
Usage
harmonize_geo_code(dat)
Arguments
dat |
A Eurostat data frame downloaded with |
Value
An augmented data frame that explains potential problems and possible solutions.
Author(s)
Daniel Antal
See Also
regions::validate_nuts_regions()
Other regions functions:
add_nuts_level()
,
recode_to_nuts_2013()
,
recode_to_nuts_2016()
,
reexports
Examples
dat <- eurostat::tgs00026
regions::validate_nuts_regions(dat)
Get Eurostat Codes for data downloaded from new dissemination API
Description
Get definitions for Eurostat codes from Eurostat dictionaries.
Usage
label_eurostat(
x,
dic = NULL,
code = NULL,
eu_order = FALSE,
lang = "en",
countrycode = NULL,
countrycode_nomatch = NULL,
custom_dic = NULL,
fix_duplicated = FALSE
)
label_eurostat_vars(x = NULL, id, lang = "en")
label_eurostat_tables(x, lang = "en")
Arguments
x |
A character or a factor vector or a data_frame. |
dic |
A string (vector) naming eurostat dictionary or dictionaries.
If |
code |
For data_frames names of the column for which also code columns should be retained. The suffix "_code" is added to code column names. |
eu_order |
Logical. Should Eurostat ordering be used for label levels? Affects only factors. |
lang |
2-letter language code, default is " |
countrycode |
A |
countrycode_nomatch |
What to do when using the countrycode to label
a "geo" and countrycode fails to find a match, for example other than
country codes like EU28. The original code is used with
a |
custom_dic |
a named vector or named list of named vectors to give an own dictionary for (part of) codes. Names of the vector should be codes and values labels. List can be used to specify dictionaries and then list names should be dictionary codes. |
fix_duplicated |
A logical. If TRUE, the code is added to the duplicated label values. If FALSE (default), an error is given if labeling produces duplicates. |
id |
A unique identifier / code for the dataset of interest. If code is not
known |
Details
A character or a factor vector of codes returns a corresponding vector of definitions. label_eurostat() also labels data_frames from get_eurostat(). For vectors a dictionary name has to be supplied. For data_frames dictionary names are taken from column names. "time" and "values" columns are returned as they were, so you can supply a data_frame from get_eurostat() and get a data_frame with definitions instead of codes.
Some Eurostat dictionaries include duplicated labels. By default duplicated labels cause an error, but they can be fixed automatically with fix_duplicated = TRUE.
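The duplicated-label behaviour described above can be illustrated with a named vector acting as a dictionary. This is only a sketch under stated assumptions: label_sketch() and the toy dictionary are hypothetical, and the exact format the package uses when appending codes to labels may differ.

```r
# Toy dictionary with a duplicated label; codes and labels are made up
dic <- c(A = "Total", B = "Total", C = "Other")

# Hypothetical helper (not package code): look up labels for codes and
# either fail on duplicated labels or make them unique by appending the code.
label_sketch <- function(codes, dic, fix_duplicated = FALSE) {
  dup <- dic %in% dic[duplicated(dic)]
  if (any(dup)) {
    if (!fix_duplicated) {
      stop("Dictionary contains duplicated labels")
    }
    dic[dup] <- paste(dic[dup], names(dic)[dup])
  }
  unname(dic[codes])
}

fixed_labels <- label_sketch(c("A", "C", "B"), dic, fix_duplicated = TRUE)
print(fixed_labels)

# With the default setting the duplicated labels raise an error
default_result <- tryCatch(
  label_sketch(c("A", "C", "B"), dic),
  error = function(e) "error"
)
print(default_result)
```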
Value
a vector or a data_frame.
Functions
-
label_eurostat_vars()
: Get definitions for variable (column) names. -
label_eurostat_tables()
: Get definitions for table names
Author(s)
Janne Huovari janne.huovari@ptt.fi
See Also
Other helpers:
cut_to_classes()
,
dic_order()
,
eurotime2date()
,
eurotime2num()
,
harmonize_country_code()
Examples
## Not run:
lp <- get_eurostat("nama_10_lp_ulc")
lpl <- label_eurostat(lp)
str(lpl)
lpl_order <- label_eurostat(lp, eu_order = TRUE)
lpl_code <- label_eurostat(lp, code = "unit")
# Note that the dataset id must be provided in label_eurostat_vars
label_eurostat_vars(id = "nama_10_lp_ulc", x = "geo", lang = "en")
label_eurostat_tables("nama_10_lp_ulc")
label_eurostat(c("FI", "DE", "EU28"), dic = "geo")
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo",
  custom_dic = c(DE = "Germany")
)
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo", countrycode = "country.name",
  custom_dic = c(EU28 = "EU")
)
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo",
  countrycode = "country.name"
)
# In Finnish
label_eurostat(
  c("FI", "DE", "EU28"),
  dic = "geo",
  countrycode = "cldr.short.fi"
)
## End(Not run)
Output cache information as data.frame
Description
Parses cache_list.json file and returns a data.frame
Usage
list_eurostat_cache_items(cache_dir = NULL)
Arguments
cache_dir |
a path to a cache directory. |
Value
A data.frame object with 3 columns: dataset code, download date and query md5 hash
Recode geo labels and rename regions from NUTS2016 to NUTS2013
Description
Eurostat mixes NUTS2013 and NUTS2016 geographic label codes in the 'geo' column, which creates comparability issues over time.
This function is deprecated, and a more general function was moved to regions::recode_nuts().
Usage
recode_to_nuts_2013(dat)
Arguments
dat |
A Eurostat data frame downloaded with
|
Value
An augmented and potentially relabelled data frame which
contains all formerly 'NUTS2013'
definition geo labels in the
'NUTS2016'
vocabulary when only the code changed, but the
boundary did not. It also contains some information on other geo labels
that cannot be brought to the current 'NUTS2013'
definition.
Furthermore, when the official name of the region changed, it will use the new name (if the region boundary otherwise did not change).
If not called before, the function will use the helper function
harmonize_geo_code()
Author(s)
Daniel Antal
See Also
Other regions functions:
add_nuts_level()
,
harmonize_geo_code()
,
recode_to_nuts_2016()
,
reexports
Examples
test_regional_codes <- data.frame(
  geo = c("FRB", "FRE", "UKN02", "IE022", "FR243", "FRB03"),
  time = c(rep(as.Date("2014-01-01"), 5), as.Date("2015-01-01")),
  values = c(1:6),
  control = c(
    "Changed from NUTS2 to NUTS1",
    "New region NUTS2016 only",
    "Discontinued region NUTS2013",
    "Boundary shift NUTS2013",
    "Recoded in NUTS2013",
    "Recoded in NUTS2016"
  )
)
recode_to_nuts_2013(test_regional_codes)
Recode geo labels and rename regions from NUTS2013 to NUTS2016
Description
Eurostat mixes NUTS2013 and NUTS2016 geographic label codes in the 'geo' column, which creates comparability issues over time.
This function is deprecated, and a more general function was moved to regions::recode_nuts().
Usage
recode_to_nuts_2016(dat)
Arguments
dat |
A Eurostat data frame downloaded with
|
Value
An augmented and potentially relabelled data frame which
contains all formerly 'NUTS2013'
definition geo labels in the
'NUTS2016'
vocabulary when only the code changed, but the
boundary did not. It also contains some information on other geo labels
that cannot be brought to the current 'NUTS2016'
definition.
Furthermore, when the official name of the region changed, it will use the new name (if the region boundary otherwise did not change).
If not called before, the function will use the helper function
harmonize_geo_code()
Author(s)
Daniel Antal
See Also
Other regions functions:
add_nuts_level()
,
harmonize_geo_code()
,
recode_to_nuts_2013()
,
reexports
Examples
test_regional_codes <- data.frame(
  geo = c("FRB", "FRE", "UKN02", "IE022", "FR243", "FRB03"),
  time = c(rep(as.Date("2014-01-01"), 5), as.Date("2015-01-01")),
  values = c(1:6),
  control = c(
    "Changed from NUTS2 to NUTS1",
    "New region NUTS2016 only",
    "Discontinued region NUTS2013",
    "Boundary shift NUTS2013",
    "Recoded in NUTS2013",
    "Recoded in NUTS2016"
  )
)
recode_to_nuts_2016(test_regional_codes)
Recode Region Codes From Source To Target NUTS Typology
Description
These objects are imported from other packages. Follow the links below to see their documentation.
Arguments
dat |
A data frame with a 3-5 character |
geo_var |
Defaults to |
geo |
A vector of geographical code to validate. |
nuts_year |
A valid NUTS edition year. |
Details
While country codes are technically not part of the NUTS typologies,
Eurostat de facto uses a NUTS0
typology to identify countries.
This de facto typology has three exceptions, which are handled by the
validate_nuts_countries function.
NUTS typologies have different versions, therefore the conformity
is validated with one specific version, which can be any of these:
1999
, 2003
, 2006
, 2010
,
2013
, the currently used 2016
and the already
announced and defined 2021
.
The NUTS typology was codified with the NUTS2003
, and the
pre-1999 NUTS typologies may confuse programmatic data processing,
given that some NUTS1 regions were identified with country codes
in smaller countries that had no NUTS1
divisions.
Currently the 2016 typology is used by Eurostat, but many datasets still contain 2013 and sometimes earlier metadata.
Value
The original data frame with a 'geo_var' column is extended with a 'typology' column that states in which typology the 'geo_var' is a valid code. For invalid codes, the function looks up potential reasons for invalidity and adds them to the 'typology_change' column, and finally it adds a character column containing the desired codes in the target typology, for example, in the NUTS2013 typology.
Returns the original dat data frame with a column that specifies the conformity with the NUTS definition of the year nuts_year.
A character list with the valid typology, or 'invalid' in the cases when the geo coding is not valid.
See Also
Other regions functions:
add_nuts_level()
,
harmonize_geo_code()
,
recode_to_nuts_2013()
,
recode_to_nuts_2016()
Examples
{
  foo <- data.frame(
    geo = c("FR", "DEE32", "UKI3", "HU12", "DED", "FRK"),
    values = runif(6, 0, 100),
    stringsAsFactors = FALSE
  )
  recode_nuts(foo, nuts_year = 2013)
}
my_reg_data <- data.frame(
  geo = c(
    "BE1", "HU102", "FR1",
    "DED", "FR7", "TR", "DED2",
    "EL", "XK", "GB"
  ),
  values = runif(10)
)
validate_nuts_regions(my_reg_data)
validate_nuts_regions(my_reg_data, nuts_year = 2013)
validate_nuts_regions(my_reg_data, nuts_year = 2003)
my_reg_data <- data.frame(
  geo = c(
    "BE1", "HU102", "FR1",
    "DED", "FR7", "TR", "DED2",
    "EL", "XK", "GB"
  ),
  values = runif(10)
)
validate_geo_code(my_reg_data$geo)
Grep Datasets Titles from Eurostat
Description
Lists datasets from eurostat table of contents with the particular pattern in item titles.
Usage
search_eurostat(
pattern,
type = "dataset",
column = "title",
fixed = TRUE,
lang = "en"
)
Arguments
pattern |
Text string that is used to search from dataset, folder or table titles, depending on the type argument. |
type |
Selection for types of datasets to be searched. Default is |
column |
Selection for the column of TOC where search is done. Default is |
fixed |
logical. If TRUE (default), pattern is a string to be matched as is.
See |
lang |
2-letter language code, default is " |
Details
Downloads the list of all datasets available on Eurostat and returns the datasets whose description contains a particular pattern, for example all datasets related to education or teaching.
If you wish to perform searches on other fields than item title,
you can download the Eurostat Table of Contents manually using
get_eurostat_toc()
function and use grep()
function normally. The data
browser on Eurostat website may also return useful results.
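A manual search on another TOC column might look roughly like the sketch below. To keep it runnable offline it uses a small made-up data frame in place of the result of get_eurostat_toc(); the titles and codes are illustrative assumptions, not real TOC entries.

```r
# Made-up stand-in for a downloaded table of contents (illustrative only)
toc <- data.frame(
  title = c(
    "Live births (total) by NUTS 3 region",
    "Pupils and students",
    "Teaching staff"
  ),
  code = c("demo_births_x", "educ_pupils_x", "educ_staff_x"),
  type = c("dataset", "dataset", "table"),
  stringsAsFactors = FALSE
)

# Literal match: parentheses are matched as-is when fixed = TRUE
hits_fixed <- toc[grep("births (total)", toc$title, fixed = TRUE), ]

# Same pattern as a regular expression: the parentheses form a group,
# so the literal "(total)" in the title is not matched
hits_regex <- toc[grep("births (total)", toc$title), ]

# Search on the code column instead of the title
hits_code <- toc[grep("^educ_", toc$code), ]

print(hits_fixed$code)
print(nrow(hits_regex))
print(hits_code$code)
```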
Value
A tibble with nine columns:
- title: Dataset title in English (default)
- code: Each item (dataset, table and folder) of the TOC has a unique code which allows it to be identified in the API. Used in the get_eurostat() and get_eurostat_raw() functions to retrieve datasets.
- type: dataset, folder or table
- last.update.of.data: Date, indicates the last time the dataset/table was updated (format DD.MM.YYYY or %d.%m.%Y)
- last.table.structure.change: Date, indicates the last time the dataset/table structure was modified (format DD.MM.YYYY or %d.%m.%Y)
- data.start: Date of the oldest value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)
- data.end: Date of the most recent value included in the dataset (if available) (format usually YYYY or %Y but can also be YYYY-MM, YYYY-MM-DD, YYYY-SN, YYYY-QN etc.)
- values: Number of actual values included in the dataset
- hierarchy: Hierarchy of the data navigation tree, represented in the original txt file by a 4-spaces indentation prefix in the title
Data source: Eurostat Table of Contents
The Eurostat Table of Contents (TOC) is downloaded from https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=en (default) or from French or German language variants: https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=fr https://ec.europa.eu/eurostat/api/dissemination/catalogue/toc/txt?lang=de
See Eurostat documentation on TOC items: https://wikis.ec.europa.eu/display/EUROSTATHELP/API+-+Detailed+guidelines+-+Catalogue+API+-+TOC
Author(s)
Przemyslaw Biecek and Leo Lahti ropengov-forum@googlegroups.com
References
See citation("eurostat")
:
Kindly cite the eurostat R package as follows: Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019 Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.
When citing data downloaded from Eurostat, see section "Citing Eurostat data"
in get_eurostat()
documentation.
See Also
get_eurostat()
, search_eurostat()
Examples
tmp <- search_eurostat("education")
head(tmp)
# Use "fixed = TRUE" when pattern has characters that would need escaping.
# Here, parentheses would normally need to be escaped in regex
tmp <- search_eurostat("Live births (total) by NUTS 3 region", fixed = TRUE)
Set Eurostat Cache
Description
This function will store your cache_dir path on your local machine and load it in future sessions. Type Sys.getenv("EUROSTAT_CACHE_DIR") to find your cached path.
Alternatively, you can store the cache_dir manually with the following options:
- Run Sys.setenv(EUROSTAT_CACHE_DIR = "cache_dir"). You would need to run this command in each session (similar to install = FALSE).
- Set options(eurostat_cache_dir = "cache_dir"). Similar to the previous option. This is provided for backwards compatibility purposes.
- Write this line in your .Renviron file: EUROSTAT_CACHE_DIR = "value_for_cache_dir" (same behavior as install = TRUE). This would store your cache_dir permanently.
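The two session-level alternatives can be tried with base R alone. The sketch below uses a directory under tempdir() as an arbitrary example path; it only sets and reads back the values and does not call any eurostat function.

```r
# Example path chosen for illustration; any writable directory would do
cache_dir <- file.path(tempdir(), "eurostat_cache")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Environment variable: valid for the current session only
Sys.setenv(EUROSTAT_CACHE_DIR = cache_dir)
env_value <- Sys.getenv("EUROSTAT_CACHE_DIR")

# Option: the backwards-compatible variant, also session only
options(eurostat_cache_dir = cache_dir)
opt_value <- getOption("eurostat_cache_dir")

print(env_value)
print(opt_value)
```

Neither setting survives a restart of R, which is why the .Renviron line is the manual route to a permanent cache directory.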
Usage
set_eurostat_cache_dir(
cache_dir,
overwrite = FALSE,
install = FALSE,
verbose = TRUE
)
Arguments
cache_dir |
A path to a cache directory. On missing value the function
would store the cached files on a temporary dir (See
|
overwrite |
If this is set to |
install |
if |
verbose |
Logical, displays information. Useful for debugging,
default is |
Value
An (invisible) character with the path to your cache_dir
.
Author(s)
Diego Hernangómez
See Also
Other cache utilities:
clean_eurostat_cache()
Examples
# Don't run this! It would modify your current state
## Not run:
set_eurostat_cache_dir(verbose = TRUE)
## End(Not run)
Sys.getenv("EUROSTAT_CACHE_DIR")
Set Eurostat TOC
Description
Internal function.
Usage
set_eurostat_toc(lang = "en")
Arguments
lang |
2-letter language code, default is " |
Value
Empty element
Author(s)
Przemyslaw Biecek and Leo Lahti ropengov-forum@googlegroups.com
References
see citation("eurostat")
See Also
get_eurostat_toc()
toc_count_children()
toc_determine_hierarchy()
toc_list_children()
toc_count_whitespace()
Auxiliary Data
Description
Auxiliary Data Sets
Usage
tgs00026
Format
data_frame
Details
Disposable income of private households by NUTS 2 regions
Retrieved with: tgs00026 <- get_eurostat("tgs00026", time_format = "raw")
Data retrieval date: 2022-06-27
See Also
Other datasets:
eu_countries
,
eurostat_geodata_60_2016
Transform Data into Row-Column-Value Format
Description
Transform raw Eurostat data table downloaded from the API into a tidy row-column-value format (RCV).
Usage
tidy_eurostat(
dat,
time_format = "date",
select_time = NULL,
stringsAsFactors = FALSE,
keepFlags = FALSE,
use.data.table = FALSE
)
Arguments
dat |
a data_frame from |
time_format |
a string giving a type of the conversion of the time column from the
eurostat format. The default argument " |
select_time |
a character symbol for a time frequency or |
stringsAsFactors |
if |
keepFlags |
a logical whether the flags (e.g. "confidential",
"provisional") should be kept in a separate column or if they
can be removed. Default is |
use.data.table |
Use faster data.table functions? Default is FALSE. On Windows requires that RTools is installed. |
Value
tibble in the melted format with the last column 'values'.
Author(s)
Przemyslaw Biecek, Leo Lahti, Janne Huovari and Pyry Kantanen
References
See citation("eurostat")
:
Kindly cite the eurostat R package as follows: Lahti L., Huovari J., Kainu M., and Biecek P. (2017). Retrieval and analysis of Eurostat open data with the eurostat package. The R Journal 9(1), pp. 385-392. doi: 10.32614/RJ-2017-019 Lahti, L., Huovari J., Kainu M., Biecek P., Hernangomez D., Antal D., and Kantanen P. (2023). eurostat: Tools for Eurostat Open Data [Computer software]. R package version 4.0.0. https://github.com/rOpenGov/eurostat To see these entries in BibTeX format, use 'print(<citation>, bibtex=TRUE)', 'toBibtex(.)', or set 'options(citation.bibtex.max=999)'.
When citing data downloaded from Eurostat, see section "Citing Eurostat data"
in get_eurostat()
documentation.
See Also
get_eurostat()
, convert_time_col()
, eurotime2date()
Examples
## Not run:
# Example of a dataset with multiple time series
get_eurostat("AVIA_GOR_ME",
  time_format = "date_last",
  cache = FALSE
)
## End(Not run)
Count number of children
Description
Determine how many children a certain TOC item (usually a folder) has.
Usage
toc_count_children(code)
Arguments
code |
Eurostat TOC item code (folder, dataset, table) |
Author(s)
Pyry Kantanen
See Also
get_eurostat_toc()
toc_count_children()
toc_determine_hierarchy()
toc_list_children()
toc_count_whitespace()
Count white space at the start of the title
Description
Counts the number of white space characters at the start of the string.
Usage
toc_count_whitespace(input_string)
Arguments
input_string |
A string containing Eurostat TOC titles |
Details
Used in the toc_determine_hierarchy() function to determine hierarchy. Hierarchy is defined in Eurostat .txt format TOC files by the number of leading white space characters, in steps of four. For example, "    Foo" (4 white space characters) is one level higher than "        Bar" (8 white space characters). "Database by themes" (0 white space characters before the first alphanumeric character) is highest in the hierarchy.
The function issues a warning if the number of leading white space characters is not a multiple of 4: 0, 4, 8, ... are acceptable but 3, 6, 10, ... are not.
Value
Numeric (number of white space characters)
Author(s)
Pyry Kantanen
See Also
get_eurostat_toc()
toc_count_children()
toc_determine_hierarchy()
toc_list_children()
toc_count_whitespace()
Examples
strings <- c("    abc", "        cdf", "no_spaces") # 4, 8 and 0 leading spaces
for (string in strings) {
  whitespace_count <- eurostat:::toc_count_whitespace(string)
  cat("String:", string, "\tWhitespace Count:", whitespace_count, "\n")
}
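The counting rule described in the Details can be sketched in base R as follows. This is a minimal sketch, not the package's implementation: the function name count_leading_ws is hypothetical, and the wording of the warning is an assumption.

```r
# Hypothetical helper: count white space characters at the start of each string.
count_leading_ws <- function(x) {
  # Length of the original minus length after stripping leading white space.
  n <- nchar(x) - nchar(sub("^\\s+", "", x))
  # Assumed behaviour: warn when a count is not a multiple of 4.
  if (any(n %% 4 != 0)) {
    warning("Leading white space is not a multiple of 4")
  }
  n
}

titles <- c("Database by themes", "    Foo", "        Bar")
counts <- count_leading_ws(titles)
print(counts) # 0 4 8
```

The helper is vectorised, so a whole column of TOC titles can be passed at once.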
Determine level in hierarchy
Description
Divides the number of spaces before the first alphanumeric character by 4 and uses the result to determine the level in the hierarchy. The top level is 0.
Usage
toc_determine_hierarchy(input_string)
Arguments
input_string |
A string containing Eurostat TOC titles |
Details
Uses toc_count_whitespace() to count leading white space. Hierarchy is defined in Eurostat .txt format TOC files by the number of leading white space characters, in steps of four. For example, "    Foo" (4 white space characters) is one level higher than "        Bar" (8 white space characters). "Database by themes" (0 white space characters before the first alphanumeric character) is highest in the hierarchy.
The function issues a warning if the number of leading white space characters is not a multiple of 4: 0, 4, 8, ... are acceptable but 3, 6, 10, ... are not.
Value
Numeric
Author(s)
Pyry Kantanen
See Also
get_eurostat_toc()
toc_count_children()
toc_determine_hierarchy()
toc_list_children()
toc_count_whitespace()
Examples
strings <- c("    abc", "        cdf", "no_spaces") # 4, 8 and 0 leading spaces
eurostat:::toc_determine_hierarchy(strings)
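The arithmetic behind the hierarchy level (leading spaces divided by 4) can be sketched without the package. The function name hierarchy_level is hypothetical and the code is an illustration of the rule stated in the Description, not the package's own source.

```r
# Hypothetical helper: hierarchy level = number of leading spaces / 4.
hierarchy_level <- function(x) {
  # regexpr() reports the length of the (possibly empty) run of leading spaces.
  n_spaces <- attr(regexpr("^ *", x), "match.length")
  n_spaces / 4
}

titles <- c("Database by themes", "    Foo", "        Bar")
lv <- hierarchy_level(titles)
print(lv) # 0 1 2
```

A title with no leading spaces maps to level 0, matching the statement that the top level is 0.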
List children
Description
List children of a specific folder.
Usage
toc_list_children(code)
Arguments
code |
Eurostat TOC item code (folder, dataset, table) |
Author(s)
Pyry Kantanen
See Also
get_eurostat_toc()
toc_count_children()
toc_determine_hierarchy()
toc_list_children()
toc_count_whitespace()
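One way to picture listing and counting children is to walk a TOC table that is stored in document order and carries a hierarchy level per row. The sketch below is an assumption about how such a lookup could work, not the package's implementation: the toy table, its codes, the column names code and hierarchy, and the function list_children_sketch are all hypothetical.

```r
# Hypothetical TOC in document order; hierarchy 0 is the top level.
toc <- data.frame(
  code = c("root", "folder_a", "sub_a1", "table_x", "folder_b", "table_y"),
  hierarchy = c(0, 1, 2, 3, 1, 2),
  stringsAsFactors = FALSE
)

# Direct children: rows after the parent, one level deeper, up to the
# next row that sits at the parent's level or higher.
list_children_sketch <- function(toc, code) {
  i <- match(code, toc$code)
  if (is.na(i)) stop("Unknown code: ", code)
  parent_level <- toc$hierarchy[i]
  idx <- seq_len(nrow(toc))
  after <- idx > i
  # The first later row at the parent's level or above ends the subtree.
  end <- which(after & toc$hierarchy <= parent_level)
  last <- if (length(end) > 0) end[1] - 1 else nrow(toc)
  in_subtree <- after & idx <= last
  toc$code[in_subtree & toc$hierarchy == parent_level + 1]
}

children <- list_children_sketch(toc, "root")
print(children)         # "folder_a" "folder_b"
print(length(children)) # 2
```

Counting children is then just the length of the listed vector; an item with nothing nested beneath it yields an empty character vector.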