Help for package cbsREPS

Type:

Package

Title:

Hedonic and Multilateral Index Methods for Real Estate Price Statistics

Version:

0.1.0

Maintainer:

Vivek Gajadhar <v.gajadhar@cbs.nl>

Description:

Compute price indices using various hedonic and multilateral methods, including Laspeyres, Paasche, Fisher, and HMTS (Hedonic Multilateral Time series re-estimation with splicing). The central function calculate_price_index() offers a unified interface for running these methods on structured datasets. This package is designed to support index construction workflows for real estate and other domains where quality-adjusted price comparisons over time are essential. The development of this package was funded by Eurostat and Statistics Netherlands (CBS), and carried out by Statistics Netherlands. The HMTS method implemented here is described in Ishaak, Ouwehand and Remøy (2024) <doi:10.1177/0282423X241246617>. For broader methodological context, see Eurostat (2013, ISBN:978-92-79-25984-5, <doi:10.2785/34007>).

License:

GPL-2

Encoding:

UTF-8

LazyData:

true

RoxygenNote:

7.3.1

Depends:

R (≥ 4.4.0)

Imports:

dplyr, stats, assertthat, KFAS, stringr

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0)

Config/testthat/edition:

VignetteBuilder:

knitr

NeedsCompilation:

Packaged:

2025-04-25 09:20:19 UTC; VGAR

Author:

Farley Ishaak [aut], Pim Ouwehand [aut], David Pietersz [aut], Liu Nuo Su [aut], Cynthia Cao [aut], Mohammed Kardal [aut], Odens van der Zwan [aut], Vivek Gajadhar [aut, cre]

Repository:

CRAN

Date/Publication:

2025-04-25 15:10:01 UTC

Calculate direct index according to the Fisher hedonic double imputation method

Description

Based on the parameters 'dependent_variable', 'continuous_variables' and 'categorical_variables', a regression model is compiled. With this model, a direct series of index figures is estimated by hedonic regression.

Usage

calculate_fisher(
  dataset,
  period_variable,
  dependent_variable,
  continuous_variables,
  categorical_variables,
  reference_period = NULL,
  number_of_observations = FALSE
)

Arguments

dataset

table with data (does not need to be a selection of relevant variables)

period_variable

variable in the table with periods

dependent_variable

usually the sale price

continuous_variables

vector with quality determining numeric variables (no dummies)

categorical_variables

vector with quality determining categorical variables (also dummies)

reference_period

period or group of periods that will be set to 100 (numeric/string)

number_of_observations

include the number of observations per period? (default = FALSE)

Details

N.B.: the independent variables must be entered in the parameters already transformed (and ready to use). Hence, do not provide log(floor_area); instead, transform the variable in advance and provide log_floor_area. This does not apply to the dependent variable, which should be entered untransformed.

Within the data, it is not necessary to filter the data on relevant variables or complete records. This is taken care of in the function.
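To illustrate the N.B. above, the following minimal sketch prepares a hypothetical data frame `sales` (its column names and values are made up) by adding a log-transformed floor area before any index function is called. Only the independent variable is transformed; `price` is left as is.

```r
# Hypothetical example data; column names and values are illustrative only
sales <- data.frame(
  period     = c("2020Q1", "2020Q1", "2020Q2", "2020Q2"),
  price      = c(250000, 310000, 265000, 330000),
  floor_area = c(80, 110, 85, 120)
)

# Transform the independent variable in advance ...
sales$log_floor_area <- log(sales$floor_area)

# ... and refer to the transformed column by name; the dependent
# variable "price" stays untransformed
continuous_variables <- "log_floor_area"
dependent_variable   <- "price"
```

The name "log_floor_area" would then be passed as continuous_variables, and "price" as dependent_variable.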

Value

table with index, imputation averages, number of observations and confidence intervals per period
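The package examples describe the Fisher index as the geometric mean of the Laspeyres and Paasche indices. A minimal sketch of that final combination step, assuming both input series are already on the same reference (reference period = 100); the helper `fisher_from_lp()` is hypothetical and not part of cbsREPS:

```r
# Fisher index as the geometric mean of a Laspeyres and a Paasche index.
# Both inputs are assumed to share the same reference (e.g. reference period = 100),
# so sqrt(100 * 100) = 100 keeps the result on that scale.
fisher_from_lp <- function(laspeyres, paasche) {
  stopifnot(length(laspeyres) == length(paasche))
  sqrt(laspeyres * paasche)
}

laspeyres <- c(100, 104, 110)
paasche   <- c(100, 106, 108)
fisher    <- fisher_from_lp(laspeyres, paasche)
```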

Author(s)

Farley Ishaak

Calculate the geometric average of a series of values

Description

The equation for the calculation is: exp(mean(log(values)))
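The formula above translates directly into base R. In this sketch the name `geometric_average()` is hypothetical, and the positivity check is an addition that the package function may not perform.

```r
# Geometric average: exp(mean(log(values)))
geometric_average <- function(values) {
  stopifnot(is.numeric(values), all(values > 0))  # log() needs positive values
  exp(mean(log(values)))
}

geometric_average(c(2, 8))     # approximately 4
geometric_average(c(1, 3, 9))  # approximately 3
```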

Usage

calculate_geometric_average(values)

Arguments

values

series with numeric values

Value

geometric average

Author(s)

Farley Ishaak

Calculate imputation averages with the 1st period as base period

Description

Prices are estimated based on a provided hedonic model. The model values are calculated for each period in the data. With these values, new prices of base-period observations are estimated. With this function, imputations according to the Laspeyres and Paasche methods can be estimated.
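A minimal sketch of Laspeyres-type imputation averages with the first period as base period, in base R. It is not the package's code: it assumes one log-linear model per period, imputed prices equal to exp(predicted log price), an arithmetic mean, and period labels that sort chronologically. The toy data are noise-free, so the resulting index reproduces the period effects exactly. The name `imputation_averages()` is hypothetical.

```r
# For every period p: fit a hedonic model on period-p sales, then impute
# period-p prices for the dwellings sold in the base (first) period.
imputation_averages <- function(data, period_var, price_var, x_vars) {
  periods <- sort(unique(as.character(data[[period_var]])))
  form <- as.formula(paste0("log(", price_var, ") ~ ",
                            paste(x_vars, collapse = " + ")))
  base_obs <- data[data[[period_var]] == periods[1], , drop = FALSE]
  sapply(periods, function(p) {
    fit <- lm(form, data = data[data[[period_var]] == p, , drop = FALSE])
    mean(exp(predict(fit, newdata = base_obs)))  # average imputed price
  })
}

# Noise-free toy data: log price = 10 + 0.3 * x + period effect
delta <- c(0, 0.05, 0.10)
toy <- data.frame(period = rep(c("2020Q1", "2020Q2", "2020Q3"), each = 4),
                  x = c(1, 2, 3, 4, 2, 3, 5, 6, 1, 4, 4, 7))
toy$price <- exp(10 + 0.3 * toy$x + rep(delta, each = 4))

averages  <- imputation_averages(toy, "period", "price", "x")
laspeyres <- 100 * averages / averages[1]  # matches 100 * exp(delta) for this toy data
```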

Usage

calculate_hedonic_imputation(
  dataset_temp = dataset,
  period_temp = "period_var_temp",
  dependent_variable_temp = dependent_variable,
  independent_variables_temp = independent_variables,
  number_of_observations_temp = number_of_observations,
  period_list_temp = period_list
)

Arguments

dataset_temp

table with data

period_temp

name of the period variable in dataset_temp

dependent_variable_temp

usually the sale price

independent_variables_temp

vector with quality determining variables

period_list_temp

list with all available periods

Value

Table with imputation averages per period

Author(s)

Farley Ishaak

Calculate a matrix with hedonic imputation averages, re-estimated time series imputation averages and corresponding index series.

Description

Based on a hedonic model, a series of imputed values is calculated in the steps below:

1: For every period, average imputed prices are estimated with the 1st period as base period.
2: The above is repeated for each possible base period. This results in as many series as there are periods.
3: All series are re-estimated with a time series model (state space). This step can optionally be skipped with a parameter (state_space_model = NULL).
4: The series of imputed values are transformed into index series.

This matrix can be used for index calculations according to the HMTS method.
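The sketch below mirrors steps 1, 2 and 4 under simplifying assumptions (one log-linear model per period, arithmetic means of imputed prices, chronologically sortable period labels); step 3, the state space re-estimation, is skipped. Normalising each row on its own base period is an assumption about step 4, not a description of the package's code, and `imputation_matrix()` is a hypothetical name.

```r
imputation_matrix <- function(data, period_var, price_var, x_vars) {
  periods <- sort(unique(as.character(data[[period_var]])))
  form <- as.formula(paste0("log(", price_var, ") ~ ",
                            paste(x_vars, collapse = " + ")))
  # One hedonic model per period
  fits <- lapply(periods, function(p)
    lm(form, data = data[data[[period_var]] == p, , drop = FALSE]))
  n <- length(periods)
  imputed <- matrix(NA_real_, n, n, dimnames = list(base = periods, period = periods))
  # Steps 1-2: average imputed prices, once for every base period (row)
  for (b in seq_len(n)) {
    base_obs <- data[data[[period_var]] == periods[b], , drop = FALSE]
    for (t in seq_len(n)) {
      imputed[b, t] <- mean(exp(predict(fits[[t]], newdata = base_obs)))
    }
  }
  # Step 4: divide row b by imputed[b, b], so each base period equals 100
  list(imputed = imputed, index = 100 * imputed / diag(imputed))
}

# Noise-free toy data: log price = 10 + 0.3 * x + period effect
delta <- c(0, 0.05, 0.10)
toy <- data.frame(period = rep(c("2020Q1", "2020Q2", "2020Q3"), each = 4),
                  x = c(1, 2, 3, 4, 2, 3, 5, 6, 1, 4, 4, 7))
toy$price <- exp(10 + 0.3 * toy$x + rep(delta, each = 4))

res <- imputation_matrix(toy, "period", "price", "x")
res$index  # row b, column t: 100 * exp(delta[t] - delta[b])
```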

Usage

calculate_hedonic_imputationmatrix(
  dataset,
  period_variable,
  dependent_variable,
  continuous_variables,
  categorical_variables,
  periods_in_year,
  number_of_observations = TRUE,
  production_since = NULL,
  number_preliminary_periods
)

Arguments

dataset

table with data (does not need to be a selection of relevant variables)

period_variable

variable in the dataset with the period

dependent_variable

usually the sale price

continuous_variables

vector with quality-determining continuous variables (numeric, no dummies)

categorical_variables

vector with categorical variables (also dummy)

periods_in_year

if month, then 12. If quarter, then 4, etc. (default = 4)

number_of_observations

number of observations per period (default = TRUE)

production_since

a single period, in the format of the period_variable. See Details (default = NULL)

number_preliminary_periods

number of periods for which the index is preliminary. Only used if production_since is not NULL (default = 3)

Details

Parameter 'production_since': to simulate a series in which one period at a time elapses (as in production), a starting period in the past can be chosen manually. Up to this period, all periods are imputed at once. After that, one period at a time is added.
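A small sketch of which periods would enter each simulated production run, assuming period labels sort chronologically and that the first run covers all periods up to and including production_since. `production_runs()` is a hypothetical helper, not a package function.

```r
production_runs <- function(periods, production_since) {
  periods <- sort(unique(as.character(periods)))
  start <- match(production_since, periods)
  if (is.na(start)) stop("production_since is not one of the periods")
  # First run: all periods up to production_since; then one more period per run
  lapply(seq(start, length(periods)), function(end) periods[seq_len(end)])
}

runs <- production_runs(
  c("2020Q1", "2020Q2", "2020Q3", "2020Q4", "2021Q1", "2021Q2"),
  production_since = "2020Q4"
)
lengths(runs)  # 4 5 6
```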

Value

$Matrix_HMTS_index: table with index series based on estimations with time series re-estimations
$Matrix_HMTS: table with estimated values based on time series re-estimations
$Matrix_HMS_index: table with index series based on estimations with the hedonic model
$Matrix_HMS: table with estimated values based on the hedonic model
$Matrix_HMTS_analysis: table with analysis values of the time series model per base period

Author(s)

Farley Ishaak

Calculate HMTS index (Hedonic Multilateral Time series re-estimation Splicing)

Description

Based on a hedonic model, an index is calculated in the steps below. See also Ishaak, Ouwehand, Remøy & De Haan (2023).

1: For each period, average imputed prices are calculated with the first period as base period.
2: Step 1 is repeated for every possible base period. This results in as many series of imputed values as there are periods.
3: All series of imputed prices are re-estimated with a Kalman filter (a state space time series model). This step can be turned off with a parameter.
4: The series of imputed values are transformed into index series.
5: A window of index figures, specified by a parameter, is selected to continue in the calculation. This step can be turned off with a parameter.
6: For the remaining index figures, the geometric average per period is calculated. These averages form the final index.
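Steps 4 to 6 can be sketched in base R for the case where the window step (step 5) is turned off: the index series (one row per base period) are combined by a geometric average per period (column). Rebasing the result so that the first period equals 100 is an assumption; `hmts_combine()` is a hypothetical name, not the package's implementation.

```r
# Combine index series (rows = base periods, columns = periods) into one index
hmts_combine <- function(index_matrix) {
  stopifnot(is.matrix(index_matrix), all(index_matrix > 0))
  geo <- exp(colMeans(log(index_matrix)))  # geometric average per period
  100 * geo / geo[1]                       # assumed rebasing: first period = 100
}

delta <- c(0, 0.05, 0.10)
# Toy matrix: row b holds 100 * exp(delta[t] - delta[b])
index_matrix <- 100 * exp(outer(-delta, delta, "+"))
hmts_combine(index_matrix)
```

Because the rows of this toy matrix differ only by their base, the combined index recovers 100 * exp(delta); with real data the rows disagree and the geometric average smooths over them.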

Usage

calculate_hmts(
  dataset,
  period_variable,
  dependent_variable,
  continuous_variables,
  categorical_variables,
  reference_period,
  periods_in_year,
  production_since = NULL,
  number_preliminary_periods,
  number_of_observations,
  resting_points
)

Arguments

period_variable

variable in the dataset with the period

dependent_variable

usually the sale price

continuous_variables

vector with quality-determining continuous variables (numeric, no dummies)

categorical_variables

vector with categorical variables (also dummy)

reference_period

period or group of periods that will be set to 100 (numeric/string)

periods_in_year

if month, then 12. If quarter, then 4, etc. (default = 4)

production_since

1 period in the format of the period_variable. See description above (default = NULL)

number_preliminary_periods

number of periods for which the index is preliminary. Only used if production_since is not NULL (default = 3)

number_of_observations

number of observations per period (default = TRUE)

resting_points

should analysis values be returned? (default = FALSE)

Details

Parameter 'resting_points': if TRUE, the output is a list of tables. These tables can be accessed with $ after the output:

$Index: table with periods, index and number of observations
$Window: table with the index figures within the chosen window
$Chosen_index_series: table with index series before the window splice
$Matrix_HMTS_index: table with index series based on re-estimated imputations (time series model)
$Matrix_HMTS: table with re-estimated imputations (time series model)
$Matrix_HMS_index: table with index series based on estimated imputations (hedonic model)
$Matrix_HMS: table with estimated imputations (hedonic model)
$Matrix_HMTS_analyse: table with diagnostic values of the time series model per base period

Value

table with periods, index (and optional confidence intervals) and number of observations. If resting_points = TRUE, then list with tables. See general description and examples.

Author(s)

Farley Ishaak

Calculate HMTS index only (Hedonic Multilateral Time series re-estimation Splicing)

Description

Usage

calculate_hmts_index(
  dataset,
  period_variable,
  dependent_variable,
  continuous_variables,
  categorical_variables,
  reference_period,
  periods_in_year,
  production_since = NULL,
  number_preliminary_periods,
  number_of_observations = NULL,
  resting_points
)

Arguments

period_variable

variable in the dataset with the period

dependent_variable

usually the sale price

continuous_variables

vector with quality-determining continuous variables (numeric, no dummies)

categorical_variables

vector with categorical variables (also dummy)

reference_period

period or group of periods that will be set to 100 (numeric/string)

periods_in_year

if month, then 12. If quarter, then 4, etc. (default = 4)

production_since

1 period in the format of the period_variable. See description above (default = NULL)

number_preliminary_periods

number of periods for which the index is preliminary. Only used if production_since is not NULL (default = 3)

number_of_observations

number of observations per period (default = TRUE)

resting_points

should analysis values be returned? (default = FALSE)

Details

Parameter 'resting_points': if TRUE, the output is a list of tables. These tables can be accessed with $ after the output:

$Index: table with periods, index and number of observations
$Window: table with the index figures within the chosen window
$Chosen_index_series: table with index series before the window splice
$Matrix_HMTS_index: table with index series based on re-estimated imputations (time series model)
$Matrix_HMTS: table with re-estimated imputations (time series model)
$Matrix_HMS_index: table with index series based on estimated imputations (hedonic model)
$Matrix_HMS: table with estimated imputations (hedonic model)
$Matrix_HMTS_analyse: table with diagnostic values of the time series model per base period

Value

table with periods, index and number of observations. If resting_points = TRUE, then list with tables. See general description and examples.

Author(s)

Farley Ishaak

Transform series into index

Description

The index can be calculated in two ways:

from a series of values
from a series of growth rates (from_growth_rate = TRUE)

Usage

calculate_index(periods, values, reference_period = NULL)

Arguments

periods

vector/variable with periods (numeric/string)

values

vector/variable with to be transformed values (numeric)

reference_period

period or group of periods that will be set to 100 (numeric/string)

Details

N.B. with from_growth_rate: the series of growth rates must be the same length as the series of values. The vector should therefore also contain a growth rate for the first period (this is likely 1). This first growth rate is not used in the calculation.

N.B. for the reference period: by default, the first value is set to 100. A different reference period can be provided in the parameter. The reference period can also be part of a period label, e.g. if the series contains months (2019jan, 2019feb), the reference period can be a year (2019).
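A minimal sketch of the transformation described above. Assumptions: the reference value is the arithmetic mean of the values in the reference period(s), and a reference such as 2019 matches every period label starting with "2019". The function name `to_index()` is hypothetical, and `from_growth_rate` is included only as a hypothetical argument because it does not appear in the Usage above.

```r
to_index <- function(periods, values, reference_period = NULL,
                     from_growth_rate = FALSE) {
  periods <- as.character(periods)
  if (from_growth_rate) {
    # Chain the growth rates; the first growth rate is ignored
    values <- cumprod(c(1, values[-1]))
  }
  if (is.null(reference_period)) {
    in_ref <- seq_along(values) == 1  # default: first value = 100
  } else {
    # A period belongs to the reference if its label starts with any reference
    in_ref <- Reduce(`|`, lapply(as.character(reference_period),
                                 function(r) startsWith(periods, r)))
  }
  if (!any(in_ref)) stop("reference_period not found in periods")
  100 * values / mean(values[in_ref])
}

periods <- c("2019jan", "2019feb", "2020jan")
to_index(periods, c(10, 30, 40))                           # 100 300 400
to_index(periods, c(10, 30, 40), reference_period = 2019)  # 50 150 200
to_index(periods, c(1, 1.1, 1.2), from_growth_rate = TRUE) # 100 110 132
```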

Value

Index series

Author(s)

Farley Ishaak

Calculate direct index according to the Laspeyres hedonic double imputation method

Description

Usage

calculate_laspeyres(
  dataset,
  period_variable,
  dependent_variable,
  continuous_variables,
  categorical_variables,
  reference_period = NULL,
  index = TRUE,
  number_of_observations = FALSE,
  imputation = FALSE
)

Arguments

dataset

table with data (does not need to be a selection of relevant variables)

period_variable

variable in the table with periods

dependent_variable

usually the sale price

continuous_variables

vector with quality determining numeric variables (no dummies)

categorical_variables

vector with quality determining categorical variables (also dummies)

reference_period

period or group of periods that will be set to 100 (numeric/string)

index

include the price index column? (default = TRUE)

number_of_observations

include the number of observations per period? (default = FALSE)

imputation

display the underlying average imputation values? (default = FALSE)

Details

Within the data, it is not necessary to filter the data on relevant variables or complete records. This is taken care of in the function.

Value

table with index, imputation averages, number of observations and confidence intervals per period

Author(s)

Farley Ishaak

Calculate direct index according to the Paasche hedonic double imputation method

Description

Usage

calculate_paasche(
  dataset,
  period_variable,
  dependent_variable,
  continuous_variables,
  categorical_variables,
  reference_period = NULL,
  index = TRUE,
  number_of_observations = FALSE,
  imputation = FALSE
)

Arguments

dataset

table with data (does not need to be a selection of relevant variables)

period_variable

variable in the table with periods

dependent_variable

usually the sale price

continuous_variables

vector with quality determining numeric variables (no dummies)

categorical_variables

vector with quality determining categorical variables (also dummies)

reference_period

period or group of periods that will be set to 100 (numeric/string)

index

include the price index column? (default = TRUE)

number_of_observations

include the number of observations per period? (default = FALSE)

imputation

display the underlying average imputation values? (default = FALSE)

Details

Within the data, it is not necessary to filter the data on relevant variables or complete records. This is taken care of in the function.
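For comparison with the Laspeyres method, a minimal Paasche-type double imputation sketch: for each period, the dwellings sold in that period are valued with their own period's model and with the base-period model. It assumes one log-linear model per period, arithmetic means of imputed prices and chronologically sortable period labels; `paasche_index()` is hypothetical rather than the package's implementation.

```r
paasche_index <- function(data, period_var, price_var, x_vars) {
  periods <- sort(unique(as.character(data[[period_var]])))
  form <- as.formula(paste0("log(", price_var, ") ~ ",
                            paste(x_vars, collapse = " + ")))
  base_fit <- lm(form, data = data[data[[period_var]] == periods[1], , drop = FALSE])
  sapply(periods, function(p) {
    current <- data[data[[period_var]] == p, , drop = FALSE]
    fit <- lm(form, data = current)
    # Period-p dwellings: imputed with the period-p model vs. the base model
    100 * mean(exp(predict(fit, newdata = current))) /
      mean(exp(predict(base_fit, newdata = current)))
  })
}

# Noise-free toy data: log price = 10 + 0.3 * x + period effect
delta <- c(0, 0.05, 0.10)
toy <- data.frame(period = rep(c("2020Q1", "2020Q2", "2020Q3"), each = 4),
                  x = c(1, 2, 3, 4, 2, 3, 5, 6, 1, 4, 4, 7))
toy$price <- exp(10 + 0.3 * toy$x + rep(delta, each = 4))

paasche <- paasche_index(toy, "period", "price", "x")
```

With noise-free toy data, the Laspeyres and Paasche variants coincide; with real data they generally differ, which is what the Fisher index averages over.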

Value

table with index, imputation averages, number of observations and confidence intervals per period

Author(s)

Farley Ishaak

Calculate index based on specified method (Fisher, Laspeyres, Paasche, HMTS)

Description

Central hub function to calculate index figures using different methods.

Usage

calculate_price_index(
  method,
  dataset,
  period_variable,
  dependent_variable,
  continuous_variables,
  categorical_variables,
  reference_period = NULL,
  number_of_observations = TRUE,
  periods_in_year = 4,
  production_since = NULL,
  number_preliminary_periods = 3,
  resting_points = FALSE,
  index = TRUE,
  imputation = FALSE
)

Arguments

method

one of: "fisher", "laspeyres", "paasche", "hmts"

dataset

data frame with input data

period_variable

name of the variable indicating time periods

dependent_variable

usually the price

continuous_variables

vector with numeric quality-determining variables

categorical_variables

vector with categorical variables (also dummies)

reference_period

period or group of periods that will be set to 100

number_of_observations

show number of observations? Default = TRUE

periods_in_year

(HMTS only) number of periods per year (e.g. 12 for months)

production_since

(HMTS only) start period for production simulation

number_preliminary_periods

(HMTS only) number of preliminary periods

resting_points

(HMTS only) return detailed outputs? Default = FALSE

index

(Laspeyres/Paasche only) include index column? Default = TRUE

imputation

(Laspeyres/Paasche only) include imputation values? Default = FALSE

Value

A data.frame (or a list when method = "hmts" and resting_points = TRUE)

Author(s)

Vivek Gajadhar

Examples

library(cbsREPS)

# Laspeyres index
Tbl_Laspeyres <- calculate_price_index(
  method = "laspeyres",
  dataset = data_constraxion,
  period_variable = "period",
  dependent_variable = "price",
  continuous_variables = "floor_area",
  categorical_variables = "neighbourhood_code",
  reference_period = 2015,
  number_of_observations = TRUE,
  imputation = FALSE
)
head(Tbl_Laspeyres)

# Paasche index
Tbl_Paasche <- calculate_price_index(
  method = "paasche",
  dataset = data_constraxion,
  period_variable = "period",
  dependent_variable = "price",
  continuous_variables = "floor_area",
  categorical_variables = "neighbourhood_code",
  reference_period = 2015,
  number_of_observations = TRUE,
  imputation = FALSE
)
head(Tbl_Paasche)

# Fisher index (geometric mean of Laspeyres and Paasche)
Tbl_Fisher <- calculate_price_index(
  method = "fisher",
  dataset = data_constraxion,
  period_variable = "period",
  dependent_variable = "price",
  continuous_variables = "floor_area",
  categorical_variables = "neighbourhood_code",
  reference_period = 2015,
  number_of_observations = TRUE
)
head(Tbl_Fisher)

Calculate the trend line for a provided time series of numeric values

Description

Calculate the trend line with the state space method for a provided time series (chronological order is assumed). The series are calculated with the package KFAS.
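The package uses KFAS for this step. As a swapped-in illustration of the same idea (a Kalman filter followed by a backward smoothing pass), the sketch below fits a local linear trend model with base R's `stats::StructTS()` and smooths it with `tsSmooth()`. It is not the package's model: the five hyperparameters of select_state_space_model() are not reproduced, `periodicity` only sets the `ts` frequency here (a seasonal component would need `type = "BSM"`), and `trend_line_sketch()` is a hypothetical name.

```r
trend_line_sketch <- function(original_series, periodicity = 4) {
  y <- ts(original_series, frequency = periodicity)
  fit <- StructTS(y, type = "trend")  # ML estimates of level, slope and noise variances
  smoothed <- tsSmooth(fit)           # Kalman filter plus backward smoothing pass
  as.numeric(smoothed[, 1])           # first state = smoothed level, used as trend line
}

set.seed(1)
series <- 100 + 0.8 * (1:40) + rnorm(40, sd = 1.5)
trend <- trend_line_sketch(series)
```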

Usage

calculate_trend_line_kfas(original_series, periodicity, resting_points)

Arguments

original_series

time series with values in chronological order

periodicity

if month, then 12. If quarter, then 4, etc. (default = 4)

resting_points

should analysis values be returned? (default = FALSE)

Value

Trend line

Author(s)

Pim Ouwehand, Farley Ishaak

Default update function

Description

This function is used in calculate_trend_line_kfas().

Usage

custom_update_function(params, model)

Arguments

params

starting values

model

state space model number

Value

New model

Author(s)

Vivek Gajadhar

A real estate example dataframe

Description

A subset of data from a fictitious real estate data frame containing transaction prices and some categorical and numerical characteristics of each dwelling.

Usage

data_constraxion

Format

A data frame with 7,800 rows and 6 columns:

period: A (string) vector indicating a time period
price: A (string) vector indicating the transaction price of the dwelling
floor_area: A real-valued vector of (the logarithm of) the floor area of the dwelling
dist_trainstation: A real-valued vector of (the logarithm of) the distance of the dwelling to the nearest train station
neighbourhood_code: A categorical code/string referring to the neighbourhood the dwelling belongs to
dummy_large_city: A vector indicating whether the dwelling belongs to a large city or not

Source

A fictitious dataset for illustration purposes

Examples

data(data_constraxion)
head(data_constraxion)

Determine initial parameters

Description

Determine starting values within state space models. This function is used in calculate_trend_line_kfas().

Usage

determine_initial_parameters(
  model,
  initial_values,
  FUN = custom_update_function
)

Arguments

model

model values as output of the function select_state_space_model()

initial_values

$initial.values as output of the model

FUN

update function to use (default = custom_update_function)

Value

New initial starting values

Author(s)

Pim Ouwehand, Farley Ishaak

Estimate time series parameters

Description

Estimate parameters for the trend lines. This function is used in calculate_trend_line_kfas().

Usage

estimate_ts_parameters(model, initial_values)

Arguments

model

model values as output of the function select_state_space_model()

initial_values

$initial.values as output of the model

Value

Parameters for the time series model

Author(s)

Pim Ouwehand, Farley Ishaak

Select the state space model type

Description

This function is used in calculate_trend_line_kfas().

Usage

select_state_space_model(series, initial_values_all)

Arguments

series

time series with values in chronological order

initial_values_all

start values for 5 hyperparameters: meas, level, slope, seas, scaling

Value

model values (level, slope) of the chosen state space model and the provided time series

Author(s)

Pim Ouwehand

Set starting values for hyperparameters in state space models

Description

Set starting values for hyperparameters in state space models

Usage

set_startvalues(a, b, c, d, e)

Value

starting values for hyperparameters

Run forward and backward pass of time series estimation

Description

Calculate a trend line based on a provided model. This function is used in calculate_trend_line_kfas().

Usage

smooth_ts(fittedmodel)

Arguments

fittedmodel

model values as output of the function estimate_ts_parameters()

Value

A list containing multiple elements; sub-list signalsubconf[, 1] provides the estimated trend line.

Author(s)

Pim Ouwehand, Farley Ishaak

Validate Input Data for Hedonic Index Calculation

Description

This function checks whether the dataset contains all required variables, whether the dependent and continuous variables are numeric, and whether the period variable is formatted correctly (e.g., "2020Q1", "2020M01"). It ensures that the data is suitable for further processing in hedonic index calculations.

Usage

validate_input(
  dataset,
  period_variable,
  dependent_variable,
  continuous_variables,
  categorical_variables
)

Arguments

dataset

A data.frame containing the dataset to be validated.

period_variable

A string specifying the name of the period variable column.

dependent_variable

A string specifying the name of the dependent variable (usually the sale price).

continuous_variables

A character vector with names of numeric quality-determining variables.

categorical_variables

A character vector with names of categorical variables (including dummies).

Value

Returns TRUE invisibly if all checks pass. Otherwise, an error is thrown.
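A minimal sketch of the three checks described above, in base R. The accepted period patterns ("2020Q1", "2020M01") are taken from the examples in the Description and are an assumption; the actual function may accept other formats. `validate_input_sketch()` is a hypothetical name.

```r
validate_input_sketch <- function(dataset, period_variable, dependent_variable,
                                  continuous_variables, categorical_variables) {
  # 1. All required variables must be present
  required <- c(period_variable, dependent_variable,
                continuous_variables, categorical_variables)
  missing_vars <- setdiff(required, names(dataset))
  if (length(missing_vars) > 0) {
    stop("Missing variables: ", paste(missing_vars, collapse = ", "))
  }
  # 2. Dependent and continuous variables must be numeric
  for (v in c(dependent_variable, continuous_variables)) {
    if (!is.numeric(dataset[[v]])) stop("Variable '", v, "' must be numeric")
  }
  # 3. Period labels must look like 2020Q1 or 2020M01 (assumed patterns)
  ok_period <- grepl("^[0-9]{4}(Q[1-4]|M(0[1-9]|1[0-2]))$",
                     as.character(dataset[[period_variable]]))
  if (!all(ok_period)) stop("Period variable must look like 2020Q1 or 2020M01")
  invisible(TRUE)
}

ok <- data.frame(period = c("2020Q1", "2020Q2"), price = c(250000, 265000),
                 floor_area = c(80, 85), neighbourhood_code = c("A", "B"))
validate_input_sketch(ok, "period", "price", "floor_area", "neighbourhood_code")
```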

Author(s)

David Pietersz