Type: | Package |
Title: | Hedonic and Multilateral Index Methods for Real Estate Price Statistics |
Version: | 0.1.0 |
Maintainer: | Vivek Gajadhar <v.gajadhar@cbs.nl> |
Description: | Compute price indices using various Hedonic and multilateral methods, including Laspeyres, Paasche, Fisher, and HMTS (Hedonic Multilateral Time series re-estimation with splicing). The central function calculate_price_index() offers a unified interface for running these methods on structured datasets. This package is designed to support index construction workflows for real estate and other domains where quality-adjusted price comparisons over time are essential. The development of this package was funded by Eurostat and Statistics Netherlands (CBS), and carried out by Statistics Netherlands. The HMTS method implemented here is described in Ishaak, Ouwehand and Remøy (2024) <doi:10.1177/0282423X241246617>. For broader methodological context, see Eurostat (2013, ISBN:978-92-79-25984-5, <doi:10.2785/34007>). |
License: | GPL-2 |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.3.1 |
Depends: | R (≥ 4.4.0) |
Imports: | dplyr, stats, assertthat, KFAS, stringr |
Suggests: | knitr, rmarkdown, testthat (≥ 3.0.0) |
Config/testthat/edition: | 3 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2025-04-25 09:20:19 UTC; VGAR |
Author: | Farley Ishaak [aut], Pim Ouwehand [aut], David Pietersz [aut], Liu Nuo Su [aut], Cynthia Cao [aut], Mohammed Kardal [aut], Odens van der Zwan [aut], Vivek Gajadhar [aut, cre] |
Repository: | CRAN |
Date/Publication: | 2025-04-25 15:10:01 UTC |
Calculate direct index according to the Fisher hedonic double imputation method
Description
A regression model is compiled from the parameters 'dependent_variable', 'continuous_variables' and 'categorical_variables'. With this model, a direct series of index figures is estimated by means of hedonic regression.
Usage
calculate_fisher(
dataset,
period_variable,
dependent_variable,
continuous_variables,
categorical_variables,
reference_period = NULL,
number_of_observations = FALSE
)
Arguments
dataset |
table with data (does not need to be a selection of relevant variables) |
period_variable |
variable in the table with periods |
dependent_variable |
usually the sale price |
continuous_variables |
vector with quality determining numeric variables (no dummies) |
categorical_variables |
vector with quality determining categorical variables (also dummies) |
reference_period |
period or group of periods that will be set to 100 (numeric/string) |
number_of_observations |
show the number of observations per period? (default = FALSE) |
Details
N.B.: the independent variables must be entered already transformed in the parameters. Hence, do not pass log(floor_area); transform the variable in advance and then provide log_floor_area. This does not apply to the dependent variable, which should be entered untransformed.
It is not necessary to filter the data on relevant variables or complete records beforehand. This is taken care of in the function.
Value
table with index, imputation averages, number of observations and confidence intervals per period
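As the package examples note, the Fisher index is the geometric mean of the Laspeyres and Paasche indices. The sketch below illustrates only that final combination step; the two input series are hypothetical numbers, not output of this package.

```r
# Hypothetical Laspeyres and Paasche index series (reference period = 100)
laspeyres <- c(100, 104, 110)
paasche   <- c(100, 102, 108)

# Fisher index: geometric mean of the two series, period by period
fisher <- sqrt(laspeyres * paasche)
round(fisher, 2)
```

By construction, the Fisher figure for each period lies between the Laspeyres and Paasche figures.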
Author(s)
Farley Ishaak
Calculate the geometric average of a series of values
Description
The geometric average is calculated as: exp(mean(log(series_values)))
Usage
calculate_geometric_average(values)
Arguments
values |
series with numeric values |
Value
geometric average
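The formula in the Description can be tried out in base R. This is a minimal sketch; the helper name geometric_average is hypothetical and is used here only to avoid suggesting it is the package's own implementation. It assumes all values are strictly positive, since the logarithm is taken.

```r
# Geometric average via the log-mean-exp formula from the Description
geometric_average <- function(values) {
  exp(mean(log(values)))
}

# Example: the geometric average of 2 and 8 (their product is 16)
result <- geometric_average(c(2, 8))
result
```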
Author(s)
Farley Ishaak
Calculate imputation averages with the 1st period as base period
Description
Prices are estimated based on a provided hedonic model. The model values are calculated for each period in the data. With these values, new prices of base period observations are estimated. With this function, imputations according to the Laspeyres and Paasche methods can be estimated.
Usage
calculate_hedonic_imputation(
dataset_temp = dataset,
period_temp = "period_var_temp",
dependent_variable_temp = dependent_variable,
independent_variables_temp = independent_variables,
number_of_observations_temp = number_of_observations,
period_list_temp = period_list
)
Arguments
dataset_temp |
table with data |
period_temp |
'period' |
dependent_variable_temp |
usually the sale price |
independent_variables_temp |
vector with quality determining variables |
period_list_temp |
list with all available periods |
Value
Table with imputation averages per period
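The idea of re-pricing base period observations with each period's model can be sketched in base R. This is an illustration under stated assumptions, not the package's code: the data are synthetic, the model is assumed to be log-linear (log(price) regressed on a pre-transformed characteristic), and the imputation average is assumed to be the arithmetic mean of the back-transformed predictions.

```r
set.seed(42)
n <- 60
dwellings <- data.frame(
  period = rep(c("2019", "2020", "2021"), each = n),
  log_floor_area = log(runif(3 * n, 50, 200))
)
# Synthetic prices that rise by roughly 5% per period
growth <- c("2019" = 0, "2020" = 0.05, "2021" = 0.10)
dwellings$price <- exp(7 + 0.9 * dwellings$log_floor_area +
                         growth[dwellings$period] + rnorm(3 * n, sd = 0.05))

periods <- unique(dwellings$period)
# Observations of the first period are the ones to be re-priced
base <- dwellings[dwellings$period == periods[1], ]

# Fit one hedonic model per period and predict prices for the base observations
imputation_averages <- sapply(periods, function(p) {
  model <- lm(log(price) ~ log_floor_area,
              data = dwellings[dwellings$period == p, ])
  mean(exp(predict(model, newdata = base)))
})
imputation_averages
```

Because the same base period dwellings are priced in every period, changes in the averages reflect price change rather than changes in the mix of dwellings sold.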
Author(s)
Farley Ishaak
Calculate a matrix with hedonic imputation averages, re-estimated time series imputation averages and corresponding index series.
Description
Based on a hedonic model, a series of imputed values is calculated in the steps below:
1: For every period, average imputed prices are estimated with the first period as base period.
2: Step 1 is repeated for each possible base period. This results in as many series as there are periods.
3: All series are re-estimated with a time series model (state space). This step can optionally be skipped with a parameter (state_space_model = NULL).
4: The series of imputed values are transformed into index series.
This matrix can be used for an index calculation according to the HMTS method.
Usage
calculate_hedonic_imputationmatrix(
dataset,
period_variable,
dependent_variable,
continuous_variables,
categorical_variables,
periods_in_year,
number_of_observations = TRUE,
production_since = NULL,
number_preliminary_periods
)
Arguments
dataset |
table with data (does not need to be a selection of relevant variables) |
period_variable |
variable in the dataset with the period |
dependent_variable |
usually the sale price |
continuous_variables |
vector with quality-determining continuous variables (numeric, no dummies) |
categorical_variables |
vector with categorical variables (also dummy) |
periods_in_year |
if month, then 12. If quarter, then 4, etc. (default = 4) |
number_of_observations |
number of observations per period (default = TRUE) |
production_since |
one period, in the format of the period_variable; see Details (default = NULL) |
number_preliminary_periods |
number of periods for which the index is preliminary. Only used if production_since is not NULL (default = 3) |
Details
Parameter 'production_since': To simulate a series in which one period at a time becomes available (as in production), a starting period in the past can be chosen manually. Up to this period, all periods are imputed at once. After that, one period is added at a time.
Value
$Matrix_HMTS_index: table with index series based on estimations with time series re-estimations
$Matrix_HMTS: table with estimated values based on time series re-estimations
$Matrix_HMS_index: table with index series based on estimations with the hedonic model
$Matrix_HMS: table with estimated values based on the hedonic model
$Matrix_HMTS_analysis: table with analysis values of the time series model per base period
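Steps 1, 2 and 4 of the Description can be sketched in base R as follows. This is a simplified illustration on synthetic data, not the package's implementation: it assumes a log-linear hedonic model, skips the time series re-estimation (step 3), and scales every series so that its first period equals 100, which may differ from the base the package uses.

```r
set.seed(1)
periods <- c("2020Q1", "2020Q2", "2020Q3", "2020Q4")
n <- 50
sales <- data.frame(
  period = rep(periods, each = n),
  log_floor_area = log(runif(length(periods) * n, 50, 200))
)
trend <- setNames(c(0, 0.02, 0.03, 0.05), periods)
sales$price <- exp(7 + 0.9 * sales$log_floor_area + trend[sales$period] +
                     rnorm(nrow(sales), sd = 0.05))

# One hedonic model per period
models <- lapply(periods, function(p) {
  lm(log(price) ~ log_floor_area, data = sales[sales$period == p, ])
})
names(models) <- periods

# Rows: base period whose dwellings are re-priced; columns: pricing period
imputation_matrix <- sapply(periods, function(p) {
  sapply(periods, function(b) {
    mean(exp(predict(models[[p]], newdata = sales[sales$period == b, ])))
  })
})

# Step 4: turn each row into an index series (first period = 100)
index_matrix <- imputation_matrix / imputation_matrix[, 1] * 100
round(index_matrix, 1)
```

Each row is one imputation series, so the matrix has as many rows as there are periods.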
Author(s)
Farley Ishaak
Calculate HMTS index (Hedonic Multilateral Time series re-estimation Splicing)
Description
Based on a hedonic model, an index is calculated in the steps below. See also Ishaak, Ouwehand, Remoy & De Haan (2023).
1: For each period, average imputed prices are calculated with the first period as base period.
2: Step 1 is repeated for every possible base period. This results in as many series of imputed values as there are periods.
3: All series with imputed prices are re-estimated with a Kalman filter (also known as a time series model or state space model). This step can be turned off with a parameter.
4: The series of imputed values are transformed into index series.
5: A window of index figures, specified with a parameter, is selected to continue in the calculation. This step can be turned off with a parameter.
6: Of the remaining index figures, the geometric average per period is calculated. The resulting index figures form the final index.
Usage
calculate_hmts(
dataset,
period_variable,
dependent_variable,
continuous_variables,
categorical_variables,
reference_period,
periods_in_year,
production_since = NULL,
number_preliminary_periods,
number_of_observations,
resting_points
)
Arguments
period_variable |
variable in the dataset with the period |
dependent_variable |
usually the sale price |
continuous_variables |
vector with quality-determining continuous variables (numeric, no dummies) |
categorical_variables |
vector with categorical variables (also dummy) |
reference_period |
period or group of periods that will be set to 100 (numeric/string) |
periods_in_year |
if month, then 12. If quarter, then 4, etc. (default = 4) |
production_since |
one period, in the format of the period_variable; see Details (default = NULL) |
number_preliminary_periods |
number of periods for which the index is preliminary. Only used if production_since is not NULL (default = 3) |
number_of_observations |
number of observations per period (default = TRUE) |
resting_points |
should analysis values be returned? (default = FALSE) |
Details
Parameter 'production_since': To simulate a series in which one period at a time becomes available (as in production), a starting period in the past can be chosen manually. Up to this period, all periods are imputed at once. After that, one period is added at a time.
Parameter 'resting_points': If TRUE, the output is a list of tables. Each table can be accessed with $ on the output:
$Index: table with periods, index and number of observations
$Window: table with the index figures within the chosen window
$Chosen_index_series: table with index series before the window splice
$Matrix_HMTS_index: table with index series based on re-estimated imputations (time series model)
$Matrix_HMTS: table with re-estimated imputations (time series model)
$Matrix_HMS_index: table with index series based on estimated imputations (hedonic model)
$Matrix_HMS: table with estimated imputations (hedonic model)
$Matrix_HMTS_analyse: table with diagnostic values of the time series model per base period
Value
Table with periods, index (and optional confidence intervals) and number of observations. If resting_points = TRUE, a list of tables is returned instead. See the general description and examples. The list includes the imputation matrix tables:
$Matrix_HMTS_index: table with index series based on estimations with time series re-estimations
$Matrix_HMTS: table with estimated values based on time series re-estimations
$Matrix_HMS_index: table with index series based on estimations with the hedonic model
$Matrix_HMS: table with estimated values based on the hedonic model
$Matrix_HMTS_analysis: table with analysis values of the time series model per base period
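The last step of the method, taking the geometric average per period over several index series, can be sketched as follows. The matrix below is hypothetical, and the sketch deliberately leaves out the window selection and splicing; it only illustrates the averaging and assumes all series share the same reference period.

```r
# Hypothetical index series: one row per base period, one column per period
index_matrix <- rbind(
  base_p1 = c(100, 102, 105, 107),
  base_p2 = c(100, 103, 104, 108),
  base_p3 = c(100, 101, 106, 107)
)
colnames(index_matrix) <- c("p1", "p2", "p3", "p4")

# Geometric average per period (column) over all series
combined_index <- apply(index_matrix, 2, function(x) exp(mean(log(x))))
round(combined_index, 2)
```

For each period, the combined figure lies between the lowest and highest figure of the individual series.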
Author(s)
Farley Ishaak
Calculate HMTS index only (Hedonic Multilateral Time series re-estimation Splicing)
Description
Based on a hedonic model, an index is calculated in the steps below. See also Ishaak, Ouwehand, Remoy & De Haan (2023).
1: For each period, average imputed prices are calculated with the first period as base period.
2: Step 1 is repeated for every possible base period. This results in as many series of imputed values as there are periods.
3: All series with imputed prices are re-estimated with a Kalman filter (also known as a time series model or state space model). This step can be turned off with a parameter.
4: The series of imputed values are transformed into index series.
5: A window of index figures, specified with a parameter, is selected to continue in the calculation. This step can be turned off with a parameter.
6: Of the remaining index figures, the geometric average per period is calculated. The resulting index figures form the final index.
Usage
calculate_hmts_index(
dataset,
period_variable,
dependent_variable,
continuous_variables,
categorical_variables,
reference_period,
periods_in_year,
production_since = NULL,
number_preliminary_periods,
number_of_observations = NULL,
resting_points
)
Arguments
period_variable |
variable in the dataset with the period |
dependent_variable |
usually the sale price |
continuous_variables |
vector with quality-determining continuous variables (numeric, no dummies) |
categorical_variables |
vector with categorical variables (also dummy) |
reference_period |
period or group of periods that will be set to 100 (numeric/string) |
periods_in_year |
if month, then 12. If quarter, then 4, etc. (default = 4) |
production_since |
one period, in the format of the period_variable; see Details (default = NULL) |
number_preliminary_periods |
number of periods for which the index is preliminary. Only used if production_since is not NULL (default = 3) |
number_of_observations |
number of observations per period (default = TRUE) |
resting_points |
should analysis values be returned? (default = FALSE) |
Details
Parameter 'production_since': To simulate a series in which one period at a time becomes available (as in production), a starting period in the past can be chosen manually. Up to this period, all periods are imputed at once. After that, one period is added at a time.
Parameter 'resting_points': If TRUE, the output is a list of tables. Each table can be accessed with $ on the output:
$Index: table with periods, index and number of observations
$Window: table with the index figures within the chosen window
$Chosen_index_series: table with index series before the window splice
$Matrix_HMTS_index: table with index series based on re-estimated imputations (time series model)
$Matrix_HMTS: table with re-estimated imputations (time series model)
$Matrix_HMS_index: table with index series based on estimated imputations (hedonic model)
$Matrix_HMS: table with estimated imputations (hedonic model)
$Matrix_HMTS_analyse: table with diagnostic values of the time series model per base period
Value
Table with periods, index and number of observations. If resting_points = TRUE, a list of tables is returned instead. See the general description and examples. The list includes the imputation matrix tables:
$Matrix_HMTS_index: table with index series based on estimations with time series re-estimations
$Matrix_HMTS: table with estimated values based on time series re-estimations
$Matrix_HMS_index: table with index series based on estimations with the hedonic model
$Matrix_HMS: table with estimated values based on the hedonic model
$Matrix_HMTS_analysis: table with analysis values of the time series model per base period
Author(s)
Farley Ishaak
Transform series into index
Description
The index can be calculated in two ways:
from a series of values
from a series of mutations (from_growth_rate = TRUE)
Usage
calculate_index(periods, values, reference_period = NULL)
Arguments
periods |
vector/variable with periods (numeric/string) |
values |
vector/variable with to be transformed values (numeric) |
reference_period |
period or group of periods that will be set to 100 (numeric/string) |
Details
N.B. with from_growth_rate: The series of mutations must be as long as the series of values. The vector should therefore also contain a mutation for the first period (this is likely 1). In the calculation, this first mutation is not used.
N.B. for the reference period: By default, the first value is set to 100. A different reference period can be provided in the parameter. The reference period can also be a part of a period. E.g. if the series contains months (2019jan, 2019feb), the reference period can be a year (2019).
Value
Index series
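The reference period behaviour described in the Details can be illustrated with a small base R sketch. The helper to_index is hypothetical and is not the package function. It assumes that a partial reference period is matched on the start of the period label and that the base is the arithmetic mean of the values in the reference period, which may differ from what the package does.

```r
to_index <- function(periods, values, reference_period = NULL) {
  periods <- as.character(periods)
  if (is.null(reference_period)) {
    # Default: the first value is set to 100
    in_reference <- seq_along(periods) == 1
  } else {
    # Assumption: a partial period such as 2019 matches "2019jan", "2019feb"
    in_reference <- startsWith(periods, as.character(reference_period))
  }
  stopifnot(any(in_reference))
  values / mean(values[in_reference]) * 100
}

periods <- c("2019jan", "2019feb", "2020jan")
values <- c(200, 220, 231)

to_index(periods, values)                           # first period as reference
to_index(periods, values, reference_period = 2019)  # whole year 2019 as reference
```

With the year 2019 as reference, the index figures of the 2019 months average to 100.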
Author(s)
Farley Ishaak
Calculate direct index according to the Laspeyres hedonic double imputation method
Description
A regression model is compiled from the parameters 'dependent_variable', 'continuous_variables' and 'categorical_variables'. With this model, a direct series of index figures is estimated by means of hedonic regression.
Usage
calculate_laspeyres(
dataset,
period_variable,
dependent_variable,
continuous_variables,
categorical_variables,
reference_period = NULL,
index = TRUE,
number_of_observations = FALSE,
imputation = FALSE
)
Arguments
dataset |
table with data (does not need to be a selection of relevant variables) |
period_variable |
variable in the table with periods |
dependent_variable |
usually the sale price |
continuous_variables |
vector with quality determining numeric variables (no dummies) |
categorical_variables |
vector with quality determining categorical variables (also dummies) |
reference_period |
period or group of periods that will be set to 100 (numeric/string) |
index |
include the index column? (default = TRUE) |
number_of_observations |
show the number of observations per period? (default = FALSE) |
imputation |
display the underlying average imputation values? (default = FALSE) |
Details
N.B.: the independent variables must be entered already transformed in the parameters. Hence, do not pass log(floor_area); transform the variable in advance and then provide log_floor_area. This does not apply to the dependent variable, which should be entered untransformed.
Within the data, it is not necessary to filter the data on relevant variables or complete records. This is taken care of in the function.
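The note on transforming independent variables in advance can be illustrated with a short data preparation sketch. The data frame and its values are made up for illustration; only the pattern (create the transformed column first, then pass its name) follows from the text above.

```r
# Small made-up dataset with an untransformed floor area
dwellings <- data.frame(
  period = c("2019", "2019", "2020"),
  price = c(250000, 310000, 275000),
  floor_area = c(80, 120, 85)
)

# Transform the independent variable beforehand and store it as a new column
dwellings$log_floor_area <- log(dwellings$floor_area)

# The new column name, not an expression such as "log(floor_area)", is what
# would be passed on; the price column stays untransformed
continuous_variables <- "log_floor_area"
dependent_variable <- "price"
head(dwellings)
```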
Value
table with index, imputation averages, number of observations and confidence intervals per period
Author(s)
Farley Ishaak
Calculate direct index according to the Paasche hedonic double imputation method
Description
A regression model is compiled from the parameters 'dependent_variable', 'continuous_variables' and 'categorical_variables'. With this model, a direct series of index figures is estimated by means of hedonic regression.
Usage
calculate_paasche(
dataset,
period_variable,
dependent_variable,
continuous_variables,
categorical_variables,
reference_period = NULL,
index = TRUE,
number_of_observations = FALSE,
imputation = FALSE
)
Arguments
dataset |
table with data (does not need to be a selection of relevant variables) |
period_variable |
variable in the table with periods |
dependent_variable |
usually the sale price |
continuous_variables |
vector with quality determining numeric variables (no dummies) |
categorical_variables |
vector with quality determining categorical variables (also dummies) |
reference_period |
period or group of periods that will be set to 100 (numeric/string) |
index |
include the index column? (default = TRUE) |
number_of_observations |
show the number of observations per period? (default = FALSE) |
imputation |
display the underlying average imputation values? (default = FALSE) |
Details
N.B.: the independent variables must be entered already transformed in the parameters. Hence, do not pass log(floor_area); transform the variable in advance and then provide log_floor_area. This does not apply to the dependent variable, which should be entered untransformed.
Within the data, it is not necessary to filter the data on relevant variables or complete records. This is taken care of in the function.
Value
table with index, imputation averages, number of observations and confidence intervals per period
Author(s)
Farley Ishaak
Calculate index based on specified method (Fisher, Laspeyres, Paasche, HMTS)
Description
Central hub function to calculate index figures using different methods.
Usage
calculate_price_index(
method,
dataset,
period_variable,
dependent_variable,
continuous_variables,
categorical_variables,
reference_period = NULL,
number_of_observations = TRUE,
periods_in_year = 4,
production_since = NULL,
number_preliminary_periods = 3,
resting_points = FALSE,
index = TRUE,
imputation = FALSE
)
Arguments
method |
one of: "fisher", "laspeyres", "paasche", "hmts" |
dataset |
data frame with input data |
period_variable |
name of the variable indicating time periods |
dependent_variable |
usually the price |
continuous_variables |
vector with numeric quality-determining variables |
categorical_variables |
vector with categorical variables (also dummies) |
reference_period |
period or group of periods that will be set to 100 |
number_of_observations |
show number of observations? Default = TRUE |
periods_in_year |
(HMTS only) number of periods per year (e.g. 12 for months) |
production_since |
(HMTS only) start period for production simulation |
number_preliminary_periods |
(HMTS only) number of preliminary periods |
resting_points |
(HMTS only) return detailed outputs? Default = FALSE |
index |
(Laspeyres/Paasche only) include index column? Default = TRUE |
imputation |
(Laspeyres/Paasche only) include imputation values? Default = FALSE |
Value
A data.frame, or a list of tables when method = "hmts" and resting_points = TRUE
Author(s)
Vivek Gajadhar
Examples
# Laspeyres index
Tbl_Laspeyres <- calculate_price_index(
method = "laspeyres",
dataset = data_constraxion,
period_variable = "period",
dependent_variable = "price",
continuous_variables = "floor_area",
categorical_variables = "neighbourhood_code",
reference_period = 2015,
number_of_observations = TRUE,
imputation = FALSE
)
head(Tbl_Laspeyres)
# Paasche index
Tbl_Paasche <- calculate_price_index(
method = "paasche",
dataset = data_constraxion,
period_variable = "period",
dependent_variable = "price",
continuous_variables = "floor_area",
categorical_variables = "neighbourhood_code",
reference_period = 2015,
number_of_observations = TRUE,
imputation = FALSE
)
head(Tbl_Paasche)
# Fisher index (geometric mean of Laspeyres and Paasche)
Tbl_Fisher <- calculate_price_index(
method = "fisher",
dataset = data_constraxion,
period_variable = "period",
dependent_variable = "price",
continuous_variables = "floor_area",
categorical_variables = "neighbourhood_code",
reference_period = 2015,
number_of_observations = TRUE
)
head(Tbl_Fisher)
Calculate the trend line for a provided time series of numeric values
Description
Calculate the trend line with the state space method for a provided time series (chronological order is assumed). The series are calculated with the package KFAS.
Usage
calculate_trend_line_kfas(original_series, periodicity, resting_points)
Arguments
original_series |
time series with values in chronological order |
periodicity |
if month, then 12. If quarter, then 4, etc. (default = 4) |
resting_points |
should analysis values be returned? (default = FALSE) |
Value
Trend line
Author(s)
Pim Ouwehand, Farley Ishaak
Default update function
Description
This function is used in the function: calculate_trend_line_kfas()
Usage
custom_update_function(params, model)
Arguments
params |
start values |
model |
state space model number |
Value
New model
Author(s)
Vivek Gajadhar
A real estate example dataframe
Description
A subset of data from a fictitious real estate data frame containing transaction prices and some categorical and numerical characteristics of each dwelling.
Usage
data_constraxion
Format
A data frame with 7,800 rows and 6 columns:
- period
A (string) vector indicating a time period
- price
A (string) vector indicating the transaction price of the dwelling
- floor_area
A real-valued vector of (the logarithm of) the floor area of the dwelling
- dist_trainstation
A real-valued vector of (the logarithm of) the distance of the dwelling to the nearest train station
- neighbourhood_code
A categorical code/string referring to the neighbourhood the dwelling belongs to
- dummy_large_city
A vector indicating whether the dwelling belongs to a large city or not
Source
A fictitious dataset for illustration purposes
Examples
data(data_constraxion)
head(data_constraxion)
Determine_initial_parameters
Description
Determine start values within state space models. This function is used in the function: calculate_trend_line_kfas()
Usage
determine_initial_parameters(
model,
initial_values,
FUN = custom_update_function
)
Arguments
model |
model values as output of the function select_state_space_model() |
initial_values |
$initial.values as output of the model |
FUN |
function called: custom_update_function |
Value
New initial startvalues
Author(s)
Pim Ouwehand, Farley Ishaak
Estimate time series parameters
Description
Estimate the parameters needed to estimate trend lines. This function is used in the function: calculate_trend_line_kfas()
Usage
estimate_ts_parameters(model, initial_values)
Arguments
model |
model values as output of the function select_state_space_model() |
initial_values |
$initial.values as output of the model |
Value
Parameter for the time series model
Author(s)
Pim Ouwehand, Farley Ishaak
Select the state space model type
Description
This function is used in the function: calculate_trend_line_kfas()
Usage
select_state_space_model(series, initial_values_all)
Arguments
series |
time series with values in chronological order |
initial_values_all |
start values for 5 hyperparameters: meas, level, slope, seas, scaling |
Value
model values (level, slope) of the chosen state space model and the provided time series
Author(s)
Pim Ouwehand
Set starting values for hyperparameters in state space models
Description
Set starting values for hyperparameters in state space models
Usage
set_startvalues(a, b, c, d, e)
Value
starting values for hyperparameters
Run forward and backward pass of time series estimation
Description
Calculate a trend line based on a provided model. This function is used in the function: calculate_trend_line_kfas()
Usage
smooth_ts(fittedmodel)
Arguments
fittedmodel |
model values as output of the function estimate_ts_parameters() |
Value
A list containing multiple elements; sub-list signalsubconf[, 1]
provides the estimated trend line.
Author(s)
Pim Ouwehand, Farley Ishaak
Validate Input Data for Hedonic Index Calculation
Description
This function checks whether the dataset contains all required variables, whether the dependent and continuous variables are numeric, and whether the period variable is formatted correctly (e.g., "2020Q1", "2020M01"). It ensures that the data is suitable for further processing in hedonic index calculations.
Usage
validate_input(
dataset,
period_variable,
dependent_variable,
continuous_variables,
categorical_variables
)
Arguments
dataset |
A data.frame containing the dataset to be validated. |
period_variable |
A string specifying the name of the period variable column. |
dependent_variable |
A string specifying the name of the dependent variable (usually the sale price). |
continuous_variables |
A character vector with names of numeric quality-determining variables. |
categorical_variables |
A character vector with names of categorical variables (including dummies). |
Value
Returns TRUE invisibly if all checks pass. Otherwise, an error is thrown.
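The kinds of checks described above can be sketched in base R. The helper check_input is hypothetical and is not the package's validate_input(). In particular, the accepted period formats (a plain year, "2020Q1" or "2020M01") are an assumption derived from the examples in this manual.

```r
check_input <- function(dataset, period_variable, dependent_variable,
                        continuous_variables, categorical_variables) {
  # 1. All required variables must be present
  required <- c(period_variable, dependent_variable,
                continuous_variables, categorical_variables)
  missing_columns <- setdiff(required, names(dataset))
  if (length(missing_columns) > 0) {
    stop("Missing variables: ", paste(missing_columns, collapse = ", "))
  }
  # 2. Dependent and continuous variables must be numeric
  numeric_columns <- c(dependent_variable, continuous_variables)
  is_numeric <- vapply(dataset[numeric_columns], is.numeric, logical(1))
  if (!all(is_numeric)) {
    stop("Not numeric: ", paste(numeric_columns[!is_numeric], collapse = ", "))
  }
  # 3. Assumed period formats: "2020", "2020Q1" or "2020M01"
  pattern <- "^[0-9]{4}(Q[1-4]|M(0[1-9]|1[0-2]))?$"
  if (!all(grepl(pattern, as.character(dataset[[period_variable]])))) {
    stop("Unexpected period format")
  }
  invisible(TRUE)
}

example_data <- data.frame(
  period = c("2020Q1", "2020Q2"),
  price = c(250000, 260000),
  log_floor_area = log(c(80, 95)),
  neighbourhood_code = c("A", "B")
)
ok <- check_input(example_data, "period", "price",
                  "log_floor_area", "neighbourhood_code")
ok
```

Returning TRUE invisibly keeps the console quiet on success while still allowing the result to be used in a condition.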
Author(s)
David Pietersz