---
title: "Temporal Data Manipulation with m61r"
author: "pv71u98h1"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Temporal Data Manipulation with m61r}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
library(m61r)

```

# Introduction

Leveraging the memory efficiency of Base R, the `m61r` package provides tools for manipulating time series, such as expanding time intervals into discrete slots or performing window-based aggregations.

# Temporal Function Reference

In `m61r`, temporal operations rely on R primitives executed inside `mutate()` and `summarise()`.

| Operation | Base R Expression (`m61r`) |
| --- | --- |
| Cast to `Date` | `~as.Date(col)` |
| Cast to datetime (`POSIXct`) | `~as.POSIXct(col)` |
| Datetime sequence | `~seq(from, to, by="1 day")` |
| Extract Hour | `~as.POSIXlt(col)$hour` |
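
The expressions in the table are ordinary base R calls, so their behaviour can be checked outside of `m61r`. The following is a minimal sketch in plain R; the vector `col` and its values are made up for illustration, and the time zone is fixed to UTC only to make the result reproducible.

```r
# Base R only; `col` is a hypothetical character column.
col <- c("2025-01-01 08:15:00", "2025-01-02 17:45:00")

d  <- as.Date(col)                    # Date: the time of day is dropped
dt <- as.POSIXct(col, tz = "UTC")     # POSIXct: seconds since 1970-01-01 UTC
hr <- as.POSIXlt(dt)$hour             # integer hours in 0-23, here 8 and 17

# One element per day, starting at the first timestamp
days <- seq(dt[1], dt[2], by = "1 day")

class(d)
class(dt)
hr
days
```

Inside `mutate()` the same calls are written as one-sided formulas (`~as.Date(col)`), as shown in the table.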

# Formatting and Component Extraction

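The `mutate()` and `transmutate()` methods allow for the creation of derived columns using the standard `strftime`-style codes accepted by `format()`. Note that `format()` always returns character strings, so wrap the call in `as.integer()` when a numeric component is needed.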

| Component | R Format Code | Example Expression |
| --- | --- | --- |
| Year | `%Y` | `year = ~format(dt, "%Y")` |
| Month (numeric) | `%m` | `month = ~format(dt, "%m")` |
| Day of the week | `%w` (0-6, Sunday = 0) | `weekday = ~format(dt, "%w")` |
| Hour (24h) | `%H` | `hour = ~format(dt, "%H")` |
| Grouping Key | `%Y-%m` | `ym_key = ~format(dt, "%Y-%m")` |
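
To make the format codes concrete, here is a small base R sketch applied to a single timestamp. The value of `dt` is an arbitrary example (a Sunday), and UTC is used only so that the output does not depend on the local time zone.

```r
# Base R only; `dt` is an illustrative value, not taken from any dataset.
dt <- as.POSIXct("2023-03-05 14:30:00", tz = "UTC")  # a Sunday

year    <- format(dt, "%Y")     # "2023"
month   <- format(dt, "%m")     # "03"
weekday <- format(dt, "%w")     # "0" (Sunday)
hour    <- format(dt, "%H")     # "14"
ym_key  <- format(dt, "%Y-%m")  # "2023-03"

# format() returns character; convert when a number is required
hour_num <- as.integer(hour)    # 14L
```

Keys such as `%Y-%m` are zero-padded, so sorting them as strings also sorts them chronologically.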

# Practical Examples

## Component Extraction

```{r example1}
raw_data <- data.frame(
  timestamp = as.POSIXct(c("2023-01-01 10:00:00", "2023-01-01 11:30:00")),
  value = c(10, 20)
)
data <- m61r(raw_data)

data$mutate(
  year  = ~format(timestamp, "%Y"),
  hour  = ~as.POSIXlt(timestamp)$hour
)

```

# Interval Expansion

When each record carries a "start" and an "end" time, a useful way to analyse the data is to "explode" every interval into one row per time slot.

## 1. Creating Sequences

```{r step1}
df_intervals <- data.frame(
  id = 1,
  start = as.POSIXct("2025-01-01 08:00"),
  end   = as.POSIXct("2025-01-01 10:00"),
  load  = 100
)

p <- m61r(df_intervals)
p$mutate(slot = ~Map(function(s, e) seq(s, e, by = "hour"), start, end))

```

## 2. Structural Explosion

The `explode` method flattens the list-column, duplicating other column values for each element in the sequence.

```{r step2}
p$explode("slot")
p$head()

```
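
For readers who want to see what this reshaping amounts to, the following base R sketch builds the same kind of list of sequences and flattens it by hand. It is only an illustration of the idea, not the implementation used by `m61r`; the data frame `df` and its second row are invented for the example.

```r
# Base R sketch of "one row per slot"; not m61r's implementation.
df <- data.frame(
  id    = 1:2,
  start = as.POSIXct(c("2025-01-01 08:00", "2025-01-01 13:00"), tz = "UTC"),
  end   = as.POSIXct(c("2025-01-01 10:00", "2025-01-01 14:00"), tz = "UTC"),
  load  = c(100, 50)
)

# One hourly sequence per row, built with the same Map() call as above
slots <- Map(function(s, e) seq(s, e, by = "hour"), df$start, df$end)

# Repeat each row index once per element of its sequence
idx <- rep(seq_len(nrow(df)), lengths(slots))

exploded <- df[idx, c("id", "load")]
exploded$slot <- do.call(c, slots)   # c() on POSIXct keeps the class
rownames(exploded) <- NULL
exploded
```

The first interval (08:00 to 10:00) yields three rows and the second (13:00 to 14:00) yields two, because `seq()` includes both endpoints.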

# Approximate Matching with As-Of Joins

As-of joins are essential for synchronizing two time-series where timestamps do not match exactly.

```{r as_of_example}
prices <- data.frame(
  ts = as.POSIXct("2025-01-01 08:00") + c(0, 3600),
  val = c(10, 12)
)

df_sync <- data.frame(ts = as.POSIXct("2025-01-01 09:30"), event = "Trade")
p_sync <- m61r(df_sync)
p_sync$join_asof(prices, by_x = "ts", by_y = "ts", direction = "backward")

```
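
The logic of a backward as-of match can be sketched in base R with `findInterval()`. This is a hedged illustration of the general technique, assuming that "backward" means "the most recent reference row at or before the event"; it is not taken from the `m61r` source, and the name `events` is introduced here for the example.

```r
# Base R sketch of a backward as-of lookup; not m61r's implementation.
prices <- data.frame(
  ts  = as.POSIXct("2025-01-01 08:00", tz = "UTC") + c(0, 3600),
  val = c(10, 12)
)
events <- data.frame(
  ts    = as.POSIXct("2025-01-01 09:30", tz = "UTC"),
  event = "Trade"
)

# findInterval() requires the lookup times in increasing order
prices <- prices[order(prices$ts), ]

# Index of the last price with prices$ts <= events$ts (0 when there is none)
i <- findInterval(as.numeric(events$ts), as.numeric(prices$ts))
i[i == 0] <- NA   # events earlier than every price get no match

events$val <- prices$val[i]
events
```

Here the 09:30 trade picks up the 09:00 price (`val = 12`), since that is the latest quote not after the event.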

# Date Extraction Example

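The `m61r` object handles `Date` columns in the same way as `POSIXct` columns. Note that `as.POSIXlt()` stores the year as an offset from 1900, hence the `+ 1900` in the expression below.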

```{r step4}
df_dates <- data.frame(
  id = 1:2,
  entry = as.Date(c("2020-01-01", "2021-01-01")),
  exit  = as.Date(c("2020-06-01", "2021-06-01"))
)

p_dates <- m61r(df_dates)
p_dates$mutate(year = ~as.POSIXlt(entry)$year + 1900)
p_dates$head()

```
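
The year arithmetic can be verified in plain R. The sketch below compares the `POSIXlt` route with the `format()` route on the same two dates; both are base R and neither depends on `m61r`.

```r
# Base R only: two equivalent ways to extract the calendar year from a Date.
entry <- as.Date(c("2020-01-01", "2021-01-01"))

year_lt  <- as.POSIXlt(entry)$year + 1900    # POSIXlt counts years since 1900
year_fmt <- as.integer(format(entry, "%Y"))  # parse the formatted string

year_lt
year_fmt
```

Both give 2020 and 2021; the `format()` version returns integers, while adding `1900` produces a double vector.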

# Time-Based Aggregation

A common task in time-series analysis is to bin data into specific time intervals (e.g., hourly or daily) and compute statistics. The following example demonstrates how to create a "bin" key and use it for grouping.

## Hourly Summary Example

```{r temporal_aggregation}
# Create a dataset with random timestamps within a day
set.seed(123)
ts_data <- data.frame(
  time = as.POSIXct("2025-01-01 00:00:00") + runif(50, 0, 86400),
  consumption = rnorm(50, 500, 100)
)

p_agg <- m61r(ts_data)

p_agg$mutate(hour_bin = ~format(time, "%Y-%m-%d %H:00"))
p_agg$group_by(~hour_bin)
p_agg$summarise(
  n_obs = ~length(consumption),
  avg_load = ~mean(consumption)
)

p_agg$head(5)
```
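
As a cross-check, the same hourly summary can be computed with base R's `tapply()`. This is a sketch of an equivalent calculation, not a description of how `m61r` groups data internally; the name `hourly` is introduced here, and UTC is fixed only for reproducibility.

```r
# Base R sketch of the hourly summary; not m61r's implementation.
set.seed(123)
ts_data <- data.frame(
  time = as.POSIXct("2025-01-01 00:00:00", tz = "UTC") + runif(50, 0, 86400),
  consumption = rnorm(50, 500, 100)
)

# Same string key as above: the timestamp truncated to the hour
ts_data$hour_bin <- format(ts_data$time, "%Y-%m-%d %H:00")

# Count and mean per bin; hours without observations do not appear
n   <- tapply(ts_data$consumption, ts_data$hour_bin, length)
avg <- tapply(ts_data$consumption, ts_data$hour_bin, mean)

hourly <- data.frame(
  hour_bin = names(n),
  n_obs    = as.vector(n),
  avg_load = as.vector(avg)
)
head(hourly, 5)
```

Because the key is zero-padded, the bins come out in chronological order. If empty hours must be reported too, the bins would need to be generated separately (for example with `seq()`) and joined back.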

# Conclusion

By staying true to Base R, `m61r` ensures that your temporal workflows remain portable, fast, and light.