---
title: "fossil_commecometrics"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{fossil_commecometrics}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(commecometrics)
library(ggplot2)
library(sf)
```
## 1. Overview
The `commecometrics` package offers a flexible framework for modeling trait–environment relationships and applying these models to fossil data. This vignette demonstrates how the `commecometrics` package can reconstruct past environments from fossil community traits. Environmental conditions such as precipitation or vegetation class can be estimated at fossil localities based on where their trait combinations fall within the modern trait space. For a full description of the modern trait summarization and model fitting workflow, please refer to the introductory vignette (`introduction_to_commecometrics`).

## 2. Data Loading

We begin by loading:

- **Modern data:**
  + `samplingPoints`: Community locations and environmental conditions  
  + `traits`: Species-level trait data  
  + `geography`: Species range shapefiles  

- **Fossil data:**
  + `fossils`: Mean and standard deviation of trait values for seven North American fossil sites

```{r set-working-directory, echo=FALSE, eval = FALSE}
options(timeout = 600)
download.file("https://ndownloader.figshare.com/files/56228033", destfile = "data.zip", mode = "wb")
unzip("data.zip")
```

```{r eval = FALSE}
samplingPoints <- read.csv("data/sampling_points.csv")
traits <- read.csv("data/traits.csv")
fossils <- read.csv("data/fossil_RBL.csv")
head(fossils)
```

```{r load-shapefile, message = FALSE, eval = FALSE}
geography <- sf::st_read("data/data_0.shp", quiet = TRUE)
geography$SCI_NAME <- gsub(" ", "_", geography$SCI_NAME)
```

## 3. Summarize Traits by Sampling Point
We calculate community-level trait mean and standard standard deviations by intersecting species ranges with community locations: 
``` {r eval = FALSE}
traitsByPoint <- summarize_traits_by_point(
  points_df = samplingPoints,
  trait_df = traits,
  species_polygons = geography,
  trait_column = "RBL",
  species_name_col = "SCI_NAME",
  continent = FALSE,
  parallel = FALSE
)
```

## 4. Run the ecometric model
We model precipitation as a function of trait mean and standard deviation, in this case, manually setting a 25 x 25 grid:
``` {r eval = FALSE}
ecoModel <- ecometric_model(
  points_df = traitsByPoint$points,
  env_var = "precip",
  transform_fun = function(x) log(x + 1),
  inv_transform_fun = function(x) exp(x) - 1,
  grid_bins_1 = 25,
  grid_bins_2 = 25,
  min_species = 3
)

summary(ecoModel$model)

print(ecoModel$correlation)
```

## 5. Visualize the modern ecometric space
This plot illustrates the trait–environment relationship across modern communities. Each grid cell (bin) corresponds to a specific range of community mean and standard deviation values. Communities whose trait summaries fall within that range are grouped into the same bin. The color of each bin represents the estimated environmental value (e.g., log-precipitation).
``` {r fig.width=5, fig.height=4, eval = FALSE}
ecoPlot <- ecometric_space(
  model_out = ecoModel,
  env_name = "Precipitation (log)",
  x_label = "Community mean",
  y_label = "Community standard deviation"
) 

print(ecoPlot)
```

## 6. Reconstruct Past Environments from Fossils
We now apply the trained model to fossil trait data to estimate paleoenvironmental conditions at each site. Fossils are matched to the closest trait bin, and predictions are only returned for analog trait combinations. Sites falling outside the trait space observed in the modern training data are flagged as non-analog. Here, we see Brynjulfson and Friesenhahn Caves are expected to have had the highest precipitation, whereas January Cave is expected to have had the lowest precipitation.
``` {r eval = FALSE}
recon <- reconstruct_env(
  fossildata = fossils,
  model_out = ecoModel,
  match_nearest = TRUE,
  fossil_lon = "Long",
  fossil_lat = "Lat",
  modern_id = "GlobalID",
  modern_lon = "Longitude",
  modern_lat = "Latitude"
)

head(recon[, c("Site", "fossil_env_est_UN", "fossil_minlimit_UN", "fossil_maxlimit_UN")])
```

## 7. Plot fossil ecometric space
We can also overlay fossil communities in the ecometric space with the modern communities. Black boxes represent fossil site trait distributions; red boxes represent modern communities from the same geographic coordinates.

``` {r fig.width=5, fig.height=4, eval = FALSE}
fossilPlot <- ecometric_space(
  model_out = ecoModel,
  env_name = "Precipitation (log mm)",
  fossil_data = recon, 
  x_label = "Community mean",
  y_label = "Community standard deviation"
)

print(fossilPlot)
```

## 8. Conclusion
This vignette demonstrates how `commecometrics` can be used to reconstruct past environmental conditions based on fossil community trait distributions. Ecometric models trained on modern data are used to estimate paleoenvironmental variables such as precipitation or vegetation type at fossil localities. This approach provides a powerful way to connect traits preserved in the fossil record with environmental gradients.

For a complete overview of the full workflow used to build trait–environment models from modern data, please refer to the companion vignette "`introduction-to-commecometrics`"