--- title: "Getting Started with HCUPtools" author: "Vikrant Dev Rathore" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting Started with HCUPtools} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", eval = FALSE, fig.width = 7, fig.height = 5 ) ``` `HCUPtools` is an R package for accessing and working with resources from the **Agency for Healthcare Research and Quality (AHRQ) Healthcare Cost and Utilization Project (HCUP)**. This vignette provides a comprehensive guide to using the package for common healthcare data analysis tasks. ## Installation and Setup ```{r} # Install from CRAN install.packages("HCUPtools") # Load the package library(HCUPtools) library(dplyr) # For data manipulation examples ``` ## Part 1: Downloading CCSR Mapping Files The Clinical Classifications Software Refined (CCSR) is a tool developed by AHRQ/HCUP to categorize ICD-10-CM diagnosis codes and ICD-10-PCS procedure codes into clinically meaningful categories. The `download_ccsr()` function provides direct access to these mapping files. ### Download Latest Version ```{r} # Download the latest diagnosis CCSR mapping file dx_map <- download_ccsr("diagnosis") # Download the latest procedure CCSR mapping file pr_map <- download_ccsr("procedure") ``` ### Download Specific Version ```{r} # Download a specific version (useful for reproducibility) dx_map_v2025 <- download_ccsr("diagnosis", version = "v2025.1") pr_map_v2025 <- download_ccsr("procedure", version = "v2025.1") ``` ### List Available Versions ```{r} # List all available versions all_versions <- list_ccsr_versions() print(all_versions) # List only diagnosis versions dx_versions <- list_ccsr_versions("diagnosis") # List only procedure versions pr_versions <- list_ccsr_versions("procedure") ``` ## Part 2: Mapping ICD-10 Codes to CCSR Categories Once you have downloaded a mapping file, you can use `ccsr_map()` to map ICD-10 codes to CCSR categories. This function supports multiple output formats to accommodate different analytical needs. ### Prepare Sample Data ```{r} # Create sample patient data with ICD-10 diagnosis codes patient_data <- tibble::tibble( patient_id = 1:10, admission_date = as.Date(c("2024-01-15", "2024-02-20", "2024-03-10", "2024-04-05", "2024-05-12", "2024-06-18", "2024-07-22", "2024-08-30", "2024-09-14", "2024-10-08")), icd10_dx = c("E11.9", "I10", "M79.3", "E78.5", "K21.9", "I50.9", "N18.6", "E78.5", "I25.10", "J44.1") ) ``` ### Long Format (Default) The long format duplicates records for each assigned CCSR category. This is essential for cross-classification analysis where you need to count all assigned categories. ```{r} # Map codes using long format (default) mapped_long <- ccsr_map( data = patient_data, code_col = "icd10_dx", map_df = dx_map, output_format = "long" ) # View the results head(mapped_long, 20) # Count occurrences of each CCSR category ccsr_counts <- mapped_long |> count(ccsr_category, sort = TRUE) print(ccsr_counts) ``` **Use Case**: Long format is ideal when you want to: - Count how many times each CCSR category appears - Analyze cross-classifications (one ICD-10 code mapping to multiple CCSR categories) - Create frequency tables of CCSR categories ### Wide Format The wide format creates multiple columns (CCSR_1, CCSR_2, etc.) for multiple categories, keeping one row per ICD-10 code. ```{r} # Map codes using wide format mapped_wide <- ccsr_map( data = patient_data, code_col = "icd10_dx", map_df = dx_map, output_format = "wide" ) # View the results head(mapped_wide) ``` **Use Case**: Wide format is ideal when you want to: - Keep all CCSR categories for each patient in a single row - Perform patient-level analysis - Maintain the original data structure with additional CCSR columns ### Default Category Only For diagnosis codes, CCSR assigns a "default" category that is recommended for principal diagnosis analysis. Use `default_only = TRUE` to extract only this default category. ```{r} # Map codes using default category only mapped_default <- ccsr_map( data = patient_data, code_col = "icd10_dx", map_df = dx_map, default_only = TRUE ) # View the results head(mapped_default) ``` **Use Case**: Default category is ideal when you want to: - Analyze principal diagnoses only - Follow HCUP recommendations for diagnosis analysis - Maintain one-to-one mapping (one ICD-10 code = one CCSR category) ## Part 3: Getting CCSR Descriptions To understand what CCSR categories mean, use `get_ccsr_description()`: ```{r} # Get descriptions for specific CCSR codes ccsr_codes <- c("ADM010", "NEP003", "CIR019", "END001", "MBD001") descriptions <- get_ccsr_description(ccsr_codes, map_df = dx_map) print(descriptions) # Get descriptions without pre-downloaded mapping (will download automatically) descriptions_auto <- get_ccsr_description( c("ADM010", "NEP003"), type = "diagnosis" ) ``` ## Part 4: Working with Procedure Codes The package also supports ICD-10-PCS procedure codes: ```{r} # Download procedure mapping pr_map <- download_ccsr("procedure") # Create sample procedure data procedure_data <- tibble::tibble( case_id = 1:5, procedure_date = as.Date(c("2024-01-20", "2024-02-15", "2024-03-22", "2024-04-10", "2024-05-18")), icd10_pcs = c("0DB60ZZ", "0DT70ZZ", "0WQ3XZ", "0FB00ZZ", "0HB00ZX") ) # Map procedure codes mapped_procedures <- ccsr_map( data = procedure_data, code_col = "icd10_pcs", map_df = pr_map ) # View the results head(mapped_procedures) ``` ## Part 5: Complete Analysis Workflow Here's a complete workflow for analyzing CCSR categories in a dataset: ```{r} # Step 1: Download mapping file dx_map <- download_ccsr("diagnosis") # Step 2: Map diagnosis codes patient_data_mapped <- ccsr_map( data = patient_data, code_col = "icd10_dx", map_df = dx_map, output_format = "long" ) # Step 3: Count occurrences of each CCSR category ccsr_counts <- patient_data_mapped |> count(ccsr_category, sort = TRUE) # Step 4: Merge with descriptions for reporting ccsr_counts_with_desc <- ccsr_counts |> left_join( get_ccsr_description( unique(patient_data_mapped$ccsr_category), map_df = dx_map ), by = c("ccsr_category" = "ccsr_code") ) # Step 5: View the final results print(ccsr_counts_with_desc) ``` ## Part 6: Downloading HCUP Summary Trend Tables The package also provides access to HCUP Summary Trend Tables, which contain aggregated information on hospital utilization trends: ```{r} # List available tables (interactive menu) available_tables <- download_trend_tables() print(available_tables) # Download a specific table by ID # Table 2a: All Inpatient Encounter Types - Trends in Number of Discharges table_path <- download_trend_tables("2a") # Download all tables as a ZIP file (~81 MB) all_tables_zip <- download_trend_tables("all") ``` The trend tables include: - Overview of trends in inpatient and emergency department utilization - All inpatient encounter types (discharges, percent, length of stay, mortality, population rates) - Inpatient encounter types (normal newborns, deliveries, elective/non-elective stays) - Inpatient service lines (maternal/neonatal, mental health, injuries, surgeries, medical conditions) - ED treat-and-release visits For more information, see: [HCUP Summary Trend Tables](https://hcup-us.ahrq.gov/reports/trendtables/summarytrendtables.jsp) ### Reading Trend Tables ```{r} # Read the trend table data trend_data <- read_trend_table(table_path, sheet = "National") head(trend_data) # List available sheets sheets <- list_trend_table_sheets(table_path) print(sheets) # Read specific state data california_data <- read_trend_table(table_path, sheet = "California") ``` ## Part 7: Accessing CCSR Change Logs View changes between CCSR versions: ```{r} # Get change log as data table (default) changelog <- ccsr_changelog(version = "v2026.1") print(changelog) # Get change log URL changelog_url <- ccsr_changelog(version = "v2026.1", format = "url") # View change log in default PDF viewer ccsr_changelog(version = "v2026.1", format = "view") # Download change log file changelog_file <- ccsr_changelog(version = "v2026.1", format = "download") ``` ## Part 8: Generating Citations When using HCUP data in publications, always cite the source properly: ```{r} # Generate text citation for CCSR cat(hcup_citation()) # Generate citation for Summary Trend Tables cat(hcup_citation(resource = "trend_tables")) # Generate BibTeX citation (for LaTeX documents) cat(hcup_citation(format = "bibtex")) # Generate R citation object (for R markdown) citation_obj <- hcup_citation(format = "r") print(citation_obj) ``` ## Part 9: Reading Downloaded Files If you've already downloaded files, you can read them directly: ```{r} # Read CCSR file from various formats dx_map <- read_ccsr("path/to/DXCCSR-v2026-1.zip") dx_map <- read_ccsr("path/to/DXCCSR_v2026-1.csv") dx_map <- read_ccsr("path/to/DXCCSR_v2026-1.xlsx") dx_map <- read_ccsr("path/to/extracted_directory/") # Read trend table Excel file national_data <- read_trend_table( "path/to/HCUP_SummaryTrendTables_T2a.xlsx", sheet = "National" ) ``` ## Important Notes ### Data Download - The package downloads data directly from HCUP, so an internet connection is required for the first download - Downloaded files are cached by default to avoid re-downloading - Set `cache = FALSE` to disable caching ### Cross-Classification - One ICD-10 code can map to multiple CCSR categories - Use long format to see all mappings - Use default category for principal diagnosis analysis ### Default Categories - For diagnosis codes, CCSR assigns a default category recommended for principal diagnosis analysis - Use `default_only = TRUE` to extract only the default category ### Performance - CCSR mapping files contain ~75,000 rows - Consider using `as_data_table = TRUE` in `read_ccsr()` and `read_trend_table()` for very large datasets ## Legal and Compliance **Important Disclaimer:** This package is an independent, non-commercial tool developed by a third party. It is **not affiliated with, endorsed by, or supported by AHRQ or HCUP in any way.** This package is not an official AHRQ or HCUP product. This package facilitates access to **publicly available and free** HCUP resources: - **CCSR Mapping Files** - Classification software tools (free download) - **HCUP Summary Trend Tables** - Aggregated statistical reports (free download) **Critical:** This package does **NOT** access any HCUP databases (NIS, KID, SID, NEDS, etc.) that require purchase through the HCUP Central Distributor. ### User Responsibilities Users are responsible for: - Ensuring compliance with all applicable HCUP Data Use Agreements (DUAs) - Verifying the accuracy of results - Citing the appropriate AHRQ/HCUP sources in publications - Understanding and adhering to all HCUP data usage restrictions ### Essential Resources - [HCUP Data Use Agreement Training](https://hcup-us.ahrq.gov/tech_assist/dua.jsp) - [HCUP Data Use Agreements](https://hcup-us.ahrq.gov/team/NationwideDUA.pdf) - [HCUP Publishing Requirements](https://hcup-us.ahrq.gov/db/publishing.jsp) - [CCSR Overview](https://hcup-us.ahrq.gov/toolssoftware/ccsr/ccs_refined.jsp) - [HCUP Summary Trend Tables](https://hcup-us.ahrq.gov/reports/trendtables/summarytrendtables.jsp) ## Additional Resources - **Package GitHub**: https://github.com/vikrant31/HCUPtools - **HCUP Homepage**: https://hcup-us.ahrq.gov/ - **CCSR Overview**: https://hcup-us.ahrq.gov/toolssoftware/ccsr/ccs_refined.jsp - **HCUP CCSR Tools**: https://hcup-us.ahrq.gov/toolssoftware/ccsr/ccs_refined.jsp - **HCUP Summary Trend Tables**: https://hcup-us.ahrq.gov/reports/trendtables/summarytrendtables.jsp