---
title: "Getting Started with orcidtr"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with orcidtr}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = FALSE
)
```

## Introduction

**orcidtr** provides a modern, user-friendly interface to the ORCID public API. ORCID (Open Researcher and Contributor ID) is a persistent digital identifier that links researchers to their professional activities. With this package you can programmatically work with:

- **Biographical data**: Names, biographies, keywords, researcher URLs
- **Professional affiliations**: Employments, education, distinctions, memberships
- **Research outputs**: Publications, datasets, funding, peer reviews
- **Search capabilities**: Find researchers by name, affiliation, or DOI

All functions return structured `data.table` objects for efficient data manipulation and analysis.

## Installation

```{r installation}
# Install from CRAN (when available)
install.packages("orcidtr")

# Or install development version from GitHub
# install.packages("pak")
pak::pak("lorenzoFabbri/orcidtr")
```

## Basic Usage

### Fetching Researcher Information

Let's start by fetching basic biographical information for a researcher.
We'll use Hadley Wickham's ORCID as an example:

```{r person-data}
library(orcidtr)

# Fetch complete person data
person <- orcid_person("0000-0003-4757-117X")
print(person)

# Fetch just the biography
bio <- orcid_bio("0000-0003-4757-117X")
print(bio$biography)

# Fetch research keywords
keywords <- orcid_keywords("0000-0003-4757-117X")
print(keywords)
```

### Employment and Education History

```{r affiliations}
# Fetch employment history
employments <- orcid_employments("0000-0003-4757-117X")
print(employments[, .(organization, role, city, country, start_date, end_date)])

# Fetch education records
education <- orcid_educations("0000-0003-4757-117X")
print(education[, .(organization, role, start_date, end_date)])
```

### Research Outputs

```{r research-outputs}
# Fetch publications and other works
works <- orcid_works("0000-0003-4757-117X")

# Display summary
cat(sprintf("Total works: %d\n", nrow(works)))
cat(sprintf("Journal articles: %d\n", sum(works$type == "journal-article", na.rm = TRUE)))

# View recent publications
recent_works <- works[order(-publication_date)][1:5, .(title, type, publication_date, doi)]
print(recent_works)

# Fetch funding information
funding <- orcid_funding("0000-0003-4757-117X")
if (nrow(funding) > 0) {
  print(funding[, .(title, organization, start_date, amount, currency)])
}

# Fetch peer review activities
reviews <- orcid_peer_reviews("0000-0003-4757-117X")
if (nrow(reviews) > 0) {
  cat(sprintf("Total peer reviews: %d\n", nrow(reviews)))
}
```

### Professional Activities

```{r professional-activities}
# Fetch distinctions and awards
distinctions <- orcid_distinctions("0000-0003-4757-117X")
if (nrow(distinctions) > 0) {
  print(distinctions[, .(organization, role, start_date)])
}

# Fetch professional memberships
memberships <- orcid_memberships("0000-0003-4757-117X")
if (nrow(memberships) > 0) {
  print(memberships[, .(organization, role, start_date, end_date)])
}

# Fetch the remaining affiliation types
invited_positions <- orcid_invited_positions("0000-0003-4757-117X")
qualifications <- orcid_qualifications("0000-0003-4757-117X")
services <- orcid_services("0000-0003-4757-117X")
```

## Efficient Data Retrieval

### Fetching Complete Records

Instead of calling multiple individual functions, you can fetch all sections at once:

```{r complete-record}
# Fetch everything
record <- orcid_fetch_record("0000-0003-4757-117X")
names(record)

# Access individual sections
record$works
record$employments
record$person

# Fetch only specific sections for efficiency
record <- orcid_fetch_record(
  "0000-0003-4757-117X",
  sections = c("works", "employments", "funding")
)
```

### Getting All Activities in One Call

The `orcid_activities()` function provides summaries of all activity types in a single API request:

```{r activities}
# Fetch a summary of all activities
activities <- orcid_activities("0000-0003-4757-117X")

# Access different activity types
activities$works
activities$employments
activities$fundings
activities$distinctions
```

### Batch Processing Multiple ORCIDs

When working with multiple researchers, use `orcid_fetch_many()`:

```{r batch-processing}
# Define multiple ORCIDs
orcids <- c(
  "0000-0003-4757-117X", # Hadley Wickham
  "0000-0002-1825-0097", # Josiah Carberry (ORCID's fictitious example researcher)
  "0000-0003-1419-2405"  # Martin Fenner
)

# Fetch works for all
all_works <- orcid_fetch_many(orcids, section = "works")

# Analyze combined data
works_by_researcher <- all_works[, .N, by = orcid]
print(works_by_researcher)

# Get works by type across all researchers
works_by_type <- all_works[, .N, by = type][order(-N)]
print(works_by_type)
```

## Searching the ORCID Registry

### Basic Search

```{r search-basic}
# Search by family name
results <- orcid_search(family_name = "Wickham")
print(results[, .(orcid_id, given_names, family_name)])

# Search by affiliation
results <- orcid_search(
  affiliation_org = "Stanford University",
  rows = 20
)

# Combine multiple criteria
results <- orcid_search(
  family_name = "Smith",
  given_names = "John",
  affiliation_org = "MIT"
)
```

### Advanced Search with Solr Queries

For more complex searches, use the `orcid()` function with Solr syntax:

```{r search-advanced}
# Search with field-specific queries
results <- orcid(
  query = 'family-name:Smith AND affiliation-org-name:"Harvard University"',
  rows = 10
)

# Search by keywords
results <- orcid(
  query = 'keyword:("machine learning" OR "artificial intelligence")',
  rows = 15
)

# Check total number of results
cat(sprintf("Total found: %d\n", attr(results, "found")))
cat(sprintf("Returned: %d\n", nrow(results)))
```

### Search by DOI

Find researchers associated with specific publications:

```{r search-doi}
# Search for a single DOI
results <- orcid_doi("10.1371/journal.pone.0001543")

# Search for multiple DOIs
dois <- c(
  "10.1371/journal.pone.0001543",
  "10.1038/nature12345"
)
results <- orcid_doi(dois, rows = 50)
```

## Data Analysis Examples

### Publication Trends Over Time

```{r analysis-trends}
library(orcidtr)

# Fetch works
works <- orcid_works("0000-0003-4757-117X")

# Extract publication years
works[, pub_year := as.integer(substr(publication_date, 1, 4))]

# Count publications by year
pub_by_year <- works[!is.na(pub_year), .N, by = pub_year][order(pub_year)]
print(pub_by_year)

# Publications by type
pub_by_type <- works[, .N, by = type][order(-N)]
print(pub_by_type)
```

### Collaboration Networks

```{r analysis-networks}
# Fetch works for multiple researchers
orcids <- c(
  "0000-0003-4757-117X",
  "0000-0002-1825-0097",
  "0000-0003-1419-2405"
)
all_works <- orcid_fetch_many(orcids, section = "works")

# Count works per researcher
works_count <- all_works[, .N, by = orcid]
print(works_count)

# Count each researcher's publications since the cutoff date
recent_cutoff <- "2020-01-01"
recent_works <- all_works[publication_date >= recent_cutoff]
recent_count <- recent_works[, .N, by = orcid]
print(recent_count)
```

### Funding Analysis

```{r analysis-funding}
# Fetch funding for a researcher
funding <- orcid_funding("0000-0003-4757-117X")

# Summarize by organization
funding_by_org <- funding[, .N, by = organization][order(-N)]
print(funding_by_org)

# Funding by type
funding_by_type <- funding[, .N, by = type][order(-N)]
print(funding_by_type)

# Active grants: no end date, or an end date that is today or later.
# Comparing against an ISO date string matches the cutoff comparison used earlier.
active_grants <- funding[
  is.na(end_date) | end_date >= as.character(Sys.Date())
]
print(active_grants[, .(title, organization, start_date)])
```

## Authentication (Optional)

**Important:** The ORCID public API does NOT require authentication for reading public data. All functions in this package work without any token by default.

Authentication is entirely optional and only provides:

- **Higher API rate limits**: Useful if making many requests rapidly
- **Access to private data**: Only if you've been explicitly granted permission

### When You Need Authentication

Most users do not need authentication. Consider getting a token only if you are:

- Making very frequent requests (>24/second sustained)
- Building an application with many concurrent users
- Accessing private data you've been granted permission to view

### Setting Up Authentication (If Needed)

1. Register for ORCID API credentials at [https://orcid.org/developer-tools](https://orcid.org/developer-tools)
2. Click "Register for the free ORCID public API"
3. Follow the OAuth2 client credentials flow to obtain an access token
4. Set the environment variable:

```{r auth-setup}
# Set temporarily in session
Sys.setenv(ORCID_TOKEN = "your-token-here")

# Or add to .Renviron file (recommended for persistent use)
# ORCID_TOKEN=your-token-here
```

### Using Authentication

```{r auth-usage}
# The token is automatically used if ORCID_TOKEN environment variable is set
works <- orcid_works("0000-0003-4757-117X")

# Or pass explicitly (overrides environment variable)
works <- orcid_works("0000-0003-4757-117X", token = "your-token-here")
```

**Note:** If you have set the `ORCID_TOKEN` environment variable and requests fail with it, unset it. The public API works fine without any token.

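As a minimal sketch of the note above, base R's `Sys.getenv()`, `nzchar()`, and `Sys.unsetenv()` can check for the variable and clear it for the current session. The helper name `clear_orcid_token()` is hypothetical and not part of orcidtr. It also only affects the running session, so a value stored in `.Renviron` would still need to be removed from that file by hand.

```{r token-unset}
# Hypothetical helper (not part of orcidtr): clear ORCID_TOKEN for this session.
# Returns (invisibly) whether a non-empty token was set beforehand.
clear_orcid_token <- function() {
  had_token <- nzchar(Sys.getenv("ORCID_TOKEN"))
  if (had_token) {
    Sys.unsetenv("ORCID_TOKEN")
    message("ORCID_TOKEN unset for this session.")
  }
  invisible(had_token)
}

clear_orcid_token()
```

After this call, functions that read `ORCID_TOKEN` from the environment should fall back to unauthenticated access, assuming they look the variable up at call time.
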
## API Status and Error Handling

### Checking API Health

```{r api-status}
# Check if ORCID API is online
status <- orcid_ping()
print(status)
```

### Handling Errors

```{r error-handling}
# Wrap API calls in tryCatch for production use
result <- tryCatch(
  {
    orcid_works("0000-0003-4757-117X")
  },
  error = function(e) {
    message("Failed to fetch works: ", conditionMessage(e))
    data.table::data.table() # Return empty data.table
  }
)

# For batch operations, use stop_on_error = FALSE
orcids <- c("0000-0003-4757-117X", "invalid-orcid", "0000-0002-1825-0097")
all_works <- orcid_fetch_many(
  orcids,
  section = "works",
  stop_on_error = FALSE # Continue despite errors
)
```

## Best Practices

### Rate Limiting

The ORCID public API has rate limits (~24 requests/second for unauthenticated requests). For large batch operations:

```{r rate-limiting}
# Add delays between requests
orcids <- c("0000-0003-4757-117X", "0000-0002-1825-0097", "0000-0003-1419-2405")

results <- lapply(orcids, function(id) {
  result <- orcid_works(id)
  Sys.sleep(0.1) # 100ms delay
  result
})

all_results <- data.table::rbindlist(results)
```

### Caching Results

For frequently accessed data, consider caching:

```{r caching}
# Simple file-based cache
cache_file <- "orcid_cache.rds"

fetch_with_cache <- function(orcid_id) {
  if (file.exists(cache_file)) {
    cache <- readRDS(cache_file)
    if (orcid_id %in% names(cache)) {
      message("Using cached data for ", orcid_id)
      return(cache[[orcid_id]])
    }
  } else {
    cache <- list()
  }

  # Fetch fresh data
  data <- orcid_fetch_record(orcid_id)
  cache[[orcid_id]] <- data
  saveRDS(cache, cache_file)
  data
}

record <- fetch_with_cache("0000-0003-4757-117X")
```

### Working with data.table

All functions return `data.table` objects, which provide efficient data manipulation:

```{r data-table-usage}
library(data.table)

works <- orcid_works("0000-0003-4757-117X")

# Filter by type
articles <- works[type == "journal-article"]

# Select specific columns
works[, .(title, publication_date, doi)]

# Chain operations
works[
  type == "journal-article" & !is.na(doi)
][
  order(-publication_date)
][1:10, .(title, doi)]

# Aggregate
works[, .N, by = type][order(-N)]
```

## Additional Resources

- **ORCID API Documentation**: https://info.orcid.org/documentation/api-tutorials/
- **Package Repository**: https://github.com/lorenzoFabbri/orcidtr
- **Report Issues**: https://github.com/lorenzoFabbri/orcidtr/issues
- **ORCID Registry**: https://orcid.org

## Summary

The **orcidtr** package provides comprehensive access to ORCID data with:

- Simple, consistent function interface
- Efficient `data.table` return objects
- Batch processing capabilities
- Flexible search functions
- No authentication required for public data
- Full CRAN compliance

For more information, see `help(package = "orcidtr")` or visit the package website.