Type: | Package |
Title: | Google Citation Parser |
Version: | 0.11.0 |
Description: | Scrapes Google Citation pages and creates data frames of citations over time. |
License: | GPL-3 |
Imports: | xml2, httr, rvest, stats, pbapply, data.table, wordcloud, tm, graphics |
RoxygenNote: | 7.3.2 |
Encoding: | UTF-8 |
Suggests: | covr, testthat, spelling |
Language: | en-US |
NeedsCompilation: | no |
Packaged: | 2025-04-01 15:28:03 UTC; johnmuschelli |
Author: | John Muschelli |
Maintainer: | John Muschelli <muschellij2@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2025-04-01 15:50:09 UTC |
Make Wordcloud of authors from Papers
Description
Takes a vector of authors and then creates a frequency table of those words and plots a wordcloud
Usage
author_cloud(
authors,
addstopwords = gcite_stopwords(),
author_pattern = NULL,
split = ",",
verbose = TRUE,
colors = c("#66C2A4", "#41AE76", "#238B45", "#006D2C", "#00441B"),
...
)
author_frequency(
authors,
author_pattern = NULL,
split = ",",
addstopwords = gcite_stopwords(),
verbose = TRUE
)
Arguments
authors |
Vector of authors of papers |
addstopwords |
Additional words to remove from wordcloud |
author_pattern |
regular expression for patterns to exclude from individual authors |
split |
split author names (default |
verbose |
Print diagnostic messages |
colors |
color words from least to most frequent. Passed to
|
... |
additional options passed to |
Value
A data.frame
of the words and the frequencies of the
authors
Examples
## Not run:
L = gcite_author_info("John Muschelli")
paper_df = L$paper_df
authors = paper_df$authors
author_cloud(authors)
## End(Not run)
Google Citations Information
Description
Wraps getting the information from Google Citations and plotting the wordcloud
Usage
gcite(
author,
user,
plot_wordcloud = TRUE,
author_args = list(),
title_args = list(),
warn = FALSE,
force = FALSE,
sleeptime = 0,
...
)
Arguments
author |
author name separated by spaces |
user |
user ID for google Citations |
plot_wordcloud |
should the wordcloud be plotted |
author_args |
Arguments to pass to |
title_args |
Arguments to pass to |
warn |
should warnings be printed from wordcloud? |
force |
If passing a URL and there is a failure, should the
program return |
sleeptime |
time in seconds between http requests, to avoid Google Scholar rate limit |
... |
additional options passed to |
Value
List from either gcite_user_info
or gcite_author_info
Examples
if (!is_travis() & !is_cran()) {
res = gcite(author = "John Muschelli")
paper_df = res$paper_df
gcite_wordcloud(paper_df)
author_cloud(paper_df$authors)
}
Getting User Information from name
Description
Calls gcite_user_info
after getting the user
identifier
Usage
gcite_author_info(
author,
ask = TRUE,
pagesize = 100,
verbose = TRUE,
secure = TRUE,
force = FALSE,
read_citations = TRUE,
sleeptime = 0,
...
)
Arguments
author |
author name separated by spaces |
ask |
If multiple authors are found, should a menu be given |
pagesize |
Size of pages, max 100, passed to |
verbose |
Print diagnostic messages |
secure |
use https vs. http |
force |
If passing a URL and there is a failure, should the
program return |
read_citations |
Should all citation pages be read? |
sleeptime |
time in seconds between http requests, to avoid Google Scholar rate limit |
... |
Additional arguments passed to |
Value
A list of citations, citation indices, and a
data.frame
of authors, journal, and citations, and a
data.frame
of the links to all paper URLs.
Examples
## Not run:
if (!is_travis()) {
df = gcite_author_info(author = "John Muschelli", secure = FALSE)
}
## End(Not run)
if (!is_travis() & !is_cran()) {
df = gcite_author_info(author = "Jiawei Bai", secure = FALSE)
}
Parse Google Citation Index
Description
Parses a google citation indices (h-index, etc.) from main page
Usage
gcite_citation_index(doc, ...)
## S3 method for class 'xml_node'
gcite_citation_index(doc, ...)
## S3 method for class 'xml_document'
gcite_citation_index(doc, ...)
## S3 method for class 'character'
gcite_citation_index(doc, ...)
Arguments
doc |
A xml_document or the url for the main page |
... |
Additional arguments passed to |
Value
A matrix of indices
Examples
library(httr)
library(rvest)
library(gcite)
url = "https://scholar.google.com/citations?user=T9eqZgMAAAAJ"
url = gcite_url(url = url, pagesize = 10, cstart = 0)
if (!is_travis() & !is_cran()) {
ind = gcite_citation_index(url)
doc = content(httr::GET(url))
ind = gcite_citation_index(doc)
ind_nodes = rvest::html_nodes(doc, "#gsc_rsb_st")[[1]]
ind = gcite_citation_index(ind_nodes)
}
Parse Google Citation Index
Description
Parses a google citation indices (h-index, etc.) from main page
Usage
gcite_citation_page(doc, title = NULL, force = FALSE, ...)
## S3 method for class 'xml_nodeset'
gcite_citation_page(doc, title = NULL, force = FALSE, ...)
## S3 method for class 'xml_document'
gcite_citation_page(doc, title = NULL, force = FALSE, ...)
## S3 method for class 'character'
gcite_citation_page(doc, title = NULL, force = FALSE, ...)
## S3 method for class 'list'
gcite_citation_page(doc, title = NULL, force = FALSE, ...)
## Default S3 method:
gcite_citation_page(doc, title = NULL, force = FALSE, ...)
Arguments
doc |
A xml_document or the url for the main page |
title |
title of the article |
force |
If passing a URL and there is a failure, should the
program return |
... |
arguments passed to |
Value
A matrix of indices
Examples
library(httr)
library(rvest)
url = paste0("https://scholar.google.com/citations?view_op=view_citation&",
"hl=en&oe=ASCII&user=T9eqZgMAAAAJ&pagesize=100&",
"citation_for_view=T9eqZgMAAAAJ:W7OEmFMy1HYC")
url = gcite_url(url = url, pagesize = 10, cstart = 0)
if (!is_travis() & !is_cran()) {
ind = gcite_citation_page(url)
doc = content(httr::GET(url))
ind = gcite_citation_page(doc)
ind_nodes = html_nodes(doc, "#gsc_oci_table div")
ind_nodes = html_nodes(ind_nodes, xpath = '//div[@class = "gs_scl"]')
ind = gcite_citation_page(ind_nodes)
}
Parse Google Citations Over Time
Description
Parses a google citations over time from the main Citation page
Usage
gcite_cite_over_time(doc, ...)
## S3 method for class 'xml_node'
gcite_cite_over_time(doc, ...)
## S3 method for class 'xml_document'
gcite_cite_over_time(doc, ...)
## S3 method for class 'character'
gcite_cite_over_time(doc, ...)
## Default S3 method:
gcite_cite_over_time(doc, ...)
Arguments
doc |
A xml_document or the url for the main page |
... |
arguments passed to |
Value
A matrix of citations
Examples
library(httr)
library(rvest)
url = "https://scholar.google.com/citations?user=T9eqZgMAAAAJ"
url = gcite_url(url = url, pagesize = 10, cstart = 0)
if (!is_travis() & !is_cran()) {
#' ind = gcite_cite_over_time(url)
doc = content(httr::GET(url))
ind = gcite_cite_over_time(doc)
ind_nodes = rvest::html_nodes(doc, ".gsc_md_hist_b")
ind = gcite_cite_over_time(ind_nodes)
}
Parse Google Citation Graph
Description
Parses a google citation bar graph from html
Usage
gcite_graph(citations, ...)
## S3 method for class 'xml_node'
gcite_graph(citations, ...)
## S3 method for class 'xml_document'
gcite_graph(citations, ...)
## S3 method for class 'character'
gcite_graph(citations, ...)
## Default S3 method:
gcite_graph(citations, ...)
Arguments
citations |
A list of nodes or xml_node |
... |
arguments passed to |
Value
A matrix of citations and years
Parse Google Citation Graph
Description
Parses a google citation bar graph from html
Usage
gcite_main_graph(citations, ...)
## S3 method for class 'xml_document'
gcite_main_graph(citations, ...)
## S3 method for class 'character'
gcite_main_graph(citations, ...)
## Default S3 method:
gcite_main_graph(citations, ...)
Arguments
citations |
A list of nodes or xml_node |
... |
arguments passed to |
Value
A matrix of citations and years
Get Paper Data Frame from Title URLs
Description
Get Paper Data Frame from Title URLs
Usage
gcite_paper_df(urls, verbose = TRUE, force = FALSE, sleeptime = 0, ...)
Arguments
urls |
A character vector of urls, from
|
verbose |
Print diagnostic messages |
force |
If passing a URL and there is a failure, should the
program return |
sleeptime |
time in seconds between http requests, to avoid Google Scholar rate limit |
... |
Additional arguments passed to |
Value
A data.frame
of authors, journal, and citations
Examples
if (!is_travis() & !is_cran()) {
L = gcite_user_info(user = "uERvKpYAAAAJ",
read_citations = FALSE)
urls = L$all_papers$title_link
paper_df = gcite_paper_df(urls = urls, force = TRUE)
}
Parse Google Citation Index
Description
Parses a google citation indices (h-index, etc.) from main page
Usage
gcite_papers(doc, ...)
## S3 method for class 'xml_nodeset'
gcite_papers(doc, ...)
## S3 method for class 'xml_document'
gcite_papers(doc, ...)
## S3 method for class 'character'
gcite_papers(doc, ...)
## Default S3 method:
gcite_papers(doc, ...)
Arguments
doc |
A xml_document or the url for the main page |
... |
Additional arguments passed to |
Value
A matrix of indices
Examples
library(httr)
library(rvest)
url = "https://scholar.google.com/citations?user=T9eqZgMAAAAJ"
url = gcite_url(url = url, pagesize = 10, cstart = 0)
if (!is_travis() & !is_cran()) {
ind = gcite_papers(url)
doc = content(httr::GET(url))
ind = gcite_papers(doc)
ind_nodes = rvest::html_nodes(doc, "#gsc_a_b")
ind = gcite_papers(ind_nodes)
}
Google Cite Stopwords
Description
Additional stopwords to remove from Google Cite results
Usage
gcite_stopwords()
Value
Character Vector
Examples
gcite_stopwords()
Google Citations URL
Description
Simple wrapper for adding in pagesize
and start values for the page
Usage
gcite_url(url, cstart = 0, pagesize = 100)
gcite_base_url(secure = TRUE)
gcite_user_url(user, secure = TRUE)
Arguments
url |
URL of the google citations page |
cstart |
Starting value for the citation page |
pagesize |
number of citations to return, max is 100 |
secure |
should https be used (default), instead of http |
user |
Username/user ID for Google Scholar Citations |
Value
A character string
Examples
url = "https://scholar.google.com/citations?user=T9eqZgMAAAAJ"
gcite_url(url = url, pagesize = 100, cstart = 5)
Getting User Information of papers
Description
Loops through pages for all information on Google Citations
Usage
gcite_user_info(
user,
pagesize = 100,
verbose = TRUE,
secure = TRUE,
force = FALSE,
read_citations = TRUE,
sleeptime = 0,
...
)
Arguments
user |
user ID for google Citations |
pagesize |
Size of pages, max 100, passed to |
verbose |
Print diagnostic messages |
secure |
use https vs. http |
force |
If passing a URL and there is a failure, should the
program return |
read_citations |
Should all citation pages be read? |
sleeptime |
time in seconds between http requests, to avoid Google Scholar rate limit |
... |
Additional arguments passed to |
Value
A list of citations, citation indices, and a
data.frame
of authors, journal, and citations, and a
data.frame
of the links to all paper URLs and the character
string of the user name.
Examples
## Not run:
if (!is_travis() & !is_cran()) {
df = gcite_user_info(user = "uERvKpYAAAAJ")
}
## End(Not run)
Google Citation Username Searcher
Description
Search Google Citation for an author username
Usage
gcite_username(author, verbose = TRUE, ask = TRUE, secure = TRUE, ...)
Arguments
author |
author name separated by spaces |
verbose |
Verbose diagnostic printing |
ask |
If multiple authors are found, should a menu be given |
secure |
use https vs. http |
... |
arguments passed to |
Value
A character vector of the username of the author
Examples
if (!is_travis() & !is_cran()) {
gcite_username("John Muschelli")
}
Wordcloud of Google Citations Information
Description
Simple wrapper for author_cloud
and title_cloud
Usage
gcite_wordcloud(
paper_df,
author_args = list(),
title_args = list(),
warn = FALSE
)
Arguments
paper_df |
A |
author_args |
Arguments to pass to |
title_args |
Arguments to pass to |
warn |
should warnings be printed from wordcloud? |
gcite Wordcloud default
Description
Simple wrapper for wordcloud
with
different defaults
Usage
gcite_wordcloud_spec(
words,
freq,
min.freq = 1,
max.words = Inf,
random.order = FALSE,
colors = c("#F768A1", "#DD3497", "#AE017E", "#7A0177", "#49006A"),
vfont = c("sans serif", "plain"),
...
)
Arguments
words |
words to be plotted |
freq |
the frequency of those words |
min.freq |
words with frequency below min.freq will not be plotted |
max.words |
Maximum number of words to be plotted. least frequent terms dropped |
random.order |
plot words in random order. If false, they will be plotted in decreasing frequency |
colors |
color words from least to most frequent |
vfont |
passed to text for the font |
... |
additional options passed to |
Value
Nothing
Check if on Travis CI
Description
Simple check for Travis CI for examples
Usage
is_travis()
is_cran()
Value
Logical if user is named travis
Examples
is_travis()
is_cran()
Set Cookies from Text file
Description
Set Cookies from Text file
Usage
set_cookies_txt(file)
Arguments
file |
tab-delimited text file of cookies, to be read in using
|
Value
Either NULL
if no domains contain the word "scholar"
,
or an object of class request
from set_cookies
Note
This function searches for domains that contain the word "scholar"
Make Wordcloud of Titles from Papers
Description
Takes a vector of titles and then creates a frequency table of those words and plots a wordcloud
Usage
title_cloud(titles, addstopwords = gcite_stopwords(), ...)
paper_cloud(...)
title_word_frequency(titles, addstopwords = NULL)
Arguments
titles |
Vector of titles of papers |
addstopwords |
Additional words to remove from wordcloud |
... |
additional options passed to |
Value
A data.frame
of the words and the frequencies of the
title words
Examples
## Not run:
L = gcite_author_info("John Muschelli")
paper_df = L$paper_df
titles = paper_df$title
title_cloud(titles)
## End(Not run)