Type: | Package |
Title: | Discovery, Access and Manipulation of 'TreeBASE' Phylogenies |
Version: | 0.1.5 |
Description: | Interface to the API for 'TreeBASE' http://treebase.org from 'R'. 'TreeBASE' is a repository of user-submitted phylogenetic trees (of species, populations, or genes) and the data used to create them. |
License: | CC0 |
Encoding: | UTF-8 |
URL: | https://docs.ropensci.org/treebase/, https://github.com/ropensci/treebase |
BugReports: | https://github.com/ropensci/treebase/issues |
Depends: | R (≥ 2.15), ape |
Imports: | XML, RCurl, methods, utils, httr |
Suggests: | testthat, knitr, rmarkdown |
RoxygenNote: | 7.3.1 |
VignetteBuilder: | knitr |
NeedsCompilation: | no |
Packaged: | 2024-02-16 05:36:51 UTC; rstudio |
Author: | Carl Boettiger [aut, cre], Duncan Temple Lang [aut] |
Maintainer: | Carl Boettiger <cboettig@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2024-02-16 06:20:02 UTC |
A function to cache the phylogenies in treebase locally
Description
A function to cache the phylogenies in treebase locally
Usage
cache_treebase(
file = paste("treebase-", Sys.Date(), ".rda", sep = ""),
pause1 = 3,
pause2 = 3,
attempts = 10,
max_trees = Inf,
only_metadata = FALSE,
save = TRUE
)
Arguments
file |
filename for the cache; if not given, a name containing the current date is created |
pause1 |
number of seconds to hesitate between requests |
pause2 |
number of seconds to hesitate between individual files |
attempts |
number of attempts to access a particular resource |
max_trees |
maximum number of trees to return (default is Inf) |
only_metadata |
option to only return metadata about matching trees |
save |
logical indicating whether to save a file with the results. |
Details
By default (max_trees = Inf) this downloads every tree in TreeBASE, so it is a good idea to let it run overnight.
Value
saves a cached file of treebase
Examples
## Not run:
treebase <- cache_treebase()
## End(Not run)
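The attempts, pause1 and pause2 arguments govern how a long download copes with an unreliable server: each request is retried a limited number of times, with a pause between tries. A minimal sketch of that retry-and-pause pattern in base R follows; retry_with_pause() and make_flaky_fetch() are hypothetical names used only for illustration, and this is not the package's internal code.

```r
# Hypothetical helper (not part of treebase): call `fetch` up to `attempts`
# times, sleeping `pause` seconds after each failed attempt.
retry_with_pause <- function(fetch, attempts = 10, pause = 3) {
  stopifnot(attempts >= 1)
  for (i in seq_len(attempts)) {
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)  # success: stop retrying
    }
    if (i < attempts) Sys.sleep(pause)  # be polite to the server before retrying
  }
  stop("all ", attempts, " attempts failed; last error: ", conditionMessage(result))
}

# Hypothetical stand-in for a network request: fails `failures` times, then succeeds.
make_flaky_fetch <- function(failures = 2) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls <= failures) stop("server busy")
    paste("succeeded on call", calls)
  }
}

retry_with_pause(make_flaky_fetch(2), attempts = 5, pause = 0)
```

Setting pause = 0 here only keeps the illustration fast; against a real server a non-zero pause spreads the load.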
clean the fish.base data into pure ASCII
Description
clean the fish.base data into pure ASCII
Usage
clean_data(metadata)
Arguments
metadata |
list item with fishbase data |
Value
the item scrubbed of non-ASCII characters
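Because a metadata record is a (possibly nested) list mixing text and numbers, scrubbing it means applying a string cleaner to every character element while leaving other elements alone. The sketch below assumes that approach, using base R's rapply() and iconv(); clean_data_sketch() is a hypothetical name and the record is made-up illustration data, not the package's implementation.

```r
# Hypothetical sketch (not the package's code): strip non-ASCII characters
# from every character element of a nested metadata list.
clean_data_sketch <- function(metadata) {
  rapply(
    metadata,
    # sub = "" drops any byte that cannot be converted to ASCII
    function(s) iconv(s, from = "UTF-8", to = "ASCII", sub = ""),
    classes = "character",  # only character leaves are transformed
    how = "replace"         # all other elements are kept unchanged
  )
}

# Made-up record for illustration.
record <- list(
  title = "Phylog\u00e9nie des poissons",
  authors = list("M\u00fcller, A.", "Smith, B."),
  year = 2010
)
str(clean_data_sketch(record))
```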
Download the metadata on treebase using the OAI-PMH interface
Description
Download the metadata on treebase using the OAI-PMH interface
Usage
download_metadata(
query = "",
by = c("all", "until", "from"),
curl = getCurlHandle()
)
Arguments
query |
a date in format yyyy-mm-dd |
by |
return all data "until" that date, "from" that date to current, or "all" |
curl |
if calling in series many times, call getCurlHandle() first and then pass the return value in here. Avoids repeated handshakes with server. |
Details
query must be a date in yyyy-mm-dd format given as a string, e.g. download_metadata("2010-01-01", by = "until"). "all" isn't a real query type, but it will return all trees regardless of date.
Examples
## Not run:
Near <- search_treebase("Near", "author", max_trees=1)
show_metadata(Near[[1]]$S.id)
## or manually give a study id
show_metadata("2377")
### get all trees up to a certain deposition date ##
m <- download_metadata("2009-01-01", by="until")
## extract any metadata, e.g. publication date:
dates <- sapply(m, function(x) as.numeric(x$date))
hist(dates, main="TreeBase growth", xlab="Year")
### show authors with most tree submissions in that date range
authors <- sapply(m, function(x){
index <- grep( "creator", names(x))
x[index]
})
a <- as.factor(unlist(authors))
head(summary(a))
## Show growth of TreeBASE
all <- download_metadata("", by="all")
dates <- sapply(all, function(x) as.numeric(x$date))
hist(dates, main="TreeBase growth", xlab="Year")
## make a barplot of submission volume by journal
journals <- sapply(all, function(x) x$publisher)
J <- tail(sort(table(as.factor(unlist(journals)))), 5)
b <- barplot(as.numeric(J))
text(b, names(J), srt=70, pos=4, xpd=TRUE)
## End(Not run)
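In OAI-PMH, a ListRecords request can be restricted with the protocol's standard from and until date arguments, which is what the by argument chooses between. The sketch below shows how such a request URL might be assembled; oai_list_records_url() is a hypothetical helper, base_url is a placeholder rather than the real TreeBASE endpoint, and oai_dc is assumed as the metadata prefix because every OAI-PMH repository is required to support it.

```r
# Hypothetical sketch: build an OAI-PMH ListRecords URL for a date query.
# `base_url` is a placeholder, NOT the real TreeBASE endpoint.
oai_list_records_url <- function(query = "", by = c("all", "until", "from"),
                                 base_url = "https://example.org/oai") {
  by <- match.arg(by)
  params <- c(verb = "ListRecords", metadataPrefix = "oai_dc")
  if (by != "all") {
    # OAI-PMH datestamps look like "2009-01-01"
    stopifnot(grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", query))
    params[by] <- query  # appends a `from=` or `until=` argument
  }
  paste0(base_url, "?", paste(names(params), params, sep = "=", collapse = "&"))
}

oai_list_records_url("2009-01-01", by = "until")
oai_list_records_url(by = "all")
```

The date is passed as a quoted string: unquoted, 2010-01-01 is evaluated by R as the arithmetic 2010 - 1 - 1.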
remove non-ASCII characters
Description
remove non-ASCII characters
Usage
drop_nonascii(string)
Arguments
string |
any character string |
Value
the string with all non-ASCII characters removed
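One way to drop non-ASCII characters in base R is to convert the string to ASCII and substitute nothing for bytes that cannot be converted. The sketch below assumes that approach; drop_nonascii_sketch() is a hypothetical name and may differ from how the package actually does it.

```r
# Hypothetical sketch (not the package's code): drop every non-ASCII
# character by converting to ASCII and substituting "" for unconvertible bytes.
drop_nonascii_sketch <- function(string) {
  iconv(string, from = "UTF-8", to = "ASCII", sub = "")
}

drop_nonascii_sketch(c("Linn\u00e9", "S\u00e3o Paulo", "plain ASCII"))
# returns "Linn" "So Paulo" "plain ASCII"
```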
drop errors from the search
Description
drop errors from the search
Usage
drop_nontrees(tr)
Arguments
tr |
a list of phylogenetic trees returned by search_treebase |
Details
primarily for internal use by search_treebase, but may also be useful to users
Value
the list of phylogenetic trees returned successfully
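A failed download can leave an error object or NULL in place of a tree, so "dropping errors" amounts to keeping only the elements of class "phylo". A minimal sketch of that filter, assuming this is what counts as a successful result; drop_nontrees_sketch() is a hypothetical name, and the stand-in tree is a bare list given class "phylo" so that ape is not needed.

```r
# Hypothetical sketch (not the package's code): keep only the elements that
# inherit from "phylo", discarding errors or NULLs left by failed downloads.
drop_nontrees_sketch <- function(tr) {
  Filter(function(x) inherits(x, "phylo"), tr)
}

# Stand-in objects: a minimal list with class "phylo" and an error object
# of the kind a failed request might leave behind.
good_tree <- structure(list(edge = matrix(c(3L, 3L, 1L, 2L), ncol = 2),
                            tip.label = c("a", "b"), Nnode = 1L),
                       class = "phylo")
failed <- simpleError("could not parse NEXUS file")
results <- list(good_tree, failed, NULL, good_tree)

kept <- drop_nontrees_sketch(results)
length(kept)
```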
Search the dryad metadata archive
Description
Search the dryad metadata archive
Usage
dryad_metadata(study.id, curl = getCurlHandle())
Arguments
study.id |
the dryad identifier |
curl |
if calling in series many times, call getCurlHandle() first and then pass the return value in here. Avoids repeated handshakes with server. |
Value
a list object containing the study metadata
Examples
## Not run:
dryad_metadata("10255/dryad.12")
## End(Not run)
imports phylogenetic trees from treebase. internal function
Description
imports phylogenetic trees from treebase. internal function
Usage
get_nex(
query,
max_trees = "last()",
returns = "tree",
curl = getCurlHandle(),
verbose = TRUE,
pause1 = 1,
pause2 = 1,
attempts = 5,
only_metadata = FALSE
)
Arguments
query |
a PhyloWS-formatted search, see https://sourceforge.net/apps/mediawiki/treebase/index.php?title=API |
max_trees |
limits the number of trees that should be kept. |
returns |
whether to return the tree object or the matrix (of sequences) |
curl |
the curl handle to use for the requests |
verbose |
a logical indicating if output should be printed to screen |
pause1 |
number of seconds to hesitate between requests |
pause2 |
number of seconds to hesitate between individual files |
attempts |
number of attempts to access a particular resource |
Value
A list object containing all the trees matching the search (of class phylo)
return the trees in treebase that correspond to the search results
Description
Return the trees in treebase that correspond to the search results. get_study is deprecated; the same task can now be performed more easily using the phylo_metadata and oai_metadata search functions.
Usage
get_study(search_results, curl = getCurlHandle(), ...)
Arguments
search_results |
the output of download_metadata, or a subset thereof |
curl |
the handle to the curl web utility for repeated calls, see the getCurlHandle() function in RCurl package for details. |
... |
additional arguments to pass to search_treebase |
Details
this function is commonly used to get trees corresponding to the metadata search.
Value
all corresponding phylogenies.
return the study.id from the search results.
Description
get_study_id is deprecated, and now can be performed more easily using phylo_metadata and oai_metadata search functions.
Usage
get_study_id(search_results)
Arguments
search_results |
the output of download_metadata, or a subset thereof |
Details
this function is commonly used to get trees corresponding to the metadata search.
Value
the study id
Simple function to identify which trees have branch lengths
Description
Simple function to identify which trees have branch lengths
Usage
have_branchlength(trees)
Arguments
trees |
a list of phylogenetic trees (ape/phylo format) |
Value
logical vector indicating which trees have branch length data
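In ape's "phylo" format, branch lengths live in the optional edge.length component, so a tree has branch lengths exactly when that component is present. The sketch below assumes that check; have_branchlength_sketch() is a hypothetical name, and the stand-in trees are bare lists given class "phylo" so that ape is not needed.

```r
# Hypothetical sketch (not the package's code): a tree has branch lengths
# when its optional `edge.length` component is present.
have_branchlength_sketch <- function(trees) {
  vapply(trees, function(tree) !is.null(tree$edge.length), logical(1))
}

edges <- matrix(c(3L, 3L, 1L, 2L), ncol = 2)
with_bl <- structure(list(edge = edges, edge.length = c(0.5, 1.2),
                          tip.label = c("a", "b"), Nnode = 1L), class = "phylo")
without_bl <- structure(list(edge = edges, tip.label = c("a", "b"), Nnode = 1L),
                        class = "phylo")

have_branchlength_sketch(list(with_bl, without_bl))
# returns TRUE FALSE
```

The logical vector can then index the list directly, e.g. trees[have_branchlength_sketch(trees)].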
metadata.rda
Description
Contains a cache of all publication metadata that search_metadata() was able to pull down when run on 2012-05-12.
Usage
metadata(phylo.md = NULL, oai.md = NULL)
Arguments
phylo.md |
cached phyloWS (tree) metadata, (optional) |
oai.md |
cached OAI-PMH (study) metadata (optional) |
Details
recreate with:
search_metadata()
Value
a data frame of all available metadata (as a data.table object); columns are: "Study.id", "Tree.id", "kind", "type", "quality", "ntaxa", "date", "publisher", "author", "title".
Examples
## Not run:
meta <- metadata()
meta[publisher %in% c("Nature", "Science") & ntaxa > 50 & kind == "Species Tree",]
## End(Not run)
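The data.table expression in the example above keeps rows by publisher, number of taxa, and tree kind. The same filter can be written in base R with subset(), which may help readers less familiar with data.table syntax; the small data frame below is made-up illustration data standing in for the real metadata() table.

```r
# Made-up stand-in for the metadata() table (illustrative values only).
meta <- data.frame(
  Study.id  = c("1001", "1002", "1003", "1004"),
  publisher = c("Nature", "Science", "Evolution", "Nature"),
  ntaxa     = c(120, 30, 200, 75),
  kind      = c("Species Tree", "Species Tree", "Species Tree", "Gene Tree"),
  stringsAsFactors = FALSE
)

# Base-R equivalent of the data.table filter in the example.
big_species_trees <- subset(
  meta,
  publisher %in% c("Nature", "Science") & ntaxa > 50 & kind == "Species Tree"
)
big_species_trees$Study.id
```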
Internal function for the OAI-PMH interface to the Dryad database
Description
Internal function for the OAI-PMH interface to the Dryad database
Usage
metadata_from_oai(query, curl = curl)
Arguments
query |
a properly formed url query to dryad |
curl |
if calling in series many times, call getCurlHandle() first and then pass the return value in here. Avoids repeated handshakes with server. |
Search the OAI-PMH metadata by date, publisher, or identifier
Description
Search the OAI-PMH metadata by date, publisher, or identifier
Usage
oai_metadata(
x = c("date", "publisher", "author", "title", "Study.id", "attributes"),
metadata = NULL,
...
)
Arguments
x |
one of "date", "publisher", "author", "title", "Study.id", or "attributes" for the study |
metadata |
returned from |
... |
additional arguments to |
Value
a list of values matching the query
Search the PhyloWS metadata
Description
Search the PhyloWS metadata
Usage
phylo_metadata(
x = c("Study.id", "Tree.id", "kind", "type", "quality", "ntaxa"),
metadata = NULL,
...
)
Arguments
x |
one of "Study.id", "Tree.id", "kind", "type", "quality", or "ntaxa" |
metadata |
returned from |
... |
additional arguments to |
Value
a list of the values matching the query
A function to pull in the phylogeny/phylogenies matching a search query
Description
A function to pull in the phylogeny/phylogenies matching a search query
Usage
search_treebase(
input,
by,
returns = c("tree", "matrix"),
exact_match = FALSE,
max_trees = Inf,
branch_lengths = FALSE,
curl = getCurlHandle(),
verbose = TRUE,
pause1 = 0,
pause2 = 0,
attempts = 3,
only_metadata = FALSE
)
Arguments
input |
a search query (character string) |
by |
the kind of search; author, taxon, subject, study, etc (see list of possible search terms, details) |
returns |
whether the function should return the tree or the character matrix |
exact_match |
force exact matching for author name, taxon, etc. Otherwise does partial matching |
max_trees |
Upper bound for the number of trees returned, good for keeping possibly large initial queries fast |
branch_lengths |
logical indicating whether to return only trees that have branch lengths. |
curl |
the handle to the curl web utility for repeated calls, see the getCurlHandle() function in RCurl package for details. |
verbose |
logical indicating level of progress reporting |
pause1 |
number of seconds to hesitate between requests |
pause2 |
number of seconds to hesitate between individual files |
attempts |
number of attempts to access a particular resource |
only_metadata |
option to return only metadata about matching trees: study.id, tree.id, kind (gene, species, barcode), type (single, consensus), number of taxa, and a possible quality score. |
Value
either a list of trees (multiphylo) or a list of character matrices
Examples
## Not run:
## defaults to return phylogeny
Huelsenbeck <- search_treebase("Huelsenbeck", by="author")
## can ask for character matrices:
wingless <- search_treebase("2907", by="id.matrix", returns="matrix")
## Some nexus matrices don't meet read.nexus.data's strict requirements,
## these aren't returned
H_matrices <- search_treebase("Huelsenbeck", by="author", returns="matrix")
## Use Booleans in search: and, or, not
## Note that by must identify each entry type if a Boolean is given
HR_trees <- search_treebase("Ronquist or Huelsenbeck", by=c("author", "author"))
## We'll often use max_trees in the examples so that they run quickly;
## notice the quotes around species names.
dolphins <- search_treebase('"Delphinus"', by="taxon", max_trees=5)
## can do exact matches
humans <- search_treebase('"Homo sapiens"', by="taxon", exact_match=TRUE, max_trees=10)
## all trees with 5 taxa
five <- search_treebase(5, by="ntax", max_trees = 10)
## These are different: a tree id isn't a study id. We report both
studies <- search_treebase("2377", by="id.study")
tree <- search_treebase("2377", by="id.tree")
c("TreeID" = tree[[1]]$Tr.id, "StudyID" = tree[[1]]$S.id)
## Only results with branch lengths
## Has to grab all the trees first, then toss out ones without branch_lengths
Near <- search_treebase("Near", "author", branch_lengths=TRUE)
## End(Not run)
Get the metadata associated with the study in which the phylogeny was published.
Description
Get the metadata associated with the study in which the phylogeny was published.
Usage
show_metadata(study.id, curl = getCurlHandle())
Arguments
study.id |
The treebase study id (numbers only, specify in quotes) |
curl |
if calling in series many times, call getCurlHandle() first and then pass the return value in here. Avoids repeated handshakes with server. |
Details
If the tree was imported with search_treebase, the study id is stored in tree$S.id.
treebase.rda
Description
Contains a cache of all phylogenies that the cache_treebase() function was able to pull down when run on 2012-05-14.
Details
recreate with:
cache_treebase()