Title: | Programmatically Collect Normalized News from (Almost) Any Website |
Version: | 0.1.2 |
Description: | Programmatically collect normalized news from (almost) any website. An 'R' clone of the https://github.com/kotartemiy/newscatcher 'Python' module. |
License: | MIT + file LICENSE |
URL: | https://github.com/discindo/newscatcheR/ |
BugReports: | https://github.com/discindo/newscatcheR/issues/ |
Depends: | R (≥ 2.10) |
Imports: | tidyRSS (≥ 2.0.2), utils |
Suggests: | knitr, rmarkdown, testthat |
VignetteBuilder: | knitr |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 7.2.3 |
Language: | en-US |
NeedsCompilation: | no |
Packaged: | 2023-09-20 10:22:29 UTC; VI2451 |
Author: | Novica Nakov [aut, cre], Teofil Nakov [ctb], Artem Bugara [ctb], Discindo [cph] |
Maintainer: | Novica Nakov <nnovica@gmail.com> |
Repository: | CRAN |
Date/Publication: | 2023-09-20 10:40:02 UTC |
newscatcheR: Programmatically Collect Normalized News from (Almost) Any Website
Description
Programmatically collect normalized news from (almost) any website. An 'R' clone of the https://github.com/kotartemiy/newscatcher 'Python' module.
Author(s)
Maintainer: Novica Nakov nnovica@gmail.com
Other contributors:
Teofil Nakov teofiln@gmail.com [contributor]
Artem Bugara bugara.artem@gmail.com [contributor]
Discindo [copyright holder]
See Also
Useful links:
Report bugs at https://github.com/discindo/newscatcheR/issues/
Check URL
Description
A helper function to verify user input before fetching the feed.
Usage
check_url(website = "ycombinator.com", rss_table = package_rss)
Arguments
website |
a URL of a news source in the format "news.ycombinator.com" |
rss_table |
a data frame with URLs and RSS feeds, for when you need to build your own table from websites not in the included database. Be sure to use the same format as the included data. See 'R/package_rss.R' for details. |
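To illustrate the "same format as the included data" requirement, here is a minimal sketch of a custom feed table. The column names come from the package_rss format documented in this manual; the column types, the value used for main, and every example value (including the example.com addresses) are assumptions made for illustration.

```r
# Hypothetical custom feed table mirroring the seven columns of package_rss.
# All values, and the type chosen for each column, are illustrative assumptions.
my_rss <- data.frame(
  clean_url     = "example.com",
  language      = "en",
  topic_unified = "tech",
  main          = 1,
  clean_country = "US",
  rss_url       = "https://example.com/feed.xml",
  GlobalRank    = NA_integer_,
  stringsAsFactors = FALSE
)

# Column names as listed in the package_rss documentation.
expected_cols <- c("clean_url", "language", "topic_unified", "main",
                   "clean_country", "rss_url", "GlobalRank")

# Stop early if the custom table does not have the documented columns in order.
stopifnot(identical(names(my_rss), expected_cols))

# With newscatcheR installed, the table could then be supplied as rss_table:
# newscatcheR::check_url(website = "example.com", rss_table = my_rss)
```

Checking the column names up front is a cheap way to catch a mismatched table before any feed is requested.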
Describe URL
Description
Describe a URL by listing the topics available for it.
Usage
describe_url(website = "ycombinator.com", rss_table = package_rss)
Arguments
website |
a URL of a news source in the format "news.ycombinator.com" |
rss_table |
a data frame with URLs and RSS feeds, for when you need to build your own table from websites not in the included database. Be sure to use the same format as the included data. |
Value
A character vector with topics.
Examples
describe_url(website = "ycombinator.com", rss_table = package_rss)
Filter URLs in the provided database based on topic, country and language
Description
Filter URLs in the provided database based on topic, country and language
Usage
filter_urls(
topic = NULL,
country = NULL,
language = NULL,
rss_table = package_rss
)
Arguments
topic |
the topic of the feed |
country |
the country of origin of the feed as two capital letters, for example "US" |
language |
the language of the feed content as two lowercase letters, for example "en" |
rss_table |
a data frame with URLs and RSS feeds, for when you need to build your own table from websites not in the included database. Be sure to use the same format as the included data. |
Value
a tibble filtered according to the given parameters
Examples
filter_urls(topic = "tech", country = "US", language = "en")
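The filtering described above amounts to subsetting rows on three columns. The following is a base-R sketch on a small made-up table, not the package's implementation: the helper name filter_sketch, the example rows, and the mapping of arguments to columns (topic to topic_unified, country to clean_country) are assumptions.

```r
# Made-up feed table using column names from the package_rss documentation.
feeds <- data.frame(
  clean_url     = c("a.example", "b.example", "c.example"),
  language      = c("en", "en", "de"),
  topic_unified = c("tech", "news", "tech"),
  clean_country = c("US", "US", "DE"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: a NULL argument means "do not filter on this column".
filter_sketch <- function(tbl, topic = NULL, country = NULL, language = NULL) {
  keep <- rep(TRUE, nrow(tbl))
  if (!is.null(topic))    keep <- keep & tbl$topic_unified == topic
  if (!is.null(country))  keep <- keep & tbl$clean_country == country
  if (!is.null(language)) keep <- keep & tbl$language == language
  tbl[keep, , drop = FALSE]
}

# Keeps only the row for "a.example".
filter_sketch(feeds, topic = "tech", country = "US", language = "en")
```

Leaving all three arguments as NULL returns the table unchanged, which matches the NULL defaults shown in the Usage section.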
Get headlines
Description
A helper function to get just the headlines of the feed.
Usage
get_headlines(
website = "ycombinator.com",
topic = NULL,
rss_table = package_rss
)
Arguments
website |
a URL of a news source in the format "news.ycombinator.com" |
topic |
the topic of the feed. The default, NULL, fetches the "main" feed. Topics are 'tech', 'news', 'business', 'science', 'finance', 'food', 'politics', 'economics', 'travel', 'entertainment', 'music', 'sport', 'world', but not all sites have all topics. |
rss_table |
a data frame with URLs and RSS feeds, for when you need to build your own table from websites not in the included database. Be sure to use the same format as the included data. |
Value
a tibble containing the headlines in the feed
Examples
## Not run:
Sys.sleep(3) # adding a small time delay to avoid
# simultaneous posts to the API
get_headlines(website = "ycombinator.com", rss_table = package_rss)
## End(Not run)
Get news
Description
Get the contents of an RSS feed.
Usage
get_news(website = "ycombinator.com", topic = NULL, rss_table = package_rss)
Arguments
website |
a URL of a news source in the format "news.ycombinator.com" |
topic |
the topic of the feed. The default, NULL, fetches the "main" feed. Topics are 'tech', 'news', 'business', 'science', 'finance', 'food', 'politics', 'economics', 'travel', 'entertainment', 'music', 'sport', 'world', but not all sites have all topics. |
rss_table |
a data frame with URLs and RSS feeds, for when you need to build your own table from websites not in the included database. Be sure to use the same format as the included data. |
Value
a tibble containing the contents of the RSS feed
Examples
## Not run:
Sys.sleep(3) # adding a small time delay to avoid
# simultaneous posts to the API
get_news(website = "ycombinator.com", rss_table = package_rss)
## End(Not run)
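The examples pause with Sys.sleep() before fetching a single feed. When fetching several sites in a row, the same idea can be wrapped in a loop. This is only a sketch: fetch_politely and its arguments are hypothetical names, and the fetching function is passed in as an argument so that the loop can be tried out without a network connection.

```r
# Hypothetical helper: call `fetch` on each site, pausing between requests
# and turning a failed request into NULL instead of stopping the whole loop.
fetch_politely <- function(sites, fetch, delay = 3) {
  results <- vector("list", length(sites))
  names(results) <- sites
  for (i in seq_along(sites)) {
    if (i > 1) Sys.sleep(delay)  # pause between requests, not before the first
    value <- tryCatch(
      fetch(sites[[i]]),
      error = function(e) {
        message("Skipping ", sites[[i]], ": ", conditionMessage(e))
        NULL
      }
    )
    # Assign through list() so that a NULL result keeps its slot in the list.
    results[i] <- list(value)
  }
  results
}

# With newscatcheR installed and a network connection, one might call:
# news <- fetch_politely(
#   c("ycombinator.com", "example.com"),
#   fetch = function(site) newscatcheR::get_news(website = site)
# )
```

Swapping get_news for get_headlines in the fetch argument would collect only the headlines under the same delay and error handling.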
RSS table from Python package newscatcher
Description
A dataset of news website URLs and their RSS feeds, taken from the Python package newscatcher.
Usage
package_rss
Format
A data frame with 4505 rows and 7 variables:
- clean_url
url of news website
- language
the language of the website
- topic_unified
the topic of the website
- main
main
- clean_country
the country of origin of the website
- rss_url
location of feed
- GlobalRank
rank of website
Source
https://github.com/kotartemiy/newscatcher
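As a quick way to get a feel for the table, one can count how many feeds fall under each topic. The helper name count_by_topic is hypothetical, and the block only touches the real data when newscatcheR is installed; no particular counts are assumed here.

```r
# Hypothetical helper: number of rows per topic, most frequent first.
count_by_topic <- function(tbl) {
  sort(table(tbl$topic_unified), decreasing = TRUE)
}

# Inspect the bundled table only if the package is available.
if (requireNamespace("newscatcheR", quietly = TRUE)) {
  rss <- newscatcheR::package_rss
  str(rss)                          # column names and types
  print(head(count_by_topic(rss)))  # most common topics
}
```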
Show countries
Description
Show all countries in the database.
Usage
show_countries(rss_table = package_rss)
Arguments
rss_table |
a data frame with URLs and RSS feeds, for when you need to build your own table from websites not in the included database. Be sure to use the same format as the included data. See 'R/package_rss.R' for details. |
Value
a character vector of available countries
Show languages
Description
Show all languages in the database.
Usage
show_languages(rss_table = package_rss)
Arguments
rss_table |
a data frame with URLs and RSS feeds, for when you need to build your own table from websites not in the included database. Be sure to use the same format as the included data. See 'R/package_rss.R' for details. |
Value
a character vector of available languages
Show topics
Description
Show all topics in the database.
Usage
show_topics(rss_table = package_rss)
Arguments
rss_table |
a data frame with URLs and RSS feeds, for when you need to build your own table from websites not in the included database. Be sure to use the same format as the included data. See 'R/package_rss.R' for details. |
Value
a character vector of available topics
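The three show_* helpers each return a character vector of the values available in one column. A rough base-R equivalent on a made-up table is sketched below. The helper name distinct_values is hypothetical, and whether the package sorts the result or drops missing values is not stated in this manual, so both choices are assumptions.

```r
# Made-up table using column names from the package_rss documentation.
feeds <- data.frame(
  topic_unified = c("tech", "news", "tech", NA),
  clean_country = c("US", "DE", "US", "GB"),
  language      = c("en", "de", "en", "en"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: distinct non-missing values of a column, sorted.
distinct_values <- function(x) sort(unique(x[!is.na(x)]))

distinct_values(feeds$topic_unified)  # topics
distinct_values(feeds$clean_country)  # countries
distinct_values(feeds$language)       # languages
```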