Type: | Package |
Title: | Interface for 'GraphFrames' |
Version: | 0.1.2 |
Maintainer: | Kevin Kuo <kevin.kuo@rstudio.com> |
Description: | A 'sparklyr' <https://spark.rstudio.com/> extension that provides an R interface for 'GraphFrames' <https://graphframes.github.io/>. 'GraphFrames' is a package for 'Apache Spark' that provides a DataFrame-based API for working with graphs. Functionality includes motif finding and common graph algorithms, such as PageRank and breadth-first search. |
URL: | https://github.com/rstudio/graphframes |
BugReports: | https://github.com/rstudio/graphframes/issues |
License: | Apache License 2.0 | file LICENSE |
Encoding: | UTF-8 |
LazyData: | true |
RoxygenNote: | 6.1.0 |
Imports: | sparklyr, tibble, forge |
Suggests: | testthat, covr, dplyr |
NeedsCompilation: | no |
Packaged: | 2018-10-30 19:01:05 UTC; kevinykuo |
Author: | Kevin Kuo |
Repository: | CRAN |
Date/Publication: | 2018-10-30 19:20:03 UTC |
Breadth-first search (BFS)
Description
Breadth-first search (BFS)
Usage
gf_bfs(x, from_expr, to_expr, max_path_length = 10, edge_filter = NULL,
...)
Arguments
x |
An object coercible to a GraphFrame. |
from_expr |
Spark SQL expression specifying valid starting vertices for the BFS. |
to_expr |
Spark SQL expression specifying valid target vertices for the BFS. |
max_path_length |
Limit on the length of paths. |
edge_filter |
Spark SQL expression specifying edges which may be used in the search. |
... |
Optional arguments, currently not used. |
Examples
## Not run:
g <- gf_friends(sc)
gf_bfs(g, from_expr = "name = 'Esther'", to_expr = "age < 32")
## End(Not run)
Cache the GraphFrame
Description
Cache the GraphFrame
Usage
gf_cache(x)
Arguments
x |
An object coercible to a GraphFrame. |
Chain graph
Description
Returns a chain graph of the given size with Long ID type. The vertex IDs are 0, 1, ..., n-1, and the edges are (0, 1), (1, 2), ..., (n-2, n-1).
Usage
gf_chain(sc, n)
Arguments
sc |
A Spark connection. |
n |
Size of the graph to return. |
Examples
## Not run:
gf_chain(sc, 5)
## End(Not run)
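The vertex and edge layout described above can be sketched in base R without a Spark connection. This is only an illustration of the structure, not how gf_chain() is implemented; the helper name chain_graph_tables is hypothetical.

```r
# Base R sketch of the chain-graph layout (illustration only, not gf_chain()).
# chain_graph_tables is a hypothetical helper name.
chain_graph_tables <- function(n) {
  stopifnot(n >= 1)
  # Vertex IDs 0, 1, ..., n-1
  vertices <- data.frame(id = seq_len(n) - 1)
  # Edges (0, 1), (1, 2), ..., (n-2, n-1); a single vertex has no edges
  src <- if (n > 1) 0:(n - 2) else numeric(0)
  edges <- data.frame(src = src, dst = src + 1)
  list(vertices = vertices, edges = edges)
}

chain <- chain_graph_tables(5)
chain$edges
```

A chain of size n therefore has n vertices and n-1 edges.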
Connected components
Description
Computes the connected component membership of each vertex and returns a DataFrame of vertex information with each vertex assigned a component ID.
Usage
gf_connected_components(x, broadcast_threshold = 1000000L,
algorithm = c("graphframes", "graphx"), checkpoint_interval = 2L,
...)
Arguments
x |
An object coercible to a GraphFrame. |
broadcast_threshold |
Broadcast threshold in propagating component assignments. |
algorithm |
One of 'graphframes' or 'graphx'. |
checkpoint_interval |
Checkpoint interval in terms of number of iterations. |
... |
Optional arguments, currently not used. |
Examples
## Not run:
# checkpoint directory is required for gf_connected_components()
spark_set_checkpoint_dir(sc, tempdir())
g <- gf_friends(sc)
gf_connected_components(g)
## End(Not run)
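To make the notion of a component ID concrete, the following base R sketch assigns each vertex the smallest vertex ID reachable from it, treating edges as undirected. It is a simple min-label propagation written for illustration, not the 'graphframes' or 'graphx' algorithm selected by the algorithm argument; the function name connected_components_sketch is hypothetical, and vertex IDs are assumed to be numeric.

```r
# Min-label propagation sketch (illustration only, not the GraphFrames algorithm).
# connected_components_sketch is a hypothetical name; IDs are assumed numeric.
connected_components_sketch <- function(vertex_ids, src, dst) {
  # Every vertex starts in its own component, labelled by its own ID
  component <- setNames(vertex_ids, as.character(vertex_ids))
  s <- as.character(src)
  d <- as.character(dst)
  repeat {
    # Smaller of the two endpoint labels for each edge
    low <- pmin(component[s], component[d])
    updated <- component
    for (k in seq_along(s)) {
      updated[s[k]] <- min(updated[s[k]], low[k])
      updated[d[k]] <- min(updated[d[k]], low[k])
    }
    # Stop once no label changes
    if (identical(updated, component)) break
    component <- updated
  }
  data.frame(id = vertex_ids, component = unname(component))
}

cc <- connected_components_sketch(
  vertex_ids = 1:6,
  src = c(1, 2, 4),
  dst = c(2, 3, 5)
)
cc
```

Vertices 1 to 3 share one component, 4 and 5 share another, and the isolated vertex 6 forms its own.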
Degrees of vertices
Description
Degrees of vertices
Usage
gf_degrees(x)
Arguments
x |
An object coercible to a GraphFrame. |
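As a rough illustration of what the degree functions count, the sketch below tallies degrees from a small edge list in base R. It assumes the usual definitions (out-degree counts edges leaving a vertex, in-degree counts edges entering it, and degree is their sum); the edge list is made up for the example.

```r
# Base R sketch of degree counting on a made-up edge list (illustration only).
src <- c("a", "b", "b", "c")
dst <- c("b", "a", "c", "b")

out_degree <- table(src)       # edges leaving each vertex
in_degree <- table(dst)        # edges entering each vertex
degree <- table(c(src, dst))   # every edge endpoint counts once

degree
```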
Edges column names
Description
Edges column names
Usage
gf_edge_columns(x)
Arguments
x |
An object coercible to a GraphFrame. |
Extract edges DataFrame
Description
Extract edges DataFrame
Usage
gf_edges(x)
Arguments
x |
An object coercible to a GraphFrame. |
Motif finding: Searching the graph for structural patterns
Description
Motif finding uses a simple Domain-Specific Language (DSL) for expressing structural queries. For example, gf_find(g, "(a)-[e]->(b); (b)-[e2]->(a)") will search for pairs of vertices a and b connected by edges in both directions. It will return a DataFrame of all such structures in the graph, with columns for each of the named elements (vertices or edges) in the motif. In this case, the returned columns will be in order of the pattern: "a, e, b, e2".
Usage
gf_find(x, pattern)
Arguments
x |
An object coercible to a GraphFrame. |
pattern |
Pattern specifying a motif to search for. |
Examples
## Not run:
gf_friends(sc) %>%
gf_find("(a)-[e]->(b); (b)-[e2]->(a)")
## End(Not run)
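The bidirectional pattern above amounts to joining the edge table with itself so that one edge runs from a to b and another from b to a. The base R sketch below shows that join on a made-up edge list; it is an illustration of the idea, not the way GraphFrames evaluates motifs.

```r
# Self-join sketch for "(a)-[e]->(b); (b)-[e2]->(a)" (illustration only).
edges <- data.frame(
  src = c("a", "b", "b", "c"),
  dst = c("b", "a", "c", "d"),
  stringsAsFactors = FALSE
)

# Match each edge (src, dst) with an edge whose (dst, src) is the same pair,
# i.e. an edge running in the opposite direction.
both_ways <- merge(
  edges, edges,
  by.x = c("src", "dst"),
  by.y = c("dst", "src")
)
both_ways
```

Each mutual pair appears twice, once per ordering of a and b, because both orderings satisfy the pattern.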
Graph of friends in a social network.
Description
Graph of friends in a social network.
Usage
gf_friends(sc)
Arguments
sc |
A Spark connection. |
Examples
## Not run:
library(sparklyr)
sc <- spark_connect(master = "local")
gf_friends(sc)
## End(Not run)
Create a new GraphFrame
Description
Create a new GraphFrame
Usage
gf_graphframe(vertices = NULL, edges)
Arguments
vertices |
A Spark DataFrame of vertices. |
edges |
A Spark DataFrame of edges. |
Examples
## Not run:
library(sparklyr)
sc <- spark_connect(master = "local", version = "2.3.0")
v_tbl <- sdf_copy_to(
sc, data.frame(id = 1:3, name = LETTERS[1:3])
)
e_tbl <- sdf_copy_to(
sc, data.frame(src = c(1, 2, 2), dst = c(2, 1, 3),
action = c("love", "hate", "follow"))
)
gf_graphframe(v_tbl, e_tbl)
gf_graphframe(edges = e_tbl)
## End(Not run)
Generate a grid Ising model with random parameters
Description
Generate a grid Ising model with random parameters
Usage
gf_grid_ising_model(sc, n, v_std = 1, e_std = 1)
Arguments
sc |
A Spark connection. |
n |
Length of one side of the grid. The grid will be of size n x n. |
v_std |
Standard deviation of normal distribution used to generate vertex factors "a". Default of 1.0. |
e_std |
Standard deviation of normal distribution used to generate edge factors "b". Default of 1.0. |
Details
This method generates a grid Ising model with random parameters. Ising models are probabilistic graphical models over binary variables x_i. Each binary variable x_i corresponds to one vertex and takes the value -1 or +1. The probability distribution P(X) (over all x_i) is parameterized by vertex factors a_i and edge factors b_{ij}:
P(X) = (1/Z) * exp[ \sum_i a_i x_i + \sum_{ij} b_{ij} x_i x_j ]
Value
GraphFrame. Vertices have columns "id" and "a". Edges have columns "src", "dst", and "b". Edges are directed, but they should be treated as undirected in any algorithms run on this model. Vertex IDs are of the form "i,j", with rows and columns counted from 0. E.g., vertex "1,3" is in the second row and fourth column of the grid.
Examples
## Not run:
gf_grid_ising_model(sc, 5)
## End(Not run)
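The distribution P(X) given in Details can be evaluated directly on a very small grid. The base R sketch below does this for a 2 x 2 grid with fixed, made-up factors instead of random ones. The helper name ising_log_weight, the factor values, and the choice of right and down neighbours as edges are all assumptions made for the illustration; they are not taken from gf_grid_ising_model().

```r
# Evaluate P(X) = (1/Z) * exp[ sum_i a_i x_i + sum_ij b_ij x_i x_j ] on a 2 x 2 grid.
# ising_log_weight is a hypothetical helper; all factor values are made up.
ising_log_weight <- function(x, a, edges) {
  # x and a are numeric vectors named by vertex ID; edges has columns src, dst, b
  sum(a * x[names(a)]) + sum(edges$b * x[edges$src] * x[edges$dst])
}

a <- c("0,0" = 0.5, "0,1" = -0.2, "1,0" = 0.1, "1,1" = 0.3)
edges <- data.frame(
  src = c("0,0", "0,0", "0,1", "1,0"),
  dst = c("0,1", "1,0", "1,1", "1,1"),
  b = c(1.0, -0.5, 0.2, 0.4),
  stringsAsFactors = FALSE
)

# Enumerate all 2^4 assignments of -1/+1 to compute the normalizer Z
configs <- as.matrix(expand.grid(rep(list(c(-1, 1)), length(a))))
colnames(configs) <- names(a)
log_w <- apply(configs, 1, ising_log_weight, a = a, edges = edges)
Z <- sum(exp(log_w))
p <- exp(log_w) / Z

# Exponent for the configuration with every x_i = +1
all_up <- setNames(rep(1, length(a)), names(a))
ising_log_weight(all_up, a, edges)
```

Brute-force enumeration is only feasible for tiny grids, since the number of configurations doubles with every vertex.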
In-degrees of vertices
Description
In-degrees of vertices
Usage
gf_in_degrees(x)
Arguments
x |
An object coercible to a GraphFrame. |
Label propagation algorithm (LPA)
Description
Run static Label Propagation for detecting communities in networks. Each node in the network is initially assigned to its own community. At every iteration, nodes send their community affiliation to all neighbors and update their state to the mode community affiliation of incoming messages. LPA is a standard community detection algorithm for graphs. It is computationally very inexpensive, although (1) convergence is not guaranteed and (2) one can end up with trivial solutions (all nodes assigned to a single community).
Usage
gf_lpa(x, max_iter, ...)
Arguments
x |
An object coercible to a GraphFrame. |
max_iter |
Maximum number of iterations. |
... |
Optional arguments, currently not used. |
Examples
## Not run:
g <- gf_friends(sc)
gf_lpa(g, max_iter = 5)
## End(Not run)
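The update rule described above can be sketched in base R as follows. The sketch assumes that labels travel in both directions along each edge and that ties are broken by taking the alphabetically first label; both choices, and the name lpa_sketch, are assumptions for illustration rather than a statement of how GraphFrames behaves.

```r
# Synchronous label propagation sketch (illustration only).
# lpa_sketch is a hypothetical name; tie-breaking is an assumption.
lpa_sketch <- function(vertex_ids, src, dst, max_iter) {
  # Every vertex starts in its own community
  label <- setNames(vertex_ids, vertex_ids)
  # Each edge carries a message in both directions
  from <- c(src, dst)
  to <- c(dst, src)
  for (iter in seq_len(max_iter)) {
    # Labels received by each vertex, computed from the previous iteration
    incoming <- split(unname(label[from]), to)
    new_label <- label
    for (v in names(incoming)) {
      counts <- table(incoming[[v]])
      # Mode of incoming labels; which.max picks the first label on ties
      new_label[v] <- names(counts)[which.max(counts)]
    }
    label <- new_label
  }
  data.frame(id = vertex_ids, label = unname(label), stringsAsFactors = FALSE)
}

# Two disconnected triangles
communities <- lpa_sketch(
  vertex_ids = c("a", "b", "c", "d", "e", "f"),
  src = c("a", "b", "c", "d", "e", "f"),
  dst = c("b", "c", "a", "e", "f", "d"),
  max_iter = 5
)
communities
```

Because the loop always runs max_iter times, the result reflects the labels at that point whether or not they have stabilized.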
Out-degrees of vertices
Description
Out-degrees of vertices
Usage
gf_out_degrees(x)
Arguments
x |
An object coercible to a GraphFrame. |
PageRank
Description
PageRank
Usage
gf_pagerank(x, tol = NULL, reset_probability = 0.15, max_iter = NULL,
source_id = NULL, ...)
Arguments
x |
An object coercible to a GraphFrame. |
tol |
Tolerance. |
reset_probability |
Reset probability. |
max_iter |
Maximum number of iterations. |
source_id |
(Optional) Source vertex for a personalized PageRank. |
... |
Optional arguments, currently not used. |
Examples
## Not run:
g <- gf_friends(sc)
gf_pagerank(g, reset_probability = 0.15, tol = 0.01)
## End(Not run)
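To show how reset_probability, tol, and max_iter interact, here is a textbook power-iteration PageRank in base R. It normalizes ranks to sum to 1 and spreads the rank of vertices without out-edges uniformly, so its scaling may differ from what gf_pagerank() returns. The name pagerank_sketch and the default values for tol and max_iter are assumptions for the illustration.

```r
# Textbook power-iteration PageRank (illustration only, not gf_pagerank()).
# pagerank_sketch is a hypothetical name; tol and max_iter defaults are assumed.
pagerank_sketch <- function(vertex_ids, src, dst,
                            reset_probability = 0.15,
                            tol = 0.01, max_iter = 100) {
  n <- length(vertex_ids)
  s <- match(src, vertex_ids)
  d <- match(dst, vertex_ids)
  out_deg <- tabulate(s, nbins = n)
  rank <- rep(1 / n, n)
  for (iter in seq_len(max_iter)) {
    # Each vertex splits its rank equally among its out-edges
    contrib <- rank[s] / out_deg[s]
    received <- as.numeric(tapply(contrib, factor(d, levels = seq_len(n)), sum))
    received[is.na(received)] <- 0
    # Rank held by vertices with no out-edges is shared uniformly
    dangling <- sum(rank[out_deg == 0])
    new_rank <- reset_probability / n +
      (1 - reset_probability) * (received + dangling / n)
    # Stop when the largest per-vertex change falls below tol
    converged <- max(abs(new_rank - rank)) < tol
    rank <- new_rank
    if (converged) break
  }
  data.frame(id = vertex_ids, pagerank = rank)
}

# Directed 3-cycle: every vertex ends up with the same rank
pr_cycle <- pagerank_sketch(c("a", "b", "c"),
                            src = c("a", "b", "c"),
                            dst = c("b", "c", "a"))
# Two vertices pointing at "a": "a" collects the most rank
pr_hub <- pagerank_sketch(c("a", "b", "c"),
                          src = c("b", "c"),
                          dst = c("a", "a"))
pr_hub
```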
Persist the GraphFrame
Description
Persist the GraphFrame
Usage
gf_persist(x, storage_level = "MEMORY_AND_DISK")
Arguments
x |
An object coercible to a GraphFrame. |
storage_level |
The storage level to be used. Please view the Spark Documentation for information on what storage levels are accepted. |
Register a GraphFrame object
Description
Register a GraphFrame object
Usage
gf_register(x)
Arguments
x |
An object coercible to a GraphFrame. |
Strongly connected components
Description
Compute the strongly connected component (SCC) of each vertex and return a DataFrame with each vertex assigned to the SCC containing that vertex.
Usage
gf_scc(x, max_iter, ...)
Arguments
x |
An object coercible to a GraphFrame. |
max_iter |
Maximum number of iterations. |
... |
Optional arguments, currently not used. |
Examples
## Not run:
g <- gf_friends(sc)
gf_scc(g, max_iter = 10)
## End(Not run)
Shortest paths
Description
Computes shortest paths from every vertex to the given set of landmark vertices. Note that this takes edge direction into account.
Usage
gf_shortest_paths(x, landmarks, ...)
Arguments
x |
An object coercible to a GraphFrame. |
landmarks |
IDs of landmark vertices. |
... |
Optional arguments, currently not used. |
Examples
## Not run:
g <- gf_friends(sc)
gf_shortest_paths(g, landmarks = c("a", "d"))
## End(Not run)
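The effect of edge direction can be seen in a small base R sketch that counts hops from every vertex to one landmark by searching backwards along the edges. The function name hops_to_landmark and the edge list are hypothetical, and the sketch handles a single landmark for brevity; it illustrates the idea only.

```r
# Hop counts from every vertex to one landmark, following edge direction
# (illustration only). hops_to_landmark is a hypothetical name.
hops_to_landmark <- function(vertex_ids, src, dst, landmark) {
  dist <- setNames(rep(NA_integer_, length(vertex_ids)), vertex_ids)
  dist[landmark] <- 0L
  frontier <- landmark
  hop <- 0L
  while (length(frontier) > 0) {
    hop <- hop + 1L
    # Vertices with an edge pointing into the current frontier
    candidates <- unique(src[dst %in% frontier])
    # Keep only those not reached earlier (an earlier visit means a shorter path)
    frontier <- candidates[is.na(dist[candidates])]
    dist[frontier] <- hop
  }
  dist
}

ids <- c("a", "b", "c", "d")
src <- c("a", "b", "d")
dst <- c("b", "c", "a")

to_c <- hops_to_landmark(ids, src, dst, "c")
to_a <- hops_to_landmark(ids, src, dst, "a")
to_a
```

With "a" as the landmark, "b" and "c" stay NA: they can be reached from "a", but no directed path leads from them back to it.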
Generate a star graph
Description
Returns a star graph with Long ID type, consisting of a central element indexed 0 (the root) and the n other leaf vertices 1, 2, ..., n.
Usage
gf_star(sc, n)
Arguments
sc |
A Spark connection. |
n |
The number of leaves. |
Examples
## Not run:
gf_star(sc, 5)
## End(Not run)
Computes the number of triangles passing through each vertex.
Description
This algorithm ignores edge direction; i.e., all edges are treated as undirected. In a multigraph, duplicate edges will be counted only once.
Usage
gf_triangle_count(x, ...)
Arguments
x |
An object coercible to a GraphFrame. |
... |
Optional arguments, currently not used. |
Examples
## Not run:
g <- gf_friends(sc)
gf_triangle_count(g)
## End(Not run)
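The two rules in the description (direction is ignored, duplicate edges count once) can be illustrated with an adjacency-matrix sketch in base R. The name triangle_count_sketch is hypothetical, and the matrix approach is chosen for brevity on tiny graphs; it is not the algorithm GraphFrames runs.

```r
# Triangle counting via the adjacency matrix (illustration only).
# triangle_count_sketch is a hypothetical name.
triangle_count_sketch <- function(vertex_ids, src, dst) {
  n <- length(vertex_ids)
  adj <- matrix(0, n, n, dimnames = list(vertex_ids, vertex_ids))
  keep <- src != dst  # self-loops cannot be part of a triangle
  # Setting entries to 1 in both directions ignores edge direction,
  # and repeated edges simply set the same entry again.
  adj[cbind(src[keep], dst[keep])] <- 1
  adj[cbind(dst[keep], src[keep])] <- 1
  # diag(A^3) counts closed walks of length 3; each triangle through a
  # vertex is walked in two directions, hence the division by 2.
  counts <- diag(adj %*% adj %*% adj) / 2
  data.frame(id = vertex_ids, count = unname(counts))
}

tri <- triangle_count_sketch(
  vertex_ids = c("a", "b", "c", "d"),
  src = c("a", "b", "c", "a", "b", "c"),
  dst = c("b", "c", "a", "b", "a", "d")
)
tri
```

The duplicate edge a to b and the reverse edge b to a do not change the result: a, b, and c each lie on one triangle and d on none.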
Triplets of graph
Description
Triplets of graph
Usage
gf_triplets(x)
Arguments
x |
An object coercible to a GraphFrame. |
Generate two blobs
Description
Two densely connected blobs (vertices 0 through n-1 and n through 2n-1) connected by a single edge (0->n).
Usage
gf_two_blobs(sc, blob_size)
Arguments
sc |
A Spark connection. |
blob_size |
The size of each blob. |
Examples
## Not run:
gf_two_blobs(sc, 3)
## End(Not run)
Unpersist the GraphFrame
Description
Unpersist the GraphFrame
Usage
gf_unpersist(x, blocking = FALSE)
Arguments
x |
An object coercible to a GraphFrame. |
blocking |
Whether to block until all blocks are deleted. |
Vertices column names
Description
Vertices column names
Usage
gf_vertex_columns(x)
Arguments
x |
An object coercible to a GraphFrame. |
Extract vertices DataFrame
Description
Extract vertices DataFrame
Usage
gf_vertices(x)
Arguments
x |
An object coercible to a GraphFrame. |
Retrieve a GraphFrame
Description
Retrieve a GraphFrame
Usage
spark_graphframe(x, ...)
Arguments
x |
An object coercible to a GraphFrame. |
... |
Additional arguments, currently not used. |