A graph, while not "tidy" in itself, can be thought of as two tidy data frames describing node and edge data respectively. 'tidygraph' provides an approach to manipulate these two virtual data frames using the API defined in the 'dplyr' package, and also provides tidy interfaces to many common graph algorithms.
This package provides a tidy API for graph/network manipulation. While network data itself is not tidy, it can be envisioned as two tidy tables, one for node data and one for edge data.
tidygraph provides a way to switch between the two tables and provides dplyr verbs for manipulating them. Furthermore, it provides access to a lot of graph algorithms with return values that facilitate their use in a tidy workflow.
library(tidygraph)

play_erdos_renyi(10, 0.5) %>%
  activate(nodes) %>%
  mutate(degree = centrality_degree()) %>%
  activate(edges) %>%
  mutate(centrality = centrality_edge_betweenness()) %>%
  arrange(centrality)
#> #
#> # A directed simple graph with 1 component
#> #
#> # Edge Data: 37 x 3 (active)
#>    from    to centrality
#>   <int> <int>      <dbl>
#> 1    10     3   1.500000
#> 2     5     6   1.500000
#> 3     2     7   1.500000
#> 4    10     9   1.500000
#> 5     8     7   1.833333
#> 6     5     8   1.833333
#> # ... with 31 more rows
#> #
#> # Node Data: 10 x 1
#>   degree
#>    <dbl>
#> 1      5
#> 2      3
#> 3      4
#> # ... with 7 more rows
tidygraph is a huge package that exports 280 different functions and methods. It more or less wraps the full functionality of igraph in a tidy API, giving you access to almost all of the dplyr verbs plus a few more, developed for use with relational data.
tidygraph adds some extra verbs for specific use in network analysis and manipulation. activate() defines whether one is manipulating node or edge data at the moment, as shown in the example above. bind_graphs() lets you expand the graph structure you're working with, while graph_join() lets you merge two graphs on some node identifier. reroute(), on the other hand, lets you change the terminal nodes of the edges in the graph.
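The verbs above can be sketched on small toy graphs. This is a minimal illustration, not part of the original example: the graphs come from create_ring() and create_star(), and the `name` column is a hypothetical node identifier chosen for the join.

```r
library(tidygraph)

# Two small undirected toy graphs; the `name` column is an illustrative
# node identifier, not something the verbs require by that name.
ring <- create_ring(3) %>% mutate(name = c("a", "b", "c"))
star <- create_star(3) %>% mutate(name = c("c", "d", "e"))

# bind_graphs() places the graphs side by side: every node is kept.
bound <- bind_graphs(ring, star)

# graph_join() merges on the shared identifier, so node "c" appears once.
joined <- graph_join(ring, star, by = "name")

# reroute() rewrites edge end points; here every edge of a directed ring
# is reversed by swapping its `from` and `to` columns.
reversed <- create_ring(3, directed = TRUE) %>%
  activate(edges) %>%
  reroute(from = to, to = from)
```

Note that reroute() works on edge data, so the edges are activated first.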
tidygraph wraps almost all of igraph's graph algorithms and provides a consistent interface and output that always matches the sequence of nodes and edges. All tidygraph algorithm wrappers are intended for use inside verbs, where they know the context they are being called in. In the example above it is not necessary to supply the graph or the node/edge ids to centrality_degree() and centrality_edge_betweenness(), as they are aware of that already. This leads to much clearer code and less typing.
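To illustrate the context awareness described above, the following sketch calls the same algorithm once inside a verb and once outside of one. The star graph is an assumed toy input; with_graph() is the helper listed in the changelog for running algorithms outside of verbs.

```r
library(tidygraph)

# An undirected star with 5 nodes; node 1 is the centre.
star <- create_star(5)

# Inside a verb the algorithm picks up the graph from its calling context.
star <- star %>% mutate(degree = centrality_degree())

# Outside a verb the same call needs an explicit context via with_graph().
degree_outside <- with_graph(star, centrality_degree())
```

Both calls return one value per node, in node order, which is what lets the result be assigned directly as a column.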
tidygraph goes beyond dplyr and also implements graph-centric versions of the purrr map functions. You can now call a function on the nodes in the order of a breadth-first or depth-first search while getting access to the results of the previous calls.
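A small sketch of a breadth first map, assuming a toy tree from create_tree() and an illustrative `value` column. It sums the values along the path from the root to each node, closely following the pattern in the map_bfs() documentation.

```r
library(tidygraph)

# A small directed tree (node 1 is the root) where every node carries
# an illustrative `value` of 1.
tree <- create_tree(7, children = 2, directed = TRUE) %>%
  mutate(value = 1)

# Visit nodes in breadth first order from the root. `.f` receives, among
# other arguments, the current `node` and a `path` table describing the
# nodes leading to it; unused arguments are absorbed by `...`.
tree <- tree %>%
  mutate(value_acc = map_bfs_dbl(node_is_root(), .f = function(node, path, ...) {
    sum(.N()$value[c(node, path$node)])
  }))
```

The `_dbl` suffix mirrors purrr: it asks for a numeric vector instead of a list, so the result can be stored as an ordinary column.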
tidygraph lets you temporarily change the representation of your graph, do some manipulation of the node and edge data, and then change back to the original graph with the changes being merged in automatically. This is powered by the new morph()/unmorph() verbs that let you e.g. contract nodes, work on the linegraph representation, split communities into separate graphs, etc. If you wish to continue with the morphed version, the crystallise() verb lets you freeze the temporary representation into a proper tbl_graph.
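The round trip can be sketched as follows, assuming a toy graph made of two disconnected rings and the to_components morpher. The `component_size` column name is made up for the illustration.

```r
library(tidygraph)

# Two disconnected rings bound into one graph with 3 + 5 nodes.
graph <- bind_graphs(create_ring(3), create_ring(5))

# Temporarily split the graph into its components, record each
# component's node count on its nodes, then merge the result back.
graph <- graph %>%
  morph(to_components) %>%
  mutate(component_size = graph_order()) %>%
  unmorph()

# crystallise() keeps the morphed representation instead: a tibble with
# one row per component and the component graphs in a list column.
components <- graph %>%
  morph(to_components) %>%
  crystallise()
```

After unmorph() the graph has its original structure again, with the new column attached to the matching nodes.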
Although tidygraph is powered by igraph underneath, it wants everyone to join the fun. The as_tbl_graph() function can easily convert relational data from all your favourite objects, such as graph, etc. More conversions will be added in the order I become aware of them.
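As a brief sketch of conversion, the example below turns a hypothetical edge list held in a data frame, and a plain igraph object, into tbl_graph objects. The column names `from` and `to` and the node labels are assumptions made for the illustration.

```r
library(tidygraph)

# A hypothetical edge list stored in a plain data frame.
edge_list <- data.frame(
  from = c("a", "b", "c"),
  to   = c("b", "c", "a")
)

# as_tbl_graph() builds the node table from the unique end points.
graph <- as_tbl_graph(edge_list)

# An igraph object converts the same way.
from_igraph <- as_tbl_graph(igraph::make_ring(4))
```

Because a tbl_graph is still an igraph object underneath, the converted graphs can also be passed to igraph functions directly.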
tidygraph itself does not provide any means of visualisation, but it works flawlessly with ggraph. This division makes it easy to develop the visualisation and manipulation code at different speeds depending on where the needs arise.
tidygraph is available on CRAN and can be installed simply using install.packages('tidygraph'). For the development version available on GitHub, use the devtools package for installation.
tidygraph stands on the shoulders of particularly the dplyr/tidyverse teams. It would not have happened without them, so thanks so much to them.
tbl_graph from an adjacency list containing
convert verb to perform both
crystallise in one go, returning a single
morph the original data will be stored in
.data to avoid conflicts with
.data argument in many tidyverse verbs (BREAKING)
as_tbl_graph.data.frame now recognises set tables (each column gives each row's membership of that set)
with_graph to allow computation of algorithms outside of verbs
graph_is_* set of querying functions has been added that all return logical scalars.
%E>% for activating nodes and edges respectively as part of the piping.
mutate now lets you reference created columns in graph algorithms so it behaves in line with expected mutate behaviour. This has led to a slight performance decrease (millisecond scale). The old behaviour can be accessed using mutate_as_tbl, where the graph will only get updated in the end.
bind_graphs now works with a single
.register_graph_context to allow the use of tidygraph algorithms in external functions.
node_rank_* family of algorithms for seriation of nodes
to_hierarchical_clusters morpher to work with hierarchical representations of community detection algorithms.
group_* algorithms now ensure that the groups are enumerated in descending order based on size, i.e. members of the largest group/community will always have
netrankr resulting in 19 new centrality scores and a manual mode for composing new centrality scores
edge_is_[from|to|between|incident]() to help find edges related to certain nodes
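A few of the additions listed above can be tried together in a short sketch. The ring graph and the `touches_first` column name are assumptions made for the illustration; only functions named in the list, plus graph constructors, are used.

```r
library(tidygraph)

# An undirected ring with 4 nodes and 4 edges.
ring <- create_ring(4)

# %E>% activates the edges before piping into the next verb.
ring <- ring %E>%
  mutate(touches_first = edge_is_incident(1))

# graph_is_* functions return a single TRUE or FALSE for the whole graph;
# with_graph() supplies the graph context outside of a verb.
connected <- with_graph(ring, graph_is_connected())
is_tree <- with_graph(ring, graph_is_tree())
```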