jakub-borusewicz/networkx-neo4j

networkx-neo4j

This library provides a NetworkX API for Neo4j Graph Data Science. You should be able to use it as you would NetworkX, but the algorithms run against Neo4j.

Installation

You can install the library by running the following command:

pip install networkx-neo4j

You’ll also need to install Neo4j and the Graph Data Science (GDS) library plugin.

Usage

Here’s how you use it.

First let’s import our libraries and create an instance of the Neo4j driver:

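import nxneo4j
from neo4j import GraphDatabase

# Placeholder credentials: replace these with the ones for your own Neo4j instance
user_name = "neo4j"
password = "password"

driver = GraphDatabase.driver("bolt://localhost", auth=(user_name, password))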

And now let’s create a Graph object around the driver:

G = nxneo4j.Graph(driver)

We can create a little graph:

G.add_node(1)
G.add_nodes_from([2, 3])

G.add_edge(1, 2)
G.add_edge(4, 5)
G.add_edges_from([(1, 2), (1, 3), (2, 3)])

And run algorithms against it:

>>> list(nxneo4j.community.label_propagation_communities(G))
[{1, 2, 3}, {4, 5}]
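
Note that nodes 4 and 5 were never added explicitly; they appear because of the `add_edge(4, 5)` call. For this small graph, the two communities reported above coincide with the two disconnected pieces of the graph. The following is a minimal pure-Python sketch of that grouping, not the algorithm the library or Neo4j runs; the `connected_groups` helper is a name made up for this illustration.

```python
from collections import defaultdict

def connected_groups(edges, nodes=()):
    """Group nodes into connected pieces with an iterative depth-first search.

    Illustrative helper only; it is not part of nxneo4j.
    """
    adjacency = defaultdict(set)
    for node in nodes:
        adjacency[node]  # touching the key registers nodes that have no edges
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    groups = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        group = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(group)
    return groups

# The same nodes and edges that were added to G above
edges = [(1, 2), (4, 5), (1, 2), (1, 3), (2, 3)]
groups = connected_groups(edges, nodes=[1, 2, 3])
print(groups)
```

On larger graphs, label propagation can split a single connected piece into several communities, so the two notions should not be treated as interchangeable.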

Graph of Thrones

We can also work with an existing graph. You can import this dataset by playing the following guide from the Neo4j browser:

:play http://guides.neo4j.com/data_science/01_eda.html

Then click through the sections that import the data. This graph contains nodes with a Character label that have INTERACTS1, INTERACTS2, INTERACTS3, and INTERACTS45 relationships between them.

Now let’s create a Graph object that knows about this graph. First we’ll create a config map that we’ll pass in to the Graph() constructor.

config = {
    "node_label": "Character",
    "relationship_type": None,
    "identifier_property": "name"
}

We set:

  • node_label is Character so that we’ll only consider nodes with that label

  • relationship_type is None so that we’ll consider all relationship types in the graph

  • identifier_property is the node property that we’ll use to identify each node from the networkx-neo4j API

G = nxneo4j.Graph(driver, config)
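
To make the three settings more concrete, here is a sketch of how a config map like this could be turned into a Cypher pattern. The `build_match_pattern` function is hypothetical and the query text is an assumption for illustration; it is not the query that nxneo4j actually generates.

```python
def build_match_pattern(config):
    """Hypothetical helper: turn a config map into a Cypher MATCH clause.

    Not part of nxneo4j; it only shows what each key would select.
    """
    label = config["node_label"]
    rel_type = config["relationship_type"]
    id_prop = config["identifier_property"]

    # None means "any relationship type", so the type filter is left out
    relationship = f"[r:{rel_type}]" if rel_type else "[r]"

    return (
        f"MATCH (a:{label})-{relationship}-(b:{label}) "
        f"RETURN a.{id_prop} AS source, b.{id_prop} AS target"
    )

config = {
    "node_label": "Character",
    "relationship_type": None,
    "identifier_property": "name",
}
all_types = build_match_pattern(config)
one_type = build_match_pattern({**config, "relationship_type": "INTERACTS1"})
print(all_types)
print(one_type)
```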

We can find the most influential characters by executing the PageRank algorithm against this graph like this:

sorted_pagerank = sorted(nxneo4j.centrality.pagerank(G).items(), key=lambda x: x[1], reverse=True)
for character, score in sorted_pagerank[:10]:
    print(character, score)

Tyrion-Lannister 11.312183000000001
Stannis-Baratheon 7.211595999999998
Tywin-Lannister 6.997056000000001
Varys 6.228078
Theon-Greyjoy 4.6472225
Sansa-Stark 4.234794
Walder-Frey 3.2763000000000004
Robb-Stark 3.0024044999999995
Samwell-Tarly 2.9787575000000004
Jon-Snow 2.9183989999999995

Hopefully there are some familiar names there!
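
The scores above are not probabilities that sum to 1. One common formulation that produces values on this kind of scale is the unnormalized update `PR(n) = (1 - d) + d * sum(PR(m) / out_degree(m))` over the nodes `m` that link to `n`. The sketch below applies that update to a made-up toy graph in plain Python. It assumes this variant and a damping factor of 0.85; it is not the code that Neo4j executes, and the node names are placeholders rather than data from the dataset.

```python
def pagerank_sketch(out_links, damping=0.85, iterations=50):
    """Unnormalized PageRank by repeated updates (illustrative only).

    out_links maps each node to the list of nodes it points to.
    Nodes without outgoing links simply pass on no score here.
    """
    nodes = set(out_links)
    for targets in out_links.values():
        nodes.update(targets)

    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        incoming = {node: 0.0 for node in nodes}
        for source, targets in out_links.items():
            if not targets:
                continue
            share = scores[source] / len(targets)
            for target in targets:
                incoming[target] += share
        scores = {
            node: (1 - damping) + damping * incoming[node] for node in nodes
        }
    return scores

# Toy directed graph with placeholder node names
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
scores = pagerank_sketch(toy_links)
for name, score in sorted(scores.items(), key=lambda x: x[1], reverse=True):
    print(name, round(score, 4))
```

Node D has no incoming links, so its score stays at the base value `1 - d`, while C, which collects score from three nodes, ranks first.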

What if we want to find the shortest path between two characters?

nxneo4j.path_finding.shortest_path(G, "Tyrion-Lannister", "Hodor")

['Tyrion-Lannister', 'Robb-Stark', 'Hodor']
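
The result is a list of node identifiers from source to target. As a rough illustration of what a hop-count shortest path means, here is a breadth-first search over a small in-memory graph. This is a sketch with placeholder node names, assuming unweighted relationships; the library itself delegates the computation to Neo4j.

```python
from collections import deque

def bfs_shortest_path(adjacency, source, target):
    """Return one shortest path (fewest hops) as a list, or None if unreachable."""
    if source == target:
        return [source]
    previous = {source: None}  # remembers how each node was first reached
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor in previous:
                continue
            previous[neighbor] = node
            if neighbor == target:
                # Walk back from the target to the source, then reverse
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None

# Placeholder undirected graph: A-B, B-C, A-D, D-E, E-C, plus an isolated node Z
toy_graph = {
    "A": ["B", "D"],
    "B": ["A", "C"],
    "C": ["B", "E"],
    "D": ["A", "E"],
    "E": ["D", "C"],
    "Z": [],
}
path = bfs_shortest_path(toy_graph, "A", "C")
print(path)
```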

We can also partition the characters into communities:

communities = nxneo4j.community.label_propagation_communities(G)
sorted_communities = sorted(communities, key=lambda x: len(x), reverse=True)
for community in sorted_communities[:10]:
    print(list(community)[:10])

['Josmyn-Peckledon', 'Belwas', 'Rafford', 'Polliver', 'Petyr-Frey', 'Tristifer-IV-Mudd', 'Jeyne-Heddle', 'Urswyck', 'Falyse-Stokeworth', 'Hoster-Blackwood']
['Trystane-Martell', 'Blue-Bard', 'Matthos-Seaworth', 'Marya-Seaworth', 'Mors-Umber', 'Jaehaerys-I-Targaryen', 'Myrcella-Baratheon', 'Justin-Massey', 'Denys-Mallister', 'Clayton-Suggs']
['Oberyn-Martell', 'Nurse', 'Tommen-Baratheon', 'Tanda-Stokeworth', 'Garlan-Tyrell', 'Morgo', 'Qavo-Nogarys', 'Moon-Boy', 'Leonette-Fossoway', 'Allar-Deem']
['Owen', 'Jon-Snow', 'Gerrick-Kingsblood', 'Lanna-(Happy-Port)', 'Maekar-I-Targaryen', 'Gorne', 'Arron', 'Arson', 'Satin', 'Rast']
['Asha-Greyjoy', 'Palla', 'Squirrel', 'Tristifer-Botley', 'Yellow-Dick', 'Lorren', 'Jason-Mallister', 'Benfred-Tallhart', 'Kyra', 'Gynir']
['Harras-Harlaw', 'Baelor-Blacktyde', 'Dunstan-Drumm', 'Ralf-Stonehouse', 'Gorold-Goodbrother', 'Rodrik-Harlaw', 'Talbert-Serry', 'Sigfryd-Harlaw', 'Rodrik-Sparr', 'Wulfe']
['Alliser-Thorne', 'Othell-Yarwyck', 'Jaremy-Rykker', 'Ragwyle', 'Craster', 'Clubfoot-Karl', 'Blane', 'Donal-Noye', 'Halder', 'Mag-Mar-Tun-Doh-Weg']
['Tomard', 'Horton-Redfort', 'Lothor-Brune', 'Myranda-Royce', 'Grisel', 'Merrett-Frey', 'Loras-Tyrell', 'Nestor-Royce', 'Anya-Waynwood', 'Marillion']
['Marq-Piper', 'Rickard-Karstark', 'Margaery-Tyrell', 'Senelle', 'Hallis-Mollen', 'Harren-Hoare', 'Nan', 'Colen-of-Greenpools', 'Desmond-Grell', 'Edmure-Tully']
['Koss', 'Woth', 'Meralyn', 'Mad-Huntsman', 'Dobber', 'Ravella-Swann', 'Ternesio-Terys', 'Yoren', 'Amabel', 'Waif']

What algorithms are supported?

Centrality:

for module in dir(nxneo4j.centrality):
    if not module.startswith("__"):
        print(module)

betweenness_centrality
closeness_centrality
harmonic_centrality
pagerank

Community Detection:

for module in dir(nxneo4j.community):
    if not module.startswith("__"):
        print(module)

average_clustering
clustering
connected_components
label_propagation_communities
number_connected_components
triangles

Path Finding:

for module in dir(nxneo4j.path_finding):
    if not module.startswith("__"):
        print(module)

shortest_path

shortest_path currently works only if you provide both the source and target nodes.
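
The listing loops above filter `dir()` output by skipping names that start with a double underscore. A slightly stricter variant, sketched below, also skips single-underscore names and anything that is not callable. The `public_callables` helper is made up for this example, and it is demonstrated on a stand-in namespace rather than on the nxneo4j modules.

```python
import types

def public_callables(namespace):
    """List the public, callable attribute names of a module-like object."""
    return sorted(
        name
        for name in dir(namespace)
        if not name.startswith("_") and callable(getattr(namespace, name))
    )

# Stand-in object; with the library installed you could pass nxneo4j.centrality
demo = types.SimpleNamespace(
    pagerank=lambda graph: {},
    _internal_helper=lambda: None,
    VERSION="0.0",
)
names = public_callables(demo)
print(names)
```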

What’s still missing?

Not all algorithms have been translated yet. These are next on the list:

  • Shortest path

  • A-star

NetworkX vs Neo4j

You can also run an example showing NetworkX and Neo4j side by side:

python -m examples.networkx_vs_neo4j
