Introduction – The Neo4j Universe
If you use social media in any capacity on the internet, you will undoubtedly come across a site that is backed by a graph database. The concept and theory behind the graph database go hand in hand with any site that deals with entities and the many types of relationships between them. Graph databases are conceptualized, designed, and optimized to leverage these relationships, so traversing them to find or mine data is not only easier than in a traditional relational database (RDBMS) but, for relationship-heavy queries, often considerably faster.
For example, in a traditional relational database, suppose you have a user table and a relation table that holds two user IDs to denote that one user is a friend of the other. To answer friends-of-friends questions at a depth greater than one, you would have to inner join this relation table with itself. As you increase the depth (friends of friends of friends, etc.), you have to add another self-join of the relation table for each additional level. This gets costly in terms of performance if you have a large user table, an even larger relation table, and their associated indexes.
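To make the self-join growth concrete, here is a minimal sketch that generates the SQL for a given depth. The table name `friendship` and the columns `user_id` and `friend_id` are assumptions made up for illustration; they do not come from any real schema. The query finds users reachable in the given number of hops and does not filter out users who are also reachable by a shorter path.

```java
// Hypothetical schema: friendship(user_id, friend_id). All names are illustrative.
final class FriendDepthSql {

    /** Builds a query for users reachable in `depth` hops, using one table alias per hop. */
    static String friendsAtDepth(int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be >= 1");
        }
        StringBuilder sql = new StringBuilder();
        sql.append("SELECT DISTINCT f").append(depth).append(".friend_id\n");
        sql.append("FROM friendship f1\n");
        // Every additional level adds one more self-join on the same table.
        for (int level = 2; level <= depth; level++) {
            sql.append("INNER JOIN friendship f").append(level)
               .append(" ON f").append(level).append(".user_id = f")
               .append(level - 1).append(".friend_id\n");
        }
        sql.append("WHERE f1.user_id = ?");
        return sql.toString();
    }

    public static void main(String[] args) {
        // Depth 3 (friends of friends of friends) needs two self-joins.
        System.out.println(friendsAtDepth(3));
    }
}
```

Each pass through the loop adds another copy of the relation table to the query, which is where the cost described above comes from.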
The fundamental concept behind graph databases is the Property Graph Model (PGM). Similar to traditional object models in relational databases, the PGM breaks data down into entity nodes and relationships. An entity node is an ordinary object that has properties and can have zero or more relationships. A relationship is itself an object that can carry properties, with the additional restriction that it requires both a starting and an ending node. A relationship has to connect two entity nodes: the PGM prevents the situation where an entity node has a relationship with a non-entity (NULL).
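The rules of the PGM can be sketched in a few lines of plain Java. This is an in-memory illustration only. The class names `PgmNode` and `PgmRelationship` are invented for this example and are not Neo4j classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative in-memory model only; these are not Neo4j classes.
final class PgmNode {
    final Map<String, Object> properties = new HashMap<>();
    final List<PgmRelationship> relationships = new ArrayList<>(); // zero or more

    PgmNode with(String key, Object value) {
        properties.put(key, value);
        return this;
    }
}

final class PgmRelationship {
    final String type;
    final PgmNode start;
    final PgmNode end;
    final Map<String, Object> properties = new HashMap<>();

    PgmRelationship(String type, PgmNode start, PgmNode end) {
        this.type = Objects.requireNonNull(type, "type");
        // A relationship can never dangle: both endpoints are mandatory.
        this.start = Objects.requireNonNull(start, "start node");
        this.end = Objects.requireNonNull(end, "end node");
        start.relationships.add(this);
        if (end != start) {
            end.relationships.add(this); // avoid registering a self-loop twice
        }
    }
}

final class PgmDemo {
    public static void main(String[] args) {
        PgmNode alice = new PgmNode().with("name", "Alice");
        PgmNode bob = new PgmNode().with("name", "Bob");
        PgmRelationship friendship = new PgmRelationship("FRIEND_OF", alice, bob);
        friendship.properties.put("since", 2015); // relationships carry properties too
        System.out.println(alice.relationships.size() + " relationship(s) on Alice");
    }
}
```

The constructor is where the "no relationship with NULL" rule lives: a `PgmRelationship` cannot be created unless both nodes exist.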
Neo4j is an implementation of the PGM in a purely Java-based environment, with plenty of tooling, a graphical query browser, and APIs to support most anything you would need to load, traverse, query, and extract data from a graph database. It also has a diverse and energetic user community.
Accessing the Data
Neo4j has essentially three basic ways to query the graph data:
- The Neo4j Core Java API, which is baked into the Neo4j library itself. This approach lets you build a query programmatically using standard library methods.
- The Neo4j Traversal API, a callback-based framework with a builder API that lets you express the traversal rules in a single statement.
- Cypher, the declarative query language for graphs, which uses graph pattern matching as the main driver for data selection. It is the equivalent of SQL in relational database systems.
All three of these methods let you query the data by traversing the nodes and edges (relationships) that meet the specified query criteria. Depending on the type of query you are executing, you may return one or more entities, relationships, or subsets of the properties of either.
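To give a feel for Cypher's pattern matching, here is a small sketch that builds the Cypher text for the friends-of-friends question from the introduction. The label `User`, the relationship type `FRIEND_OF`, and the property `name` are assumptions chosen for illustration. The helper only produces the query string and does not talk to a database.

```java
// Hypothetical label (User), relationship type (FRIEND_OF) and property (name).
final class FriendDepthCypher {

    /** Returns Cypher that matches users connected by exactly `depth` FRIEND_OF hops. */
    static String friendsAtDepth(int depth) {
        if (depth < 1) {
            throw new IllegalArgumentException("depth must be >= 1");
        }
        // The depth is a single number in the variable-length pattern (*depth);
        // the shape of the query stays the same however deep we go.
        return "MATCH (me:User {name: $name})-[:FRIEND_OF*" + depth + "]-(other:User)\n"
             + "RETURN DISTINCT other.name";
    }

    public static void main(String[] args) {
        System.out.println(friendsAtDepth(3));
    }
}
```

The pattern in the `MATCH` clause reads like a drawing of the path you are looking for, which is the main appeal of the language.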
Graph databases are unique in that traversing the graph means you have to start somewhere: a starting node. So, unless you already know the node you want to start with, you first query for starting nodes that meet certain criteria. The ability to efficiently query entities and relationships by property value(s) is provided by an underlying Lucene indexing layer. Once starting nodes are identified, traversing from them is quite efficient. There are many strategies for indexing property data in Neo4j, but for the most part you get a lot of indexing automatically. The main point is that you typically use Lucene-backed queries to identify starting nodes, and from there you build your queries/traversals. The main benefit of starting from an initial node is that traversal cost depends on the part of the graph you actually visit rather than on the total volume of data or the size of the indexes.
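The two-step pattern (look up a starting node by property, then traverse from it) can be sketched with a toy in-memory graph. Everything here is an assumption for illustration: a `HashMap` stands in for the property index, adjacency lists stand in for stored relationships, and none of it is Neo4j API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy in-memory graph; the HashMap index stands in for a real property index.
final class StartNodeTraversal {
    private final List<String> names = new ArrayList<>();            // node id -> name property
    private final List<List<Integer>> adjacency = new ArrayList<>(); // node id -> neighbour ids
    private final Map<String, Integer> nameIndex = new HashMap<>();  // name -> node id

    int addUser(String name) {
        int id = names.size();
        names.add(name);
        adjacency.add(new ArrayList<>());
        nameIndex.put(name, id);
        return id;
    }

    void addFriendship(int a, int b) {
        adjacency.get(a).add(b);
        adjacency.get(b).add(a);
    }

    /** Step 1: index lookup for the start node. Step 2: breadth-first traversal. */
    Set<String> reachableWithin(String startName, int maxDepth) {
        Set<String> result = new LinkedHashSet<>();
        Integer startId = nameIndex.get(startName);
        if (startId == null) {
            return result; // no starting node, nothing to traverse
        }
        int start = startId;
        Set<Integer> visited = new HashSet<>();
        visited.add(start);
        Deque<int[]> queue = new ArrayDeque<>(); // entries are {nodeId, depth}
        queue.add(new int[] {start, 0});
        while (!queue.isEmpty()) {
            int[] current = queue.poll();
            if (current[1] == maxDepth) {
                continue; // do not expand beyond the requested depth
            }
            // Only the neighbours of visited nodes are touched, never the whole graph.
            for (int next : adjacency.get(current[0])) {
                if (visited.add(next)) {
                    result.add(names.get(next));
                    queue.add(new int[] {next, current[1] + 1});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StartNodeTraversal graph = new StartNodeTraversal();
        int alice = graph.addUser("alice");
        int bob = graph.addUser("bob");
        int carol = graph.addUser("carol");
        int dave = graph.addUser("dave");
        graph.addFriendship(alice, bob);
        graph.addFriendship(bob, carol);
        graph.addFriendship(carol, dave);
        System.out.println(graph.reachableWithin("alice", 2)); // prints [bob, carol]
    }
}
```

Adding a million unrelated users to this graph would make the index bigger, but the loop would still visit only the handful of nodes near the starting point.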
Exploring Neo4j and graph databases in general is an interesting exercise. Facebook, for example, is built around the graph concept and has developed its own proprietary graph store and query tools. Many social media platforms rely on graph-style storage because relationships are at the heart of these systems, and tools like Neo4j are specifically designed to work with entities and the relationships between them.
Simple Example
Obviously the best way to learn anything is to dig in and start tinkering. With that in mind, I created a small Spring Boot application that uses Neo4j as the backend database so that anyone could get up and running quickly with a working instance – with data.
The use of Spring as the application framework was an obvious choice since it ties nicely into the Spring Data project, where we can leverage the Spring Data Neo4j module. Bringing the repository paradigm familiar from Spring Data JPA to Neo4j, and specifically to Cypher queries, makes a lot of sense. You get the convenience of method-based repository access to the data, you can write your own custom repository queries, and you can even write native Cypher queries. The sample application contains all of these query options.
As mentioned above, once you have Neo4j installed and running, you can browse your data with its web-based browser, which leverages the D3 JavaScript visualization library. Exploring your data visually can, over time, reveal relationships that you might not have recognized otherwise.
The idea behind the application is that users have playlists of songs, and songs belong to albums. The goal of this simple application is to show that you can answer some interesting questions easily with graph databases. What songs do my friends have in their playlists? What other songs are contained in the playlists that contain a specific song? And so on.
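As a rough illustration of the second question, here is a plain-Java sketch over a toy data set. It is not taken from the sample application. The playlist names and song titles are made up, and the Cypher in the comment uses assumed labels (`Song`, `Playlist`) and an assumed relationship type (`CONTAINS`).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model: playlist name -> song titles. A graph query could express the same idea as
//   MATCH (:Song {title: $title})<-[:CONTAINS]-(:Playlist)-[:CONTAINS]->(other:Song)
//   RETURN DISTINCT other.title
// (labels and relationship type are hypothetical).
final class PlaylistQueries {

    /** Songs that share at least one playlist with `title`, excluding `title` itself. */
    static Set<String> songsSharingPlaylistWith(String title, Map<String, List<String>> playlists) {
        Set<String> others = new TreeSet<>(); // sorted for stable output
        for (List<String> songs : playlists.values()) {
            if (songs.contains(title)) {
                others.addAll(songs);
            }
        }
        others.remove(title);
        return others;
    }

    public static void main(String[] args) {
        Map<String, List<String>> playlists = new HashMap<>();
        playlists.put("road trip", Arrays.asList("Song A", "Song B", "Song C"));
        playlists.put("gym", Arrays.asList("Song B", "Song D"));
        playlists.put("chill", Arrays.asList("Song E"));
        System.out.println(songsSharingPlaylistWith("Song B", playlists)); // prints [Song A, Song C, Song D]
    }
}
```

In the graph version, the same question is a single two-hop pattern from a song to its playlists and back out to their other songs.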
The link to the repository is listed below and the associated README should get you up and running fairly quickly.
Conclusion
Exploring graph databases was more fun than I ever expected. You gain a lot of efficiency by using graphs if your data fits the model of nodes and relationships. I would recommend that you check out the project/repository on the BES Bitbucket site as well as the Neo4j and Spring Data Neo4j websites for further reading.
Links
The main Neo4j page link:
A great overview of the Property Graph Model
https://neo4j.com/developer/graph-database/#property-graph
The Spring Data Neo4j project
https://projects.spring.io/spring-data-neo4j/
BES Bitbucket Page
https://bitbucket.org/bestechnologyinc/springneo/src/master/