Remember our recent post on 5 Hottest Amazon Web Services Yet to Come? In the post, we revealed that one AWS service we cannot wait to become publicly available is Neptune, Amazon’s first graph database.
But what is all our excitement about? Read on to find out.
What are graph databases?
To appreciate the power of Neptune, we first need to understand the ins and outs of graph databases. Here’s a little help.
Today, every piece of information is somehow related to another. No data item exists in isolation. Modern applications often operate on relationships between highly connected data, and to do that efficiently they need to understand and navigate complex data structures. Let’s use an example to illustrate this.
Graph databases – an example application
Do you use any social media platform? Every social media user has connections who like and read different stories or check in at different places. All this information is interrelated. Assume that you want to prioritize newsfeed items from friends who like the same food as you and recently visited a restaurant in your area. You could use a relational database to store and query this data, but this would require numerous tables, foreign keys, nested queries, and so on. Without a graph database, this seemingly simple use case turns into a grueling task.
A graph database, on the other hand, stores the connections in your social network as first-class citizens of the data model. This lets you query highly connected data quickly. A query begins at a starting point and follows a pattern to collect and aggregate data from neighboring nodes and relationships, so its cost depends on the part of the graph it touches rather than on the total size of the dataset.
Nodes, properties, and edges
Graph databases consist of nodes, properties, and edges.
- Nodes are entities in the graph, and they can represent people, businesses, or any other trackable items. You can compare them to records or rows in a relational database.
- Properties are key-value attributes that nodes can hold, such as name, age, or date of birth.
- Edges (also called relationships) are the key concept in graph databases. They connect the nodes. An edge always has a direction, a type, a start node, and an end node, and it can also have properties.
In a graph database, any two nodes can share any number of relationships, and any single node can be in a relationship with any number of nodes. An important rule is that a relationship never points to a non-existent node. If you remove a node, you also have to remove its associated connections.
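To make these three building blocks concrete, here is a minimal in-memory sketch in Python. It is not how Neptune or any real graph database is implemented. The class and method names (`Graph`, `add_edge`, `remove_node`) and the sample data are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    properties: dict = field(default_factory=dict)  # key-value attributes


@dataclass
class Edge:
    start: str       # id of the start node
    end: str         # id of the end node
    edge_type: str   # relationship type, e.g. "FRIEND_OF"
    properties: dict = field(default_factory=dict)


class Graph:
    def __init__(self):
        self.nodes = {}   # node id -> Node
        self.edges = []   # list of Edge

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = Node(node_id, properties)

    def add_edge(self, start, end, edge_type, **properties):
        # A relationship must never point to a non-existent node.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append(Edge(start, end, edge_type, properties))

    def remove_node(self, node_id):
        # Removing a node also removes every edge that touches it.
        del self.nodes[node_id]
        self.edges = [e for e in self.edges if node_id not in (e.start, e.end)]


graph = Graph()
graph.add_node("alice", name="Alice", age=34)
graph.add_node("bob", name="Bob", age=29)
graph.add_edge("alice", "bob", "FRIEND_OF", since=2015)
graph.add_edge("alice", "bob", "WORKS_WITH")  # two nodes may share several edges
graph.remove_node("bob")
print(len(graph.nodes), len(graph.edges))  # prints: 1 0
```

Deleting `bob` leaves one node and no edges, which is the "no dangling relationships" rule in action.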
Relational vs. graph databases. What’s the difference?
Relational databases rely on a concept where data tables link to one another by storing one record’s unique key in another record’s data. For example:
To find the department an employee belongs to, you need to identify the EmployeeID in the Employees table, search for the DeptID associated with that EmployeeID in the Dept_Employees table, and then use that DeptID to look up the DeptName in the Departments table. The final results are merged into a single output with the employee name and the associated department.
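The lookup described above can be sketched with Python's built-in `sqlite3` module. The table and column names follow the text. The `Name` column, the sample rows, and the in-memory database are assumptions made for illustration.

```python
import sqlite3

# In-memory database with the three tables named in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Departments (DeptID INTEGER PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Dept_Employees (
        EmployeeID INTEGER REFERENCES Employees(EmployeeID),
        DeptID INTEGER REFERENCES Departments(DeptID)
    );
    INSERT INTO Employees VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO Departments VALUES (10, 'Engineering'), (20, 'Sales');
    INSERT INTO Dept_Employees VALUES (1, 10), (2, 20);
""")

# Two joins are needed to get from an employee to a department name.
rows = conn.execute("""
    SELECT e.Name, d.DeptName
    FROM Employees AS e
    JOIN Dept_Employees AS de ON de.EmployeeID = e.EmployeeID
    JOIN Departments AS d ON d.DeptID = de.DeptID
    WHERE e.EmployeeID = ?
""", (1,)).fetchall()

print(rows)  # prints: [('Alice', 'Engineering')]
```

Even this single-hop question needs two joins through a linking table.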
In a graph database, the entire process is much more straightforward as the database stores all relationships between objects directly. Compare:
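As a rough illustration of the graph approach, the sketch below stores the same employee-to-department relationship directly as an edge. It uses plain Python dictionaries rather than a real graph database. The node ids, the `BELONGS_TO` edge type, and the sample data are all hypothetical.

```python
# Nodes keyed by id, each with a label and properties.
nodes = {
    "emp:1": {"label": "Employee", "name": "Alice"},
    "emp:2": {"label": "Employee", "name": "Bob"},
    "dept:10": {"label": "Department", "name": "Engineering"},
    "dept:20": {"label": "Department", "name": "Sales"},
}

# Outgoing edges are stored with their start node: (edge type, end node id).
edges = {
    "emp:1": [("BELONGS_TO", "dept:10")],
    "emp:2": [("BELONGS_TO", "dept:20")],
}


def departments_of(employee_id):
    """Follow BELONGS_TO edges from one employee node to department names."""
    return [
        nodes[end]["name"]
        for edge_type, end in edges.get(employee_id, [])
        if edge_type == "BELONGS_TO"
    ]


print(departments_of("emp:1"))  # prints: ['Engineering']
```

The query starts at one node and follows its edges. There is no linking table to search.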
The above example involves only one level of complexity, but the real power of the graph approach becomes evident in more advanced scenarios.
For example, imagine that you need to find every past project on which employees from a particular department worked together. In SQL, a query like that is complicated enough to make your head explode, but a graph database will handle the request without any problem.
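Here is one way to read that request as a traversal: start at a department, walk to its employees, walk on to their projects, and keep the projects reached by at least two colleagues. This is only a sketch over plain dictionaries. The names, the sample data, and the "at least two" threshold are assumptions.

```python
from collections import defaultdict

# Hypothetical edges: employee -> department, and employee -> projects.
belongs_to = {
    "alice": "Engineering",
    "bob": "Engineering",
    "carol": "Sales",
    "dave": "Engineering",
}
worked_on = {
    "alice": ["Apollo", "Borealis"],
    "bob": ["Apollo"],
    "carol": ["Apollo", "Borealis"],
    "dave": ["Cascade"],
}


def shared_projects(department):
    """Projects on which at least two employees of `department` worked."""
    members_by_project = defaultdict(set)
    for employee, dept in belongs_to.items():
        if dept != department:
            continue
        for project in worked_on.get(employee, []):
            members_by_project[project].add(employee)
    return {
        project: sorted(members)
        for project, members in members_by_project.items()
        if len(members) >= 2
    }


print(shared_projects("Engineering"))  # prints: {'Apollo': ['alice', 'bob']}
```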
Graph databases – common use cases
We already know that graph databases can quickly process large sets of objects and analyze the connections between them. That gives them an advantage over relational databases for use cases like social networking, recommendation engines, fraud detection, knowledge graphs, or network/IT operations.
Here are some more detailed examples of graph database applications:
- In social networks, graph databases let you quickly prioritize newsfeed results for your users.
- They are also ideal for storing customer interests or purchase history. You may create a personalized offer based on a customer’s purchase history, interests of their friends, or on people who bought similar items.
- You can also use graph queries for fraud detection. They will help you recognize customers sharing the same IP but living in different geographical regions, or detect email addresses or card numbers previously associated with identity theft cases or credit card fraud.
- Another typical use case for graph databases is knowledge graphs. They can be used to create catalogs where one can browse a category or a single object and easily find related objects (e.g., books by the same author or clothes in the same size).
- Your network/IT operations teams can use graph databases to store a map of the network as a graph. This approach can help you trace anomalies efficiently and therefore better protect your network. For example, you will be able to quickly spot a malicious file in your system and delete it.
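To give a flavor of the fraud detection case, the sketch below flags pairs of customers who share an IP address but are registered in different regions. It is a toy illustration in plain Python, not a production fraud check. The customer ids, regions, and IP addresses are made up.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical node property: the region each customer is registered in.
region = {"c1": "EU", "c2": "US", "c3": "EU", "c4": "APAC"}

# Hypothetical "USED_IP" edges between customers and IP addresses.
used_ip = [
    ("c1", "203.0.113.7"),
    ("c2", "203.0.113.7"),
    ("c3", "203.0.113.7"),
    ("c4", "198.51.100.2"),
]


def suspicious_pairs():
    """Pairs of customers that share an IP but live in different regions."""
    customers_by_ip = defaultdict(set)
    for customer, ip in used_ip:
        customers_by_ip[ip].add(customer)

    pairs = []
    for ip, customers in customers_by_ip.items():
        for a, b in combinations(sorted(customers), 2):
            if region[a] != region[b]:
                pairs.append((ip, a, b))
    return pairs


for ip, a, b in suspicious_pairs():
    print(ip, a, b)
# prints:
# 203.0.113.7 c1 c2
# 203.0.113.7 c2 c3
```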
I will leave other use cases for graph databases to your imagination. As long as your scenario requires navigating data relations, graph DBs will serve your purpose well.
Now, let’s move on to a particular instance of a graph database. On to Amazon Neptune, the first AWS graph DB service!
What is Amazon Neptune?
Amazon Neptune is a new database from Amazon, announced in preview on November 29, 2017. Currently, it is available in the US East (N. Virginia) region on t2.medium and R4 family instances.
Neptune is fast, reliable, and fully managed. But what makes it so unique that we proclaimed it the most anticipated AWS service in the preview stage?
Currently, to run graph DBs on AWS, you must use external Amazon Machine Images for Neo4j, OrientDB, and GraphDB, or use DynamoDB as a storage backend for JanusGraph. When Neptune launches, it will become the first publicly available and fully-managed graph database on AWS.
That’s not all, however. Here are seven solid reasons why Neptune stands a high chance of becoming one of your favorite AWS services.
7 reasons you will love Amazon Neptune
- Neptune provides 99.99% availability. On instance failure, it can fail over to one of up to 15 low-latency replicas, which makes it both fast and reliable.
- The database volume size can grow according to your needs. It increments by 10 GB up to the maximum of 64 TB.
- To provide security, the Amazon Neptune database runs in a VPC where access is managed with firewall access control lists. You can encrypt data both in transit and at rest.
- Neptune is fully managed, which makes it easy to use and operate. All instances are pre-configured at launch, with settings and parameters adjusted to match the selected instance class. No additional configuration is required.
- Amazon will keep your instances up-to-date thanks to automatic software patching. The database parameters can be monitored with CloudWatch, and the monitoring includes over 20 critical operational metrics like compute, memory, storage, query throughput, and active connections.
- The service supports two popular graph models, the property graph (through Apache TinkerPop) and W3C’s RDF, along with their associated query languages: Gremlin and SPARQL. If your existing applications already use one of these languages, you can point them to Neptune and have them use your new database.
- Amazon Neptune instances start at $0.098 per hour for a db.t2.medium instance. The storage cost is $0.10 per GB-month, and the I/O price is $0.20 per 1 million requests. Backup storage is free of charge up to 100% of your total Neptune storage per region. Any backup storage beyond that is billed at standard S3 rates. Standard data transfer rates apply for outbound transfer.
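To see what these prices add up to, here is a back-of-the-envelope monthly estimate in Python based on the rates quoted above. The workload (one db.t2.medium instance running 730 hours, 50 GB of storage, 20 million I/O requests) is a made-up assumption. The sketch leaves out backup storage and data transfer.

```python
# Rates quoted in the list above (USD).
HOURLY_RATE_T2_MEDIUM = 0.098  # per instance-hour, db.t2.medium
STORAGE_RATE = 0.10            # per GB-month
IO_RATE = 0.20                 # per 1 million requests


def monthly_cost(hours, storage_gb, io_requests):
    """Rough monthly cost; excludes backup storage and data transfer."""
    instance = hours * HOURLY_RATE_T2_MEDIUM
    storage = storage_gb * STORAGE_RATE
    io = io_requests / 1_000_000 * IO_RATE
    return round(instance + storage + io, 2)


# Hypothetical workload: one instance for a full month (about 730 hours).
estimate = monthly_cost(hours=730, storage_gb=50, io_requests=20_000_000)
print(estimate)  # prints: 80.54
```

Under these assumptions, the instance accounts for most of the bill (about $71.54), with storage and I/O adding $5 and $4.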
How to get Amazon Neptune?
Amazon Neptune is still in the preview stage (you can sign up here: https://pages.awscloud.com/NeptunePreview.html). Until it becomes publicly available, you can use graph databases on AWS by running your database on EC2 and EBS.
Neo4j, OrientDB, and GraphDB are available on the AWS Marketplace. You can also use Amazon DynamoDB as a backend for JanusGraph to store graphs of any size in fully managed DynamoDB, without changing your application.
You just can’t go wrong with graph databases
Graph databases are perfect when you need to store and query relationships between objects in your datasets. They eliminate the need for complicated SQL queries. If your application operates on relationships, there’s no better choice. And if you would like to use a fully managed AWS product, that leaves you with Amazon Neptune.
I strongly encourage you to sign up for a preview and test Neptune on your own, especially if you have never used graph databases before. You will become a devoted fan in no time.