Graph Databases
Ian Robinson
Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly
Media, Inc. !!FILL THIS IN!! and related trade dress are trademarks of O’Reilly Media, Inc.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as
trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark
claim, the designations have been printed in caps or initial caps.
While every precaution has been taken in the preparation of this book, the publisher and authors assume
no responsibility for errors or omissions, or for damages resulting from the use of the information contained
herein.
ISBN: 978-1-449-35626-2
Table of Contents
Preface
1. Introduction
About This Book
What is a Graph?
A High Level View of the Graph Space
Graph Databases
Graph Compute Engines
The Power of Graph Databases
Performance
Flexibility
Agility
Summary
A Comparison of Relational and Graph Modeling
Relational Modeling in a Systems Management Domain
Graph Modeling in a Systems Management Domain
Testing the Model
Cross-Domain Models
Creating the Shakespeare Graph
Beginning a Query
Declaring Information Patterns to Find
Constraining Matches
Processing Results
Query Chaining
Common Modeling Pitfalls
Email Provenance Problem Domain
A Sensible First Iteration?
Second Time’s the Charm
Evolving the Domain
Avoiding Anti-Patterns
Summary
Why Organizations Choose Graph Databases
Common Use Cases
Social
Recommendations
Geo
Master Data Management
Network and Data Center Management
Authorization and Access Control (Communications)
Real-World Examples
Social Recommendations (Professional Social Network)
Authorization and Access Control
Geo (Logistics)
Index
Preface
do not need to contact us for permission unless you’re reproducing a significant portion
of the code. For example, writing a program that uses several chunks of code from this
book does not require permission. Selling or distributing a CD-ROM of examples from
O’Reilly books does require permission. Answering a question by citing this book and
quoting example code does not require permission. Incorporating a significant amount
of example code from this book into your product’s documentation does require permission.
We appreciate, but do not require, attribution. An attribution usually includes the title,
author, publisher, and ISBN. For example: “Book Title by Some Author (O’Reilly).
Copyright 2012 Some Copyright Holder, 978-0-596-xxxx-x.”
If you feel your use of code examples falls outside fair use or the permission given above,
feel free to contact us at permissions@oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional
information. You can access this page at http://www.oreilly.com/catalog/<catalog page>.
To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our website
at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments
CHAPTER 1
Introduction
Graph databases address one of the great macroscopic business trends of today: leveraging
complex and dynamic relationships in highly connected data to generate insight and
competitive advantage. Whether we want to understand relationships between customers,
elements in a telephone or datacenter network, entertainment producers and consumers,
or genes and proteins, the ability to understand and analyze vast graphs of highly
connected data will be key in determining which companies outperform their competitors
over the coming decade.
For data of any significant size or value, graph databases are the best way to represent
and query connected data. Connected data is data whose interpretation and value require
us first to understand the ways in which its constituent elements are related. More often
than not, to generate this understanding, we need to name and qualify the connections
between things.
While large corporates realized this some time ago, creating their own proprietary graph
processing technologies, we’re now in an era where that technology has rapidly become
democratized. Today, general-purpose graph databases are a reality, allowing mainstream
users to experience the benefits of connected data without having to invest in
building their own graph infrastructure.
What’s remarkable about this renaissance of graph data and graph thinking is that graph
theory itself is not new. Graph theory was pioneered by Euler in the 18th century, and
has been actively researched and improved by mathematicians, sociologists,
anthropologists, and others ever since. However, it is only in the last few years that
graph theory and graph thinking have been applied to information management. In that
time, graph databases have helped solve important problems in the areas of social
networking, master data management, geospatial, recommendations, and more. This
increased focus on graphs is driven by twin forces: the massive commercial successes of
companies such as Facebook, Google, and Twitter, all of which have centered their business
models around their own proprietary graph technologies; and the introduction of
general-purpose graph databases into the technology landscape.
What is a Graph?
Formally, a graph is just a collection of vertices and edges or, in less intimidating
language, a set of nodes and the relationships that connect them. Graphs represent
entities as nodes and the ways in which those entities relate to the world as
relationships. This general-purpose, expressive structure allows us to model all kinds of
scenarios, from the construction of a space rocket to a system of roads, and from the
supply chain or provenance of foodstuff to medical history for populations, and beyond.
1. http://www.gartner.com/id=2081316
2. For introductions to graph theory, see Richard J. Trudeau, Introduction To Graph Theory (Dover, 1993) and
Gary Chartrand, Introductory Graph Theory (Dover, 1985). For an excellent introduction to how graphs
provide insight into complex events and behaviors, see David Easley and Jon Kleinberg, Networks, Crowds,
and Markets: Reasoning about a Highly Connected World (Cambridge University Press, 2010).
For example, Twitter’s data is easily represented as a graph. In Figure 1-1 we see a small
network of followers. The relationships are key here in establishing the semantic context:
namely, that Billy follows Harry, and that Harry, in turn, follows Billy. Ruth and Harry
likewise follow each other, but sadly, while Ruth follows Billy, Billy hasn’t (yet)
reciprocated.
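To make the follower example concrete, here is a minimal Python sketch. It assumes a plain in-memory structure rather than any particular graph database: each node (a user name) keeps the set of typed, directed relationships it starts. The class and method names (`FollowerGraph`, `relate`, `unreciprocated`) are hypothetical and chosen only for illustration. Finding Ruth's unreciprocated follow then reduces to checking each FOLLOWS relationship for its reverse.

```python
from collections import defaultdict


class FollowerGraph:
    """Hypothetical sketch: nodes are user names, relationships are typed and directed."""

    def __init__(self):
        # Each node maps to the set of (relationship type, end node) pairs it starts.
        self.outgoing = defaultdict(set)

    def relate(self, start, rel_type, end):
        self.outgoing[start].add((rel_type, end))

    def follows(self, a, b):
        # .get avoids inserting empty entries for nodes with no outgoing relationships.
        return ("FOLLOWS", b) in self.outgoing.get(a, set())

    def unreciprocated(self):
        """Pairs (a, b) where a FOLLOWS b but b does not FOLLOW a."""
        return sorted(
            (start, end)
            for start, rels in self.outgoing.items()
            for rel_type, end in rels
            if rel_type == "FOLLOWS" and not self.follows(end, start)
        )


# The network from Figure 1-1.
g = FollowerGraph()
for a, b in [("Billy", "Harry"), ("Harry", "Billy"),
             ("Ruth", "Harry"), ("Harry", "Ruth"),
             ("Ruth", "Billy")]:
    g.relate(a, "FOLLOWS", b)

print(g.unreciprocated())  # [('Ruth', 'Billy')]
```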
Of course, Twitter’s real graph is hundreds of millions of times larger than the example
in Figure 1-1, but it works on precisely the same principles. In Figure 1-2 we’ve expanded
the graph to include the messages published by Ruth.
Figure 1-2. Publishing messages
Though simple, Figure 1-2 shows the expressive power of the graph model. It’s easy to
see that Ruth has published a string of messages. The most recent message can be found
by following a relationship marked CURRENT; PREVIOUS relationships then create a
timeline of posts.
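Reading the timeline amounts to one hop along CURRENT followed by repeated hops along PREVIOUS. The sketch below assumes a toy encoding in which each (node, relationship type) pair points to the next node. The message identifiers are invented, because the contents of Figure 1-2 aren't reproduced here.

```python
# Hypothetical encoding: (node, relationship type) -> node at the other end.
relationships = {
    ("Ruth", "CURRENT"): "msg_3",
    ("msg_3", "PREVIOUS"): "msg_2",
    ("msg_2", "PREVIOUS"): "msg_1",
}


def timeline(user, rels):
    """Follow CURRENT once, then PREVIOUS until the chain ends (newest first)."""
    posts = []
    node = rels.get((user, "CURRENT"))
    while node is not None:
        posts.append(node)
        node = rels.get((node, "PREVIOUS"))
    return posts


print(timeline("Ruth", relationships))  # ['msg_3', 'msg_2', 'msg_1']
```

A user with no CURRENT relationship simply yields an empty timeline.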
Most people find the property graph model intuitive and easy to understand. While
simple, it can be used to describe the overwhelming majority of graph use cases in ways
that yield useful insights into our data.
Graph Databases
A graph database management system (henceforth, a graph database) is an online
database management system with Create, Read, Update, and Delete (CRUD) methods that
expose a graph data model. Graph databases are generally built for use with transactional
(OLTP) systems. Accordingly, they are normally optimized for transactional performance,
and engineered with transactional integrity and operational availability in mind.
There are two properties of graph databases you should consider when investigating
graph database technologies:
1. The underlying storage. Some graph databases use native graph storage that is
optimized and designed for storing and managing graphs. Not all graph database
technologies use native graph storage, however. Some serialize the graph data into
a relational database, an object-oriented database, or some other general-purpose
data store.
2. The processing engine. Some definitions require that a graph database use index-free
adjacency, meaning that connected nodes physically “point” to each other in
the database.4 Here we take a slightly broader view: any database that from the user’s
perspective behaves like a graph database, i.e., exposes a graph data model through
CRUD operations, qualifies as a graph database. We do, however, acknowledge the
significant performance advantages of index-free adjacency, and therefore use the
term native graph processing to describe graph databases that leverage index-free
adjacency.
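The difference between index-free adjacency and index-backed adjacency can be sketched in a few lines of Python. This is an illustrative model, not how any particular product lays out its storage. In the native case each `Node` holds direct references to its neighbors. In the non-native case, neighbors are found by looking them up in a shared, global index; here a plain dict stands in for a table or B-tree. All names are hypothetical.

```python
class Node:
    """Native style: a node holds direct references to its neighbors."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # direct object references; no lookup structure involved


def connect(a, b):
    a.neighbors.append(b)


def neighbors_native(node):
    # Follows pointers held by the node itself; cost depends only on its degree.
    return [n.name for n in node.neighbors]


# Non-native style: relationships live in a separate, global index keyed by node id.
# A dict stands in here for a relational table or B-tree index.
edge_index = {}


def neighbors_indexed(name):
    # Every hop goes through the shared index rather than through the node itself.
    return list(edge_index.get(name, []))


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
connect(alice, bob)
connect(alice, carol)
edge_index["alice"] = ["bob", "carol"]

print(neighbors_native(alice))     # ['bob', 'carol']
print(neighbors_indexed("alice"))  # ['bob', 'carol']
```

Both approaches return the same answer; what differs is the cost of each hop. With direct references, a hop costs time proportional to the node's own degree. A lookup in a B-tree-backed index costs O(log n) in the size of the index, and a traversal pays that cost again on every hop.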
4. See Rodriguez, M.A., Neubauer, P., “The Graph Traversal Pattern,” 2010 (http://arxiv.org/abs/1004.1001).
It’s important to note that native graph storage and native graph processing
are neither good nor bad; they’re simply classic engineering
tradeoffs. The benefit of native graph storage is that its purpose-built
stack is engineered for performance and scalability. The benefit of non-native
graph storage, in contrast, is that it typically depends on a mature
non-graph backend (such as MySQL) whose production characteristics
are well understood by operations teams. Native graph processing
(index-free adjacency) benefits traversal performance, but at the expense
of making some non-traversal queries difficult or memory intensive.
Relationships are first-class citizens of the graph data model. This is not the case in other
database management systems, which require us to infer connections between entities using
contrived properties such as foreign keys, or out-of-band processing like map-reduce.
By assembling the simple abstractions of nodes and relationships into connected struc‐
tures, graph databases allow us to build arbitrarily sophisticated models that map closely
to our problem domain. The resulting models are simpler and at the same time more
expressive than those produced using traditional relational databases and the other
NOSQL stores.
Figure 1-3 shows a pictorial overview of some of the graph databases on the market
today based on their storage and processing models:
Figure 1-4. A high level view of a typical graph compute engine deployment
A variety of graph compute engines exist. Most notably, there are in-memory,
single-machine graph compute engines like Cassovary, and distributed graph
compute engines like Pegasus or Giraph. Most distributed graph compute engines are
based on the Pregel white paper, authored by Google, which describes the graph compute
engine Google uses to rank pages.5
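Pregel's model is vertex-centric. Computation proceeds in synchronized supersteps: in each superstep, every vertex reads the messages sent to it in the previous superstep, updates its own value, and sends messages along its outgoing edges. The single-process Python sketch below imitates that model for PageRank. It is not the API of Pregel, Giraph, or any other engine. For brevity it assumes every vertex has at least one outgoing edge, since dangling vertices would leak rank.

```python
def pregel_pagerank(out_edges, supersteps=50, damping=0.85):
    """Imitate Pregel-style supersteps for PageRank in a single process.

    out_edges maps every vertex to a non-empty list of the vertices it links to.
    """
    n = len(out_edges)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(supersteps):
        # Send phase: each vertex sends rank / out-degree along each outgoing edge.
        inbox = {v: [] for v in out_edges}
        for v, targets in out_edges.items():
            share = rank[v] / len(targets)
            for t in targets:
                inbox[t].append(share)
        # Compute phase: each vertex updates its value from its own messages only.
        rank = {v: (1 - damping) / n + damping * sum(msgs) for v, msgs in inbox.items()}
    return rank


ranks = pregel_pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print({v: round(r, 3) for v, r in sorted(ranks.items())})
```

No vertex ever reads another vertex's state directly; all information moves through messages between supersteps. That is what lets engines in this family partition vertices across machines.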
Performance
One compelling reason, then, for choosing a graph database is the sheer performance
increase when dealing with connected data versus relational databases and NOSQL
stores. In contrast to relational databases, where join-intensive query performance
deteriorates as the dataset gets bigger, with a graph database, performance tends to remain
relatively constant, even as the dataset grows. This is because queries are localized to a
portion of the graph. As a result, the execution time for each query is proportional only
to the size of the part of the graph traversed to satisfy that query, rather than the size of
the overall graph.
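A bounded breadth-first traversal shows why query cost tracks the traversed neighborhood rather than the whole graph. This plain-Python sketch uses hypothetical user names. It runs the same two-hop query against a tiny graph and against a copy padded with 100,000 unrelated nodes, and the number of nodes visited is the same in both cases.

```python
from collections import deque


def within_depth(adjacency, start, max_depth):
    """Breadth-first traversal limited to max_depth hops.

    Returns (set of nodes reached, excluding start; number of nodes dequeued).
    """
    seen = {start}
    frontier = deque([(start, 0)])
    visited = 0
    while frontier:
        node, depth = frontier.popleft()
        visited += 1
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for nbr in adjacency.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen - {start}, visited


small = {"ann": ["bob"], "bob": ["cat"], "cat": ["dan"]}
large = dict(small)
large.update({f"user{i}": [f"user{i + 1}"] for i in range(100_000)})

for graph in (small, large):
    reached, visited = within_depth(graph, "ann", 2)
    print(sorted(reached), visited)  # ['bob', 'cat'] 3 for both graphs
```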
Flexibility
As developers and data architects we want to connect data as the domain dictates,
thereby allowing structure and schema to emerge in tandem with our growing
understanding of the problem space, rather than being imposed upfront, when we know least
about the real shape and intricacies of the data. Graph databases address this want
directly. As we show in Chapter 3, the graph data model expresses and accommodates
the business’s needs in a way that enables IT to move at the speed of business.
Graphs are naturally additive, meaning we can add new kinds of relationships, new
nodes, and new subgraphs to an existing structure without disturbing existing queries
and application functionality. This additivity generally has positive implications for
developer productivity and project risk. Because of the graph model’s flexibility, we don’t
have to model our domain in exhaustive detail ahead of time, a practice that is all
but foolhardy in the face of changing business requirements. The additive nature of
graphs also means we tend to perform fewer migrations, thereby reducing maintenance
overhead and risk.
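The additive property can be illustrated with a list of (start, type, end) triples holding the Figure 1-1 network. In the sketch below, a later iteration adds a message node and two relationship types, PUBLISHED and LIKES, both hypothetical. The existing FOLLOWS query returns the same answer before and after the new data arrives, and nothing already stored is migrated.

```python
relationships = [
    ("Billy", "FOLLOWS", "Harry"),
    ("Harry", "FOLLOWS", "Billy"),
    ("Ruth", "FOLLOWS", "Harry"),
    ("Harry", "FOLLOWS", "Ruth"),
    ("Ruth", "FOLLOWS", "Billy"),
]


def followed_by(person, rels):
    """Existing query: whom does `person` follow? Only FOLLOWS relationships matter."""
    return sorted(end for start, rel_type, end in rels
                  if start == person and rel_type == "FOLLOWS")


before = followed_by("Ruth", relationships)

# Later iteration: a new node and new (hypothetical) relationship types are added.
# Nothing already stored is migrated or rewritten.
relationships += [
    ("Ruth", "PUBLISHED", "msg_1"),
    ("Harry", "LIKES", "msg_1"),
]

after = followed_by("Ruth", relationships)
print(before, after)  # ['Billy', 'Harry'] ['Billy', 'Harry']
```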
Agility
We want to be able to evolve our data model in step with the rest of our application,
using a technology aligned with today’s incremental and iterative software delivery
practices. Modern graph databases equip us to perform frictionless development and
graceful systems maintenance. In particular, the schema-free nature of the graph data
model, coupled with the testable nature of a graph database’s API and query language,
empowers us to evolve an application in a controlled manner.
Graph users cannot rely on fixed schemas to provide some level of governance at the
level of the database. But this is not a risk; rather, it presents an opportunity to implement
more visible, actionable governance. As we show in Chapter 4, governance is typically
applied in a programmatic fashion, using tests to drive out the data model and queries,
as well as to assert the business rules that depend upon the graph. This is no longer a
controversial practice: more so than relational development, graph database development
aligns well with today’s agile and test-driven software development practices, allowing
graph database-backed applications to evolve in step with a changing business
environment.
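As an example of expressing a business rule as a test, consider a hypothetical rule that every message must have exactly one author. It is modeled here as exactly one incoming PUBLISHED relationship; that relationship name is our assumption. The test functions are written in pytest style, so a runner such as pytest could collect them, and the `__main__` block simply calls them directly.

```python
from collections import Counter


def messages_violating_single_author(rels, messages):
    """Return messages that do not have exactly one incoming PUBLISHED relationship."""
    authors = Counter(end for _start, rel_type, end in rels if rel_type == "PUBLISHED")
    return sorted(m for m in messages if authors[m] != 1)


def test_every_message_has_exactly_one_author():
    rels = [("Ruth", "PUBLISHED", "msg_1"), ("Ruth", "PUBLISHED", "msg_2")]
    assert messages_violating_single_author(rels, ["msg_1", "msg_2"]) == []


def test_orphaned_and_coauthored_messages_are_reported():
    # msg_1 has two authors and msg_2 has none: both break the rule.
    rels = [("Ruth", "PUBLISHED", "msg_1"), ("Billy", "PUBLISHED", "msg_1")]
    assert messages_violating_single_author(rels, ["msg_1", "msg_2"]) == ["msg_1", "msg_2"]


if __name__ == "__main__":
    test_every_message_has_exactly_one_author()
    test_orphaned_and_coauthored_messages_are_reported()
    print("business rule tests passed")
```

Because the rule lives in a test rather than a schema, it can evolve alongside the model. When the rule changes, the test changes with it and is versioned with the application code.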
Summary
In this chapter we’ve defined connected data and reviewed the property graph model,
a simple yet expressive tool for representing connected data. Property graphs capture
complex domains in an expressive and flexible fashion, while graph databases make it
easy to develop applications that manipulate our graph models.
In the next chapter we’ll look in more detail at how several different technologies address
the challenge of connected data, starting with relational databases, moving on to aggregate
NOSQL stores, and ending with graph databases. In the course of the discussion,
we’ll see why graphs and graph databases provide the best means for modeling, storing
and querying connected data. Later chapters then go on to show how to design and
implement a graph database-based solution.
CHAPTER 2
Options for Storing Connected Data
We live in a connected world. To thrive and progress, we need to understand and
influence the web of connections that surrounds us.
How do today’s technologies deal with the challenge of connected data? In this chapter
we look at how relational databases and aggregate NOSQL stores manage graphs and
connected data, and compare their performance to that of a graph database.1
1. For readers interested in exploring the topic of NOSQL, Appendix A describes the four major types of NOSQL
databases.
Figure 2-1 shows a relational schema for storing customer orders in a customer-centric,
transactional application.
The application exerts a tremendous influence over the design of this schema, making
some queries very easy, others more difficult:
• Join tables add accidental complexity; they mix business data with foreign key
metadata.
• Foreign key constraints add development and maintenance overhead
just to make the database work.
• Sparse tables with nullable columns require special checking in code, despite the
presence of a schema.
• Several expensive joins are needed just to discover what a customer bought.
• Reciprocal queries are even more costly. “What products did a customer buy?” is
relatively cheap compared to “which customers bought this product?”, which is the
basis of recommendation systems. We could introduce an index, but even with an
index, recursive questions such as “which customers bought this product who also
bought that product?” quickly become prohibitively expensive as the degree of
recursion increases.
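To see how the joins pile up, here is a sketch using Python's built-in sqlite3 module. Figure 2-1 isn't reproduced here, so the table and column names are assumptions loosely modeled on a customer/order/line-item schema. Answering a single level of “which customers bought this product who also bought that product?” already takes six joins, and each further level adds another three: orders, line_item, and product.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product   (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders    (id INTEGER PRIMARY KEY,
                        customer_id INTEGER REFERENCES customer(id));
-- Join table: business data (quantity) mixed with foreign key metadata.
CREATE TABLE line_item (order_id   INTEGER REFERENCES orders(id),
                        product_id INTEGER REFERENCES product(id),
                        quantity   INTEGER);
INSERT INTO customer  VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
INSERT INTO product   VALUES (10, 'kettle'), (20, 'teapot');
INSERT INTO orders    VALUES (100, 1), (101, 2), (102, 3);
INSERT INTO line_item VALUES (100, 10, 1), (100, 20, 1),
                             (101, 10, 2), (102, 20, 1);
""")

# "Which customers bought the kettle who also bought the teapot?"
rows = con.execute("""
SELECT DISTINCT c.name
FROM customer c
JOIN orders o1    ON o1.customer_id = c.id
JOIN line_item l1 ON l1.order_id = o1.id
JOIN product p1   ON p1.id = l1.product_id AND p1.name = 'kettle'
JOIN orders o2    ON o2.customer_id = c.id
JOIN line_item l2 ON l2.order_id = o2.id
JOIN product p2   ON p2.id = l2.product_id AND p2.name = 'teapot'
ORDER BY c.name
""").fetchall()
print(rows)  # [('Alice',)]
```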
On the whole we conclude that the custom of men dressing as women and
of women dressing as men has been practised from a variety of
superstitious motives, among which the principal would seem to be the wish
to please certain powerful spirits or to deceive others.
Thus, while the Pelew custom of prostituting the unmarried girls to all the
men of their own village, but not of their own clan, is a form of sexual
communism practised within a local group, the custom of prostituting them
to men of other villages is a form of sexual communism practised between
members of different local groups; it is a kind of group-marriage. These
customs of the Pelew Islanders therefore support by analogy the hypothesis
that among the ancient peoples of Western Asia also the systematic
prostitution of unmarried women may have been derived from an earlier
period of sexual communism.718
A somewhat similar custom prevails in Yap, one of the western group of the
Caroline Islands, situated to the north of the Pelew group. In each of the
men's clubhouses “are kept three or four unmarried girls or Mespil, whose
business it is to minister to the pleasures of the men of the particular clan or
brotherhood to which the building belongs. As with the Kroomen on the
Gold Coast, each man, married or single, takes his turn by rotation in the
rites through which each girl must pass before she is deemed ripe for
marriage. The natives say it is an ordeal or preliminary trial to fit them for
the cares and burden of maternity. She is rarely a girl of the same village,
and, of course, must be sprung from a different sept. Whenever she wishes
to become a Langin or respectable married woman, she may, and is thought
none the less of for her frailties as a Mespil.... But I believe this self-
immolation before marriage is confined to the daughters of the inferior
chiefs and commons. The supply of Mespil is generally kept up by
the purchase of slave girls from the neighbouring districts.”719 According to
another account a mespil “must always be stolen, by force or cunning, from
a district at some distance from that wherein her captors reside. After she
has been fairly, or unfairly, captured and installed in her new home, she
loses no shade of respect among her own people; on the contrary, have not
her beauty and her worth received the highest proof of her exalted
perfection, in the devotion, not of one, but of a whole community of
lovers?”720 However, though the girl is nominally stolen from another district,
the matter is almost always arranged privately with the local chief, who
consents to wink hard at the theft in consideration of a good round sum of
shell money and stone money, which serves “to salve the wounds of a
disrupted family and dispel all thoughts of a bloody retaliation. Nevertheless,
the whole proceeding is still carried out with the greatest possible secrecy
and stealth.”721
In the Pelew Islands when the chief of a clan has reigned too long or has
made himself unpopular, the heir has a formal right to put him to death,
though for reasons which will appear this right is only exercised in some of
the principal clans. The practice of regicide, if that word may be extended to
the assassination of chiefs, is in these islands a national institution regulated
by exact rules, and every high chief must lay his account with it. Indeed so
well recognized is the custom that when the heir-apparent, who under the
system of mother-kin must be a brother, a nephew, or a cousin on the
mother's side, proves himself precocious and energetic, the people say, “The
cousin is a grown man. The chief's tobolbel is nigh at hand.”722
But if he has omitted to massacre his predecessor and has allowed him to
die a natural death, he suffers for his negligence by being compelled to
observe a long series of complicated and irksome formalities before he can
make good his succession in the eyes of the law. For in that case the title of
chief has to be formally withdrawn from the dead man and conferred on his
successor by a curious ceremony, which includes the presentation of a coco-
nut and a taro plant to the new chief. Moreover, at first he may not enter
the chief's house, but has to be shut up in a tiny hut for thirty or forty days
during all the time of mourning, and even when that is over he may not
come out till he has received and paid for a human head brought him by the
people of a friendly state. After that he still may not go to the sea-shore
until more formalities have been fully observed. These comprise a very
costly fishing expedition, which is conducted by the inhabitants of another
district and lasts for weeks. At the end of it a net full of fish is brought to
the chief's house, and the people of the neighbouring communities are
summoned by the blast of trumpets. As soon as the stranger fishermen
have been publicly paid for their services, a relative of the new chief steps
across the net and solemnly splits a coco-nut in two with an old-fashioned
knife made of a Tridacna shell, while at the same time he bans all the evils
that might befall his kinsman. Then, without looking at the nut, he throws
the pieces on the ground, and if they fall so that the two halves lie
with the opening upwards, it is an omen that the chief will live long. The
pieces of the nut are then tied together and taken to the house of another
chief, the friend of the new ruler, and there they are kept in token that the
ceremony has been duly performed. Thereupon the fish are divided among
the people, the strangers receiving half. This completes the legal ceremonies
of accession, and the new chief may now go about freely. But these tedious
formalities and others which I pass over are dispensed with when the new
chief has proved his title by slaying his predecessor. In that case the
procedure is much simplified, but on the other hand the death duties are so
very heavy that only rich men can afford to indulge in the luxury of regicide.
Hence in the Pelew Islands of to-day, or at least of yesterday, the old-
fashioned mode of succession by slaughter is now restricted to a few
families of the bluest blood and the longest purses.723
If this account of the existing or recent usage of the Pelew Islanders sheds
little light on the motives for putting chiefs to death, it well illustrates the
business-like precision with which such a custom may be carried out, and
the public indifference, if not approval, with which it may be regarded as an
ordinary incident of constitutional government. So far, therefore, the Pelew
custom bears out the view that a systematic practice of regicide, however
strange and revolting it may seem to us, is perfectly compatible with a state
of society in which human conduct and human life are estimated by a
standard very different from ours. If we would understand the early history
of institutions, we must learn to detach ourselves from the prepossessions
of our own time and country, and to place ourselves as far as possible at the
standpoint of men in distant lands and distant ages.
Index.
Abi-baal, i. 51 n. 4
Abi-el, i. 51 n. 4
Ajax and Teucer, names of priestly kings of Olba, i. 144 sq., 161
All Saints, feast of, perhaps substituted for an old pagan festival of
the dead, ii. 82 sq.
Amenophis IV., king of Egypt, his attempt to abolish all gods but the
sun-god, ii. 123 sqq.
Anacreon, on Cinyras, i. 55
Anklets made of human sinews worn by king of Uganda, ii. 224 sq.
—— and Marsyas, i. 55
—— Tauropolis, i. 275 n. 1
Assyrian cavalry, i. 25 n. 3
—— Aphrodite, i. 304 n.
Asvattha tree, i. 82
Aun, or On, King of Sweden, sacrifices his sons to Odin, ii. 220
Aunis, feast of All Souls in, ii. 69 sq.
—— of the Lebanon, i. 32
—— and Baal, i. 27
—— Gebal, i. 14
Baalbec, i. 28;
sacred prostitution at, 37;
image of Hadad at, 163
Bangalas of the Congo, rebirth of dead among the, i. 92. See also
Boloki
Barley forced for festival, i. 240, 241, 242, 244, 251 sq.
Barsom, bundle of twigs used by Parsee priests, i. 191 n. 2
—— of Aphrodite, i. 280
—— of Demeter, i. 280
Begbie, General, i. 62 n.