2011 Webber-A Programmatic Introduction To Neo4j
Programmatic Introduction to Neo4j
Roadmap
  NOSQL overview
  Neo4j
  What's fabulous in 1.3?
  Some hacking
Trend 2: Connectedness
[Figure: information connectivity over time. Labels: GGG, Ontologies, RDFa, Folksonomies, Tagging, Wikis, UGC, Blogs, Feeds, Hypertext, Text Documents. Time axis: web 1.0 (1990), web 2.0 (2000), web 3.0 (2010), 2020.]
[Figure: performance against data complexity. Labels: Salary List, Majority of Webapps, Social network, Semantic Trading, Requirement of application.]
Key-Value Stores
  Dynamo: Amazon's Highly Available Key-Value Store (2007)
  Data model:
    Global key-value mapping
    Big scalable HashMap
    Highly fault tolerant (typically)
  Examples:
    Riak, Redis, Voldemort
  Weaknesses:
    Simplistic data model
    Poor for complex data
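The "big scalable HashMap" picture, and why it is poor for complex data, can be sketched in plain Java. This is only an illustration: no real key-value store client is used, and the class name KeyValueSketch, the key format, and the sample data are all hypothetical. The point is that each hop between connected items costs a separate lookup, stitched together by the application.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a key-value store: one global key -> value map.
class KeyValueSketch {
    static final Map<String, String> store = new HashMap<>();

    static {
        // Values are opaque strings; the comma-separated encoding is our own convention.
        store.put("user:doctor:friends", "susan,rose");
        store.put("user:susan:friends", "doctor");
        store.put("user:rose:friends", "doctor,mickey");
    }

    // Friends-of-friends: one get() per hop, joined in application code.
    static List<String> friendsOfFriends(String user) {
        List<String> result = new ArrayList<>();
        String direct = store.get("user:" + user + ":friends");
        if (direct == null) return result;
        for (String friend : direct.split(",")) {
            String second = store.get("user:" + friend + ":friends");
            if (second == null) continue;
            for (String candidate : second.split(",")) {
                if (!candidate.equals(user) && !result.contains(candidate)) {
                    result.add(candidate);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(friendsOfFriends("doctor"));
    }
}
```

The store itself knows nothing about the connections; all of the relationship logic lives in friendsOfFriends.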
Column Family Stores
  Examples:
    HBase, HyperTable, Cassandra
  Weaknesses:
    Unsuited for interconnected data
Document Databases
  Data model:
    Collections of documents
    A document is a key-value collection
    Index-centric, lots of map-reduce
  Examples:
    CouchDB, MongoDB
  Weaknesses:
    Unsuited for interconnected data
    Query model limited to keys (and indexes)
    Map reduce for larger queries
Graph Databases
  Data model:
    Nodes with properties
    Named relationships with properties
    Hypergraph, sometimes
  Examples:
    Neo4j (of course), Sones GraphDB, OrientDB, InfiniteGraph, AllegroGraph
  Weaknesses:
    Sharding
    Though they can scale reasonably well
    And for some domains you can shard too!
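The data model above (nodes with properties, named relationships with properties) can be sketched as a few plain Java types. This is a hypothetical in-memory illustration, not Neo4j's API: the names PropertyGraphSketch, connect, and the "borrowed" property are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory property graph; illustrative only, not Neo4j's API.
class PropertyGraphSketch {
    static class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();
    }

    static class Relationship {
        final String name;   // relationships are named...
        final Node start;
        final Node end;
        final Map<String, Object> properties = new HashMap<>();   // ...and carry properties too

        Relationship(String name, Node start, Node end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    // Creates a named relationship from one node to another.
    static Relationship connect(Node from, String name, Node to) {
        Relationship r = new Relationship(name, from, to);
        from.outgoing.add(r);
        return r;
    }

    public static void main(String[] args) {
        Node doctor = new Node();
        doctor.properties.put("name", "The Doctor");
        Node tardis = new Node();
        tardis.properties.put("vehicle", "tardis");
        tardis.properties.put("model", "Type 40");
        Relationship pilots = connect(doctor, "PILOTS", tardis);
        pilots.properties.put("borrowed", true);   // hypothetical relationship property
        System.out.println(doctor.outgoing.get(0).name);
    }
}
```

Because each node holds direct references to its relationships, following a connection is a pointer hop rather than a key lookup or a join.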
          1000       2000ms
Neo4j     1000       2ms
Neo4j     1000000    2ms
Recommendations
Business intelligence
Social computing
Geospatial
MDM
Systems management
Web of things
Genealogy
Time series data
Product catalogue
Web analytics
Scientific computing (especially bioinformatics)
Indexing your slow RDBMS
And much more!
http://en.wikipedia.org/wiki/File:Leonhard_Euler_2.jpg
vehicle: tardis
model: Type 40
What you end up with
What you know
http://talent-dynamics.com/tag/sqaure-peg-round-hole/
Schema-less Databases
  Graph databases don't excuse you from design
  Any more than dynamically typed languages excuse you from design
Neo4j
What's Neo4j?
  It's a Graph Database
  Embeddable and server
  Full ACID transactions
    We don't mess around with durability, ever.
More on Neo4j
  Neo4j is stable
    In 24/7 operation since 2003
  Advanced: AGPL/commercial
    Management features, commercial grade support
  Enterprise: AGPL/commercial
    HA
NOSQL is simply
Run it!
  Server is easy to start and stop
cd <install directory>
bin/neo4j start
bin/neo4j stop
Embed it!
  If you want to host the database in your process just load the jars
  And point the config at the right place on disk
  Embedded databases can be HA too
    You don't have to run as server
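Creating Nodes
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// Open (or create) an embedded database stored under /tmp/neo
GraphDatabaseService db = new EmbeddedGraphDatabase("/tmp/neo");
Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("name", "the Doctor");
    tx.success();   // mark the transaction as successful
} finally {
    tx.finish();    // commit if success() was called, otherwise roll back
}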
Creating Relationships
import org.neo4j.graphdb.DynamicRelationshipType;

Transaction tx = db.beginTx();
try {
    Node theDoctor = db.createNode();
    theDoctor.setProperty("name", "The Doctor");
    Node susan = db.createNode();
    susan.setProperty("firstname", "Susan");
    susan.setProperty("lastname", "Campbell");
    // Relationships are directed and named: (susan)-[COMPANION_OF]->(theDoctor)
    susan.createRelationshipTo(theDoctor,
        DynamicRelationshipType.withName("COMPANION_OF"));
    tx.success();
} finally {
    tx.finish();
}
Repeat...until
Graph Algorithms
  The Doctor and the Master have been around for a while
  But what's the key feature of their relationship?
    They're both timelords, they both come from Gallifrey, they pilot a Tardis, they've fought
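Shortest Path
  What's the most direct path between the Doctor and the Master?
import org.neo4j.graphalgo.GraphAlgoFactory;
import org.neo4j.graphalgo.PathFinder;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Path;
import org.neo4j.kernel.Traversal;

Node theMaster = ...   // right-hand side not given in the source
Node theDoctor = ...
int maxDepth = 5;      // ignore paths longer than 5 relationships
PathFinder<Path> shortestPathFinder =
    GraphAlgoFactory.shortestPath(
        Traversal.expanderForAllTypes(),   // follow relationships of any type
        maxDepth);
Path shortestPath =
    shortestPathFinder.findSinglePath(theDoctor, theMaster);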
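To show what a depth-limited shortest-path search does, here is a minimal sketch using plain breadth-first search over an adjacency map. It is not Neo4j's implementation, which may work differently; the class ShortestPathSketch, its methods, and the sample graph are hypothetical, and relationships are treated as undirected and untyped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ShortestPathSketch {
    // Returns the nodes on one shortest path, or an empty list if none exists within maxDepth hops.
    static List<String> findSinglePath(Map<String, List<String>> graph,
                                       String start, String end, int maxDepth) {
        Map<String, String> parent = new HashMap<>();   // doubles as the visited set
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        parent.put(start, null);
        depth.put(start, 0);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(end)) {
                // Walk the parent links back to the start, then reverse.
                List<String> path = new ArrayList<>();
                for (String n = end; n != null; n = parent.get(n)) path.add(n);
                Collections.reverse(path);
                return path;
            }
            if (depth.get(current) == maxDepth) continue;   // do not expand past the depth limit
            for (String next : graph.getOrDefault(current, Collections.emptyList())) {
                if (!parent.containsKey(next)) {
                    parent.put(next, current);
                    depth.put(next, depth.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    // Adds an undirected edge between a and b.
    static void link(Map<String, List<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        graph.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    // Hypothetical sample data based on the slide: both timelords come from Gallifrey.
    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        link(graph, "Doctor", "Gallifrey");
        link(graph, "Master", "Gallifrey");
        link(graph, "Doctor", "Tardis");
        return graph;
    }

    public static void main(String[] args) {
        System.out.println(findSinglePath(sampleGraph(), "Doctor", "Master", 5));
    }
}
```

Breadth-first search visits nodes in order of distance from the start, so the first time it reaches the end node it has found a path with the fewest hops.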
Graph matching
  It's super-powerful to look for patterns in a data set
  E.g. retail analytics
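As a rough illustration of pattern matching in a retail setting, the sketch below finds every match of the pattern (product)<-[BOUGHT]-(customer)-[BOUGHT]->(other product) and counts how often each other product appears. It is plain Java, not Neo4j's graph-matching API; the class GraphMatchingSketch, the BOUGHT relationship, and the sample purchases are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class GraphMatchingSketch {
    // One relationship in the graph: (customer)-[BOUGHT]->(product).
    static class Bought {
        final String customer;
        final String product;

        Bought(String customer, String product) {
            this.customer = customer;
            this.product = product;
        }
    }

    // Matches (product)<-[BOUGHT]-(c)-[BOUGHT]->(other) and counts matches per other product.
    // Assumes each customer/product pair appears at most once in edges.
    static Map<String, Integer> alsoBought(List<Bought> edges, String product) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Bought first : edges) {
            if (!first.product.equals(product)) continue;
            for (Bought second : edges) {
                if (second.customer.equals(first.customer) && !second.product.equals(product)) {
                    counts.merge(second.product, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Hypothetical sample purchases.
    static List<Bought> sampleData() {
        return Arrays.asList(
            new Bought("alice", "nappies"), new Bought("alice", "beer"),
            new Bought("bob", "nappies"), new Bought("bob", "beer"),
            new Bought("carol", "nappies"), new Bought("carol", "wine"));
    }

    public static void main(String[] args) {
        System.out.println(alsoBought(sampleData(), "nappies"));
    }
}
```

The nested scan over all relationships is quadratic and only serves to keep the sketch short; a graph database would instead walk outward from the matched product node.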
Koans
Chatty Network
Java, Ruby, Clojure
Traversal Framework
Core API
Caches
Memory-Mapped (N)IO
Filesystem
A Humble Blade
  Blades are powerful!
  A typical blade will contain 128GB memory
  We can use most of that
Cache Sharding
  A strategy for coping with large data sets
    Terabyte scale
  Consistent Routing
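One way to read "consistent routing" is that requests for the same piece of data always go to the same server, so each server's cache stays warm for its own slice of the data set. The slides do not say how the routing is done; the sketch below assumes simple hash-modulo routing, and the class CacheShardingSketch, the route method, and the server names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class CacheShardingSketch {
    // Routes every request for the same key to the same server.
    // Assumption: hash-modulo routing; the source does not specify the scheme.
    static String route(String key, List<String> servers) {
        int bucket = Math.floorMod(key.hashCode(), servers.size());
        return servers.get(bucket);
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("blade-1", "blade-2", "blade-3");
        System.out.println(route("the Doctor", servers));
    }
}
```

With modulo routing most keys move to a different server when the number of servers changes; a consistent hash ring is a common way to limit that reshuffling.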
Domain-specific sharding
  Eventually (Petabyte) level data cannot be replicated practically
  Need to shard data across machines
  Remember: no perfect algorithm exists
  But we humans sometimes have domain insight
Summary
  Neo4j 1.3 community edition is free as in beer
  Graphs are extremely expressive for modeling
  Neo4j is fast at graph traversals
  No more multi-join woes
  No more insane indexes
  No more map reduce
Questions?
http://neo4j.org
@jimwebber