This is a quick run-through of the NOSQL exchange that Ciaran and I attended on Nov 2 at SkillsMatter. It featured 8 speakers, and links to all the talks are included.
A lot of people were asking which NOSQL solution to use.
This was the advice given by the speakers: there is no silver bullet. Do you need to read or write lots of big data? Think about the shape of your data and how you are going to query it, to help understand which NOSQL solution fits best. Also understand the trade-offs of the solution you choose. Finally, the talks showed plenty of evidence of people using NOSQL solutions when a SQL solution would have sufficed.
1) THE STATE OF NOSQL TODAY by Emil Eifrem
This was the best talk of the day and anyone interested in NOSQL should watch the talk.
NOSQL stands for Not Only SQL.
Main types of NOSQL:
- Key-value: originated from Amazon’s paper on Dynamo, e.g. Riak, Voldemort (used at LinkedIn)
- Column family, e.g. Cassandra, HBase, Hypertable
- Document databases (most popular): descended from Lotus Notes, e.g. CouchDB & MongoDB
- Graph databases (nodes with properties): originated from Euler and graph theory, e.g. InfiniteGraph, Neo4j
Documents are a superset of key-values. Graphs are a superset of documents, and thus of all the others. Does this imply you should use graph NOSQL solutions for all your NOSQL concerns? The graph NOSQL advocates think so.
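To make the "superset" claim a bit more concrete, here is a minimal sketch of the three shapes using plain Python structures. The keys, names and the `neighbours` helper are made up for illustration and do not come from any of the products mentioned.

```python
# Hypothetical data, for illustration only.

# Key-value: an opaque value looked up by key.
kv_store = {"user:1": "Ada"}

# Document: the value is itself a nested structure of fields.
doc_store = {"user:1": {"name": "Ada", "langs": ["Scala", "Erlang"]}}

# Graph: nodes carry properties (like documents) and edges link nodes.
nodes = {
    "user:1": {"name": "Ada"},
    "user:2": {"name": "Bob"},
}
edges = [("user:1", "KNOWS", "user:2")]


def neighbours(node_id, rel):
    """Return the ids reachable from node_id over edges of type rel."""
    return [dst for src, r, dst in edges if src == node_id and r == rel]
```

A graph with no edges degenerates into a set of documents, and a document with a single field degenerates into a key-value pair, which is roughly the argument the graph advocates make.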
- Acidity is increasing, e.g. MongoDB adding durable logging storage, Cassandra adding stronger consistency
- More query languages: Cassandra (CQL), CouchDB (UnQL), Neo4j (Cypher), Mongo.
- Potentially more schemas?
- Tool support
- Middleware support
Oracle is now adopting NOSQL with a key-value solution, despite debunking NOSQL in May this year. NOSQL seems to be following a similar historical trend to SQL, which had many vendors to begin with and over time consolidated into 4 large vendors. Could NOSQL end up in a similar situation in the near future?
2) HANDLING CONFLICTS IN EVENTUALLY CONSISTENT SYSTEMS by Russell Brown
Key quote from this talk: “Large systems are always in some degree of failure”
Problem: according to CAP (Consistency, Availability & Partition tolerance), you can’t have all 3, so you have to compromise by picking 2.
In the case of a partition (P), trade availability (A) for consistency (C).
Else (E), trade latency (L) for consistency (C).
Riak is inspired by Dynamo and built in Erlang/OTP. It has features such as MapReduce, links and full-text search. It uses vector clocks rather than timestamps, and Statebox can be used to automate conflict resolution.
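For anyone who has not met vector clocks before, here is a minimal sketch of the idea in Python. It is not Riak's implementation; the function names and the `node_a`/`node_b` identifiers are assumptions made for the example. Each clock is a map from node name to a counter, and two versions conflict when neither clock has seen everything the other has.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify clock a relative to b: equal, after, before or concurrent."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"  # neither descends from the other: a conflict


def merge(a, b):
    """Combine two clocks by taking the highest counter seen per node."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


# Two replicas update the same value independently.
base = increment({}, "node_a")
left = increment(base, "node_a")
right = increment(base, "node_b")
```

Here `compare(left, right)` reports a concurrent update, which wall-clock timestamps would silently hide by letting the later write win. Once the application has resolved the conflict, the merged clock covers both histories.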
Riak uses a wheel (its ring) to store data across the cluster.
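The wheel is easiest to picture as a consistent hash ring. The sketch below is a generic version of that technique in Python, not Riak's actual partitioning scheme; the `HashRing` class, the `vnodes` parameter and the node names are all assumptions for illustration. Each node owns several points on the ring, and a key belongs to the first point found clockwise from the key's hash.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring. Expects at least one node."""

    def __init__(self, nodes, vnodes=8):
        # Place vnodes points per node on the ring, sorted by hash value.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha1(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        """Return the node owning the first ring point after the key's hash."""
        idx = bisect.bisect_right(self._points, self._hash(key))
        return self._ring[idx % len(self._ring)][1]  # wrap around the wheel


ring = HashRing(["node_a", "node_b", "node_c"])
owner = ring.node_for("user:42")
```

The attraction of this layout is that adding a node only takes keys away from existing nodes and hands them to the new one; nothing is reshuffled between the nodes that were already there.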
3) MONGODB + SCALA: CASE CLASSES, DOCUMENTS AND SHARDS FOR A NEW DATA MODEL by Brendan McAdams (creator of Casbah)
MongoDB is not suited to highly transactional applications or ad-hoc intelligence that requires SQL support. MongoDB revolves around memory-mapped files and has an autosharding system.
Things to remember:
The datastore is a servant to the application, not vice versa.
4) REAL LIFE CASSANDRA by Dave Gardner (from Cassandra user group)
- Elastic: read/write throughput increases as you scale horizontally.
- Decentralised: no master node.
- Based on Amazon’s Dynamo paper
- Rich data set model
- High write performance
If your requirements are big data, high availability and a high number of writes, then use Cassandra.
When data modelling, start from your queries and work backwards.
Has expiring columns.
Avoid read-before-write and locking by safely mutating individual columns.
Avoid super columns; use composite columns instead.
Use Brisk (which uses Hadoop) to analyse data directly from the Cassandra cluster.
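The "start from your queries" advice is easier to see with a toy model. The sketch below uses plain Python dictionaries to stand in for a wide row per user; it is not Cassandra client code, and the `events_by_user` layout and function names are hypothetical. The query being served is "the latest N events for a given user", so the row key is the user and the column names are timestamps.

```python
from collections import defaultdict

# One wide row per user; each column is keyed by the event timestamp.
events_by_user = defaultdict(dict)


def record_event(user_id, timestamp, payload):
    """Write a single column without reading the row first."""
    events_by_user[user_id][timestamp] = payload


def latest_events(user_id, limit):
    """Serve the target query: newest events first, at most limit of them."""
    row = events_by_user.get(user_id, {})
    return [row[ts] for ts in sorted(row, reverse=True)[:limit]]


record_event("u1", 1, "login")
record_event("u1", 3, "logout")
record_event("u1", 2, "click")
```

Each write touches exactly one column, which is the "mutate individual columns" tip in miniature, and the read is a single-row slice rather than a join.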
5) DOCTOR WHO AND NEO4J by Ian Robinson
Although it was a fairly slick presentation, it seemed to focus too much on modelling Doctor Who and his universe as a working example of graphs & Neo4j. Could this be to hide some shortcomings in Neo4j?
- Neo4j is a fully ACID replacement for MySQL/Oracle.
- Neo4j is a NOSQL solution that tries to sell itself as the most enterprise ready solution.
- Has master/slave nodes.
- Has 3 licenses: Community/Advanced/Enterprise.
Given the mentions of two-phase commits, and apart from the advantage of modelling relationships such as social networks, there seemed little benefit in moving away from a relational database.
Having spoken to the Neo4j guys afterwards, it seems that the DB loses its ACIDity once you cluster it, and becomes another eventually-consistent store!
6) BUILDING REAL WORLD SOLUTION WITH DOCUMENT STORAGE, SCALA AND LIFT by Aleksa Vukotic
- Written in Erlang; has Lift support (Scala framework)
- Exposes REST/JSON endpoints
- Eventually consistent
- Versioning: append-only updates
- MapReduce for querying using views
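As a rough picture of what querying through a MapReduce view means, here is a small Python sketch. Real document stores normally define views in their own language and build them incrementally; the `run_view` helper, the sample documents and the field names below are all invented for the example. The map function emits key/value pairs per document and the reduce function folds the values for each key.

```python
from collections import defaultdict

# Hypothetical documents, for illustration only.
docs = [
    {"_id": "1", "title": "Intro", "tags": ["nosql", "scala"]},
    {"_id": "2", "title": "Views", "tags": ["nosql"]},
    {"_id": "3", "title": "Untagged"},
]


def map_doc(doc):
    """Emit (tag, 1) for every tag on the document."""
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_values(values):
    """Fold the emitted values for one key into a count."""
    return sum(values)


def run_view(documents, map_fn, reduce_fn):
    """Group mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


tag_counts = run_view(docs, map_doc, reduce_values)
```

The view result is keyed by tag, so "how many documents carry this tag" becomes a lookup on the view rather than a scan of the documents.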
7) ROBERT REES ON POLYGLOT PERSISTENCE
A muddled presentation about mixing a graph NOSQL solution with a document-based one.
8) THE FUTURE OF NOSQL AND BIG DATA STORAGE by Tom Wilkie
Rather than using the out-of-the-box storage engines for NOSQL solutions, there can be dramatic throughput gains from using alternative storage engines such as Tokutek and Acunu (Castle).