What is NoSQL?
NoSQL refers to a family of data management technologies designed to handle the increasing volume, velocity, and variety of data. These systems store and retrieve data that is modeled by means other than the tabular relations used in relational databases. NoSQL is also read as “Not only SQL” to emphasize that such systems may also support SQL-like query languages.
Why do I need NoSQL?
Relational databases face the following challenges:
- Poorly suited to very large volumes (petabytes) of data with a variety of data types (e.g. images, videos, text)
- Hard to scale for large data volumes
- Scale-up is limited by the memory and CPU capacity of a single machine
- Scale-out is limited by cache-dependent read and write operations
- Sharding (breaking the database into pieces and storing them on different nodes) causes operational problems (e.g. managing a shard failure)
- Complex RDBMS model
- Consistency requirements limit scalability in an RDBMS
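To make the sharding point concrete, here is a minimal sketch of hash-based sharding, assuming keys are placed by taking a stable hash modulo the number of shards. The function names `shard_for` and `moved_keys` and the `user:N` keys are invented for illustration and do not come from any particular database. It shows one operational cost of this naive scheme: changing the shard count relocates most keys.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (same result on every run)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def moved_keys(keys, old_shards: int, new_shards: int) -> int:
    """Count keys that land on a different shard after the shard count changes."""
    return sum(
        1 for k in keys
        if shard_for(k, old_shards) != shard_for(k, new_shards)
    )

# Hypothetical keys, used only to illustrate the effect of adding one shard.
keys = [f"user:{i}" for i in range(1000)]
print("keys that must move when going from 4 to 5 shards:", moved_keys(keys, 4, 5))
```

With plain modulo placement, most keys move when a shard is added, which is part of why manual sharding of a relational database is operationally painful.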
Compared to relational databases, NoSQL databases are generally easier to scale out and can perform better for workloads that fit their data model. They address the challenges above by providing the following:
- A scale-out, shared-nothing architecture, capable of running on a large number of nodes
- A non-locking concurrency control mechanism, so that real-time reads do not conflict with writes
- Scalable replication and distribution – thousands of machines with distributed data
- An architecture providing higher performance per node than an RDBMS
- Schema-less data model
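As a rough illustration of the schema-less data model, the sketch below keeps documents as plain Python dictionaries in an in-memory list. The `collection`, `insert`, and `find` names are hypothetical and only mimic the general shape of a document store. They are not the API of any real product.

```python
# A hypothetical in-memory "collection": documents are dicts with no fixed schema.
collection = []
_MISSING = object()  # sentinel so that an absent field never matches a criterion

def insert(doc: dict) -> None:
    """Store a copy of the document; no schema is checked."""
    collection.append(dict(doc))

def find(**criteria) -> list:
    """Return documents whose top-level fields equal every given criterion."""
    return [
        d for d in collection
        if all(d.get(k, _MISSING) == v for k, v in criteria.items())
    ]

# Two documents with different fields can live side by side.
insert({"name": "Ada", "email": "ada@example.com"})
insert({"name": "Lin", "tags": ["admin"], "address": {"city": "Oslo"}})

print(find(name="Lin"))
```

The second document adds fields that the first does not have, and no table alteration is needed. The trade-off is that the application must cope with fields that may be absent.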
CAP Theorem and NoSQL databases
The CAP theorem describes three properties that a distributed system would ideally provide:
- Consistency (all nodes see the same data at the same time)
- Availability (a guarantee that every request receives a response about whether it was successful or failed)
- Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
According to the CAP theorem, a distributed system cannot guarantee all three properties at the same time. Current NoSQL databases therefore choose different combinations of C, A, and P:
CA – Single-site cluster, so all nodes are always in contact. When a partition occurs, the system blocks.
CP – Some data may not be accessible, but the rest is still consistent/accurate.
AP – System is still available under partitioning, but some of the data returned may be inaccurate.
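The difference between the CP and AP choices can be sketched with a toy replica that is cut off from its peers. This is a simplified model under stated assumptions, not how any specific database implements it: the `Replica` class, the `Unavailable` exception, and the `read_during_partition` helper are all invented for this example.

```python
class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a stale read."""

class Replica:
    def __init__(self, mode: str, value: str = "v1"):
        self.mode = mode            # "CP" or "AP", the labels used in the text above
        self.value = value
        self.partitioned = False    # True while cut off from the other replicas

    def read(self) -> str:
        # A CP replica gives up availability when it cannot confirm the latest value.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot reach peers to confirm the latest value")
        # An AP replica always answers, even though its value may be stale.
        return self.value

def read_during_partition(mode: str) -> str:
    """A write reaches only the remote side of the partition, then we read locally."""
    local = Replica(mode)
    remote = Replica(mode)
    local.partitioned = True
    remote.value = "v2"  # an update the local replica never receives
    try:
        return local.read()
    except Unavailable:
        return "unavailable"

print(read_during_partition("CP"))  # prints "unavailable"
print(read_during_partition("AP"))  # prints "v1" (stale, since the newest value is "v2")
```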
The following graph shows where RDBMS and different NoSQL databases fit into the CAP theorem.
NoSQL is a BASE, not an ACID, system
NoSQL databases typically follow the BASE model, which gives up strong consistency. A BASE system has the following characteristics:
- Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
- Soft State indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
- Eventual Consistency indicates that the system will become consistent over time, given that the system does not receive input during that time.
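Eventual consistency can be illustrated with two replicas that accept writes independently and later reconcile. The sketch below assumes a last-write-wins rule based on caller-supplied timestamps, which is only one of several possible reconciliation strategies. The `LwwReplica` class and the `sync` function are hypothetical names.

```python
class LwwReplica:
    """A toy replica that keeps, per key, the value with the newest timestamp."""

    def __init__(self):
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

def sync(a: LwwReplica, b: LwwReplica) -> None:
    """Exchange entries in both directions; each side keeps the newest write."""
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)

r1, r2 = LwwReplica(), LwwReplica()
r1.write("cart", ["book"], timestamp=1)
r2.write("cart", ["book", "pen"], timestamp=2)

before = (r1.read("cart"), r2.read("cart"))  # the replicas disagree
sync(r1, r2)
after = (r1.read("cart"), r2.read("cart"))   # both now hold the newest write

print(before)
print(after)
```

Between the second write and the call to `sync`, a reader may see either value depending on which replica it contacts. That window is the "soft state" described above.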
|NoSQL Type|Document Data Store|Key Value|Column|Graph|
|---|---|---|---|---|
|Data Model|Collection of key value connections|Collection of key value pairs|Column families|“Property Graph” – Nodes|
|Strength|Incomplete Data Tolerant|Fast Look-ups|Fast Look-ups|Graph Algorithms – Shortest path, etc.|
|Weakness|Query Performance, No Standard Query Syntax|Stored Data has no schema|Very low level API|Not easy to cluster, need to traverse whole graph to get answer|
|Example|MongoDB, CouchDB|Amazon SimpleDB, Redis|HBase, Cassandra|InfoGrid, InfiniteGraph|
Read/Write speed: column > document > key-value > graph
Query/Navigation speed: graph > key-value > column > document
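The four data models compared above can be approximated with ordinary Python structures. This is only a sketch of the shape of the data, with made-up keys and values. Real systems add persistence, indexing, and distribution on top.

```python
# Key-value: opaque values looked up by key only.
key_value = {"session:42": "opaque-token"}

# Document: nested, self-describing records.
document = {"_id": 1, "name": "Ada", "orders": [{"sku": "A1", "qty": 2}]}

# Column family: row key -> column family -> column -> value.
column_family = {
    "user:1": {
        "profile": {"name": "Ada"},
        "activity": {"last_login": "Monday"},
    },
}

# Property graph: nodes with properties plus labeled, directed edges.
nodes = {
    "ada": {"kind": "person"},
    "lin": {"kind": "person"},
    "acme": {"kind": "company"},
}
edges = [("ada", "KNOWS", "lin"), ("lin", "WORKS_AT", "acme")]

def neighbors(node: str, label: str) -> list:
    """Follow outgoing edges with the given label from one node."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]

print(key_value["session:42"])
print(document["orders"][0]["sku"])
print(column_family["user:1"]["profile"]["name"])
print(neighbors("ada", "KNOWS"))
```

Key-value and column-family lookups are direct dictionary accesses, while the graph query walks edges. This mirrors the look-up and navigation strengths listed in the table.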
HBase vs Cassandra vs MongoDB
| |HBase|Cassandra|MongoDB|
|---|---|---|---|
|Not good for| | | |
|Use Case|Facebook message|Twitter, Travel portal|Craigslist, Foursquare|
Generally, Cassandra performs better than the other two when the data volume is very large.