Comparison between different NoSQL technologies

Relational databases were not designed to cope with the scale and agility challenges that face modern applications, nor were they built to take advantage of the commodity storage and processing power available today.

NoSQL encompasses a wide variety of database technologies that were developed in response to the demands of building modern applications.

The Benefits of NoSQL

Compared to relational databases, NoSQL databases are generally easier to scale out and can perform better for many workloads, and their data models address several issues that the relational model was not designed to address:

  • Dynamic schemas – NoSQL databases handle large volumes of structured, semi-structured, and unstructured data without a fixed schema. This is especially useful in agile development environments, where the schema changes frequently and a relational database would require migrating the entire database, often with significant downtime.
  • Automatic sharding – NoSQL databases usually support auto-sharding, meaning that they natively and automatically spread data across an arbitrary number of servers, without requiring the application to be aware of the composition of the server pool. Data and query load are automatically balanced across servers, and when a server goes down, it can be quickly and transparently replaced with no application disruption.
  • Replication – Most NoSQL databases also support automatic replication to maintain availability during outages or planned maintenance. More sophisticated NoSQL databases are fully self-healing, offering automated failover and recovery, as well as the ability to distribute the database across multiple geographic regions to withstand regional failures and enable data localization. Unlike relational databases, NoSQL databases generally do not require separate applications or expensive add-ons to implement replication.
  • Integrated caching – Many NoSQL database technologies have excellent integrated caching capabilities, keeping frequently used data in system memory as much as possible and removing the need for a separate caching layer. Some NoSQL databases also offer a fully managed, integrated in-memory database management layer for workloads demanding the highest throughput and lowest latency.
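To make the auto-sharding idea above more concrete, here is a minimal sketch in Python. It is not how any particular database does it: the `ShardedStore` class, the server names, and the hash-modulo routing are all assumptions made for illustration, and plain dicts stand in for real servers.

```python
import hashlib


class ShardedStore:
    """Toy router that spreads keys across a fixed list of shards."""

    def __init__(self, shard_names):
        self.shard_names = list(shard_names)
        # One in-memory dict per shard stands in for a real server.
        self.shards = {name: {} for name in self.shard_names}

    def shard_for(self, key):
        # Use a stable hash (not Python's randomized hash()) so the same
        # key is always routed to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shard_names[int(digest, 16) % len(self.shard_names)]

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(["server-a", "server-b", "server-c"])
for i in range(9):
    store.put(f"user:{i}", {"id": i})

# The caller never names a server; routing happens inside the store.
print(store.get("user:3"))
print({name: len(data) for name, data in store.shards.items()})
```

Hash-modulo routing is the simplest possible choice. Its drawback is that changing the number of servers remaps most keys, which is why real systems tend to use schemes such as consistent hashing instead.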

NoSQL Database Types

  • Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or ‘key’), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as ‘integer’, which adds functionality.
  • Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
  • Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.
  • Graph stores are used to store information about networks of data, such as social connections. Graph stores include Titan and Neo4j; Apache Giraph is a related framework for processing graphs rather than storing them.
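The four types above can be compared by looking at how the same small piece of data might be shaped in each one. The sketch below uses plain Python dicts and lists as stand-ins; the user records, key names, and the `followers_of` helper are made up for illustration and do not reflect any specific product's storage format.

```python
import json

# Key-value: the store sees only a key and an opaque value.
key_value = {"user:42": '{"name": "Ada", "city": "London"}'}

# Document: the value is a structured document that can nest
# key-value pairs, arrays, and other documents.
document = {
    "user:42": {
        "name": "Ada",
        "address": {"city": "London"},
        "tags": ["admin", "beta"],
    }
}

# Wide-column: each row key maps to groups of related columns.
wide_column = {
    "user:42": {
        "profile": {"name": "Ada", "city": "London"},
        "activity": {"last_login": "2017-01-01"},
    }
}

# Graph: nodes plus explicit, typed edges between them.
graph_nodes = {"user:42": {"name": "Ada"}, "user:7": {"name": "Bob"}}
graph_edges = [("user:42", "FOLLOWS", "user:7")]


def followers_of(node_id):
    """Return the ids of nodes that have a FOLLOWS edge to node_id."""
    return [src for src, rel, dst in graph_edges
            if rel == "FOLLOWS" and dst == node_id]


# With a key-value store, the application has to parse the value itself.
print(json.loads(key_value["user:42"])["name"])
print(document["user:42"]["address"]["city"])
print(followers_of("user:7"))
```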

Criteria for evaluating NoSQL technologies

  1. Storage model: How is data stored in the NoSQL database?
  2. Use cases for the storage model: Which use cases does the underlying storage model support?
  3. Limitations of the storage model: What limitations does the underlying storage model impose?
  4. Supported data types: Which data types can be stored?
  5. Size limitations of the storage model: Does the database impose any size limits?
  6. Storage/access costs: What are the storage and access costs?
  7. Scalability: How does the database scale to large volumes of data?
  8. Replication: How is data replicated for fault tolerance?
  9. Sharding/data partitioning: Which sharding and data partitioning strategies are supported?
  10. CAP model: Where does each NoSQL technology fit in the CAP model?
  11. Querying: What querying capabilities are provided?
  12. Indexes: Are secondary indexes supported for faster non-key lookups?
  13. Transactions: What support is provided for transactions?
  14. Concurrency: How does the technology handle concurrent updates, and what concurrency controls does it offer?
  15. Integration with search tools: Does it integrate with search tools such as Lucene or Solr for advanced querying?
  16. Multi-tenancy: How well does it support multi-tenancy when used in SaaS software?
  17. Client libraries: Which client libraries are available for interacting with the database?

1. REDIS

Redis is an open source (BSD licensed), in-memory data structure store, used as a database, cache, blocking queue, message broker etc. In order to achieve its outstanding performance, Redis works with an in-memory dataset. Depending on your use case, you can persist it either by dumping the dataset to disk every once in a while, or by appending each command to a log. Persistence can be optionally disabled, if you just need a feature-rich, networked, in-memory cache.
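The two persistence styles mentioned above (dumping the dataset every so often versus appending each command to a log) can be sketched in a few lines of Python. This is only an illustration of the idea, not Redis's actual file formats or code; `TinyStore`, its methods, and the file names are assumptions invented for the example.

```python
import json
import os
import tempfile


class TinyStore:
    """In-memory dict that can be rebuilt from an append-only command log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        self._replay()

    def _replay(self):
        # On startup, re-run every logged command to rebuild the dataset.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, command):
        op, key = command[0], command[1]
        if op == "SET":
            self.data[key] = command[2]
        elif op == "DEL":
            self.data.pop(key, None)

    def execute(self, *command):
        # Append the command to the log, then apply it in memory.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(command) + "\n")
        self._apply(command)

    def snapshot(self, snapshot_path):
        # The other style: dump the whole dataset in one go.
        with open(snapshot_path, "w", encoding="utf-8") as out:
            json.dump(self.data, out)


workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "appendonly.log")

original = TinyStore(log_path)
original.execute("SET", "page:home", "<html>...</html>")
original.execute("SET", "counter", 1)
original.execute("DEL", "page:home")

# A fresh instance pointed at the same log recovers the same state.
recovered = TinyStore(log_path)
print(recovered.data)
```

The trade-off is the usual one: the log loses less data after a crash but grows with every write, while a snapshot is compact but only as fresh as the last dump.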

1. Storage and retrieval model

Redis is not a plain key-value store; it is a data structure server that supports different kinds of values. In traditional key-value stores, string keys map to string values. In Redis, the value is not limited to a simple string and can hold more complex data structures. Redis primarily supports key-based retrieval, but clients can use its data structures to build secondary indexes of different kinds, including composite (multi-column) indexes.
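A client-maintained secondary index, as described above, boils down to keeping a second sorted structure next to the primary key lookup and updating both on every write. The sketch below shows that pattern in plain Python with a sorted list; it does not use the Redis API, and `UserStore`, the `age` field, and the method names are hypothetical.

```python
import bisect


class UserStore:
    """Primary lookup by id plus a client-maintained index on age."""

    def __init__(self):
        self.users = {}      # primary: user id -> record
        self.age_index = []  # secondary: sorted list of (age, user id)

    def save(self, user_id, record):
        old = self.users.get(user_id)
        if old is not None:
            # Drop the stale index entry before re-indexing the record.
            self.age_index.remove((old["age"], user_id))
        self.users[user_id] = record
        bisect.insort(self.age_index, (record["age"], user_id))

    def ids_with_age_between(self, low, high):
        # Ages are integers in this sketch, so (high + 1, "") sorts just
        # after every entry whose age equals high.
        start = bisect.bisect_left(self.age_index, (low, ""))
        end = bisect.bisect_left(self.age_index, (high + 1, ""))
        return [user_id for _, user_id in self.age_index[start:end]]


users = UserStore()
users.save("u1", {"name": "Ada", "age": 30})
users.save("u2", {"name": "Bob", "age": 25})
users.save("u3", {"name": "Cy", "age": 41})

print(users.ids_with_age_between(25, 30))
```

The important point is that the application, not the database, is responsible for keeping the index in step with the data, including removing stale entries when a record changes.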

2. Use cases where Redis is suitable

LRU cache with automatic eviction, session cache, full-page cache, user profile information, product details, product reviews, leaderboards and counters, etc.
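For readers unfamiliar with LRU eviction, here is a minimal sketch of the policy behind the first use case. It is an exact LRU written in plain Python and is only meant to show the behaviour; the `LRUCache` class and the session keys are assumptions for the example, and a real cache server may implement eviction differently.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        # Reading a key marks it as most recently used.
        self.items.move_to_end(key)
        return self.items[key]

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            # Evict from the least recently used end.
            self.items.popitem(last=False)


session_cache = LRUCache(capacity=2)
session_cache.set("session:a", "alice")
session_cache.set("session:b", "bob")
session_cache.get("session:a")            # touch a, so b is now the oldest
session_cache.set("session:c", "carol")   # over capacity: b is evicted

print(list(session_cache.items))
```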

Redis can also be used as a Message Queue and a PubSub messaging system
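The difference between the two messaging styles is worth spelling out: a queue hands each message to exactly one consumer, while pub/sub fans each message out to every current subscriber. The following sketch shows both in plain Python. The `Broker` class and its method names are invented for illustration and are not Redis commands.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-process broker with both queue and pub/sub semantics."""

    def __init__(self):
        self.queues = defaultdict(deque)
        self.subscribers = defaultdict(list)

    def enqueue(self, queue_name, message):
        self.queues[queue_name].append(message)

    def dequeue(self, queue_name):
        # Each queued message is handed out once, in FIFO order.
        queue = self.queues[queue_name]
        return queue.popleft() if queue else None

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Every current subscriber receives its own copy.
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])


broker = Broker()

broker.enqueue("jobs", "resize:1")
broker.enqueue("jobs", "resize:2")
first_job = broker.dequeue("jobs")

inbox_a, inbox_b = [], []
broker.subscribe("news", inbox_a.append)
broker.subscribe("news", inbox_b.append)
delivered = broker.publish("news", "hello")

print(first_job, inbox_a, inbox_b, delivered)
```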

3. Limitations

 

Indexes: https://redis.io/topics/indexes

The restrictions of handling data only in JSON or Column Family format carry implications about how the data is stored in the system and how the query engine must process requests. These restrictions and implications have further impacts on scaling profiles of those databases. KV databases don’t have these restrictions, and they rely on application code to parse the data. As a result, it is easier to scale the KV database irrespective of the type of data being stored within it. This is particularly true of distributed databases.
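The point that a KV database relies on application code to parse the data can be shown with a short sketch. Here a plain dict stands in for the key-value store and holds only opaque bytes; the `save_product` and `load_product` helpers, the key format, and the choice of JSON as the encoding are all assumptions made for the example.

```python
import json

# Stand-in for a key-value database: it stores and returns raw bytes
# and knows nothing about what they contain.
kv_store = {}


def save_product(product_id, product):
    # The application chooses the encoding (JSON here).
    kv_store[f"product:{product_id}"] = json.dumps(product).encode("utf-8")


def load_product(product_id):
    raw = kv_store.get(f"product:{product_id}")
    # The application, not the store, turns bytes back into a record.
    return None if raw is None else json.loads(raw.decode("utf-8"))


save_product(1, {"name": "Kettle", "price": 24.5, "reviews": ["Works well"]})

print(type(kv_store["product:1"]).__name__)
print(load_product(1)["name"])
```

Because the store never inspects the value, the same code path works whatever shape the data takes, which is the scaling advantage the paragraph above describes. The cost is that the store cannot query inside the value on the application's behalf.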

http://database.guide/what-is-a-key-value-database/

Size limitations

Some real world use cases for NoSQL

Google Cloud Datastore vs MongoDB vs Amazon DynamoDB vs Couchbase vs Redis vs HBase vs Neo4j

https://cloud.google.com/datastore/docs/concepts/overview

https://cloud.google.com/storage-options/
