Relational databases were not designed to cope with the scale and agility challenges that face modern applications, nor were they built to take advantage of the commodity storage and processing power available today.
NoSQL encompasses a wide variety of different database technologies that were developed in response to the demands presented in building modern applications:
The Benefits of NoSQL
When compared to relational databases, NoSQL databases are more scalable and provide superior performance, and their data model addresses several issues that the relational model is not designed to address:
- Dynamic schemas for large volumes of structured, semi-structured, and unstructured data. This is especially useful in agile development environments where schema changes happen frequently; in a relational database, each such change would require migrating the entire database, with significant downtime.
- Automatic sharding – NoSQL databases usually support auto-sharding, meaning that they natively and automatically spread data across an arbitrary number of servers, without requiring the application to even be aware of the composition of the server pool. Data and query load are automatically balanced across servers, and when a server goes down, it can be quickly and transparently replaced with no application disruption.
Most NoSQL databases also support automatic database replication to maintain availability in the event of outages or planned maintenance events. More sophisticated NoSQL databases are fully self-healing, offering automated failover and recovery, as well as the ability to distribute the database across multiple geographic regions to withstand regional failures and enable data localization. Unlike relational databases, NoSQL databases generally have no requirement for separate applications or expensive add-ons to implement replication.
Many NoSQL database technologies have excellent integrated caching capabilities, keeping frequently used data in system memory as much as possible and removing the need for a separate caching layer. Some NoSQL databases also offer a fully managed, integrated in-memory layer for workloads demanding the highest throughput and lowest latency.
NoSQL Database Types
- Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or ‘key’), together with its value. Examples of key-value stores are Riak and Berkeley DB. Some key-value stores, such as Redis, allow each value to have a type, such as ‘integer’, which adds functionality.
- Wide-column stores such as Cassandra and HBase are optimized for queries over large datasets, and store columns of data together, instead of rows.
- Document databases pair each key with a complex data structure known as a document. Documents can contain many different key-value pairs, or key-array pairs, or even nested documents.
- Graph stores are used to store information about networks of data, such as social connections. Graph stores include Titan, Neo4J and Giraph.
Criteria for evaluating NoSQL technologies
- Data/Storage model: How is the data stored in the NoSQL database?
- Use cases (suitable and not suitable): Which use cases does the underlying storage model support well, and which does it not?
- Limitations: What limitations follow from the underlying storage model?
- Supported data types: Which data types can be stored?
- Storage/Access costs: What are the storage and access costs?
- Scalability: How does the database scale for large volumes of data?
- Availability: How is the data replicated and partitioned to handle fault tolerance and provide high availability?
- CAP model: Where does each NoSQL technology fit in the CAP model?
- Querying: What querying capabilities are provided (including batch operations)?
- Pagination of results: How does the solution return paginated results?
- Indexes: What is the support for secondary indexes for faster non-key-based lookups?
- Transactions: What support is provided for transactions?
- Concurrency: How does the technology deal with concurrent updates, and what support does it provide for managing concurrency?
- Search: What search capabilities are provided?
- Multi-tenancy: What support is there for multi-tenancy when used in SaaS software?
- Client libraries: Which client libraries are available for interacting with the database?
- Backup and recovery: Does the solution provide mechanisms for backing up and recovering data?
- Deployment and operations: What are the options for deploying, maintaining, and monitoring the cluster?
1. REDIS (Remote Dictionary Server)
Like memcached, Redis stores a mapping of keys to values and can achieve performance levels similar to memcached. Redis can also write its data to disk automatically, in two different ways, and can store data in four structures beyond the plain string keys that memcached offers. These additional capabilities allow Redis to be used either as a primary database or as an auxiliary database alongside other storage systems. Persistence can optionally be disabled if you just need a feature-rich, networked, in-memory cache.
1. Data/Storage Model
In traditional key-value stores, string keys are associated with string values. In Redis, the value is not limited to a simple string; it can also hold more complex data structures such as lists, sets, hashes (similar to maps), and sorted sets.
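As a sketch of this model, the following stand-in (plain Python, not the redis-py client; the `store` dict and key names are invented for illustration) shows how one logical key can hold a string, a hash, a list, or a sorted set:

```python
# Minimal in-memory stand-in for the Redis data model (illustration only):
# each key maps to exactly one typed value.
store = {}

# String value (SET / GET)
store["user:1:name"] = "alice"

# Hash value (HSET): a field -> value map under a single key
store["user:1:profile"] = {"email": "alice@example.com", "plan": "free"}

# List value (RPUSH): an ordered sequence under a single key
store["user:1:recent_pages"] = ["/home", "/pricing", "/docs"]

# Sorted-set value (ZADD): members ordered by a numeric score
store["leaderboard"] = {"alice": 120.0, "bob": 95.0}

# ZRANGE-like read: members sorted by ascending score
by_score = sorted(store["leaderboard"], key=store["leaderboard"].get)
print(by_score)  # ['bob', 'alice']
```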
Redis primarily supports key based retrieval. Clients can use the capabilities provided by Redis in order to create secondary indexes of different kinds, including composite (multi-column) indexes.
2. Use cases where Redis is suitable
- Storing last login time for each user
- Storing last uptime for various services (especially useful in a microservices architecture for implementing retries and circuit breakers)
- LRU cache with automatic eviction, Session Cache, Full Page Cache
- User Profile Information
- Product details/reviews for e-commerce applications
- Leaderboard/Counting etc
- Storing temporary state that requires fast access. For example, storing the most recent view details for answers to Quora questions. This can be used to batch the update calls to the primary storage (say, BigTable or HBase), allowing writes to the underlying storage to be throttled. Storing the view data for all answers in Redis would not add much value, since most of that data is not accessed frequently. For more details please refer to Most Viewed Writers
Redis can also be used as a Message Queue and a PubSub messaging system
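For the leaderboard use case listed above, a sorted set is the natural fit. The sketch below models the ZINCRBY and ZREVRANGE access pattern with plain Python (no server involved; the function names mirror the Redis commands but are ordinary functions here):

```python
# Leaderboard sketch: a dict of member -> score stands in for a Redis
# sorted set; ZINCRBY/ZREVRANGE equivalents are plain functions.
scores = {}

def zincrby(board, amount, member):
    # Like ZINCRBY: add `amount` to the member's score, creating it at 0.
    board[member] = board.get(member, 0) + amount
    return board[member]

def zrevrange(board, start, stop):
    # Like ZREVRANGE: members by descending score, inclusive stop index.
    ranked = sorted(board, key=board.get, reverse=True)
    return ranked[start:stop + 1]

zincrby(scores, 10, "alice")
zincrby(scores, 25, "bob")
zincrby(scores, 5, "alice")     # alice now at 15
print(zrevrange(scores, 0, 1))  # ['bob', 'alice']
```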
3. Use cases where Redis is not suitable
- Storing large amounts of data in a single string value (e.g. the most recent feed contents for each user). Queries on these keys will be slow, which will block other queries and thereby generally slow down all clients.
- Storing data across two or more dimensions (e.g. a score for each (user, topic) pair). The data set size for these keys will likely grow superlinearly with the rest of the dataset, and we’ll have to reshard too soon.
- Storing data that requires queries with high time complexity. Using a Redis list as a queue is fine (queries at the ends of the list take constant time), but if the list is long, queries that operate far from the ends of the list or on the entire list will be very slow. Some commands have unconditionally high time complexity (e.g. SMEMBERS, HGETALL, KEYS) and we’ll want to avoid those entirely.
- Storing data that requires secondary access paths. Redis doesn’t have secondary indexes since it’s a key-value store, so we would have to implement them in the client, and using them would require multiple queries to the server.
- Each Redis instance is single-threaded. This means individual queries that take a long time are inadvisable, since they will block other queries to the same Redis instance until they complete. Starting with Redis 4.0, each instance can support more than one thread for a few operations, such as deleting objects in the background and blocking commands implemented via Redis modules.
- The data set size is limited by the available memory on the machine running the server. If memory runs out, Redis will either be killed by the Linux kernel OOM killer, crash with an error, or start to slow down so that you'll probably notice something is wrong. Redis has built-in protections that let the user cap memory usage, via the maxmemory option in the configuration file. If this limit is reached, Redis will start replying with an error to write commands (while continuing to accept read-only commands), or, if you are using Redis for caching, you can configure it to evict keys when the max memory limit is reached.
- Redis can handle up to 2^32 keys, and was tested in practice to handle at least 250 million keys per instance. Every hash, list, set, and sorted set can hold 2^32 elements. In other words, your limit is likely the available memory on your system.
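A hypothetical redis.conf fragment for the memory cap described above (the 2gb value and the eviction policy are illustrative choices, not defaults):

```conf
# Cap Redis memory usage; past this limit, writes get errors or keys
# get evicted, depending on the policy below.
maxmemory 2gb
# Evict least-recently-used keys across the whole keyspace when full;
# appropriate when Redis is used purely as a cache.
maxmemory-policy allkeys-lru
```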
4. Supported data types
Strings, Lists, Hashes, Sets, Sorted Sets
5. Storage/Access costs:
Since Redis is not a managed database, there are no per-read/write costs. The only costs are setting up and maintaining the Redis servers (master/slave), plus Redis Sentinel for automatic failover when using master/slave instances. If using Redis Cluster, the cost is setting up and maintaining the cluster servers.
6. Scalability
Redis supports scaling with a master/slave architecture. Reads can happen on slave nodes, but writes have to go through the master. However, there is still a limit on the size of the data that can be held in memory on the underlying hardware.
At the base of Redis replication there is a simple-to-use and easy-to-configure master-slave replication that allows slave Redis servers to be exact copies of master servers. The slave automatically reconnects to the master every time the link breaks, and attempts to remain an exact copy of it regardless of what happens to the master.
This system works using three main mechanisms:
- When a master and a slave instance are well-connected, the master keeps the slave updated by sending a stream of commands that replicate the effects happening on the master's dataset: client writes, keys expiring or being evicted, and so forth.
- When the link between the master and the slave breaks, for network issues or because a timeout is sensed in the master or the slave, the slave reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
- When a partial resynchronization is not possible, the slave will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the slave, and then continue sending the stream of commands as the dataset changes.
Redis uses asynchronous replication by default, which, being low-latency and high-performance, is the natural replication mode for the vast majority of Redis use cases.
Zookeeper vs Redis
This is similar in spirit to how ZooKeeper replicates; the significant difference is that ZooKeeper writes must wait for a quorum of nodes to acknowledge the data before the write is considered successful.
Redis is not a typical distributed system the way ZooKeeper is. ZooKeeper nodes communicate with each other using the Zab atomic broadcast protocol (a Paxos-like protocol) for leader election and to sync writes through the leader. Since Redis replicates data asynchronously, its writes are significantly faster than ZooKeeper's. ZooKeeper is most suitable for coordination: tracking which nodes are active, leader election amongst a group, and so on. Use Redis for datasets that need faster writes and where an occasional outage isn't a disaster.
For more details, refer to Master Slave Replication
Partitioning in Redis serves two main goals:
- It allows for much larger databases, using the sum of the memory of many computers. Without partitioning you are limited to the amount of memory a single computer can support.
- It allows scaling the computational power to multiple cores and multiple computers, and the network bandwidth to multiple computers and network adapters.
There are different partitioning criteria. Imagine we have four Redis instances R0, R1, R2, R3, and many keys representing users, like user:1, user:2, and so forth. There are different ways to select in which instance a given key is stored; in other words, there are different systems to map a given key to a given Redis server.
One of the simplest ways to perform partitioning is range partitioning, accomplished by mapping ranges of objects to specific Redis instances. For example, I could say users from ID 0 to ID 10000 will go into instance R0, while users from ID 10001 to ID 20000 will go into instance R1, and so forth.
This system works and is actually used in practice; however, it has the disadvantage of requiring a table that maps ranges to instances. This table needs to be managed, and a table is needed for every kind of object, so range partitioning in Redis is often undesirable: it is much less efficient than the alternative partitioning approaches.
An alternative to range partitioning is hash partitioning. This scheme works with any key, without requiring a key in the form object_name:<id>, and is as simple as:
- Take the key name and use a hash function (e.g., the crc32 hash function) to turn it into a number. For example, if the key is foobar, crc32(foobar) will output something like 93024922.
- Use a modulo operation with this number in order to turn it into a number between 0 and 3, so that this number can be mapped to one of my four Redis instances. 93024922 modulo 4 equals 2, so I know my key foobar should be stored in the R2 instance. Note: the modulo operation returns the remainder from a division operation, and is implemented with the % operator in many programming languages.
There are many other ways to perform partitioning, but with these two examples you should get the idea. One advanced form of hash partitioning is called consistent hashing and is implemented by a few Redis clients and proxies.
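The CRC32-and-modulo scheme described above can be sketched in a few lines of Python. This is a routing sketch only (zlib.crc32 stands in for whatever hash variant a real client uses, and the instance names are illustrative):

```python
import zlib

# Four hypothetical Redis instances, as in the example above.
instances = ["R0", "R1", "R2", "R3"]

def instance_for(key: str) -> str:
    # Hash the key name into a number, then take it modulo the number
    # of instances to decide where the key lives.
    h = zlib.crc32(key.encode("utf-8"))
    return instances[h % len(instances)]

# The mapping is deterministic: the same key always routes to the same
# instance, which is what using Redis as a data store (vs. a cache) requires.
print(instance_for("foobar"))
```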
Different implementations of partitioning
Partitioning can be the responsibility of different parts of a software stack.
- Client side partitioning means that the clients directly select the right node where to write or read a given key. Many Redis clients implement client side partitioning.
- Proxy-assisted partitioning means that clients send requests to a proxy that speaks the Redis protocol, instead of sending requests directly to the right Redis instance. The proxy forwards each request to the right Redis instance according to the configured partitioning schema, and sends the replies back to the client. The Redis and memcached proxy Twemproxy implements proxy-assisted partitioning.
- Query routing means that you can send your query to a random instance, and the instance will make sure to forward it to the right node. Redis Cluster implements a hybrid form of query routing, with the help of the client (the request is not forwarded directly from one Redis instance to another; rather, the client gets redirected to the right node).
Disadvantages of partitioning
Some features of Redis don’t play very well with partitioning:
- Operations involving multiple keys are usually not supported. For instance you can’t perform the intersection between two sets if they are stored in keys that are mapped to different Redis instances (actually there are ways to do this, but not directly).
- Redis transactions involving multiple keys can not be used.
- The partitioning granularity is the key, so it is not possible to shard a dataset with a single huge key like a very big sorted set.
- When partitioning is used, data handling is more complex, for instance you have to handle multiple RDB / AOF files, and to make a backup of your data you need to aggregate the persistence files from multiple instances and hosts.
- Adding and removing capacity can be complex. For instance, Redis Cluster supports mostly transparent rebalancing of data, with the ability to add and remove nodes at runtime, but other systems like client-side partitioning and proxies don't support this feature. However, a technique called pre-sharding helps in this regard.
Using Redis as Primary Data store or cache?
Although partitioning in Redis is conceptually the same whether using Redis as a data store or as a cache, there is a significant limitation when using it as a data store. When Redis is used as a data store, a given key must always map to the same Redis instance. When Redis is used as a cache, if a given node is unavailable it is not a big problem if a different node is used, altering the key-instance map as we wish to improve the availability of the system (that is, the ability of the system to reply to our queries).
Consistent hashing implementations are often able to switch to other nodes if the preferred node for a given key is not available. Similarly if you add a new node, part of the new keys will start to be stored on the new node.
For more details, please check Redis partitioning
The main concept here is the following:
- If Redis is used as a cache scaling up and down using consistent hashing is easy.
- If Redis is used as a store, a fixed keys-to-nodes map is used, so the number of nodes must be fixed and cannot vary. Otherwise, a system is needed that is able to rebalance keys between nodes when nodes are added or removed, and currently only Redis Cluster is able to do this
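A minimal consistent-hash ring (an illustrative sketch; real client implementations add virtual nodes per server and better hash functions) shows why adding a node only remaps a fraction of the keys instead of reshuffling everything:

```python
import bisect
import zlib

class HashRing:
    """Toy consistent-hash ring: nodes and keys share one hash space;
    a key belongs to the first node at or after its hash, wrapping around."""

    def __init__(self, nodes):
        self._ring = sorted((zlib.crc32(n.encode()), n) for n in nodes)

    def add(self, node):
        bisect.insort(self._ring, (zlib.crc32(node.encode()), node))

    def node_for(self, key):
        h = zlib.crc32(key.encode())
        points = [p for p, _ in self._ring]
        i = bisect.bisect(points, h) % len(self._ring)
        return self._ring[i][1]

ring = HashRing(["R0", "R1", "R2"])
before = {k: ring.node_for(k) for k in (f"user:{i}" for i in range(1000))}
ring.add("R3")  # grow the cluster by one node
after = {k: ring.node_for(k) for k in before}
moved = sum(1 for k in before if before[k] != after[k])
print(f"{moved} of {len(before)} keys moved")  # only keys in R3's arc move
```

With naive modulo partitioning, changing the instance count would remap almost every key; with the ring, only the keys falling in the new node's arc relocate.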
Redis Cluster provides a way to run a Redis installation where data is automatically sharded across multiple Redis nodes.
Redis Cluster also provides some degree of availability during partitions, that is, in practical terms, the ability to continue operating when some nodes fail or are unable to communicate. However, the cluster stops operating in the event of larger failures (for example, when the majority of masters are unavailable).
Redis Cluster is not able to guarantee strong consistency. In practical terms this means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.
The first reason why Redis Cluster can lose writes is because it uses asynchronous replication. This means that during writes the following happens:
- Your client writes to the master B.
- The master B replies OK to your client.
- The master B propagates the write to its slaves B1, B2 and B3.
As you can see, B does not wait for an acknowledgment from B1, B2, and B3 before replying to the client, since this would impose a prohibitive latency penalty on Redis. So if your client writes something and B acknowledges the write but crashes before sending it to its slaves, one of the slaves (which did not receive the write) can be promoted to master, losing the write forever.
So, in practical terms, what do you get with Redis Cluster?
- The ability to automatically split your dataset among multiple nodes.
- The ability to continue operations when a subset of the nodes are experiencing failures or are unable to communicate with the rest of the cluster.
7. Availability
Redis Sentinel provides high availability for Redis by taking care of automatic failover. In practical terms, this means that with Sentinel you can create a Redis deployment that survives certain kinds of failures without human intervention.
Redis Sentinel also performs other collateral tasks: monitoring, notification, and acting as a configuration provider for clients.
This is the full list of Sentinel capabilities at a macroscopic level (i.e., the big picture):
- Monitoring. Sentinel constantly checks if your master and slave instances are working as expected.
- Notification. Sentinel can notify the system administrator, or another computer program, via an API, that something is wrong with one of the monitored Redis instances.
- Automatic failover. If a master is not working as expected, Sentinel can start a failover process in which a slave is promoted to master, the other slaves are reconfigured to use the new master, and the applications using the Redis server are informed of the new address to use when connecting.
- Configuration provider. Sentinel acts as a source of authority for client service discovery: clients connect to Sentinels to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.
8. CAP Model
Redis was designed primarily as a single-server system and, as a consequence, aims to be CP. It can be distributed either as master-slave (with Redis Sentinel handling failover) or as a sharded, multi-master setup (Redis Cluster).
Both ways of distributing Redis are faulty in that they lose data consistency and can also lose persisted data in the presence of network partitions. So not only is Redis not CP, it can lose data forever (something AP systems, e.g. strongly eventually consistent ones, do not do).
Yet another way to use Redis is to shard by consistent hashing from the client, at which point you are simply working with a number of independent single-node systems; data can still be lost in this scenario, so it is only useful for caching or recomputable data.
One more thought: taking any single-node service, you could theoretically turn it into a CP distributed system by starting an odd number of nodes and running a distributed consensus protocol between them. Raft, a successor to Paxos, scales well and is implemented in solid projects such as Consul. If a majority of nodes vote for changes and apply them to their local databases, things should be smooth.
In conclusion, with Redis Sentinel, Redis can be treated as AP if used as a cache and not as a primary store. If it is used as a primary store, it cannot even be considered AP, since there can be data loss.
9. Querying
Redis provides commands for operating on each of the data structures it provides (strings, hashes, lists, sets, sorted sets).
For a complete list of commands, refer to Redis Commands
Batch operations are also supported on some of the data structures. However, when using Redis-cluster, some of the batch operations may not be supported since the data might reside on several nodes.
10. Query Pagination
There are different ways in which pagination can be implemented in Redis.
a) Using the SSCAN Command
The SSCAN command is part of a group of commands similar to the regular SCAN command. These include:
- SCAN – Used to iterate over the set of keys in the current database.
- SSCAN – Used to iterate over elements of sets.
- HSCAN – Used to iterate over fields of hashes and their associated values.
- ZSCAN – Used to iterate elements of sorted sets and their scores.
While the regular SCAN command iterates over the database keys, the SSCAN command can iterate over elements of sets. By using the returned SSCAN cursor, you could paginate over a Redis set.
The downside is that you need some way to persist the value of the cursor, and if there are concurrent users this could lead to some odd behavior, since the cursor may not be where it is expected. However, this can be useful for applications where traffic to these paginated areas may be lighter.
b) Using Sorted Sets
To paginate with a sorted set, we can use the ZRANGE command to select a range of elements based on their scores. So you could, for example, select scores from 1-20, 21-40, and so on. By programmatically adjusting the range as the user moves through the data, you can achieve the pagination you need for your application. Since sorted sets and ZRANGE do this more intuitively than a scan, this is often the preferred method of pagination; it is also easier to implement with multiple users, since you can programmatically keep track of which ZRANGE each user is selecting at any given time.
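The page arithmetic is simple. A stand-in sorted set (a Python list of (member, score) pairs kept sorted by score, not a live server) makes the ZRANGE-style pagination concrete:

```python
# Stand-in sorted set: (member, score) pairs kept sorted by score,
# here 94 items with scores 1.0 through 94.0.
items = sorted(
    [(f"item:{i}", float(i)) for i in range(1, 95)],
    key=lambda pair: pair[1],
)

def zrange_page(sorted_items, page, page_size):
    # Like paginating with ZRANGE by rank: page 0 covers ranks
    # 0..page_size-1, page 1 covers page_size..2*page_size-1, etc.
    start = page * page_size
    return sorted_items[start:start + page_size]

first = zrange_page(items, 0, 20)
second = zrange_page(items, 1, 20)
print(first[0][0], second[0][0])  # item:1 item:21
```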
11. Indexes
Redis does not have built-in support for secondary indexes; it only offers primary-key access. However, since Redis is a data structures server, its capabilities can be used to create secondary indexes of different kinds, including composite (multi-column) indexes. Here are a few ways indexes can be built with Redis:
- Sorted sets to create secondary indexes by ID or other numerical fields.
- Sorted sets with lexicographical ranges for creating more advanced secondary indexes, composite indexes and graph traversal indexes.
- Sets for creating random indexes.
- Lists for creating simple iterable indexes and last N items indexes.
For details on how to create indexes with Redis, refer to Redis Indexes
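As a sketch of the first bullet above (a sorted set acting as a numeric secondary index), the stand-in below indexes users by age so a range lookup does not have to scan every key. The data, the `age_index` name, and the helper are all invented for illustration:

```python
# Primary data: one "hash" per user key.
users = {
    "user:1": {"name": "alice", "age": 30},
    "user:2": {"name": "bob", "age": 25},
    "user:3": {"name": "carol", "age": 35},
}

# Secondary index, shaped like a sorted set: member=key, score=age.
age_index = {key: doc["age"] for key, doc in users.items()}

def zrangebyscore(index, lo, hi):
    # Like ZRANGEBYSCORE: members whose score falls in [lo, hi],
    # returned in ascending score order.
    hits = [(score, member) for member, score in index.items() if lo <= score <= hi]
    return [member for _, member in sorted(hits)]

# Range lookup via the index instead of scanning all user keys.
print(zrangebyscore(age_index, 26, 40))  # ['user:1', 'user:3']
```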
12. Transactions
- All the commands in a transaction are serialized and executed sequentially. It can never happen that a request issued by another client is served in the middle of the execution of a Redis transaction. This guarantees that the commands are executed as a single isolated operation.
- Either all of the commands are processed or none are, so a Redis transaction is also atomic. The EXEC command triggers the execution of all the commands in the transaction, so if a client loses its connection to the server in the context of a transaction before calling EXEC, none of the operations are performed; if EXEC is called, all the operations are performed.
For more details on how to use transactions in Redis, refer to Redis Transactions
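The queue-then-execute behaviour described above can be sketched without a server. `MiniTx` is an invented name, and this only models the all-or-nothing EXEC step, not WATCH-based optimistic locking:

```python
class MiniTx:
    """Toy model of MULTI/EXEC: commands are queued, and nothing
    touches the store until exec() applies them back to back."""

    def __init__(self, store):
        self.store = store
        self.queue = []

    def set(self, key, value):
        self.queue.append((key, value))  # queued, not yet applied

    def exec(self):
        for key, value in self.queue:    # applied as one isolated batch
            self.store[key] = value
        self.queue.clear()

store = {}
tx = MiniTx(store)
tx.set("a", 1)
tx.set("b", 2)
assert store == {}  # nothing applied before EXEC
tx.exec()
print(store)        # {'a': 1, 'b': 2}
```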
13. Concurrency
Redis is single-threaded, and a transaction blocks until it finishes. However, multiple clients can connect to the instance and issue commands concurrently; Redis executes the commands sequentially using an event loop.
By default, every client (the benchmark simulates 50 clients if not otherwise specified with -c) sends the next command only when the reply to the previous command has been received; this means the server will likely need a read call in order to read each command from every client.
Redis supports pipelining, so it is possible to send multiple commands at once, a feature often exploited by real-world applications. Pipelining can dramatically improve the number of operations per second a server is able to deliver. More details in Redis Benchmarks
14. Search
Redis provides data structures (sorted sets) that can be used for implementing lexicographic search as well as typical document search. For details on how Redis can be used as a search engine, check Searching with Redis
15. Multi-tenancy
To separate the data of various tenants in Redis we have, as with SQL, two options. The first is to maintain independent Redis server instances and switch the connection per request. Of course, this has the disadvantage of reducing or eliminating reuse of the connection pool.
Instead, we could use a simple "namespace" concept to identify each key with its tenant.
When working with Redis, it is customary to namespace keys by simply prepending their name with the namespace, for example by prefixing each key with a tenant identifier.
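A key-namespacing helper for the multi-tenant pattern above might look like this (the tenant:key layout and the function name are illustrative choices, not a Redis convention enforced by the server):

```python
def tenant_key(tenant: str, key: str) -> str:
    # Prefix every key with its tenant's namespace so multiple tenants
    # can share one Redis instance without key collisions.
    if ":" in tenant:
        raise ValueError("tenant id must not contain ':'")
    return f"{tenant}:{key}"

print(tenant_key("acme", "user:42:profile"))  # acme:user:42:profile
```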
16. Client Libraries
There are numerous client libraries available for Redis in different languages. For a complete list, refer to client libraries
17. Backup and Recovery
Redis provides a range of persistence options:
- The RDB persistence performs point-in-time snapshots of your dataset at specified intervals.
- The AOF persistence logs every write operation received by the server; these operations are replayed at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log in the background when it gets too big.
- If you wish, you can disable persistence entirely, so that your data exists only as long as the server is running.
- It is possible to combine both AOF and RDB in the same instance. Notice that, in this case, when Redis restarts the AOF file will be used to reconstruct the original dataset since it is guaranteed to be the most complete.
The most important thing to understand is the different trade-offs between the RDB and AOF persistence.
For more details on the advantages and disadvantages of RDB and AOF, checkout Redis persistence
18. Deployment and Operations
Redis clusters have to be deployed, maintained, and monitored; Redis is not a fully managed service. However, there are tools like Redis Pack that help with setting up Redis clusters and monitoring the health of the deployment.
In summary, Redis is appropriate for small, simple data that requires fast access, but in most environments needs to be backed up by another slower, less expensive datastore. On AWS, data stored in memory is approximately 25 times more expensive than data stored on disk, so at Quora, we use Redis as a standalone datastore for only performance-sensitive data, and prefer MySQL or HBase for data that can tolerate more access latency. For some of the guidelines when using in production, check Production Usage guidelines
2. MongoDB
MongoDB is a document database with the scalability and flexibility that you want, and the querying and indexing that you need. MongoDB is a distributed database at its core, so high availability, horizontal scaling, and geographic distribution are built in and easy to use. MongoDB's design philosophy is focused on combining the critical capabilities of relational databases with the innovations of NoSQL technologies.
1. Data/Storage Model
MongoDB stores data in a binary representation called BSON (Binary JSON). BSON documents are JSON-like, meaning fields can vary from document to document and the data structure can be changed over time. It allows data to be represented as anything from simple key-value pairs and flat, table-like structures through to rich documents and objects with deeply nested arrays and sub-documents. The document model maps to the objects in your application code, making data easy to work with.
Documents that tend to share a similar structure are organized as collections. It may be helpful to think of collections as being analogous to a table in a relational database: documents are similar to rows, and fields are similar to columns. Fields can vary from document to document; there is no need to declare the structure of documents to the system – documents are self describing. If a new field needs to be added to a document then the field can be created without affecting all other documents in the system, without updating a central system catalog, and without taking the system offline. Developers can start writing code and persist the objects as they are created. And when developers add more features, MongoDB continues to store the updated objects without the need to perform costly ALTER TABLE operations, or worse – having to re-design the schema from scratch.
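The flexible-schema point can be illustrated with plain Python dicts standing in for BSON documents (the `users` collection and its fields are hypothetical):

```python
# Two documents in the same hypothetical "users" collection: the second
# has extra fields and a nested sub-document array, with no schema
# migration or ALTER TABLE required.
users = [
    {"_id": 1, "name": "alice"},
    {
        "_id": 2,
        "name": "bob",
        "addresses": [{"city": "Austin", "zip": "78701"}],  # nested array
        "plan": "pro",                                      # new field
    },
]

# Each document is self-describing; fields simply vary per document.
for doc in users:
    print(doc["_id"], sorted(doc))
```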
The storage engine is the component of the database that is responsible for managing how data is stored, both in memory and on disk. MongoDB supports multiple storage engines, as different engines perform better for specific workloads. Choosing the appropriate storage engine for your use case can significantly impact the performance of your applications. For more details, checkout Storage Engines
Data Model considerations
- Using Embedded documents vs normalized data model (Data Model Design)
- Atomicity of write operations involving multiple collections (Atomicity)
- Document Growth (Document Growth)
- Data Use and Performance
For complete details, check Data Model Considerations
MongoDB provides the capability to validate documents during updates and insertions. Validation rules can be specified on a per-collection basis. For complete details, check Document Validation
Operational Factors to consider when designing Data Model
Data Model Patterns and Examples
2. Use Cases
There are several restrictions that MongoDB places on Documents, Indexes, Namespaces, Data, Operations etc. For complete details, check MongoDB limitations.
4. Supported Data Types
5. Storage/Access costs
MongoDB comes in several flavours:
a) MongoDB Professional
b) MongoDB Enterprise Advanced
c) MongoDB community server
d) MongoDB Atlas is a Database as a Service offering which allows you to deploy, operate, and scale a MongoDB database in the cloud (supports AWS, GCP, and Azure)
Apart from the free Community Server, each offering has a licensing cost. For the differences between the various offerings and their pricing, check out MongoDB Professional vs Enterprise
For details on how MongoDB Atlas stacks up against other MongoDB-as-a-service offerings, check out MongoDB Atlas comparison
MongoDB provides horizontal scale-out for databases on low-cost commodity hardware or cloud infrastructure through sharding, distributing data across multiple physical partitions transparently to applications. Sharding allows MongoDB deployments to address the hardware limitations of a single server, such as bottlenecks in RAM or disk I/O. MongoDB automatically balances the data in the sharded cluster as the data grows or the size of the cluster increases or decreases. The diagram below depicts how MongoDB allows applications to transparently put/fetch data from different shards underneath.
MongoDB supports two strategies for sharding data via shard keys: hashed sharding and ranged sharding
MongoDB partitions data sets into chunks, with each chunk storing all data within a specific range, and assigns chunks to shards. By default, the maximum chunk size is 64 megabytes. When a chunk grows beyond that size, MongoDB splits it into two smaller chunks. To ensure an even distribution of data, MongoDB will automatically migrate chunks between shards so that each shard has roughly the same number of chunks. This is why sharded clusters require routers and config servers. The config servers store cluster metadata, including the mapping between chunks and shards. The routers are responsible for migrating chunks and updating the config servers. All read and write requests are sent to routers because only they can determine where the data resides.
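The two shard-key strategies can be sketched as follows (a simplified Python model; the shard names and range boundaries are made up, and real chunk splitting and balancing are far more involved):

```python
import hashlib

SHARDS = ["shard0", "shard1", "shard2"]

def ranged_shard(key):
    # Ranged sharding: contiguous key ranges (chunks) map to shards, so
    # nearby keys land together -- good for range queries.
    if key < "h":
        return "shard0"
    if key < "p":
        return "shard1"
    return "shard2"

def hashed_shard(key):
    # Hashed sharding: a hash of the key spreads even adjacent keys
    # evenly across shards -- good against hot spots.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]
```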
MongoDB recommends running routers next to application servers. However, as the number of application servers increases, the number of routers increases, and that can be a problem since the routers maintain open connections to MongoDB instances. Eventually, the routers will open too many connections and performance will degrade. The solution is to separate routers from application servers and reduce their number. While this alleviates the problem of too many open connections, it limits users’ ability to scale routers by adding more. When rebalancing the data due to the addition or removal of shards, MongoDB will migrate only one chunk at a time in order to minimize the performance impact on applications. However, migration happens automatically and will have some impact on performance.
MongoDB supports High Availability through the use of sharding and replication. A sharded cluster can continue to perform partial read / write operations even if one or more shards are unavailable. While the subset of data on the unavailable shards cannot be accessed during the downtime, reads or writes directed at the available shards can still succeed.
MongoDB also supports High Availability through automatic failover in replica sets. For details, check MongoDB Replication
With multiple copies of data on different database servers, replication protects a database from the loss of a single server. Replication thus provides redundancy and increases data availability. It also allows recovery from hardware failures and service interruptions. In some cases, replication is used to increase read capacity. Clients have the ability to send read and write operations to different servers. Maintaining copies in different data centers increases the locality and availability of data for distributed applications.
Increased data availability is achieved in MongoDB through Replica Set, which provides data redundancy across multiple servers. Replica Set is a group of mongod instances that host the same data set. One mongod, the primary, receives all data modification operations (insert / update / delete).
All other instances, secondaries, apply operations from the primary so that they have the same data set. Because only one member can accept write operations, replica sets provide strict consistency for all reads from the primary. To support replication, the primary logs all data set changes in its oplog. The secondaries asynchronously replicate the primary’s oplog and apply the operations to their data sets. Secondaries’ data sets reflect the primary’s data set. It is possible for the client to set up a Read Preference to read data from secondaries. This way the client can balance load from the primary to the secondaries, improving throughput and decreasing latency. However, as a result, secondaries may not return the most recent data to clients. Due to the asynchronous nature of the replication process, a secondary read preference guarantees only eventual consistency for its data.
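A toy model of oplog-based replication, assuming nothing beyond what is described above (the real oplog is a capped collection of idempotent operations):

```python
# The primary appends every change to an oplog, and a secondary
# asynchronously replays entries it has not yet applied.
primary_data, oplog = {}, []

def primary_write(op, key, value=None):
    oplog.append((op, key, value))          # log the change, then apply it
    if op in ("insert", "update"):
        primary_data[key] = value
    elif op == "delete":
        primary_data.pop(key, None)

def secondary_sync(secondary_data, applied_upto):
    # Replay oplog entries past the secondary's last-applied position.
    for op, key, value in oplog[applied_upto:]:
        if op in ("insert", "update"):
            secondary_data[key] = value
        elif op == "delete":
            secondary_data.pop(key, None)
    return len(oplog)                       # new last-applied position
```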
Within the replica set, members are interconnected with each other to exchange heartbeat messages. A crashed server with a missing heartbeat will be detected by other members and removed from the replica set membership. After the dead secondary recovers, it can rejoin the cluster by connecting to the primary, then catch up to the latest updates. If a secondary is down for so long that the primary’s oplog no longer covers the entire gap, the recovered secondary needs to reload the whole data set from the primary as if it were a brand-new server.
Starting in MongoDB 3.2, you can deploy config servers as replica sets. A sharded cluster with a Config Server Replica Set (CSRS) can continue to process reads and writes as long as a majority of the replica set is available.
8. CAP Model
Depending on the needs of the application, MongoDB provides options to maximize either consistency or availability, so it is not possible to label MongoDB as strictly an AP or a CP system.
With regard to consistency, MongoDB is strongly consistent by default but can be configured to be eventually consistent. However, it is possible to read data that may later be rolled back, depending on the readPreference, readConcern, and writeConcern settings.
For example, this case can arise when readPreference is secondary, readConcern is local, and writeConcern is majority. These settings need to be configured appropriately for the consistency needs of the application.
readConcern controls how data is read from MongoDB: with a replica set, readConcern majority returns only data that has been persisted to a majority of the replica-set members, so we can be assured that the document will not be rolled back if there is an issue with replication.
readPreference can help with balancing load: for example, a report-generator process can always read data from a secondary and leave the primary to serve data for online users.
Dirty reads are possible with MongoDB, depending on the readConcern and writeConcern settings.
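The majority-acknowledgement rule behind writeConcern can be sketched as follows (a simplification; real write concerns also cover journaling and timeouts):

```python
# With writeConcern "majority", a write is acknowledged only after more
# than half of the replica-set members have persisted it, so a failover
# cannot roll it back; with w:1 the primary alone suffices, and a later
# rollback of that write is possible.
def acknowledged(replicas_persisted, replica_set_size, write_concern):
    if write_concern == "majority":
        return replicas_persisted > replica_set_size // 2
    return replicas_persisted >= 1  # w:1 -- primary only
```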
MongoDB allows querying on documents, embedded documents, arrays, and arrays of embedded documents. For details, check MongoDB querying
MongoDB also supports projections to select only a few fields in the query result when querying a collection for documents. For details, check MongoDB Projections
Understand any existing document schema – MongoDB Compass. If there is an existing MongoDB database that needs to be understood and optimized then MongoDB Compass is an invaluable tool. The MongoDB Compass GUI allows users to understand the structure of existing data in the database and perform ad hoc queries against it – all with zero knowledge of MongoDB’s query language. By understanding what kind of data is present, you’re better placed to determine what indexes might be appropriate. Without Compass, users wishing to understand the shape of their data would have to connect to the MongoDB shell and write queries to reverse engineer the document structure, field names and data types. MongoDB Compass is included with MongoDB Professional and MongoDB Enterprise Advanced.
10. Query Pagination
MongoDB supports cursors to iterate over a query result. By default, the server will automatically close the cursor after 10 minutes of inactivity, or once the client has exhausted the cursor. As a cursor returns documents, other operations may interleave with the query. For the MMAPv1 storage engine, intervening write operations on a document may result in a cursor that returns a document more than once if that document has changed. To handle this situation, see the information on snapshot mode.
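For pagination, a common alternative to skip/limit is range-based pagination on the last _id seen, sketched here over an in-memory list (this assumes _id sorts in insertion order, which roughly holds for default ObjectIds):

```python
# Range-based pagination: remember the last _id returned and fetch
# documents with a greater _id -- cheap even deep into the result set.
docs = [{"_id": i, "val": i * 10} for i in range(1, 8)]

def next_page(last_id, page_size):
    # Stand-in for find({"_id": {"$gt": last_id}}).limit(page_size)
    return [d for d in docs if d["_id"] > last_id][:page_size]

page1 = next_page(0, 3)                  # first three documents
page2 = next_page(page1[-1]["_id"], 3)   # the next three
```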
MongoDB supports single-field, compound, multikey, geospatial, text, and hashed indexes. For complete details, check MongoDB indexes
In MongoDB, a write operation is atomic on the level of a single document, even if the operation modifies multiple embedded documents within a single document.
When a single write operation modifies multiple documents, the modification of each document is atomic, but the operation as a whole is not atomic and other operations may interleave. However, you can isolate a single write operation that affects multiple documents using the $isolated operator. Using the $isolated operator, a write operation that affects multiple documents can prevent other processes from interleaving once the write operation modifies the first document. This ensures that no client sees the changes until the write operation completes or errors out. An isolated write operation does not provide “all-or-nothing” atomicity. That is, an error during the write operation does not roll back all its changes that preceded the error.
Since a single document can contain multiple embedded documents, single-document atomicity is sufficient for many practical use cases. For cases where a sequence of write operations must operate as if in a single transaction, you can implement a two-phase commit in your application.
However, two-phase commits can only offer transaction-like semantics. Using two-phase commit ensures data consistency, but it is possible for applications to return intermediate data during the two-phase commit or rollback.
For details, check how to apply Two Phase Commit
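A minimal in-memory sketch of the two-phase commit pattern described above (the account names and amount are made up; a real implementation persists the transaction document at every state change and must handle recovery from each intermediate state):

```python
# A transaction document moves through initial -> pending -> applied ->
# done, so an interrupted transfer can be detected and resumed.
accounts = {
    "A": {"balance": 100, "pending": []},
    "B": {"balance": 50, "pending": []},
}
txn = {"_id": "t1", "source": "A", "dest": "B", "amount": 30, "state": "initial"}

def run_two_phase_commit(txn):
    src, dst = accounts[txn["source"]], accounts[txn["dest"]]
    # Phase 1: mark pending, then apply the change to both sides while
    # recording the txn id on each account document.
    txn["state"] = "pending"
    src["balance"] -= txn["amount"]; src["pending"].append(txn["_id"])
    dst["balance"] += txn["amount"]; dst["pending"].append(txn["_id"])
    txn["state"] = "applied"
    # Phase 2: clear the pending markers and mark the transaction done.
    src["pending"].remove(txn["_id"]); dst["pending"].remove(txn["_id"])
    txn["state"] = "done"
```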
Concurrency control allows multiple applications to run concurrently without causing data inconsistency or conflicts.
One approach is to create a unique index on a field that can only have unique values. This prevents insertions or updates from creating duplicate data. Create a unique index on multiple fields to force uniqueness on that combination of field values. For examples of use cases, see update() and Unique Index and findAndModify() and Unique Index.
Another approach is to specify the expected current value of a field in the query predicate for the write operations. The two-phase commit pattern provides a variation where the query predicate includes the application identifier as well as the expected state of the data in the write operation.
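The query-predicate approach is essentially a compare-and-set, sketched here against a plain dict (mirroring the semantics of putting the expected value in the update's query, e.g. update({"_id": 1, "status": "new"}, ...)):

```python
# If the field no longer holds the expected value, the predicate fails
# to match and the write becomes a no-op -- optimistic concurrency.
doc = {"_id": 1, "status": "new"}

def update_if(doc, field, expected, new_value):
    if doc.get(field) == expected:
        doc[field] = new_value
        return True   # predicate matched; document modified
    return False      # someone else changed it first; nothing written
```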
MongoDB provides text indexes, which can be used to support text search queries on string content. To perform text search queries, you must have a text index on your collection. A collection can have only one text search index, but that index can cover multiple fields. You can check how MongoDB search works.
The common strategies employed for multi-tenancy in any database include
- All tenants in the same collection, using tenant-specific fields for security
- 1 Collection per tenant in a single shared DB
- 1 Database per tenant (effectively a namespace)
All three strategies can be employed to support multi-tenancy with MongoDB. However, the generally recommended approach is one database per tenant.
16. Client Libraries
An application communicates with MongoDB by way of a client library, called a driver, that handles all interaction with the database in a language appropriate to the application.
Check Client Libraries for a complete list of supported drivers
17. Backup and Recovery
MongoDB provides various options for backup:
- Back Up with Atlas
- Back Up with Cloud Manager or Ops Manager
- Back Up by copying underlying data files
- Back Up with mongodump
For more details on the various options, check Backup and Restore
18. Deployment and Operations
MongoDB clusters have to be deployed, maintained, and monitored. The MongoDB team provides different support-level agreements for clients managing a cluster. Alternatively, MongoDB Atlas can be used to set up MongoDB as a service; it comes with different support packages and many tools to deploy, manage, and monitor a MongoDB cluster, reducing the effort required to maintain it. However, it is still not completely managed, unlike Google Cloud Datastore or Amazon DynamoDB.
MongoDB should be tuned according to the needs of the application to achieve high performance. Check MongoDB-Performance-Best-Practices for the guidelines. For more details on MongoDB architecture, check MongoDB_Architecture_Guide
- Data/Storage Model
Couchbase can be used either as a key-value store or as a document database. Data in Couchbase Server can be stored either as JSON documents or in binary format. Almost all the data models supported by MongoDB can be supported with Couchbase as well; the major difference lies in how the data can be queried. Since we have already seen the different data models supported in MongoDB, I will add some details about Couchbase architecture in this section.
The core architecture is designed to simplify building modern applications with a flexible data model, powerful SQL-based query language, and a secure core database platform that provides high availability, scalability, and performance. Couchbase Server consists of a single package that is installed on all nodes within a cluster. Through the SDKs (also known as client libraries), developers can write applications in the language of their choice (Java, Node.js, .NET or others) and connect to a Couchbase Server cluster to perform read and write operations and queries across key-value records and/or JSON documents with low latencies (sub-millisecond) at high throughput rates (millions of operations per second).
The basic database architecture consists of one or more Couchbase Servers (also called nodes) organized as a cluster. Each node contains a set of configurable services including a Managed Cache, Storage, Cluster Management as well as Data, Index, and Query Services.
The Cluster Manager handles the cluster level operations and the coordination between nodes in a Couchbase Server cluster. These management functions include configuring nodes, monitoring nodes, gathering statistics, logging, and controlling data rebalancing among cluster nodes. The cluster manager determines the node’s membership in a cluster, responds to heartbeat requests, monitors its own services and repairs itself if possible.
Although each node runs its own local Cluster Manager, there is only one node chosen from among them, called the orchestrator, that supervises the cluster at a given point in time. The orchestrator maintains the authoritative copy of the cluster configuration, and performs the necessary node management functions to avoid any conflicts from multiple nodes interacting. If a node becomes unresponsive for any reason, the orchestrator notifies the other nodes in the cluster and promotes the relevant replicas to active status. This process is called failover, and it can be done automatically or manually. If the orchestrator fails or loses communication with the cluster for any reason, the remaining nodes detect the failure when they stop receiving its heartbeat, so they immediately elect a new orchestrator. This is done immediately and is transparent to the operations of the cluster.
The Data Manager serves client requests and stores data. The Data Manager has several components including
Data Service: handles core data management operations, such as Key Value GET/SET. The data service is also responsible for building and maintaining MapReduce views. Note that even though MapReduce views can act as indexes, they are always managed by the data service, not the indexing service.
Index Service: efficiently maintains indexes for fast query execution. This involves the creation and deletion of indexes, keeping those indexes up to date based on change notifications from the data service and responding to index-scan requests from the query service. You can access and manage the indexing service through the query service.
Couchbase Server provides an indexing service to manage Global Secondary Indexes (GSIs). Global Secondary Indexes are aggregated and stored on a given Index Service node and can be independently partitioned by using a WHERE clause in the index definition. GSIs provide fast lookups and range scans which are especially useful for interactive applications that use secondary keys. Global Secondary Indexes are defined via the N1QL query language and are extremely flexible, enabling index creation on a specific subset of fields, on a subset of records, and on JSON arrays using built-in SQL functions. Additionally, Global Secondary Indexes can be declared to use in-memory storage optimization, enabling enhanced index performance. Couchbase indexes are automatically maintained asynchronously from the core Data Service write operations, using high speed in-memory queues to forward mutations (writes) to the appropriate indexer using Couchbase’s advanced Database Change Protocol (DCP). This asynchronous indexing architecture provides two fundamental benefits: a) write operations within the Data Service can be completed extremely quickly, and b) developers can create as many indexes as are needed by the application, no longer having to pay the typical performance and latency penalty that is common in traditional RDBMS systems. If needed, read operations can optionally request to wait for any pending index updates to be completed prior to performing the index scan if this is an application requirement for a given query.
Cache Service: Unlike other database platforms, Couchbase Server does not require a caching tier in front of it. Couchbase uses a memory-first architecture, ensuring that all data operations are performed through a configurable high-speed in-memory cache. Internally, Couchbase moves data to and from disk as needed, thereby automatically acting as both a read-through and a write-through cache, which facilitates an extremely high read/write rate. With Couchbase Server, application configuration and deployment is simplified as applications don’t have to deal with complex cache coherency issues or varying performance capabilities across technologies.
Each data node relies on a built-in multithreaded Managed Cache that is used both for the data service and separately for the indexing service. These caches are actively managed as opposed to relying upon the operating system’s file buffer cache and/or memory-mapped files. The data service’s integral managed cache is an object-managed cache based on memcached. All key-value operations are performed via the cache. Applications can access Couchbase using memcached-compatible APIs such as get, set, delete, append, and prepend. Because this API is complete and transparent to the application, Couchbase Server can be used as a drop-in replacement for memcached.
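The memcached-compatible surface mentioned above amounts to a small key-value API, sketched here with a plain dict standing in for the managed cache (real Couchbase operations also carry expiry and CAS metadata):

```python
# Toy model of the memcached-style operations: get, set, delete,
# append, prepend.
cache = {}

def kv_set(key, value):
    cache[key] = value

def kv_get(key):
    return cache.get(key)

def kv_delete(key):
    cache.pop(key, None)

def kv_append(key, suffix):
    cache[key] = cache.get(key, "") + suffix

def kv_prepend(key, prefix):
    cache[key] = prefix + cache.get(key, "")
```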
Because Couchbase Server automatically replicates data, cache availability is greatly improved and end users rarely hit a cold cache even if system components fail. The managed caching layer within the indexing service is built into the underlying storage engine (ForestDB in 4.0). It manages the caching of individual portions of indexes as RAM is available to speed performance both of inserting or updating information as well as servicing index scans.
Query Service: handles N1QL query parsing, optimization, and execution. This service interacts with both the indexing service and the data service to process and return those queries back to the requesting application.
The Query Service takes a N1QL query and performs the necessary functions to retrieve, filter, and/or project the data in order to resolve the application request. Actual data and index lookups and scans may occur in the Data Service or in the Index Service, depending on the data and the indexes that are required to resolve the query. Applications and developers can use the N1QL REST API, the cbq shell, the cbc command line tool, or the Query Workbench to query records (JSON documents) using SQL-like syntax. With N1QL, developers can run ad hoc or prepared queries using standard SQL syntax (including WHERE clauses and aggregate functions) with JSON-specific extensions to filter, process, and format query output. N1QL queries will automatically leverage existing MapReduce View and GSI indexes to resolve GROUP BY, ORDER BY, and WHERE clauses. The Query Service is where the N1QL query parser, planner, optimizer, and executor reside and it is the recipient of incoming N1QL queries. It provides the basis for how N1QL queries are executed across the Couchbase Server, leveraging the facilities that are available in the Data and Indexing Services.
Storage Layer: Couchbase uses a highly efficient append-only storage system to persist data to files that are placed in the local file system. Append-only storage systems are a technique used by many modern database architectures to provide optimized I/O throughput, especially for write-intensive applications. A background compaction process (automated or manually controlled) cleans up orphaned and fragmented space within the files caused by ongoing data mutations to the append-only file system. The compaction process is designed to minimize the impact to front end I/O operations. Although high performance storage systems (such as SSDs) can be used on Couchbase Server nodes, they are not required. Couchbase Server uses two optimized storage engines, Couchstore and ForestDB. Couchstore is used to store the data items and local indexes, while ForestDB is used to store the Secondary Indexes.
The Storage Layer persists data from RAM to disk and reads data back into memory when needed. Couchbase Server’s storage engines are responsible for persisting data to disk. They write every change operation that a node receives to an append-only file (AOF), meaning that mutations go to the end of the file and in-place updates are forbidden. This file is replayed if necessary at node startup to repopulate the locally managed cache.
Couchbase Server periodically compacts its data files and index files to save space (auto-compaction). During compaction, each node remains online and continues to process read and write requests. Couchbase Server performs compaction in parallel, both across the many nodes of a cluster as well as across the many files in persistent storage. This highly parallel processing allows very large datasets to be compacted incrementally.
Couchbase Server 4.0 and greater supports multidimensional scaling (MDS). MDS enables users to turn on or off specific services on each Couchbase Server node so that the node in effect becomes specialized to handle a specific workload: document storage (data service), global indexes (index service) or N1QL query processing (query service).
Multidimensional scaling has four main advantages:
- Each service can be independently scaled to suit an application’s evolution, whether that entails a growing data set, expanding indexing requirements, or increased query processing needs.
- The index and query services work most quickly and efficiently when a single or small number of machines contain the entire index.
- You can choose to customize machines to their workloads. For example, by adding more CPUs to a node running queries.
- Provides workload isolation so that query workloads do not interfere with indexing or data on the same node.
Multidimensional scaling allows specific services to both scale up and scale out without sacrificing ease of administration because they are all managed from within the same cluster and configured at run time, and the software installed on each machine is identical.
Buckets and vBuckets
A bucket is a logical collection of related documents in Couchbase, just like a database in RDBMS. It is a unique key space.
Couchbase administrators and developers work with buckets when performing tasks such as accessing and managing data, checking statistics, and issuing N1QL queries. Unlike a table in an RDBMS, a bucket can and typically does contain documents of varying schemas. Buckets are used to segregate the configuration and operational handling of data in terms of cache allocation, indexing, and replication. While buckets can play a role in the concept of multitenancy, they are not necessarily the only option for it.
Internally, Couchbase Server uses a mechanism called vBuckets (synonymous with a shard or partition) to automatically distribute data across nodes, a process sometimes known as “auto-sharding”. vBuckets help enable data replication, failover, and dynamic cluster reconfiguration. Unlike data buckets, users and applications do not manipulate vBuckets directly. Couchbase Server automatically divides each bucket into 1024 active vBuckets plus 1024 replica vBuckets per replica, and then distributes them evenly across the nodes running the Data Service within a cluster. vBuckets do not have a fixed physical location on nodes; therefore there is a mapping of vBuckets to nodes known as the cluster map. Through the Couchbase SDKs, the application automatically and transparently distributes the data and workload across these vBuckets.
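The key-to-vBucket mapping can be sketched as a hash of the document key modulo 1024 (a simplification: Couchbase actually uses a specific CRC32-based function, and the cluster map then resolves each vBucket to a Data Service node):

```python
import zlib

NUM_VBUCKETS = 1024

def vbucket_id(key):
    # Hash the document key; the result picks one of the 1024 vBuckets.
    return zlib.crc32(key.encode()) % NUM_VBUCKETS
```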
For more information on vBuckets, see the technical white paper available here.
Database Change Protocol
Database Change Protocol (DCP) is a high-performance streaming protocol that communicates the state of the data using an ordered change log with sequence numbers. It is a generic protocol designed for keeping a consumer of the Data Service up to date.
DCP is robust and resilient in the face of transitory errors. For example, if a stream is interrupted, DCP resumes from exactly the point of its last successful update once connectivity resumes. DCP provides version histories and rollbacks that provide a snapshot of the data. Snapshots can be used to create a brand new replica or catch up an interrupted or partially built node.
Couchbase Server leverages DCP internally and externally for several different purposes. Within Couchbase Server, DCP is used to create replicas of data within and between clusters, maintain indexes (both local and global), backups, etc. DCP also feeds most connectors, which are integrations with external systems such as Hadoop, Kafka or Elasticsearch. Some of these connectors leverage DCP directly or connect to Couchbase Server’s XDCR functionality to keep the external systems in sync by acting like Couchbase Server nodes.
2. Use cases
Couchbase can be used for almost all use cases that MongoDB can be used for. Since it supports better throughput and latency with concurrent users, it can also be used for high-traffic applications like personalization, profile management, real-time big data, content management, catalogs, mobile apps, IoT, etc.
4. Data Types
The data types supported by Couchbase include Boolean, Numbers, Strings, Arrays, Objects, NULL, MISSING, Date, Collation, and Binary.
For more details, check Couchbase Data Types
5. Storage/Access costs
Couchbase Server comes in 3 main editions – Open Source Edition, Community Edition and Enterprise Edition.
Open Source Edition (OSE) is free and gives you access to the open source code to build your custom Couchbase Server. The open source project continues to serve as the foundation for both the Community Edition and the Enterprise Edition. OSE is available under the Apache License 2.0 and allows endless customizations. However, OSE comes with no support or guarantees.
Community Edition (CE) is free as well and gives you a set of binaries to run with your application. If you are building a simple application that does not need more than the basic availability, performance, scale, tooling, and security capabilities, plus community support through the forums, this is the edition for you. Documentation, mailing lists, and forums provide support for the Couchbase user community to help troubleshoot issues and answer questions. CE is built from the open source project, and its versions are aligned with the versions of Couchbase Server Enterprise Edition. However, CE lacks advanced features available in EE.
Enterprise Edition (EE) is the latest, most stable, production-ready release of Couchbase Server and is recommended for production environments. EE provides the full range of availability, performance, scalability, tooling, and security capabilities and includes the latest quality improvements. EE subscribers get hot-fixes for issues. Use of EE in development and test environments is free; however, production use requires a paid subscription. EE contains some unique features that make it the best fit for large production deployments running in data centers or a public cloud.
For full details, check Couchbase Editions
There are different levels of Service Options (Silver, Gold, Platinum) and Couchbase is priced per node. The costs can range from $5600 to $9900 per node for a yearly subscription.
Some real world use cases for NoSQL
Google Cloud Datastore vs MongoDB vs Amazon DynamoDB vs Couchbase vs HBase vs Neo4j