NoSQL Databases

An Overview of the Different Types of NoSQL Databases

Backend developer
Javarevisited


Photo by Franki Chamaki on Unsplash

NoSQL databases can be broadly categorized into four types based on their data models:

  • key-value database
  • document database
  • column-family database
  • graph database

While each of these types has its own unique features, the first three (key-value, document, and column-family) share a common characteristic known as aggregate orientation. An aggregate is a collection of related data that is treated as a single unit, such as an order together with its line items and shipping address. Aggregate-oriented databases store data so that entire aggregates can be retrieved and manipulated quickly as a whole, rather than as individual data elements.

On the other hand, graph databases use graph structures to represent and store data, making them particularly useful for applications that involve complex relationships between entities.

Aggregate-Oriented Databases: A Closer Look

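In contrast to relational databases, aggregate-oriented databases are designed to guarantee atomicity at the level of an individual aggregate. An update to a single aggregate is atomic, but a transaction generally cannot extend over multiple aggregates, so any operation that must span several aggregates atomically has to be handled by the application. (Some products, such as MongoDB, have since added multi-document transactions, but these usually come at a performance cost.) While this approach may seem limiting, it offers several benefits: because each aggregate is a self-contained unit that can live on a single node, the data is easy to distribute across a cluster, which leads to better performance, greater scalability, and more flexibility.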
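
To make aggregate-level atomicity concrete, here is a minimal Java sketch under stated assumptions: an in-memory `ConcurrentHashMap` stands in for the database, and `Order`, `addItem`, `removeItem`, and `moveItem` are hypothetical names. `ConcurrentHashMap.compute` is atomic for a single key, which plays the role of a single-aggregate update; moving an item between two orders needs two separate updates, so the pair as a whole is not atomic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one key maps to one whole aggregate (an order with its items).
class AggregateAtomicity {
    record Order(String id, List<String> items) {}

    static final ConcurrentHashMap<String, Order> orders = new ConcurrentHashMap<>();

    // Single-aggregate update: compute() runs atomically for this key,
    // so no other thread ever sees a half-updated order.
    static void addItem(String orderId, String item) {
        orders.compute(orderId, (id, current) -> {
            List<String> items = current == null ? new ArrayList<>() : new ArrayList<>(current.items());
            items.add(item);
            return new Order(id, List.copyOf(items));
        });
    }

    static void removeItem(String orderId, String item) {
        orders.computeIfPresent(orderId, (id, current) -> {
            List<String> items = new ArrayList<>(current.items());
            items.remove(item);
            return new Order(id, List.copyOf(items));
        });
    }

    // Multi-aggregate operation: each step is atomic on its own, but the pair is not.
    // Between the two calls a reader can see the item in neither order, and if the
    // second step failed, the application would have to undo the first one itself.
    static void moveItem(String fromId, String toId, String item) {
        removeItem(fromId, item);
        addItem(toId, item);
    }

    public static void main(String[] args) {
        addItem("order-1", "book");
        addItem("order-1", "pen");
        moveItem("order-1", "order-2", "pen");
        System.out.println(orders.get("order-1").items()); // [book]
        System.out.println(orders.get("order-2").items()); // [pen]
    }
}
```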

Let’s dive deeper into each of the aggregate-oriented databases and their structures.

Key-Value Data Model: An Efficient and Scalable Storage Solution

Key-value datastores are often described as giant hash tables: each value is stored and retrieved through its key, which an efficient hash function maps to a storage location. Because every lookup or update touches only one key, key-value datastores scale extremely well and can serve lookups and updates efficiently on large clusters. However, the store treats each value as opaque, so key-value stores may not be the best choice when querying on the contents of the aggregate (the values) rather than on the key is critical.
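
The hash-table idea can be sketched in a few lines of Java. This is a hypothetical illustration, not the API of DynamoDB, Riak, or Redis: each "node" of the cluster is assumed to be an in-memory `HashMap`, and the key's hash code decides which node owns the key, so every `get` and `put` touches exactly one node.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a key-value store partitioned across several in-memory "nodes".
class PartitionedKeyValueStore {
    private final List<Map<String, String>> nodes = new ArrayList<>();

    PartitionedKeyValueStore(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Hash the key to pick its node; floorMod keeps the index non-negative
    // even when hashCode() is negative.
    int nodeFor(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // Both operations touch exactly one node, which is what lets the store scale out.
    void put(String key, String value) {
        nodes.get(nodeFor(key)).put(key, value);
    }

    String get(String key) {
        return nodes.get(nodeFor(key)).get(key);
    }

    int keysOnNode(int node) {
        return nodes.get(node).size();
    }

    public static void main(String[] args) {
        PartitionedKeyValueStore store = new PartitionedKeyValueStore(3);
        store.put("session:42", "{\"user\":\"alice\",\"cart\":[\"book\"]}");
        System.out.println("stored on node " + store.nodeFor("session:42"));
        System.out.println(store.get("session:42"));
    }
}
```

Simple modulo hashing moves most keys whenever the number of nodes changes; Dynamo-style systems such as Riak use consistent hashing to limit that reshuffling.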

Key-value stores excel at use cases such as web sessions, user profiles, and shopping carts. This kind of data has a natural key (a session ID or a user ID), and the application usually needs to fetch or update an entire record at once. However, key-value stores may not be the optimal choice for transactions that span multiple keys, or for queries based on the value part of the key-value pairs. Additionally, when keys are related to one another (for example, when one value refers to other keys), the store offers no help in managing or querying those relationships, which can make such data hard to handle efficiently.
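
The contrast between a good and a poor fit can be sketched with shopping carts. In this hypothetical Java example (class and method names are illustrative, and a `HashMap` stands in for the store), loading a cart by session ID is a single lookup, while asking which carts contain a given product queries the value part, so every entry has to be scanned.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: shopping carts keyed by session ID; each cart is an opaque value.
class CartStore {
    private final Map<String, List<String>> carts = new HashMap<>();

    // Good fit: one key lookup reads or replaces the whole cart.
    void saveCart(String sessionId, List<String> items) {
        carts.put(sessionId, List.copyOf(items));
    }

    List<String> loadCart(String sessionId) {
        return carts.getOrDefault(sessionId, List.of());
    }

    // Poor fit: this query is about the value, not the key, so without a
    // secondary index every cart in the store has to be examined.
    List<String> sessionsContaining(String product) {
        List<String> sessions = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : carts.entrySet()) {
            if (entry.getValue().contains(product)) {
                sessions.add(entry.getKey());
            }
        }
        return sessions;
    }

    public static void main(String[] args) {
        CartStore store = new CartStore();
        store.saveCart("session-1", List.of("book", "pen"));
        store.saveCart("session-2", List.of("lamp"));
        System.out.println(store.loadCart("session-1"));      // [book, pen]
        System.out.println(store.sessionsContaining("lamp")); // [session-2]
    }
}
```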

Popular key-value data stores include Amazon DynamoDB, Riak, and Redis. They are known for high performance, scalability, and fault tolerance, which makes them a popular choice for applications that need fast storage and retrieval of key-value data.

Document Data Model: Flexible and Efficient Data Storage

The document data model stores documents, which are collections of named fields and their values, as aggregates, much like key-value stores do. Unlike key-value stores, however, document databases expose the structure of their aggregates, so applications can query and filter data by the values of these fields. This allows more targeted retrieval of specific parts of an aggregate instead of always fetching the entire document.
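
Querying on field values can be sketched by modeling each document as a map of named fields. This is a hypothetical Java illustration, not a database driver; the names `findAndProject`, `status`, and `orderId` are made up. In MongoDB, for instance, the same idea is expressed as a `find` with a filter and a projection.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: documents as maps of named fields, queried by field value.
class DocumentQuery {
    // Keep the documents whose `field` equals `value`, and return only
    // `projectedField` from each match instead of the whole document.
    static List<Object> findAndProject(List<Map<String, Object>> collection,
                                       String field, Object value, String projectedField) {
        return collection.stream()
                .filter(doc -> value.equals(doc.get(field)))
                .map(doc -> doc.get(projectedField))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample documents.
        List<Map<String, Object>> orders = List.of(
                Map.<String, Object>of("orderId", "o1", "status", "SHIPPED", "total", 40),
                Map.<String, Object>of("orderId", "o2", "status", "PENDING", "total", 15));
        // "IDs of shipped orders", without returning whole documents.
        System.out.println(findAndProject(orders, "status", "SHIPPED", "orderId")); // [o1]
    }
}
```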

One of the key benefits of document databases is their flexible schemas. In a relational database every tuple in a relation must follow the same schema, whereas documents in the same collection can have different structures. This makes document databases well suited to use cases such as managing orders in e-commerce applications, tracking web application visitors for real-time analytics, and storing user profiles on blogging platforms.
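
A flexible schema means two documents in the same collection can carry different fields. The Java sketch below assumes made-up profile documents and hypothetical method names; its main point is that, with no enforced schema, code reading the collection has to cope with fields that may be absent.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one "profiles" collection whose documents have different shapes.
class FlexibleSchema {
    static List<Map<String, Object>> sampleProfiles() {
        List<Map<String, Object>> profiles = new ArrayList<>();
        // Made-up documents: the second carries fields the first does not have.
        profiles.add(Map.<String, Object>of("user", "alice", "bio", "Writes about Java"));
        profiles.add(Map.<String, Object>of("user", "bob", "bio", "Backend developer",
                "twitter", "@bob", "tags", List.of("java", "nosql")));
        return profiles;
    }

    // No schema guarantees the field exists, so check before using it.
    static List<String> usersWithTwitter(List<Map<String, Object>> profiles) {
        List<String> users = new ArrayList<>();
        for (Map<String, Object> profile : profiles) {
            if (profile.containsKey("twitter")) {
                users.add((String) profile.get("user"));
            }
        }
        return users;
    }

    public static void main(String[] args) {
        System.out.println(usersWithTwitter(sampleProfiles())); // [bob]
    }
}
```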

As with key-value stores, transactions and queries that span multiple documents are not a strength of document databases. However, document databases can update just a subsection of a document, which is a more efficient way to modify data than rewriting the whole aggregate.
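
A partial update changes one nested field and leaves the rest of the document alone. The following Java sketch mirrors the idea behind operators such as MongoDB's `$set` with a dotted field path, but it is plain Java, not MongoDB code; `setPath` is a hypothetical helper, and it assumes every existing intermediate value is a mutable map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a partial update on a nested document.
class PartialUpdate {
    // Set the value at a dotted path such as "shipping.city", creating missing
    // intermediate sub-documents along the way.
    @SuppressWarnings("unchecked")
    static void setPath(Map<String, Object> document, String path, Object value) {
        String[] parts = path.split("\\.");
        Map<String, Object> current = document;
        for (int i = 0; i < parts.length - 1; i++) {
            current = (Map<String, Object>) current.computeIfAbsent(parts[i], k -> new HashMap<String, Object>());
        }
        current.put(parts[parts.length - 1], value);
    }

    public static void main(String[] args) {
        Map<String, Object> shipping = new HashMap<>();
        shipping.put("city", "Pune");
        Map<String, Object> order = new HashMap<>();
        order.put("orderId", "o1");
        order.put("shipping", shipping);

        setPath(order, "shipping.city", "Mumbai"); // only the nested field changes
        System.out.println(order.get("shipping")); // {city=Mumbai}
    }
}
```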

Popular document databases include MongoDB, Amazon DocumentDB, and Couchbase. They are known for their flexibility, scalability, and ability to handle complex, nested data structures, which makes them a popular choice for modern applications that need flexible and efficient data storage.

Choosing the Right Data Store: Key-Value vs. Document

When it comes to choosing between key-value and document data stores, the specific needs of the data and application must be taken into consideration. While both data models have their advantages and disadvantages, the choice ultimately depends on the complexity of the data structure and query patterns.

For applications with relatively simple data structures and query patterns, where the entire aggregate needs to be fetched, key-value databases are often the preferred choice. They are designed for efficient data look-ups and manipulation using a key and an efficient hash function, making them extremely scalable on large clusters.

However, as the data structures and query patterns become more complex, document databases become the better option. Their flexible schemas and their ability to query and filter on the values of document fields allow more efficient retrieval of specific parts of an aggregate. This makes them well suited to more complex data structures, such as those found in e-commerce applications or blogging platforms.

That being said, the line between key-value and document data stores is often thin and can depend on the specific needs of the application. Ultimately, the choice between the two depends on factors such as the size and complexity of the data, the required query patterns, and the scalability requirements of the application.

Columnar Data Model

Whereas row-oriented databases serialize data row by row, columnar databases serialize it column by column. Let's look at this more closely using the example in the table below.

Sample data set in tabular format (two rows: R1 with values A1, A2, A3 and R2 with values B1, B2, B3)

If the data is serialized by row, all the columns of a row are stored together on disk, and the on-disk structure may look something like this: R1, [A1, A2, A3] | R2, [B1, B2, B3]. In a columnar database, however, the values of each column from multiple rows are stored together, and the on-disk structure may look like this: [(R1, A1), (R2, B1)], [(R1, A2), (R2, B2)], [(R1, A3), (R2, B3)].
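
The two layouts can be reproduced with a small Java sketch that serializes the sample rows both ways. The string format is only an illustration of the order in which values are stored, not a real on-disk encoding, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: serialize the same two rows row-wise and column-wise.
class StorageLayout {
    // The sample data: R1 -> A1, A2, A3 and R2 -> B1, B2, B3.
    static final String[] ROW_IDS = {"R1", "R2"};
    static final String[][] VALUES = {{"A1", "A2", "A3"}, {"B1", "B2", "B3"}};

    // Row-oriented: all columns of one row are written next to each other.
    static String rowLayout() {
        List<String> rows = new ArrayList<>();
        for (int r = 0; r < ROW_IDS.length; r++) {
            rows.add(ROW_IDS[r] + ", [" + String.join(", ", VALUES[r]) + "]");
        }
        return String.join(" | ", rows);
    }

    // Column-oriented: one column's values from every row are written together,
    // each tagged with its row ID so rows can be reassembled.
    static String columnLayout() {
        List<String> columns = new ArrayList<>();
        for (int c = 0; c < VALUES[0].length; c++) {
            List<String> cells = new ArrayList<>();
            for (int r = 0; r < ROW_IDS.length; r++) {
                cells.add("(" + ROW_IDS[r] + ", " + VALUES[r][c] + ")");
            }
            columns.add("[" + String.join(", ", cells) + "]");
        }
        return String.join(", ", columns);
    }

    public static void main(String[] args) {
        System.out.println(rowLayout());    // R1, [A1, A2, A3] | R2, [B1, B2, B3]
        System.out.println(columnLayout()); // [(R1, A1), (R2, B1)], [(R1, A2), (R2, B2)], [(R1, A3), (R2, B3)]
    }
}
```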

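A well-known example of a columnar data store is Amazon Redshift, a cloud-hosted data warehouse optimized for analyzing very large volumes of data. Keeping each column's values together suits analytical queries, which typically read a few columns across many rows. Note that Redshift is a relational (SQL) warehouse built on columnar storage; NoSQL column-family databases such as Apache Cassandra and Apache HBase instead group related columns into column families stored under a row key.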
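
Why columnar storage suits analytics can be shown with a deliberately simplified Java sketch (an assumption for illustration, not how Redshift is implemented): summing one column touches only that column's values in a column layout, while a row layout has to read every field of every row. The sales data and class name are made up.

```java
// Hypothetical, simplified sketch: count how many stored values a one-column sum reads.
class ColumnScan {
    record ScanResult(long sum, int valuesRead) {}

    // Row layout: rows[r] holds every column of row r, so the whole row is read.
    static ScanResult sumRowLayout(int[][] rows, int column) {
        long sum = 0;
        int valuesRead = 0;
        for (int[] row : rows) {
            valuesRead += row.length;
            sum += row[column];
        }
        return new ScanResult(sum, valuesRead);
    }

    // Column layout: columns[c] holds column c for every row, so only that array is read.
    static ScanResult sumColumnLayout(int[][] columns, int column) {
        long sum = 0;
        int valuesRead = 0;
        for (int value : columns[column]) {
            valuesRead++;
            sum += value;
        }
        return new ScanResult(sum, valuesRead);
    }

    public static void main(String[] args) {
        // Made-up sales data with three columns: quantity, unit price, discount.
        int[][] rows = {{2, 10, 0}, {5, 4, 1}, {1, 25, 5}};
        int[][] columns = {{2, 5, 1}, {10, 4, 25}, {0, 1, 5}};
        System.out.println(sumRowLayout(rows, 0));       // ScanResult[sum=8, valuesRead=9]
        System.out.println(sumColumnLayout(columns, 0)); // ScanResult[sum=8, valuesRead=3]
    }
}
```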

Graph Data Model

The graph data model is a powerful way of storing and querying complex relationships between data entities. Unlike relational databases, which use foreign keys to establish relationships between tables, graph databases persist the relationships as edges between nodes. This approach allows for much more efficient traversal of relationships, as the connections are stored directly in the database and don’t need to be computed at query time.
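
Storing relationships as edges means each node can hold direct references to its neighbors, so a traversal follows those references instead of joining tables on foreign keys. The Java sketch below is a hypothetical in-memory illustration (the class, method, and sample names are made up); it finds friends of friends in two hops, each hop being a walk over one node's own edge list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: nodes keep direct references to their outgoing edges.
class GraphTraversal {
    static final class Node {
        final String name;
        final List<Edge> outgoing = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    record Edge(String type, Node to) {}

    static void connect(Node from, String type, Node to) {
        from.outgoing.add(new Edge(type, to));
    }

    // Friends of friends: two hops, no joins; the start node itself is excluded.
    static List<String> friendsOfFriends(Node start) {
        List<String> result = new ArrayList<>();
        for (Edge first : start.outgoing) {
            if (!first.type().equals("Friend")) continue;
            for (Edge second : first.to().outgoing) {
                Node candidate = second.to();
                if (second.type().equals("Friend") && candidate != start
                        && !result.contains(candidate.name)) {
                    result.add(candidate.name);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node ann = new Node("Ann"), ben = new Node("Ben"), cal = new Node("Cal");
        connect(ann, "Friend", ben);
        connect(ben, "Friend", cal);
        connect(ben, "Friend", ann);
        System.out.println(friendsOfFriends(ann)); // [Cal]
    }
}
```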

Graph databases are particularly useful for applications that require storing and querying complex relationships, such as social networks, recommendation engines, and location-based services. However, they may not perform as well as other database models when it comes to updating properties of all nodes in the database.

One way to improve performance in graph databases is for each node to store the physical addresses of its adjacent nodes and to cache links to directly related nodes. This reduces the number of disk reads required to traverse relationships, making queries more efficient.

In the graph data model, data entities are stored as nodes and relationship information is stored as edges. Queries on the graph structure are called traversals, and they can be made more efficient by indexing node properties, so that the starting node of a traversal can be found without scanning the whole graph. Overall, the graph data model is a powerful tool for working with highly connected data, and it has become increasingly popular as more applications need sophisticated data storage and querying capabilities.
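
An index on a node property lets a traversal find its starting point with one lookup instead of scanning every node. In this hypothetical Java sketch, a `HashMap` named `nodeIndex` maps the `name` property to its node; the edge between the sample nodes is made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a property index on "name" for locating traversal start nodes.
class PropertyIndexedGraph {
    static final class Node {
        final String name;
        final List<Node> neighbors = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    private final List<Node> allNodes = new ArrayList<>();
    private final Map<String, Node> nodeIndex = new HashMap<>(); // name -> node

    Node addNode(String name) {
        Node node = new Node(name);
        allNodes.add(node);
        nodeIndex.put(name, node); // keep the index in sync on every insert
        return node;
    }

    // Indexed lookup: a single hash probe.
    Node findByIndex(String name) {
        return nodeIndex.get(name);
    }

    // Without an index: a linear scan over all nodes.
    Node findByScan(String name) {
        for (Node node : allNodes) {
            if (node.name.equals(name)) return node;
        }
        return null;
    }

    public static void main(String[] args) {
        PropertyIndexedGraph graph = new PropertyIndexedGraph();
        Node justin = graph.addNode("Justin");
        Node got = graph.addNode("GOT");
        justin.neighbors.add(got);
        // Start from the indexed node, then follow its edges.
        for (Node next : graph.findByIndex("Justin").neighbors) {
            System.out.println(next.name); // GOT
        }
    }
}
```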

Let's dive deeper by evaluating a few queries on this graph data model, assuming that all the nodes have already been added to an index named nodeIndex.

A graph data model of the favorite TV series of a group of people

Query 1: All friends of Justin.
Here, we fetch all of Justin's relationships whose direction is Direction.OUTGOING and keep only those whose type is Friend.
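
Query 1 can be sketched against a small in-memory graph. This is not the Neo4j API; the `Direction` enum here is a hypothetical stand-in whose names mirror Neo4j's `Direction.OUTGOING` and `Direction.INCOMING`, and the sample edges are made up because the article's figure is not reproduced.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the article's graph (not the Neo4j API).
class FriendsQuery {
    enum Direction { OUTGOING, INCOMING }

    record Relationship(String start, String type, String end) {}

    private final Map<String, List<Relationship>> outgoing = new HashMap<>();
    private final Map<String, List<Relationship>> incoming = new HashMap<>();

    void relate(String start, String type, String end) {
        Relationship r = new Relationship(start, type, end);
        outgoing.computeIfAbsent(start, k -> new ArrayList<>()).add(r);
        incoming.computeIfAbsent(end, k -> new ArrayList<>()).add(r);
    }

    List<Relationship> relationships(String node, Direction direction) {
        Map<String, List<Relationship>> side = direction == Direction.OUTGOING ? outgoing : incoming;
        return side.getOrDefault(node, List.of());
    }

    // Query 1: take the person's OUTGOING relationships and keep those of type Friend.
    List<String> friendsOf(String person) {
        List<String> friends = new ArrayList<>();
        for (Relationship r : relationships(person, Direction.OUTGOING)) {
            if (r.type().equals("Friend")) {
                friends.add(r.end());
            }
        }
        return friends;
    }

    public static void main(String[] args) {
        FriendsQuery graph = new FriendsQuery();
        // Made-up sample edges.
        graph.relate("Justin", "Friend", "Anna");
        graph.relate("Justin", "Likes", "GOT");
        graph.relate("Justin", "Friend", "Ravi");
        System.out.println(graph.friendsOf("Justin")); // [Anna, Ravi]
    }
}
```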

Query 2: Find all people who like House Of Cards and GOT and are friends with each other.

Here, we fetch all the relationships of the houseOfCards node whose direction is Direction.INCOMING and keep only those whose type is Likes. This gives us everyone who likes House of Cards.

At this point, we can apply some straightforward application logic: repeat the same lookup for GOT, keep the people who appear in both results, and then check which of those people have a Friend relationship with each other.
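
The steps of Query 2 can be sketched in Java on a hypothetical in-memory graph (not the Neo4j API; `Direction`, `likersOf`, and `friendsWhoLikeBoth` are illustrative names, and the sample edges are made up). It collects the INCOMING Likes relationships of each show, intersects the two sets, and then keeps the pairs connected by a Friend relationship, reporting each pair once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory stand-in for the article's graph (not the Neo4j API).
class SharedTasteQuery {
    enum Direction { OUTGOING, INCOMING }

    record Relationship(String start, String type, String end) {}

    private final Map<String, List<Relationship>> outgoing = new HashMap<>();
    private final Map<String, List<Relationship>> incoming = new HashMap<>();

    void relate(String start, String type, String end) {
        Relationship r = new Relationship(start, type, end);
        outgoing.computeIfAbsent(start, k -> new ArrayList<>()).add(r);
        incoming.computeIfAbsent(end, k -> new ArrayList<>()).add(r);
    }

    List<Relationship> relationships(String node, Direction direction) {
        Map<String, List<Relationship>> side = direction == Direction.OUTGOING ? outgoing : incoming;
        return side.getOrDefault(node, List.of());
    }

    // Everyone with an INCOMING Likes relationship on the given show.
    Set<String> likersOf(String show) {
        Set<String> likers = new LinkedHashSet<>();
        for (Relationship r : relationships(show, Direction.INCOMING)) {
            if (r.type().equals("Likes")) likers.add(r.start());
        }
        return likers;
    }

    // Query 2: people who like both shows and are friends with each other.
    // Each pair appears once, with the two names in alphabetical order.
    List<List<String>> friendsWhoLikeBoth(String showA, String showB) {
        Set<String> fans = likersOf(showA);
        fans.retainAll(likersOf(showB)); // people who like both shows
        Set<List<String>> pairs = new LinkedHashSet<>();
        for (String person : fans) {
            for (Relationship r : relationships(person, Direction.OUTGOING)) {
                String other = r.end();
                if (r.type().equals("Friend") && fans.contains(other)) {
                    pairs.add(person.compareTo(other) < 0 ? List.of(person, other) : List.of(other, person));
                }
            }
        }
        return new ArrayList<>(pairs);
    }

    public static void main(String[] args) {
        SharedTasteQuery graph = new SharedTasteQuery();
        // Made-up sample edges.
        graph.relate("Justin", "Likes", "House of Cards");
        graph.relate("Justin", "Likes", "GOT");
        graph.relate("Anna", "Likes", "House of Cards");
        graph.relate("Anna", "Likes", "GOT");
        graph.relate("Ravi", "Likes", "GOT");
        graph.relate("Justin", "Friend", "Anna");
        graph.relate("Justin", "Friend", "Ravi");
        System.out.println(graph.friendsWhoLikeBoth("House of Cards", "GOT")); // [[Anna, Justin]]
    }
}
```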

Applications are no longer tied to a single database and can use different data stores depending on their storage needs. For instance, an e-commerce platform may have a separate microservice for each of its components, and each of those microservices may use an entirely different data store suited to its use case.

We need to understand these concepts so that we can choose the right NoSQL database when designing systems. I hope this article clarifies the characteristics of NoSQL databases. We will cover more system design concepts in the future. Stay tuned. Happy learning!
