NoSQL’s popularity has been on the rise over the last five years, with favorites like MongoDB, CouchDB, Cassandra, and Redis on the forefront. With NoSQL’s strengths in its fantastic performance and the ability to store and query denormalized data sets, there are a number of reasons to use NoSQL over SQL technologies like MySQL or Microsoft SQL Server.
Denormalized Data Structures
NoSQL allows developers to store denormalized data structures in document form. While many find this more enjoyable and easier to work with, there is a bit of an adjustment for someone come from a structured SQL background. For example, take a look at the following documents inside of the same MongoDB collection:
A similar normalized database structure similar to the following 7 tables:
So now, for just a basic structure, we have to put some effort in to create these tables and their relationships, while using MongoDB prevents this by allowing a denormalized document storage method.
The amount of tables grows considerably the more variable your data becomes; if we add a collegeDegreesObtained array to list the degrees held by each person, we would have to create two more tables: to manage the types of degrees and the connecting table between the degree and person. In NoSQL, we simply add an array to the document and we are done. We do not risk breaking any queries, having errors from NULL data, or having to manage default column values.
- Less work overhead for one-to-many relationships
- Less work overhead for adding new data to an entity
Querying in MongoDB also tends to be a little simpler when doing the more basic lookups. For example, to find someone with the first name Kerry, we do the following:
This translates to the following in SQL:
Not much difference in the amount of work. However, if we want to find someone with knowledge of the PHP scripting language and the Nginx server, we would do the following in NoSQL:
This would translate to the following query in SQL:
This example demonstrates the simplicity of querying MongoDB documents in comparison to querying complex relational data in MySQL. As stated before, SQL query size and overhead will grow much faster with new data, while MongoDB queries and documents will still relatively small and manageable.
- Simpler querying mechanism when searching for relational data (in SQL) values
There are a number of benchmark studies out there demonstrating the speeds of various NoSQL implementations vs SQL implementations. While we do not have any formalized studies, I did some testing personally and found that when inserting millions of documents into MongoDB and inserting a corresponding row into Microsoft SQL Server table, MongoDB took under half of the time. Querying was a similar story; MongoDB cut about half of the time to find a piece of data in a very large dataset. However, MongoDB eats up a lot of memory, so make sure to be cautious of that.
If you are not sure whether NoSQL will offer you performance advantages, simply search the web for comparisons and you will find a number of them.
- NoSQL typically offers better performance and speed
- NoSQL tends to use and require a lot of RAM, so be cautious
There are some drawbacks to using NoSQL; it is not a perfect solution to everything. Each implementation has its own issues. In our development of a genealogical search, we used MongoDB and found these issues.
Disk space consumption: MongoDB tends to take up a lot of space in comparison to the amount of data. There are some solutions to this problem (TokuMX, which will be discussed in a later blog), but MongoDB itself isn’t the most space-efficient solutions.
Document size restriction: The max size of a MongoDB document is 16MB. This was a particularly complex issue to work around if your document is a company with an array of subdocuments containing information on employees. If that list gets too large, you will have to split the company document and then re-connect them with an aggregation pipeline.
Pagination using skips and limits have very bad performance: Say your company has 10,000 people subdocuments. To skip the first 9000 is a serious performance hit as it seems to run your query parameters against each row. This is a completely ineffective method for paginating large datasets, but there are some workarounds, such as using a $gte parameter on the last page’s item ID.
When to use NoSQL
While this choice is always up to the project’s needs and the developers, NoSQL should very much be considered if the data is going to be very relational. Also, NoSQL can have some great performance benefits when used correctly. If you’ve never worked with NoSQL before, its barrier to entry is relatively low and implementations like MongoDB have great communities who can help you when you’re stuck.