The Acumen software design blog provides you with helpful articles about MongoDB.

Classified as a NoSQL database program, MongoDB stores data in flexible, JSON-like documents with optional schemas. It offers a variety of features, such as ad hoc queries, indexing, replication, load balancing, file storage, aggregation, capped collections, and server-side JavaScript execution.

Our Acumen Software Developers can assist you with your MongoDB database implementation.

Several web sites provide a good overview.

Official MongoDB Overview

MongoDB Wikipedia

NoSQL’s popularity has been on the rise over the last five years, with favorites like MongoDB, CouchDB, Cassandra, and Redis at the forefront. Given its strong performance and its ability to store and query denormalized data sets, there are a number of reasons to use NoSQL over SQL technologies like MySQL or Microsoft SQL Server.

Denormalized Data Structures

NoSQL allows developers to store denormalized data structures in document form. While many find this more enjoyable and easier to work with, there is a bit of an adjustment for someone coming from a structured SQL background. For example, take a look at the following documents inside the same MongoDB collection:

{
	firstName: "Kerry",
	lastName: "Ritter",
	city: "St. Louis",
	state: "Missouri",
	serverExperience: [
		"Apache",
		"IIS"
	],
	programmingLanguages: [
		"PHP",
		"C#",
		"JavaScript"
	]
}
{
	firstName: "Rob",
	lastName: "Wagnon",
	city: "St. Louis",
	state: "Missouri",
	serverExperience: [
		"Apache",
		"Nginx",
		"IIS"
	],
	databaseExperience: [
		"Microsoft SQL Server",
		"MySQL"
	]
}
{
	firstName: "Dave",
	lastName: "Mueller",
	city: "St. Louis",
	state: "Missouri",
	serverExperience: [
		"Apache",
		"Nginx",
		"IIS"
	],
	programmingLanguages: [
		"PHP",
		"C#"
	]
}

A comparable normalized relational structure would need something like the following seven tables:

Person (ID, FirstName, LastName, City, State)
ProgrammingLanguages (ID, Title)
ServerTypes (ID, Title)
DatabaseTypes (ID, Title)
Person_ProgrammingLanguages (PersonID, ProgrammingLanguageID)
Person_ServerExperience (PersonID, ServerTypeID)
Person_DatabaseExperience (PersonID, DatabaseTypeID)

So now, even for a basic structure, we have to put effort into creating these tables and their relationships, whereas MongoDB avoids this by letting us store denormalized documents.

The number of tables grows considerably as your data becomes more variable; if we add a collegeDegreesObtained array to list the degrees held by each person, we would have to create two more tables: one to hold the types of degrees and one to connect degrees to people. In NoSQL, we simply add an array to the document and we are done. We do not risk breaking existing queries, hitting errors from NULL data, or having to manage default column values.

Key points:

  • Less work overhead for one-to-many relationships
  • Less work overhead for adding new data to an entity
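
As a rough illustration of the second point, the sketch below models a collection as a plain JavaScript array rather than a real MongoDB collection. The holdersOf helper and the "Example Degree" value are made up for this example; only the collegeDegreesObtained field name comes from the text above.

```javascript
// A collection modeled as a plain array of documents (not a real MongoDB collection).
const people = [
  { firstName: "Kerry", programmingLanguages: ["PHP", "C#", "JavaScript"] },
  { firstName: "Rob", databaseExperience: ["Microsoft SQL Server", "MySQL"] },
  { firstName: "Dave", programmingLanguages: ["PHP", "C#"] },
];

// Add the new array to one document only; the other documents are left untouched.
// "Example Degree" is a placeholder value invented for illustration.
people[0].collegeDegreesObtained = ["Example Degree"];

// Hypothetical helper: first names of people whose array field contains a value.
// Documents that lack the field are simply skipped, so no NULL handling is needed.
function holdersOf(docs, field, value) {
  return docs
    .filter((doc) => Array.isArray(doc[field]) && doc[field].includes(value))
    .map((doc) => doc.firstName);
}

console.log(holdersOf(people, "collegeDegreesObtained", "Example Degree"));
console.log(holdersOf(people, "programmingLanguages", "PHP"));
```

The existing lookup on programmingLanguages keeps working after the new field appears, which is the behavior the paragraph above describes.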

Querying

Querying in MongoDB also tends to be a little simpler when doing the more basic lookups. For example, to find someone with the first name Kerry, we do the following:

db.people.find({ firstName: "Kerry" })

This translates to the following in SQL:

SELECT * FROM Person WHERE FirstName = 'Kerry'

Not much difference in the amount of work. However, if we want to find someone with knowledge of the PHP scripting language and the Nginx server, we would do the following in MongoDB:

db.people.find({ programmingLanguages: "PHP", serverExperience: "Nginx" })

This would translate to the following query in SQL:

SELECT p.* FROM Person p
INNER JOIN Person_ProgrammingLanguages ppl ON p.ID = ppl.PersonID
INNER JOIN ProgrammingLanguages pl ON ppl.ProgrammingLanguageID = pl.ID
INNER JOIN Person_ServerExperience pse ON p.ID = pse.PersonID
INNER JOIN ServerTypes se ON pse.ServerTypeID = se.ID
WHERE pl.Title = 'PHP' AND se.Title = 'Nginx'

This example demonstrates the simplicity of querying MongoDB documents in comparison to querying complex relational data in SQL. As stated before, SQL query size and overhead will grow much faster with new data, while MongoDB queries and documents will stay relatively small and manageable.

Key points:

  • Simpler queries when filtering on values that would live in related tables in SQL
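
To make the matching rule behind the second query more concrete, here is a minimal sketch in plain JavaScript. It is not how MongoDB is implemented; the matches function is a hypothetical helper that handles only simple equality filters, where a field matches if it equals the value or is an array containing it.

```javascript
// Trimmed copies of the three example documents from earlier in the article.
const people = [
  { firstName: "Kerry", serverExperience: ["Apache", "IIS"],
    programmingLanguages: ["PHP", "C#", "JavaScript"] },
  { firstName: "Rob", serverExperience: ["Apache", "Nginx", "IIS"],
    databaseExperience: ["Microsoft SQL Server", "MySQL"] },
  { firstName: "Dave", serverExperience: ["Apache", "Nginx", "IIS"],
    programmingLanguages: ["PHP", "C#"] },
];

// Hypothetical matcher for equality filters only: every field in the filter
// must equal the wanted value, or be an array that contains it.
function matches(doc, filter) {
  return Object.entries(filter).every(([field, wanted]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(wanted) : actual === wanted;
  });
}

const found = people
  .filter((doc) => matches(doc, { programmingLanguages: "PHP", serverExperience: "Nginx" }))
  .map((doc) => doc.firstName);

console.log(found); // only Dave lists both PHP and Nginx
```

Kerry is excluded because Nginx is missing from serverExperience, and Rob is excluded because his document has no programmingLanguages field at all.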

Performance

There are a number of benchmark studies out there comparing the speeds of various NoSQL and SQL implementations. While we do not have any formal studies of our own, I did some informal testing and found that when inserting millions of documents into MongoDB and inserting corresponding rows into a Microsoft SQL Server table, MongoDB took under half the time. Querying was a similar story; MongoDB took about half the time to find a piece of data in a very large dataset. However, MongoDB uses a lot of memory, so keep an eye on that.

If you are not sure whether NoSQL will offer you performance advantages, simply search the web for comparisons and you will find a number of them.

Key points:

  • NoSQL typically offers better performance and speed
  • NoSQL tends to use and require a lot of RAM, so be cautious

Drawbacks

There are some drawbacks to using NoSQL; it is not a perfect solution to everything. Each implementation has its own issues. In our development of a genealogical search, we used MongoDB and found these issues.

Disk space consumption: MongoDB tends to take up a lot of space in comparison to the amount of data. There are some solutions to this problem (TokuMX, which will be discussed in a later blog post), but MongoDB itself isn’t the most space-efficient solution.

Document size restriction: The maximum size of a MongoDB document is 16 MB. This is a particularly awkward limit to work around if, for example, your document is a company with an array of subdocuments containing information on employees. If that list gets too large, you will have to split the company document into several documents and then re-connect them with an aggregation pipeline.

Pagination using skips and limits performs very badly: Say your company has 10,000 people subdocuments. Skipping the first 9,000 is a serious performance hit, because the server still has to walk past every skipped document before it returns anything. This makes skip and limit an ineffective way to paginate large datasets, but there are workarounds, such as filtering with $gt on the ID of the last item from the previous page.
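
One way to picture the split described above is the following sketch for Node.js. The splitCompany, joinEmployees, and approximateBytes helpers, the companyId and part fields, and the sample data are all assumptions made for illustration. The size check uses JSON length, which is only a rough stand-in for the BSON size MongoDB actually enforces, and the in-memory join stands in for the aggregation pipeline.

```javascript
// Hypothetical helper: split one large company document into several "part"
// documents that share a companyId and carry a part index.
function splitCompany(company, maxEmployeesPerPart) {
  const { employees, ...companyFields } = company;
  const partCount = Math.max(1, Math.ceil(employees.length / maxEmployeesPerPart));
  const parts = [];
  for (let part = 0; part < partCount; part++) {
    const start = part * maxEmployeesPerPart;
    parts.push({
      ...companyFields,
      companyId: company.id,
      part,
      employees: employees.slice(start, start + maxEmployeesPerPart),
    });
  }
  return parts;
}

// In-memory stand-in for re-connecting the parts: concatenate in part order.
function joinEmployees(parts) {
  return [...parts].sort((a, b) => a.part - b.part).flatMap((p) => p.employees);
}

// Rough size estimate; JSON bytes are not the same as BSON bytes.
function approximateBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Made-up sample data: one company with ten employees, split four per part.
const company = {
  id: 1,
  name: "Example Co",
  employees: Array.from({ length: 10 }, (_, i) => ({ employeeNumber: i + 1 })),
};
const parts = splitCompany(company, 4);

console.log(parts.map((p) => p.employees.length));
console.log(joinEmployees(parts).length);
console.log(approximateBytes(parts[0]) < 16 * 1024 * 1024);
```

In a real application the per-part limit would more likely be chosen from a measured document size than from a fixed employee count.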
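
The difference between the two pagination styles can be sketched with an in-memory array standing in for a sorted, indexed collection. The pageBySkip and pageAfter helpers and the visited counter are hypothetical and only approximate the work a database would do. The range-based version uses a strict greater-than comparison so that the last item of the previous page is not repeated.

```javascript
// 10,000 items with increasing ids, standing in for a sorted, indexed collection.
const people = Array.from({ length: 10000 }, (_, i) => ({ id: i + 1 }));

// Skip/limit: every skipped item still has to be walked past.
function pageBySkip(items, skip, limit) {
  let visited = 0;
  const page = [];
  for (const item of items) {
    if (page.length >= limit) break;
    visited++;
    if (visited <= skip) continue;
    page.push(item);
  }
  return { page, visited };
}

// Range-based: binary search to the first id greater than lastId, roughly as an
// index lookup would, then read one page from there.
function pageAfter(items, lastId, limit) {
  let lo = 0;
  let hi = items.length;
  let steps = 0;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    steps++;
    if (items[mid].id <= lastId) lo = mid + 1;
    else hi = mid;
  }
  const page = items.slice(lo, lo + limit);
  return { page, visited: steps + page.length };
}

const bySkip = pageBySkip(people, 9000, 10);
const byRange = pageAfter(people, 9000, 10);

console.log(bySkip.page[0].id, byRange.page[0].id); // both pages start at id 9001
console.log(bySkip.visited, byRange.visited); // thousands of items versus a few dozen

// In the mongo shell, the range-based page would look roughly like:
//   db.people.find({ _id: { $gt: lastId } }).sort({ _id: 1 }).limit(10)
```

Both calls return the same ten items; only the amount of work needed to reach them differs.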

When to use NoSQL

While this choice always depends on the project’s needs and the developers, NoSQL is well worth considering if your data has many one-to-many relationships or a shape that varies from record to record. NoSQL can also have some great performance benefits when used correctly. If you’ve never worked with NoSQL before, its barrier to entry is relatively low, and implementations like MongoDB have great communities who can help you when you’re stuck.

Dumping and restoring a MongoDB database

Dumping a MongoDB database is done using the “mongodump” utility in the command prompt. This dump creates a binary export of the database. This export can be restored using the “mongorestore” utility.

A basic dump of a local MongoDB database can be run as follows:

mongodump --db mymongodatabase

To dump a specific collection, simply specify the collection:

mongodump --collection collection --db mymongodatabase

You can also write the dump to a specific directory on the server with the --out option:

mongodump --db mymongodatabase --out /var/mypath/mongodumps
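
If you generate these commands from code, it can help to build the argument list in one place. The sketch below is a hypothetical helper, not part of any MongoDB tool; it relies only on the --db, --collection, and --out options of mongodump, where --out names the output directory.

```javascript
// Hypothetical helper: build the argument list for a mongodump call.
// --db, --collection and --out are mongodump options; the function itself
// and its option names (db, collection, out) are illustrative.
function buildDumpArgs({ db, collection, out }) {
  if (!db) throw new Error("a database name is required");
  const args = ["--db", db];
  if (collection) args.push("--collection", collection);
  if (out) args.push("--out", out);
  return args;
}

// Print the command line for each of the three cases covered above.
const examples = [
  { db: "mymongodatabase" },
  { db: "mymongodatabase", collection: "collection" },
  { db: "mymongodatabase", out: "/var/mypath/mongodumps" },
];
for (const options of examples) {
  console.log(["mongodump", ...buildDumpArgs(options)].join(" "));
}
```

Keeping the arguments as an array, rather than one long string, also makes it easier to pass them to a process launcher without going through a shell.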

We wanted to create a utility that would allow our client to back up their WordPress-integrated MongoDB database using an admin utility. To do this, we built a PHP class that allows us to enter the database name, the dump location, and go. This tool also ZIPs the binary files for us, allowing us to conserve disk space and contain each dump in one file. You can access this class here: https://github.com/KerryRitter/MongoDumper

A basic usage of this class is done as follows:

$dumper = new MongoDumper("/var/mypath/mongodumps");
$dumper->run("mydb", true); // 'true' shows debug info

This will dump the local ‘mydb’ database to /var/mypath/mongodumps/mydb_[timestamp].zip and display the shell output and various debug info. To remove the debug info, remove the “true” parameter in the “run” function call.

To restore, unzip the BSON and JSON files and upload them to a folder on the server with the name of the database you wish to restore to. For example, if you wish to restore “mydatabase”, you will need to put the files in a path such as /var/mypath/mongodumps/mydatabase. To restore the database, use the following command in your command prompt:

mongorestore /var/mypath/mongodumps/mydatabase

There are a number of options available for restoration that you can find in the documentation: http://docs.mongodb.org/manual/tutorial/backup-with-mongodump/

Please note that the published MongoDumper class is the simple core; there are security concerns with using shell_exec that need to be addressed before you use it on a publicly accessible site. If there is an error given by the dump or the dump did not work, make sure to check the MongoDumper.php file and the backup folder’s permissions.
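
To give a feel for the kind of hardening meant here, the following Node.js sketch shows one common approach: validate the database name against an allow-list pattern and launch the tool without a shell. It is an analogue written for illustration, not the MongoDumper code. The isSafeDatabaseName and dumpDatabase names and the exact pattern are assumptions; execFile is Node's standard child_process function.

```javascript
const { execFile } = require("child_process");

// Allow only simple names so user input cannot smuggle in extra options,
// paths, or shell syntax. The exact pattern is an assumption; adapt it to
// your own naming rules. A leading hyphen is rejected so the name cannot
// be read as a command-line option.
function isSafeDatabaseName(name) {
  return typeof name === "string" && /^[A-Za-z0-9_][A-Za-z0-9_-]{0,62}$/.test(name);
}

// Hypothetical wrapper: runs mongodump without a shell, passing each
// argument separately instead of concatenating a command string.
function dumpDatabase(name, outDir, callback) {
  if (!isSafeDatabaseName(name)) {
    callback(new Error(`unsafe database name: ${name}`));
    return;
  }
  execFile("mongodump", ["--db", name, "--out", outDir], callback);
}

console.log(isSafeDatabaseName("mydb"));
console.log(isSafeDatabaseName("mydb; rm -rf /"));
```

The same two ideas, validating input and avoiding a shell, carry over to PHP, although the functions involved are different.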