The Acumen software design blog provides you with helpful articles about SQL Servers.

Several web sites provide a good overview.

Microsoft Official SQL Servers

Microsoft SQL Servers Wikipedia

NoSQL’s popularity has been on the rise over the last five years, with favorites like MongoDB, CouchDB, Cassandra, and Redis on the forefront. With NoSQL’s strengths in its fantastic performance and the ability to store and query denormalized data sets, there are a number of reasons to use NoSQL over SQL technologies like MySQL or Microsoft SQL Server.

Denormalized Data Structures

NoSQL allows developers to store denormalized data structures in document form. While many find this more enjoyable and easier to work with, there is a bit of an adjustment for someone come from a structured SQL background. For example, take a look at the following documents inside of the same MongoDB collection:

{
	firstName: "Kerry",
	lastName: "Ritter",
	city: "St. Louis",
	state: "Missouri",
	serverExperience: [
		"Apache",
		"IIS"
	],
	programmingLanguages: [
		"PHP",
		"C#",
		"JavaScript"
	]
}
{
	firstName: "Rob",
	lastName: "Wagnon",
	city: "St. Louis",
	state: "Missouri",
	serverExperience: [
		"Apache",
		"Nginx",
		"IIS"
	],
	databaseExperience: [
		"Microsoft SQL Server",
		"MySQL"
	]
}
{
	firstName: "Dave",
	lastName: "Mueller",
	city: "St. Louis",
	state: "Missouri",
	serverExperience: [
		"Apache",
		"Nginx",
		"IIS"
	],
	programmingLanguages: [
		"PHP",
		"C#"
	]
}

A similar normalized database structure similar to the following 7 tables:

Person (ID, FirstName, LastName, State)
ProgrammingLanguages (ID, Title)
ServerTypes (ID, Title)
DatabaseTypes (ID, Title)
Person_ProgrammingLanguages (PersonID, ProgrammingLanguageID)
Person_ServerExperience (PersonID, ServerTypeID)
Person_DatabaseExperience (PersonID, DatabaseTypeID)

So now, for just a basic structure, we have to put some effort in to create these tables and their relationships, while using MongoDB prevents this by allowing a denormalized document storage method.

The amount of tables grows considerably the more variable your data becomes; if we add a collegeDegreesObtained array to list the degrees held by each person, we would have to create two more tables: to manage the types of degrees and the connecting table between the degree and person. In NoSQL, we simply add an array to the document and we are done. We do not risk breaking any queries, having errors from NULL data, or having to manage default column values.

Key points:

  • Less work overhead for one-to-many relationships
  • Less work overhead for adding new data to an entity

Querying

Querying in MongoDB also tends to be a little simpler when doing the more basic lookups. For example, to find someone with the first name Kerry, we do the following:

db.people.find({ firstName: "Kerry" })

This translates to the following in SQL:

SELECT * FROM Person WHERE FirstName = "Kerry"

Not much difference in the amount of work. However, if we want to find someone with knowledge of the PHP scripting language and the Nginx server, we would do the following in NoSQL:

db.people.find({ programmingLanguages: "PHP", serverExperience: "Nginx" })

This would translate to the following query in SQL:

SELECT * FROM Person p
INNER JOIN Person_ProgrammingLanguages ppl ON p.ID = ppl.PersonID
INNER JOIN ProgrammingLanguages pl on ppl.ProgrammingLanguageID = pl.ID
INNER JOIN Person_ServerExperience pse ON p.ID = pse.PersonID
INNER JOIN ServerTypes se on pse.ServerTypeID = se.ID
WHERE pl.Title = "PHP" AND se.Title = "Nginx"

This example demonstrates the simplicity of querying MongoDB documents in comparison to querying complex relational data in MySQL. As stated before, SQL query size and overhead will grow much faster with new data, while MongoDB queries and documents will still relatively small and manageable.

Key points:

  • Simpler querying mechanism when searching for relational data (in SQL) values

Performance

There are a number of benchmark studies out there demonstrating the speeds of various NoSQL implementations vs SQL implementations. While we do not have any formalized studies, I did some testing personally and found that when inserting millions of documents into MongoDB and inserting a corresponding row into Microsoft SQL Server table, MongoDB took under half of the time. Querying was a similar story; MongoDB cut about half of the time to find a piece of data in a very large dataset. However, MongoDB eats up a lot of memory, so make sure to be cautious of that.

If you are not sure whether NoSQL will offer you performance advantages, simply search the web for comparisons and you will find a number of them.

Key points:

  • NoSQL typically offers better performance and speed
  • NoSQL tends to use and require a lot of RAM, so be cautious

Drawbacks

There are some drawbacks to using NoSQL; it is not a perfect solution to everything. Each implementation has its own issues. In our development of a genealogical search, we used MongoDB and found these issues.

Disk space consumption: MongoDB tends to take up a lot of space in comparison to the amount of data. There are some solutions to this problem (TokuMX, which will be discussed in a later blog), but MongoDB itself isn’t the most space-efficient solutions.

Document size restriction: The max size of a MongoDB document is 16MB. This was a particularly complex issue to work around if your document is a company with an array of subdocuments containing information on employees. If that list gets too large, you will have to split the company document and then re-connect them with an aggregation pipeline.

Pagination using skips and limits have very bad performance: Say your company has 10,000 people subdocuments. To skip the first 9000 is a serious performance hit as it seems to run your query parameters against each row. This is a completely ineffective method for paginating large datasets, but there are some workarounds, such as using a $gte parameter on the last page’s item ID.

When to use NoSQL

While this choice is always up to the project’s needs and the developers, NoSQL should very much be considered if the data is going to be very relational. Also, NoSQL can have some great performance benefits when used correctly. If you’ve never worked with NoSQL before, its barrier to entry is relatively low and implementations like MongoDB have great communities who can help you when you’re stuck.

SQL

In his excellent book, Star Schema, Christopher Adamson begins Chapter 7, “Hierarchies and Snowflakes,” with the following language:

It is possible to describe a dimension table as a series of parent-child relationships among groups of attributes. Days make up months, months fall into quarters, and quarters fall into years, for example. This chapter explores these attribute hierarchies and their implications for the design and use of dimensional databases.

Page 147. This understanding appears fully consistent with the definition of found in MSDN:

Attribute hierarchy

An attribute hierarchy is a hierarchy of attribute members that contains the following levels:

  • A leaf level that contains each distinct attribute member, with each member of the leaf level also known as a leaf member.
  • Intermediate levels if the attribute hierarchy is a parent-child hierarchy.
  • An optional (All) level (IsAggregatable=True) containing the aggregated value of the attribute hierarchy’s leaf members, with the member of the (All) level also known as the (All) member.

http://msdn.microsoft.com/en-us/library/ms144884.aspx
It may seem intuitively obvious what the phrase “Attribute Hierarchy” means, and the descriptions given by Mr. Adamson and the definition given in MSDN probably fit neatly with your intuition. If you follow your intuition, however, other documentation for Microsoft SQL Server Analysis Services (SSAS) becomes confusing, and you are unlikely to leverage features of the product that will make it more useful and responsive.

Microsoft Documentation

To cut to the chase, what you need to know is this: In most of Microsoft’s documentation (other than the definition shows above) and within the user interface of Business Intelligence Development Studio (“BIDS”), “attribute hierarchy” means a hierarchy with exactly one attribute. Intuitively, a hierarchy with only one attribute may not seem like much of a hierarchy at all to most people. By default, however, it does contain two levels: The All level and the level under the All level with (by default) the same name as the attribute.

MSDN defines an additional type of hierarchy, the “user-defined hierarchy.” It is defined in the following way (Attribute Hierarchies, from MSDN attributes and attribute hierarchies) :

User-defined hierarchy

A user-defined hierarchy is a balanced hierarchy of attribute hierarchies that is used to facilitate browsing of cube data by users. User-defined hierarchies do not add to cube space. Levels in a user-defined hierarchy can be hidden under certain circumstances and appear unbalanced.

Clearly, this definition contemplates the possibility of more than two levels created from more than one attribute, and this definition of “user-defined hierarchy” seems to be the same as our intuitive understanding of “attribute hierarchy.” But they are in fact different within the Microsoft world.

So… Why should we care?

Why do we care? Because in SSAS, individual attributes contained in dimensions have properties that only make sense if a stricter definition of attribute hierarchy is used–a definition that refers to hierarchy with exactly one attribute level and (usually) an All level. Look at the properties window for a single attribute contained in a dimension:

SSAS Screen Shot

Notice that there are several properties of the attribute that have “Attribute Hierarchy” contained within them and that these properties are not properties of a user-defined hierarchy. The description contained in gray-highlighted box at the bottom of the properties window only makes sense if “attribute hierarchy” and “user-defined hierarchy” mean different things.

Developers define user-defined hierarchies in the “Hierarchies” panel of the Dimension Structure tab.

SSAS Screenshot SQL

In conclusion, be aware that when you are reading Microsoft documentation and using the SSAS interface, “attribute hierarchy” and “user-defined hierarchy” are two very distinct things, and failing to understand this will result in confusing properties that only apply to attribute hierarchies to user-defined hierarchies.

When we use the word “schema”, we are usually referring to the overall design definitions of tables, views, stored procedures and other objects in a database. Within Microsoft SQL Server, however, there is an object called “schema” which permits other objects (such as tables, views and stored procedures) to be logically grouped together. Every table must belong to a schema, and by default, the default schema name for objects in SQL Server is “dbo.” 

When one is confronted with a list of tables in a database and all of the tables are in a single schema, it is easy to get frustrated when viewing a list of tables in the database. The list can contain several hundred table names, and searching to finding what one wants can be awkward.

In the past, designers of databases in Microsoft SQL Server almost always let all objects remain in the “dbo” schema, and according to the rules of T-SQL (Microsoft’s flavor of SQL), if, say, a table were referenced by its table name alone without including the schema name, it was assumed, even by the SQL Server query engine, to be in the dbo schema. Referring to the table Clients was the same as referring to the table dbo.Clients. There was no other schema than the dbo schema, and there was no other table named Clients.  This works well for databases with few objects, such as tables, in them.

In early versions of Microsoft SQL Server, the schema name could only be the name of a database user. Users could not be removed if there were objects in the schema having that user’s name, and this did not work out well in practice. If database designers used schemas other than dbo (not very common), imaginary users were created so that there was no undesirable linkage between users and schemas. So there were users named Mary and Joe, but also users name Purchasing. No one would ever log in as the Purchasing user, but the user/schema name was used to group objects together which primarily dealt with purchasing.

By creating schemas with meaningful names, it is possible to group tables in clusters, often representing the same clusters of tables that appear in database diagrams that show tables as they relate to each other through referential integrity. 

It is possible to grant security rights by schema, and often this turns out to be a logical classification on which to grant or deny such rights. 

Tearing apart and reconstructing an existing database to group tables into schemas is not, in my opinion, something that should necessarily be done; however, these concepts should be born in mind for future database development work. 

SQL Server 

Description:

We have encountered the unfortunate situation where a company has a third-party application and the performance problems associated with this application cannot be solved despite many in-house efforts to do so and many attempts to get cooperation from the vendor in troubleshooting the root cause of the problem.

The vendor has often come back with demands for additional or more powerful hardware or greater network bandwidth where none of the suggestions fix the problem. The company using the software doesn’t know what to do next, especially when the software product is well-known in the industry or is from a major software vendor. Customers don’t know where to go next to seek solutions when the vendor will not cooperate.

When a problem is addressed without cooperation from the third-party vendor, the vendor will usually no longer support the application. This is not an acceptable solution even if solutions are found. The vendor must be persuaded to help in order to effect a real solution. The customer is held back by its own lack of confidence in dealing with the problem to demand a solution.

Many third-party vendors hide behind flimsy policies regarding customer information to refuse to share information they have gathered from other customers which would clearly show that the problem is with their software. They use excuses and tactics which can often be easily disproven without their cooperation. For example, third party vendors often assert that a customer is using their product in a way that is different than other customers; even if there is a problem, the vendor cannot fix the problem without adversely affecting other customers. I have often found that this is nonsense.

SQL Server Problem:

The problem starts when the vendor allocates resources for development of their product. The vendor has or acquires a great deal of knowledge of the business process necessary for the product and designs a great user interface to implement the required functionality.

Because there isn’t a lot to “see” from good data architecture work during the development phase, the vendor has never seriously considered the database as an important part of the application requiring expert resources for design or optimization.

This is true even with Microsoft, where the SharePoint product clearly has many continuing design and tuning issues. (Microsoft continually gets feedback from within the company on this product and in a mind-blowing use of compartmentalization and denial continues to refuse to address these issues.)

Some companies can be persuaded or embarrassed into taking these issues seriously when presented with undeniable evidence and customers who refuse to accept nonsense for excuses.

There are many things that can be done even after deployment. These remedial actions can be implemented as a result of experienced analysis and interpretation of performance monitor traces, results from dynamic management views and SQL Server Profiler traces. Acumen has the experience to do this, and to teach you to do this as well.

As a result of this an engagement where we analyze the use of SQL Server in a third party application, Acumen can provide the following, where appropriate.

Steps:

  1. A written report targeted to a particular application that proves that problems with the application are harmfully affecting performance, if such is the case.
  2. Knowledge transfer from Acumen to you so that you have confidence to pressure the vendor to provide a fix and evidence that the required fixes will not negatively affect performance.
  3. An Acumen consultant can participate in conference calls (or in-person visits) where Acumen becomes your advocate for receiving fixes to your legitimate issues.
  4. Occasionally, Acumen may have previously experiences with this software to defeat claims or implications from the vendor that the problem is yours alone.

SQL Server

SQL Server

Problem

When migrating a Visual Studio project with a local SQL Server application database to IIS, you receive the following error:

“Failed to generate a user instance of SQL Server due to failure in retrieving the user’s local application data path. Please make sure the user has a local user profile on the computer. The connection will be closed.”

Resolution

In IIS, Select the appropriate application pool.  Change advanced settings->process model->Load User Profile value to true.

 

For more information about how Acumen can help with your software needs, contact us today!