Note: This is a precursor post to my talk at Mongo Boston on September 20th. It’s gonna be at Microsoft’s NERD (which I hear is COMPLETELY AWESOME). If you haven’t signed up yet, stop being lame. It’s gonna be awesome. http://www.10gen.com/conferences/mongoboston2010
A lot of people come up to me and ask about MongoDB. Here’s a 101 for those of you still totally in the dark.
MongoDB is a database
It’s just like MySQL in the sense that you run a daemon, that daemon creates files on a filesystem, and you access it over a network via a client. A single mongod process runs on one machine, and can have many databases. A database can have many collections (“tables” in MySQL-speak). You can write to it & you can query based on attributes of records. Out of the box, it comes with support for replication and sharding. It has support for atomic operations. There are clients for it written in pretty much every popular language.
MongoDB is “schema-less”
In MySQL, you create a table w/ a pre-defined set of typed attributes (Create a ‘users’ table w/ name:string, email:string). When you write a new record to a table in MySQL, you specify attribute/value pairs: name = ryan. If you don’t specify a certain attribute (email, in this case) that’s usually ok. That record will just get a default value for that attribute. This default value could be null or an empty string or a predefined string, but it’s still there, and it has a value. Every record in the table will always have the same set of attributes. If you try to write a record to a table and include an attribute not in that table’s definition (wicked_attractive = true) you’ll get a nasty error.
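Here is a rough sketch of that behavior. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in for MySQL, since it runs without a server; the exact error text differs between the two, but the fixed-schema idea is the same.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

# Only 'name' is supplied. The row still has an 'email' attribute,
# and it gets the default value (NULL, which Python shows as None).
conn.execute("INSERT INTO users (name) VALUES (?)", ("ryan",))
row = conn.execute("SELECT name, email FROM users").fetchone()
print(row)

# Writing an attribute that is not in the table definition is rejected.
try:
    conn.execute(
        "INSERT INTO users (name, wicked_attractive) VALUES (?, ?)",
        ("ryan", True),
    )
    insert_error = None
except sqlite3.OperationalError as exc:
    insert_error = str(exc)
print(insert_error)
```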
In MongoDB, you create a “collection” with no pre-defined attributes (Create a ‘users’ collection). When you write a new document to a collection in MongoDB, you also specify attribute/value pairs: name = ryan. But in this case, there are no default values. There is no email attribute on that document. If you try to write a document to a collection with an attribute that no other document has (wicked_attractive = true), MongoDB will be ok with it. This is a key point: documents within the same collection can have different sets of attributes.
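To make the contrast concrete, here is a toy model in plain Python. A list of dicts plays the role of the ‘users’ collection; this is only an illustration of the idea, not a MongoDB driver call (with a real driver such as PyMongo you would hand the same dicts to something like `insert_one`).

```python
# Assumption: a plain list of dicts stands in for a 'users' collection.
users = []

# First document: only a name. There is no 'email' key at all,
# not even one holding a null value.
users.append({"name": "ryan"})

# Second document: an attribute no other document has. Nothing complains.
users.append({"name": "ryan", "wicked_attractive": True})

print("email" in users[0])

# Each document carries its own set of attributes.
attribute_sets = [sorted(doc) for doc in users]
print(attribute_sets)
```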
MongoDB is a “document-store”
This is closely related to being “schema-less.” In MySQL, you define a set of attributes for a table. Rows get inserted into tables, and the rows are 1 dimensional. What I mean by 1 dimensional is that all of the pieces of data in a row are first class citizens. The number of pieces of information in a row equals the number of attributes defined for that table.
MongoDB lets you store arbitrarily complex documents (think JSON). The following document can be stored in the users collection:
{
  …
  likes: ['mongodb', 'skiing', 'Red Sox', 'Boulder chicks'],
  dislikes: ['humidity', 'Sarah Palin', 'bigotry', 'The Yankees'],
  current_outfit: {
    …
    pants: 'blue shorts',
    undies: "wouldn't you like to know"
  }
}
In this case, there are 5 “top level” attributes, but 14 “pieces of data.”
Along with standard DB types (string, integer, float, datetime, boolean), MongoDB also has arrays and hashes as native types. In this ‘users’ document, you have an embedded ‘current_outfit’ document, but ‘current_outfit’ isn’t a collection. It’s just an embedded document inside of this particular user document. You also have lists of likes and dislikes. The elements in a list do not have to be the same type.
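If you want to check the “top level” versus “pieces of data” arithmetic yourself, a small recursive counter does it. This is a sketch in Python: the values for name, email, shirt, and shoes are my assumptions for illustration (any four outfit items would give the same count), and `count_pieces` is a hypothetical helper, not part of MongoDB.

```python
def count_pieces(value):
    """Count leaf values, descending into embedded documents and arrays."""
    if isinstance(value, dict):
        return sum(count_pieces(v) for v in value.values())
    if isinstance(value, list):
        return sum(count_pieces(v) for v in value)
    return 1


user = {
    "name": "ryan",                # assumed value
    "email": "email@example.com",  # assumed value
    "likes": ["mongodb", "skiing", "Red Sox", "Boulder chicks"],
    "dislikes": ["humidity", "Sarah Palin", "bigotry", "The Yankees"],
    "current_outfit": {            # embedded document, not a collection
        "shirt": None,             # assumed value
        "shoes": "flip flops",     # assumed, hypothetical fourth item
        "pants": "blue shorts",
        "undies": "wouldn't you like to know",
    },
}

top_level = len(user)        # number of top-level attributes
pieces = count_pieces(user)  # number of leaf values
print(top_level, pieces)

# Elements of a list do not have to share a type.
mixed = ["skiing", 42, 3.5, True, {"team": "Red Sox"}]
print(count_pieces(mixed))
```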
You can put indexes on “deep” attributes. In MySQL, you can put an index on `users`.`email` to speed up queries on that attribute. In MongoDB, you can put indexes on any attribute in the document. In our previous example, for… example…, you can put an index on users.current_outfit.shirt and quickly query to see who is topless. If you put an index on an array type (users.likes), you’d be able to quickly query for any user who ‘liked’ ‘twitter’, and quickly get a result.
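To get a feel for what an index on a “deep” attribute or an array buys you, here is a toy version in Python. It is a dictionary-based sketch of the idea, not how MongoDB implements indexes; `get_path`, `build_index`, and the second user (“sam”) are made up for the example.

```python
from collections import defaultdict


def get_path(doc, path):
    """Follow a dotted path like 'current_outfit.shirt'; None if a step is missing."""
    current = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def build_index(docs, path):
    """Map each value found at `path` to the positions of the documents holding it.

    Array values get one index entry per element. Values must be hashable,
    which is fine for this toy but rules out indexing whole sub-documents.
    """
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        value = get_path(doc, path)
        keys = value if isinstance(value, list) else [value]
        for key in keys:
            index[key].append(position)
    return index


users = [
    {"name": "ryan", "likes": ["mongodb", "skiing"],
     "current_outfit": {"shirt": None, "pants": "blue shorts"}},
    {"name": "sam", "likes": ["twitter", "skiing"],   # hypothetical user
     "current_outfit": {"shirt": "t-shirt", "pants": "jeans"}},
]

shirt_index = build_index(users, "current_outfit.shirt")
likes_index = build_index(users, "likes")

# Lookups are now dictionary hits instead of scans over every document.
topless = [users[i]["name"] for i in shirt_index[None]]
twitter_fans = [users[i]["name"] for i in likes_index["twitter"]]
print(topless, twitter_fans)
```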
MongoDB is “NoSQL”
To query MySQL, you use (surprise, surprise) SQL:
SELECT * FROM `users` WHERE `users`.`email` = 'email@example.com' LIMIT 1;
SQL is a very powerful language, where different types of joins give you the power to issue a single query that effectively spans multiple tables, and can return a result set with data from multiple tables.
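For the SQL side, a join looks something like this. Again I'm assuming SQLite (through Python's sqlite3 module) in place of MySQL so the example runs without a server, and the `likes` table and its columns are invented for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the 'likes' table is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE likes (user_id INTEGER, thing TEXT);
    INSERT INTO users (id, name, email) VALUES (1, 'ryan', 'email@example.com');
    INSERT INTO likes (user_id, thing) VALUES (1, 'mongodb');
    INSERT INTO likes (user_id, thing) VALUES (1, 'skiing');
""")

# One query, two tables: each result row combines data from both.
rows = conn.execute(
    """
    SELECT users.name, likes.thing
    FROM users
    JOIN likes ON likes.user_id = users.id
    WHERE users.email = ?
    ORDER BY likes.thing
    """,
    ("email@example.com",),
).fetchall()
print(rows)
```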
MongoDB queries are documents rather than SQL strings. There are $ operators for doing different types of inequalities, lat/long distance calculations, regex matches, etc.
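In MongoDB's query language those operators sit inside a query document, e.g. `{age: {$gt: 30}}`. The Python below is a toy matcher that mimics a handful of them (`$gt`, `$lt`, `$ne`, `$regex`) so you can see the shape of such queries. It is a simplified sketch, not the server's matching logic, and the `age` field and second user are made up.

```python
import re

# Toy versions of a few real MongoDB operators; semantics are simplified.
OPERATORS = {
    "$gt": lambda actual, arg: actual is not None and actual > arg,
    "$lt": lambda actual, arg: actual is not None and actual < arg,
    "$ne": lambda actual, arg: actual != arg,
    "$regex": lambda actual, arg: isinstance(actual, str)
    and re.search(arg, actual) is not None,
}


def matches(doc, query):
    """Return True if `doc` satisfies every field condition in `query`."""
    for field, condition in query.items():
        actual = doc.get(field)
        is_operator_doc = (
            isinstance(condition, dict)
            and condition
            and all(key.startswith("$") for key in condition)
        )
        if is_operator_doc:
            if not all(OPERATORS[op](actual, arg) for op, arg in condition.items()):
                return False
        elif actual != condition:  # plain value means equality
            return False
    return True


users = [
    {"name": "ryan", "email": "email@example.com", "age": 27},  # age is assumed
    {"name": "sam", "email": "sam@example.org", "age": 41},     # hypothetical user
]

older = [u["name"] for u in users if matches(u, {"age": {"$gt": 30}})]
dot_com = [u["name"] for u in users if matches(u, {"email": {"$regex": r"\.com$"}})]
in_range = [u["name"] for u in users if matches(u, {"age": {"$gt": 20, "$lt": 30}})]
print(older, dot_com, in_range)
```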
When you issue a query to a MongoDB database, you cannot ask for stuff from two collections at once. There are no joins. However, keeping with our last example, if you query for a user with the email ‘firstname.lastname@example.org’ you would get back the entire user document we stored — with likes, dislikes, and current_outfit included. This is what people mean when they say “not having joins is ok because you don’t need them.” You can embed arbitrarily complex data inside a document, and get it all at once.
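Sketched in Python, with a list of dicts playing the collection and a hypothetical `find_one` helper standing in for a driver call, the “get it all at once” point looks like this. The shirt value is my assumption.

```python
# Assumption: a list of dicts stands in for the 'users' collection.
users = [
    {
        "name": "ryan",
        "email": "firstname.lastname@example.org",
        "likes": ["mongodb", "skiing", "Red Sox", "Boulder chicks"],
        "dislikes": ["humidity", "Sarah Palin", "bigotry", "The Yankees"],
        "current_outfit": {
            "shirt": None,  # assumed value
            "pants": "blue shorts",
            "undies": "wouldn't you like to know",
        },
    },
]


def find_one(collection, **criteria):
    """Hypothetical helper: return the first document matching every criterion."""
    for doc in collection:
        if all(doc.get(key) == value for key, value in criteria.items()):
            return doc
    return None


# One lookup, no second query: the embedded data comes back with the user.
user = find_one(users, email="firstname.lastname@example.org")
print(user["likes"])
print(user["current_outfit"]["pants"])
```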
MongoDB is different (has downsides)
MongoDB is different. And anytime something is different, it has downsides from what you’re used to. Out of the box, MongoDB will acknowledge a write has completed before it’s on disk (although this is tunable on a write-by-write basis). MongoDB does not have transaction support (but after designing an app from the ground up with documents, you find you rarely need them). MongoDB will not make you more attractive to the opposite sex (although I hear they are working on it for 1.8).
I hope this has given you some insight into what MongoDB is. If you can make it to Mongo Boston, come say hi. I’ll be the wicked attractive topless guy. http://www.10gen.com/conferences/mongoboston2010