What is MongoDB
Relational databases were developed during the '70s, when the needs of applications were different from those of today's environments. The scale of today's internet is huge, and, as Amazon learned, performance is an essential feature of a website.
A new database had to be engineered for today's needs, and this is how MongoDB was born.
MongoDB is a NoSQL database with modern, essential features built in, such as:
- Safety and Availability using replica sets
- Performance and Scalability with automatic sharding
- Flexible schema for modern agile applications
Main Features
Replication - MongoDB can automatically synchronize data across multiple servers. Replication is also available in SQL databases, but MongoDB also provides automatic failover and automatic node recovery, so no additional tools are needed to achieve this. More info at https://docs.mongodb.org/manual/replication/. In production, a replica set of at least 3 servers is recommended, as shown in the diagram below:
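To illustrate the idea behind synchronization and node recovery, here is a simplified, in-memory sketch: the primary records every write in an operation log (oplog), and each secondary replays the entries it has not applied yet. The `OplogEntry` and `Member` classes and the `write`/`sync` functions are hypothetical illustrations, not MongoDB's API; the real oplog is the capped collection `local.oplog.rs`, and elections and failover are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class OplogEntry:
    ts: int                       # increasing position in the log
    op: str                       # "i" = insert/replace, "d" = delete (simplified)
    key: str
    value: Optional[Any] = None


@dataclass
class Member:
    name: str
    data: dict = field(default_factory=dict)
    last_applied: int = 0         # ts of the last oplog entry applied locally


def apply(member: Member, entry: OplogEntry) -> None:
    """Apply one oplog entry to a member's local copy of the data."""
    if entry.op == "i":
        member.data[entry.key] = entry.value
    elif entry.op == "d":
        member.data.pop(entry.key, None)
    member.last_applied = entry.ts


def write(primary: Member, oplog: List[OplogEntry], op: str, key: str, value=None) -> None:
    """All writes go to the primary, which also appends them to the oplog."""
    entry = OplogEntry(ts=len(oplog) + 1, op=op, key=key, value=value)
    apply(primary, entry)
    oplog.append(entry)


def sync(secondary: Member, oplog: List[OplogEntry]) -> int:
    """Replay every entry newer than the secondary's last_applied.

    A member that was offline catches up in exactly the same way, which is
    the idea behind automatic node recovery. Returns the number applied.
    """
    pending = [e for e in oplog if e.ts > secondary.last_applied]
    for entry in pending:
        apply(secondary, entry)
    return len(pending)


primary, secondary = Member("node-a"), Member("node-b")
oplog: List[OplogEntry] = []
write(primary, oplog, "i", "user:1", {"name": "Ana"})
write(primary, oplog, "i", "user:2", {"name": "Dan"})
sync(secondary, oplog)              # secondary catches up with both writes
write(primary, oplog, "d", "user:1")
sync(secondary, oplog)              # only the new delete is replayed
```

Because every secondary only needs its own `last_applied` position, a node that rejoins after downtime uses the same code path as one that was never offline.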
Load balancing - Sharding in RDBMS systems requires all the sharding logic to be implemented in the application code. MongoDB requires no application changes, because sharding is done automatically; you just need to choose a shard key. More info at https://docs.mongodb.org/manual/sharding/.
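As a rough sketch of what the query router (`mongos`) does for you, assuming range-based sharding on a string shard key: the collection is divided into chunks of key ranges, each owned by one shard, and a query that includes the shard key is sent only to the owning shard, while a query without it has to go to every shard. The chunk boundaries, shard names and functions below are hypothetical; in a real cluster you only declare the key (for example `sh.shardCollection("mydb.users", { name: 1 })` in the mongo shell, with a hypothetical `mydb.users` namespace) and MongoDB maintains the chunk table.

```python
import bisect
from typing import List

# Hypothetical chunk table for a collection sharded on "name".
# Chunk i covers [LOWER_BOUNDS[i], LOWER_BOUNDS[i + 1]); "" stands in for MinKey.
LOWER_BOUNDS = ["", "g", "n", "t"]
CHUNK_OWNERS = ["shard0", "shard1", "shard2", "shard0"]


def route(key_value: str) -> str:
    """Return the shard owning the chunk whose range contains key_value."""
    idx = bisect.bisect_right(LOWER_BOUNDS, key_value) - 1
    return CHUNK_OWNERS[idx]


def target_shards(query: dict, shard_key: str = "name") -> List[str]:
    """Targeted when the query includes the shard key, scatter-gather otherwise."""
    if shard_key in query:
        return [route(query[shard_key])]
    return sorted(set(CHUNK_OWNERS))


print(target_shards({"name": "maria"}))   # ['shard1']
print(target_shards({"age": 30}))         # ['shard0', 'shard1', 'shard2']
```

This is also why the choice of shard key matters: queries that cannot include it always fan out to all shards.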
The recommended architecture for production is shown below. To keep the system highly available, each shard should be a replica set.
File storage - MongoDB documents are limited to 16MB each. For files larger than 16MB, MongoDB developed GridFS, a very convenient way to store and retrieve files. More info at https://docs.mongodb.org/manual/core/gridfs/.
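The GridFS idea can be sketched in a few lines, assuming the default chunk size of 255 KiB: a file is split into numbered chunk documents (`files_id`, `n`, `data`) plus one metadata document (`_id`, `length`, `chunkSize`), mirroring the `fs.chunks` and `fs.files` collections. The two functions are hypothetical helpers that only build plain dictionaries; with the Python driver, `gridfs.GridFSBucket` does this work against a real server.

```python
from typing import Any, Dict, List, Tuple

DEFAULT_CHUNK_SIZE = 255 * 1024  # GridFS default: 255 KiB = 261,120 bytes


def to_gridfs_docs(file_id: Any, data: bytes,
                   chunk_size: int = DEFAULT_CHUNK_SIZE) -> Tuple[Dict, List[Dict]]:
    """Split a file into one fs.files-style document and fs.chunks-style documents."""
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    return files_doc, chunks


def from_gridfs_docs(files_doc: Dict, chunks: List[Dict]) -> bytes:
    """Reassemble the file by concatenating chunks in order of their index n."""
    data = b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))
    if len(data) != files_doc["length"]:
        raise ValueError("missing or corrupt chunks")
    return data


video = bytes(600_000)                      # a 600 KB dummy file
files_doc, chunks = to_gridfs_docs("video-1", video)
print(len(chunks))                          # 3: two full chunks plus a partial one
assert from_gridfs_docs(files_doc, chunks) == video
```

Because each chunk is a separate small document, no single document comes close to the 16MB limit, and a reader can fetch only the chunks it needs.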
Aggregation - MongoDB supports Map-Reduce, but for most queries the Aggregation framework is much faster. Aggregations are used for complex queries that process the data set. They are similar to SQL's GROUP BY, but aggregation can also do more complex processing, such as filtering and document transformations. More info at https://docs.mongodb.org/manual/aggregation/.
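The GROUP BY comparison can be made concrete. The pipeline below is the aggregation equivalent of `SELECT cust_id, SUM(amount) AS total FROM orders WHERE status = 'A' GROUP BY cust_id` (with a hypothetical `orders` collection); with the Python driver it would be passed to `collection.aggregate(pipeline)`. So that the example runs without a server, `run_pipeline` is a hypothetical, deliberately tiny interpreter that supports only equality `$match` and `$sum` inside `$group`; it is not how MongoDB executes pipelines.

```python
from typing import Any, Dict, List

# Equivalent to: SELECT cust_id, SUM(amount) AS total
#                FROM orders WHERE status = 'A' GROUP BY cust_id
pipeline = [
    {"$match": {"status": "A"}},
    {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}},
]


def _resolve(doc: Dict, expr: Any) -> Any:
    """Resolve a "$field" path against doc; any other expression is a literal."""
    if isinstance(expr, str) and expr.startswith("$"):
        return doc.get(expr[1:])
    return expr


def run_pipeline(docs: List[Dict], stages: List[Dict]) -> List[Dict]:
    """Apply each stage in order, feeding its output to the next stage."""
    for stage in stages:
        (op, spec), = stage.items()
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            groups: Dict[Any, Dict] = {}
            for d in docs:
                key = _resolve(d, spec["_id"])
                out = groups.setdefault(key, {"_id": key})
                for name, acc in spec.items():
                    if name == "_id":
                        continue
                    (acc_op, expr), = acc.items()
                    if acc_op != "$sum":
                        raise NotImplementedError(acc_op)
                    out[name] = out.get(name, 0) + (_resolve(d, expr) or 0)
            docs = list(groups.values())
        else:
            raise NotImplementedError(op)
    return docs


orders = [
    {"cust_id": "A123", "amount": 500, "status": "A"},
    {"cust_id": "A123", "amount": 250, "status": "A"},
    {"cust_id": "B212", "amount": 200, "status": "A"},
    {"cust_id": "A123", "amount": 300, "status": "D"},
]
for row in run_pipeline(orders, pipeline):
    print(row)
```

Each stage consumes the previous stage's output, which is what lets a real pipeline chain filters, transformations and groupings that a single GROUP BY cannot express.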
Flexible schema - When developing agile applications, the database schema changes very often, so you need schema migrations, which can be quite cumbersome for a complex schema. MongoDB doesn't have this restriction: documents in the same collection can have different fields.
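Without a fixed schema, old and new document shapes can coexist in the same collection. One common way to handle this (sometimes called lazy migration) is to upgrade documents when they are read instead of migrating everything at once. The sketch below assumes a hypothetical `schema_version` field and a `name` field that was later split into `first_name` and `last_name`; the upgraded document could then be written back with a normal update.

```python
from typing import Dict

# Two document shapes coexisting in one collection (hypothetical example):
# older documents store "name", newer ones store first/last name separately.
users = [
    {"_id": 1, "name": "Ana Pop"},
    {"_id": 2, "first_name": "Dan", "last_name": "Ionescu", "schema_version": 2},
]

CURRENT_VERSION = 2


def upgrade_on_read(doc: Dict) -> Dict:
    """Return the document in the current shape, upgrading old shapes lazily."""
    if doc.get("schema_version", 1) >= CURRENT_VERSION:
        return doc
    first, _, last = doc.get("name", "").partition(" ")
    upgraded = {k: v for k, v in doc.items() if k != "name"}
    upgraded.update(first_name=first, last_name=last,
                    schema_version=CURRENT_VERSION)
    return upgraded


for user in users:
    print(upgrade_on_read(user))
```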
Online services from MongoDB, Inc.:
MMS (MongoDB Management Service) - free monitoring. You can see below what kind of information is collected. More info at mms.mongodb.org.
Backup - external automatic backup, with a free pricing tier for small data sets. Technically, the service sets up a replica set member, so the database cache is not affected.
Education - free online courses for DBAs and programmers. Starting in December, MongoDB will launch its first certification exams.
MongoDB versus SQL
Advantages
1. Scalability is built-in without additional tools to be installed
2. Dynamic schema, which is very welcome in today's agile applications
Disadvantages
1. Consistency is not guaranteed, although the latest MongoDB drivers have safer options enabled by default.
2. It is easier to find programmers and DBAs with SQL experience
MongoDB versus CouchDB
MongoDB Advantages
- MongoDB is a better general-purpose database, suitable as a replacement for MySQL
- Cluster deployments out of the box with automatic failover
- Automatic sharding built-in
CouchDB Advantages
- Better performance on writes
Performance recommendations
1. Schema: use short field names, since every document stores its own field names
2. Use indexes and perform profiling to check for slow queries
3. Allocate lots of RAM: MongoDB is very aggressive and tries to keep the entire data set in memory. If you have a large data set, don't share the MongoDB machine with other services. The mongod process will claim most of the RAM, other services will be pushed out, and the machine will start swapping.
4. Replica sets should have an odd number of members. MongoDB can use machines purely as arbiters, which don't store any data and are used only for elections. For production, the minimum recommended configuration is a replica set of 3 machines: 2 data-bearing members and 1 arbiter.
5. Don't use mongodump for backups, as it loads otherwise unused data into RAM. Instead, use a low-priority replica set member.
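The reason behind the odd-number rule is election arithmetic: a primary can only be elected by a strict majority of the voting members, and an arbiter counts as a voter. The two helper functions below are a sketch of that arithmetic only; they ignore priorities, member states and other election details.

```python
def majority(voting_members: int) -> int:
    """Votes needed to elect a primary: strictly more than half of the voters."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """Voting members that can be lost while a primary can still be elected."""
    return voting_members - majority(voting_members)


for n in range(1, 8):
    print(f"{n} voting members: majority {majority(n)}, "
          f"tolerates {fault_tolerance(n)} failure(s)")
```

Running it shows that 3 voters (for example 2 data-bearing members plus an arbiter) tolerate one failure, while a 4th voter raises the majority to 3 without tolerating any additional failure; only at 5 voters does the tolerance rise to 2.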
Clients
US government https://www.mongodb.com/industries/government.
Analytics companies
Craigslist, eBay, Foursquare, Cisco, OpenStack, OpenShift
More info at https://www.mongodb.com/customers.
How we use it
English Attack! - https://assist-software.net/project/english-attack
New website features, including a coaching mechanism, required storing and analyzing lots of key metrics in real time. To accomplish this, we used Node.js and MongoDB, with Redis for caching the heavy queries.
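The caching layer follows the common cache-aside pattern: look up the result of a heavy query in the cache, and only on a miss run the query against MongoDB and store the result with an expiry time. The sketch below is written in Python for consistency with the other examples, although the project itself used Node.js; an in-process dictionary with a TTL stands in for Redis (with Redis, the store step would typically be a `SET` with the `EX` option), and `CacheAside` and `heavy_metrics_query` are hypothetical illustrations, not the project's code.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """Hypothetical in-process stand-in for a Redis cache in front of heavy queries."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str, load: Callable[[], Any]) -> Any:
        """Return a fresh cached value, or run load() and cache its result."""
        now = self.clock()
        cached = self._store.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]                 # cache hit: MongoDB is not queried
        value = load()                       # cache miss: run the heavy query
        self._store[key] = (now + self.ttl, value)
        return value


def heavy_metrics_query() -> dict:
    """Placeholder for an expensive MongoDB aggregation (hypothetical)."""
    return {"metric": "placeholder"}


cache = CacheAside(ttl_seconds=60)
metrics = cache.get("metrics:daily", heavy_metrics_query)
```

The TTL bounds how stale a cached metric can get, which is the usual trade-off when caching analytics queries that would be too slow to run on every request.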
We currently have a 30GB data set, running on a cluster of 3 machines that form a replica set.
The following diagram shows the architecture:
Zelgor Analytics - https://assist-software.net/project/zelgor-iphone-game
The analytics database for the Zelgor mobile application runs on MongoDB, which helped us perform very fast analysis of large volumes of data.
Conclusions
In the new era of high-performance web applications, using a database like MongoDB is not optional. One way or another, every application will have to use one.
So if you need help developing MongoDB applications, please contact us.
As a MongoDB partner company we will be more than happy to help you!