An Overview of Multi-Document ACID Transactions in MongoDB and How to Use Them


Database systems have a mandate to guarantee data consistency and integrity especially when critical data is involved. These aspects are enforced through ACID transactions in MongoDB. An ACID transaction should meet some defined rules for data validity before making any updates to the database otherwise it should be aborted and no changes shall be made to the database. All database transactions are considered as a single logical operation and during the execution time the database is put in an inconsistent state until the changes have been committed. Operations that successfully change the state of the database are termed as write transactions whereas those that do not update the database but only retrieve data are referred to as read-only transactions. ACID is an acronym for Atomicity, Consistency, Isolation, and Durability. 

A database is a shared resource that can be accessed by different users at different or at the same time. For this reason, concurrent transactions may happen and if not well managed, they may result in system crashes, hardware failure, deadlock, slow database performance or repetition in the execution of the same transaction.

What Are ACID Rules?

All database systems must meet the ACID properties in order to guarantee data integrity.

Atomicity

A transaction is considered as a single unit of operation which can either succeed completely or fail completely. A transaction cannot be executed partially. If any condition consulting a transaction fails, the entire transaction will fail completely and the database will remain unchanged. For example, if you want to transfer funds from account X to Y, here there are two transactions, the first one is to remove funds from X and the second one is to record the funds in Y. If the first transaction fails, the whole transaction will be aborted

Consistency

When an operation is issued, before execution, the database is in a consistent state and it should remain so after every transaction. Even if there is an update, the transaction should always bring the database to a valid state, maintaining the database invariants. For instance, you cannot delete a primary key which has been referenced as a foreign key in another collection.  All data must meet the defined constraints to prevent data corruption from an illegal transaction.

Isolation

Multiple transactions running concurrently are executed without affecting each other and their result should be the same if they were to be executed sequentially. When two or more transactions modify the same documents in MongoDB, there may be a conflict. The database will detect a conflict immediately before it is committed. The first operation to acquire a lock on the document will continue whereas the other will fail and a conflict error message will be presented.

Durability

This dictates that, once the transaction has been committed, the changes should be upheld at all times even at an event of a system failure for example due to power outages or internet disconnection. 

MongoDB ACID Transactions

MongoDB is a document based  NoSQL database with a flexible schema. Transactions are not operations that should be executed for every write operation  since they incur a greater performance cost over a single document writes. With a document based structure and denormalized data model, there will be a minimized need for transactions. Since MongoDB allows document embedding, you don’t necessarily need to use a transaction to meet a write operation.

MongoDB version 4.0 provides multi-document transaction support for replica set deployments only and probably the version 4.2 will extend support for sharded deployments (per their release notes). 

Example of a transaction:

Ensure you have a replica set in place first. Assuming you have a database called app and a collection users in the Mongo Shell run the following commands:

$mongos and you should see something like username:PRIMARY>

$use app $db.users.insert([{_id:1, name: ‘Brian’}, {_id:2, name: ‘Sheila’}, {_id:3, name: ‘James’}])

We need to start a session for our transaction:

$db.getMongo().startSession() and you should see something like session { "id" : UUID("dcfa8de5-627d-3b1c-a890-63c9a355520c") }

Using this session we can add more users using a transaction with the following commands 

$session.startTransaction() session.getDatabase(‘app’).users.insert({_id:4, name: ‘Hitler’})

You will be presented with WriteResult({“nInsterted”: 2})

The transaction has not yet been committed and the normal $db.users.find({}) will give us the previously saved users only. But if we run the 

$session.getDatabase(“app”).users.find()

the last added record will be available in the returned results. To commit this transaction, we run the command below

$session.commitTransaction()

The transaction modification is stored in memory that is why even after failure, the data will be available on recovery.

Multi-Document ACID Transactions in MongoDB

These are multi-statement operations that need to be executed sequentially without affecting each other. For the sample above we can create two transactions, one to add a user and another to update a user with a field of age. I.e.

$session.startTransaction() db.users.insert({_id:6, name “Ibrahim”}) db.users.updateOne({_id:3 , {$set:{age:50}}}) session.commit_transaction()

Transactions can be applied to operations against multiple documents contained in one or many collection/database. Any changes due to document transaction do not impact performance for workloads not related or do not require them. Until the transaction is committed, uncommitted writes are neither replicated to the secondary nodes nor are they readable outside the transactions.

Best Practices for MongoDB Transactions

The multi-document transactions are only supported in the WiredTiger storage engine. As mentioned before, very few applications would require transactions and if so, we should try to make them short. Otherwise, for a single ACID transaction, if you try performing an excessive number of operations, it can result in high pressure on the WiredTiger cache. The cache is always dictated to maintain state for all subsequent writes since the oldest snapshot was created. This means new writes will accumulate in the cache throughout the duration of the transaction and will be flushed only after transactions currently running on old snapshots are committed or aborted. For the best database performance on the transaction, developers should consider:

  1. Always modify a small number of documents in a transaction. Otherwise, you will need to break the transaction into different parts and process the documents in different batches. At most, process 1000 documents at a time.
  2. Temporary exceptions such as awaiting to elect primary and transient network hiccups may result in abortion of the transaction. Developers should establish a logic to retry the transaction if the defined errors are presented.
  3. Configure optimal duration for the execution of the transaction from the default 60 seconds provided by MongoDB. Besides, employ indexing so that it can allow fast data access within the transaction.  You also have the flexibility to fine-tune the transaction in addressing timeouts by breaking it into batches that allow its execution within the time limits.
  4. Decompose your transaction into a small set of operation so that it fits the 16MB size constraints. Otherwise, if the operation together with oplog description exceed this limit, the transaction will be aborted.
  5. All data relating to an entity should be stored in a single, rich document structure. This is to reduce the number of documents that are to be cached when different fields are going to be changed.

Limitations of Transactions

  1. You cannot create or drop a collection inside a transaction.
  2. Transactions cannot make writes to a capped collection
  3. Transactions take plenty of time to execute and somehow they can slow the performance of the database.
  4. Transaction size is limited to 16MB requiring one to split any that tends to exceed this size into smaller transactions.
  5. Subjecting a large number of documents to a transaction may exert excessive pressure on the WiredTiger engine and since it relies on the snapshot capability, there will be a retention of large unflushed operations in memory. This renders some performance cost on the database.

Conclusion

MongoDB version 4.0 introduced the multi-document transaction support for replica sets as a feature of improving data integrity and consistency. However, there are very few applications that would require transactions when using MongoDB. There are limitations against this feature that make it considerably little bit immature as far as the transactions concept is concerned. For instance, transactions for a sharded cluster are not supported and they cannot be larger than 16MB size limit. Data modeling provides a better structure for reducing transactions in your database. Unless you are dealing with special cases, it will be a better practice to avoid transactions in MongoDB.