Thursday, October 20, 2005

The Opening of the Field: PostgreSQL's Multi-Version Concurrency Control LG #68:

A Deeper Look at MVCC

Database systems that use row-level locking do not retain old versions of the data, hence the need for locks to maintain data consistency. But a deeper look into how "no-locking" through MVCC works in PostgreSQL reveals how PostrgreSQL gets around this limitation. Each row in PostgreSQL has two transaction IDs. It has a creation transaction ID for the transaction that created the row, and an expiration transaction ID for the transaction that expired the row. When someone performs an UPDATE, PostgreSQL creates a new row and expires the old one. It's the same row, but in different versions. Unlike database systems that don't hold on to the old version, when PostgreSQL creates a new version of the row it also retains the old or expired version. (Note: Old versions are retained until a process called VACUUM is run on the database.)

That's how PostgreSQL creates versions of the data, but how does it know which version to display? It bases its display on several criteria. At the start of a query, PostgreSQL records two things: 1) the current transaction ID and 2) all in-process transaction IDs. When someone accesses data, Postgres issues a query to display all the row versions that match the following criteria: the row's creation transaction ID is a committed transaction and is less than the current transaction counter, and the row lacks an expiration transaction ID or its expiration transaction ID was in process at query start.

And this is where MVCC's power resides. It enables PostgreSQL to keep track of transaction IDs to determine the version of the data, and thereby avoid having to issue any locks. It's a very logical and efficient way of handling transactions. New PostgreSQL users are often pleasantly surprise by the performance boost of MVCC over row-level locking, especially in a large multi-user environment.

MVCC also offers another advantage: hot backups. Many other databases require users to shutdown the database or lock all tables to get a consistent snapshot - not so with PostgreSQL. MVCC allows PostgreSQL to make a full database backup while the database is live. It simply takes a snapshot of the entire database at a point in time and dumps the output even while data is being inserted, updated or deleted.

CONCLUSION

MVCC ensures that readers never wait for writers and writers never wait for readers. It is a logical and efficient version management mechanism that delivers better database performance than traditional row-level locking.

0 Comments:

Post a Comment

<< Home