Archive

Posts Tagged ‘database system’

Current Trends in Distributed Database Systems (Talk)

December 13th, 2010 Angus Macdonald 2 comments

I recently gave a talk to our Masters Databases class entitled Current Trends in Distributed Database Systems.

The talk (available here) covers some of the more innovative designs in database systems over the last few years, from Vertica and VoltDB, to larger-scale datastores such as Amazon’s Dynamo.

Major aside: I tried and failed to come up with a more entertaining title for the talk. The suggestions I received on twitter were better, but less relevant (one of the suggestions is on my title slide).

So, if you think you can do better and come up with something that is both relevant and witty/entertaining, there’ll be some form of prize in it for you!

Designing a Resource-Aware Distributed Database (Presentation)

February 17th, 2010 Angus Macdonald No comments

I recently gave a presentation to Masters students in the department on my research. It’s probably the longest talk I’ve done on my own work (~50 minutes), so it covers a number of areas: stating why it is I’m doing what I’m doing, the design issues in this area, and an overview of my implementation. You can find handouts of the slides below:

6-Up Slides

1-Up Slides

The Design of Concurrency Control Mechanisms

September 11th, 2009 Angus Macdonald 1 comment

I’ve spent some time recently revising concurrency control mechanisms, looking for the most appropriate solution for the database system we’re developing.

This article describes one such model, read-write-certify, and looks at how it might be extended for use in a distributed database system. I hope this helps to highlight the issues involved in the design of concurrency control mechanisms more generally.

The Basics

Read-write-certify (RWC) is a slightly more liberal version of the typical shared read, exclusive write locking model (as seen in most DBMSs). Unlike the latter approach RWC allows reads and writes to execute concurrently on the same item. To explain, consider the following example.

Yes

Figure 1: Single Item Database

Figure one shows a local database containing one item X, which stores the value ‘1’. In the shared-read, exclusive write model a machine updating X obtains an exclusive lock while it makes its update, meaning all read queries must wait on the update to complete. However in RWC, a transaction creates a new version of the item (named XI in figure 2) in its own transaction workspace on which to make the update. This ensures that the original copy of X is available for read queries.

Locking Comparison

Figure 2: Contrasting Approaches to Locking

The updating transaction uses  its local workspace copy (XI) to execute the update, but must then commit these changes by updating X, the global copy. Another lock called a certify lock is required to give exclusive access to the original while the update is committed.

Certify Locking

Figure 3: Certify Locking

The transaction with the certify lock now completes the update by transferring changes to the primary copy, as shown above.

The principal advantage of this approach in a centralized database is that read queries are not blocked while updates are taking place, giving slightly more concurrency. While certify locks block, they are likely to be held for less time because the update has already been written.

The Effect of Distribution

Despite the improvement from the previous approach, Read-Write-Certify must still block read requests when certifying an update. In a distributed database system is it possible to get around this restriction?

To answer that question we require an item with multiple replicas, each on a different machine.

Figure 4: Distributed Database

Figure 4: Multiple Replicas

In this system the same locking principles hold for individual replicas as they did with the local database, but the certify lock has only to the lock a single replica, not every copy. While the update is being completed on this replica read queries can still access other copies elsewhere.

Consider this in our example system. A transaction has previously obtained a write lock on X2 and has updated its value. It is now attempting to commit the update and has a certify lock on X2. Another transaction requests a read lock and has it granted for X1, the other replica. The read request can be made, unblocked, despite an update being completed on the other machine.

Figure 5: Non-blocking Read Query

Figure 5: Non-blocking Read Query

If the update transaction on X2 completes before the read on X1 then the read will take place on out-of-date data. This isn’t desirable, but for many applications it is acceptable on the proviso that the result-set is consistent, if outdated. However, this condition isn’t necessarily true either, as the following example shows.

There are two transactions, T1 and T2, and two items, X and Y. T1 starts by getting a read lock on a copy of X (with the notation R(X)), and subsequently gets a read lock for Y. However, another transaction, T2, has committed an update to both X and Y after T1 obtained a read lock on X but before it obtained a read lock on Y.

T1 completes its query by returning an old value of X and the current value of Y. The result is inconsistent.

Figure 6: Serializability of RWC

Figure 6: Serializability of RWC

This is only a problem because the first transaction obtained read locks for both tables separately. If the system uses a centralized lock manager, locks can be taken out at the same time, making the result consistent. However, a centralized lock manager creates a point of contention (and failure), so it may not be desirable to have one.

Alternatives

Its probably doubtful that you’d want to use this approach in practise. If your application can accept outdated but consistent data then a more expressive multi-version system may be more appropriate. My point is to show what happens when you expand or relax basic pessimistic approaches.

RWC is a two-phase locking approach to multi-version concurrency control. The alternative is timestamp ordering: each transaction is given a unique start timestamp, while data items are given read and write timestamps that are equal to the timestamp of the last timestamp to read or write the item.

Reads and updates never block. If an operation is a read then it accesses the version with the largest timestamp less than the reading transaction. The read timestamp is then updated. If an operation is an update then a new version of the item is created with the timestamp of the updating transaction, provided that a more recent transaction has not read an older version of the data than the updating transaction. In this case the transaction is aborted.

Serializability is ensured by aborting transactions which access data out of timestamp order.

Finally

Every concurrency control mechanism is designed with three things in mind: the degree of concurrency provided (obviously), the potential for deadlock, and the level of consistency that is guaranteed.

Because there is no definitive solution for any of these, concurrency control models vary widely based on their target application. Most of the fundamental research on the topic was done over thirty years ago, so for further reading you can either go back to that work or look at some more recent textbooks. The following texts were useful to me:

Trends in Database Systems Research: Energy Efficiency

July 9th, 2009 Angus Macdonald No comments

Wherever you look nowadays companies are searching for ways to market themselves as the environmental alternative, both because it makes customers feel good and promises to save them money. It follows that this is particularly true of large computing companies, given the cost of running data centres. As an example, US data centres alone run at an estimated cost of $2.7 billion, 1.2% of the total national energy consumption (ref).

The challenge is in finding ways to reduce the waste.

Broadly speaking there are two complementary approaches to reducing energy consumption: by making hardware more efficient, and software less resource intensive. Database research is beginning to appear on the latter approach, but I think a lot of the work on the hardware side is just as interesting.

Hardware Optimization

In The Case for Energy-Proportional Computing Barroso and Hölzle look at the energy efficiency of typical data centres. They show that the energy efficiency of a server is not directly proportional to its utilization – so, for example, a server running at near 0% utilization is using 50% of the power it uses at peak utilization. Ideally servers should use no power when not in use and power only in proportion to their utilization when they are. The authors call for future hardware design to aim for better energy proportionality, so that machines that are doing little, cost little.

Software Optimization

Two recent CIDR papers look at reducing energy consumption in database systems.

Energy Efficiency: The New Holy Grail of Data Management Systems Research looks at areas where software optimizations can be made, and more generally provides a number of approaches to reducing energy waste. It’s a worthwhile read if you want to get a good feel for the area.

Towards Eco-friendly Database Management Systems proposes that energy consumption be considered a first-class performance goal when planning and processing queries. The authors give details of two optimizations which can help to reduce the energy consumption of a database system.

The first uses the ability of modern processors to execute at a lower power voltage and frequency – their database can explicitly order the processor to operate at a lower voltage when such a change is desirable.

Their second technique is to queue queries where possible so that query aggregation can be used more often, reducing the number of repeat queries to the database. In some evaluations these approaches yield a 49% reduction in energy consumption against only a 3% increase in response time.

While these kind of solutions are not the whole answer (in many cases any reduction in performance would be unacceptable), they at the very least provide an interesting perspective.

Existing Resources

One of the most interesting statistics from this work is the 50% energy consumption of servers doing nothing at all. Essentially, machines doing nothing are still doing something. If we assume that we can’t reduce the energy consumption of these machines to zero, then the question becomes how can these machines be used?

When it comes to user workstations various volunteer computing projects (see Seti@HOME, Folding@Home) strive to make use of unused capacity. But within an enterprise there is a paucity of software able to take advantage of unused resources – resources, which as this article points out, are still costing companies money.

This is one of the motivations behind the H2O project which I am currently involved in.

http://www.cs.st-andrews.ac.uk/~angus/2009/07/trends-in-database-systems-research-column-stores/

Trends in Database Systems Research: Column-Stores

July 7th, 2009 Angus Macdonald No comments

Column-store databases are one of the more interesting areas of innovation in recent database systems research. While column-stores have been around for decades, research in the area has recently been kick-started by Mike Stonebraker and others as part of the C-Store project, and it makes for an interesting discussion.

What is a column-store?

The basic premise of this work is simple. Databases which store data by column (with attributes written contiguously on disk) are able to service read queries much faster than more traditional row-store databases (with records written contiguously on disk). Attributes not included in queries can be ignored, rather than just skipped over, and data can be easily compressed, because techniques such as run-length encoding work far more effectively over attributes (where entries are similar), than over rows (where they are distinct). Both features reduce the disk bandwidth required to execute a query, reducing a potentially large bottleneck.

Traditionally the problem with this approach has been a noted slowdown in the speed of updates – the design which makes reading from the database extremely fast, results in the opposite effect when writing. C-Store solves this by creating two stores: a large read-optimized store, and a smaller writeable store. Updates are sent to this smaller store, before being bulk moved to the larger variant at a later date.  This works because C-Store is targeted at the data warehousing market, where queries are read-mostly and updates are infrequent. Specialization is key.

If you read one paper from the area, make it C-Store: A Column Oriented DBMS.

Commercial Rivalry

Not surprisingly given the promise of this work, database vendors are taking note. C-Store itself has spawned a commercial version, Vertica.

Perhaps as a result we may see fewer academic papers on the subject, but thankfully a number of the parties involved have created blogs which provide useful insights into the current focus of their work.

The people behind Vertica (and thus C-Store) have an interesting blog named The Database Column, which ostensibly promotes the benefits of column-stores, but backs this up with a lot of interesting work and evaluation.

Daniel Abadi, yet another C-Store member, has recently created his own blog, which oddly seems to have a slightly more commercial slant than the previously mentioned Vertica blog. Again, if you have any interest in this area his posts are worth reading.

More generally, Curt Monash’s DBMS2 blog provides an interesting account of the latest happenings in the commercial database world.

H2 Database (Presentation)

May 28th, 2009 Angus Macdonald No comments

I recently gave a talk on the H2 database system, covering some basics of its internal structure. The slides can be found here.

For the most part the slides just contain various diagrams that I used to talk over, so it may be that they don’t work particuarly well on their own.

I started looking at H2 while going through the sources of numerous database systems, looking for something that I could easily hack up for my own research. It stood out over the others when it came down to code clarity and the relative modularity of its various components.