Prof. Tom Cormen
High-performance computing with disk-resident data? Although that might seem like an oxymoron, we are developing a software environment, FG, to enable it.
FG is for asynchronous programs that run on clusters and fit into a pipeline framework. Each pipeline stage corresponds to a function that operates on a buffer. Multiple buffers traverse the pipeline and correspond to blocks in the memory hierarchy. Stages run asynchronously (via threads) in order to make it easy to overlap their operations (computation, communication, and I/O).
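To make the pipeline idea concrete, here is a minimal sketch in Python. It is not FG's API: the names `run_stage`, `run_pipeline`, `read_stage`, `sort_stage`, and `write_stage` are hypothetical, and the "read" and "write" stages only stand in for real disk I/O. It shows the structure described above: one thread per stage, each stage a function applied to a buffer, with buffers handed from stage to stage through queues so that different stages can work on different buffers at the same time.

```python
import queue
import threading

# Unique marker that tells each stage the stream of buffers has ended.
SENTINEL = object()


def run_stage(func, inbox, outbox):
    """Apply func to every buffer taken from inbox and pass the result on."""
    while True:
        buf = inbox.get()
        if buf is SENTINEL:
            outbox.put(SENTINEL)  # propagate shutdown to the next stage
            return
        outbox.put(func(buf))


def run_pipeline(stage_funcs, buffers):
    """Send buffers through the stages, one thread per stage.

    Returns the processed buffers in their original order, since each
    stage is a single thread reading from a FIFO queue.
    """
    # queues[i] feeds stage i; the last queue collects the final output.
    queues = [queue.Queue() for _ in range(len(stage_funcs) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stage_funcs)
    ]
    for t in threads:
        t.start()

    for buf in buffers:
        queues[0].put(buf)
    queues[0].put(SENTINEL)

    results = []
    while True:
        buf = queues[-1].get()
        if buf is SENTINEL:
            break
        results.append(buf)

    for t in threads:
        t.join()
    return results


# Hypothetical stages: placeholders for disk read, in-memory sort, disk write.
def read_stage(buf):
    return list(buf)


def sort_stage(buf):
    return sorted(buf)


def write_stage(buf):
    return buf


blocks = [[3, 1, 2], [9, 7, 8]]
sorted_blocks = run_pipeline([read_stage, sort_stage, write_stage], blocks)
```

In this sketch each block is sorted independently while later blocks are still being "read"; a real out-of-core sort would also need a merge phase and actual I/O and communication stages, which are omitted here.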
Using FG, we have developed programs that can sort well in excess of 100 gigabytes of data. (Image, above, shows Tom's whiteboard covered with deep thoughts about pipelines.)