6.2. Using Twister in FutureGrid

6.2.1. What is Twister?

The MapReduce programming model has simplified the implementation of many data-parallel applications. Its simplicity, together with the quality of service provided by many MapReduce implementations, has attracted considerable enthusiasm in the parallel computing community. From years of experience applying the MapReduce programming model to various scientific applications, we identified a set of extensions to the programming model and improvements to its architecture that expand the applicability of MapReduce to more classes of applications. Twister is a lightweight MapReduce runtime that we developed by incorporating these enhancements.

Twister was developed as part of Jaliya Ekanayake’s Ph.D. research and is supported by the Salsa Team at IU. It provides the following features to support MapReduce computations:

  • Distinction between static and variable data
  • Configurable long-running (cacheable) map/reduce tasks
  • Pub/sub messaging-based communication and data transfers
  • Efficient support for iterative MapReduce computations (much faster than Hadoop or Dryad/DryadLINQ)
  • Combine phase to collect all reduce outputs
  • Data access via local disks
  • Lightweight (~5600 lines of Java code)
  • Support for typical MapReduce computations
  • Tools to manage data

Figure: Iterative MapReduce programming model using Twister
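
The iterative model can be illustrated with a minimal, single-process sketch. It does not use the Twister API: the function names (map_task, reduce_task, combine, iterative_kmeans) and the sample data are hypothetical and chosen only for illustration. It assumes one-dimensional K-means as the computation. The data partitions play the role of static data that is loaded once and reused in every iteration, the centroids play the role of variable data that is re-sent each iteration, and the combine step collects all reduce outputs before the next iteration starts.

```python
from collections import defaultdict


def map_task(static_points, centroids):
    """Map: assign each point of one static partition to its nearest centroid.

    Returns {centroid_index: (sum_of_points, point_count)}.
    """
    partial = defaultdict(lambda: [0.0, 0])
    for x in static_points:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        partial[nearest][0] += x
        partial[nearest][1] += 1
    return {k: tuple(v) for k, v in partial.items()}


def reduce_task(key, values):
    """Reduce: merge the partial sums for one centroid into its new position."""
    total = sum(s for s, _ in values)
    count = sum(c for _, c in values)
    return key, total / count


def combine(reduce_outputs, old_centroids):
    """Combine: collect all reduce outputs into the next list of centroids."""
    new_centroids = list(old_centroids)
    for key, value in reduce_outputs:
        new_centroids[key] = value
    return new_centroids


def iterative_kmeans(partitions, centroids, max_iter=100, tol=1e-9):
    """Driver loop: repeat map, reduce, and combine until the centroids settle."""
    for _ in range(max_iter):
        # Map phase: one task per static partition; only centroids change.
        map_outputs = [map_task(p, centroids) for p in partitions]

        # Group intermediate values by key before reducing.
        grouped = defaultdict(list)
        for out in map_outputs:
            for key, value in out.items():
                grouped[key].append(value)

        reduce_outputs = [reduce_task(k, v) for k, v in grouped.items()]
        new_centroids = combine(reduce_outputs, centroids)

        shift = max(abs(a - b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids


# Hypothetical sample data: two static partitions and two initial centroids.
partitions = [[1.0, 2.0, 9.0], [3.0, 10.0, 11.0]]
result = iterative_kmeans(partitions, [0.0, 5.0])
print(result)
```

In this sketch the partitions are never rebuilt or re-read inside the loop, which mirrors the idea of cacheable long-running tasks; in a real runtime the map tasks would run on separate nodes rather than in a list comprehension.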

6.2.2. Running Twister on FutureGrid

Twister can be run in various modes within FutureGrid (FG), in either the FutureGrid HPC or the FutureGrid Cloud environment.

6.2.3. Run Twister Applications

We provide K-means and BLAST, both running on Twister, as example applications.

6.2.4. Papers and Presentations