7. Twister with FutureGrid Cloud Eucalyptus

Author: Tak-Lon Stephen Wu Version: 0.1 Date: 2011-10-30

Twister is widely used by domain scientists to run their scientific applications in parallel. This tutorial shows how to run Twister on the FutureGrid test-bed using FutureGrid India Eucalyptus.

7.1. Requirements

  1. A FutureGrid HPC account; apply via the FutureGrid portal and request an HPC account.
  2. A FutureGrid Eucalyptus account; apply via the FutureGrid (India) Eucalyptus Portal.
  3. The FutureGrid Eucalyptus credentials zip file (euca2-[username]-x509.zip), stored in your home directory.
  4. Experience with the Linux command line.

7.2. Get VM compute nodes

Before going through this tutorial, please obtain two VM instances (image emi-D778156D) from FutureGrid India-Eucalyptus. For detailed information about starting a Eucalyptus VM, please see the FutureGrid Eucalyptus tutorial. Assuming two VM instances are running as shown in the following section, you then need to set the hostname and mount the attached disk on each VM before starting Twister.

7.2.1. VM Hostname setting

To run Twister without errors caused by the environment settings, assign a hostname to each compute node. For instance, set “10.0.2.131” as “master” and “10.0.2.132” as “slave”:

[johnny@i136 johnny-euca]$ euca-describe-instances
RESERVATION     r-442E080F      johnny  default
INSTANCE        i-46B007AE      emi-D778156D    149.165.146.207 10.0.2.131       running         johnny        0       c1.medium 2011-02-18T22:37:36.772Z     india   eki-78EF12D2    eri-5BB61255
INSTANCE        i-574E09D8      emi-D778156D    149.165.159.160 10.0.2.132       running         johnny        0       c1.medium 2011-02-18T22:37:36.772Z     india   eki-78EF12D2    eri-5BB61255

[johnny@i136 johnny-euca]$ ssh -i johnny.private root@149.165.146.207

…
Welcome to Ubuntu!
…
root@localhost:~# vim /etc/hosts
10.0.2.131 master
10.0.2.132 slave

root@localhost:~# hostname master
root@localhost:~# scp /etc/hosts root@slave:/etc/hosts
root@localhost:~# ssh slave

…
Welcome to Ubuntu!
…
root@localhost:~# hostname slave
root@localhost:~# exit

root@localhost:~#
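
Before moving on, it can help to confirm that both nodes resolve the names set above. The following is a minimal sketch, not part of the Twister distribution; the helper name hosts_has_entry is hypothetical, and it only assumes the standard /etc/hosts layout (IP address followed by one or more names).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if FILE (in /etc/hosts format) maps IP to NAME.
hosts_has_entry() {
    file=$1; ip=$2; name=$3
    awk -v ip="$ip" -v name="$name" '
        { sub(/#.*/, "") }                     # ignore trailing comments
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Example usage on each VM (addresses taken from the example above):
#   hosts_has_entry /etc/hosts 10.0.2.131 master || echo "master entry missing" >&2
#   hosts_has_entry /etc/hosts 10.0.2.132 slave  || echo "slave entry missing" >&2
```

Running the same two checks on both VMs catches the case where the scp of /etc/hosts to the second node did not succeed.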

7.2.2. VM attached disk configuration

A started VM instance normally has an unformatted disk attached at /dev/sda2. If you need more disk space, you can format and mount it as follows. In our example, we format it and mount it on /tmp.

Note: DO NOT format /dev/sda1, which is the main partition containing the OS:

root@master:~/# fdisk -l | grep '^Disk'
Disk /dev/sda1 doesn't contain a valid partition table
Disk /dev/sda2 doesn't contain a valid partition table
Disk /dev/sda3 doesn't contain a valid partition table
Disk /dev/sda1: 2147 MB, 2147483648 bytes
Disk identifier: 0x00000000
Disk /dev/sda2: 8045 MB, 8045723648 bytes
Disk identifier: 0x00000000
Disk /dev/sda3: 536 MB, 536870912 bytes
Disk identifier: 0x00000000

root@master:~/# mkfs.ext3 /dev/sda2
.....

root@master:~# mount /dev/sda2 /tmp
root@master:~# df -h

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             2.0G  1.1G  799M  59% /
udev                  3.0G   40K  3.0G   1% /dev
none                  3.0G     0  3.0G   0% /dev/shm
none                  3.0G   48K  3.0G   1% /var/run
none                  3.0G     0  3.0G   0% /var/lock
none                  3.0G     0  3.0G   0% /lib/init/rw
/dev/sda2             7.4G    1M  0.1G   1% /tmp
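
Because formatting the wrong partition destroys data, a small guard before mkfs can be useful. The sketch below is an illustration only; the helper name is_mounted is hypothetical, and it assumes a Linux system where /proc/mounts lists the mounted device in its first column.

```shell
#!/bin/sh
# Hypothetical guard: succeeds if DEVICE appears as a mounted source in MOUNTS.
# MOUNTS defaults to /proc/mounts (first column is the device name).
is_mounted() {
    device=$1; mounts=${2:-/proc/mounts}
    awk -v dev="$device" '$1 == dev { found = 1 } END { exit (found ? 0 : 1) }' "$mounts"
}

# Example usage (assumption: run as root on the VM):
#   if is_mounted /dev/sda2; then
#       echo "/dev/sda2 is already mounted; not formatting" >&2
#   else
#       mkfs.ext3 /dev/sda2 && mount /dev/sda2 /tmp
#   fi
```

The guard only refuses to touch a device that is already mounted; it is still up to you to pass /dev/sda2 and not /dev/sda1.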

7.3. Twister Configuration

We provide detailed instructions for starting Twister 0.9. In this example, we use “10.0.2.131”/master as the Driver/Master node and “10.0.2.132”/slave as the Worker/Slave node.

7.3.1. Download Twister 0.9

In VM mode, since the larger extra disk is mounted on /tmp, we download the Twister package under /tmp on all VM nodes:

root@master:~# cd /tmp
root@master:tmp# wget http://salsahpc.indiana.edu/tutorial/apps/Twister-0.9.tar.gz
root@master:tmp# tar -zxvf Twister-0.9.tar.gz

7.3.2. Set $TWISTER_HOME, $JAVA_HOME and Worker nodes

Before running the Twister configuration script, we need to set up several environment variables:

root@master:tmp# echo export TWISTER_HOME=/tmp/twister-0.9 >> ~/.bashrc
root@master:tmp# echo export JAVA_HOME=/usr/lib/jvm/java-6-sun/ >> ~/.bashrc
root@master:tmp# source ~/.bashrc
root@master:tmp# vi $TWISTER_HOME/bin/nodes
master
slave
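
A quick sanity check of these settings before running the configuration script can save a failed run. This is a minimal sketch; the function name check_twister_env is hypothetical, and it checks only what this section sets up (the two variables and a non-empty nodes file).

```shell
#!/bin/sh
# Hypothetical pre-flight check for the settings made in this section.
check_twister_env() {
    [ -n "${TWISTER_HOME:-}" ] || { echo "TWISTER_HOME is not set" >&2; return 1; }
    [ -n "${JAVA_HOME:-}" ]    || { echo "JAVA_HOME is not set" >&2; return 1; }
    [ -d "$TWISTER_HOME" ]     || { echo "no such directory: $TWISTER_HOME" >&2; return 1; }
    # -s is true only if the file exists and is not empty
    [ -s "$TWISTER_HOME/bin/nodes" ] || { echo "nodes file missing or empty" >&2; return 1; }
    return 0
}

# Example usage:
#   check_twister_env && echo "environment looks ready"
```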

7.3.3. Run TwisterPowerMakeUp.sh

The Twister 0.9 package includes a TwisterPowerMakeUp.sh script that configures Twister automatically. In general, it randomly picks one of the worker nodes as the ActiveMQ messaging broker and sets the number of daemons per node and the number of workers (mappers/reducers) per daemon. It also creates the directories Twister requires, such as app_dir and data_dir:

root@master:tmp# cd $TWISTER_HOME/bin
root@master:bin# ./TwisterPowerMakeUp.sh
use normal MultiNode Setup
no special processing to nodes
ActiveMQ uri=failover:(tcp://slave:61616)
nodes_file=/tmp/twister-0.9/bin/nodes
daemons_per_node=1
workers_per_daemon=8
app_dir=/tmp/twister-0.9/apps
master:/tmp/twister-0.9/data created.
slave:/tmp/twister-0.9/data created.
data_dir=/tmp/twister-0.9/data
Change max memory to 16054 MB
copied to master:/tmp/twister-0.9
copied to slave:/tmp/twister-0.9
Auto configuration is done.

As the message “ActiveMQ uri=failover:(tcp://slave:61616)” above shows, slave is the node selected to run the ActiveMQ messaging broker.
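
Since the broker node is picked at random, it may be convenient to read it from the script output rather than by eye. The sketch below assumes the output line has exactly the “ActiveMQ uri=failover:(tcp://host:port)” form shown above; the function name broker_host and the log file name setup.log are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the broker host name found in the
# TwisterPowerMakeUp.sh output read from standard input.
broker_host() {
    sed -n 's|^ActiveMQ uri=.*tcp://\([^:)]*\):.*|\1|p'
}

# Example usage (assumption: the script output was saved with tee):
#   ./TwisterPowerMakeUp.sh | tee setup.log
#   broker_host < setup.log
```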

7.3.4. Download and start ActiveMQ on specific node

Now ssh to the selected node, slave, then download and unpack the ActiveMQ package under /tmp. Finally, start it and return to the Driver/Master node:

root@master:bin# ssh slave
root@slave:~# cd /tmp
root@slave:tmp# wget http://www.iterativemapreduce.org/apache-activemq-5.4.2-bin.tar.gz
root@slave:tmp# tar -zxvf apache-activemq-5.4.2-bin.tar.gz
root@slave:tmp# cd apache-activemq-5.4.2/bin
root@slave:bin# ./activemq console &
[1] 4009
[johnny@slave bin]$ INFO: Using default configuration
(you can configure options in one of these file: /etc/default/activemq /tmp/.activemqrc)
INFO: Invoke the following command to create a configuration file
./activemq setup [ /etc/default/activemq /tmp/.activemqrc ]
INFO: Using java '/usr/lib/jvm/java-6-sun/jre/bin/java'
INFO: Starting in foreground, this is just for debugging purposes (stop process by pressing CTRL+C)
Java Runtime: Sun Microsystems Inc. 1.6.0_20 /N/soft/jdk1.6.0_20-x86_64/jre
  Heap sizes: current=251264k  free=247327k  max=251264k
    JVM args: -Xms256M -Xmx256M -Dorg.apache.activemq.UseDedicatedTaskRunner=true -Djava.util.logging.config.file=logging.properties -Dcom.sun.management.jmxremote -Dactivemq.classpath=/tmp/apache-activemq-5.4.2/conf; -Dactivemq.home=/tmp/apache-activemq-5.4.2 -Dactivemq.base=/tmp/apache-activemq-5.4.2
ACTIVEMQ_HOME: /tmp/apache-activemq-5.4.2
ACTIVEMQ_BASE: /tmp/apache-activemq-5.4.2
Loading message broker from: xbean:activemq.xml
 INFO Refreshing org.apache.activemq.xbean.XBeanBrokerFactory$1@245e13ad: startup date [Sun Oct 30 23:33:22 EDT 2011]; root of context hierarchy
 WARN destroyApplicationContextOnStop parameter is deprecated, please use shutdown hooks instead
 INFO PListStore:/tmp/apache-activemq-5.4.2/data/localhost/tmp_storage started
 INFO Using Persistence Adapter: KahaDBPersistenceAdapter[/tmp/apache-activemq-5.4.2/data/kahadb]
 INFO KahaDB is version 3
 INFO Recovering from the journal ...
 INFO Recovery replayed 1 operations from the journal in 0.0080 seconds.
 INFO ActiveMQ 5.4.2 JMS Message Broker (localhost) is starting
 INFO For help or more information please see: http://activemq.apache.org/
 INFO Listening for connections at: tcp://slave:61616
 INFO Connector openwire Started
 INFO ActiveMQ JMS Message Broker (localhost, ID:slave-56404-1320032003342-0:1) started
 INFO jetty-7.1.6.v20100715
 INFO ActiveMQ WebConsole initialized.
 INFO Initializing Spring FrameworkServlet 'dispatcher'
 INFO ActiveMQ Console at http://0.0.0.0:8161/admin
 INFO Initializing Spring root WebApplicationContext
 INFO camel-osgi.jar/camel-spring-osgi.jar not detected in classpath
 INFO Apache Camel 2.4.0 (CamelContext: camel) is starting
 INFO JMX enabled. Using ManagedManagementStrategy.
 INFO Found 4 packages with 15 @Converter classes to load
 INFO Loaded 146 type converters in 0.337 seconds
 INFO Connector vm://localhost Started
 INFO Route: route1 started and consuming from: Endpoint[activemq://example.A]
 INFO Started 1 routes
 INFO Apache Camel 2.4.0 (CamelContext: camel) started in 0.783 seconds
 INFO Camel Console at http://0.0.0.0:8161/camel
 INFO ActiveMQ Web Demos at http://0.0.0.0:8161/demo
 INFO RESTful file access application at http://0.0.0.0:8161/fileserver
 INFO Started SelectChannelConnector@0.0.0.0:8161
root@slave:bin# exit
root@master:bin#
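
If the console output is redirected to a file, the startup can be checked from a script instead of being read off the terminal. This is a sketch under that assumption; the function name broker_started and the log path /tmp/activemq.log are hypothetical, and the pattern is taken from the “ActiveMQ JMS Message Broker (...) started” line in the output above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds once LOGFILE contains the broker's
# "JMS Message Broker (...) started" line.
broker_started() {
    grep -q 'JMS Message Broker .* started' "$1"
}

# Example usage on the broker node (assumption: output goes to a log file):
#   ./activemq console > /tmp/activemq.log 2>&1 &
#   until broker_started /tmp/activemq.log; do sleep 1; done
#   echo "broker is up"
```

The earlier “is starting” line does not match the pattern, so the loop waits for the final confirmation.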

7.3.5. Start Twister

After returning to the master node (master), run ./start_twister.sh & under $TWISTER_HOME/bin:

root@master:bin# ./start_twister.sh &
[1] 7844
root@master:bin# master
Oct 30, 2011 11:34:38 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
INFO: Successfully connected to tcp://slave:61616
1    [main] INFO  cgl.imr.worker.DaemonWorker  - Daemon no: 0 started with 8 workers.
slave
Oct 30, 2011 11:34:39 PM org.apache.activemq.transport.failover.FailoverTransport doReconnect
INFO: Successfully connected to tcp://slave:61616
0    [main] INFO  cgl.imr.worker.DaemonWorker  - Daemon no: 1 started with 8 workers.

[1]+  Done                    ./start_twister.sh

If you see messages similar to those above, Twister has started successfully.

7.4. Verify Twister Status

You can also run the “jps” command on each node to make sure Twister (TwisterDaemon) is running.

On master node:

# on master node
root@master:bin# jps
7878 TwisterDaemon
7909 Jps

On slave and ActiveMQ node:

# on slave and ActiveMQ node
root@slave:bin# jps
4265 Jps
4025 run.jar
4185 TwisterDaemon
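
The same check can be scripted across all nodes. The sketch below is illustrative only; the function name has_twister_daemon is hypothetical, and the usage assumes that root can ssh to every node listed in $TWISTER_HOME/bin/nodes and that jps is on the PATH of a non-interactive ssh session.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the jps output on standard input
# contains a process named TwisterDaemon.
has_twister_daemon() {
    awk '$2 == "TwisterDaemon" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example usage from the master node:
#   for node in $(cat "$TWISTER_HOME/bin/nodes"); do
#       if ssh "$node" jps | has_twister_daemon; then
#           echo "$node: TwisterDaemon running"
#       else
#           echo "$node: TwisterDaemon NOT running" >&2
#       fi
#   done
```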