Monday, February 25, 2013

Setup a Storm cluster on Amazon EC2

Storm - Real-time Hadoop! Yes, you can call it that. Just as Hadoop provides a set of general primitives for batch processing, Storm provides a set of primitives for real-time computation. It's a very powerful tool, and setting up a Storm cluster is pretty straightforward. If you want to set up a Storm cluster on Amazon EC2, you should first try Nathan's storm-deploy project, which automatically deploys a Storm cluster on EC2 based on your configuration. But if you want to deploy a Storm cluster manually, you can follow these steps (for more detailed information, see the original Storm documentation):

Let me show you my machine's current configuration first:
  • Machine type: m1.large (for supervisor) and m1.small (for nimbus)
  • OS: 64-bit CentOS 6.3
  • JRE Version: 1.6.0_43
  • JDK Version: 1.6.0_43
  • Python Version: 2.6.6


For this tutorial, I am going to set up a Storm cluster with three supervisor nodes and one nimbus node. The IP address and target role of each host are:

10.0.0.194 - StormSupervisor1
10.0.0.195 - StormSupervisor2
10.0.0.196 - StormSupervisor3
10.0.0.182 - StormNimbus

Storm depends on Zookeeper for coordinating the cluster. I have already installed Zookeeper on each of the hosts above. Now apply each of the following steps on all supervisor and nimbus nodes:


A. Install ZeroMQ 2.1.7

Step #A-1: Download zeromq-2.1.7.tar.gz from http://download.zeromq.org/.

Step #A-2: Extract the gzip file:
[root@ip-10-0-0-194 tool]# tar -zxvf zeromq-2.1.7.tar.gz
Step #A-3: Build ZeroMQ and update the library:
Note: If "./configure" fails with the error "cannot link with -luuid, install uuid-dev.", you need to install the missing library. You can install it by executing "yum install libuuid-devel".
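
The build for Step #A-3 follows the usual autotools sequence. Here is a minimal sketch, assuming the tarball was extracted into a "zeromq-2.1.7" folder in the current directory and that you are running as root, as in the prompts above. The function name install_zeromq is only an illustration, not part of ZeroMQ:

```shell
# Build ZeroMQ from source and refresh the shared library cache.
# $1 is the extracted source folder (assumed default: zeromq-2.1.7).
install_zeromq() {
  (
    # The subshell keeps the caller's working directory unchanged.
    cd "${1:-zeromq-2.1.7}" || exit 1
    ./configure &&
    make &&
    make install &&
    ldconfig   # make the newly installed libzmq visible to the dynamic linker
  )
}
```

Nothing runs until you call the function, for example "install_zeromq zeromq-2.1.7".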


B. Install JZMQ

Step #B-1: Get the project from the Git by executing:
Step #B-2: Install it:
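
Both steps can be sketched as follows. This assumes the nathanmarz/jzmq fork that the Storm documentation points to, and that JAVA_HOME is set to your JDK, since the JZMQ build needs the JNI headers. The function name install_jzmq and the default "jzmq" target folder are only illustrations:

```shell
# Clone JZMQ (the Java bindings for ZeroMQ) and build it from source.
# $1 is the folder to clone into (assumed default: jzmq).
install_jzmq() {
  if [ -z "$JAVA_HOME" ]; then
    echo "JAVA_HOME must point to your JDK before building JZMQ" >&2
    return 1
  fi
  (
    git clone https://github.com/nathanmarz/jzmq.git "${1:-jzmq}" &&
    cd "${1:-jzmq}" &&
    ./autogen.sh &&
    ./configure &&
    make &&
    make install
  )
}
```

Call "install_jzmq" as root after ZeroMQ itself has been installed.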

C. Setup Storm

Step #C-1: Download the latest version (for this tutorial, I'm using version 0.8.1) from https://github.com/nathanmarz/storm/downloads.

Step #C-2: Unzip the downloaded zip file:
[root@ip-10-0-0-194 tool]# unzip storm-0.8.1.zip
Step #C-3: Now change the configuration based on your environment. The default location of the main Storm configuration file is "/storm/conf/storm.yaml". Any setting you write in this file overrides the corresponding default. Here is what I changed in the storm.yaml file:
Note: I have created the "storm" folder manually inside "/var" directory.
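
As a rough sketch, a storm.yaml for this layout might contain the settings below. The keys are standard Storm options, but the values are assumptions pieced together from this post: the Zookeeper list takes the four hosts named earlier, the local directory is the "/var/storm" folder from the note above, and the slot ports are Storm's defaults. The helper name write_storm_yaml is only an illustration:

```shell
# Write a minimal storm.yaml to the path given as $1.
write_storm_yaml() {
  cat > "$1" <<'EOF'
# Zookeeper hosts used for cluster coordination (assumed: all four hosts)
storm.zookeeper.servers:
  - "10.0.0.194"
  - "10.0.0.195"
  - "10.0.0.196"
  - "10.0.0.182"

# Private IP of the nimbus node
nimbus.host: "10.0.0.182"

# Local state directory for the nimbus and supervisor daemons
storm.local.dir: "/var/storm"

# One worker slot per port on each supervisor (Storm's default ports)
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
EOF
}
```

For example, "write_storm_yaml storm-0.8.1/conf/storm.yaml" would write the file into the unzipped release folder.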

At this point, you are ready to start your Storm cluster. I installed and set up everything on a single instance first (supervisor1 - 10.0.0.194), then created an AMI from that instance and launched the other two supervisors and the nimbus node from that AMI.

Launch the daemons by using the storm script (bin/storm) on each node. I started the nimbus and UI daemons on the nimbus host and the supervisor daemon on each of the supervisor nodes.

  • bin/storm nimbus on 10.0.0.182
  • bin/storm ui on 10.0.0.182
  • bin/storm supervisor on 10.0.0.194,10.0.0.195,10.0.0.196
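
The mapping above can be sketched as a small helper that starts the right daemons for a given host. This assumes the current directory is the unzipped storm-0.8.1 folder; the function names and the "<daemon>.out" log file names are only illustrations:

```shell
# Print the Storm daemons that belong on the host with private IP $1.
daemons_for_host() {
  case "$1" in
    10.0.0.182) echo "nimbus ui" ;;
    10.0.0.194|10.0.0.195|10.0.0.196) echo "supervisor" ;;
    *) echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# Start each daemon for host $1 in the background with nohup,
# so it keeps running after you log out.
start_storm_daemons() {
  daemons=$(daemons_for_host "$1") || return 1
  for daemon in $daemons; do
    nohup bin/storm "$daemon" > "$daemon.out" 2>&1 &
  done
}
```

On the nimbus host this would be called as "start_storm_daemons 10.0.0.182". For anything beyond a test cluster, the Storm documentation recommends running the daemons under a supervision tool so that they restart after a crash.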



You can see the Storm UI by navigating to your nimbus host: http://{nimbus host}:8080. In my case, it was: http://54.208.24.209:8080 (here, 54.208.24.209 is the public IP address of my nimbus host).
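
If you prefer to check from the command line, a quick probe with curl could look like this. It is only a sketch: the function name storm_ui_up is an illustration, and it assumes the EC2 security group of the nimbus host allows inbound traffic on port 8080:

```shell
# Return success if the Storm UI answers over HTTP.
# $1 is the nimbus host, $2 the port (assumed default: 8080).
storm_ui_up() {
  curl -s -f -o /dev/null --max-time 5 "http://$1:${2:-8080}/"
}
```

For example: storm_ui_up 54.208.24.209 && echo "Storm UI is reachable"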



Note: For privacy reasons, I had to modify several lines in this post from my original post. So if you find that something is not working or you face any issues, please do not hesitate to contact me.


19 comments:

  1. " I have already installed Zookeeper in each of those above hosts."

    Why do you need Zookeeper in each machine? Only one Zookeeper is needed.

  2. Thanks Rami for stopping by. It's true that a single-node Zookeeper is sufficient for most cases, but when I said "Installed Zookeeper in each of those above hosts" I meant replicated Zookeeper (for failover). Using a Zookeeper cluster is also suggested when deploying a large Storm cluster.

  3. Thanks man, Useful article

  4. I am also trying the same thing, setting up a cluster on EC2 with 2 supervisors, one nimbus and one zookeeper. But for me only one supervisor instance shows up in the Storm UI at a time. Both supervisors are able to connect to and communicate with zookeeper, but at any given time only one is shown in the UI. The UI keeps switching between the two supervisors at random intervals. Need help.
    Thanks

    Replies
    1. Change the data dir on the supervisors. The path must be different on each supervisor.
