Saturday, April 6, 2013

Configure Ganglia for multiple clusters in Unicast mode

In my previous post I talked about how to: Setting up Ganglia in CentOS environment. At that time, I used only a single cluster for the whole setup. But it's highly unlikely that you have only a single cluster in your development/production environment. Consider you have two clusters - 1. Storm 2. Kafka and you want to monitor all of these cluster nodes through a single Ganglia UI. You do not have to install Ganglia multiple times for that, you just need to configure your Ganglia. It would have been much easier if AWS supports multicast but as it doesn't support multicast, you need to do a work-around in unicast mode to achieve monitoring multiple clusters in one single Ganglia.

The idea behind this work-around is pretty straightforward. Suppose I have two clusters: cluster#1 - Storm and cluster#2 - Kafka and their respective IP addresses are:

10.0.0.194 - Storm Cluster (supervisor 1)
10.0.0.195 - Storm Cluster (supervisor 2)
10.0.0.196 - Storm Cluster (supervisor 3)
10.0.0.182 - Storm Cluster (nimbus)
10.0.0.249 - Kafka Cluster
10.0.0.250 - Kafka Cluster
10.0.0.251 - Kafka Cluster
10.0.0.33 - my client machine

What I am going to do is, I will configure each of the cluster to send collected data (gmond) to one of their specific node only and configure the gmetad daemon in a way that it can collects the data only from a designated node (gmond daemon) from each cluster. Ganglia will categorize each cluster data by their unique cluster name defined in gmond.conf file.


As you can see in the above figure that all Kakfa cluster's data is sending to one specific node - 10.0.0.249 and all Storm cluster's data is sending to one of its node - 10.0.0.182. Client machine (10.0.0.33) is running gmetad daemon and I will configure that daemon so that it can look for two data sources for two clusters where their source IP addresses will be 10.0.0.249 and 10.0.0.182 for Kafka and Storm respectively.

I'm assuming that you already setup your Ganglia and it's running as expected. So I am not going to discuess about what is gmond.conf and gmetad.conf files. In case if you have not setup yet, you might want to take a look at this post
This is my gmond.conf file (only the part which I modified) which I'm using for all Kafka hosts (this file is unique for each host per cluster):
And here is my gmond.conf file for all Storm hosts (this file is unique for each host per cluster):

You notice that I'm using unique host address for udp_send_channel for each cluster. Now, I need to tell my gmetad daemon to look for those two host address to collect data from. Here is my gmetad.conf file:
You are done! Now restart all gmond daemons and gmetad daemon and wait for few minutes.
Once you navigate to your Ganglia UI url you should be able to see your grid and list of your clusters in the drop-down.



You can dig further to see each of your host for each cluster:





There is another work-around which you can also try to get a better understanding of Ganglia. In that case you need to use separate port number for each cluster. Here, I'm distinguishing each cluster's data source per IP address, but in that work-around you can have a single IP address for all clusters but multiple port numbers. You can try that work-around as an exercise :).



Note: For privacy purpose, I had to modify several lines on this post from my original post. So if you find something is not working or facing any issues, please do not hesitate to contact me.



Monday, April 1, 2013

Setting up Ganglia in CentOS

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids (ref). Installing and configuring Ganglia is very straight-forward. It has two major parts:

Gmond (Ganglia monitoring daemon): Runs on every single node and collects the data and sends to meta daemon node.

Gmetad (Ganglia meta daemon): Runs on a head (or client) node and gathers the data from all monitoring nodes and displays it on UI.

Assume I have 4 nodes cluster and one of the nodes also works as client. So, I will install the Ganglia PHP UI on that machine.

Here are their IP addresses and list of services I am going to install on them:
  • 10.0.0.33 - client node (gmetad, gmond, ui)
  • 10.0.0.194 - monitoring node (gmond)
  • 10.0.0.195 - monitoring node (gmond)
  • 10.0.0.196 - monitoring node (gmond)

On client node:
--> Install meta daemon, monitoring daemon and web UI by executing:
--> If they are not available, then you might need to install EPEL repositories to your machine.

On monitoring node:
--> Install monitoring daemon by:

Configuration:

By this point, everything is installed and now you need to configure your Ganglia.
  • /etc/ganglia/gmetad.conf --- configuration file for gmetad daemon
  • /etc/ganglia/gmond.conf --- configuration file for gmond daemon

I have updated only the following part on gmond.conf file in each monitoring node.
Notice that I have commented out mcast_join and bind because multicast is not supported by AWS EC2 and unicast is only the option for Ganglia. So, all monitoring nodes are sending collected data to the node (10.0.0.33) which is collecting data (nodes which is running gmetad daemon).

On gmetad.conf file I have updated this:
data_source "Cloud for Beginners" 60  10.0.0.33:8649
Here I'm telling to meta daemon the name of the cluster (name should be matched to organize list of hosts by cluster) and host's IP address and port from where data will be collected from and duration (collect data in every 60 seconds).

You are done! Now start monitoring daemon and meta daemon in all nodes.
After 1-2 minutes you should be able to see all your monitoring data through:

You might want to change boot configuration so that gmetad and gmond daemons will be started at boot:

Common Issue: 
In case if you are facing that the gmetad is not starting up, you can check the log by:
In log you might see "Please make sure that /var/lib/ganglia/rrds is owned by nobody" error, in that case you need to execute this:



Note: For privacy purpose, I had to modify several lines on this post from my original post. So if you find something is not working or facing any issues, please do not hesitate to contact me.