Monday, April 1, 2013

Setting up Ganglia in CentOS

Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids (ref). Installing and configuring Ganglia is straightforward. It has two major parts:

Gmond (Ganglia monitoring daemon): Runs on every node, collects metrics, and sends them to the node running the meta daemon.

Gmetad (Ganglia meta daemon): Runs on a head (or client) node and gathers the data from all monitoring nodes, so that the web UI on that node can display it.

Assume I have a 4-node cluster and one of the nodes also works as the client. I will install the Ganglia PHP UI on that machine.

Here are the IP addresses and the services I am going to install on each node:
  • 10.0.0.33 - client node (gmetad, gmond, ui)
  • 10.0.0.194 - monitoring node (gmond)
  • 10.0.0.195 - monitoring node (gmond)
  • 10.0.0.196 - monitoring node (gmond)

On client node:
--> Install the meta daemon, the monitoring daemon, and the web UI by executing:
--> If the packages are not available, you might need to add the EPEL repository to your machine.
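
A minimal sketch of the client-node install, under a few assumptions: the package names (ganglia-gmetad, ganglia-gmond, ganglia-web) are the ones I know from EPEL, and enabling EPEL through the epel-release package only works where that package is offered by an already enabled repository. The function name is made up for illustration; check `yum search ganglia` on your machine before relying on it.

```shell
# Sketch: install gmetad, gmond and the web UI on the client node (10.0.0.33).
# Assumption: package names as published in EPEL; verify with `yum search ganglia`.
install_client_node() {
    # Enable EPEL first; drop this line if the repository is already configured.
    yum install -y epel-release || return 1
    # Meta daemon, monitoring daemon and PHP web UI.
    yum install -y ganglia-gmetad ganglia-gmond ganglia-web
}
```

Paste the function into a root shell on 10.0.0.33 and run `install_client_node`.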

On monitoring node:
--> Install the monitoring daemon by executing:
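
For the three monitoring nodes, the install could be scripted roughly as follows. This assumes root SSH access from your workstation to each node, which this post does not set up, and the same EPEL package name as on the client; the variable and function names are illustrative only.

```shell
# Sketch: install only gmond on each monitoring node.
# Assumption: root SSH access to the nodes and EPEL already enabled there.
MONITORING_NODES="10.0.0.194 10.0.0.195 10.0.0.196"

install_monitoring_nodes() {
    for node in $MONITORING_NODES; do
        # Stop at the first node that fails so the error is easy to spot.
        ssh "root@$node" yum install -y ganglia-gmond || return 1
    done
}
```

Without SSH access, running `yum install -y ganglia-gmond` as root on each node does the same thing.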

Configuration:

At this point everything is installed, and you need to configure Ganglia.
  • /etc/ganglia/gmetad.conf --- configuration file for gmetad daemon
  • /etc/ganglia/gmond.conf --- configuration file for gmond daemon

I have updated only the following part of the gmond.conf file on each monitoring node.
Notice that I have commented out mcast_join and bind because AWS EC2 does not support multicast, so unicast is the only option for Ganglia there. All monitoring nodes therefore send their collected data to 10.0.0.33, the node that runs the gmetad daemon.
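
As a rough sketch of what such a unicast setup typically looks like, the function below prints the relevant gmond.conf sections. Only the collector address 10.0.0.33 and port 8649 come from this post; the multicast address 239.2.11.71 is the one I know from the stock file, and setting the cluster name to the gmetad data_source name is an assumption, so compare against your own gmond.conf before copying anything.

```shell
# Sketch: print the gmond.conf sections edited for unicast.
# Assumptions: 239.2.11.71 is the multicast address in the stock file, and the
# cluster name is chosen to match the data_source name used in gmetad.conf.
print_gmond_unicast_sections() {
    cat <<'EOF'
cluster {
  name = "Cloud for Beginners"
  # owner, latlong and url left as in the stock file
}

udp_send_channel {
  # mcast_join = 239.2.11.71
  # Unicast to the node that runs gmetad instead of joining a multicast group.
  host = 10.0.0.33
  port = 8649
  ttl = 1
}

udp_recv_channel {
  # mcast_join = 239.2.11.71
  port = 8649
  # bind = 239.2.11.71
}
EOF
}

print_gmond_unicast_sections
```

The point of the change is that udp_send_channel names one concrete host rather than a multicast group, and udp_recv_channel simply listens on the port.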

In the gmetad.conf file I have updated this line:
data_source "Cloud for Beginners" 60  10.0.0.33:8649
Here I'm telling the meta daemon three things: the name of the cluster (it should match the cluster name on the monitoring nodes so that hosts are listed by cluster), the polling interval (collect data every 60 seconds), and the IP address and port of the host from which the data will be collected.
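
To make the three parts of that line explicit, here is a small sketch that assembles it from variables. The variable and function names are mine, not Ganglia's; only the resulting line is what goes into gmetad.conf.

```shell
# Sketch: build the data_source line from its three parts.
CLUSTER_NAME="Cloud for Beginners"   # label for the cluster
POLL_SECONDS=60                      # polling interval in seconds
COLLECTOR="10.0.0.33:8649"           # host:port of the gmond that gmetad polls

data_source_line() {
    printf 'data_source "%s" %s %s\n' "$CLUSTER_NAME" "$POLL_SECONDS" "$COLLECTOR"
}

data_source_line
```

Additional host:port entries can be appended to the same line; gmetad then falls back to the next one if the first does not answer.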

You are done! Now start the monitoring daemon on all nodes and the meta daemon on the client node.
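
Starting the daemons might look like this on a CentOS release that uses SysV init scripts. The service names gmond, gmetad and httpd are what the EPEL and Apache packages install as far as I know; the function names are illustrative.

```shell
# Sketch for CentOS releases that use the `service` command.

# Run on every node, including the client.
start_monitoring_daemon() {
    service gmond start
}

# Run on the client node (10.0.0.33) only.
start_client_daemons() {
    service gmetad start || return 1
    # Apache serves the PHP web UI.
    service httpd start
}
```

Run `start_monitoring_daemon` as root on all four nodes and `start_client_daemons` on 10.0.0.33.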
After 1-2 minutes you should be able to see all your monitoring data through:
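
A quick way to check the UI from a shell is sketched below. The /ganglia/ path is an assumption: it is the alias the EPEL ganglia-web package sets up in Apache as far as I know, so adjust it if your install differs. Function names are made up for the example.

```shell
# Sketch: build the UI address and probe it with curl.
# Assumption: the web UI is served under /ganglia/ on the client node.
GANGLIA_HOST="10.0.0.33"

ganglia_url() {
    printf 'http://%s/ganglia/\n' "$GANGLIA_HOST"
}

check_ganglia_ui() {
    # Prints only the HTTP status code; 200 means the UI answered.
    curl -s -o /dev/null -w '%{http_code}\n' "$(ganglia_url)"
}

ganglia_url
```

If `check_ganglia_ui` prints 403, the Apache configuration shipped with the web UI is probably restricting access to local requests and needs to be adjusted.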

You might want to change the boot configuration so that the gmetad and gmond daemons are started at boot:
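
On a SysV-init CentOS release this is usually done with chkconfig, roughly as sketched here. Enabling httpd as well is my addition, on the assumption that you want the web UI back after a reboot too.

```shell
# Sketch: enable the daemons at boot with chkconfig.

# Run on every node.
enable_gmond_on_boot() {
    chkconfig gmond on
}

# Run on the client node (10.0.0.33) only.
enable_client_daemons_on_boot() {
    chkconfig gmetad on || return 1
    # Assumption: the web UI should also come back after a reboot.
    chkconfig httpd on
}
```

`chkconfig --list gmond` shows the resulting runlevel settings.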

Common Issue: 
If gmetad does not start, you can check the log by:
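
One way to pull the relevant lines out of the system log is sketched below. It assumes gmetad logs through syslog into /var/log/messages, which is the CentOS default as far as I know; pass a different path if your syslog is set up otherwise.

```shell
# Sketch: show the most recent gmetad messages from the system log.
# Assumption: syslog writes to /var/log/messages.
show_gmetad_log() {
    log_file="${1:-/var/log/messages}"
    grep gmetad "$log_file" | tail -n 20
}
```

Run `show_gmetad_log` as root, since /var/log/messages is normally not readable by regular users.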
In the log you might see the error "Please make sure that /var/lib/ganglia/rrds is owned by nobody". In that case you need to execute this:
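
Following the error message literally, the fix could be sketched like this. The user name nobody is taken from the message above; if your log names a different user, use that one instead. Restarting gmetad afterwards is my assumption about the next step.

```shell
# Sketch: hand the RRD directory to the user named in the error message,
# then restart gmetad so it retries.
fix_rrd_ownership() {
    chown -R nobody /var/lib/ganglia/rrds || return 1
    service gmetad restart
}
```

Run `fix_rrd_ownership` as root on the client node.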



Note: For privacy reasons, I had to modify several lines in this post from my original post. If you find that something is not working or you run into any issues, please do not hesitate to contact me.


1 comment:

  1. Nice and easy process....
    i guess you forgot to teach adding authorization and authentication and modfying ganglia.conf
