Thursday, September 27, 2007

Adding host info to ganglia and gmond

GridWay was not reporting the OS name and OS version for mileva. After making sure that gmond is running and that MDS is actually being populated with the available information (this is covered in an earlier post), the next step is to check that Ganglia provides all the information GridWay needs.
For example, on the problem host, telnet localhost 8649 was giving the following output:
[METRIC NAME="sample-metric" VAL="Linux mileva-0-3.local 2.6.9-55.0.2.ELsmp #1 SMP Tue Jun 26 14:14:47 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
[METRIC NAME="sample-metric" VAL="Linux mileva-0-2.local 2.6.9-55.0.2.ELsmp #1 SMP Tue Jun 26 14:14:47 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

The working host, in contrast, was giving the following output:
[METRIC NAME="os_name" VAL="Linux" TYPE="string" UNITS="" TN="401" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/]
[METRIC NAME="sample-metric" VAL="Linux everest-0-3.local 2.4.21-20.ELsmp #1 SMP Sat Sep 18 18:28:16 PDT 2004 x86_64 x86_64 x86_64 GNU/Linux
[METRIC NAME="os_name" VAL="Linux" TYPE="string" UNITS="" TN="943" TMAX="1200" DMAX="0" SLOPE="zero" SOURCE="gmond"/]
[METRIC NAME="sample-metric" VAL="Linux everest-0-13.local 2.4.21-20.ELsmp #1 SMP Sat Sep 18 18:28:16 PDT 2004 x86_64 x86_64 x86_64 GNU/Linux

The os_name metric is missing from the problem host's output, which means Ganglia is not providing the information GridWay needs. To include it, do the following. (This was done on Ganglia v3.0.4.)
Edit /etc/gmond.conf (the default location) and add the following text toward the bottom of the file, among the other similar entries:
metric {
  name = "os_name"
  value_threshold = 10.0
}

os_release will most likely need to be added in the same way as os_name.

The edited config file (/etc/gmond.conf) needs to be copied to all the nodes of the cluster before it will take effect.

Then restart gmond (/etc/init.d/gmond restart) on all the nodes and wait for the GridWay info to be updated.
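
Instead of reading through the telnet output by eye, the check can be scripted. The following is only a rough sketch, not something from the original setup: it assumes gmond is listening on port 8649 and closes the connection after sending its XML dump, and that the real output uses ordinary angle-bracket XML tags. The function names fetch_gmond_xml and missing_metrics, and the default list of required metrics (os_name and os_release), are my own choices for illustration.

```python
import socket
import xml.etree.ElementTree as ET

# Metric names this post found (or suspects) GridWay needs; adjust as required.
REQUIRED = ("os_name", "os_release")


def fetch_gmond_xml(host="localhost", port=8649, timeout=10.0):
    """Read the XML dump gmond writes on connect (same data telnet shows)."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # assumed: gmond closes the socket after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def missing_metrics(xml_data, required=REQUIRED):
    """Map each host name to the sorted required metrics it does not report."""
    root = ET.fromstring(xml_data)
    result = {}
    for host in root.iter("HOST"):
        reported = {metric.get("NAME") for metric in host.iter("METRIC")}
        absent = sorted(set(required) - reported)
        if absent:
            result[host.get("NAME")] = absent
    return result


# Example use against a live gmond (needs a running daemon, so left commented):
#   for name, absent in sorted(missing_metrics(fetch_gmond_xml()).items()):
#       print(name, "is missing", ", ".join(absent))
```

An empty result means every host in the dump reports all of the required metrics. Hosts that still show up after the restart probably did not get the updated gmond.conf.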
