Ganglia cluster monitoring software

Date: 2007-02-07

Source: the Internet

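What is Ganglia
Ganglia is software for monitoring system performance: CPU, memory and disk utilization, I/O load, network traffic, and so on. Its graphs make the working state of each node easy to see, which helps when tuning and allocating system resources and improving overall system performance.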

Components of Ganglia
Ganglia is a distributed monitoring system with two daemons: the client-side Ganglia Monitoring Daemon (gmond) and the server-side Ganglia Meta Daemon (gmetad).

What is the PHP Web Frontend
The PHP Web Frontend is Ganglia's web application for browsing statistics. It is written in PHP and runs on PHP.

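What is PHP
PHP is a server-side scripting language for building dynamic websites; you can generate web pages with PHP and HTML. When a visitor opens a page, the server executes the PHP commands and sends the result to the visitor's browser. This is similar to ASP and ColdFusion, but unlike them PHP is open source and cross-platform: it runs on Windows NT and many versions of UNIX. It returns results quickly without any preprocessing step, and it does not need mod_perl-style tuning to keep the server's memory image small. PHP consumes few resources: when it runs as part of the Apache web server, executing code does not require calling an external binary, so the server avoids that extra load.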

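What is RRDtool
RRDtool is a system for storing and displaying time-series data (network bandwidth, temperature, head counts, server load, and so on). It can also draw useful graphs of the stored data and its density.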

Related resources

http://www.ganglia.info
Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and Grids. It is based on a hierarchical design targeted at federations of clusters. It relies on a multicast-based listen/announce protocol to monitor state within clusters and uses a tree of point-to-point connections amongst representative cluster nodes to federate clusters and aggregate their state. It leverages widely used technologies such as XML for data representation, XDR for compact, portable data transport, and RRDtool for data storage and visualization. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. The implementation is robust, has been ported to an extensive set of operating systems and processor architectures, and is currently in use on over 500 clusters around the world. It has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes.
The ganglia system is comprised of two unique daemons, a PHP-based web frontend and a few other small utility programs.
Ganglia Monitoring Daemon (gmond)

Gmond is a multi-threaded daemon which runs on each cluster node you want to monitor. Installation is easy. You don't have to have a common NFS filesystem or a database backend, install special accounts, maintain configuration files or other annoying hassles.
Gmond has four main responsibilities: monitor changes in host state, announce relevant changes, listen to the state of all other ganglia nodes via a unicast or multicast channel and answer requests for an XML description of the cluster state.
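
As a rough illustration of what "announce relevant changes" can mean, the sketch below decides whether a metric value is worth re-announcing: either it moved by more than some threshold, or enough time has passed since the last announcement. The function name, its parameters, and the threshold values are assumptions made for illustration; this is not gmond's actual logic.

```python
def should_announce(last_value, new_value, seconds_since_last,
                    value_threshold, time_threshold):
    """Return True when a metric change looks worth announcing.

    Hypothetical rule, not gmond's real implementation: announce if the
    value moved by more than value_threshold, or if time_threshold
    seconds have passed since the last announcement.
    """
    if last_value is None:          # nothing has been announced yet
        return True
    if abs(new_value - last_value) > value_threshold:
        return True
    return seconds_since_last >= time_threshold


if __name__ == "__main__":
    # Small change shortly after the last announcement, then a large one.
    print(should_announce(0.50, 0.52, 10, value_threshold=0.1, time_threshold=60))
    print(should_announce(0.50, 0.90, 10, value_threshold=0.1, time_threshold=60))
```

A rule of this kind keeps traffic low on a quiet node while still refreshing its state periodically.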

Each gmond transmits information in two different ways: unicasting/multicasting host state in external data representation (XDR) format using UDP messages or sending XML over a TCP connection.
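
To give a feel for why XDR is called compact and portable, here is a minimal Python sketch of three XDR primitives: big-endian 32-bit integers, single-precision floats, and length-prefixed strings padded to a multiple of four bytes. The `pack_sample` layout (a name followed by a float) is a made-up example and is not the message format gmond actually sends.

```python
import struct


def xdr_uint(value):
    """Encode an unsigned 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">I", value)


def xdr_float(value):
    """Encode an IEEE 754 single-precision float, big-endian."""
    return struct.pack(">f", value)


def xdr_string(text):
    """Encode a string: 4-byte length, the bytes, zero padding to a multiple of 4."""
    data = text.encode("utf-8")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def pack_sample(name, value):
    """Pack a (name, float) pair. Hypothetical layout, for illustration only."""
    return xdr_string(name) + xdr_float(value)


if __name__ == "__main__":
    message = pack_sample("load_one", 0.5)
    print(len(message), message.hex())
```

Because every field has a fixed byte order and alignment, a payload like this reads the same on any processor architecture, and it would fit easily in a single UDP datagram.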

Ganglia Meta Daemon (gmetad)

Federation in Ganglia is achieved using a tree of point-to-point connections amongst representative cluster nodes to aggregate the state of multiple clusters. At each node in the tree, a Ganglia Meta Daemon (gmetad) periodically polls a collection of child data sources, parses the collected XML, saves all numeric, volatile metrics to round-robin databases and exports the aggregated XML over a TCP socket to clients. Data sources may be either gmond daemons, representing specific clusters, or other gmetad daemons, representing sets of clusters. Data sources use source IP addresses for access control and can be specified using multiple IP addresses for failover. The latter capability is natural for aggregating data from clusters since each gmond daemon contains the entire state of its cluster.
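
The failover idea described above can be sketched in a few lines of Python: try each address of a data source in turn and use the first one that answers. This is only an illustration under stated assumptions. `fetch_xml` and `poll_with_failover` are hypothetical helper names, and gmetad itself is not implemented this way.

```python
import socket


def fetch_xml(host, port, timeout=5.0):
    """Connect to one address and read everything it sends until it closes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        with conn.makefile("rb") as stream:
            return stream.read()


def poll_with_failover(addresses, fetch=fetch_xml):
    """Try each (host, port) in order and return the first successful dump.

    Returns ((host, port), xml_bytes); raises OSError if every address fails.
    """
    last_error = None
    for host, port in addresses:
        try:
            return (host, port), fetch(host, port)
        except OSError as exc:      # refused, unreachable, timed out, ...
            last_error = exc
    raise OSError(f"no data source reachable: {last_error}")
```

Since every gmond holds the state of the whole cluster, any node that answers is as good as any other, which is why a simple ordered list is enough here.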

Ganglia PHP Web Frontend

The Ganglia web frontend provides a view of the gathered information via real-time dynamic web pages. Most importantly, it displays Ganglia data in a meaningful way for system administrators and computer users. Although the web frontend to ganglia started as a simple HTML view of the XML tree, it has evolved into a system that keeps a colorful history of all collected data.
The Ganglia web frontend caters to system administrators and users. For example, one can view the CPU utilization over the past hour, day, week, month, or year. The web frontend shows similar graphs for Memory usage, disk usage, network statistics, number of running processes, and all other Ganglia metrics.

The web frontend depends on the existence of the gmetad which provides it with data from several Ganglia sources. Specifically, the web frontend will open the local port 8651 (by default) and expects to receive a Ganglia XML tree. The web pages themselves are highly dynamic; any change to the Ganglia data appears immediately on the site. This behavior leads to a very responsive site, but requires that the full XML tree be parsed on every page access. Therefore, the Ganglia web frontend should run on a fairly powerful, dedicated machine if it presents a large amount of data.
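
To make "parse the full XML tree" concrete, here is a small Python sketch that walks such a tree and collects the metrics per host. The element and attribute names in the sample (`HOST`, `METRIC`, `NAME`, `VAL`) are assumed from typical Ganglia output and the sample document is invented; check them against a real dump from your own gmetad before relying on them.

```python
import xml.etree.ElementTree as ET


def hosts_and_metrics(xml_text):
    """Return {host name: {metric name: value string}} from an XML dump.

    Assumes HOST elements with a NAME attribute that contain METRIC
    elements with NAME and VAL attributes.
    """
    root = ET.fromstring(xml_text.strip())
    result = {}
    for host in root.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        result[host.get("NAME")] = metrics
    return result


# Invented sample data, shaped like the output assumed above.
SAMPLE = """
<GANGLIA_XML VERSION="3.0.1" SOURCE="gmetad">
  <CLUSTER NAME="SERVER">
    <HOST NAME="node1" IP="10.0.33.2">
      <METRIC NAME="load_one" VAL="0.25" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""

if __name__ == "__main__":
    for name, metrics in hosts_and_metrics(SAMPLE).items():
        print(name, metrics)
```

Doing this walk on every page view is the cost the paragraph above warns about: the work grows with the number of hosts and metrics in the tree.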

The Ganglia web frontend is written in the PHP scripting language, and uses graphs generated by gmetad to display history information. It has been tested on many flavours of Unix (primarily Linux) with the Apache webserver and the PHP 4.1 module.


http://www.php.net
PHP is a widely-used general-purpose scripting language that is especially suited for Web development and can be embedded into HTML.

http://www.rrdtool.org

RRD is the acronym for Round Robin Database. RRD is a system to store and display time-series data (e.g. network bandwidth, machine-room temperature, server load average). It stores the data in a very compact way that will not expand over time, and it can create beautiful graphs. It can be used via simple shell scripts or as a perl module.
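
The "will not expand over time" property comes from the round-robin idea: a fixed number of slots is allocated once, and new samples overwrite the oldest. The Python class below is a toy model of that idea only. The class and method names are made up, and it does not reflect RRDtool's file format or its consolidation functions.

```python
class RoundRobinArchive:
    """Fixed-size ring of samples; new values overwrite the oldest ones."""

    def __init__(self, slots):
        self.slots = [None] * slots     # storage is allocated once, up front
        self.next_index = 0             # where the next sample will be written
        self.count = 0                  # how many slots hold real data

    def update(self, value):
        """Store a sample, overwriting the oldest one when the ring is full."""
        self.slots[self.next_index] = value
        self.next_index = (self.next_index + 1) % len(self.slots)
        self.count = min(self.count + 1, len(self.slots))

    def values(self):
        """Return the stored samples from oldest to newest."""
        if self.count < len(self.slots):
            return self.slots[:self.count]
        return self.slots[self.next_index:] + self.slots[:self.next_index]


if __name__ == "__main__":
    archive = RoundRobinArchive(3)
    for sample in [1, 2, 3, 4, 5]:
        archive.update(sample)
    print(archive.values())
```

However many samples arrive, the archive never holds more than the number of slots it was created with.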

Installation steps

Environment: Red Hat AS3 Update 4 with Apache
Software: rrdtool-1.0.49, ganglia-3.0.1, PHP 4.4.0, ganglia-web-3.0.0-1

Installing RRDtool

gmetad requires RRDtool, so install it first. The default installation path is /usr/local/rrdtool-1.0.49.
Your_prompt>tar -zxvf rrdtool.tar.gz
Your_prompt>cd rrdtool-1.0.49
Your_prompt>./configure
Your_prompt>make
Your_prompt>make install
Rename the installed rrdtool-1.0.49 directory to rrdtool:
Your_prompt>cd /usr/local
Your_prompt>mv rrdtool-1.0.49 rrdtool
The header and library are then at:
rrd.h in /usr/local/rrdtool/include/rrd.h
librrd.a in /usr/local/rrdtool/lib/librrd.a

Installing and configuring the server side

Installing gmetad

gmetad is not built by default; pass --with-gmetad to configure. The rrdtool library and its header must be present. The default locations are /usr/include/rrd.h and /usr/lib/librrd.a; if rrdtool was installed somewhere else, point configure at the directories that contain them:
./configure CFLAGS="-I/rrd/header/path" CPPFLAGS="-I/rrd/header/path" \
LDFLAGS="-L/rrd/library/path" --with-gmetad
With rrdtool installed under /usr/local/rrdtool this becomes:
Your_prompt>tar -zxvf ganglia-3.0.1.tar.gz
Your_prompt>cd ganglia-3.0.1
Your_prompt>./configure CFLAGS="-I/usr/local/rrdtool/include" \
CPPFLAGS="-I/usr/local/rrdtool/include" \
LDFLAGS="-L/usr/local/rrdtool/lib" --with-gmetad
Your_prompt>make
Your_prompt>make install

To have gmetad start at boot, copy gmetad.init to /etc/rc.d/init.d/:
Your_prompt> cd gmetad
Your_prompt> cp gmetad.init /etc/rc.d/init.d/gmetad

Copy the configuration file to /etc:
Your_prompt> cp gmetad.conf /etc/gmetad.conf
Add gmetad to the list of programs started at boot:
Your_prompt> chkconfig --add gmetad
Your_prompt> chkconfig --list gmetad
gmetad 0:off 1:off 2:on 3:on 4:on 5:on 6:off

Start gmetad:
Your_prompt>/etc/rc.d/init.d/gmetad start
Starting GANGLIA gmetad: [ OK ]
Your_prompt>telnet localhost 8651 | grep "hostname"
This returns the state of each monitored host.
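
The telnet-and-grep check can also be scripted. The Python sketch below connects to the port, reads until the daemon closes the connection, and keeps only the lines containing a given substring. It is a convenience sketch, not part of Ganglia; the function names are made up, and port 8651 is simply the gmetad port used in this post.

```python
import socket


def read_until_eof(conn):
    """Read from a connected socket until the peer closes it."""
    chunks = []
    while True:
        chunk = conn.recv(4096)
        if not chunk:               # empty read means the peer closed
            break
        chunks.append(chunk)
    return b"".join(chunks)


def matching_lines(data, needle):
    """Return the decoded lines of data that contain needle (like grep)."""
    text = data.decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if needle in line]


def grep_xml_dump(needle, host="localhost", port=8651):
    """Connect to host:port, read the whole dump, keep the matching lines."""
    with socket.create_connection((host, port), timeout=5.0) as conn:
        return matching_lines(read_until_eof(conn), needle)


if __name__ == "__main__":
    # Requires a running gmetad on localhost; prints one line per match.
    for line in grep_xml_dump("hostname"):
        print(line)
```

Unlike an interactive telnet session, this exits on its own once the daemon has sent the full dump.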

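Configuring gmetad.conf
# data_source "another source" 1.3.4.7:8655 1.3.4.8
data_source "SERVER" 10 node1 node2
data_source is the most important setting. Its first argument is the name of the cluster to monitor, and the cluster name configured in gmond must match it. The optional number that follows (10 here) is the polling interval in seconds. When two or more hosts are listed and one of them cannot be reached, gmetad reads from the next host listed in data_source.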
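
To show how the parts of a data_source line fit together, here is a small Python sketch that splits one into its name, polling interval, and host list. It is not gmetad's own parser, and `parse_data_source` is a made-up name. The defaults used when a field is omitted (port 8649, interval 15 seconds) are my understanding of gmetad's behaviour and should be treated as assumptions.

```python
import shlex

DEFAULT_PORT = 8649       # assumed default gmond port, as used in this post
DEFAULT_INTERVAL = 15     # assumed default polling interval in seconds


def parse_data_source(line):
    """Parse a data_source line into (name, interval, [(host, port), ...])."""
    tokens = shlex.split(line, comments=True)   # handles quotes and # comments
    if len(tokens) < 3 or tokens[0] != "data_source":
        raise ValueError(f"not a data_source line: {line!r}")
    name, rest = tokens[1], tokens[2:]
    interval = DEFAULT_INTERVAL
    if rest[0].isdigit():                       # optional polling interval
        interval, rest = int(rest[0]), rest[1:]
    addresses = []
    for item in rest:
        host, _, port = item.partition(":")
        addresses.append((host, int(port) if port else DEFAULT_PORT))
    return name, interval, addresses


if __name__ == "__main__":
    print(parse_data_source('data_source "SERVER" 10 node1 node2'))
    print(parse_data_source('data_source "another source" 1.3.4.7:8655 1.3.4.8'))
```

The returned host list is in the order written in the file, which is the order in which a failover would try the hosts.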

Installing and configuring the client side

Your_prompt>tar -zxvf ganglia-3.0.1.tar.gz
Your_prompt>cd ganglia-3.0.1
Your_prompt>./configure
Your_prompt>make
Your_prompt>make install
Your_prompt>cd gmond
Your_prompt>gmond -t > /etc/gmond.conf
Your_prompt>cp gmond.init /etc/rc.d/init.d/gmond
Your_prompt> chkconfig --add gmond
Your_prompt> chkconfig --list gmond
gmond 0:off 1:off 2:on 3:on 4:on 5:on 6:off
Your_prompt>/etc/rc.d/init.d/gmond start
Starting GANGLIA gmond: [ OK ]
Your_prompt>telnet localhost 8649
This returns information about the hosts in the cluster that are running gmond.

Configuring gmond.conf

vi /etc/gmond.conf
globals {
setuid = no
user = nobody
cleanup_threshold = 300 /*secs */
}
Change this to:
setuid = yes
user = scett # local user name

cluster {
name = "unspecified" # cluster name
}
Change the name of the monitored group:
name = "SERVER"
Restart gmond once the configuration is done.
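
Because the cluster name in gmond.conf has to match the name used in gmetad's data_source line, a quick scripted check can catch a mismatch. The Python sketch below pulls the name out of the cluster block with a regular expression. It is a rough helper that assumes the simple layout shown above, not a full parser for the gmond.conf syntax, and `cluster_name` is a made-up function name.

```python
import re

# Matches: cluster { ... name = "..." ... } with the name inside the braces.
CLUSTER_NAME = re.compile(r'\bcluster\s*\{[^}]*?\bname\s*=\s*"([^"]*)"')


def cluster_name(conf_text):
    """Return the name set inside the cluster { } block, or None if absent."""
    match = CLUSTER_NAME.search(conf_text)
    return match.group(1) if match else None


# Sample text shaped like the configuration edited in this post.
SAMPLE_CONF = """
globals {
  setuid = yes
  user = nobody
}
cluster {
  name = "SERVER"
}
"""

if __name__ == "__main__":
    name = cluster_name(SAMPLE_CONF)
    print("cluster name:", name)
    print("matches data_source:", name == "SERVER")
```

Running something like this against /etc/gmond.conf on each node, and comparing the result with the data_source name on the server, is one way to confirm the two agree.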

Installing PHP 4.4.0

tar zxvf php-4.4.0.tar.gz
cd php-4.4.0
./configure --prefix=/usr/local/php --with-config-file-path=/usr/local/php --with-apxs2=/usr/local/httpd/bin/apxs --with-openssl=/usr/local/ssl \
--with-mysql=/usr/local/mysql
make
make install

Edit the Apache configuration file:
vi /etc/httpd/conf/httpd.conf

LoadModule php4_module    extramodules/libphp4.so
AddModule mod_php4.c
AddType  application/x-httpd-php         .php .php4 .php3 .phtml
AddType  application/x-httpd-php-source  .phps

Installing the PHP web frontend

rpm -Uvh ganglia-web-3.0.0-1.i386.rpm

The pages are then available at http://server/ganglia.



Author: 5iwww   Posted: 2007-02-07

The OP's server cannot reverse-resolve the names of the other servers, so the graphs all show IP addresses.

Author: straw   Posted: 2007-02-07

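Ganglia also supports grouping; it is a very good visual cluster monitor.
Back when I worked in operations I wanted to build a system like this, but I left before it was finished.
I now manage more than 100 machines with Ganglia, and it is very convenient.
If it borrowed the best of cacti's graph zooming, it would be unbeatable.

Author: sinofool   Posted: 2007-02-08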

One thing I don't understand: where do you write the IP addresses of the cluster you want to monitor? And don't you need SNMP enabled?

Author: fuyic   Posted: 2007-02-08

Ganglia has its own monitoring program. Every node needs to run a gmond; the IPs do not have to be listed on the server.

Author: straw   Posted: 2007-02-08

So you install a client on every node?

Author: fuyic   Posted: 2007-02-08

I once wrote a document on this; many people on the forum have asked me for it.

Author: soway   Posted: 2007-02-08

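The Rocks distribution I use ships with this software, and it works quite well. But I would like to use its functionality from my own programs. Does it have an API that can be called from C? From what I have seen, it does not seem to.

Author: zycismichael   Posted: 2007-02-08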



QUOTE:
Originally posted by fuyic on 2007-2-8 15:16, post #6
So you install a client on every node?



Yes, you install a gmond on every client.

Author: 5iwww   Posted: 2007-02-09

good.!

Author: unixsc   Posted: 2007-02-13



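QUOTE:
Originally posted by zycismichael on 2007-2-8 21:34, post #8
The Rocks distribution I use ships with this software, and it works quite well. But I would like to use its functionality from my own programs. Does it have an API that can be called from C? From what I have seen, it does not seem to.




Rocks also comes with SQL management: you can see job processes and compute time, which is more convenient.

Author: fdog   Posted: 2007-03-02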

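Hello OP, and thank you for your post! I have been installing Ganglia recently too and am learning from you. I have a question.
I have been setting up a cluster of 6 blades. gmond is installed on every node and the management node also runs gmetad; the installation went fine. Monitoring, however, has a problem. The monitoring server is on a Sun machine. At one point I changed the gmond.conf settings on a node, and the monitoring page then showed 7 hosts. I changed the setting back, but the page still shows 7 hosts, one of them marked as down. How do I remove the graphs for the down machine? It is unpleasant to look at. Thanks.
Attached: udp_send_channel {
host=10.0.33.1 (internal address of the management node)
port = 8649
}
With this setting the monitoring page shows 6 hosts. When I changed host to 202.119.113.99 (the external address of the management node), the page showed 7 hosts. I changed it back to 10.0.33.1, but the page still shows 7 hosts, with one in the down state. What should I do?

The data_source in gmetad.conf on the monitoring server is written like this:
data_source "NIC IBM Cluster" 202.119.113.99:8649
I would also like to ask: is the /etc/hosts file on the management node correct as written below?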
127.0.0.1 localhost.localdomain localhost
10.0.33.1 master.hpc2 master
10.0.33.2 node01.hpc2 node01
10.0.33.3 node02.hpc2 node02
10.0.33.4 node03.hpc2 node03
10.0.33.5 node04.hpc2 node04
10.0.33.6 node05.hpc2 node05

Author: skrbys   Posted: 2010-10-18