Thursday, July 12, 2007

GRAM Authentication test failure

If you see this error:
[afgane@everest00 afgane]$ globusrun -r everest.cis.uab.edu -a

GRAM Authentication test failure: connecting to the job manager failed. Possible reasons: job terminated, invalid job contact, network problems, ...

After making sure you have a current grid proxy (created with grid-proxy-init), check that the globus-gatekeeper is running by telnetting to port 2119:
telnet [hostname] 2119

If you get an error such as the following, read on...
telnet localhost 2119
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
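
On hosts where telnet is not installed, the same reachability test can be sketched with bash's built-in /dev/tcp redirection. This is only a sketch: the function name port_open is made up for illustration, and it assumes a bash built with /dev/tcp support and the GNU timeout command.

```shell
# port_open HOST [PORT]: succeed if a TCP connection to HOST:PORT can be opened.
# PORT defaults to 2119, the gatekeeper default mentioned in this post.
# Hypothetical helper; assumes bash with /dev/tcp support and GNU timeout.
port_open() {
    local host="$1" port="${2:-2119}"
    # The child bash opens the socket on fd 3 and closes it when it exits.
    # timeout stops the probe after 5 seconds if packets are silently dropped.
    timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example:
#   port_open localhost 2119 && echo "port open" || echo "connection failed"
```

A non-zero exit status corresponds to the "Connection refused" case shown above (or to a timeout).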


2119 is the default globus-gatekeeper port. If a different port may be in use, check the file /usr/local/globus-4.0.2/etc/globus-gatekeeper.conf first.
Continue by examining /etc/xinetd.d/globus-gatekeeper. This is the service startup file, and it looks something like this (make sure the first line names the service, globus-gatekeeper in this case):
service globus-gatekeeper
{
socket_type = stream
protocol = tcp
wait = no
user = root
env = LD_LIBRARY_PATH=/usr/local/globus-4.0.2/lib
server = /usr/local/globus-4.0.2/sbin/globus-gatekeeper
server_args = -conf /usr/local/globus-4.0.2/etc/globus-gatekeeper.conf
disable = no
}

In this file, find the path to the globus-gatekeeper config file (the line server_args = -conf /usr/local/globus-4.0.2/etc/globus-gatekeeper.conf) and examine that file next. It looks something like this:
[root@everest00 xinetd.d]# cat /usr/local/globus-4.0.2/etc/globus-gatekeeper.conf
-x509_cert_dir /etc/grid-security/certificates
-x509_user_cert /etc/grid-security/hostcert.pem
-x509_user_key /etc/grid-security/hostkey.pem
-gridmap /etc/grid-security/grid-mapfile
-home /usr/local/globus-4.0.2/
-e libexec
-logfile var/globus-gatekeeper.log
-port 2119
-grid_services etc/grid-services
-inetd

The -port line shows which port the gatekeeper listens on. Go back and repeat the telnet test against that port.
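
The two lookups (config path from the xinetd file, then port from the gatekeeper config) can be scripted roughly as follows. Both function names are hypothetical, and the sketch assumes the tokens in each file are separated by whitespace, as in the listings above.

```shell
# Hypothetical helpers; both assume whitespace-separated tokens in the files.

# xinetd_conf_path FILE: print the path that follows -conf on the server_args line.
xinetd_conf_path() {
    awk '$1 == "server_args" {
             for (i = 2; i < NF; i++) if ($i == "-conf") print $(i + 1)
         }' "$1"
}

# gatekeeper_port FILE: print the value of the -port option in the gatekeeper config.
gatekeeper_port() {
    awk '$1 == "-port" { print $2 }' "$1"
}

# Example:
#   conf=$(xinetd_conf_path /etc/xinetd.d/globus-gatekeeper)
#   gatekeeper_port "$conf"
```

If gatekeeper_port prints nothing, the config has no -port line.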

If you changed either of the two files just mentioned, you must restart xinetd. Do this as root by executing:
/etc/rc.d/init.d/xinetd restart
If this still does not work, execute: netstat -lt
This prints the TCP ports on which services are currently listening. You can also try starting globus-gatekeeper manually. The path and parameters for the server are again in /etc/xinetd.d/globus-gatekeeper, under server and server_args (e.g., $ /usr/local/globus-4.0.2/sbin/globus-gatekeeper -conf /usr/local/globus-4.0.2/etc/globus-gatekeeper.conf).
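
By default netstat replaces port numbers with service names from /etc/services, so a scripted check is easier with the extra -n flag (numeric output). A minimal sketch of such a filter follows. The function name listening_on is made up, and it assumes the Linux net-tools column layout, where the local address is the fourth column.

```shell
# listening_on PORT: read `netstat -ltn` output on stdin and succeed
# if some socket is in LISTEN state on PORT.
# Hypothetical helper; assumes the Linux net-tools column layout.
listening_on() {
    awk -v port="$1" '
        $NF == "LISTEN" {
            n = split($4, a, ":")        # port is the last ":"-separated piece
            if (a[n] == port) found = 1
        }
        END { exit !found }'
}

# Example:
#   netstat -ltn | listening_on 2119 && echo "something is listening on 2119"
```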

If you can successfully telnet to the machine, the gatekeeper is running. The next steps are to check that the host certificate and key (in /etc/grid-security/) are still valid and have the correct permissions:
-rw-r--r-- 1 root root 1.4K Mar 8 13:50 hostcert.pem
-r-------- 1 root root 887 Mar 8 13:49 hostkey.pem
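
Both checks can be scripted. The sketch below compares a file's mode with the one shown in the listing and asks openssl whether a certificate has expired. The function names are hypothetical, and the sketch assumes GNU stat (the -c option) and an installed openssl command.

```shell
# check_mode FILE EXPECTED: compare the octal permission bits of FILE with EXPECTED.
# Hypothetical helper; relies on GNU stat (-c), as found on Linux.
check_mode() {
    local actual
    actual=$(stat -c '%a' "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK   $1 ($actual)"
    else
        echo "BAD  $1 (mode $actual, expected $2)"
        return 1
    fi
}

# cert_not_expired FILE: succeed if the certificate in FILE has not yet expired.
cert_not_expired() {
    openssl x509 -in "$1" -noout -checkend 0 >/dev/null
}

# Example, using the modes from the listing above:
#   check_mode /etc/grid-security/hostcert.pem 644
#   check_mode /etc/grid-security/hostkey.pem 400
#   cert_not_expired /etc/grid-security/hostcert.pem || echo "host certificate expired"
```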

The grid-mapfile must map the distinguished name of each individual user to a local user name (e.g., "/C=US/ST=Alabama/L=Birmingham/O=University of Alabama at Birmingham/OU=UABgrid/CN=jpr/emailAddress=jpr@uab.edu" afgane).
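
To confirm that a particular distinguished name is mapped, a lookup along these lines may help. The function name gridmap_user is hypothetical, and the sketch assumes every entry has the form shown above: a double-quoted DN followed by the local account name.

```shell
# gridmap_user DN [FILE]: print the local account that DN maps to.
# Hypothetical helper; assumes entries look like: "DN" account
gridmap_user() {
    awk -F'"' -v dn="$1" '
        $2 == dn {
            gsub(/^[ \t]+|[ \t]+$/, "", $3)   # trim whitespace around the account
            print $3
        }' "${2:-/etc/grid-security/grid-mapfile}"
}

# Example (grid-cert-info -subject prints the DN of your own certificate):
#   gridmap_user "$(grid-cert-info -subject)"
```

No output means the DN has no entry in the file.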

Finally, the /etc/grid-security/certificates directory must hold currently valid CA certificates for the participating resources/organizations.
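
A quick way to spot stale CA certificates is to loop over the directory with openssl. This is a sketch under assumptions: the function name is made up, and it assumes the CA certificates are stored as <hash>.0 files, which is the usual layout for this directory.

```shell
# expired_ca_certs [DIR]: print every CA certificate in DIR that has expired
# or that openssl cannot read.
# Hypothetical helper; assumes CA certificates are stored as <hash>.0 files.
expired_ca_certs() {
    local dir="${1:-/etc/grid-security/certificates}" f
    for f in "$dir"/*.0; do
        [ -e "$f" ] || continue          # the glob matched nothing
        # -checkend 0 exits non-zero when the certificate is already expired
        openssl x509 -in "$f" -noout -checkend 0 >/dev/null 2>&1 || echo "$f"
    done
}

# Example:
#   expired_ca_certs /etc/grid-security/certificates
```

Any file name printed needs a closer look, for example with openssl x509 -in FILE -noout -dates.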


Additional (excellent) documentation is available from Texas Tech at http://www.hpcc.ttu.edu/Globus.html and from SDSC at http://www.sdsc.edu/~tkaiser/globus/build/. The second link also covers globus-personal-gatekeeper.
