Debugging Parallel GENESIS Scripts


Common errors

Source Level Debugging

There is no source-level debugger for GENESIS scripts. Instead, you set a debug level that controls how much detail is printed about what the script is executing. PGENESIS follows the same model: a debug level specified in the paron statement controls how much debugging information is printed during a run.

It is currently possible to run each worker node in its own xterm window by passing the "-debug tty" flag to the pgenesis shell script, which controls how PGENESIS is run. In this case the paron command in the GENESIS script must not be given the -output flag, because that flag redirects worker output to a file instead of stdout.

For those who need to debug C code (either GENESIS/PGENESIS source code or custom user-written libraries), it is also possible on some platforms to run the workers and the master under a source-level C debugger such as gdb or dbx. With dbx, the master and each worker run in their own windows, as with the "-debug tty" option, but each runs under dbx. With gdb, the master and each worker run in their own windows, each running emacs with gdb inside it. These options are passed to the pgenesis shell script as "-debug dbx" and "-debug gdb" respectively.

Script modifications for debugging

In addition to adding more echo statements to the scripts, the following ideas may be helpful.
  • Timeout

    The timeout period is set by default to 120 seconds. You can modify this with the command

           setfield /post msg_hang_time n
    where n is the number of seconds to wait before timing out on barriers, responses to remote commands, etc.

  • Barriers

    Many errors in parallel programming are due to incorrect synchronization of the executing processes. Inserting extra barrier and barrierall commands can help ensure that the synchronization you expect is in fact occurring.

  • Asynchronous remote function calls

    Asynchronous function calls increase the potential degree of parallelism in a parallel script, and therefore increase the risk of deadlock (no process can continue because each is waiting for a message from another) or other program error. If your scripts use the async command, you can turn all these calls into synchronous calls by globally replacing the string "async" with "//async \", effectively commenting out the "async".
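
The timeout and barrier suggestions above might be combined as in the following sketch. It assumes that these lines are placed after the script's existing paron command (on the assumption that paron is what creates the /post element), and that the PGENESIS mynode function is available to label each node's output. The 600-second value and the echo messages are purely illustrative.

```genesis
// Debugging sketch under the assumptions stated above, not a definitive recipe.

// Raise the timeout from the default 120 seconds to a hypothetical 600 seconds,
// e.g. so that other nodes do not time out while one node is being inspected.
setfield /post msg_hang_time 600

// Bracket an extra barrier with echo statements so each node reports
// when it reaches and when it leaves the synchronization point.
echo "node" {mynode} "reached the extra barrier"
barrier
echo "node" {mynode} "passed the extra barrier"
```

If some node never prints the second message, the processes are probably not reaching the barrier in the order the script expects.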