PGENESIS 2.0 Manual


###
### WARNING: This document describes the obsolete PGENESIS 2.0, which
### dates from around 1995-1996; users should refer to the
### top-level README, the PGENESIS homepage
### (http://www.psc.edu/Packages/PGENESIS), and the online
### Book of GENESIS (http://www.genesis-sim.org/GENESIS/bog/bog.html
### for current information.
###
### This is included here for its historical relevance as a
### design document, and as a supplementary source of information
### for users seeking additional information about various PGENESIS
### features.
###


Script Language Extensions for Parallel Genesis

Reference Manual
Nigel Goddard
ngoddard@psc.edu

===================================================================
1. Introduction

Parallel Genesis allows a modeler to distribute a simulation across a
computational platform that supports the PVM message passing library.
This includes most supercomputers and workstation networks. There is
a natural mapping from network-level models to a parallel computer,
but Parallel Genesis is also capable of executing a single-cell model
on a parallel platform.

This document describes the language extensions for parallel programming
in the Genesis script language. The first section introduces the
programming model and the second section specifies the syntax and
semantics of each new script language extension.

1.1 PVM and Parallel Genesis

The Parallel Virtual Machine (PVM) system provides the illusion of a
parallel platform. The PVM system may run on a single CPU or multiple
CPUs, possibly of different type. In the rest of this document when
we refer to the "parallel platform" we mean the illusion provided by
PVM. When we refer to the "parallel machine", we mean the physical
set of CPUs and network connecting them that PVM is running on. An
executing PVM program consists of user processes, typically one per
CPU, which communicate via the PVM daemon which runs on each
participating CPU. In parallel Genesis, each user process is an
independent Genesis simulation, which we call a "node" of the parallel
simulation. Nodes are uniquely identified by a node number
(consecutive integers starting at zero).

1.2 Zones

Nodes can be grouped in "zones" when the simulation is started. Each
node is in exactly one zone (by default, every node is in zone 0).
The zones form a fixed partition of the parallel platform. The
motivation for zones is to allow different parts of the simulation to
run asynchronously (uncoordinated). For example, in a parameter
search application one might wish to run many instances of a four-node
model in parallel. Each instance uses four nodes which must run
synchronously, but the instances need not be coordinated (except at
start and finish). Thus we can run each instance in a separate zone,
each zone containing four nodes. Zones are uniquely identified by
consecutive integers starting at zero. The nodes within a zone are
uniquely identified with a znode number (consecutive integers starting
at zero). The mapping from node numbers to (zone, znode) pairs is
discussed in section ???.
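
For example, under the -mixed startup option described in section
2.2.1, "paron -mixed 4 2" on an 8-node platform creates two zones of
four nodes each:

    zone 0 : znodes 0,1,2,3
    zone 1 : znodes 0,1,2,3

Each node keeps its unique global node number (0 to 7) in addition to
its znode number within its zone.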

1.3 Programming model

1.3.1 Namespace (memory)

The Parallel library currently provides a private-namespace
programming model. This means that each node has no knowledge of the
elements that reside on other nodes. This implies that every
reference to an element on another node must specify the node
explicitly. It is envisioned that a shared-namespace programming
model will be implemented eventually. This will allow nodes within a
zone to reference elements on other nodes in the zone without
specifying the node number. To ease upgrade of parallel models to the
shared-namespace paradigm, it is recommended that element names within
a zone be unique. If this recommendation is not adhered to, there
will be naming conflicts if a model wishes to take advantage of the
shared-namespace capability when it becomes available.
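
As a small sketch of the explicit-node rule (the element paths are
hypothetical, and raddmsg is described in section 3.7), a script on
node 1 that wants to send a message to an element living on node 0
must name that node explicitly:

    // the source element is local to node 1; the destination lives on node 0
    raddmsg /layer1/cell/spike /layer0/cell/syn@0 SPIKE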

1.3.2 Execution (threads and synchronization)

The main thread of control on each node is that which reads commands
from the script file (or keyboard if the session is interactive). The
parallel library provides limited capabilities for this thread to
create new threads on any node. On each node the threads are pushed
onto a stack with the main thread at the bottom of the stack. Only
the topmost thread may execute and when it completes it is popped off
the stack so that the next thread down can continue. Threads ready to
execute are NOT guaranteed to execute: if the topmost thread is
blocked or looping, no ready thread lower on the stack can continue.

An executing thread is guaranteed to run to completion (assuming it
does not contain an infinite loop or block on I/O) so long as it
executes only local operations, i.e. no operations that explicitly or
implicitly involve communication with other nodes. The command
descriptions below include a specification of local or non-local status.
In addition, simulation steps and reset are by definition non-local
operations if there is more than one node in the zone. Users are
strongly encouraged to use only local operations in child threads
whenever possible. Users need to be very careful about thread
creation to ensure that deadlock (no thread can continue) does not
occur.

The parallel library provides facilities for blocking and non-blocking
thread creation, usually used to execute commands on nodes different
from the one the script is being executed on ("remote" nodes). When a
thread (including the main script) initiates a blocking remote thread
(also known as remote procedure call or remote function call), it
waits until the thread completes before continuing. When a thread
initiates a non-blocking remote thread (an asynchronous thread), it
continues immediately without waiting for termination of the thread.
While a thread is waiting, the node can accept a request for thread
creation arriving from any node (including itself). This new thread
is pushed on the thread stack and executed, so that the original
waiting thread does not continue until the new thread has completed.
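
As a sketch of the difference (foo is a hypothetical script function;
the full syntax is given in sections 3.3 and 3.3.1):

    // blocking remote call: the script waits for foo to finish on node 2
    int result
    result = {foo@2 10}

    // non-blocking: start foo on node 2, do other work, collect the result later
    int future
    future = {async foo@2 10 -scalar}
    // ... other local work ...
    result = {waiton future}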

Scripts running on different nodes can synchronize via several
different synchronization primitives. There are two types of barrier
(each script waits at a barrier until all have reached it), one which
involves all nodes in a zone, the other involving all nodes in the
parallel platform. By default there is an implicit zone-wide barrier
before a simulation step is executed, although this can be disabled.
Pairwise synchronization of nodes is also possible.

When a script requests that a command be run asynchronously on another
node it initiates a child thread of control on the other node. The
child thread runs asynchronously with its parent. The parent can
request notification or the child's result when the child completes,
and can wait on that notification or result (a "future"). By default
all threads block for child completion before each simulation step
although this feature is modifiable by the user. It is easy to reach
deadlock (no thread able to continue) if the creation and execution of
threads is handled carelessly.

If a node initiates several child threads on a particular remote node,
these are guaranteed to commence (but not necessarily complete)
execution in the order in which they were initiated. A thread is
guaranteed to execute eventually so long as no preceding thread 1)
enters a loop which only executes local operations, or 2) blocks
indefinitely because of deadlock. Once execution of a thread begins,
it runs to completion without interruption so long as it only executes
local operations.

1.4 Simulation and scheduling

The parallel library provides the ability to set up a Genesis message
between two objects on different nodes (a remote message), provided the
nodes are in the same zone. Data is physically transferred between
nodes at the beginning of a simulation step (see section
??? for user control of this process). This means that there is no
transfer of data between objects within a single timestep, which has
ramifications for the schedule. The parallel library guarantees that
execution on a parallel platform will be identical to that on a single
processor if and only if there are no remote messages for which the
source object precedes the destination object in the schedule (we
assume that every node in a zone has the same schedule).
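
As a sketch (the element paths are hypothetical; raddmsg itself is
described in section 3.7), a remote message set up with

    raddmsg /net/src/spike /net/dst/syn@1 SPIKE

only delivers its data at the beginning of each step, so if
/net/src/spike were scheduled before /net/dst/syn the destination
would see that data one step later in the parallel run than in a
single-node run.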

1.5 Deferred features

[Mail ngoddard@psc.edu if you would like to see any of these features
implemented.]

Features which the parallel library will eventually include but which
are not currently implemented:

remote execution on arbitrary sets of nodes

deletion of remote messages

copy or deletion of elements sending or receiving remote messages

inspection of remote messages (e.g., showmsg, getmsg)

active messages which have slot data

rget (get value of remote script variable)

rpoll (wait until remote script variable is non-zero)

rvolumeconnect (like volumeconnect)

=============================================================================
2.0 Commands

2.1 Syntax.

In the interests of backwards compatibility, the syntax of most
commands is an extension of existing command forms. The general idea
is to precede the existing command name with an "r" (for "remote") and
to add the specification of the remote node with an @.


Local syntax (existing) :
command args ....
element_path

Remote syntax
command@node-list args ...
element_path@node

node-list <- node-spec[,node-spec]...

node-spec <- node[.zone]

where node is the znode number for the node, that is, its number
within its zone. If .zone is omitted, the zone of the caller is
assumed. Any command or element without the @ extension is assumed to
be on the local node. If the node and zone are the same as the
caller's, the command is executed locally.

Nodes and zones are identified at present as an integer (znode number
or zone number) or one of two keywords:

other : All except the caller's ("others" also accepted).
all : All including the caller's.

Thus "@other" means all nodes in the caller's zone except the caller's
node. "@other.all" means every node in the parallel platform except
the caller's. "other.other" means every node in every other zone
except the nodes whose znode number is the same as the callers znode
number. "@all.all" means every node in the parallel platform. "@all.other"
means every node in every other zone. If "other" is given for node then
there must be a node whose znode number corresponds to the caller's
znode number in every zone that is specified. There must exist at
least one node in each node-spec.
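
For illustration, using the echo command (as in the examples of
section 3.3.1), some remote command forms are:

    // znode 2 of the caller's zone
    echo@2 hello
    // znodes 1 and 3 of the caller's zone
    echo@1,3 hello
    // every node in the caller's zone except the caller
    echo@other hello
    // znode 0 of zone 1
    echo@0.1 hello
    // every node in the parallel platform
    echo@all.all hello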

[Deferred: appending "only" to the @ means that the reference is
exclusively to those nodes specified in the node-list - all other
nodes do nothing. Thus "command@only-2,4,6" results in the command
being executed on nodes 2,4 and 6 only. Mail ngoddard@psc.edu if
you would like this implemented.]

=============================================================================
2.2 Startup

paron
syntax : paron [option [arguments]]+

This command initializes the parallel simulation. It performs the
following tasks :

* Creates the postmaster /post
* If this is global node zero, spawns all the other nodes
* Partitions the parallel platform into zones
* Adds the parallel event loop ParEventLoop as a job at priority 1
(which is high but not top)

The command should be issued before starting any parallel functions.

2.2.1 paron syntax options.

Several paron options control how the nodes are spawned by the master.
Worker nodes ignore these options in the paron command, except the
-debug option.

-executable filename (default: "parnxgenesis")
the file in $HOME/pvm4/bin/ARCH to execute

-nodes number (default: $NNODES if it is defined, otherwise
the size of the parallel machine)
the number of nodes to use in the parallel platform

-execdir directory-name (master's working directory at paron time)
the name of the directory which will become "." for the workers

-altsimrc filename (default: ".parnxsimrc")
the Genesis configuration file to use, located in $HOME or "."

-startup filename (default: same name as master's startup file)
the name of the .g file on the SIMPATH to read on startup

-silent level (default: 3)
the Genesis level of error/feedback reporting

-debug flags (default: 0)
the PVM debug options for spawning tasks. See the PVM
manual and section ??? for details. 4 means debug.

-output filename (no default)
the name of the file (relative to the master's working
directory if not an absolute pathname) to which
stdout/err from workers is directed

-nice level (master's nice level)
the nice level to run the workers at. Positive values only,
a higher level means a lower priority.

-dbgout filename (stdout, for implementors only)
the name of the file to which workers will print debug messages;
an extension (the worker's PID) is added.
[NOT IMPLEMENTED]
-batch
if present, the worker will be run in batch mode
so any interaction with the user will be suppressed

The remaining options for paron relate to specification of zones.
It is an error, currently NOT detected, if the nodes do not all
have the same specification of zones.

-farm
each node is in a separate zone (default).

-parallel
all nodes in the same zone.

-mixed nnodes nzones [nnodes nzones] ...
This is the most general option. It sets up 'nzones' zones
with 'nnodes' nodes in each. A further 'nnodes nzones' pair
may then be applied to the remaining nodes, and so on. Any
nodes left over at the end are placed in single-node zones.
If nnodes or nzones is less than 1, the remaining number of
nodes is divided by the other (positive) value to obtain it,
and any remainder is handled as above.
There is a special case when the first zone is of
size 1 (e.g. paron -mixed 1 1 10 10 ...). This case
usually occurs when one wants a master node to coordinate
the operation of all the rest, as in a parameter search
simulation. In this situation the zero zone is placed
at the real zero node of the parallel machine, which is
usually the processor with a TTY interface if any exists.

2.2.2 Examples
paron
By default this just sets up a simulation where each node is
in a separate zone.

paron -parallel
Sets up a simulation where all nodes are in the same zone, and
so can send Genesis messages to each other.

paron -mixed 10 5 3 others
If this were starting with 64 nodes, the simulation would look
like this :
Zone 0,1,2,3,4 : 10 nodes each = 50 nodes
Zone 5,6,7,8 : 3 nodes each = 12 nodes
Zone 9,10 : 1 node each = 2 nodes


-----------------------------------------------------------------------------
2.3 Local commands
-----------------------------------------------------------------------------

mynode,nnodes,myzone,nzones,ntotalnodes,mytotalnode

mynode - number of this node in this zone
nnodes - number of nodes in this zone
myzone - number of this node's zone
nzones - number of zones
ntotalnodes - number of nodes in all zones
mytotalnode - unique number over all zones for this node

These are utility functions for finding out about the
configuration of the parallel platform.

Syntax : They return an int, and do not take any
arguments.

Examples :
echo {mynode} {nnodes} {myzone} {nzones}

-----------------------------------------------------------------------------
2.4 PVM access
-----------------------------------------------------------------------------

mypvmid, npvmcpu

mypvmid - task identifier used by PVM for this node
npvmcpu - number of cpus used by PVM in the parallel machine

These functions take no arguments and return an integer.
They give access to some of the underlying PVM information.
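
Example (in the same style as section 2.3) :
    echo {mypvmid} {npvmcpu}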

-----------------------------------------------------------------------------
3.4 threadson, threadsoff, clearthreads

This set of functions is used for controlling execution of threads.
The paron command leaves the parallel library in a state in which
suspended threads that are ready to resume, and threads launched by
remote nodes, can be executed whenever the current thread suspends
itself. The threadsoff
command disables the (re)commencement of any other thread. The threadson
command enables the (re)commencement of other threads. The clearthreads
command executes all ready threads until completion or until they suspend
themselves (and even these may complete if they become ready again before
the queue of ready threads becomes empty).

Syntax : No arguments, no function values returned.

3.4.1 threadson : This operation re-enables parallelism.

3.4.2 threadsoff : This operation disables parallelism. Useful
if one is performing a function (e.g., simulation steps)
which should not be interrupted to perform script functions.
Also useful if one wishes to prevent access (via rpc calls) to script
variables until they have been evaluated.

3.4.3 clearthreads : This operation clears all pending parallel setup
commands. It is meant to be used after threadsoff has been
called, to complete execution of any remaining threads.
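
A minimal sketch of the usage described in 3.4.2 (the variable and its
value are hypothetical): keep other threads out while a script
variable is being set, then re-enable threads and drain any requests
that arrived in the meantime:

    threadsoff
    // no remote thread can read this variable while it is being set
    float threshold
    threshold = -0.045
    threadson
    // execute any thread requests that arrived while threads were off
    clearthreads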

-----------------------------------------------------------------------------
Non-local commands
-----------------------------------------------------------------------------

3.3 Remote procedure call

Syntax:
command@node-list args [-reduce type]
node-list <- node-spec[,node-spec]...
node-spec <- node[.zone]

NOTE: until a parser bug is fixed, use the following variant if any
node-specs are enclosed in {} for evaluation by the parser, e.g.
func@{i}. The resulting warning message can be ignored:

rcommand command@node-list args [-reduce type]



This causes a command to be executed on one or more nodes. It
suspends the script, creates a new thread on the nodes (usually remote
nodes), executes the command, and resumes the script when the command
has completed on all nodes. While the script is suspended other
threads may execute (other suspended threads which resume, or threads
created by remote calls from other nodes). The result of the command
is returned as a character string - note that this result is undefined
if the function is executed on more than one node.

[Deferred: If the optional -reduce flag is given, the appropriate
reduction operation is performed on the results from all the nodes on
which the command is executed. The result after reduction is returned
as a character string. Mail ngoddard@psc.edu if you would like this
implemented.]

A heuristic to use to avoid deadlock in a remote procedure call is to
be sure that the function that is executed only performs local
operations. That is, it should not issue any of the remote operations
listed in this document, nor execute any simulation steps for a zone
containing multiple nodes. Local operations are guaranteed to execute
without suspending the thread, so that the remote thread which
implements the rpc call is guaranteed to execute to completion once it
begins execution. Violating this heuristic can easily lead to
livelock, deadlock, or infinite loops.


Example :
On node 0 we set up the function

    function foo(x)
        int x
        return({x*10})
    end

On node 1 we issue the command

    int myfoo
    myfoo = {foo@0 2}
    echo {myfoo}

and the reply is :

    20

-----------------------------------------------------------------------------

3.3.1 async
Syntax:
async function[@node-list] args... [[-complete]|[-scalar]]

node-list <- node-spec[,node-spec]...
node-spec <- node[.zone]

The async command causes one or more new threads to be created to
execute the function on the node or nodes specified in the nodelist.
These threads execute asynchronously with the script. The function
can be a Genesis shell command (including Unix shell commands) or a
script function. The async command returns as soon as the request for
new threads has been sent. If the [-complete] or [-scalar] flag is
given, the script can synchronize with the completion of the
asynchronous thread(s), and obtain a result in the case of -scalar,
using the "waiton" command. If no node is specified, the current node
is assumed. The async command has special meaning when applied to the
raddmsg command; see the documentation on raddmsg.

A heuristic to use to ensure that the remote thread completes is the
same as that for rpc calls, namely to be sure that the function that
is executed only performs local operations. That is, it should not
issue any of the remote operations listed in this document, nor
execute any simulation steps for a zone containing multiple nodes.
Local operations are guaranteed to execute without suspending the
thread, so that the remote thread which implements the asynchronous
call is guaranteed to execute to completion once it begins execution.
Violating this heuristic can easily lead to livelock, deadlock, or
infinite loops.

Examples :

async echo@2 foo
will cause node 2 in the caller's zone to echo 'foo'.

async echo@all foo
will cause all nodes in the caller's zone, including
the current one, to echo "foo".

async echo@all.others {mynode}
will cause all nodes in all zones except the zone of
the caller to echo the node number of the caller.
Note that the curly braces are evaluated on the
issuing node, not at the destinations!

async@1 echo@2 foo
will tell node 1 to asynchronously tell node 2 to
echo foo. This is an example of a remote thread
call that violates the heuristic for ensuring the
thread completes.

-----------------------------------------------------------------------------
3.3.2 waiton

Syntax:
    waiton future

This command is used in conjunction with the "async" command. If
async is given the -complete or -scalar flag, it returns an integer (a
"future") which can be used as an argument to the "waiton" command.
The effect is for the script to suspend until the command that was
started asynchronously has completed. If the -scalar flag was given,
"waiton" returns the result of the command as a character string.
This implements a form of "future": the value returned by "async" is a
promise of a future value which is retrieved with "waiton". If the
-complete flag was given, then a form of split-phase computation is
possible. The "async" command is used to initiate a computation and
the "waiton" command used to wait for it to complete, with the script
able to perform useful work that does not depend on completion between
the two.

Examples:

    // using a future to get a result
    int future, result
    // run command asynchronously putting the result in a future
    future = {async command@node args... -scalar}
    // do some useful work
    ...
    // get the result from the future
    result = {waiton future}

    // using a future for split-phase computation
    int future
    // run command asynchronously returning a future
    future = {async command@node args... -complete}
    // do some useful work
    ...
    // synchronize with the asynchronous command's completion
    waiton future

-----------------------------------------------------------------------------
3.5 barrier, barrierall

These functions synchronize nodes. A node waits at a barrier until
all the other nodes have reached it. The command must be issued on
all nodes, and when encountered by any node it blocks until all other
nodes have also encountered it.

Syntax :
    barrier [barrier_id [timeout_seconds]]
    barrierall [barrier_id [timeout_seconds]]

Every barrier has an id number; if no number is given, a default id is
used. Barriers only match if the barrier id matches. Valid id
numbers are from 1 to 24. Providing barrier ids in a parallel script
helps to verify program correctness (by causing deadlock for incorrect
programs). If no barrier ids are provided, then any barrier will
match any other barrier in the script. A timeout in seconds can be
provided to barrier commands, otherwise the default timeout value is
used.

3.5.1 barrier : This applies to all nodes within a zone. The call
blocks until all nodes in the zone have called barrier with the given
barrier id.

3.5.2 barrierall : This applies to all nodes in all zones in the
simulation. The call blocks until all nodes in all zones have called
barrierall with the given barrier id.
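
A minimal sketch of barrier use (the setup function name is
hypothetical): make every node in the zone finish building its part of
the model before any node goes on to step the simulation:

    // each node builds its local piece of the model
    setup_local_cells
    // no node continues past this point until all nodes in the zone arrive
    barrier 5
    // all nodes now step together
    step 100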

-----------------------------------------------------------------------------
3.7 raddmsg

This function sets up a message between elements on different nodes
but in the same zone.

Syntax :
    raddmsg src dest[@node] MSGTYPE var1 var2 ...
    (Note that this is the same as for addmsg.)

As with addmsg, the simulation must be reset before the messages take
effect properly. Only the variable fields will be transmitted on
every time step. The constants will be set up by the command and then
stay fixed.

EFFICIENCY NOTE: The raddmsg command requires some communication
between the source and destination nodes before it can complete, and
the command blocks while this is occurring. If raddmsg is executed
asynchronously using the "async" command, then two efficiencies result
which can greatly speed up the setup phase of a script. First, the
script need not wait for a remote message to be completed before
starting on the next remote message. Secondly, raddmsg recognizes
that it is being called asynchronously and optimizes the communication
of message specifications between nodes. In this case, remote message
specifications are buffered on the source node until there are a large
number for a given destination node. These specifications are shipped
off in one large communication to the destination node, which sends
back one large response. Buffers are flushed (i.e., remote messages
completed) whenever a barrier or step command is issued or whenever
required by a "waiton" command. Remote messages from the source node
to a given destination node are guaranteed to be completed in the
order they are issued in the script.

Examples:

    // connect vector of elements to corresponding elements on other nodes
    // do it asynchronously
    for (i = 1; i < n; i = i + 1)
        async raddmsg elt[i] elt[i]@others ...
    end
    // guarantee all have completed with a synchronous call
    raddmsg elt[0] elt[0]@others ...

Bugs:
    Messages created in this way cannot be deleted yet.
    These messages are not reported properly by showmsg or getmsg yet.
    Active messages cannot have slots yet.
    Constant fields are not yet implemented.

-----------------------------------------------------------------------------
3.8 rvolumeconnect [DEFERRED]

Just like volumeconnect except the destination element specification
can include a node-list (e.g., "@all"). When implemented this will be
a much more efficient way to make multiple, spatially defined
connections.

-----------------------------------------------------------------------------
3.10 rget [DEFERRED]

This function returns the value of a global script variable on a
remote node. It can only be called for a single remote node. It
returns a string. It is a blocking call, which will not mean much
unless the remote node has its threads handler turned off.

Syntax :
    rget global[@node[.zone]]

Example :
    (on node 1.1) :
        str foo="hello there"

    (on another node)
        echo {rget(foo@1.1)}
        hello there

-----------------------------------------------------------------------------
3.11 waitset [DEFERRED]

This function polls the value of a global script variable on a remote
node, and blocks until that value is nonzero. It returns the value of
the remote global when it finally comes through. The function returns
a string. However, for remote string variables it will not return
unless the string is a nonzero number.

Syntax :
    waitset global[@node[.zone]] [time [hang_time]]

The time option allows one to set the polling interval. It defaults
to 1 sec. The hang_time option allows one to tell the function when
to give up. It defaults to the postmaster hang_time.

Example :
    (on node 1.1) :
        int foo=0

    (on node 0.0)
        echo {waitset(foo@1.1)}
        (the command blocks...)

    (on node 1.1) :
        foo=123

    (on node 0.0, the poll command finally returns)
        123

-----------------------------------------------------------------------------