The labschedule Tool

For more elaborate scheduling of multiple experiments, the labschedule tool is provided. It offers an easy way to:

loop through sets of input values,
start several experiments simultaneously on one machine,
distribute a set of experiments among a cluster of machines.

Much of this tool's flexibility comes from its loops and variables. Variable names begin with %, and loops are introduced with the --for option.

Each loop has an associated variable, which is simply the loop's position on the command line preceded by a % (%1 for the first --for option, %2 for the second, and so on). For example, to run several experiments that differ only in the arguments given to the program, a single --for loop suffices.

    labschedule --for='10 20 30' bench %1

This command will cause the three experiments

   bench 10
   bench 20
   bench 30

to be started in succession on the local machine. More precisely, the following three labrun commands will be issued:

   labrun --name=schedule-10 bench 10
   labrun --name=schedule-20 bench 20
   labrun --name=schedule-30 bench 30

The --print option of labschedule shows the commands that are to be executed, with all variable names replaced by the corresponding values. Any number of --for options may be given; the result is a nested loop structure, with the first --for option corresponding to the outermost loop. For example,

   labschedule --for='10 20 30' --for='a b' bench %1 %2

will, in essence, cause the following to be executed:

   for %1 in [10, 20, 30] do
      for %2 in [a, b] do
         labrun --name=schedule-%1-%2 bench %1 %2

and thus six experiments will be started in succession.
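To make the loop semantics concrete, here is a small Python sketch that expands --for value lists into labrun command lines like the ones above. It is not labschedule's implementation: the names expand_for_loops and fill are hypothetical, and only the schedule-<values> naming and the %1, %2 substitution are taken from the text.

```python
import re
from itertools import product


def fill(command, values):
    """Replace %1, %2, ... with the given loop values; leave other %names alone."""
    def repl(match):
        i = int(match.group(1))
        return values[i - 1] if 1 <= i <= len(values) else match.group(0)
    return re.sub(r"%(\d+)", repl, command)


def expand_for_loops(loops, command, prefix="schedule"):
    """Return the labrun command lines that nested --for loops would produce.

    loops:   one list of values per --for option, outermost loop first
    command: program and arguments, using %1, %2, ... for the loop variables
    """
    lines = []
    # product() varies its last argument fastest, just like nested for loops
    for values in product(*loops):
        name = "-".join([prefix, *values])
        lines.append(f"labrun --name={name} {fill(command, values)}")
    return lines


for line in expand_for_loops([["10", "20", "30"], ["a", "b"]], "bench %1 %2"):
    print(line)
```

Because itertools.product varies its last argument fastest, the first --for list ends up as the outermost loop, which matches the nesting order described above.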

There is a rich syntax for specifying the ranges of the loop variables. One can use Python expressions (e.g., range(5) specifies the range 0 1 2 3 4), the output of commands (e.g., `find . -name \*.in -print`), the contents of files (e.g., @input), and the values of environment variables (e.g., $DATA_DIR/*.dat). The last example also shows that words containing a '*' or '?' are replaced by the files matching the pattern. Syntax is also available for selecting values from any of these sources by regular expression.
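The environment-variable, wildcard and regular-expression cases can be imitated with Python's standard library. The helper below is hypothetical and is not labschedule's parser: its name, its pattern parameter, and the way the regular-expression filter is applied are assumptions, and Python expressions, command output and @file contents are not handled.

```python
import glob
import os
import re


def expand_range(spec, pattern=None):
    """Hypothetical expansion of a --for value list (not labschedule's parser).

    Environment variables are expanded first, the result is split into words,
    and any word containing '*' or '?' is replaced by the files it matches.
    If pattern is given, only words matching that regular expression are kept.
    """
    words = []
    for word in os.path.expandvars(spec).split():
        if "*" in word or "?" in word:
            # Sorted for a stable order; a pattern with no matches contributes nothing
            words.extend(sorted(glob.glob(word)))
        else:
            words.append(word)
    if pattern is not None:
        words = [w for w in words if re.search(pattern, w)]
    return words


print(expand_range("10 20 30"))
print(expand_range("run1 test2 run3", pattern=r"^run"))
```

Unlike a shell, this sketch drops a wildcard that matches no files instead of passing it through literally; what labschedule does in that case is not stated.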

In addition to the loop variables, several variables (e.g., %currdir, %host, %name) are predefined and are expanded to their appropriate values when the loop command is executed. The --macro flag allows you to define further variables appropriate to your experiments.
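Variable expansion amounts to replacing each %name with a value. Here is a minimal sketch, assuming the values (loop values, predefined variables and any --macro definitions) have already been collected into a dictionary; the function name and the dictionary keys are illustrative only.

```python
import re


def substitute(template, variables):
    """Replace each %name or %N in template with its value from `variables`.

    Names are matched greedily (letters, digits, underscore), so %10 is looked
    up as "10" rather than as "1" followed by "0". Unknown names are left as-is.
    """
    def lookup(match):
        return variables.get(match.group(1), match.group(0))
    return re.sub(r"%(\w+)", lookup, template)


# Illustrative values; the keys stand for loop variables and predefined names
values = {"1": "10", "2": "a", "host": "turing", "curdir": "/home/user/exp"}
print(substitute("ssh %host cd %curdir; bench %1 %2", values))
```

Leaving unknown names untouched, rather than raising an error, keeps the sketch usable for partial expansion, where some variables are resolved later.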

When scheduling many experiments at once, you may want to avoid creating many individual labrun calls, each of which creates its own .log file (see Section ). You can therefore limit the nesting depth of the loops with the --nesting flag. If the value set with this flag is smaller than the number of loops specified, the executable given to labrun is itself a call to labschedule containing the remaining loops. This inner labschedule call does not issue calls to labrun but executes the commands given to it directly (achieved through the --direct flag, which appears as -d in the example below). For example,

    labschedule --for='x y' --for='A B' --nesting=1 bench %1 %2

will result in the following two calls to labrun

   labrun --name=schedule-x labschedule -d --nesting=1 --for='A B' bench x %2
   labrun --name=schedule-y labschedule -d --nesting=1 --for='A B' bench y %2
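The split performed by --nesting can be sketched in the same way. The hypothetical nested_schedule function expands the first `nesting` loops itself and passes the rest, unexpanded, to an inner labschedule -d call. The --nesting value given to the inner call is an assumption, chosen so that the output reproduces the two labrun calls above.

```python
import re
from itertools import product


def fill(command, values):
    """Replace %1..%k with the k outer loop values; leave inner ones as they are."""
    def repl(match):
        i = int(match.group(1))
        return values[i - 1] if 1 <= i <= len(values) else match.group(0)
    return re.sub(r"%(\d+)", repl, command)


def nested_schedule(loops, command, nesting, prefix="schedule"):
    """Expand the first `nesting` loops; hand the rest to an inner labschedule -d."""
    outer, inner = loops[:nesting], loops[nesting:]
    calls = []
    for values in product(*outer):
        name = "-".join([prefix, *values])
        cmd = fill(command, values)
        if inner:
            fors = " ".join("--for='" + " ".join(vals) + "'" for vals in inner)
            # Assumption: --nesting of the inner call equals the loops it receives
            cmd = f"labschedule -d --nesting={len(inner)} {fors} {cmd}"
        calls.append(f"labrun --name={name} {cmd}")
    return calls


for call in nested_schedule([["x", "y"], ["A", "B"]], "bench %1 %2", nesting=1):
    print(call)
```

As in the example, the inner command still contains %2; the sketch leaves the variables of the inner loops untouched.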

If the various experiments being scheduled could be run on any one of a cluster of machines, you can specify the names of the machines with the flag --hosts and labschedule will schedule the tasks on these machines as they become idle. For example, the effect of

     labschedule --for='10 20 30' --hosts='localhost turing' bench %1

is that the following two commands would be issued immediately:

     labrun --name=schedule-10 ssh localhost cd %curdir; bench 10
     labrun --name=schedule-20 ssh turing cd %curdir; bench 20

Then, when one of these two runs finishes, the third call to labrun, for bench 30, is issued via ssh to the machine that has become idle.
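The policy of giving the next task to whichever machine becomes idle first can be illustrated with a small discrete-event simulation. labschedule cannot know run times in advance; the durations below are invented purely to show the order in which tasks are handed out, and the function name is hypothetical. The maxtasks parameter mirrors --maxtasks by giving each host that many slots.

```python
import heapq


def simulate_dispatch(tasks, hosts, maxtasks=1):
    """Simulate giving each task to the first host slot that becomes idle.

    tasks: (name, duration) pairs in scheduling order; durations are made up
    hosts: host names, each offering `maxtasks` slots
    Returns (start_time, host, name) tuples in dispatch order.
    """
    # Heap entries: (time the slot becomes free, slot index, host).
    # The slot index breaks ties, so earlier-listed hosts are used first.
    free = [(0.0, i, host) for i, host in enumerate(hosts * maxtasks)]
    heapq.heapify(free)
    dispatched = []
    for name, duration in tasks:
        start, slot, host = heapq.heappop(free)
        dispatched.append((start, host, name))
        heapq.heappush(free, (start + duration, slot, host))
    return dispatched


tasks = [("bench 10", 5.0), ("bench 20", 3.0), ("bench 30", 4.0)]
for start, host, name in simulate_dispatch(tasks, ["localhost", "turing"]):
    print(f"t={start}: {host} runs {name}")
```

With these invented durations, bench 10 and bench 20 start at once, and bench 30 goes to turing when bench 20 finishes at t=3.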

If more than one instance of your experiment can run at a time, the --maxtasks flag can be used to increase the maximum number of simultaneous experiments per machine. By default, each machine is given up to the number of tasks specified by --maxtasks (1 unless changed), but it may be desirable to check other conditions, such as the load of the machine, to determine whether a host can accept a new task. For this, the --check flag lets you specify a condition to be checked. Also available are the variable %idle, which gives a host's idle percentage, and %check, which tests whether a host's idle percentage is above 5.
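The text does not say how %idle is measured. On Linux, one common approach, assumed here, is to sample the aggregate cpu line of /proc/stat twice and compare the growth of the idle counter with the growth of the total. The sketch below does that and pairs it with a hypothetical acceptance test that combines the slot limit with the 5 percent threshold used by %check.

```python
def idle_percent(before, after):
    """Idle share (in percent) between two samples of the 'cpu' line of /proc/stat.

    Fields after 'cpu' are cumulative ticks: user nice system idle iowait irq
    softirq steal [guest guest_nice]. Guest time is already counted in user and
    nice, so only the first eight fields are summed. Only 'idle' counts as idle.
    """
    b = [int(x) for x in before.split()[1:9]]
    a = [int(x) for x in after.split()[1:9]]
    total = sum(a) - sum(b)
    if total <= 0:
        return 100.0  # no ticks elapsed between samples; treat the host as idle
    return 100.0 * (a[3] - b[3]) / total


def can_accept(running, idle, maxtasks=1, threshold=5.0):
    """Hypothetical test: a free slot and an idle share above the threshold."""
    return running < maxtasks and idle > threshold


before = "cpu  100 0 50 800 50 0 0 0 0 0"
after = "cpu  150 0 70 900 80 0 0 0 0 0"
idle = idle_percent(before, after)
print(idle, can_accept(0, idle))  # 50.0 True
```

In practice the two lines would be read a moment apart, for example with open("/proc/stat").readline() before and after a time.sleep(1).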

In the course of running multiple experiments, some of them may fail for one reason or another. By default, labschedule aborts after such a failure; with --ignore, the remaining experiments continue to be scheduled instead. To rerun failed experiments, it suffices to call labschedule again exactly as it was originally called. Experiments that did not finish successfully are rerun, but experiments whose log file records a successful completion are skipped. Alternatively, you can request that all experiments be rerun (--noskip) and/or that the log files of failed experiments be preserved (--keep).
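The skip-and-rerun behavior, together with --ignore and --noskip, can be sketched as follows. The log layout (one <name>.log per experiment in the log directory) and the success marker string are assumptions made for illustration, since the labrun log format is not described in this section; the function names are hypothetical.

```python
import os

# Assumptions for this sketch: each experiment has one log file
# <log_dir>/<name>.log, and a successful run writes this marker into it.
SUCCESS_MARKER = "finished successfully"


def needs_run(name, log_dir="lab_log", noskip=False):
    """True unless a log file records that `name` completed successfully."""
    if noskip:
        return True
    path = os.path.join(log_dir, name + ".log")
    if not os.path.exists(path):
        return True
    with open(path) as f:
        return SUCCESS_MARKER not in f.read()


def run_all(names, run, ignore=False, log_dir="lab_log", noskip=False):
    """Run the experiments that need it; stop at the first failure unless ignore."""
    failed = []
    for name in names:
        if not needs_run(name, log_dir, noskip):
            continue  # already completed successfully, so skip it
        if not run(name):
            failed.append(name)
            if not ignore:
                break  # default behavior: abort after a failure
    return failed
```

Calling run_all a second time with the same arguments reruns only the experiments whose logs lack the success marker, which mirrors the rerun behavior described above.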

Further options of this tool allow you to specify the location of the log files (./lab_log by default); the prefix of the name passed to labrun (schedule by default); a command other than `ssh %host cd %curdir;` to insert before the labrun call; further options to be passed to labrun; and that the command should be run without labrun, or in the background.

In addition to the log and output files produced by labrun, labschedule keeps track of its own actions in three files: a .log file that logs all relevant actions, a .out file that holds the output of all successful runs, and a .err file that holds the output of all failed runs. Note that these meanings of .out and .err differ from those used by labrun. The files are located in the same log directory as the files of labrun and are named <exp_name>-<date and time>.<ext>, where <exp_name> is schedule by default, or otherwise the name given as an argument to the --name flag.
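A sketch of this naming scheme and of the .out/.err split follows. The timestamp format is an assumption, since the text only says <date and time>, and both function names are hypothetical.

```python
import os
from datetime import datetime


def schedule_files(log_dir="lab_log", exp_name="schedule", when=None):
    """Paths of labschedule's own .log, .out and .err files.

    The timestamp format is an assumption; the text only says <date and time>.
    """
    stamp = (when or datetime.now()).strftime("%Y-%m-%d_%H-%M-%S")
    return {ext: os.path.join(log_dir, f"{exp_name}-{stamp}.{ext}")
            for ext in ("log", "out", "err")}


def record(files, output, ok):
    """Append a run's output to .out if it succeeded, otherwise to .err."""
    with open(files["out" if ok else "err"], "a") as f:
        f.write(output)


files = schedule_files(when=datetime(2003, 5, 30, 14, 5, 9))
print(files["log"])  # lab_log/schedule-2003-05-30_14-05-09.log on POSIX
```

Appending rather than overwriting lets a single .out or .err file collect the output of every run in one labschedule invocation.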

Tobias Polzin 2003-05-30