For more elaborate scheduling of multiple experiments, the tool labschedule is provided. This tool provides an easy means to
run a command repeatedly within a loop structure;
loops are designated using the
--for option.
Each loop has an associated variable that is simply the number of the
loop in the command line preceded by a % (e.g., %1 for the first loop).
For example, to run several experiments that differ only in the arguments
given to the program, a single for loop will suffice.
    labschedule --for='10 20 30' bench %1

This command will cause the three experiments

    bench 10
    bench 20
    bench 30

to be started in succession on the local machine. More precisely, the following three labrun commands will be issued:

    labrun --name=schedule-10 bench 10
    labrun --name=schedule-20 bench 20
    labrun --name=schedule-30 bench 30

The --print option of labschedule will show you the commands that are to be executed with all variable names replaced with the corresponding values. Any number of --for options are possible, and the result will be a nested loop structure, with the first --for command corresponding to the outermost loop. For example,

    labschedule --for='10 20 30' --for='a b' bench %1 %2

will, in essence, cause the following to be executed:

    for %1 in [10, 20, 30] do
        for %2 in [a, b] do
            labrun --name=schedule-%1-%2 bench %1 %2

and thus six experiments will be started in succession.
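The nested-loop expansion described above can be illustrated with a short Python sketch. This is not labschedule's own code: the function name expand_loops and its arguments are hypothetical, and the sketch only reproduces the command lines shown in the examples (the kind of listing the --print option is described as producing).

```python
from itertools import product


def expand_loops(for_values, command, prefix="schedule"):
    """Return the labrun command lines for a set of nested --for loops.

    for_values: one list of values per --for option, outermost loop first.
    command:    command template containing %1, %2, ... placeholders.
    prefix:     name prefix (the text gives "schedule" as the default).
    """
    commands = []
    # product() varies the last list fastest, matching "first --for is outermost".
    for combo in product(*for_values):
        expanded = command
        # Substitute higher-numbered variables first so %10 is not hit by %1.
        for index in range(len(combo), 0, -1):
            expanded = expanded.replace(f"%{index}", str(combo[index - 1]))
        name = "-".join([prefix, *map(str, combo)])
        commands.append(f"labrun --name={name} {expanded}")
    return commands


if __name__ == "__main__":
    for line in expand_loops([["10", "20", "30"], ["a", "b"]], "bench %1 %2"):
        print(line)
```

Run as a script, this prints the six labrun lines of the two-loop example, starting with the one for 10 and a and ending with the one for 30 and b.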
There is a rich syntax available for specifying the ranges of the
for loop variables. One can use
Python expressions (e.g., range(5) specifies the range 0 1 2 3 4),
the results of commands (e.g., `find . -name \*.in -print`), the
contents of files (e.g., @input), and the values of environment
variables (e.g., $DATA_DIR/*.dat). The last example also
shows that words containing a '*' or '?' will be
replaced by the files matching the pattern.
Syntax is also available for using regular
expressions to select from any of these values.
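As a rough illustration of three of these range forms, the following Python sketch expands a single word. It is an assumption-laden sketch, not labschedule's parser: the function expand_word is hypothetical, the file form is assumed to yield one value per whitespace-separated token, matches are returned in sorted order for reproducibility, and Python expressions, backquoted commands, and regular-expression selection are deliberately left out.

```python
import glob
import os


def expand_word(word):
    """Expand one word of a --for range into a list of values (illustrative rules)."""
    if word.startswith("@"):
        # @file form: assumed here to give one value per whitespace-separated token.
        with open(word[1:]) as handle:
            return handle.read().split()
    # Environment variables such as $DATA_DIR are substituted first.
    word = os.path.expandvars(word)
    if "*" in word or "?" in word:
        # Words containing a wildcard are replaced by the matching files.
        return sorted(glob.glob(word))
    return [word]
```

With DATA_DIR set to a directory holding a.dat and b.dat, expand_word("$DATA_DIR/*.dat") returns the two full paths, while a plain word such as "10" is returned unchanged as a one-element list.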
In addition to the loop variables, several variables (e.g., %curdir, %host, %name) are predefined and will be expanded to their appropriate values upon execution of the loop command. The flag --macro allows you to define other variables appropriate to your experiments.
When scheduling many experiments at once, one may want to avoid the
creation of many individual labrun calls, each of which will create
its own .log file (See Section ). You can
therefore limit the amount of nesting of the loops with the --nesting
flag. If the value set with this flag is smaller than the number of loops
specified, the executable given to labrun will itself be a call to
labschedule containing the remaining loops. This labschedule call
will not issue calls to labrun but will execute the commands given to
it directly (achieved through the use of the --direct flag). For
example,
    labschedule --for='x y' --for='A B' --nesting=1 bench %1 %2

will result in the following two calls to labrun:

    labrun --name=schedule-x labschedule -d --nesting=1 --for='A B' bench x %2
    labrun --name=schedule-y labschedule -d --nesting=1 --for='A B' bench y %2
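The effect of --nesting can be sketched in Python as follows. The function nested_commands is hypothetical and simply reproduces the shape of the two labrun calls in the example: the outer loops are expanded into separate labrun calls, and the remaining loops are passed on, unexpanded, to an inner labschedule -d call. How labschedule builds the inner command internally is not stated in the text, so the exact flag order here is copied from the example rather than known.

```python
from itertools import product


def nested_commands(for_values, command, nesting, prefix="schedule"):
    """Build labrun calls when only the first `nesting` loops are expanded."""
    outer, inner = for_values[:nesting], for_values[nesting:]
    # The remaining loops are handed to the inner labschedule call as --for options.
    inner_flags = " ".join("--for='" + " ".join(values) + "'" for values in inner)
    result = []
    for combo in product(*outer):
        expanded = command
        # Only the outer loop variables are substituted at this level.
        for index in range(len(combo), 0, -1):
            expanded = expanded.replace(f"%{index}", combo[index - 1])
        name = "-".join([prefix, *combo])
        if inner:
            # -d mirrors the flag used in the example from the text.
            expanded = f"labschedule -d --nesting={nesting} {inner_flags} {expanded}"
        result.append(f"labrun --name={name} {expanded}")
    return result
```

Calling nested_commands([["x", "y"], ["A", "B"]], "bench %1 %2", 1) yields two labrun calls, so only two labrun log files would be created instead of four.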
If the various experiments being scheduled could be run on any one of a cluster of machines, you can specify the names of the machines with the flag --hosts and labschedule will schedule the tasks on these machines as they become idle. For example, the effect of
    labschedule --for='10 20 30' --hosts='localhost turing' bench %1

is that the following two commands would be issued immediately:

    labrun --name=schedule-10 ssh localhost cd %curdir; bench 10
    labrun --name=schedule-20 ssh turing cd %curdir; bench 20

Then, when one of these two runs finishes, the third call to labrun for bench 30 would be issued using ssh to the idle machine.
If it is possible to have more than one instance of your experiment running at a time, the flag --maxtasks can be used to increase the maximum number of simultaneous experiments per machine. By default, each machine is assigned the number of tasks specified by --maxtasks (which is, by default, 1), but it may be desirable to check other conditions (such as the load of the machine) to determine if a host can accept a new task. For this, the flag --check, with which you can specify a condition to be checked, is available, as are the variables %idle, which gives a host's idle percentage, and %check, which tests whether a host's idle percentage is above 5.
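The host assignment described above (tasks go to hosts as slots become free, with at most --maxtasks tasks per host) can be simulated with a small Python sketch. Everything here is an illustration under stated assumptions: simulate_schedule is a hypothetical function, the task durations are made-up inputs, and the --check condition and %idle variable are not modeled.

```python
import heapq
from collections import deque


def simulate_schedule(tasks, hosts, durations, maxtasks=1):
    """Return (start_time, host, task) tuples for a simple idle-host policy.

    tasks:     task names in the order they are scheduled.
    hosts:     host names, each offering `maxtasks` slots.
    durations: hypothetical run time of each task, used only for the simulation.
    """
    pending = deque(tasks)
    # One entry per free slot; a host appears maxtasks times.
    free = deque(host for host in hosts for _ in range(maxtasks))
    running = []  # min-heap of (finish_time, sequence_number, host)
    started = []
    now = 0
    sequence = 0
    while pending:
        if not free:
            # Wait for the earliest-finishing task and reuse its slot.
            now, _, host = heapq.heappop(running)
            free.append(host)
        host = free.popleft()
        task = pending.popleft()
        started.append((now, host, task))
        heapq.heappush(running, (now + durations[task], sequence, host))
        sequence += 1
    return started
```

For the three bench tasks on localhost and turing, with bench 20 assumed to finish first, the first two tasks start at time 0 and bench 30 starts on turing as soon as bench 20 completes.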
In the course of running multiple experiments, it may happen that some of them fail for one reason or another. By default, labschedule will abort after such a failure. This behavior can be changed (with --ignore) such that the remaining experiments will continue to be scheduled. To rerun any failed experiments, it suffices to call labschedule once again in the same way it was originally called. The experiments that did not successfully finish will be rerun, but experiments for which a log file exists in which a successful completion is recorded are not rerun. Alternatively, one can indicate that all experiments should be rerun (--noskip), and/or that the log files of failed experiments will be preserved (--keep).
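The failure and rerun behavior can be summarized in a short Python sketch. The function names and the status strings "ok" and "failed" are hypothetical stand-ins; the text does not describe how a successful completion is actually recorded in the log files, so a plain dictionary is used in its place.

```python
def experiments_to_run(experiments, log_status, noskip=False):
    """Select the experiments to start on a (re)run.

    log_status maps an experiment name to "ok" or "failed" (hypothetical
    stand-in for the log files); a missing entry means it never ran.
    """
    if noskip:
        # --noskip: rerun everything, regardless of earlier results.
        return list(experiments)
    # Default: skip experiments whose log records a successful completion.
    return [name for name in experiments if log_status.get(name) != "ok"]


def run_experiments(experiments, run, ignore=False):
    """Run experiments in order and return the list of those that failed.

    run:    callable returning True on success (hypothetical).
    ignore: if False, stop at the first failure (the default behavior);
            if True, continue with the remaining experiments (--ignore).
    """
    failed = []
    for name in experiments:
        if not run(name):
            failed.append(name)
            if not ignore:
                break
    return failed
```

Calling the scheduler again in the same way then corresponds to passing the previous results to experiments_to_run, which leaves out everything that already succeeded.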
Further options for this tool allow one to specify the location of the log
files (by default, this is ./lab_log); the prefix of the name to be
passed to labrun (by default, this is schedule); a
command other than 'ssh %host cd %curdir;' to insert before the
labrun call; further options to be passed to labrun; and whether the command
should be run without using labrun or run in the background.
In addition to the log and output files
produced by labrun, labschedule keeps track of its own actions in
three files: a .log file that logs all relevant actions, a
.out file that holds the output of all successful runs, and a
.err file that holds the output of all failed runs.
Note that this is in contrast to the meaning of .out
and .err used for labrun. The files will be located in the
same log directory as the files of labrun, and the names will be as
follows: <exp_name>-<date and time>.<ext>, where <exp_name>
is schedule by default and otherwise the name given as an argument
with the --name flag.