Running simulations with CWS

Introduction

This page describes how to run simple simulations with CWS. The only prerequisite is that you know how to download and compile the code.

Getting sample workflows to simulate

If you don't have any workflows to simulate yet you can download them from Pegasus' page:
https://confluence.pegasus.isi.edu/display/pegasus/WorkflowGenerator
There you can find a set of sample workflows. The most interesting for our purposes is the downloadable set of pre-generated workflows available at:
https://download.pegasus.isi.edu/misc/SyntheticWorkflows.tar.gz (300 MB)
Download and extract the given archive.
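
On Linux, for example, you can fetch and unpack the archive from the command line (assuming wget is available):

$ wget https://download.pegasus.isi.edu/misc/SyntheticWorkflows.tar.gz
$ tar xzf SyntheticWorkflows.tar.gz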
Now there's a little problem: the downloaded workflows are in DAX format, while CWS needs files in DAG format. Fortunately, we've created a tool for converting between those formats. The conversion Ruby script is located under scripts/ruby/dax2dag.rb (you will need the nokogiri Ruby gem). Just run it on any DAX workflow to convert it. Under Linux you can also convert all of the downloaded DAXes in one go (remember to replace the parts in <...>):

$ mkdir dags
$ cd dags
$ find <extracted_archive_location> -name '*.dax' -exec ruby <cws_directory>/scripts/converters/dax2dag.rb '{}' \;

All workflows should now be available in DAG format in the dags directory.
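
If the nokogiri gem is missing, it can usually be installed via RubyGems, and you can also convert a single workflow by passing its path to the script directly (placeholders in <...> as before):

$ gem install nokogiri
$ ruby <cws_directory>/scripts/converters/dax2dag.rb <extracted_archive_location>/<some_workflow>.dax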

Running a simulation

In order to run a simulation you must first build CWS and then cd into the CWS root directory. Suppose you want to run a simulation on GENOME workflows. To do this you may type:

$ ant
$ ./scripts/runners/run_simulation_locally.sh "--application GENOME --input-dir <dags-dir> --output-file simulation_out.csv --distribution pareto_unsorted --algorithm SPSS --storage-manager void --ensemble-size 20"

A short explanation of this command: the runner script sets the classpath to the current directory and the directory with libraries, and then runs the simulation main class. The --application parameter says that we want to simulate the GENOME application, --input-dir is the directory with *.dag files, --output-file is where the simulation's results will be written, --ensemble-size is the number of workflows that will be fit into the ensemble, and --algorithm selects the algorithm we want to simulate. All of the simulation's parameters are described later.
After running this command, and probably tuning some parameters, you will get a CSV file with a fairly large number of columns, which are described below.

Simulation params

usage: cws.core.algorithms.TestRun
 -alg,--algorithm <ALGO>              (required) Algorithm
 -alp,--alpha <FLOAT>                 Optional alpha factor, defaults to 0.7
 -app,--application <APP>             (required) Application name
 -b,--budget <BUDGET>                 Optional budget, which overrides max and min budgets
    --chunk-transfer-time <TIME>      Global storage file chunk transfer time, defaults to 1.0
 -cs,--cache-size <SIZE>              VM cache size, defaults to 100000000 bytes
 -d,--deadline <DEADLINE>             Optional deadline, which overrides max and min deadlines
 -dl,--delay <DELAY>                  Delay, defaults to 0.0
 -dst,--distribution <DIST>           (required) Distribution
 -el,--enable-logging <BOOL>          Whether to enable logging, defaults to true
 -es,--ensemble-size <SIZE>           Ensemble size, defaults to 20
 -fr,--failure-rate <RATE>            Failure rate, defaults to 0.0
 -id,--input-dir <DIR>                (required) Input dir
    --latency <LATENCY>               Global storage latency, defaults to 0.01
 -ms,--max-scaling <FLOAT>            Optional maximum VM number scaling factor, defaults to 1.0
 -nb,--n-budgets <N>                  Optional number of generated budgets, defaults to 10
 -nd,--n-deadlines <N>                Optional number of generated deadlines, defaults to 10
    --num-replicas <NUM>              Global storage num replicas, defaults to 1
 -of,--output-file <FILE>             (required) Output file
 -rv,--runtime-variance <VAR>         Runtime variance, defaults to 0.0
 -s,--seed <SEED>                     Random number generator seed, defaults to current time in milliseconds
 -sc,--storage-cache <CACHE>          Storage cache, defaults to void
 -sf,--scaling-factor <FACTOR>        Scaling factor, defaults to 1.0
 -sm,--storage-manager <MGR>          (required) Storage manager
    --storage-manager-read <SPEED>    (required for storage-manager=global) Global storage manager read speed
    --storage-manager-write <SPEED>   (required for storage-manager=global) Global storage manager write speed
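
For example, building on the earlier command, a run that fixes the budget, deadline and random seed could look like the one below (the numeric values are arbitrary placeholders; the accepted application, algorithm and distribution names depend on your CWS version):

$ ./scripts/runners/run_simulation_locally.sh "--application GENOME --input-dir <dags-dir> --output-file simulation_out.csv --distribution pareto_unsorted --algorithm SPSS --storage-manager void --ensemble-size 20 --budget 200.0 --deadline 86400.0 --seed 123456"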

Analyzing the outputs

The output file contains a fairly large number of columns. The most notable ones are:

  • budget - the maximum budget used in the given row
  • deadline - the maximum makespan of the given row
  • algorithm - the name of an algorithm used in the given row
  • completed - the number of completed workflows
  • cost - the cost of resources
  • jobfinish, dagfinish, vmfinish - the finish times
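
To get a quick overview of the results you can inspect the CSV directly from the shell; the commands below use only standard tools (nothing CWS-specific), first listing the column names and then displaying the file with aligned columns:

$ head -n 1 simulation_out.csv | tr ',' '\n' | nl
$ column -s, -t simulation_out.csv | less -S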