Remove variance penalty by default
breuleux committed Jun 30, 2020
1 parent 0826218 commit b88a0ca
Showing 3 changed files with 22 additions and 5 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -162,7 +162,8 @@ milarun report $OUTDIR --html results.html --weights weights/standard.json

Notes:

* `perf_adj = perf * (1 - std%) * (1 - fail/n)` -- it is a measure of performance that penalizes variance and test failures.
* `perf_adj = perf * (1 - fail/n)` -- it is a measure of performance that penalizes test failures.
* If the `--penalize-variance` flag is given (this is off by default): `perf_adj = perf * (1 - std%) * (1 - fail/n)` -- in this case we penalize variance by subtracting one standard deviation.
* `score = exp(sum(log(perf_adj) * weight) / sum(weight))` -- this is the weighted geometric mean of `perf_adj`.
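
For illustration, here is a minimal Python sketch of both adjustment modes and the weighted geometric-mean score. The helper names and sample values are hypothetical, not part of milarun; the `min(std, perf)` clamp mirrors the guard in `report.py` below, which keeps `perf_adj` from going negative.

```python
import math

def perf_adj(perf, std, fail, n, penalize_variance=False):
    # With --penalize-variance, subtract at most one standard deviation
    # (clamped so the result never goes negative), then scale by the
    # fraction of successful runs.
    penalty = min(std, perf) if penalize_variance else 0.0
    return (perf - penalty) * (1 - fail / n)

def score(rows):
    # Weighted geometric mean: exp(sum(log(perf_adj) * weight) / sum(weight))
    total_weight = sum(r["weight"] for r in rows)
    log_sum = sum(math.log(r["perf_adj"]) * r["weight"] for r in rows)
    return math.exp(log_sum / total_weight)

# Hypothetical per-test summaries: (perf, std, failures, runs) and a weight.
rows = [
    {"perf_adj": perf_adj(100.0, 5.0, 0, 10), "weight": 2.0},
    {"perf_adj": perf_adj(80.0, 8.0, 1, 10, penalize_variance=True), "weight": 1.0},
]
print(score(rows))
```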


@@ -219,7 +220,7 @@ The cgroups are used to emulate multiple users and force the resources of each u

**Does the cgroups setup affect the results of the benchmark?**

Yes. Because of resource segregation, the multiple experiments launched by `milarun` in parallel will not fight for resources, leading to reduced variance and different performance characteristics (some tests may do a little better, some others may do a little worse). According to our experiments, using the cgroup setup can increase the score by 2 to 3%.
Yes. Because of resource segregation, the multiple experiments launched by `milarun` in parallel will not fight for resources, leading to reduced variance and different performance characteristics (some tests do a little better, but most do a little worse). According to our experiments, using the cgroup setup increases the score by about 1%.

**Can we run the benchmarks without the cgroups?**

4 changes: 4 additions & 0 deletions milarun/cli.py
@@ -390,6 +390,9 @@ def command_report(subargv):
# Compare the configuration's individual GPUs
compare_gpus: Argument & bool = default(False)

# Whether to penalize variance in the score (defaults to False)
penalize_variance: Argument & bool = default(False)

# Price of the configuration, to compute score/price ratio
price: Argument & float = default(None)

@@ -411,6 +414,7 @@ def command_report(subargv):
compare_gpus=compare_gpus,
price=price,
title=title,
penalize_variance=penalize_variance,
)


18 changes: 15 additions & 3 deletions milarun/lib/report.py
@@ -100,7 +100,7 @@ def summarize(report_folder, filter, group):
}


def _make_row(summary, compare, weights):
def _make_row(summary, compare, weights, penalize_variance=False):
row = {}
if weights is not None:
row["weight"] = weights["weight"] if weights else nan
@@ -112,11 +112,21 @@ def _make_row(summary, compare, weights):
row["perf_ratio"] = row["perf"] / row["perf_base"]
# row["std"] = summary["train"]["std"] if summary else nan
row["std%"] = summary["train"]["std"] / summary["train"]["mean"] if summary else nan
if penalize_variance:
penalty = min(summary["train"]["std"], row["perf"])
else:
penalty = 0
row["perf_adj"] = (1 - row["fail"] / row["n"]) * (
row["perf"] - min(summary["train"]["std"], row["perf"])
row["perf"] - penalty
) if summary else nan
if compare is not None:
row["perf_base_adj"] = row["perf_base"] - min(compare["train"]["std"], row["perf_base"]) if compare else nan
if penalize_variance:
penalty = min(compare["train"]["std"], row["perf_base"]) if compare else nan
else:
penalty = 0
row["perf_base_adj"] = (
row["perf_base"] - penalty if compare else nan
)
row["perf_ratio_adj"] = row["perf_adj"] / row["perf_base_adj"]
return row

@@ -242,6 +252,7 @@ def make_report(
compare_gpus=False,
price=None,
title=None,
penalize_variance=False,
):
all_keys = list(sorted({
*(summary.keys() if summary else []),
@@ -255,6 +266,7 @@ def make_report(
summary.get(key, {}),
compare and compare.get(key, {}),
weights and weights.get(key, {}),
penalize_variance=penalize_variance,
)
for key in all_keys
}

