Remove variance penalty by default
breuleux committed Jun 30, 2020
1 parent 0826218 commit b88a0ca
Showing 3 changed files with 22 additions and 5 deletions.
5 changes: 3 additions & 2 deletions README.md
@@ -162,7 +162,8 @@ milarun report $OUTDIR --html results.html --weights weights/standard.json

Notes:

* `perf_adj = perf * (1 - std%) * (1 - fail/n)` -- it is a measure of performance that penalizes variance and test failures.
* `perf_adj = perf * (1 - fail/n)` -- it is a measure of performance that penalizes test failures.
* If the `--penalize-variance` flag is given (this is off by default): `perf_adj = perf * (1 - std%) * (1 - fail/n)` -- in this case we penalize variance by subtracting one standard deviation.
* `score = exp(sum(log(perf_adj) * weight) / sum(weight))` -- this is the weighted geometric mean of `perf_adj`.
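
For illustration, here is a minimal Python sketch of both adjustment modes and the weighted geometric-mean score. The helper names and sample values are hypothetical, not part of milarun; the `min(std, perf)` clamp mirrors the guard in `report.py` below, which keeps `perf_adj` from going negative.

```python
import math

def perf_adj(perf, std, fail, n, penalize_variance=False):
    # With --penalize-variance, subtract at most one standard deviation
    # (clamped so the result never goes negative), then scale by the
    # fraction of successful runs.
    penalty = min(std, perf) if penalize_variance else 0.0
    return (perf - penalty) * (1 - fail / n)

def score(rows):
    # Weighted geometric mean: exp(sum(log(perf_adj) * weight) / sum(weight))
    total_weight = sum(r["weight"] for r in rows)
    log_sum = sum(math.log(r["perf_adj"]) * r["weight"] for r in rows)
    return math.exp(log_sum / total_weight)

# Hypothetical per-test summaries: (perf, std, failures, runs) and a weight.
rows = [
    {"perf_adj": perf_adj(100.0, 5.0, 0, 10), "weight": 2.0},
    {"perf_adj": perf_adj(80.0, 8.0, 1, 10, penalize_variance=True), "weight": 1.0},
]
print(score(rows))
```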


@@ -219,7 +220,7 @@ The cgroups are used to emulate multiple users and force the resources of each u

**Does the cgroups setup affect the results of the benchmark?**

Yes. Because of resource segregation, the multiple experiments launched by `milarun` in parallel will not fight for resources, leading to reduced variance and different performance characteristics (some tests may do a little better, some others may do a little worse). According to our experiments, using the cgroup setup can increase the score by 2 to 3%.
Yes. Because of resource segregation, the multiple experiments launched by `milarun` in parallel will not fight for resources, leading to reduced variance and different performance characteristics (some tests do a little better, but most do a little worse). According to our experiments, using the cgroup setup increases the score by about 1%.

**Can we run the benchmarks without the cgroups?**

4 changes: 4 additions & 0 deletions milarun/cli.py
@@ -390,6 +390,9 @@ def command_report(subargv):
# Compare the configuration's individual GPUs
compare_gpus: Argument & bool = default(False)

# Whether to penalize variance in the score (defaults to False)
penalize_variance: Argument & bool = default(False)

# Price of the configuration, to compute score/price ratio
price: Argument & float = default(None)

@@ -411,6 +414,7 @@ def command_report(subargv):
compare_gpus=compare_gpus,
price=price,
title=title,
penalize_variance=penalize_variance,
)


18 changes: 15 additions & 3 deletions milarun/lib/report.py
@@ -100,7 +100,7 @@ def summarize(report_folder, filter, group):
}


def _make_row(summary, compare, weights):
def _make_row(summary, compare, weights, penalize_variance=False):
row = {}
if weights is not None:
row["weight"] = weights["weight"] if weights else nan
@@ -112,11 +112,21 @@ def _make_row(summary, compare, weights):
row["perf_ratio"] = row["perf"] / row["perf_base"]
# row["std"] = summary["train"]["std"] if summary else nan
row["std%"] = summary["train"]["std"] / summary["train"]["mean"] if summary else nan
if penalize_variance:
penalty = min(summary["train"]["std"], row["perf"])
else:
penalty = 0
row["perf_adj"] = (1 - row["fail"] / row["n"]) * (
row["perf"] - min(summary["train"]["std"], row["perf"])
row["perf"] - penalty
) if summary else nan
if compare is not None:
row["perf_base_adj"] = row["perf_base"] - min(compare["train"]["std"], row["perf_base"]) if compare else nan
if penalize_variance:
penalty = min(compare["train"]["std"], row["perf_base"]) if compare else nan
else:
penalty = 0
row["perf_base_adj"] = (
row["perf_base"] - penalty if compare else nan
)
row["perf_ratio_adj"] = row["perf_adj"] / row["perf_base_adj"]
return row

@@ -242,6 +252,7 @@ def make_report(
compare_gpus=False,
price=None,
title=None,
penalize_variance=False,
):
all_keys = list(sorted({
*(summary.keys() if summary else []),
@@ -255,6 +266,7 @@ def make_report(
summary.get(key, {}),
compare and compare.get(key, {}),
weights and weights.get(key, {}),
penalize_variance=penalize_variance,
)
for key in all_keys
}

