vignettes/supplementary/mlrmbo_and_the_command_line.rmd
This vignette demonstrates two ways of working with the command line. In the first part, the algorithm we want to optimize is a program that is called from the command line. The second part shows how to call R and mlrMBO from the command line, so that we no longer have to interact with R directly.
First of all we need a bash script that we want to optimize. This vignette is aimed at users of Unix systems (Linux, macOS etc.) but should also be informative for Windows users. The following code writes a bash script that uses bc to calculate \(\sin(x_1 - 1) + (x_1^2 + x_2^2)\) and writes the result to a text file. This script will serve as the target algorithm that we want to optimize.
# write bash script
lines = '#!/bin/bash
fun ()
{
x1=$1
x2=$2
command="(s($x1-1) + ($x1^2 + $x2^2))"
result=$(bc -l <<< $command)
}
echo "Start calculation."
fun $1 $2
echo "The result is $result!" > "result.txt"
echo "Finish calculation."
'
writeLines(lines, "fun.sh")
# make it executable:
system("chmod +x fun.sh")
The following code is an R function that starts the script, reads the result from the text file and returns it.
library(stringi)
runScript = function(x) {
command = sprintf("./fun.sh %f %f", x[['x1']], x[['x2']])
error.code = system(command)
if (error.code != 0) {
stop("Simulation had error.code != 0!")
}
result = readLines("result.txt")
# the pattern matches 12 as well as 12.34, .34 and -.34
# (bc prints negative values with a leading minus and no leading zero);
# the ?: makes the decimals a non-capturing group.
result = stri_match_first_regex(result, pattern = "-?\\d*(?:\\.\\d+)?(?=\\!)")
as.numeric(result)
}
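The extraction step can be checked in isolation before the whole script is run. The following is a minimal sketch in base R (regmatches() with perl = TRUE instead of stringi, so it needs no extra package); the helper name extractResult and the sample lines are made up for illustration. bc prints values between -1 and 1 without a leading zero and negative values with a leading minus, so the pattern used here allows an optional sign.

```r
# Hypothetical helper: pull the number that directly precedes the "!"
# out of a line such as "The result is 12.34!".
extractResult = function(line) {
  # optional minus, optional integer part, optional decimals,
  # followed (but not consumed) by "!"
  pattern = "-?\\d*(?:\\.\\d+)?(?=!)"
  m = regmatches(line, regexpr(pattern, line, perl = TRUE))
  as.numeric(m)
}

# Sample lines in the format that fun.sh writes (values chosen by hand):
extractResult("The result is 12!")
extractResult("The result is 12.34!")
extractResult("The result is .34!")
extractResult("The result is -.84147098480789650665!")
```

The last line mimics what bc returns for sin(-1); without the optional minus in the pattern the sign would be lost silently.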
runScript uses stringi and regular expressions to extract the result value from the result file. Depending on the output format, different strategies for reading the result make sense. XML files can usually be accessed with XML::xmlParse, XML::getNodeSet, XML::xmlAttrs etc. using XPath queries. Sometimes read.table() is sufficient. Another option is source(), if the result can be interpreted as valid R code. If, for example, the output is written to a file like this:
value1 = 23.45
value2 = 13.82
We can then use source() like this:
# read the assignments into a fresh environment instead of the global one
EV = new.env()
source(file = "result.txt", local = EV)
res = as.list(EV)
rm(EV)
which returns a list with the entries $value1 and $value2.
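Where sourcing the output as R code is not desirable (source() executes whatever the file contains), the same key = value layout can also be read as data. The following is a minimal sketch using read.table(); the temporary file and its two entries simply reproduce the example above and are not produced by fun.sh.

```r
# Write the example output to a temporary file (stand-in for a real result file).
path = tempfile(fileext = ".txt")
writeLines(c("value1 = 23.45", "value2 = 13.82"), path)

# Split each line at "=" and strip the surrounding blanks.
tab = read.table(path, sep = "=", strip.white = TRUE,
  col.names = c("name", "value"), stringsAsFactors = FALSE)

# Turn the two columns into a named list, as in the source() variant.
res = as.list(setNames(tab$value, tab$name))
str(res)

unlink(path)
```

Unlike source(), this only parses the file and never evaluates it, which is the safer choice when the output comes from a program that is not fully under our control.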
To evaluate the function from within mlrMBO, it has to be wrapped in a smoof function. The smoof function also carries the bounds and scales of the objective function's domain, defined as a parameter set (ParamSet).
library(mlrMBO)
# Defining the bounds of the parameters:
par.set = makeParamSet(
makeNumericParam("x1", lower = -3, upper = 3),
makeNumericParam("x2", lower = -2.5, upper = 2.5)
)
# Wrapping everything in a smoof function:
fn = makeSingleObjectiveFunction(
id = "fun.sh",
fn = runScript,
par.set = par.set,
has.simple.signature = FALSE
)
We confirm that the function works as intended by evaluating it on a small grid design:
des = generateGridDesign(par.set, resolution = 3)
des$y = apply(des, 1, fn)
des
## x1 x2 y
## 1 -3 -2.5 16.006802
## 2 0 -2.5 5.408529
## 3 3 -2.5 16.159297
## 4 -3 0.0 9.756802
## 5 0 0.0 -0.841471
## 6 3 0.0 9.909297
## 7 -3 2.5 16.006802
## 8 0 2.5 5.408529
## 9 3 2.5 16.159297
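Because the target here is a closed-form expression, the values returned through the script can be cross-checked against a pure-R reference. The following is a minimal sketch; refFun is a hypothetical name, and the grid is rebuilt with expand.grid() so that the snippet runs without mlrMBO.

```r
# Pure-R version of the expression that fun.sh hands to bc.
refFun = function(x) {
  sin(x[["x1"]] - 1) + (x[["x1"]]^2 + x[["x2"]]^2)
}

# Same 3 x 3 grid as above: x1 varies fastest.
grid = expand.grid(x1 = c(-3, 0, 3), x2 = c(-2.5, 0, 2.5))
grid$y = apply(grid, 1, refFun)
grid
```

At x1 = 0, x2 = 0 the reference is sin(-1), roughly -0.841, so the objective does take negative values inside the search domain. Any wrapper around the script has to preserve the sign when it parses the result.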
The optimization with mlrMBO is started as usual:
ctrl = makeMBOControl()
ctrl = setMBOControlInfill(ctrl, crit = crit.ei)
ctrl = setMBOControlTermination(ctrl, iters = 10)
configureMlr(show.info = FALSE, show.learner.output = FALSE)
run = mbo(fun = fn, control = ctrl)
## Computing y column(s) for design. Not provided.
## [mbo] 0: x1=-1.58; x2=-1.37 : y = 3.83 : 0.0 secs : initdesign
## [mbo] 0: x1=-0.469; x2=0.508 : y = 0.517 : 0.0 secs : initdesign
## [mbo] 0: x1=1.25; x2=2.37 : y = 7.43 : 0.0 secs : initdesign
## [mbo] 0: x1=-2.93; x2=-0.35 : y = 9.41 : 0.0 secs : initdesign
## [mbo] 0: x1=0.288; x2=-0.778 : y = 0.0356 : 0.0 secs : initdesign
## [mbo] 0: x1=2.46; x2=1.02 : y = 8.06 : 0.0 secs : initdesign
## [mbo] 0: x1=2.11; x2=-2.06 : y = 9.58 : 0.0 secs : initdesign
## [mbo] 0: x1=-1.16; x2=1.25 : y = 2.09 : 0.0 secs : initdesign
## [mbo] 1: x1=-0.0313; x2=-2.04 : y = 3.3 : 0.0 secs : infill_ei
## [mbo] 2: x1=0.104; x2=-0.113 : y = 0.757 : 0.0 secs : infill_ei
## [mbo] 3: x1=0.35; x2=0.734 : y = 0.0556 : 0.0 secs : infill_ei
## [mbo] 4: x1=-0.603; x2=2.5 : y = 5.61 : 0.0 secs : infill_ei
## [mbo] 5: x1=-0.0466; x2=0.862 : y = 0.12 : 0.0 secs : infill_ei
## [mbo] 6: x1=-0.204; x2=-0.878 : y = 0.121 : 0.0 secs : infill_ei
## [mbo] 7: x1=0.179; x2=-0.974 : y = 0.25 : 0.0 secs : infill_ei
## [mbo] 8: x1=0.036; x2=-0.721 : y = 0.301 : 0.0 secs : infill_ei
## [mbo] 9: x1=0.511; x2=-0.825 : y = 0.473 : 0.0 secs : infill_ei
## [mbo] 10: x1=0.266; x2=0.948 : y = 0.299 : 0.0 secs : infill_ei
# The resulting optimal configuration:
run$x
## $x1
## [1] 0.2879772
##
## $x2
## [1] -0.778453
# The best reached value:
run$y
## [1] 0.03555322
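The wrapper above passes the result through the file result.txt. If the target program prints its result to standard output instead, the output can be captured directly with system2(), which avoids the intermediate file. The following is a minimal sketch under that assumption; the script echo_sum.sh is a made-up stand-in for a real target program and only adds two integers.

```r
# Hypothetical stand-in for a target program that prints its result.
script = file.path(tempdir(), "echo_sum.sh")
writeLines(c(
  "#!/bin/sh",
  "echo \"The result is $(($1 + $2))!\""
), script)
Sys.chmod(script, mode = "0755")

# stdout = TRUE returns the printed lines as a character vector.
out = system2(script, args = c("2", "3"), stdout = TRUE)

# A non-zero exit code is reported in the "status" attribute.
status = attr(out, "status")
if (!is.null(status) && status != 0) {
  stop("Script had exit status != 0!")
}
out
```

The captured line can then be parsed in the same way as the content of result.txt. Since no shared file is involved, this variant is also less fragile when several evaluations run in the same directory.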
To start the optimization from the command line, we have to write an R script that also serves as the configuration for mlrMBO.
The following is a complete script based on the examples given above that accepts some basic arguments and writes the output as a JSON file.
library(mlrMBO)
library(stringi)
library(jsonlite)
# read command line args (not safe: eval() runs any code passed as an argument)
# The script can be called like this:
# Rscript runMBO.R iters=20 time=10 seed=1
args = commandArgs(TRUE)
# defaults:
iters = 50
time = 30
seed = 123
# parse args (and possibly overwrite defaults)
for (arg in args) {
eval(parse(text = arg))
}
set.seed(seed)
# write bash script
lines = '#!/bin/bash
fun ()
{
x1=$1
x2=$2
command="(s($x1-1) + ($x1^2 + $x2^2))"
result=$(bc -l <<< $command)
}
echo "Start calculation."
fun $1 $2
echo "The result is $result!" > "result.txt"
echo "Finish calculation."
'
writeLines(lines, "fun.sh")
system("chmod +x fun.sh")
# runScript function to execute bash script
runScript = function(x) {
# console output file output_1490030005_1.1_2.4.txt
output_file = sprintf("output_%i_%.1f_%.1f.txt", as.integer(Sys.time()), x[['x1']], x[['x2']])
# redirect output with ./fun.sh 1.1 2.4 > output.txt
# alternative: ./fun.sh 1.1 2.4 > /dev/null to drop it
command = sprintf("./fun.sh %f %f > %s", x[['x1']], x[['x2']], output_file)
error.code = system(command)
if (error.code != 0) {
stop("Simulation had error.code != 0!")
}
result = readLines("result.txt")
# the pattern matches 12 as well as 12.34, .34 and -.34
# (bc prints negative values with a leading minus and no leading zero);
# the ?: makes the decimals a non-capturing group.
result = stri_match_first_regex(result, pattern = "-?\\d*(?:\\.\\d+)?(?=\\!)")
as.numeric(result)
}
# define mlrMBO optimization
par.set = makeParamSet(
makeNumericParam("x1", lower = -3, upper = 3),
makeNumericParam("x2", lower = -2.5, upper = 2.5)
)
fn = makeSingleObjectiveFunction(
id = "fun.sh",
fn = runScript,
par.set = par.set,
has.simple.signature = FALSE
)
ctrl = makeMBOControl()
ctrl = setMBOControlInfill(ctrl, crit = crit.ei)
ctrl = setMBOControlTermination(ctrl, iters = iters, time.budget = time)
configureMlr(show.info = FALSE, show.learner.output = FALSE)
run = mbo(fun = fn, control = ctrl)
# clean up intermediate files:
file.remove("result.txt")
output.files = list.files(pattern = "output_\\d+_[0-9_.-]+\\.txt")
file.remove(output.files)
# save result to json
write_json(run[c("x","y")], "mbo_res.json")
Assuming we saved the lines above in a file called runMBO.R, we can run it from the command line as follows:
Rscript runMBO.R
As the script also handles some additional arguments, it can be called with the number of MBO iterations (iters), the maximal time budget in seconds (time) and a seed value for reproducibility:
Rscript runMBO.R iters=20 time=10 seed=3
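The script parses its arguments with eval(parse()), which executes any R code passed on the command line. A slightly more defensive variant is sketched below: it only accepts key=value pairs whose key is one of the known defaults and whose value is numeric. The helper name parseArgs is hypothetical and not part of mlrMBO.

```r
# Hypothetical helper: parse "key=value" arguments without eval().
parseArgs = function(args, defaults) {
  opts = defaults
  for (arg in args) {
    kv = strsplit(arg, "=", fixed = TRUE)[[1]]
    # reject anything that is not exactly <known key>=<value>
    if (length(kv) != 2 || !(kv[1] %in% names(defaults))) {
      stop(sprintf("Unknown or malformed argument: %s", arg))
    }
    value = suppressWarnings(as.numeric(kv[2]))
    if (is.na(value)) {
      stop(sprintf("Argument %s needs a numeric value.", kv[1]))
    }
    opts[[kv[1]]] = value
  }
  opts
}

defaults = list(iters = 50, time = 30, seed = 123)
# In runMBO.R this would be parseArgs(commandArgs(TRUE), defaults).
opts = parseArgs(c("iters=20", "seed=3"), defaults)
str(opts)
```

Arguments that are not given keep their defaults, and an argument such as x=system('ls') stops the script with an error instead of being evaluated.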
To build a more advanced command line interface you might want to have a look at docopt.