I am a researcher in the field of (metaheuristic) optimization, which means that I am developing and analyzing optimization algorithms, methods to find good solutions for hard problems. A hard problem is one for which we cannot obtain the optimal (best possible) solution within feasible time. Instead, heuristic and metaheuristic algorithms are used to obtain acceptably good solutions within an acceptable runtime. The goal of research in optimization is to find the best such methods for given problems or problem classes. This, of course, entails that we somehow need to compare optimization algorithms, which turns out to actually be quite complicated. Below I list some of the issues:

- First, finding good optimization algorithms is actually a multi-objective problem, since we want algorithms that can find solutions that are as good as possible as fast as possible.
- Second, many optimization algorithms are actually anytime algorithms: They have an “approximate” solution at any time during their execution, and as time progresses, the approximation quality should improve. An Evolutionary Algorithm (EA), for instance, starts with a set of randomly created (bad) solutions and refines them iteratively, (hopefully) constructing better and better solutions. The same is true for local search algorithms. Branch and Bound algorithms are anytime algorithms too. By definition, an anytime algorithm does not have just a single, final result, but creates many solutions along the way and can be run either for a short or a very, very long time. If there are no “final results”, comparing two algorithms means we somehow need to compare progress over runtime.
- This leads to the third problem: How can we define runtime? Is it the number of created candidate solutions? Or the actual CPU time that has elapsed? Neither of these two measures is really satisfying, an issue I have already discussed in an older post.
- Fourth, comparisons must be done in a statistically sound way, which means that experiments need to be repeated, say, 30 times, robust statistics must be calculated, and several different benchmark problem instances must be used.
- Fifth, several different statistics can be computed based on the gathered data. It is necessary to somehow aggregate them into meaningful conclusions.
- All of this (running experiments several times on several benchmark instances, evaluating the large amounts of generated data, drawing figures, etc.) is quite a lot of cumbersome work. It needs tool support.
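To make the “anytime” notion from the second point concrete, here is a minimal Python sketch (with invented names; the TSP Suite itself is a full framework, not this toy) of an optimizer that logs its whole progress curve, i.e., the best quality found after each number of consumed function evaluations, instead of only a final result. It is exactly this curve, not the end-of-run value, that a benchmarking procedure for anytime algorithms needs to compare:

```python
import random

def anytime_hill_climber(evaluate, initial, neighbor, max_evals=1000):
    """Toy anytime optimizer (a simple hill climber, for illustration only):
    records a (evaluations, best quality so far) pair at every improvement,
    so its progress over runtime can be compared against other algorithms."""
    current = initial
    best_quality = evaluate(current)
    progress = [(1, best_quality)]  # the progress curve a benchmark would log
    for evals in range(2, max_evals + 1):
        candidate = neighbor(current)
        quality = evaluate(candidate)
        if quality < best_quality:  # minimization: smaller is better
            current, best_quality = candidate, quality
            progress.append((evals, best_quality))
    return progress

# toy problem: minimize the sum of squares of a real vector
random.seed(42)
curve = anytime_hill_climber(
    evaluate=lambda x: sum(v * v for v in x),
    initial=[random.uniform(-5, 5) for _ in range(5)],
    neighbor=lambda x: [v + random.gauss(0, 0.5) for v in x],
)
print(curve[0], curve[-1])  # quality improves as evaluations accumulate
```

Note that the x-axis here is counted in function evaluations; as discussed above, measuring elapsed CPU time instead is equally possible and equally debatable.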

In the current issue (August 2014) of the IEEE Computational Intelligence Magazine (CIM), there will be a paper titled *“Benchmarking Optimization Algorithms: An Open Source Framework for the Traveling Salesman Problem.”* In this paper, my colleagues and I present some ideas to tackle the problems listed above. Our *TSP Suite* is an automated experimentation framework for the TSP, which aids researchers in implementing and unit testing their algorithms, running experiments in an automated fashion in parallel or distributed on a cluster, and evaluating the generated log data, resulting in stand-alone reports similar in scope to Bachelor’s theses. The work presented in the paper is still preliminary – there are lots of things that need to be improved (in particular on the statistics side) and there are many more issues to explore – but I think we are on the right track.

The abstract of our article is as follows:

The article proposes an experimentation procedure for evaluating and comparing optimization algorithms. It is argued that end-of-run results alone do not give sufficient information about an algorithm’s performance, so the proposed approach analyzes the algorithm’s progress over time. Comparisons of performance curves in diagrams can be formalized by comparing the areas under them. Algorithms can be ranked according to a performance metric. Rankings based on different metrics can then be aggregated into a global ranking, which provides a quick overview of the quality of algorithms in comparison. An open source software framework based on the Traveling Salesman Problem (TSP), called the TSP Suite, is presented as an application of this experimental procedure. The framework can support researchers in implementing and unit testing TSP solvers and running experiments in a parallel and distributed fashion. It also has an evaluator component, which implements the proposed evaluation process and produces detailed reports. Comprehensive experiments have been conducted using the TSP Suite to benchmark several local search and evolutionary computation methods. The results show that the tested pure global optimization algorithms are outperformed by local search. The best results, however, come from hybrid algorithms.

Thomas Weise, Raymond Chiong, Ke Tang, Jörg Lässig, Shigeyoshi Tsutsui, Wenxiang Chen, Zbigniew Michalewicz, and Xin Yao. *Benchmarking Optimization Algorithms: An Open Source Framework for the Traveling Salesman Problem.* IEEE Computational Intelligence Magazine (CIM), 9(3):40-52, August 2014. doi:10.1109/MCI.2014.2326101
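The area-under-curve comparison and the aggregation of per-metric rankings mentioned in the abstract can be sketched roughly as follows. This is a simplified Python illustration with invented names and made-up curves, not the actual TSP Suite evaluator, which is considerably more elaborate:

```python
def area_under_curve(progress, horizon):
    """Area under a step-wise best-so-far curve up to a runtime horizon
    (smaller is better for minimization): the logged quality stays
    constant between two improvements."""
    area, (prev_t, prev_q) = 0.0, progress[0]
    for t, q in progress[1:]:
        area += prev_q * (t - prev_t)
        prev_t, prev_q = t, q
    area += prev_q * (horizon - prev_t)
    return area

def quality_at(progress, budget):
    """Best quality reached at or before a given evaluation budget."""
    return min(q for t, q in progress if t <= budget)

def aggregate_ranking(rankings):
    """Combine several per-metric rankings into one global ranking
    by sorting algorithms by their mean rank position."""
    algos = rankings[0]
    mean_rank = {a: sum(r.index(a) for r in rankings) / len(rankings)
                 for a in algos}
    return sorted(algos, key=mean_rank.get)

# two hypothetical algorithms' progress curves: (evaluations, best quality)
curve_a = [(1, 100.0), (10, 40.0), (50, 10.0)]   # converges fast, ends at 10
curve_b = [(1, 100.0), (5, 80.0), (200, 5.0)]    # slower, but better end result
curves = {"A": curve_a, "B": curve_b}

by_auc = sorted(curves, key=lambda a: area_under_curve(curves[a], 1000))
by_final = sorted(curves, key=lambda a: curves[a][-1][1])
by_budget = sorted(curves, key=lambda a: quality_at(curves[a], 50))

print(aggregate_ranking([by_auc, by_final, by_budget]))  # → ['A', 'B']
```

The point of the toy data is that single metrics can disagree: B has the better end-of-run result, while A has the better area under the curve and the better result under a small budget; the aggregated ranking resolves such conflicts into one overview.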

Of course, our work is not the first one on this topic. Some pioneering and outstanding contributions, to name a few, are:

- The COCO Framework for numerical optimization used in the BBOB workshops,
- UBCSAT for satisfiability problems, and
- the Sequential Parameter Optimization Toolbox (SPOT) for algorithm parameter tuning.