New Article: An Open Source Framework for the Traveling Salesman Problem

I am a researcher in the field of (metaheuristic) optimization, which means that I develop and analyze optimization algorithms: methods to find good solutions for hard problems. A hard problem is one for which we cannot obtain the optimal (best possible) solution within feasible time. Instead, heuristic and metaheuristic algorithms are used to obtain acceptably good solutions within an acceptable runtime. The goal of research in optimization is to find the best such methods for given problems or problem classes. This, of course, entails that we somehow need to compare optimization algorithms, which turns out to be quite complicated. Below I list some of the issues:

  1. First, finding good optimization algorithms is actually a multi-objective problem, since we want algorithms that find solutions which are as good as possible, as fast as possible.
  2. Second, many optimization algorithms are actually anytime algorithms: They have an “approximate” solution at any time during their execution and, as time progresses, the approximation quality should improve. An Evolutionary Algorithm (EA), for instance, starts with a set of randomly created (bad) solutions and refines them iteratively, (hopefully) constructing better and better solutions. The same is true for local search algorithms. Branch and Bound algorithms are anytime algorithms too. By definition, an anytime algorithm has not just a single, final result, but creates many solutions along the way and can be run either for a short or a very, very long time. If there are no “final results”, comparing two algorithms means we somehow need to compare their progress over runtime.
  3. This leads to the third problem: How can we define runtime? Is it the number of created candidate solutions? Or the actual CPU time that has elapsed? Neither of these two measures is really satisfying, an issue I have discussed already in an older post.
  4. Fourth, comparison must be done in a statistically sound way, which means that experiments need to be repeated, say, 30 times, robust statistics must be calculated, and several different benchmark problem instances must be used.
  5. Fifth, several different statistics can be computed based on the gathered data. It is necessary to somehow aggregate them into meaningful conclusions.
  6. All of that – running experiments several times on several benchmark instances, evaluating the large amount of generated data, drawing figures, etc. – is quite a lot of (cumbersome) work. It needs tool support.

In the current issue (August 2014) of the IEEE Computational Intelligence Magazine (CIM), there will be a paper titled “Benchmarking Optimization Algorithms: An Open Source Framework for the Traveling Salesman Problem.” In this paper, my colleagues and I present some ideas to tackle the problems listed above. Our TSP Suite is an automated experimentation framework for the TSP, which aids researchers in implementing and unit testing their algorithms, running experiments in an automated fashion, in parallel or distributed on a cluster, and evaluating the generated log data, resulting in stand-alone reports almost similar to Bachelor’s theses. The work presented in the paper is still preliminary – there are lots of things that need to be improved (in particular on the statistics side) and there are many more issues to explore – but I think we are on a good path.

The abstract of our article is as follows:

The article proposes an experimentation procedure for evaluating and comparing optimization algorithms. It is argued that end-of-run results alone do not give sufficient information about an algorithm’s performance, so the proposed approach analyzes the algorithm’s progress over time. Comparisons of performance curves in diagrams can be formalized by comparing the areas under them. Algorithms can be ranked according to a performance metric. Rankings based on different metrics can then be aggregated into a global ranking, which provides a quick overview of the quality of algorithms in comparison. An open source software framework based on the Traveling Salesman Problem (TSP), called the TSP Suite, is presented as an application of this experimental procedure. The framework can support researchers in implementing and unit testing TSP solvers and running experiments in a parallel and distributed fashion. It also has an evaluator component, which implements the proposed evaluation process and produces detailed reports. Comprehensive experiments have been conducted using the TSP Suite to benchmark several local search and evolutionary computation methods. The results show that the tested pure global optimization algorithms are outperformed by local search. The best results, however, come from hybrid algorithms.

Thomas Weise, Raymond Chiong, Ke Tang, Jörg Lässig, Shigeyoshi Tsutsui, Wenxiang Chen, Zbigniew Michalewicz, and Xin Yao. Benchmarking Optimization Algorithms: An Open Source Framework for the Traveling Salesman Problem. IEEE Computational Intelligence Magazine (CIM), 9(3):40-52, August 2014. doi:10.1109/MCI.2014.2326101

Of course, our work is not the first one on this topic; it builds on a number of pioneering and outstanding earlier contributions.


Using Java’s try-with-resources for Hierarchical and Parallel APIs

During my work on the TSP Suite (soon to appear in IEEE CIM), I came up with an – as I believe – nice approach to construct hierarchical APIs by (mis-)using Java 7’s try-with-resources statement and the related AutoCloseable interface. Since I have used this approach in several different locations in my programs and found it quite useful, I want to share it here. Before I do that, please let me point out that I do not know whether this idea already exists somewhere and whether it actually may be an anti-pattern. The complete source code of the examples and example APIs used in this post is provided at the end of this post.

AutoCloseable and try-with-resources

First of all, let me briefly outline what the interface AutoCloseable and the try-with-resources concept are. A program will usually open resources, such as sockets for information exchange via the network or streams to read and write data. Such resources must eventually be closed once we are finished with them. For example, a FileOutputStream must be closed after we are finished writing our data to a file.

Resources, Exceptions, and try-finally

However, many resources may throw Exceptions: The aforementioned FileOutputStream, for example, may throw an IOException if we are trying to write more data to it than the free capacity of the underlying file system permits. Still, we need to make sure that the resources are properly closed even in case of errors. This can be done with a try-finally-based approach, such as the one illustrated below:
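
The original listing is not reproduced here, so the following is a minimal sketch of the pattern under discussion; the file-copying task and all names are illustrative:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class TryFinallyExample {
      public static void copy(String from, String to) throws IOException {
        FileInputStream in = new FileInputStream(from);
        try {
          FileOutputStream out = new FileOutputStream(to);
          try {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) >= 0) {
              out.write(buffer, 0, read); // may throw, e.g., if the disk is full
            }
          } finally {
            out.close(); // executed even if write() throws
          }
        } finally {
          in.close(); // executed even if the inner block throws
        }
      }
    }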

As you can see, we put the calls to the close() methods into finally blocks, which are executed even in case of Exceptions being thrown. However, we need to repeat this try-finally pattern for every resource. There is the danger that we sometimes forget it, and it also makes the code a bit longer.

Resources, Exceptions, and try-with-resources

Since Java 7, classes representing resources that need to be closed should implement the AutoCloseable interface. (Neatly, AutoCloseable has been introduced as a super-interface of Closeable, which is already implemented by many resource classes.) Any implementation of that interface can be embedded in a try-with-resources statement, which will invoke the close() method automatically. A Java compiler (such as the one of Eclipse) can even be instructed to issue warnings or errors if a resource object is not properly embedded into such a statement. The Java 7 version of the example above now looks like:
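
Again a hedged sketch (the original listing is not reproduced), showing the same illustrative file-copying task with try-with-resources:

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class TryWithResourcesExample {
      public static void copy(String from, String to) throws IOException {
        // both streams are closed automatically, in reverse order of creation,
        // even if an exception is thrown inside the block
        try (FileInputStream in = new FileInputStream(from);
             FileOutputStream out = new FileOutputStream(to)) {
          byte[] buffer = new byte[8192];
          int read;
          while ((read = in.read(buffer)) >= 0) {
            out.write(buffer, 0, read);
          }
        }
      }
    }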

This makes the code shorter and less error-prone.

How we can build simple APIs with that: Example

We can use this mechanism to build simple, hierarchical APIs. I will show this based on an example where we can relatively easily see why this could make sense: XMLStreamWriter.

XML Output with XMLStreamWriter

The XMLStreamWriter is a lightweight output API for XML documents. It is somewhat similar to a character stream Writer with additional methods to write XML elements and attributes (or an “inverse” version of SAX, for output, if you want). It is used like this:
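
The original example is not reproduced here; the following sketch with illustrative element names shows the typical usage:

    import java.io.StringWriter;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamWriter;

    public class XMLStreamWriterExample {
      public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xml =
            XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        xml.writeStartDocument();
        xml.writeStartElement("library");   // <library>
        xml.writeStartElement("book");      //   <book ...>
        xml.writeAttribute("year", "2014");
        xml.writeCharacters("Some title");  //   element text
        xml.writeEndElement();              //   </book>
        xml.writeEndElement();              // </library>
        xml.writeEndDocument();
        xml.close();
        System.out.println(out.toString());
      }
    }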

Executing the above program will lead to output looking like this (without line breaks and indentation):
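
For the illustrative snippet above, the output would be roughly:

    <?xml version="1.0" ?><library><book year="2014">Some title</book></library>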

As you can see, each writeStartElement() is matched with a writeEndElement() (similar to how resource construction is matched with calls to close() in our previous examples). Elements can be arbitrarily nested, and it is up to the programmer to keep track of the method calls that need to match.

XML Output with a Hierarchical API using try-with-resources

How about this: Let the method for opening an element return an instance of AutoCloseable, which can be handled with the try-with-resources approach. Its (implicitly invoked) close() method would replace the call to writeEndElement(). This would have a couple of advantages: foremost, it reduces code size; second, it makes it impossible to mismatch element opening and closing calls, because a forgotten or superfluous element close would result in a missing or excess curly brace (“}”), and the compiler does not appreciate that.

I did a quick and dirty implementation of this idea in a package of the attached sources for this example. The resulting code allows us to re-write the previous example as:
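
The exact implementation is in the attached sources; here is a hedged sketch of how the rewritten example could look. XMLDocument, XMLElement, and their methods are illustrative names, not necessarily the exact API:

    // "out" is assumed to be some Writer, as in the previous example
    try (XMLDocument doc = new XMLDocument(out)) {   // writes the XML header
      try (XMLElement library = doc.element("library")) {
        try (XMLElement book = library.element("book")) {
          book.attribute("year", "2014");
          book.text("Some title");
        } // </book> is written here, implicitly, by close()
      }   // </library> is written here
    }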

The code has the same behavior, i.e., it directly writes the XML data to the underlying stream and does not construct any fancy data structure in memory. Compared to the XMLStreamWriter, all the explicit calls to element closing methods have disappeared (they became implicitly invoked at the corresponding curly braces). There is one difference from the previous example: Each opened element will be closed, in exactly the order in which the elements were opened, even in case of Exceptions.

Under the hood, there now is an additional internal linked list of the AutoCloseable implementations implicitly managed by the try-with-resources statements. However, since XMLStreamWriter probably also maintains some kind of element stack internally, this probably does not even cause additional memory consumption.


This kind of API has a set of advantages:

  1. It reduces code complexity.
  2. It reduces the probability of coding errors.
  3. It can better represent the structure of method calls, since automated code formatting can add indentation representing the nesting level of API calls. This is not (or at least hardly) possible with separate open/close method calls, as in case of XMLStreamWriter.
  4. It does not necessarily need more memory.
  5. It automatically provides the robustness of try-finally, without the need to add any additional code.
  6. It looks as if you are working on complex objects, but the API may actually be maintained by very lightweight data structures. My XML example API somehow has a DOM-like look and feel, but it actually works like an XMLStreamWriter.
  7. It may be implemented to support parallel processing without much overhead, as I will discuss later in this post.

Fields of Application

I imagine two potential fields of application of this idea: First, for producing clear and simple structured output APIs, like the XML example above.

Hierarchical, Abstract Document Output APIs

In the TSP Suite, for instance, I have a more elaborate abstract API to produce documents, which is implemented for XHTML and LaTeX. If we can do this with XML, then we can also make an API for general structured text documents, where different opened elements can have different children. A section may have a section body, a section body may contain figures, equations, or text, a text may contain emphasized sub-text, etc. All of this can easily be facilitated by hierarchically nesting different implementations of AutoCloseable, while it is virtually impossible to do this properly and in an easy-to-use style with a single output object (in the style of XMLStreamWriter). I will make a new release of this API for the next version of the TSP Suite, but the currently available source code of that project already has a working version.

Hierarchical Builder Pattern

The second field of application is to create more complex builder patterns. The builder pattern allows you to avoid having very complex constructors. Say you have an immutable object with 20 fields. Now you could make a 20-parameter constructor for that object, or a builder class where you can set the fields with separate method calls and, once you are finished setting the fields, obtain the constructed object.

Sometimes, your goal is not just to build a flat data structure with many parameters, but maybe a hierarchical one. As a simple example, take bibliography records as used in BibTeX. One such record, say, for a conference paper, may contain a list of authors (each having a personal and a family name), a title, and features of the proceedings containing the paper, such as editors, a book title, a location, and a publisher. Creating an object with all of that information may be complex. By using the try-with-resources idea, we can construct hierarchical builders and get code such as:
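
As a hedged sketch (all builder class and method names are illustrative, not the exact API from the attached sources), such code could look like:

    try (BibliographyBuilder bib = new BibliographyBuilder()) {
      try (ArticleBuilder paper = bib.article()) {
        paper.setTitle("Benchmarking Optimization Algorithms");
        try (AuthorsBuilder authors = paper.authors()) {
          try (PersonBuilder person = authors.person()) {
            person.setPersonalName("Thomas");
            person.setFamilyName("Weise");
          } // the immutable person record is built and added here
        }   // the immutable author list is built here
      }     // the immutable article record is built and registered here
    }       // the complete bibliography becomes available here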

OK, without context, it may be hard to understand or appreciate this example, but the key point is: I want to construct a complex (immutable) data structure with nested immutable objects, and I can use try-with-resources and AutoCloseable to implement a hierarchically nested builder architecture for doing that.

Difference to Closures and Lambda Expressions

There is a certain similarity of this idea to closures. Java has always supported some kind of “explicit” closures: you can define anonymous local classes inside a class or method, which may access fields or variables of the enclosing class. This concept is a very popular way (actually, probably the default method) to implement event listeners for GUI components. A closure is a function that can access data of the enclosing entity, exactly like such a local class. Java 8 introduces lambda expressions, a (I think) simplified syntax for a special case of local anonymous classes.

Anonymous local classes and lambda expressions may look a bit similar to what I suggest here, but they are a different thing: As the name implies, those code pieces go into classes, whereas in my case, the code is just normal code. Also, the code of local classes or lambda expressions is usually passive: it is invoked from an outside activity, say an enclosing method providing it with input, or an event source. This is different from my situation, where normal code actively uses an API.

Still, the same effects that I list here could probably be realized with closures or lambda expressions. Using these devices may even be more suitable or intuitive for achieving the parallel output that I will discuss in the next section. Yet, it would come at the cost of more (probably many more) anonymous classes being created, whereas my method probably only introduces moderate overhead.

Parallel Single-Document Output

One final and, in my opinion, very interesting argument for using the nested AutoCloseable approach to construct structured APIs is that it inherently supports parallelism. In case of the bibliography/builder example above, this seems quite natural. We can easily imagine that the “page-setting” code could run in a different thread. We can build a complex object in parallel.

But we can even create parallel output to a single document, say an XML document. With that I mean: You can have ten threads which all write to the same, single XML stream in parallel, and the final result will still be valid XML, with all output properly ordered. This is something the XMLStreamWriter API cannot provide: invoking its writeStartElement() methods in parallel in different threads is a recipe for trouble. Not so in case of an API managed with AutoCloseables.

Basic Idea

The idea is as follows (a condensed code sketch follows after the list):

  1. Each element of the API (say the XMLElement objects in my example) maintains an internal reference to its output destination, let’s say of the type Writer, to which it writes its output.
  2. Each element furthermore manages a queue of open child elements.
  3. It is allowed to have multiple open child elements. In case of the XML example, you could write to multiple elements in parallel. Every time a new element is opened, it will be added to the end of its parent’s child queue.
  4. When a new child element is opened, two cases are distinguished:
    1. The parent currently has no other open child element. In this case, the internal output destination reference (say, the Writer) of the child element will be set to reference the same Writer as the parent element.

      This is the default situation in the non-parallel scenario (and the previous examples), where each element has at most one open child element. After a child element is closed, the next one can be opened and written to, exactly as is the case with XMLStreamWriter. All output goes (sequentially) to the same Writer, and no additional overhead (except for the single additional field per element) occurs.

    2. However, if another child element is already open, the internal output reference (Writer) of the new child element will be a new, independent buffer in memory, say a CharArrayWriter.

      It is easy to see that all child elements can be written to at the same time and in parallel, since all the data goes to different locations. No synchronization is required for writing, only during the opening and closing phases.

  5. Whenever a child element is closed, the parent’s queue is checked from the start.
    1. If the head of the queue is a closed child element, there are two cases:
      1. If the internal output reference of (the Writer used by) the closed child element at the head of the queue is the same as the one of the parent, then nothing needs to be done.
      2. If the internal Writer of the closed child element at the head of the queue is a buffer, then the contents of this buffer are appended to the parent’s Writer.

      The head of the queue is removed and we continue with the next child in the queue. This sequential processing ensures that the outputs are arranged in the same order in which the elements were opened.

    2. If the head of the queue is an open child element, the processing of the queue is stopped here. (Once that element is closed, the queue will be processed again.)
  6. Of course, you can only close an element after all of its child elements are closed, but this bit of synchronization is not too complex.
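
The following condensed sketch shows how rules 4 and 5 could be implemented; field and method names are illustrative and do not match the classes in the attached sources:

    import java.io.CharArrayWriter;
    import java.io.IOException;
    import java.io.Writer;
    import java.util.ArrayDeque;

    abstract class HierarchicalElement implements AutoCloseable {
      private final HierarchicalElement parent;
      private final ArrayDeque<HierarchicalElement> children = new ArrayDeque<>();
      final Writer out;           // either the parent's writer or a private buffer
      private volatile boolean closed;

      HierarchicalElement(HierarchicalElement parent, Writer root) {
        this.parent = parent;
        if (parent == null) {
          this.out = root;
        } else {
          synchronized (parent.children) {
            // rule 4: share the parent's writer only if no sibling is open;
            // otherwise write into a private in-memory buffer
            this.out = parent.children.isEmpty() ? parent.out
                                                 : new CharArrayWriter();
            parent.children.addLast(this);
          }
        }
      }

      @Override
      public void close() throws IOException {
        this.closed = true;
        if (parent == null) {
          return;
        }
        synchronized (parent.children) {
          // rule 5: flush closed children from the head of the queue, in the
          // order in which they were opened; stop at the first open child
          while (!parent.children.isEmpty()
              && parent.children.peekFirst().closed) {
            HierarchicalElement head = parent.children.pollFirst();
            if (head.out instanceof CharArrayWriter) {
              ((CharArrayWriter) head.out).writeTo(parent.out); // append buffer
            } // else: head wrote directly to the parent's writer, nothing to do
          }
        }
      }
    }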

I implemented a basic form of hierarchical API using queueing of child elements in a class called org.optimizationBenchmarking.utils.hierarchy.HierarchicalFSM in the example sources. The capability of parallel text output by using the internal references to output destinations or memory buffers is implemented exemplarily in class org.optimizationBenchmarking.utils.hierarchy.HierarchicalText. The classes used in my XML example are based on HierarchicalText. The Bibliography example is a hierarchical builder pattern and does not write output to a stream, so it is based on HierarchicalFSM directly.

Use Case: ForkJoinPool

Following this concept, we can create APIs for writing any kind of content to a single stream, in parallel. These APIs should have almost no additional costs if you just want to use them sequentially. Moreover, this concept fits quite well with another novelty in Java 7: ForkJoinPool. A ForkJoinPool, as discussed in this tutorial, allows you to submit jobs which, in turn, can submit sub-jobs, and so on.

Each such task could write its output to a different element of the hierarchical API and close that element once its own output is written and its sub-tasks are finished. Apart from this, the programmer herself does not need to deal with parallelism too much. The output to the document will automatically be parallelized, with the level of parallelization depending on the number of available processors. Due to the queue-based sequencing of the output, the resulting documents will look exactly the same as if they were written in a single thread.
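
A hedged sketch of this interplay, reusing the illustrative XMLElement API from above; SectionTask, ReportTask, and their methods are invented for this example:

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveAction;

    class SectionTask extends RecursiveAction {
      private final XMLElement parent;
      private final int index;

      SectionTask(XMLElement parent, int index) {
        this.parent = parent;
        this.index = index;
      }

      @Override
      protected void compute() {
        try (XMLElement section = parent.element("section")) {
          // expensive, independent computation producing this section's content
          section.text(computeStatistics(index));
        } // closing hands the (possibly buffered) output back, in opening order
      }

      private static String computeStatistics(int index) {
        return "result " + index; // stand-in for real work
      }
    }

    class ReportTask extends RecursiveAction {
      private final XMLElement root;

      ReportTask(XMLElement root) {
        this.root = root;
      }

      @Override
      protected void compute() {
        SectionTask[] sections = new SectionTask[10];
        for (int i = 0; i < sections.length; i++) {
          sections[i] = new SectionTask(root, i);
        }
        invokeAll(sections); // all ten sections are computed in parallel
      }
    }

    // usage: new ForkJoinPool().invoke(new ReportTask(rootElement));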

This form of parallel output may be quite nice in cases where contents for different parts of the output document require you to do independent, complex calculations. This is the case in our TSP Suite: The output documents contain several statistics and figures which are computed independently from each other. If we can compute and output all of them in parallel, the throughput may increase significantly.

Experiment: Does parallel output provide speedup?

Of course, parallelization causes some overhead. Buffers need to be allocated and re-sized as contents are written to them in parallel. Then we have to pay for using ForkJoinPool and implementing the output “generators” as instances of ForkJoinTask (such as RecursiveAction) instead of simple procedures. (This is similar to using closures directly, but closures would require us to put each nested output scope into a class, whereas here we can nest several try-with-resources statements into one task.) We also need to use synchronization whenever a new task is started or one terminates. Finally, the documentation of ForkJoinTask states that tasks “should also not perform blocking IO, and should ideally access variables that are completely independent of those accessed by other running tasks” – but we are doing IO.

Thus, I conducted a small experiment to investigate whether the throughput of writing a sequential document can actually be increased by parallelizing the output with the API implemented exemplarily for XML based on my little HierarchicalText class.

Experimental Setup

For this purpose, I wrote some code creating random example document structures in memory of fixed sizes s∈{10,100,…,10000} and serializing them using either

  1. Java’s XMLStreamWriter in the default, serial fashion (I will refer to this setup as X),
  2. my hacked-together XML API in serial fashion (referred to as S),
  3. my XML API in parallel, using ForkJoinPools with the single-parameter constructor for different degrees of parallelism, ranging from 1 to Runtime.getRuntime().availableProcessors() (in my case, 8); I will refer to these setups as P#, where # stands for the number of processors, and
  4. my XML API in a parallel fashion, using ForkJoinPools constructed with asyncMode=true. These setups are referred to as A1 to A8.

I measured the performance for two kinds of output destinations: A Writer simply discarding all of its output, i.e., not performing any blocking operations, and a FileWriter writing to a temporary file. Furthermore, I wanted to investigate how the computational requirements to create the output data influence the performance. For this purpose, I simulate a “computation” for any attribute or element text by performing loops of bogus mathematical operations. I did this for loops with d∈{0,10,100,…,10000} iterations.

I tried to follow the advice from my post on measuring runtimes by conducting 500 dry runs before the actual measurement to let the Java JIT compiler do its potential work, and by conducting several runs per configuration and several runs per document for each output destination. In particular, for each document, I performed 3 runs and took the shortest measured runtime. Each setting of s and d is repeated like this for 15 different (random) documents and the median runtime is used.

The implementation of this performance test is given in a package in the attached sources. The results were measured on my Ubuntu computer with the OpenJDK 64-Bit Server VM for Java 1.7. The results and figures are contained in the above package in the download, as file results.txt. Since actual runtimes are not important, I normalized them in each of the figures. In particular, for each figure and setting of the “computational effort to obtain output data” d, I picked the fastest setup and divided all measured runtimes of all setups by its runtime, for this particular value of d.

Experimental Results

Figure 1 illustrates the relative performance of the different settings for the discarding Writer and documents with 10 elements for different values of d. In the case where no computation or only few calculations are necessary to obtain the data to be written, i.e., where the output is directly available, both XMLStreamWriter (X) and my API (S) used in a serial fashion are significantly faster than the parallel versions. However, if more computations become necessary to create the output (as would be the case in my TSP Suite), the parallel versions can save up to 25% compared to the serial output. Of course, the setups P1 and A1 cannot do that, as they pay the overhead for using ForkJoinPool without actually being parallel.


Figure 1: The output time requirements of XML APIs for 10-element XML documents to a Writer discarding its output, over different attribute- and element text access times.

Figure 2, illustrating the same output routine for 1000-element documents, confirms this finding, with two differences: First, for these larger documents, parallelism seems to already pay off for small computational requirements to “generate” output data and time savings of 50% seem to be possible. Second, it seems to stop paying off for large computational requirements and all setups except S have the same performance. I have no real idea why this would happen. It may be specific to my example implementation, be caused by some JIT behavior, or by some under-the-hood behavior of ForkJoinPool, maybe related to thread allocation and disposal, or other magic.


Figure 2: The output time requirements of XML APIs for 1000-element XML documents to a Writer discarding its output, over different attribute- and element text access times.

When writing 10-element documents to a temporary file (illustrated in Figure 3), we can observe a behavior similar to when writing to a discarding Writer. The speedups provided by the parallel version seem to be more consistent, which probably results from the more extensive use of buffers in memory.


Figure 3: The output time requirements of XML APIs for 10-element XML documents to a FileWriter, over different attribute- and element text access times.

For 1000-element documents (see Figure 4), we can obtain nice speedups and may sometimes be able to write a document in half of the time. (Again, with the same odd behavior of performance collapse for large computations.)


Figure 4: The output time requirements of XML APIs for 1000-element XML documents to FileWriter, over different attribute- and element text access times.

All in all, we find that parallel output to the same underlying stream may, at least in some cases, provide a significant speedup. Of course, we have to pay for synchronization and other overhead when using ForkJoinPool, plus we violate its contract by actually doing blocking I/O. As a result, the speedup is not equal to the number of processors used.

Then again, my API was hacked together and the way I create random documents may be biased in some way. It should furthermore be noted that I did not do too many repetitions for a statistically sound evaluation. On a different machine, you may get entirely different results. Also, my API does not support anything fancy, which is likely the reason why it is a bit faster than XMLStreamWriter if used in a serial way. So these results are just something in between a sanity test and a rough proof-of-concept.


In this post, I have outlined a concept for hierarchical APIs based on Java 7’s try-with-resources statement and AutoCloseable interface. I have argued that, at least to me, it seems to be robust, both in terms of reducing potential sources of coding errors as well as in the presence of exceptions being thrown. Furthermore, without much effort, it can be used to parallelize output to sequential streams, which seems to provide a performance improvement – at least in some preliminary experiments. If implemented properly, I could even imagine that this could speed up the way a web or application server creates single-stream contents.

Of course, I am not a software engineer. I know nothing about the frameworks out there, I know nothing about fancy concepts and patterns. So this idea may actually be old hat. Or discouraged by experts for some reason. Anyway, I think it is cute.

Example Source Code

The current version of the source code belonging to this article is contained in this ZIP archive. I put it under the LGPL version 3 license. I do not assume any responsibility for whether it works or not. This holds, for instance, for my XML output API – it is just an example and not intended for any productive application at all.

The source archive contains two root packages:

  1. org.optimizationBenchmarking.utils is the utility package for the next version of TSP Suite. It contains examples for potential base classes to build a hierarchical API using AutoCloseable and try-with-resources in sub-package hierarchy, my example XML API in sub-package io.xml, and the example bibliography API in sub-package bibliography.
  2. Package examples holds examples for everything we have discussed above. If you are looking for examples of a class you found in package org.optimizationBenchmarking.utils.x.y, then you will find them in the corresponding sub-package of examples, if there are any.

The source archive also contains a heap of other utility classes, which, in one way or another, are used by my code. You may ignore them.



Deadline Extension for SSCI’14 (and our special session)

Good news for everyone who was a bit late to submit their paper to SSCI 2014: The deadline has been extended to July 15, 2014. The same, of course, automatically holds for our Special Session on Benchmarking and Testing for Production and Logistic Optimization discussed on this blog. Its CfP has already been updated. We are looking forward to receiving your submissions ^_^


Towards a figure-like LaTeX construct that can break over multiple pages and works in double-column documents

I am working on a tool (TSPSuite) for analyzing optimization algorithms on the Traveling Salesman Problem. TSPSuite automatically generates reports including the comparison of the performance of different algorithms. These reports can be output in the LaTeX language. In the reports, there are automatically-generated figures which contain a lot of automatically-generated sub-figures. And here the problem begins: TSPSuite may generate more rows of sub-figures for a single figure than fit on a page. Floating objects (such as figures) in LaTeX cannot include page breaks. Thus, the hosting figure needs to be split into several figures, i.e., a “figure series”. But where do I split it? LaTeX has no automatic approach for that, so I need to code it inside my TSPSuite tool. However, since sub-figures and figures have captions, it becomes extremely complicated to calculate sizes outside of LaTeX. Thus, I sometimes end up with figures which are not nicely divided. Another problem is that LaTeX has a limit for the number of pending, not-yet-laid-out floating objects, and this limit is relatively low (18, I believe). Thus, if my figure series gets too long, it may cause errors during compilation of the report with LaTeX. In other words, my TSPSuite needs to sporadically insert \clearpages or \FloatBarriers, which mess up the layout even further. All in all, what I need is something that:

  1. has the look-and-feel of a figure environment, but
  2. can break over several pages, and
  3. ideally float, and
  4. works in both single- and double-column documents (since quite a few computer science journals and conferences use double-column classes).

These requirements are, to the best of my knowledge, hard and borderline impossible to fulfill, as they arose a few times in internet communities without a satisfying answer: 1, 2, and 3. Especially the last one, the requirement to work in double-column setups, turned out to be tricky.

How to make this (almost) work

Nevertheless, I think I made a small step towards solving these issues, although it does not yet work perfectly. I want to document it here for future use and have thus put together a LaTeX package. Also, by publishing the current state of my idea here, maybe I can get some feedback on how to improve it.

My idea is as follows:

Something like a figure

LaTeX’s figures and floats cannot break. So let us first find something like a figure that can break, and later make it float. A figure is basically an environment that has a figure caption and a body. With the \captionof command from the caption package, we can create a figure caption wherever it pleases us. So we just need the figure body, and for this a simple \begin{center}...\end{center} would do. We could put a lot of stuff into that, and our new “figure series” environment would nicely break over several pages (since it is not a floating object). Let’s say we have created an environment like this. We cannot really make it float, but we can “emulate” floating behavior by putting it into an \afterpage{...} call (from the afterpage package). \afterpage will take its contents and render it at the top of the next page. Until this happens, the content coming after the \afterpage command (such as text following our figure series) will be laid out. Of course, the contents of our environment, after starting on the next page, could then span over several following pages.

Sub-figures inside

Now we want to put sub-figures into our figure series body. There are several packages that can do this, but the subcaption package seems to be the state of the art. It provides the command \subcaptionbox that takes a caption and a (figure) content as parameters. This is exactly what we want. We now can nicely put some \subcaptionboxes in a row and terminate this row with \par. In order to get an automatic and nice spacing between the sub-figures of a row, we can insert \strut\hfill\strut between them. Now we have an almost-floating environment into which we can insert arbitrarily many rows of sub-figures and which nicely breaks over several pages.
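
Putting these pieces together, a first sketch of such a figure series could look like the following (not the final package code; it assumes the caption, subcaption, and afterpage packages are loaded, and the graphics file names are illustrative):

    \afterpage{%
      \begin{center}%
        \captionsetup{type=figure}% allow (sub-)captions outside of a float
        \captionof{figure}{Performance of all algorithm setups}%
        \subcaptionbox{Setup A}{\includegraphics[width=.3\linewidth]{plotA}}%
        \strut\hfill\strut
        \subcaptionbox{Setup B}{\includegraphics[width=.3\linewidth]{plotB}}%
        \strut\hfill\strut
        \subcaptionbox{Setup C}{\includegraphics[width=.3\linewidth]{plotC}}\par
        % ... arbitrarily many further rows; page breaks may occur between rows
      \end{center}%
    }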

The Double-Column Problem

This would all be nice if there were not document classes which have two columns, such as IEEE’s IEEEtran and ACM’s sig-alternate. Putting our environment, as is, into a double-column document could mean that the figure series begins in the middle of the left column, fills this column with sub-figures, breaks over to the right column, and continues filling this column with sub-figures until its middle. That would look extremely odd. A figure series will usually contain many sub-figures and should thus span the whole page width. We can achieve this by temporarily switching double-column documents to single-column ones by using the command \onecolumn and later, after our series, switching back to double-column mode using the command \twocolumn.

Page Breaks induced by \onecolumn and \twocolumn

The problem is that these two commands will always create a page break before switching the column mode. For the \onecolumn command, this is not a big deal. We can always let our figure series float using \afterpage. It would begin on the next page, and since there necessarily is a page break between the current page and the next page, the page break created by \onecolumn does not change anything.

\afterpage in Double-Column Mode

The \afterpage command behaves slightly differently in double-column mode than in single-column mode. More precisely, in double-column mode, if called in the first column, it prints its contents at the start of the next (second, right) column instead of on the next page. If called in the right column, it prints its contents at the start of the left column on the next page. We can solve this minor issue by simply wrapping our figure series into two \afterpages if we are in the left column (which we can detect via \if@firstcolumn).

The Page Break after \twocolumn

While we could defuse the page break after the \onecolumn command, the one after \twocolumn poses a much more severe problem. This page break will potentially leave a larger fraction of the page blank, which does not look nice. We cannot \afterpage it away, since then the document’s main text would be set in single-column mode until the next page where it would be typeset in double-column again. One simple trick is to \let\clearpage\relax, i.e., to set the \clearpage command (used inside \twocolumn) to do nothing before \twocolumn and to restore it afterwards. However, this will break the layout completely: The text in the left column after our figure series will start nicely below the figure series. The text in the right column, however, will start at the top of the page — right inside the space occupied by the sub-figures!

Restoring Occupied Space in the Right Column

So we can either have a (potentially) almost blank page or text written over our figures. One way out of this may be as follows: Assume that we can somehow find out where exactly (vertically speaking) our figure series ends. If we know this “height” occupied by our figure series, then we could create an object at the top of the right column occupying the same amount of space. This object could be invisible, say a \vspace. Then, the text in the right column would start at the same vertical offset as the text in the left column, i.e., right under our figure.

\pagetotal holds how much space the current contents of a page occupy, if I understand correctly. I store the value of \pagetotal in a variable right before we do the \twocolumn (with the defused \clearpage). After \twocolumn, I need to occupy that much space at the top of the right column. Since I will be in the left column after \twocolumn, I can simply put a corresponding \vspace into an \afterpage. This turns out to do the trick quite well, although not perfectly: The two columns now often start at the same vertical offset, but sometimes there are small visible deviations. I assume these come from some automatic space distribution done by LaTeX. Here I need more understanding to fix this.
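
In code, the trick could look roughly like this (a sketch with illustrative macro names, not the exact package code):

    \makeatletter
    \newlength{\fs@used}%
    \setlength{\fs@used}{\pagetotal}% remember the height used in this column
    \begingroup
      \let\clearpage\relax% defuse the page break issued inside \twocolumn
      \twocolumn
    \endgroup
    \afterpage{\vspace*{\fs@used}}% fires at the top of the right column
    \makeatother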

Merging Neighboring Figure Series

Another issue to consider is that two figure series may be adjacent to each other. In this case, odd things may or may not happen. I think it is best to combine such neighboring figure series, i.e., to put them into a single, common pair of \onecolumn\twocolumn. Of course they should retain their own captions and be separate objects, just grouped together. This can be achieved by storing the body of the figure series in a special macro which initially is empty. The macro is then rendered when the figure series is actually laid out, i.e., when the body between \onecolumn\twocolumn of the corresponding \afterpage is executed. If a figure series begins and detects that our special macro is not empty, it simply attaches the body of the new series to the existing macro, by using \g@addto@macro. Right when the body of a figure series is laid out, we set the macro back to empty. This way, all figure series bodies of figure series issued on the same page will automatically be merged and the available space is used efficiently.

Lonely Captions

This can have one drawback, though. If we have a multi-page figure series, the captions should be at its top. Normally, figure captions are below the figures, but that would mean you may encounter a page of sub-figures without a main caption. So we put the caption at the top. If multiple figure series are merged, it may happen that the caption of the second or third series occurs right at the bottom of the page, followed by a normal page break, before its sub-figures begin. This can be solved by not rendering the caption directly, but rendering it inside a \parbox together with the first row of sub-figures. For this purpose, we can store it in a special macro.

The Package

I put all of the above together with several examples and documentation into a package called “figureSeries”. I want to use it for my own purposes, so for me and for now, it is more or less acceptable, although it is not perfect (as said, the spacing in two-column mode is still a tad bit off). I hope that I can resolve these issues in the future (and am welcoming any suggestions). I also asked for feedback in the stackexchange forum. Once I am fully satisfied with the package, I will submit it to CTAN (and I will update this post as well).


Until then, you can download the preliminary version of the package here.


Why research in Computational Intelligence should be less inspired

The inspiration gleaned from observing nature has led to several important advances in the field of optimization. It seems to me that currently a lot of work is based mainly on such inspiration alone. This might divert attention away from practical and algorithmic concerns. As a result, there is a growing number of specialized terminologies used in the fields of Evolutionary Computation (EC) and Swarm Intelligence (SI), which I consider a problem for clarity in research. In this blog post, I will try to formulate my thoughts. Maybe this can even help to start a debate that, as I believe, is necessary.

Origin and Terminology

Before getting to the core, let me first summarize (quickly and coarsely) the terminology and origin surrounding two exemplary algorithms, one from the area of EC and one from SI.

Genetic and Evolutionary Algorithms

The field of EC more or less originated with research on Genetic Algorithms (GAs). GAs are usually referred to as metaheuristics that mimic the process of natural evolution in order to solve optimization problems.

In the original GA, candidate solutions (“phenotypes”) are encoded as bit strings of a fixed length, called genotypes in correspondence to the DNA in nature. If the candidate solutions are something other than bit strings, a genotype-phenotype mapping takes place, a left-total binary relation from genotypes to phenotypes. A tuple of a genotype and its corresponding phenotype is usually called an individual. A list of individuals (the population) is maintained and refined in an iterative procedure (where each iteration is called a generation). The objective function is usually called the fitness function and originally was subject to maximization. In the main loop of the algorithm, those strings that stand for candidate solutions of higher quality (with better objective values, i.e., higher fitness) are selected with a higher probability and those that represent bad solutions are (more likely) discarded. The selected individuals enter the so-called mating pool and are allowed to produce offspring. These new solutions are derived by either creating modified copies of single strings (“mutation”) or by combining two strings into a new one (“crossover”, “recombination”). Both search operators are usually randomized. This way, the list of solutions is filled again and the next iteration begins. This process is repeated in a loop until, e.g., a time limit is reached or sufficiently good solutions are found.

We can replace the bit string genotypes with basically any genotype data structure in this algorithmic paradigm. This generalized algorithm is then called an Evolutionary Algorithm (EA).
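
For illustration, here is a hedged sketch of such a generalized, (μ+λ)-style EA skeleton in Java, phrased in terms of generic search operations; all names are illustrative:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    // inspiration-neutral search operations over an arbitrary search space G
    interface SearchOperations<G> {
      G create(Random random);            // nullary operation: random point
      G derive(G parent, Random random);  // unary operation ("mutation")
    }

    final class SimpleEA {
      /** keep the best mu points, derive lambda new points per iteration */
      static <G> G solve(SearchOperations<G> ops, Comparator<G> better,
          int mu, int lambda, int iterations, Random random) {
        List<G> population = new ArrayList<>();
        for (int i = 0; i < lambda; i++) {
          population.add(ops.create(random));
        }
        for (int t = 0; t < iterations; t++) {
          Collections.sort(population, better);       // best points first
          List<G> parents = new ArrayList<>(population.subList(0, mu));
          population = new ArrayList<>(parents);      // (mu+lambda) survival
          for (int i = 0; i < lambda; i++) {
            population.add(ops.derive(parents.get(i % mu), random));
          }
        }
        Collections.sort(population, better);
        return population.get(0);
      }
    }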

Ant Colony Optimization

Ant Colony Optimization (ACO) is one of the most well-known SI techniques. It solves problems that can be mapped to finding paths in graphs with weighted edges by mimicking the path-finding behavior of ants. Examples for such problems are Traveling Salesman Problems (TSPs) and a variety of other Vehicle Routing Problems (VRPs).

The problem solving is done by repetitively constructing paths (candidate solutions) by letting simulated ants walk through the graph. An ant starts its journey at a given node and probabilistically decides which node to visit next. This decision is influenced by two factors:

  1. the cost of going to the nodes, i.e., the weights of the edges from the current location to them, and
  2. the amount of simulated pheromone on the edges to these nodes.

The ant iteratively chooses where to go until the path is completed. Several ants do this in one turn. The simulated pheromones are usually maintained as floating point numbers in a matrix. Usually the ants producing the best-rated paths will update the pheromone matrix by increasing the pheromones along the edges they have travelled. This, in turn, increases the probability that subsequent ants make similar decisions. At the same time, existing pheromones are reduced by evaporation. Then the process is repeated and new ants walk through the graph.

My Criticism

Two processes from nature have been successfully emulated as algorithms. These algorithms turned out to be quite useful in solving hard optimization problems. Although this itself is good, I believe that over the years, four problematic issues have arisen from this situation.

1. Incompatible Terminology and Redundancy

Each “inspired” optimization method has its own terminology. This makes it more complicated to discuss these methods without learning a lot of different vocabularies. Moreover, different vocabularies often use different words for the same thing, i.e., there are quite a lot of synonyms. A candidate solution in an EA is called a phenotype (or sometimes an individual). A point in the search space is called a genotype in EAs; in ACO, it is called an ant or a path; in Particle Swarm Optimization (PSO), it is called a particle. Depending on the algorithm usage, ants and particles may also represent candidate solutions (which is the normal case).

It seems to me that sometimes, the origins of optimization methods are valued over their algorithmic structures. Then, instead of the abstract computer science perspective, it is the inspirational source that dictates the terminology.

This can cause confusion and inconsistencies. For example, I personally have the opinion that ACO basically is a special Estimation of Distribution Algorithm (EDA). An EDA is an optimization method that tries to build and update a probabilistic model of what good solutions look like. In each iteration, it will sample this model to generate new candidate solutions. The best generated candidate solutions will then be used to update the model.
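
A hedged sketch of this generic EDA loop (illustrative names; the model interface is the part that, I argue below, a pheromone matrix effectively implements):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.Random;

    // the probabilistic model of what good solutions look like
    interface Model<X> {
      X sample(Random random);     // draw one new candidate solution
      void update(List<X> best);   // shift the model toward good solutions
    }

    final class SimpleEDA {
      static <X> X solve(Model<X> model, Comparator<X> better,
          int lambda, int mu, int iterations, Random random) {
        X best = null;
        for (int t = 0; t < iterations; t++) {
          List<X> generation = new ArrayList<>(lambda);
          for (int i = 0; i < lambda; i++) {
            generation.add(model.sample(random));   // model sampling
          }
          Collections.sort(generation, better);
          model.update(generation.subList(0, mu));  // model update
          if (best == null || better.compare(generation.get(0), best) < 0) {
            best = generation.get(0);
          }
        }
        return best;
      }
    }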

I think that a pheromone matrix as used in ACO could be considered as such a model. The process of letting the ants find new paths guided by the pheromone matrix (the model) then is nothing else than model sampling. Pheromone evaporation and update then are the model update. This similarity, to the best of my knowledge, is not widely recognized or discussed — and I believe that this is because of the “language barrier”.

One may also argue that the language used for EDAs is clearer and more precise than the biological metaphor used in ACO. A language focused on the statistical, mathematical, and theoretical features of an algorithm automatically invites researchers to test more complex stochastic concepts, e.g., more complex model building and sampling approaches, which are entirely unrelated to biological processes and not so easy to arrive at when starting from a biological idea.

If ACO indeed is a special form of EDA, then having two distinct terminologies for essentially the same thing would potentially impose a problem for researchers when looking for related work. If an improvement was discovered for ACO, would it not be harder for researchers in the field of EDAs to see the relationship to their own work? Some research might even be carried out in duplication.

Now take a look at the Wikipedia pages for metaheuristics from February 2013. As you can see, at that time more than 50 different algorithms inspired by different natural phenomena were listed there, and this is just a subset of the existing ones. How many different vocabularies will this entail? I fear that it becomes quite hard to see whether a new idea one is working on is truly new or already exists in an algorithmically identical form under a name that cannot be inferred from the algorithmic properties (such as the Snapping Turtle Herding algorithm*).

2. Dissonance between Terminology and Reality

I furthermore perceive a dissonance between the inspiration based on which an algorithm was created and its actual (feasible) working version, as well as between the vocabularies in EC and SI and their actual meaning.

Obviously, it is neither the goal of GAs to simulate evolution nor is it the goal of ACO to actually simulate ant behavior or stigmergy in any realistic way. Instead, the goal is to solve optimization problems. This leads to the fact that any part of the simulation of the natural process will be abandoned if it is not useful for optimization or simply too complicated to implement.

A historical example of when the aim of modelling nature closely does not help in optimization is the original selection method used in GAs, Fitness Proportional Selection. Here, the expected number of offspring of an individual is proportional to its fitness. This sort of resembles the ideas of fitness in nature and natural selection. It turned out to not be a good idea in optimization, though, having many drawbacks. The algorithm will produce different results if, e.g., a constant offset is added to the fitness function, which does not change the nature of the optimization problem: with fitnesses 1 and 2, the selection probabilities are 1/3 and 2/3, but after adding an offset of 100, the fitnesses 101 and 102 yield nearly uniform probabilities of about 0.4975 and 0.5025. Usually, Fitness Proportional Selection is unsuitable and researchers often use a less nature-like algorithm to get better results.

We can also look at this from a top-down perspective: Natural evolution is generally considered to be undirected, whereas GAs have a clear direction (minimization or maximization of the objective function). The original GAs do not simulate concepts such as geography, environment, population or social structure, age, time, or bio-chemical, genetic, or molecular processes (some of these aspects have, separately, been considered in later research, though). In ACO, all ants execute identical behavior, all ants are the same (no queens, workers, etc.). Both ACO and GAs proceed in distinct iterations which solely depend on the state at the end of the previous iteration. These deviations from the natural model are necessary in order to achieve good results and to avoid making the algorithms too complex.

If we look closely, we will furthermore find that industrial-level and state-of-the-art EC methods do not only omit simulating phenomena from nature that are not useful, they even introduce algorithmic concepts that have no relationship to evolution at all. They do so because good results are not enough; excellent results are sought.

For several problems, the best EAs are Memetic Algorithms (MAs), which incorporate local search methods. (The word meme is, I believe, another example of unnecessary inspiration: the term hybrid instead would be much clearer.) Often, data structures such as permutations, real vectors, or trees are used to represent candidate solutions, which cannot be understood as equivalents of DNA anymore. One of the best EC approaches for numerical problems is the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). CMA-ES uses an approximated covariance matrix to guide its search procedure. Sharing and niching penalize solutions in the population which are too similar to each other by changing their fitness, in order to prevent an EA from prematurely converging to locally optimal solutions. EAs for multi-objective optimization use methods such as Pareto ranking or hypervolume metrics, which can hardly be related to evolution. Several Evolution Strategy algorithms encode not only the candidate solutions themselves but also the step widths used to create new solutions from them inside the individual records, i.e., they store a direction for mutation inside the individual. ACO is often modified to have minimum/maximum pheromone limits or “elitist ants” in order to achieve better results. The Population-based ACO (PACO), one of the best ACO methods, maintains a set of ants to define the pheromones instead of a pheromone matrix. None of these things have obvious counterparts in nature.

This raises questions such as: Why should we describe an algorithm that has a list of λ permutations and selects the best μ of them as input for unary and binary search operations in order to generate the next λ permutations in a way any different from what I just wrote? If there is a divergence between the original biological meaning of a word and its meaning in our algorithm description, is it good to use the word? May it not rather confuse beginners who are new to the subject and make it harder for them to understand why the algorithm actually works? Maybe it even makes them think that it is good to closely simulate nature to solve an optimization problem?

3. Inspiration by Nature does not mean Good Optimization!

That would be the next problem. Inspiration from nature alone does not necessarily equal good optimization. Why do we not encounter “pure” GAs, ACO, or PSO in practical applications or as winners of optimization competitions? Because plain EAs or GAs can solve optimization problems to a certain degree, but often lose even to simple local search methods when compared to them thoroughly. Plain EAs are not really good optimization algorithms. It seems that in order to utilize their full power, they must be modified, hybridized, and extended with problem-specific knowledge. In other words, the natural inspiration only carries so-and-so far in optimization.

And this makes sense: It is not the main goal of natural evolution to produce perfectly adapted species. Evolution has no goal. Species will survive if they are adapted sufficiently well, not necessarily perfectly well. It is not the goal of bees to find the perfect shortest round trip starting in the nest, visiting 20 flowers, and returning back. A sufficiently short trip discovered in reasonable time will do. Otherwise, this would mean that biological processes could solve NP-hard problems in sub-exponential time — and although this would be awesome, it seems unlikely. If it worked, it could be better to let bees solve large TSPs instead of computers!

There is no reason to assume that being inspired by nature is enough to obtain better optimization methods than purely mathematical considerations. I therefore think that teaching the “inspired-by-nature” theme to many PhD students may pull research in a wrong direction. Instead of focusing on the algorithmic and computer science perspective on metaheuristics (we have a set of solutions which are refined with search operators), much good research energy and wit may be spent in trying to develop algorithms inspired by baboons, squid, and the mating behavior of little crakes*. It is probably not a good idea to just simulate an arbitrary process in nature (that has not yet been used by anyone else) in order to obtain an optimization method. This may actually hurt the EC and SI research communities in the long term.

4. It sounds unprofessional

And this hurting begins at the point where it makes research sound unprofessional. People in industry are probably willing to apply algorithms that seem to be mathematically sound, rigorously researched, and professionally presented. But even years of thorough research may become useless the moment an interested engineer sees “Squid Mating Algorithm”* on the very first slide of a presentation, because he may just switch into mental standby for the rest of the time.

What can we do?

No criticism should go without some idea on how to do it better. Yet I have to admit that I do not have really good ideas for fixing the situation, just some minor thoughts.

Using Inspiration-Neutral Terminology

If a terminology has become folklore, it becomes very hard to go against it. If publishing a paper at an EC-related conference, one has to use the corresponding vocabulary to some degree. However, there are subtle changes to the wording that can help to improve clarity. Some more general, formal terms can be used to describe our work. The introduction of the current version of the Wikipedia page on PSO, for example, does very well at using an abstract terminology.

Here are some inspiration-neutral terms, most of which are actually quite common and frequently used. Maybe these could be used more consciously and more often.

Nature-Inspired Term → Potential Alternative(s)

Genome → Search Space
Genotype → Point in Search Space
Phenome? → Solution Space
Phenotype → Candidate Solution, Solution, Point in Solution Space
Selection → Selection (term is inspiration-neutral)
Creation? → Nullary Search Operation
Mutation → Unary Search Operation
Crossover, Recombination → Binary Search Operation
Ant (in ACO) → Depending on situation: point in search space or candidate solution
Pheromone Matrix → Model (à la EDA)
Genotype-Phenotype Mapping → ?
Population → term may be inspiration-neutral?
Memetic Algorithm → Hybrid Algorithm
Epistasis → Non-separability or variable interaction

For some terms, like genotype-phenotype mapping, population, and individual, I don’t know any good “inspiration-free” replacements (yet).

Clear Teaching

I admit using the bio-inspired terminology in my own teaching. As a teacher, I need to make the connection to literature in one way or another. However, GAs are not the first subject in my course. It instead begins by introducing the components of metaheuristic optimization in a way that is formal and as “inspiration-free” as possible, by often using the terms listed above.

Clear Algorithm Descriptions in Research

While we can change our own wording and notations to make our own work clearer, we may also influence others to do so as well. There are two “access points” that come to mind: When advising research students (research master students, PhD students, …) and when reviewing papers.

We can make clear that although inspiration may be drawn from whatever source, in the end, there must be a formal algorithm, described in a clear and formal way. This entails discouraging the use (and, in particular, the invention) of specialized vocabulary. Algorithms should be described concisely and precisely, ideally with meaningful pseudo code.

Proper Benchmarking

Just as important as a clear description of what an algorithm is doing is a clear assessment of its results. This means:

Benchmark Problems

Benchmarking is done on well-known instances of well-known benchmarking problems. If a new algorithm is developed, results should be reported which are meaningful to a large share of the research community. For example, most researchers may either already know what the Traveling Salesman Problem is or can find that out quickly. They can google “Traveling Salesman Problem Benchmark” and will immediately find TSPLib, a benchmark dataset with known optimal results. If a new algorithm is applied to the TSPLib instances, it is extremely easy to get a quick impression on how good the results are and to find literature about the state-of-the-art or potentially related works. If the new algorithm is applied only to niche problems, it will be significantly harder to judge whether it is good or not.

Fair Comparison

Of course, for most such well-known problems like the TSP, one will not be able to beat the state-of-the-art, which was developed during years of research. Although this must be the aim and the comparison is necessary, for a conference paper it may be satisfactory if related metaheuristics are significantly outperformed. This requires a fair comparison. If a new algorithm is compared with a GA, then the GA should use the same search operators and be hybridized with the same local search (if any). It should also be granted the same number of function evaluations, and if the new algorithm is population-based, the GA should have the same population settings.

If the setup of the new algorithm has been determined with a smaller preliminary experiment testing several configurations, then a similar experiment should also be carried out for the GA to find its best setup as well. A new algorithm for the TSP representing solutions as permutations should not be compared with a GA using an unsuitable representation (say bit strings) for its solutions (and hence different search operators), because it would be unfair.


There is a set of things that bother me in the context of research on EC, especially with regard to the terminology used. Actually, in the past, I myself have often done several of the things that I criticize here. However, after realizing these issues, I will try to avoid them in the future, and I hope this text can help inspire others to do the same.

* Algorithm names are entirely fictional and chosen as mild polemic. Any potential resemblance with existing methods is unintended and by accident.
