Meeting of the German Research Network China on 2014-09-11 in Shanghai

Yesterday I attended the meeting of the German Research Network in the residence of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [上海市永福路151号]. The meeting was opened by the new Consul for Cultural and Press Affairs, as well as Scientific Cooperation, Marcel Viëtor. The main event was a presentation by Nobel laureate Prof. Dr. Erwin Neher on the discovery of ion channels. This presentation was remarkable in that I, as someone with little understanding of biology, could follow and understand it. Thus, I want to outline it below.

Prof. Neher started with the history of the field, beginning with early experiments on electricity in animal bodies using, e.g., frog legs. He then outlined the remarkable contributions of Ramón y Cajal in the early 1900s, who already provided precise sketches of the nerve cells comprising our brain. In the 1860s, Julius Bernstein discovered electric action potentials in muscles, and he formulated his membrane theory in 1902. In the 1950s, Hodgkin and Huxley explained the mechanism of the nerve impulse: the membrane of a nerve cell becomes permeable to Na+ and K+ ions during the action potential. In the late 1960s, ion channels were introduced into artificial membranes to experiment with such effects.

Prof. Neher wanted to investigate whether this was also the case in natural cells, or whether other mechanisms were responsible for their dynamic electrical behavior. The main problem was that this requires measuring the electrical current through such a cell at a resolution of 1 picoampere, but the most exact measuring devices of the time were far too imprecise for that, as they exhibited noise of much higher intensity. One of the most important contributions of Prof. Neher and his team was the development of a more precise method to measure electrical current in cells, the patch clamp technique. Instead of poking a small electrode into a cell, in this method a part of the cell’s membrane is sucked into a glass pipette. The advantage is that there is little or no water between the measurement electrode and the cell, since such liquid is responsible for much of the noise in the measurement data. With this new, more precise measurement, it became possible to show that proteins in the cell membrane act as ion channels that can open and close.

This discovery led to several subsequent surprising findings. For instance, ion channel proteins fulfill important tasks in many different cell types in our body, ranging from nerve cells, heart cells, lymph and liver cells, to light receptors in the eyes and even cells in plants. It turned out that many medical substances actually dock at ion channels to achieve their respective effects, e.g., by blocking them (as is the case with local anesthetics). Knowing the importance of ion channels, the third surprise is maybe not so surprising: defects in ion channels can lead to many diseases. This is the case in cardiac dysrhythmia, epilepsy, type II diabetes, cystic fibrosis, and myasthenia gravis. Such defects can be caused by errors in the genetic material which, when expressed, produce incorrectly shaped or otherwise faulty channel proteins.

Finally, ion channels also seem to be responsible for our temperature sensation. There are channel proteins which become active in different temperature ranges: some become permeable at low temperatures, others permit ions to pass at high temperatures. These channels can also be activated by certain chemicals. Interestingly, the channels corresponding to low temperatures can be activated by menthol, found in mint, while those corresponding to high temperatures are triggered by capsaicin, a chemical common in many spicy foods. This offers a simple explanation for why spicy food feels hot and why menthol is perceived as cold.

After a short and interesting discussion following Prof. Neher’s talk, the meeting concluded (and I had to catch my train back to Hefei).

This meeting was quite nice. It was my first chance to see a Nobel laureate, and it was truly impressive. The ambience was great, too. The residence of the Consul General is a beautiful building near the city center, built in the 1940s and recently restored. I took a few snapshots before the meeting and attached them below.

Front Gate of the Residence of the German Consul General in Shanghai
The Front Gate of the Residence of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10.
Heritage architecture plaque at the Front Gate of the Residence of the German Consul General in Shanghai
Heritage architecture plaque at the Front Gate of the Residence of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10.
Entrance of the Residence Building of the German Consul General in Shanghai
The Entrance of the Residence Building of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10.
Residence Building of the German Consul General in Shanghai
Residence Building of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10. The building has recently been reconstructed, but still keeps its historical windows, design, and structure.
Residence Building of the German Consul General in Shanghai
Residence Building of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10. The building has recently been reconstructed, but still keeps its historical windows, design, and structure.
Residence Building of the German Consul General in Shanghai
Residence Building of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10. The building has recently been reconstructed, but still keeps its historical windows, design, and structure.
Residence Building of the German Consul General in Shanghai
Residence Building of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10. The building has recently been reconstructed, but still keeps its historical windows, design, and structure.
Small Park in the Residence of the German Consul General in Shanghai
A small park in the Residence of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10.
Small Park in the Residence of the German Consul General in Shanghai
A small park in the Residence of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10.
Small Park in the Residence of the German Consul General in Shanghai
A small park in the Residence of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10.
Nice Window of the Residence Building of the German Consul General in Shanghai
A nice window of the Residence Building of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10. The building has recently been reconstructed, but still keeps its historical windows, design, and structure.
Nice Window of the Residence Building of the German Consul General in Shanghai
A nice window of the Residence Building of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10. The building has recently been reconstructed, but still keeps its historical windows, design, and structure.
Residence Building of the German Consul General in Shanghai
Residence Building of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10. The building has recently been reconstructed, but still keeps its historical windows, design, and structure.
Nice Window of the Residence Building of the German Consul General in Shanghai
A nice window of the Residence Building of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], taken on 2014-09-10. The building has recently been reconstructed, but still keeps its historical windows, design, and structure.
Meeting Room in the Residence Building of the German Consul General in Shanghai
A meeting room in the Residence Building of the German Consul General in Shanghai at Yongfu Lu 151, Shanghai [永福路151号, 上海], where the meeting of the German Research Network on 2014-09-10 took place. The building has recently been reconstructed, but still keeps its historical windows, design, and structure.

New Article: An Open Source Framework for the Traveling Salesman Problem

I am a researcher in the field of (metaheuristic) optimization, which means that I am developing and analyzing optimization algorithms, methods to find good solutions for hard problems. A hard problem is one for which we cannot obtain the optimal (best possible) solution within feasible time. Instead, heuristic and metaheuristic algorithms are used to obtain acceptably good solutions within an acceptable runtime. The goal of research in optimization is to find the best such methods for given problems or problem classes. This, of course, entails that we somehow need to compare optimization algorithms, which turns out to actually be quite complicated. Below I list some of the issues:

1. For one, finding good optimization algorithms is actually a multi-objective problem, since we want algorithms that can find solutions that are as good as possible as fast as possible.
2. Second, many optimization algorithms are actually anytime algorithms: they have an “approximate” solution at any time during their execution, and as time progresses, the approximation quality should improve. An Evolutionary Algorithm (EA), for instance, starts with a set of randomly created (bad) solutions and refines them iteratively, (hopefully) constructing better and better solutions. The same is true for local search algorithms, and Branch and Bound algorithms are anytime algorithms too. By definition, an anytime algorithm does not have just a single, final result, but creates many solutions along the way and can be run either for a short or a very, very long time. If there are no “final results”, comparing two algorithms means we somehow need to compare their progress over runtime.
3. This leads to the third problem: How can we define runtime? Is it the number of created candidate solutions? Or the actual CPU time that has elapsed? Neither of these two methods is really satisfying, an issue I have already discussed in an older post.
4. Fourth, comparison must be done in a statistically sound way, which means that experiments need to be repeated, say, 30 times, robust statistics must be calculated, and several different benchmark problem instances must be used.
5. Fifth, several different statistics can be computed based on the gathered data. It is necessary to somehow aggregate them into meaningful conclusions.
6. All of that, i.e., running experiments several times on several benchmark instances, evaluating the large amount of generated data, drawing figures, etc., is quite a lot of (cumbersome) work. It needs tool support.

The current issue (August 2014) of the IEEE Computational Intelligence Magazine (CIM) contains a paper titled “Benchmarking Optimization Algorithms: An Open Source Framework for the Traveling Salesman Problem.” In this paper, my colleagues and I present some ideas to tackle the problems listed above. Our TSP Suite is an automated experimentation framework for the TSP, which aids researchers in implementing and unit testing their algorithms, running experiments in an automated fashion, in parallel or distributed on a cluster, and evaluating the generated log data, resulting in stand-alone reports resembling Bachelor’s theses. The work presented in the paper is still preliminary – there are lots of things that need to be improved (in particular on the statistics side) and there are many more issues to explore – but I think we are on the right track.

The abstract of our article is as follows:

The article proposes an experimentation procedure for evaluating and comparing optimization algorithms. It is argued that end-of-run results alone do not give sufficient information about an algorithm’s performance, so the proposed approach analyzes the algorithm’s progress over time. Comparisons of performance curves in diagrams can be formalized by comparing the areas under them. Algorithms can be ranked according to a performance metric. Rankings based on different metrics can then be aggregated into a global ranking, which provides a quick overview of the quality of algorithms in comparison. An open source software framework based on the Traveling Salesman Problem (TSP), called the TSP Suite, is presented as an application of this experimental procedure. The framework can support researchers in implementing and unit testing TSP solvers and running experiments in a parallel and distributed fashion. It also has an evaluator component, which implements the proposed evaluation process and produces detailed reports. Comprehensive experiments have been conducted using the TSP Suite to benchmark several local search and evolutionary computation methods. The results show that the tested pure global optimization algorithms are outperformed by local search. The best results, however, come from hybrid algorithms.

Thomas Weise, Raymond Chiong, Ke Tang, Jörg Lässig, Shigeyoshi Tsutsui, Wenxiang Chen, Zbigniew Michalewicz, and Xin Yao. Benchmarking Optimization Algorithms: An Open Source Framework for the Traveling Salesman Problem. IEEE Computational Intelligence Magazine (CIM), 9(3):40-52, August 2014. doi:10.1109/MCI.2014.2326101

Of course, our work is not the first one on this topic. Some pioneering and outstanding contributions, to name a few, are:

Using Java’s try-with-resources for Hierarchical and Parallel APIs

During my work on the TSP Suite (soon to appear in IEEE CIM), I came up with an – as I believe – nice approach to constructing hierarchical APIs by (mis-)using Java 7‘s try-with-resources statement and the related AutoCloseable interface. Since I have used this approach in several different places in my programs and found it quite useful, I want to share it here. Before I do that, please let me point out that I do not know whether this idea already exists somewhere or whether it actually may be an anti-pattern. The complete source code of the examples and example APIs used in this post is provided at its end.

AutoCloseable and try-with-resources

First of all, let me briefly outline what the AutoCloseable interface and the try-with-resources concept are. A program will usually open resources, such as sockets for exchanging information via the network or streams for reading and writing data. Such resources must eventually be closed once we are finished with them. For example, a FileOutputStream must be closed after we have finished writing our data to a file.

Resources, Exceptions, and try-finally

However, many resources may throw Exceptions: the aforementioned FileOutputStream, for example, may throw an IOException if we are trying to write more data to it than the free capacity of the underlying file system permits. Still, we need to make sure that resources are properly closed even in case of errors. This can be done with a try-finally-based approach, such as the one illustrated below:

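The original listing is not reproduced here; the following is a minimal sketch of the pattern described, assuming we write a string through an OutputStreamWriter on top of a FileOutputStream (the method name and file path are illustrative):

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public class TryFinallyExample {

  // writes a greeting to the given file, closing both resources even on error
  static void writeGreeting(final String path) throws IOException {
    final FileOutputStream fos = new FileOutputStream(path);
    try {
      final OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8");
      try {
        osw.write("Hello World!");
      } finally {
        osw.close(); // executed even if write(...) throws
      }
    } finally {
      fos.close(); // executed even if creating or closing the writer throws
    }
  }

  public static void main(final String[] args) throws IOException {
    writeGreeting(System.getProperty("java.io.tmpdir") + "/tryFinallyExample.txt");
  }
}
```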
As you can see, we put the calls to the close() methods into finally blocks, which are executed even if Exceptions are thrown. However, we need to repeat this try-finally pattern for every resource. There is the danger that we sometimes forget it. It also makes the code a bit longer.

Resources, Exceptions, and try-with-resources

Since Java 7, classes representing resources that need to be closed should implement the AutoCloseable interface. (Neatly, AutoCloseable has been introduced as a super-interface of Closeable, which is already implemented by many resource classes.) Any implementation of this interface can be embedded in a try-with-resources statement, which will invoke the close() method automatically. A Java compiler (such as Eclipse’s) can even be instructed to issue warnings or errors if a resource object is not properly embedded in such a statement. The Java 7 version of the example above now looks like this:

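Again a sketch, with the same assumed resources as before:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public class TryWithResourcesExample {

  // same behavior as the try-finally version: both resources are closed
  // automatically, in reverse order of creation, even if write(...) throws
  static void writeGreeting(final String path) throws IOException {
    try (final FileOutputStream fos = new FileOutputStream(path);
         final OutputStreamWriter osw = new OutputStreamWriter(fos, "UTF-8")) {
      osw.write("Hello World!");
    }
  }

  public static void main(final String[] args) throws IOException {
    writeGreeting(System.getProperty("java.io.tmpdir") + "/tryWithResourcesExample.txt");
  }
}
```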
This makes the code shorter and less error-prone.

How we can build simple APIs with that: Example

We can use this mechanism to build simple, hierarchical APIs. I will show this based on an example where we can relatively easily see why this could make sense: XMLStreamWriter.

XML Output with XMLStreamWriter

The XMLStreamWriter is a lightweight output API for XML documents. It is somewhat similar to a character stream Writer with additional methods to write XML elements and attributes (or an “inverse” version of SAX, for output, if you want). It is used like this:

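A small, self-contained example in this style (the document content, a library with one book, is invented for illustration):

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class StreamWriterExample {

  static String writeDocument() throws Exception {
    final StringWriter out = new StringWriter();
    final XMLStreamWriter writer =
        XMLOutputFactory.newFactory().createXMLStreamWriter(out);
    writer.writeStartElement("library");   // <library>
    writer.writeStartElement("book");      //   <book year="2014">
    writer.writeAttribute("year", "2014");
    writer.writeStartElement("title");     //     <title>
    writer.writeCharacters("The TSP Suite");
    writer.writeEndElement();              //     </title>
    writer.writeEndElement();              //   </book>
    writer.writeEndElement();              // </library>
    writer.close();
    return out.toString();
  }

  public static void main(final String[] args) throws Exception {
    System.out.println(writeDocument());
  }
}
```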
Executing the above program will lead to output looking like this (without line breaks and indentation):

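For a small illustrative document, say a `library` element containing a `book` with a `title`, the unformatted output would be a single line along these lines:

```xml
<library><book year="2014"><title>The TSP Suite</title></book></library>
```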
As you can see, each writeStartElement() is matched with a writeEndElement() (similar to how resource construction is matched with calls to close() in our previous examples). Elements can be arbitrarily nested, and it is up to the programmer to keep the opening and closing calls matched.

XML Output with a Hierarchical API using try-with-resources

How about this: let the method for opening an element return an instance of AutoCloseable, which can be handled with the try-with-resources approach. Its (implicitly invoked) close() method would replace the call to writeEndElement(). This would have a couple of advantages: first, it reduces code size, and second, it makes it impossible to mismatch element opening and closing calls, because forgetting an element-closing call or adding a superfluous one would result in a missing or excess curly brace (“}”), and the compiler does not appreciate that.

I did a quick and dirty implementation of this idea in the package org.optimizationBenchmarking.utils.io.xml of the attached sources for this example. The resulting code allows us to re-write the previous example as:

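The original listing is not shown here; the following self-contained sketch implements the same idea in miniature (the XmlElement class and its methods are my own illustration, not the actual API from the package above): opening an element returns an AutoCloseable whose close() emits the matching end tag.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// illustrative miniature: opening an element returns an AutoCloseable;
// its close() writes the matching end tag
class XmlElement implements AutoCloseable {
  private final XMLStreamWriter writer;

  XmlElement(final XMLStreamWriter writer, final String name) throws Exception {
    this.writer = writer;
    writer.writeStartElement(name);
  }

  XmlElement element(final String name) throws Exception {
    return new XmlElement(this.writer, name);
  }

  void attribute(final String name, final String value) throws Exception {
    this.writer.writeAttribute(name, value);
  }

  void text(final String content) throws Exception {
    this.writer.writeCharacters(content);
  }

  @Override
  public void close() throws Exception {
    this.writer.writeEndElement(); // the implicit writeEndElement()
  }
}

public class HierarchicalExample {

  static String writeDocument() throws Exception {
    final StringWriter out = new StringWriter();
    final XMLStreamWriter writer =
        XMLOutputFactory.newFactory().createXMLStreamWriter(out);
    try (XmlElement library = new XmlElement(writer, "library")) {
      try (XmlElement book = library.element("book")) {
        book.attribute("year", "2014");
        try (XmlElement title = book.element("title")) {
          title.text("The TSP Suite");
        } // </title> written here
      }   // </book> written here
    }     // </library> written here
    writer.close();
    return out.toString();
  }

  public static void main(final String[] args) throws Exception {
    System.out.println(writeDocument());
  }
}
```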
The code has the same behavior, i.e., it directly writes the XML data to the underlying stream and does not construct any fancy data structure in memory. Compared to the XMLStreamWriter, all the explicit calls to element-closing methods have disappeared (they are implicitly invoked at the corresponding closing curly braces). There is one difference from the previous example: each opened element will be closed, in exactly the order in which the elements were opened, even in case of Exceptions.

Under the hood, there now is an additional internal linked list of the AutoCloseable implementations, implicitly managed by the try-with-resources statements. However, since XMLStreamWriter probably also maintains some kind of element stack internally, this probably does not even cause additional memory consumption.

This kind of API has a set of advantages:

1. It reduces code complexity.
2. It reduces the probability of coding errors.
3. It can better represent the structure of method calls, since automated code formatting can add indentation representing the nesting level of API calls. This is not (or at least hardly) possible with separate open/close method calls, as in case of XMLStreamWriter.
4. It does not necessarily need more memory.
5. It automatically provides the robustness of try-finally, without the need to add any additional code.
6. It looks as if you are working on complex objects, but the API may actually be maintained by very lightweight data structures. My XML example API somehow has a DOM-like look and feel, but it actually works like an XMLStreamWriter.
7. It may be implemented to support parallel processing without much overhead, as I will discuss later in this post.

Fields of Application

I imagine two potential fields of application of this idea: First, for producing clear and simple structured output APIs, like the XML example above.

Hierarchical, Abstract Document Output APIs

In the TSP Suite, for instance, I have a more elaborate abstract API to produce documents, which is implemented for XHTML and LaTeX. If we can do this with XML, then we can also make an API for general structured text documents, where different opened elements can have different children. A section may have a section body, a section body may contain figures, equations, or text, a text may contain emphasized sub-text, etc. All of this can easily be facilitated by hierarchically nesting different implementations of AutoCloseable, while it is virtually impossible to do this properly and in an easy-to-use style with a single output object (in the style of XMLStreamWriter). I will make a new release of this API with the next version of the TSP Suite, but the currently available source code of that project already has a working version.

Hierarchical Builder Pattern

The second field of application is to create more complex builder patterns. The builder pattern allows you to avoid having very complex constructors. Say you have an immutable object with 20 fields. Now you could make a 20-parameter constructor for that object, or a builder class where you can set the fields with separate method calls and, once you are finished setting the fields, obtain the constructed object.

Sometimes, your goal is not just to build a flat data structure with many parameters, but maybe a hierarchical one. As a simple example, take bibliography records as used in BibTeX. One such record, say, for a conference paper, may contain a list of authors (each having a personal and a family name), a title, and features of the proceedings containing the paper, such as editors, a book title, a location, and a publisher. Creating an object with all of that information may be complex. By using the try-with-resources idea, we can construct hierarchical builders and get code such as:

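The actual builder classes are not shown here; this compact, hypothetical sketch (the Person, Paper, and builder classes are invented for illustration) shows the shape of the resulting code: closing a nested builder commits its immutable object into the parent builder.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class Person {
  final String personal, family;
  Person(final String personal, final String family) {
    this.personal = personal;
    this.family = family;
  }
}

final class Paper {
  final List<Person> authors;
  final String title;
  Paper(final List<Person> authors, final String title) {
    this.authors = Collections.unmodifiableList(authors);
    this.title = title;
  }
}

class PaperBuilder implements AutoCloseable {
  private final List<Person> authors = new ArrayList<>();
  private String title;
  private Paper result;

  // nested builder: its close() commits the Person into the parent builder
  class AuthorBuilder implements AutoCloseable {
    private String personal, family;
    AuthorBuilder personal(final String s) { this.personal = s; return this; }
    AuthorBuilder family(final String s) { this.family = s; return this; }
    @Override public void close() { authors.add(new Person(personal, family)); }
  }

  AuthorBuilder author() { return new AuthorBuilder(); }
  void title(final String t) { this.title = t; }
  @Override public void close() { result = new Paper(new ArrayList<>(authors), title); }
  Paper result() { return result; }
}

public class BuilderExample {
  public static void main(final String[] args) throws Exception {
    final PaperBuilder pb = new PaperBuilder();
    try (AutoCloseable c = pb) {
      try (PaperBuilder.AuthorBuilder a = pb.author()) {
        a.personal("Thomas").family("Weise");
      }
      pb.title("Benchmarking Optimization Algorithms");
    }
    final Paper paper = pb.result(); // immutable, fully constructed
    System.out.println(paper.title + " by " + paper.authors.get(0).family);
  }
}
```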
OK, without context, it may be hard to understand or appreciate this example, but the key point is: I want to construct a complex (immutable) data structure with nested immutable objects, and I can use try-with-resources and AutoCloseable to implement a hierarchically nested builder architecture for doing that.

Difference to Closures and Lambda Expressions

There is a certain similarity between this idea and closures. Java has always supported some kind of “explicit” closure: you can define anonymous local classes inside a class or method, which may access fields or variables of the enclosing entity. This concept is a very popular way (actually, probably the default method) to implement event listeners for GUI components. A closure is a function which can access data of the enclosing entity, exactly like such a local class. Java 8 introduces lambda expressions, a (I think) simplified syntax for a special case of local anonymous classes.

Anonymous local classes and lambda expressions may look a bit similar to what I suggest here, but they are a different thing: as the name implies, their code pieces go into classes, whereas in my case, the code is just normal code. Also, the code of local classes or lambda expressions is usually passive: it is invoked from an outside activity, say an enclosing method providing it with input, or an event source. This is different from my situation, where normal code actively uses an API.

Still, the same effects that I list here could probably be realized with closures or lambda expressions. Using these devices may even be more suitable or intuitive for achieving the parallel output that I will discuss in the next section. Yet, it would come at the cost of more (probably many more) anonymous classes being created, whereas my method probably only introduces moderate overhead.

Parallel Single-Document Output

One final and, in my opinion, very interesting argument for using the nested AutoCloseable approach to construct structured APIs is that it inherently supports parallelism. In the case of the bibliography/builder example above, this seems quite natural: we can easily imagine that the “page-setting” code could run in a different thread. We can build a complex object in parallel.

But we can even create parallel output to a single document, say an XML document. With that I mean: you can have ten threads which all write to the same, single XML stream in parallel and the final result will still be valid XML, with all output properly ordered. This is something the XMLStreamWriter API cannot provide. Invoking writeStartElement() methods in parallel in different threads is a recipe for trouble. Not so in the case of an API managed with AutoCloseables.

Basic Idea

The idea is as follows:

1. Each element of the API (say the XMLElement objects in my example) maintains an internal reference to its output destination, let’s say of the type Writer, to which it writes its output.
2. Each element furthermore manages a queue of open child elements.
3. It is allowed to have multiple open child elements. In case of the XML example, you could write to multiple elements in parallel. Every time a new element is opened, it will be added to the end of its parent’s child queue.
4. When a new child element is opened, two cases are distinguished:
1. The parent currently has no other open child element. In this case, the internal output destination reference (say, the Writer) of the child element will be set to reference the same Writer as the parent element.

This is the default situation in the non-parallel scenario (and the previous examples), where each element has at most one open child element. After a child element is closed, the next one can be opened and written to, exactly as is the case with XMLStreamWriter. All output goes (sequentially) to the same Writer, and no additional overhead (except for the single additional field per element) occurs.

2. However, if another child element is already open, the internal output reference (Writer) of the new child element will be a new, independent buffer in memory, say a CharArrayWriter.

It is easy to see that all child elements can be written to at the same time and in parallel, since all the data goes to different locations. No synchronization is required for writing, only during the opening and closing phases.

5. Whenever a child element is closed, the parent’s queue is checked from the start.
1. If the head of the queue is a closed child element, there are two cases:
1. If the internal output reference of (the Writer used by) the closed child element at the head of the queue is the same as the one of the parent, then nothing needs to be done.
2. If the internal Writer of the closed child element at the head of the queue is a buffer, then the contents of this buffer are appended to the parent’s Writer.

The head of the queue is removed and we continue with the next child in the queue. This sequential processing ensures that the outputs are arranged in the same order in which the elements were opened.

2. If the head of the queue is an open child element, the processing of the queue is stopped here. (Once that element is closed, the queue will be processed again.)
6. Of course, you can only close an element after all of its child elements are closed, but this bit of synchronization is not too complex.
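The scheme above can be sketched as follows. This is a deliberately simplified illustration with only coarse synchronization (the Element class is invented here), not the actual HierarchicalFSM/HierarchicalText implementation:

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayDeque;

// simplified sketch of the queueing scheme: a child writes directly to its
// parent's Writer if it is the only open child; otherwise it writes into a
// private in-memory buffer that is appended to the parent, in opening order,
// as the closed children are popped from the head of the queue
class Element implements AutoCloseable {
  private final Element parent;
  private final Writer out; // this element's output destination
  private final ArrayDeque<Element> children = new ArrayDeque<>();
  private volatile boolean closed;

  Element(final Writer out) { // root element
    this.parent = null;
    this.out = out;
  }

  private Element(final Element parent) {
    this.parent = parent;
    synchronized (parent) {
      // case 1: no open sibling -> share the parent's Writer
      // case 2: an open sibling exists -> write into a private buffer
      this.out = parent.children.isEmpty() ? parent.out : new CharArrayWriter();
      parent.children.addLast(this);
    }
  }

  Element child() {
    return new Element(this);
  }

  void write(final String text) throws IOException {
    this.out.write(text);
  }

  @Override
  public void close() throws IOException {
    this.closed = true;
    if (this.parent != null) {
      this.parent.flushClosedChildren();
    }
  }

  private synchronized void flushClosedChildren() throws IOException {
    // pop closed children from the head of the queue; stop at the first open one
    while (!this.children.isEmpty() && this.children.peekFirst().closed) {
      final Element head = this.children.removeFirst();
      if (head.out instanceof CharArrayWriter) {
        ((CharArrayWriter) head.out).writeTo(this.out); // append buffered text
      } // else: head shared our Writer, nothing needs to be done
    }
  }
}
```

Even if a later-opened sibling is written to first, its output is buffered and only appended once all earlier siblings have closed, so the document order follows the opening order.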

I implemented a basic form of the hierarchical API using queueing of child elements in a class called org.optimizationBenchmarking.utils.hierarchy.HierarchicalFSM in the example sources. The capability of parallel text output using the internal references to output destinations or memory buffers is implemented exemplarily in the class org.optimizationBenchmarking.utils.hierarchy.HierarchicalText. The classes used in my XML example are based on HierarchicalText. The Bibliography example is a hierarchical builder pattern and does not write output to a stream, so it is based on HierarchicalFSM directly.

Use Case: ForkJoinPool

Following this concept, we can create APIs for writing any kind of content to a single stream, in parallel. These APIs should have almost no additional cost if you just want to use them sequentially. Moreover, this concept fits quite well with another novelty in Java 7: ForkJoinPool. A ForkJoinPool, as discussed in this tutorial, allows you to submit jobs which, in turn, can submit sub-jobs, and so on.

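As a generic illustration of such nested job submission (the recursive sum computed here is invented for illustration and unrelated to document output):

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// a task that splits its range and submits two sub-tasks until the range is
// small, just as a document section could spawn tasks for its subsections
class SumTask extends RecursiveTask<Long> {
  private final long from, to; // sum over the half-open range [from, to)

  SumTask(final long from, final long to) {
    this.from = from;
    this.to = to;
  }

  @Override
  protected Long compute() {
    if ((this.to - this.from) <= 1000L) { // small enough: compute directly
      long sum = 0L;
      for (long i = this.from; i < this.to; i++) {
        sum += i;
      }
      return sum;
    }
    final long mid = (this.from + this.to) >>> 1;
    final SumTask left = new SumTask(this.from, mid);
    final SumTask right = new SumTask(mid, this.to);
    left.fork(); // the sub-job runs asynchronously in the pool
    return right.compute() + left.join();
  }
}

public class ForkJoinExample {
  public static void main(final String[] args) {
    final long sum = new ForkJoinPool().invoke(new SumTask(0L, 1000000L));
    System.out.println(sum);
  }
}
```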
Each such task could write its output to a different element of the hierarchical API and close that element once its own output is written and its sub-tasks are finished. Apart from this, the programmer herself does not need to deal with parallelism too much. The output to the document will automatically be parallelized, with the level of parallelization depending on the number of available processors. Due to the queue-based sequencing of the output, the resulting documents will look exactly the same as if they were written in a single thread.

This form of parallel output may be quite nice in cases where contents for different parts of the output document require you to do independent, complex calculations. This is the case in our TSP Suite: The output documents contain several statistics and figures which are computed independently from each other. If we can compute and output all of them in parallel, the throughput may increase significantly.

Experiment: Does parallel output provide speedup?

Of course, parallelization causes some overhead. Buffers need to be allocated and re-sized as contents are written to them in parallel. Then we have to pay for using ForkJoinPool and for implementing the output “generators” as instances of ForkJoinTask (such as RecursiveAction) instead of simple procedures. (This is similar to using closures directly, but closures would require us to put each nested output scope into a class, whereas here we can nest several try-with-resources statements into one task.) We also need synchronization whenever a new task is started or one terminates. Finally, the documentation of ForkJoinTask states that “Tasks should also not perform blocking IO, and should ideally access variables that are completely independent of those accessed by other running tasks” … but we are doing IO.

Thus, I conducted a small experiment to investigate whether the throughput of writing a sequential document can actually be increased by parallelizing the output with the API implemented exemplarily for XML based on my little HierarchicalText class.

Experimental Setup

For this purpose, I wrote some code creating random example document structures in memory with fixed sizes s∈{10,100,…,10000} and serializing them using either

1. Java’s XMLStreamWriter in the default, serial fashion (I will refer to this setup as X),
2. my hacked-together XML API in serial fashion (referred to as S)
3. my XML API in parallel, using ForkJoinPools with the single-parameter constructor for different degrees of parallelism, ranging from 1 to Runtime.getRuntime().availableProcessors() (in my case, 8). I will refer to these setups as P#, where # stands for the number of processors.
4. my XML API in a parallel fashion, using ForkJoinPools constructed with asyncMode=true. These setups are referred to as A1 to A8.

I measured the performance for two kinds of output destinations: A Writer simply discarding all of its output, i.e., not performing any blocking operations, and a FileWriter writing to a temporary file. Furthermore, I wanted to investigate how the computational requirements to create the output data influence the performance. For this purpose, I simulate a “computation” for any attribute or element text by performing loops of bogus mathematical operations. I did this for loops with d∈{0,10,100,…,10000} iterations.

I tried to follow the advice from my post on measuring runtimes by conducting 500 dry runs before the actual measurement to let the Java JIT compiler do its potential work, and by conducting several runs per configuration and several runs per document for each output destination. In particular, for each document, I performed 3 runs and took the shortest measured runtime. Each setting of s and d is repeated like this for 15 different (random) documents, and the median runtime is used.

The implementation of this performance test is given in package examples.org.optimizationBenchmarking.utils.io.xml.performance in the attached sources. I measured the results on my Ubuntu computer with the OpenJDK 64-Bit Server VM for Java 1.7. The results and figures are contained in the above package in the download, as file results.txt. Since the actual runtimes are not important, I normalized them in each of the figures. In particular, for each figure and each setting of the “computational effort to obtain output data” d, I picked the fastest setup and divided all measured runtimes of all setups by its runtime, for this particular value of d.
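The normalization step amounts to the following (a sketch, names are mine): the fastest setup for a given d ends up at 1.0 and all others become multiples of it.

```java
/** Normalize runtimes for one value of d: divide each setup's
    runtime by the runtime of the fastest setup, so that the best
    setup is mapped to 1.0. */
public class Normalize {

  static double[] relative(double[] runtimes) {
    double best = runtimes[0];
    for (double t : runtimes) {
      if (t < best) { best = t; }
    }
    double[] out = new double[runtimes.length];
    for (int i = 0; i < runtimes.length; i++) {
      out[i] = runtimes[i] / best;
    }
    return out;
  }
}
```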

Experimental Results

Figure 1 illustrates the relative performance of the different settings for the discarding Writer and documents with 10 elements, for different values of d. In the case where no computation or only a few calculations are necessary to obtain the data to be written, i.e., where the output is directly available, both XMLStreamWriter (X) and my API (S) used in a serial fashion are significantly faster than the parallel versions. However, if more computations become necessary to create the output (as would be the case in my TSP Suite), the parallel versions can save up to 25% compared to the serial output. Of course, the setups P1 and A1 cannot do that, as they pay the overhead for using ForkJoinPool without actually being parallel.

Figure 1: The output time requirements of XML APIs for 10-element XML documents to a Writer discarding its output, over different attribute- and element text access times.

Figure 2, illustrating the same output routine for 1000-element documents, confirms this finding, with two differences: First, for these larger documents, parallelism seems to already pay off for small computational requirements to “generate” output data and time savings of 50% seem to be possible. Second, it seems to stop paying off for large computational requirements and all setups except S have the same performance. I have no real idea why this would happen. It may be specific to my example implementation, be caused by some JIT behavior, or by some under-the-hood behavior of ForkJoinPool, maybe related to thread allocation and disposal, or other magic.

Figure 2: The output time requirements of XML APIs for 1000-element XML documents to a Writer discarding its output, over different attribute- and element text access times.

When writing 10-element documents to a temporary file (illustrated in Figure 3), we can observe a behavior similar to when writing to a discarding Writer. The speedups provided by the parallel version seem to be more consistent, which probably results from the more extensive use of buffers in memory.

Figure 3: The output time requirements of XML APIs for 10-element XML documents to a FileWriter, over different attribute- and element text access times.

For 1000-element documents (see Figure 4), we can obtain nice speedups and may sometimes be able to write a document in half of the time. (Again, with the same odd behavior of performance collapse for large computations.)

Figure 4: The output time requirements of XML APIs for 1000-element XML documents to FileWriter, over different attribute- and element text access times.

All in all, we find that parallel output to the same underlying stream may, at least in some cases, provide a significant speedup. Of course, we have to pay for synchronization and other overhead when using ForkJoinPool, plus we violate its contract by actually doing blocking I/O. As a result, the speedup does not equal the number of processors used.

Then again, my API was hacked together and the way I create random documents may be biased in some way. It should furthermore be noted that I did not do enough repetitions for a statistically sound evaluation. On a different machine, you may get entirely different results. Also, my API does not support anything fancy, which is likely the reason why it is a bit faster than XMLStreamWriter when used in a serial way. So these results are just something in between a sanity test and a rough proof-of-concept.

Conclusions

In this post, I have outlined a concept for hierarchical APIs based on Java 7’s try-with-resources and AutoCloseable interface. I have argued that, at least to me, it seems to be robust, both in terms of reducing potential sources of coding errors as well as in the presence of exceptions being thrown. Furthermore, without much effort, it can be used to parallelize output to sequential streams, which seems to provide a performance improvement – at least in some preliminary experiments. If implemented properly, I could even imagine that this could speed up the way a web- or application server creates single-stream contents.

Of course, I am not a software engineer. I know nothing about frameworks out there, I know nothing about fancy concepts and patterns. So this idea may actually be old hat. Or discouraged by experts for some reason. Anyway, I think it is cute.

Example Source Code

The current version of the source code belonging to this article is contained in this ZIP archive. I put it under the LGPL version 3 license. I do not assume any responsibility for whether it works or not. This holds, for instance, for my XML output API – it is just an example and not intended for any productive application at all.

The source archive contains two root packages:

1. org.optimizationBenchmarking.utils is the utility package for the next version of TSP Suite. It contains examples for potential base classes to build a hierarchical API using AutoCloseable and try-with-resources in sub-package hierarchy, my example XML API in sub-package io.xml, and the example bibliography API in sub-package bibliography.
2. Package examples holds examples for everything we have discussed above. If you are looking for examples of a class you found in package org.optimizationBenchmarking.utils.x.y, then you will find it in package examples.org.optimizationBenchmarking.utils.x.y, if there are any.

The source archive also contains a heap of other utility classes, which, in one way or another, are used by my code. You may ignore them.

Deadline Extension for SSCI’14 (and our special session)

Good news for everyone who was a bit late to submit their paper to SSCI 2014: The deadline has been extended to July 15, 2014. The same, of course, automatically holds for our Special Session on Benchmarking and Testing for Production and Logistic Optimization discussed on this blog. Its CfP has already been updated. We are looking forward to receiving your submissions ^_^

Towards a figure-like LaTeX construct that can break over multiple pages and works in double-column documents

I am working on a tool (TSPSuite) for analyzing optimization algorithms on the Traveling Salesman Problem. TSPSuite automatically generates reports including the comparison of the performance of different algorithms. These reports can be output in the LaTeX language. In the reports, there are automatically-generated figures which contain a lot of automatically-generated sub-figures. And here the problem begins: TSPSuite may generate more rows of sub-figures for a single figure than fit on a page. Floating objects (such as figures) in LaTeX cannot include page breaks. Thus, the hosting figure needs to be split into several figures, i.e., a “figure series”. But where do I split it? LaTeX has no automatic approach for that, so I need to code it inside my TSPSuite tool. However, since sub-figures and figures have captions, it becomes extremely complicated to calculate sizes outside of LaTeX. Thus, I sometimes end up with figures which are not nicely divided. Another problem is that LaTeX has a limit for the number of pending, not-yet-laid-out floating objects, and this limit is relatively low (18, I believe). Thus, if my figure series gets too long, it may cause errors during compilation of the report with LaTeX. In other words, my TSPSuite needs to sporadically insert \clearpages or \FloatBarriers, which mess up the layout even further. All in all, what I need is something that:

1. has the look-and-feel of a figure environment, but
2. can break over several pages, and
3. ideally float, and
4. works in both single- and double-column documents (since quite a few computer science journals and conferences use double-column classes).

These requirements are, to the best of my knowledge, hard and borderline impossible to fulfill, as they have arisen a few times in internet communities without a satisfying answer: 1, 2, and 3. Especially the last one, the requirement to work in double-column setups, turned out to be tricky.

How to make this (almost) work

Nevertheless, I think I made a small step towards solving these issues, although it does not yet work well. I want to document it here for future use and have thus put together a LaTeX package. Also, by publishing the current state of my idea here, maybe I can get some feedback on how to improve it.

My idea is as follows:

Something like a figure

LaTeX’s figures and floats cannot break. So let us first find something like a figure that can break, and later make it float. A figure is basically an environment that has a figure caption and a body. With the \captionof command from package caption, we can create a figure caption wherever it pleases us. So we just need the figure body, and for this a simple \begin{center}...\end{center} would do. We could put a lot of stuff into that, and our new “figure series” environment would nicely break over several pages (since it is not a floating object). Let’s say we have created an environment like this. We cannot really make it float, but we can “emulate” floating behavior by putting it into an \afterpage{...} call (from package afterpage). \afterpage will take its contents and render it at the top of the next page. Until this happens, the contents coming after the \afterpage command (such as text following our figure series) will be laid out. Of course, the contents of our environment, after starting on the next page, could then span over several following pages.
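A minimal sketch of this first step (the environment name is mine, not the actual package code):

```latex
\documentclass{article}
\usepackage{caption}   % provides \captionof
\usepackage{afterpage} % provides \afterpage

\newenvironment{figureseries}[1]{% #1 = the caption text
  \begin{center}%
    \captionof{figure}{#1}% uses the figure counter, but no float
}{%
  \end{center}%
}

\begin{document}
Text before the series.
\afterpage{% emulate floating: start the series on the next page
  \begin{figureseries}{A figure-like series that can break.}
    First row of contents\par
    Second row of contents\par
  \end{figureseries}%
}
Text that is laid out while the series waits for the page break.
\end{document}
```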

Sub-figures inside

Now we want to put sub-figures into our figure series body. There are several packages that can do this, but the subcaption package seems to be the state-of-the-art. It provides the command \subcaptionbox that takes a caption and a (figure) content as parameter. This is exactly what we want. We now can nicely put some \subcaptionboxs in a row and terminate this row with \par. In order to get an automatic and nice spacing between the sub-figures of a row, we can insert \strut\hfill\struts between them. Now we have an almost-floating environment into which we can insert arbitrarily many rows of sub-figures and which nicely breaks over several pages.
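A row of two sub-figures in the series body could then look like this fragment (a sketch with \rule boxes as placeholder contents; outside a real float, subcaption needs the float type set explicitly via \captionsetup):

```latex
% inside the figure series body:
\captionsetup{type=figure}% tell caption/subcaption we act as a "figure"
\subcaptionbox{First sub-figure}{\rule{3cm}{2cm}}%
\strut\hfill\strut
\subcaptionbox{Second sub-figure}{\rule{3cm}{2cm}}\par
\subcaptionbox{Third sub-figure}{\rule{3cm}{2cm}}%
\strut\hfill\strut
\subcaptionbox{Fourth sub-figure}{\rule{3cm}{2cm}}\par
```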

The Double-Column Problem

This would all be nice if there were not document classes which have two columns, such as IEEE’s IEEEtran and ACM’s sig-alternate. Putting our environment, as is, into a double-column document could mean that the figure series begins in the middle of the left column, fills this column with sub-figures, breaks over to the right column, and continues filling this column with sub-figures until its middle. That would look extremely odd. A figure series will usually contain many sub-figures and should thus span the whole page width. We can achieve this by temporarily switching double-column documents to single-column mode using the command \onecolumn and later, after our series, switching back to double-column mode using the command \twocolumn.

Page Breaks induced by \onecolumn and \twocolumn

The problem is that these two commands will always create a page break before switching the column mode. For the \onecolumn command, this is not a big deal. We can always let our figure series float using \afterpage. It would begin on the next page, and since there necessarily is a page break between the current page and the next page, the page break created by \onecolumn does not change anything.

\afterpage in Double-Column Mode

The \afterpage command behaves slightly differently in double-column mode than in single-column mode. More precisely, in double-column mode, if called in the first column, it prints its contents at the start of the next (second, right) column instead of on the next page. If called in the right column, it prints its contents at the start of the left column on the next page. We can solve this minor issue by simply wrapping our figure series into two \afterpages if we are in the left column (which we can detect via \if@firstcolumn).
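This column-aware wrapping could be sketched as follows (the command name \afterCurrentPage is mine; \if@twocolumn and \if@firstcolumn are kernel switches and need \makeatletter):

```latex
\makeatletter
\newcommand{\afterCurrentPage}[1]{%
  \if@twocolumn
    \if@firstcolumn
      \afterpage{\afterpage{#1}}% left column: skip the column break
    \else
      \afterpage{#1}% right column: the next break is a page break
    \fi
  \else
    \afterpage{#1}% single-column mode: nothing special needed
  \fi}
\makeatother
```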

The Page Break after \twocolumn

While we could defuse the page break after the \onecolumn command, the one after \twocolumn poses a much more severe problem. This page break will potentially leave a larger fraction of the page blank, which does not look nice. We cannot \afterpage it away, since then the document’s main text would be set in single-column mode until the next page where it would be typeset in double-column again. One simple trick is to \let\clearpage\relax, i.e., to set the \clearpage command (used inside \twocolumn) to do nothing before \twocolumn and to restore it afterwards. However, this will break the layout completely: The text in the left column after our figure series will start nicely below the figure series. The text in the right column, however, will start at the top of the page — right inside the space occupied by the sub-figures!
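The “defused \clearpage” trick from above can be sketched like this (the command name is mine; a group restores \clearpage afterwards, while \twocolumn itself changes the layout globally). As described in the text, this alone breaks the right column’s layout:

```latex
% switch back to two columns without the page break that
% \twocolumn would normally insert via \clearpage
\newcommand{\twocolumnNoBreak}{%
  \begingroup
    \let\clearpage\relax % temporarily make \clearpage a no-op ...
    \twocolumn           % ... so \twocolumn does not break the page
  \endgroup}
```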

Restoring Occupied Space in the Right Column

So we can either have a (potentially) almost blank page or text written over our figures. One way out of this may be as follows: Assume that we can somehow find out where exactly (vertically speaking) our figure series ends. If we know this “height” occupied by our figure series, then we could create an object at the top of the right column occupying the same amount of space. This object could be invisible, say a \vspace. Then, the text in the right column would start at the same vertical offset as the text in the left column, i.e., right under our figure.

\pagetotal holds how much space the current contents of a page occupy, if I understand correctly. I store the value of \pagetotal in a variable right before doing the \twocolumn (with the defused \clearpage). After \twocolumn, I need to occupy that much space at the top of the right column. Since I will be in the left column after \twocolumn, I can simply put a corresponding \vspace into an \afterpage. This turns out to do the trick quite well, although not perfectly: the two columns now often start at the same vertical offset, but sometimes there are small visible deviations. I assume these come from some automatic space distribution done by LaTeX. Here I need more understanding to fix this.
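Putting the pieces together, the end of the figure series might be sketched as follows (macro names are mine; after \twocolumn we are in the left column, so a single \afterpage places the \vspace at the top of the right column):

```latex
\newdimen\seriesbottom
\newcommand{\endSeriesTwocolumn}{%
  \global\seriesbottom=\pagetotal % height used by the figure series
  \begingroup
    \let\clearpage\relax % defuse the page break inside \twocolumn
    \twocolumn
  \endgroup
  % occupy the same height at the top of the right column
  \afterpage{\vspace*{\seriesbottom}}%
}
```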

Merging Neighboring Figure Series

Another issue to consider is that two figure series may be adjacent to each other. In this case, odd things may or may not happen. I think it is best to combine such neighboring figure series, i.e., to put them into a single, common pair of \onecolumn\twocolumn. Of course they should retain their own captions and be separate objects, just grouped together. This can be achieved by storing the body of the figure series in a special macro which initially is empty. The macro is then rendered when the figure series is actually laid out, i.e., when the body between \onecolumn\twocolumn of the corresponding \afterpage is executed. If a figure series begins and detects that our special macro is not empty, it simply attaches the body of the new series to the existing macro, by using \g@addto@macro. Right when the body of a figure series is laid out, we set the macro back to empty. This way, all figure series bodies of figure series issued on the same page will automatically be merged and the available space is used efficiently.
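The merging logic can be sketched like this (all macro names are mine; \g@addto@macro and \@empty need \makeatletter):

```latex
\makeatletter
\gdef\seriesbody{}% pending, not-yet-rendered series bodies

\newcommand{\appendSeries}[1]{% called when a figure series begins
  \ifx\seriesbody\@empty
    \gdef\seriesbody{#1}% first series on this page: schedule it
    \afterpage{\renderSeries}%
  \else
    \g@addto@macro\seriesbody{#1}% otherwise merge into pending body
  \fi}

\newcommand{\renderSeries}{% executed when the body is laid out
  \onecolumn
  \seriesbody
  \global\let\seriesbody\@empty % reset for the next page
  \twocolumn}
\makeatother
```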

Lonely Captions

This can have one drawback, though. If we have a multi-page figure series, the captions should be at its top. Normally, figure captions are below the figures, but that would mean you may encounter a page of sub-figures without a main caption. So we put the caption at the top. If multiple figure series are merged, it may happen that the caption of the second or third series occurs right at the bottom of a page, followed by a normal page break, before its sub-figures begin. This can be solved by not rendering the caption directly, but rendering it inside a \parbox together with the first row of sub-figures. For this purpose, we can store it in a special macro.

The Package

I put all of the above, together with several examples and documentation, into a package which I called “figureSeries”. I want to use it for my own purposes, so for me and for now, it is more or less acceptable, although it is not perfect (as said, the spacing in two-column mode is still a tad off). I hope that I can resolve these issues in the future (and I welcome any suggestions). I also asked for feedback on the stackexchange forum. Once I am fully satisfied with the package, I will submit it to CTAN (and I will update this post as well).