Posts Tagged ‘statistics’

Java Performance through Rigorous Replay Compilation

Tuesday, August 19th, 2008

At this year’s OOPSLA I am going to present a paper that was accepted in the Research Papers track.

Java Performance Evaluation through Rigorous Replay Compilation, Andy Georges, Lieven Eeckhout, and Dries Buytaert

The abstract of this paper reads as follows.

A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, non-determinism due to timer-based sampling for JIT optimization, thread scheduling, and various system effects further complicates the Java performance benchmarking process.

Replay compilation is a recently introduced Java performance analysis methodology that aims at controlling non-determinism to improve experimental repeatability. The key idea of replay compilation is to control the compilation load during experimentation by inducing a pre-recorded compilation plan at replay time. Replay compilation also enables teasing apart performance effects of the application versus the virtual machine.

This paper argues that in contrast to current practice which uses a single compilation plan at replay time, multiple compilation plans add statistical rigor to the replay compilation methodology. By doing so, replay compilation better accounts for the variability observed in compilation load across compilation plans. In addition, we propose matched-pair comparison for statistical data analysis. Matched-pair comparison considers the performance measurements per compilation plan before and after an innovation of interest as a pair, which enables limiting the number of compilation plans needed for accurate performance analysis compared to statistical analysis assuming unpaired measurements.
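To make the matched-pair idea concrete, here is a minimal Python sketch. It is not taken from the paper: the function name, the small table of t critical values, and the timings are all hypothetical. It assumes one measurement per compilation plan before the innovation and one after, and builds a 95% confidence interval on the per-plan differences.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, indexed by degrees of freedom.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def matched_pair_interval(before, after):
    """Return (mean difference, low, high) for paired measurements.

    before[i] and after[i] must come from the same compilation plan i.
    """
    if len(before) != len(after):
        raise ValueError("each plan needs one measurement before and one after")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    if n - 1 not in T_95:
        raise ValueError("this sketch only tabulates 2 to 11 plans")
    mean = statistics.mean(diffs)
    half_width = T_95[n - 1] * statistics.stdev(diffs) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical execution times in seconds, one entry per compilation plan.
before = [10.2, 11.5, 9.8, 12.1, 10.9]
after = [9.9, 11.1, 9.6, 11.8, 10.5]

mean, low, high = matched_pair_interval(before, after)
significant = low > 0 or high < 0  # the interval excludes zero
print(f"mean difference {mean:.3f}s, 95% CI [{low:.3f}, {high:.3f}], "
      f"significant: {significant}")
```

In this made-up data the times vary by more than two seconds across plans, while the per-plan differences vary by only a few tenths of a second. Pairing removes the plan-to-plan variation from the comparison, which is why fewer plans can suffice.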

The bulk of this paper made up Chapter 5 of my PhD dissertation, which was published on April 30, 2008. Here and there, slight improvements were made before we submitted the final version. You can get a preprint version of the paper. The presentation I gave is available both as a PDF and as a zipped Keynote archive.

Statistically Rigorous Java Performance Evaluation: presentation

Tuesday, October 23rd, 2007

If you are interested in the presentation I gave at OOPSLA, you can get a Keynote-exported PDF.

Adding Rigorous Statistics to the Java Benchmarker's Toolbox

Tuesday, October 16th, 2007

You can find the pdf of the poster I will be presenting together with Dries at OOPSLA this year. If all went well, there should be a two-page poster abstract printed in the OOPSLA Companion (preprint). So feel free to drop by on Monday evening and have a chat.

The abstract of the poster abstract reads as follows.

Java performance is far from trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism due to Just-in-Time compilation/optimization, thread scheduling, etc., causes the execution time of a Java program to differ from run to run.

This poster advocates statistically rigorous data analysis when reporting Java performance. We advise modeling non-determinism by computing confidence intervals. In addition, we show that prevalent data analysis approaches may lead to misleading or even incorrect conclusions. Although we focus on Java performance, the techniques can be readily applied to any managed runtime system.
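As an illustration of what computing confidence intervals can look like in practice, here is a small Python sketch. It assumes at least 30 measurements per alternative so that a normal approximation is reasonable. The function names and the simulated timings are hypothetical stand-ins for real VM invocations, not part of the poster or of JavaStats.

```python
import math
import random
import statistics


def confidence_interval(samples, confidence=0.95):
    """Normal-approximation confidence interval for the mean of samples."""
    n = len(samples)
    if n < 30:
        raise ValueError("normal approximation assumes at least 30 measurements")
    # z such that the central `confidence` mass lies within +/- z
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    mean = statistics.mean(samples)
    half_width = z * statistics.stdev(samples) / math.sqrt(n)
    return mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True when the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Simulated execution times (seconds) for two hypothetical alternatives.
rng = random.Random(42)
alt_a = [rng.gauss(10.0, 0.5) for _ in range(30)]
alt_b = [rng.gauss(9.0, 0.5) for _ in range(30)]

ci_a = confidence_interval(alt_a)
ci_b = confidence_interval(alt_b)
print("A:", ci_a, "B:", ci_b, "overlap:", intervals_overlap(ci_a, ci_b))
```

When the two intervals overlap, the measurements do not support the claim that one alternative is faster than the other at the chosen confidence level. Reporting only a single number per alternative hides exactly this information.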

Statistically Rigorous Java Performance Evaluation

Monday, June 18th, 2007

The following paper has been accepted for OOPSLA 2007.

Statistically Rigorous Java Performance Evaluation, Andy Georges, Dries Buytaert, and Lieven Eeckhout.

The abstract reads as follows.

Java performance is far from being trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism at run-time causes the execution time of a Java program to differ from run to run. There are a number of sources of non-determinism such as Just-In-Time (JIT) compilation and optimization in the virtual machine (VM) driven by timer-based method sampling, thread scheduling, garbage collection, and various system effects.

There exist a wide variety of Java performance evaluation methodologies used by researchers and benchmarkers. These methodologies differ from each other in a number of ways. Some report average performance over a number of runs of the same experiment; others report the best or second best performance observed; yet others report the worst. Some iterate the benchmark multiple times within a single VM invocation; others consider multiple VM invocations and iterate a single benchmark execution; yet others consider multiple VM invocations and iterate the benchmark multiple times.

This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
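The abstract does not spell out how startup and steady-state performance are told apart, so the following Python sketch rests on assumptions of my own choosing. Startup is taken to be the first iteration within a VM invocation. Steady state is taken to be reached once the coefficient of variation (standard deviation divided by mean) over a sliding window of iterations drops below a threshold. The window size, the threshold, the function name, and the timings are all hypothetical.

```python
import statistics


def steady_state_mean(iteration_times, window=4, cov_threshold=0.02):
    """Mean of the first window of consecutive iterations that looks stable.

    A window counts as stable when its coefficient of variation is below
    cov_threshold. Returns None when no window qualifies.
    """
    for end in range(window, len(iteration_times) + 1):
        recent = iteration_times[end - window:end]
        mean = statistics.mean(recent)
        cov = statistics.stdev(recent) / mean
        if cov < cov_threshold:
            return mean
    return None


# Hypothetical per-iteration times (seconds) within one VM invocation.
times = [5.0, 3.2, 2.4, 2.05, 2.0, 2.02, 1.98, 2.01]

startup = times[0]                 # first iteration, dominated by JIT warm-up
steady = steady_state_mean(times)  # mean over the first stable window
print(f"startup: {startup}s, steady state: {steady}s")
```

Repeating this over several VM invocations yields one startup number and one steady-state number per invocation, and those per-invocation numbers are what a confidence interval would then be computed over.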

This paper took quite some work, especially experimentation-wise. While the initial reviews were very positive, they required us to perform several extra experiments. But in the end, it was worth the effort. You can get a preprint version.

So, 2 out of X at OOPSLA for us! Yay!