Archive for the ‘paper’ Category

Automated Just-in-time Compiler Tuning

Monday, January 25th, 2010

I’m pretty excited about our paper to get accepted at CGO 2010, which takes place in Toronto, Canada.

Automated Just-in-time Compiler tuning, Kenneth Hoste, Andy Georges, and Lieven Eeckhout.

The abstract of the paper reads as follows.

Managed runtime systems, such as a Java virtual machine (JVM), are complex pieces of software with many interacting components. The Just-In-Time (JIT) compiler is at the core of the virtual machine, however, tuning the compiler for optimum performance is a challenging task. There are (i) many compiler optimizations and options, (ii) there may be multiple optimization levels (e.g., -O0, -O1, -O2), each with a specific optimization plan consisting of a collection of optimizations, (iii) the Adaptive Optimization System (AOS) that decides which method to optimize to which optimization level requires fine-tuning, and (iv) the effectiveness of the optimizations depends on the application as well as on the hardware platform. Current practice is to manually tune the JIT compiler which is both tedious and very time-consuming, and in addition may lead to suboptimal performance.

This paper proposes automated tuning of the JIT compiler through multi-objective evolutionary search. The proposed framework (i) identifies optimization plans that are Pareto-optimal in terms of compilation time and code quality, (ii) assigns these plans to optimization levels, and (iii) fine-tunes the AOS accordingly. The key benefit of our framework is that it automates the entire exploration process, which enables tuning the JIT compiler for a given hardware platform and/or application at very low cost.

By automatically tuning Jikes RVM using our framework for average performance across the DaCapo and SPECjvm98 benchmark suites, we achieve similar performance to the hand-tuned default Jikes RVM. When optimizing the JIT compiler for individual benchmarks, we achieve statistically significant speedups for most benchmarks, up to 40% for start-up and up to 19% for steady-state performance. We also show that tuning the JIT compiler for a new hardware platform can yield significantly better performance compared to using a JIT compiler that was tuned for another platform.

You can get a preprint of the paper. We also plan to make our tool available, so it can be used to automagically tune other VMs *cough* J9 *cough*.

Java Performance through Rigorous Replay Compilation

Tuesday, August 19th, 2008

At this year’s OOPSLA I am going to present a paper that was accepted in the Research Papers track.

Java Performance Evaluation through Rigorous Replay Compilation, Andy Georges, and Lieven Eeckhout, Dries Buytaert

The abstract of this paper reads as follows.

A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, non-determinism due to timer-based sampling for JIT optimization, thread scheduling, and various system effects further complicate the Java performance benchmarking process.

Replay compilation is a recently introduced Java performance analysis methodology that aims at controlling non-determinism to improve experimental repeatability. The key idea of replay compilation is to control the compilation load during experimentation by inducing a pre-recorded compilation plan at replay time. Replay compilation also enables teasing apart performance effects of the application versus the virtual machine.

This paper argues that in contrast to current practice which uses a single compilation plan at replay time, multiple compilation plans add statistical rigor to the replay compilation methodology. By doing so, replay compilation better accounts for the variability observed in compilation load across compilation plans. In addition, we propose matched-pair comparison for statistical data analysis. Matched-pair comparison considers the performance measurements per compilation plan before and after an innovation of interest as a pair, which enables limiting the number of compilation plans needed for accurate performance analysis compared to statistical analysis assuming unpaired measurements.

The bulk of this paper made up Chapter 5 in my PhD dissertation, which was published on April 30, 2008. Here and there slight improvements were made before we submitted the final version. You can get a preprint version of the paper. The presentation I gave is available in both pdf format or as a zipped Keynote archive.

Statistically Rigorous Java Performance Evaluation

Monday, June 18th, 2007

The following paper has been accepted for OOPSLA 2007.

Statistically Rigorous Java Performance Evaluation, Andy Georges, Dries Buytaert, and Lieven Eeckhout.

The abstract reads as follows.

Java performance is far from being trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism at run-time causes the execution time of a Java program to differ from run to run. There are a number of sources of non-determinism such as Just-In-Time (JIT) compilation and optimization in the virtual machine (VM) driven by timer-based method sampling, thread scheduling, garbage collection, and various system effects.

There exist a wide variety of Java performance evaluation methodologies used by researchers and benchmarkers. These methodologies differ from each other in a number of ways. Some report average performance over a number of runs of the same experiment; others report the best or second best performance observed; yet others report the worst. Some iterate the benchmark multiple times within a single VM invocation; others consider multiple VM invocations and iterate a single benchmark execution;
yet others consider multiple VM invocations and iterate the benchmark multiple times.

This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.

This paper took quite some work, especially in the experimentation-wise. While the initial reviews were very positive, they required us to perform several extra experiments. But in the end, it was worth the effort. You can get a preprint version.

So, 2 out of X at OOPSLA for us! Yay!

Using HPM-Sampling to Drive Dynamic Compilation

Saturday, May 12th, 2007

The following paper has been acepted for publication at OOPSLA 2007.

Using HPM-Sampling to Drive Dynamic Compilation, Dries Buytaert, Andy Georges, Michael Hind, Matthew Arnold, Lieven Eeckhout, and Koen De Bosschere.

The paper abstract reads as follows.

All high-performance production JVMs employ an adaptive strategy for program execution. Methods are first executed unoptimized and then an online profiling mechanism is used to find a subset of methods that should be optimized during the same execution. This paper empirically evaluates the design space of
several profilers for initiating dynamic compilation and shows that existing online profiling schemes suffer from several limitations. They provide an insufficient number of samples, are untimely, and have limited accuracy at determining the frequently executed methods. We describe and comprehensively evaluate HPM-sampling, a simple but effective profiling scheme for finding optimization candidates using hardware performance monitors (HPMs) that addresses the aforementioned limitations. We show that HPM-sampling is more accurate; has low overhead; and improves performance by 5.7\% on average and up to 18.3\% when compared to the default system in Jikes RVM, without changing the compiler.

MontrĂ©al, here we come. October 21st – October 25th it is!

This paper has quite a long history behind it. Dries and I conceived the idea while attending the ACACES summerschool in July 2006. After a long talk with Mike, we decided to launch some preliminary measurements with the system Dries had already built into Jikes RVM using the HPM interface I had adapted from Steve Blackburn’s perfctr patch for Jikes RVM. We intially targetted PLDI 2007, when some matters were brought to our attention, that questioned our original idea on the current state of the art. Submission was postponed, extra experiments were conducted and we targetted VEE instead, where our paper was rejected. Based on the reviews we received there, it seems like it was a border case, but a rejection nonetheless. So, we figured, why not submit to OOPSLA. Worst case scenario: we get additional reviews to improve our paper. I turns out that the Best Case Scenario was visited upon us instead. You can get a preprint version.

Method-Level Phase Behavior in Java Workloads.

Tuesday, July 13th, 2004

The following paper has been accepted for publication at OOPSLA 2004

Method-Level Phase Behavior in Java Workloads, Andy Georges, Dries Buytaert, Lieven Eeckhout, and Koen De Bosschere

The paper abstract reads as follows.

Java workloads are becoming more and more prominent on various computing devices. Understanding the behavior of a Java workload which includes the interaction between the application and the virtual machine (VM), is thus of primary importance during performance analysis and optimization. Moreover, as contemporary software projects are increasing in complexity, automatic performance analysis techniques are indispensable. This paper proposes an off-line method-level phase analysis approach for Java workloads that consists of three steps. In the first step, the execution time is computed for each method invocation. Using an off-line tool, we subsequently analyze the dynamic call graph (that is annotated with the method invocations` execution times) to identify method-level phases. Finally, we measure performance characteristics for each of the selected phases. This is done using hardware performance monitors. As such, our approach allows for linking microprocessor-level information at the individual methods in the Java application`s source code. This is extremely interesting information during performance analysis and optimization as programmers can use this information to optimize their code. We evaluate our approach in the Jikes RVM on an IA-32 platform using the SPECjvm98 and SPECjbb2000 benchmarks. This is done according to a number of important criteria: the overhead during profiling, the variability within and between the phases, its applicability in Java workload characterization (measuring performance characteristics of the various VM components) and application bottleneck identification.

How Java Programs Interact with Virtual Machines at the Microarchitectural Level

Saturday, July 12th, 2003

The following paper has been accepted for publication at OOPSLA 2003

How Java Programs Interact with Virtual Machines at the Microarchitectural Level, Lieven Eeckhout, Andy Georges, and Koen De Bosschere.

The paper abstract reads as follows.

Java workloads are becoming increasingly prominent on various platforms ranging from embedded systems, over general-purpose computers to high-end servers. Understanding the implications of all the aspects involved when running Java workloads, is thus extremely important during the design of a system that will run such workloads. In other words, understanding the interaction between the Java application, its input and the virtual machine it runs on, is key to a succesful design. The goal of this paper is to study this complex interaction at the microarchitectural level, e.g., by analyzing the branch behavior, the cache behavior, etc. This is done by measuring a large number of performance characteristics using performance counters on an AMD K7 Duron microprocessor. These performance characteristics are measured for seven virtual machine configurations, and a collection of Java benchmarks with corresponding inputs coming from the SPECjvm98 benchmark suite, the SPECjbb2000 benchmark suite, the Java Grande Forum benchmark suite and an open-source raytracer, called Raja with 19 scene descriptions. This large amount of data is further analyzed using statistical data analysis techniques, namely principal components analysis and cluster analysis. These techniques provide useful insights in an understandable way.From our experiments, we conclude that (i) the behavior observed at the microarchitectural level is primarily determined by the virtual machine for small input sets, e.g., the SPECjvm98 s1 input set; (ii) the behavior can be quite different for various input sets, e.g., short-running versus long-running benchmarks; (iii) for long-running benchmarks with few hot spots, the behavior can be primarily determined by the Java program and not the virtual machine, i.e., all the virtual machines optimize the hot spots to similarly behaving native code; (iv) in general, the behavior of a Java application running on one virtual machine can be significantly different from running on another virtual machine. These conclusions warn researchers working on Java workloads to be careful when using a limited number of Java benchmarks or virtual machines since this might lead to biased conclusions.