<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Geek in progress &#187; paper</title>
	<atom:link href="http://www.itkovian.net/base/category/research/paper-research/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.itkovian.net/base</link>
	<description>I am not yet done.</description>
	<lastBuildDate>Thu, 20 Oct 2011 20:56:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Position statement</title>
		<link>http://www.itkovian.net/base/position-statement/</link>
		<comments>http://www.itkovian.net/base/position-statement/#comments</comments>
		<pubDate>Wed, 20 Oct 2010 04:54:35 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[paper]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[benchmarking]]></category>
		<category><![CDATA[evaluation]]></category>
		<category><![CDATA[measurement]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/position-statement/</guid>
		<description><![CDATA[Just to open it up and release it into the wild, here’s the statement I wrote with Koen De Bosschere on several (lower level) aspects of evaluation in computer science, we would like to see addressed. Recent work has shown a continued identification of pitfalls in conducting experimental computer science. In this position statement, we [...]]]></description>
			<content:encoded><![CDATA[<p>Just to open it up and release it into the wild, here’s the statement I wrote with Koen De Bosschere on several (lower-level) aspects of evaluation in computer science that we would like to see addressed.</p>
<p>Recent work has shown a continued identification of pitfalls in conducting experimental computer science. In this position statement, we address three issues we consider important to enhance the state-of-the-art in this field: (i) the experimental design and setup, (ii) performance measurements and the subsequent analysis of results, and (iii) benchmark selection.</p>
<p>First, we consider the experimental setup. When designing an experiment, there are many dimensions (input set (size), heap size, garbage collector, compilers, optimisations, different VMs, etc.) to consider. In the not-so-distant past, researchers often used a single design point (e.g., a fixed heap size, one garbage collector, etc.). Recently, however, several researchers made a case in favour of considering multiple design points, letting, for example, the heap size vary (in fixed steps) from a benchmark-dependent minimum to a multiple thereof. Essentially, a good experimental design should acknowledge these dimensions and their importance and make correct decisions with respect to the design points chosen for any given experiment. Some dimensions are continuous, others are discrete.</p>
<p>Hence, evaluating all points is not merely infeasible; sometimes it is outright impossible.</p>
<p>Moreover, the output value (e.g., performance as quantified by execution time) is not necessarily a continuous function of continuous input (e.g., heap size). At other times, we are only concerned with a subspace, i.e., fewer dimensions. This brings us to our first concern. Some researchers with whom we have discussed these matters have stated that trying to cover the complete design space is pointless, and that they prefer to pick a single point for evaluating their new shiny idea. We beg to differ, but it is clear that the community needs to agree on which design points to use to obtain representative measurements.</p>
<p>Other scientific communities have long since solved the problem of experiment design. For example, in medical and social sciences, taking a representative (random) sample of the population follows a well-known and widely practiced methodology. We do not believe that simply adopting these techniques is the way to go, but we can definitely learn from their methodology. We need to carefully examine how the experiment’s design space is shaped and which statistical techniques have to be used for choosing the correct (in some sense) points in it. For this, techniques borrowed from the machine learning community might be employed, where researchers choose the data points that have the most impact on a model they wish to build. Clearly, we need to pick more points from regions in the design space where the output function (e.g., performance) changes rapidly or where it is discontinuous.</p>
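<p>As a rough illustration of that last point, here is a minimal Python sketch of one possible heuristic; it is not a technique taken from any of the work referred to above. It repeatedly bisects the interval of a single continuous dimension across which the measured output changes most. The function names, the step-shaped <tt>synthetic_time</tt> response and its 64 MB threshold are hypothetical stand-ins for a real measurement.</p>
<pre>
```python
def refine_design_points(measure, low, high, budget):
    """Pick `budget` points in [low, high], bisecting where the output changes most."""
    results = {low: measure(low), high: measure(high)}
    while budget > len(results):
        xs = sorted(results)

        def score(interval):
            left, right = interval
            width = right - left
            # Weight the output change by the interval width so that a single
            # jump cannot absorb the whole budget; ties fall back to the widest interval.
            return (abs(results[right] - results[left]) * width, width)

        left, right = max(zip(xs, xs[1:]), key=score)
        middle = (left + right) / 2
        results[middle] = measure(middle)
    return sorted(results)


def synthetic_time(heap_mb):
    """Hypothetical response: execution time drops once the heap reaches 64 MB."""
    return 5.0 if heap_mb >= 64 else 10.0


points = refine_design_points(synthetic_time, 32, 128, budget=8)
print(points)
```
</pre>
<p>With this made-up response, most of the budget ends up close to the discontinuity, while the flat regions keep only a few points. A real experiment would replace <tt>synthetic_time</tt> with (repeated) measurements and would have to deal with more than one dimension.</p>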
<p>The second issue involves measurement and analysis of data. Computer systems tend to exhibit mildly chaotic behaviour, e.g., programs may behave non-deterministically. We have argued in the past to add more statistical rigour to measurements and data analysis. It seems that the community is slowly embracing this idea, but there are a few obstacles holding researchers and practitioners back. In our opinion, there are at least two issues that must be addressed. The first and foremost problem is the lack of frameworks that automate the hard work &#8212; nobody likes boilerplate. Second, doing elaborate experiments takes time and resources. Computing resources are often scarce. Even when an experiment is embarrassingly parallel, researchers do not always have the resources required to exploit (most of) this parallelism. Automation is the key to solving these problems, yet any approach must be fully aware of the pitfalls we uncovered and deal with them adequately. A good example to follow is the recently developed Criterion framework, used by the Haskell community (<a href="http://www.serpentine.com/blog/2009/09/29/criterion-a-new-benchmarking-l">http://www.serpentine.com/blog/2009/09/29/criterion-a-new-benchmarking-l</a>&#8230;). It improves on our own work in several respects, e.g., identifying the impact of outliers on the variance, and determining the required number of iterations and the number of experiments to be conducted. It is usable for both micro- and macro-benchmarks. The community should adopt these practices, and take them to the next level: handling measurements that are not normally distributed, checking for autocorrelation, etc.</p>
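<p>To give an idea of the boilerplate involved, the following Python sketch computes a confidence interval for the mean and the lag-1 autocorrelation of a series of timings. It is neither what Criterion does nor our own tooling; the function names and the timings are made up, and the interval uses a normal approximation, which assumes a reasonably large number of measurements.</p>
<pre>
```python
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, confidence=0.95):
    """Confidence interval for the mean, using a large-sample normal approximation."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    centre = mean(samples)
    return centre - half_width, centre + half_width


def lag1_autocorrelation(samples):
    """Correlation between consecutive measurements; near 0 suggests independence."""
    centre = mean(samples)
    numerator = sum((a - centre) * (b - centre) for a, b in zip(samples, samples[1:]))
    denominator = sum((x - centre) ** 2 for x in samples)
    return numerator / denominator  # undefined when all samples are identical


# Hypothetical timings (in seconds) that alternate between two values.
times = [10.0, 12.0] * 20
low, high = confidence_interval(times)
rho = lag1_autocorrelation(times)
print(low, high, rho)
```
</pre>
<p>The strongly negative autocorrelation of these alternating timings is a warning sign: the measurements are not independent, so the interval should not be taken at face value.</p>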
<p>Third, concerning benchmark suites, there are two issues we will briefly discuss. Current benchmark suites are sets (in the mathematical sense) of programs and their inputs. We propose to add an (objective) ordering to these sets, for example by how much each benchmark adds to the coverage of the space of computer programs, spanned by some (again, objective) metrics. Ideally, these metrics should be machine and language independent, though this seems hard to achieve. Reducing the demands, we can start with a set of micro-architecture independent metrics, and see how well programs from different languages map into the space. When for some reason a subset is used, the experiment should always include the top-ranked benchmarks, without skipping any. In such a scenario, it seems necessary to define the minimum number of benchmarks that should be evaluated, e.g., k &gt;= 5. Also, to avoid focusing only on those benchmarks, we require the code and setup to be made public such that reviewers and potential users can evaluate further.</p>
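<p>One conceivable way to obtain such an ordering is sketched below in Python, purely as an assumption on our part: start from the benchmark closest to the centroid of the metric space and then greedily add the benchmark that lies farthest from everything ranked so far. The benchmark names and the two-dimensional metric vectors are hypothetical.</p>
<pre>
```python
from math import dist


def order_by_coverage(metrics):
    """Rank benchmarks so that each one adds as much spread as possible."""
    names = sorted(metrics)
    dims = len(metrics[names[0]])
    centroid = [sum(metrics[n][d] for n in names) / len(names) for d in range(dims)]
    # Assumed starting rule: the most "typical" benchmark comes first.
    first = min(names, key=lambda n: dist(metrics[n], centroid))
    ordered = [first]
    remaining = [n for n in names if n != first]
    while remaining:
        # Distance to the nearest already-ranked benchmark measures added coverage.
        best = max(remaining,
                   key=lambda n: min(dist(metrics[n], metrics[o]) for o in ordered))
        ordered.append(best)
        remaining.remove(best)
    return ordered


# Hypothetical benchmarks characterised by two made-up metrics.
suite = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (10.0, 0.0), "d": (0.5, 0.0)}
ranking = order_by_coverage(suite)
print(ranking)
```
</pre>
<p>A subset of size k is then simply the first k entries of the ranking, which guarantees that no highly ranked benchmark is skipped.</p>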
<p>The second important issue &#8212; closely linked to the benchmark ordering &#8212; involves deciding when an experiment shows positive evidence for the evaluated technique or innovation. Currently, reviewers check the mean (often the wrong one) and see if the approach works for (at least some of) the benchmarks. Now, if the benchmarks are both characterised and ordered, one can see for which regions the innovation works, thus showing that it is in fact useful for a (limited) class of programs &#8212; compare this to a drug that works for 70% of a population.</p>
<p>It is time to place experimental computer science on solid ground with respect to both the design of an experiment and its measurement and evaluation. This requires the following steps. First, gauge the width and depth of the problem by uncovering all pitfalls. Then, provide researchers with tools and sound methodologies to conduct experiments.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/position-statement/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Performance Metrics for Consolidated Servers</title>
		<link>http://www.itkovian.net/base/performance-metrics-for-consolidated-servers/</link>
		<comments>http://www.itkovian.net/base/performance-metrics-for-consolidated-servers/#comments</comments>
		<pubDate>Tue, 13 Apr 2010 12:54:34 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[paper]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[consolidation]]></category>
		<category><![CDATA[eurosys]]></category>
		<category><![CDATA[hpcvirt]]></category>
		<category><![CDATA[metrics]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[virtualisation]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/performance-metrics-for-consolidated-servers/</guid>
		<description><![CDATA[The following paper got accepted for the HPCVirt 2010 workshop, taking place in Paris, France (duh). Performance Metrics for Consolidated Servers, Andy Georges, and Lieven Eeckhout. The abstract of the paper reads as follows: In spite of the widespread adoption of virtualization and consolidation, there exists no consensus with respect to how to bench- [...]]]></description>
			<content:encoded><![CDATA[<p>The following paper got accepted for the <a href="http://www.csm.ornl.gov/srt/conferences/hpcvirt2010/">HPCVirt 2010 workshop</a>, taking place in Paris, France (duh). </p>
<p>Performance Metrics for Consolidated Servers, <a href="http://itkovian.net">Andy Georges</a>, and <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>.</p>
<p>The abstract of the paper reads as follows:</p>
<p>In spite of the widespread adoption of virtualization and consolidation, there exists no consensus with respect to how to benchmark consolidated servers that run multiple guest VMs on the same physical hardware. For example, VMware proposes VMmark which basically computes the geometric mean of normalized throughput values across the VMs; Intel uses vConsolidate which reports a weighted arithmetic average of normalized throughput values.</p>
<p>These benchmarking methodologies focus on total system throughput (i.e., across all VMs in the system), and do not take into account per-VM performance. We argue that a benchmarking methodology for consolidated servers should quantify both total system throughput and per-VM performance in order to provide a meaningful and precise performance characterization. We therefore present two performance metrics, Total Normalized Throughput (TNT) to characterize total system performance, and Average Normalized Reduced Throughput (ANRT) to characterize per-VM performance.</p>
<p>We compare TNT and ANRT against VMmark using published performance numbers, and report several cases for which the VMmark score is misleading. This is, VMmark says one platform yields better performance than another, however, TNT and ANRT show that both platforms represent different trade-offs in total system throughput versus per-VM performance. Or, even worse, in a couple cases we observe that VMmark yields opposite conclusions than TNT and ANRT, i.e., VMmark says one system performs better than another one which is contradicted by the TNT/ANRT performance characterization.</p>
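<p>To see why a single aggregate can hide per-VM behaviour, consider the following toy Python example. It only contrasts a geometric mean with a weighted arithmetic mean on made-up normalized throughput values; it is not the actual VMmark or vConsolidate scoring procedure, and it does not implement TNT or ANRT, whose definitions are in the paper.</p>
<pre>
```python
from math import prod


def geometric_mean(values):
    return prod(values) ** (1 / len(values))


def weighted_arithmetic_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical normalized throughput of four guest VMs on two platforms.
platform_a = [1.0, 1.0, 1.0, 1.0]
platform_b = [2.0, 0.5, 2.0, 0.5]
equal_weights = [1, 1, 1, 1]

for name, vms in (("A", platform_a), ("B", platform_b)):
    print(name,
          geometric_mean(vms),
          weighted_arithmetic_mean(vms, equal_weights),
          min(vms))  # the slowest VM shows what the aggregates hide
```
</pre>
<p>Both platforms get the same geometric mean, the arithmetic mean favours platform B, and yet two of the VMs on platform B run at half their reference throughput.</p>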
<p>You can find a <a href="http://itkovian.net/base/files/papers/hpcvirt-2010-ageorges-preprint.pdf">preprint</a> of the full paper. The presentation slides are up <a href="http://itkovian.net/base/files/papers/hpcvirt-2010-ageorges-presentation.pdf">here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/performance-metrics-for-consolidated-servers/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Automated Just-in-time Compiler Tuning</title>
		<link>http://www.itkovian.net/base/automated-just-in-time-compiler-tuning/</link>
		<comments>http://www.itkovian.net/base/automated-just-in-time-compiler-tuning/#comments</comments>
		<pubDate>Mon, 25 Jan 2010 14:49:29 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[paper]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[cgo]]></category>
		<category><![CDATA[genetic algorithm]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[jit compilation]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/?p=243</guid>
		<description><![CDATA[I'm pretty excited that our paper got accepted at CGO 2010, which takes place in Toronto, Canada.

Automated Just-in-time Compiler tuning, <a href="http://www.elis.ugent.be/~kehoste">Kenneth Hoste</a>, <a href="http://itkovian.net/base">Andy Georges</a>, and <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>.

The abstract of the paper reads as follows.

<p> Managed runtime systems, such as a Java virtual machine (JVM), are complex pieces of software with many interacting components.  The Just-In-Time (JIT) compiler is at the core of the virtual machine, however, tuning the compiler for optimum performance is a challenging task.  There are (i) many compiler optimizations and options, (ii) there may be multiple optimization levels (e.g., <tt>-O0</tt>, <tt>-O1</tt>, <tt>-O2</tt>), each with a specific optimization plan consisting of a collection of optimizations, (iii) the Adaptive Optimization System (AOS) that decides which method to optimize to which optimization level requires fine-tuning, and (iv) the effectiveness of the optimizations depends on the application as well as on the hardware platform.  Current practice is to manually tune the JIT compiler which is both tedious and very time-consuming, and in addition may lead to suboptimal performance.

This paper proposes automated tuning of the JIT compiler through multi-objective evolutionary search.  The proposed framework (i) identifies optimization plans that are Pareto-optimal in terms of compilation time and code quality, (ii) assigns these plans to optimization levels, and (iii) fine-tunes the AOS accordingly. The key benefit of our framework is that it automates the entire exploration process, which enables tuning the JIT compiler for a given hardware platform and/or application at very low cost.

By automatically tuning <a href="http://jikesrvm.org">Jikes RVM</a> using our framework for average performance across the <a href="http://dacapobench.org">DaCapo</a> and <a href="http://www.spec.org/jvm98">SPECjvm98</a> benchmark suites, we achieve similar performance to the hand-tuned default Jikes RVM. When optimizing the JIT compiler for individual benchmarks, we achieve statistically significant speedups for most benchmarks, up to 40% for start-up and up to 19% for steady-state performance.  We also show that tuning the JIT compiler for a new hardware platform can yield significantly better performance compared to using a JIT compiler that was tuned for another platform.

<p>You can get a <a href="http://itkovian.net/base/files/papers/cgo2010-hoste-preprint.pdf">preprint</a> of the paper. We also plan to make our tool available, so it can be used to automagically tune other VMs *cough* J9 *cough*.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m pretty excited that our paper got accepted at CGO 2010, which takes place in Toronto, Canada.</p>
<p>Automated Just-in-time Compiler tuning, <a href="http://www.elis.ugent.be/~kehoste">Kenneth Hoste</a>, <a href="http://itkovian.net/base">Andy Georges</a>, and <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>.</p>
<p>The abstract of the paper reads as follows.</p>
<p> Managed runtime systems, such as a Java virtual machine (JVM), are complex pieces of software with many interacting components.  The Just-In-Time (JIT) compiler is at the core of the virtual machine, however, tuning the compiler for optimum performance is a challenging task.  There are (i) many compiler optimizations and options, (ii) there may be multiple optimization levels (e.g., <tt>-O0</tt>, <tt>-O1</tt>, <tt>-O2</tt>), each with a specific optimization plan consisting of a collection of optimizations, (iii) the Adaptive Optimization System (AOS) that decides which method to optimize to which optimization level requires fine-tuning, and (iv) the effectiveness of the optimizations depends on the application as well as on the hardware platform.  Current practice is to manually tune the JIT compiler which is both tedious and very time-consuming, and in addition may lead to suboptimal performance.</p>
<p>This paper proposes automated tuning of the JIT compiler through multi-objective evolutionary search.  The proposed framework (i) identifies optimization plans that are Pareto-optimal in terms of compilation time and code quality, (ii) assigns these plans to optimization levels, and (iii) fine-tunes the AOS accordingly. The key benefit of our framework is that it automates the entire exploration process, which enables tuning the JIT compiler for a given hardware platform and/or application at very low cost.</p>
<p>By automatically tuning <a href="http://jikesrvm.org">Jikes RVM</a> using our framework for average performance across the <a href="http://dacapobench.org">DaCapo</a> and <a href="http://www.spec.org/jvm98">SPECjvm98</a> benchmark suites, we achieve similar performance to the hand-tuned default Jikes RVM. When optimizing the JIT compiler for individual benchmarks, we achieve statistically significant speedups for most benchmarks, up to 40% for start-up and up to 19% for steady-state performance.  We also show that tuning the JIT compiler for a new hardware platform can yield significantly better performance compared to using a JIT compiler that was tuned for another platform.</p>
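<p>For readers unfamiliar with the term, the Python sketch below shows what Pareto-optimal means for two objectives that should both be minimised. It is only an illustration of the concept, not the evolutionary search from the paper; the plan names and the (compilation time, execution time) pairs are made up.</p>
<pre>
```python
def pareto_front(plans):
    """Return the names of the plans that no other plan dominates."""

    def dominates(a, b):
        # a dominates b if it is at least as good on every objective and differs from b.
        return a != b and all(y >= x for x, y in zip(a, b))

    return sorted(name for name, cost in plans.items()
                  if not any(dominates(other, cost) for other in plans.values()))


# Hypothetical optimization plans: (compilation time, execution time), lower is better.
plans = {
    "plan_fast_compile": (1.0, 10.0),
    "plan_balanced": (3.0, 6.0),
    "plan_fast_code": (8.0, 4.0),
    "plan_wasteful": (9.0, 7.0),
}
front = pareto_front(plans)
print(front)
```
</pre>
<p>The wasteful plan drops out because the balanced plan both compiles faster and produces faster code; the remaining three represent different trade-offs and are natural candidates for different optimization levels.</p>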
<p>You can get a <a href="http://itkovian.net/base/files/papers/cgo2010-hoste-preprint.pdf">preprint</a> of the paper. We also plan to make our tool available, so it can be used to automagically tune other VMs *cough* J9 *cough*.<br />
<!--break--></p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/automated-just-in-time-compiler-tuning/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Java Performance through Rigorous Replay Compilation</title>
		<link>http://www.itkovian.net/base/java-performance-through-rigorous-replay-compilation/</link>
		<comments>http://www.itkovian.net/base/java-performance-through-rigorous-replay-compilation/#comments</comments>
		<pubDate>Tue, 19 Aug 2008 01:16:18 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[paper]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[non-determinism]]></category>
		<category><![CDATA[oopsla]]></category>
		<category><![CDATA[replay]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/?p=221</guid>
		<description><![CDATA[At this year's <a href="http://oopsla.org/oopsla2008/">OOPSLA</a> I am going to present a paper that was accepted in the Research Papers track.

Java Performance Evaluation through Rigorous Replay Compilation, <a href="http://itkovian.net">Andy Georges</a>, <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, and <a href="http://buytaert.net">Dries Buytaert</a>.

The abstract of this paper reads as follows.

A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, non-determinism due to timer-based sampling for JIT optimization, thread scheduling, and various system effects further complicate the Java performance benchmarking process.

Replay compilation is a recently introduced Java performance analysis methodology that aims at controlling non-determinism to improve experimental repeatability. The key idea of replay compilation is to control the compilation load during experimentation by inducing a pre-recorded compilation plan at replay time. Replay compilation also enables teasing apart performance effects of the application versus the virtual machine.

This paper argues that in contrast to current practice which uses a single compilation plan at replay time, multiple compilation plans add statistical rigor to the replay compilation methodology. By doing so, replay compilation better accounts for the variability observed in compilation load across compilation plans. In addition, we propose matched-pair comparison for statistical data analysis. Matched-pair comparison considers the performance measurements per compilation plan before and after an innovation of interest as a pair, which enables limiting the number of compilation plans needed for accurate performance analysis compared to statistical analysis assuming unpaired measurements.


The bulk of this paper made up Chapter 5 in my PhD dissertation, which was published on April 30, 2008. Here and there slight improvements were made before we submitted the final version. You can get a <a href="http://itkovian.net/base/files/papers/oopsla2008-georges-preprint.pdf">preprint</a> version of the paper. The presentation I gave is available either in <a href="http://itkovian.net/base/files/papers/oopsla2008-georges-presentation.pdf">pdf</a> format or as a <a href="http://itkovian.net/base/files/papers/oopsla2008-georges-presentation.zip">zipped Keynote</a> archive.]]></description>
			<content:encoded><![CDATA[<p>At this year&#8217;s <a href="http://oopsla.org/oopsla2008/">OOPSLA</a> I am going to present a paper that was accepted in the Research Papers track.</p>
<p>Java Performance Evaluation through Rigorous Replay Compilation, <a href="http://itkovian.net">Andy Georges</a>, <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, and <a href="http://buytaert.net">Dries Buytaert</a>.</p>
<p>The abstract of this paper reads as follows.</p>
<p>A managed runtime environment, such as the Java virtual machine, is non-trivial to benchmark. Java performance is affected in various complex ways by the application and its input, as well as by the virtual machine (JIT optimizer, garbage collector, thread scheduler, etc.). In addition, non-determinism due to timer-based sampling for JIT optimization, thread scheduling, and various system effects further complicate the Java performance benchmarking process.</p>
<p>Replay compilation is a recently introduced Java performance analysis methodology that aims at controlling non-determinism to improve experimental repeatability. The key idea of replay compilation is to control the compilation load during experimentation by inducing a pre-recorded compilation plan at replay time. Replay compilation also enables teasing apart performance effects of the application versus the virtual machine.</p>
<p>This paper argues that in contrast to current practice which uses a single compilation plan at replay time, multiple compilation plans add statistical rigor to the replay compilation methodology. By doing so, replay compilation better accounts for the variability observed in compilation load across compilation plans. In addition, we propose matched-pair comparison for statistical data analysis. Matched-pair comparison considers the performance measurements per compilation plan before and after an innovation of interest as a pair, which enables limiting the number of compilation plans needed for accurate performance analysis compared to statistical analysis assuming unpaired measurements.</p>
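<p>The idea behind a matched-pair comparison can be sketched in a few lines of Python. This is a simplified illustration with made-up timings, not the exact procedure from the paper: for every compilation plan we take the difference between the measurement before and after the innovation, and compute a confidence interval on the mean of those differences. The critical value of the Student t-distribution is passed in by the caller (2.776 for a 95% interval with 4 degrees of freedom).</p>
<pre>
```python
from statistics import mean, stdev


def paired_interval(before, after, t_critical):
    """Confidence interval for the mean per-plan difference (before minus after)."""
    diffs = [b - a for b, a in zip(before, after)]
    half_width = t_critical * stdev(diffs) / len(diffs) ** 0.5
    centre = mean(diffs)
    return centre - half_width, centre + half_width


# Hypothetical execution times (seconds) for five compilation plans.
before = [10.0, 12.0, 11.0, 13.0, 9.0]
after = [9.0, 11.1, 10.0, 11.9, 8.0]

low, high = paired_interval(before, after, t_critical=2.776)
print(low, high)
```
</pre>
<p>The plans differ from each other by several seconds, but the per-plan differences barely vary, so the interval is narrow and excludes zero. Treating the same ten numbers as two unpaired samples would mix the variation across plans into the analysis and would need more plans to reach the same conclusion.</p>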
<p>The bulk of this paper made up Chapter 5 in my PhD dissertation, which was published on April 30, 2008. Here and there slight improvements were made before we submitted the final version. You can get a <a href="http://itkovian.net/base/files/papers/oopsla2008-georges-preprint.pdf">preprint</a> version of the paper. The presentation I gave is available either in <a href="http://itkovian.net/base/files/papers/oopsla2008-georges-presentation.pdf">pdf</a> format or as a <a href="http://itkovian.net/base/files/papers/oopsla2008-georges-presentation.zip">zipped Keynote</a> archive.<!--break--></p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/java-performance-through-rigorous-replay-compilation/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Statistically Rigorous Java Performance Evaluation</title>
		<link>http://www.itkovian.net/base/statistically-rigorous-java-performance-evaluation/</link>
		<comments>http://www.itkovian.net/base/statistically-rigorous-java-performance-evaluation/#comments</comments>
		<pubDate>Mon, 18 Jun 2007 10:18:56 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[paper]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[non-determinism]]></category>
		<category><![CDATA[oopsla]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/?p=158</guid>
		<description><![CDATA[The following paper has been accepted for OOPSLA 2007.

Statistically Rigorous Java Performance Evaluation, <a href="http://itkovian.net">Andy Georges</a>, <a href="http://buytaert.net">Dries Buytaert</a>, and <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>.

The abstract reads as follows.]]></description>
			<content:encoded><![CDATA[<p>The following paper has been accepted for OOPSLA 2007.</p>
<p>Statistically Rigorous Java Performance Evaluation, <a href="http://itkovian.net">Andy Georges</a>, <a href="http://buytaert.net">Dries Buytaert</a>, and <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>.</p>
<p>The abstract reads as follows.</p>
<p>Java performance is far from being trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism at run-time causes the execution time of a Java program to differ from run to run. There are a number of sources of non-determinism such as Just-In-Time (JIT) compilation and optimization in the virtual machine (VM) driven by timer-based method sampling, thread scheduling, garbage collection, and various system effects.</p>
<p>There exist a wide variety of Java performance evaluation methodologies used by researchers and benchmarkers. These methodologies differ from each other in a number of ways. Some report average performance over a number of runs of the same experiment; others report the best or second best performance observed; yet others report the worst. Some iterate the benchmark multiple times within a single VM invocation; others consider multiple VM invocations and iterate a single benchmark execution; yet others consider multiple VM invocations and iterate the benchmark multiple times.</p>
<p>This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.</p>
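<p>A tiny Python example with made-up numbers shows how the choice of summary alone can flip a conclusion. It is only meant as an illustration and is unrelated to the JavaStats implementation; the two systems and their timings are hypothetical.</p>
<pre>
```python
from statistics import mean


def summarize(times):
    """The summaries that different methodologies report for the same runs."""
    ordered = sorted(times)
    return {
        "best": ordered[0],
        "second_best": ordered[1],
        "worst": ordered[-1],
        "mean": mean(times),
    }


# Hypothetical execution times (seconds) of the same benchmark on two systems.
system_a = summarize([10.0, 10.0, 10.0, 10.0])
system_b = summarize([8.0, 12.0, 12.0, 12.0])
print(system_a)
print(system_b)
```
</pre>
<p>Reporting the best run makes system B look faster, whereas the mean, the second best and the worst run all favour system A. Without confidence intervals, neither claim says anything about whether the difference is real.</p>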
<p>This paper took quite some work, especially experimentation-wise. While the initial reviews were very positive, they required us to perform several extra experiments. But in the end, it was worth the effort. You can get a <a href="http://itkovian.net/base/files/papers/oopsla2007-georges-preprint.pdf">preprint</a> version.</p>
<p>So, 2 out of X at OOPSLA for us! Yay!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/statistically-rigorous-java-performance-evaluation/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Using HPM-Sampling to Drive Dynamic Compilation</title>
		<link>http://www.itkovian.net/base/using-hpm-sampling-to-drive-dynamic-compilation/</link>
		<comments>http://www.itkovian.net/base/using-hpm-sampling-to-drive-dynamic-compilation/#comments</comments>
		<pubDate>Sat, 12 May 2007 05:16:39 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[paper]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[hpm]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[oopsla]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/?p=148</guid>
		<description><![CDATA[The following paper has been accepted for publication at OOPSLA 2007.

<b>Using HPM-Sampling to Drive Dynamic Compilation</b>, <a href="http://buytaert.net">Dries Buytaert</a>, <a href="http://itkovian.net">Andy Georges</a>, <a href="http://www.research.ibm.com/people/h/hind/">Michael Hind</a>, <a href="http://www.research.ibm.com/people/m/marnold/">Matthew Arnold</a>, <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, and <a href="http://www.elis.ugent.be/~kdb">Koen De Bosschere</a>.

The paper abstract reads as follows.

All high-performance production JVMs employ an adaptive strategy for program execution.  Methods are first executed unoptimized and then an online profiling mechanism is used to find a subset of methods that should be optimized during the same execution.  This paper empirically evaluates the design space of several profilers for initiating dynamic compilation and shows that existing online profiling schemes suffer from several limitations. They provide an insufficient number of samples, are untimely, and have limited accuracy at determining the frequently executed methods.  We describe and comprehensively evaluate HPM-sampling, a simple but effective profiling scheme for finding optimization candidates using hardware performance monitors (HPMs) that addresses the aforementioned limitations.  We show that HPM-sampling is more accurate; has low overhead; and improves performance by 5.7% on average and up to 18.3% when compared to the default system in Jikes RVM, without changing the compiler.]]></description>
			<content:encoded><![CDATA[<p>The following paper has been accepted for publication at OOPSLA 2007.</p>
<p><b>Using HPM-Sampling to Drive Dynamic Compilation</b>, <a href="http://buytaert.net">Dries Buytaert</a>, <a href="http://itkovian.net">Andy Georges</a>, <a href="http://www.research.ibm.com/people/h/hind/">Michael Hind</a>, <a href="http://www.research.ibm.com/people/m/marnold/">Matthew Arnold</a>, <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, and <a href="http://www.elis.ugent.be/~kdb">Koen De Bosschere</a>.</p>
<p>The paper abstract reads as follows.</p>
<p>All high-performance production JVMs employ an adaptive strategy for program execution.  Methods are first executed unoptimized and then an online profiling mechanism is used to find a subset of methods that should be optimized during the same execution.  This paper empirically evaluates the design space of several profilers for initiating dynamic compilation and shows that existing online profiling schemes suffer from several limitations. They provide an insufficient number of samples, are untimely, and have limited accuracy at determining the frequently executed methods.  We describe and comprehensively evaluate HPM-sampling, a simple but effective profiling scheme for finding optimization candidates using hardware performance monitors (HPMs) that addresses the aforementioned limitations.  We show that HPM-sampling is more accurate; has low overhead; and improves performance by 5.7% on average and up to 18.3% when compared to the default system in Jikes RVM, without changing the compiler.<br />
<!--break--></p>
<p>Montréal, here we come. October 21st &#8211; October 25th it is!</p>
<p>This paper has quite a long history behind it. Dries and I conceived the idea while attending the <a href="http://www.hipeac.net/acaces2006/">ACACES</a> summer school in July 2006. After a long talk with Mike, we decided to launch some preliminary measurements with the system Dries had already built into Jikes RVM, using the HPM interface I had adapted from <a href="http://cs.anu.edu.au/~Steve.Blackburn/">Steve Blackburn</a>&#8217;s <a href="http://user.it.uu.se/~mikpe/linux/perfctr/">perfctr</a> patch for <a href="http://jikesrvm.sourceforge.net">Jikes RVM</a>. We initially targeted PLDI 2007, but some matters were brought to our attention that called into question our original view of the current state of the art. Submission was postponed, extra experiments were conducted, and we targeted VEE instead, where our paper was rejected. Based on the reviews we received there, it seems it was a borderline case, but a rejection nonetheless. So, we figured, why not submit to OOPSLA? Worst case scenario: we get additional reviews to improve our paper. It turns out that the Best Case Scenario was visited upon us instead. You can get a <a href="http://itkovian.net/base/files/papers/oopsla2007-buytaert-preprint.pdf">preprint</a> version.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/using-hpm-sampling-to-drive-dynamic-compilation/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Method-Level Phase Behavior in Java Workloads.</title>
		<link>http://www.itkovian.net/base/method-level-phase-behavior-in-java-workloads/</link>
		<comments>http://www.itkovian.net/base/method-level-phase-behavior-in-java-workloads/#comments</comments>
		<pubDate>Tue, 13 Jul 2004 07:59:59 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[paper]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[oopsla]]></category>
		<category><![CDATA[performance counters]]></category>
		<category><![CDATA[phases]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/?p=153</guid>
		<description><![CDATA[The following paper has been accepted for publication at OOPSLA 2004.

Method-Level Phase Behavior in Java Workloads, <a href="http://itkovian.net/">Andy Georges</a>, <a href="http://buytaert.net/">Dries Buytaert</a>, <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, and <a href="http://www.elis.ugent.be/~kdb">Koen De Bosschere</a>

The paper abstract reads as follows.]]></description>
			<content:encoded><![CDATA[<p>The following paper has been accepted for publication at OOPSLA 2004.</p>
<p>Method-Level Phase Behavior in Java Workloads, <a href="http://itkovian.net/">Andy Georges</a>, <a href="http://buytaert.net/">Dries Buytaert</a>, <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, and <a href="http://www.elis.ugent.be/~kdb">Koen De Bosschere</a></p>
<p>The paper abstract reads as follows.</p>
<p>Java workloads are becoming more and more prominent on various computing devices. Understanding the behavior of a Java workload, which includes the interaction between the application and the virtual machine (VM), is thus of primary importance during performance analysis and optimization. Moreover, as contemporary software projects are increasing in complexity, automatic performance analysis techniques are indispensable. This paper proposes an off-line method-level phase analysis approach for Java workloads that consists of three steps. In the first step, the execution time is computed for each method invocation. Using an off-line tool, we subsequently analyze the dynamic call graph (that is annotated with the method invocations' execution times) to identify method-level phases. Finally, we measure performance characteristics for each of the selected phases. This is done using hardware performance monitors. As such, our approach allows for linking microprocessor-level information to the individual methods in the Java application's source code. This is extremely interesting information during performance analysis and optimization as programmers can use this information to optimize their code. We evaluate our approach in the Jikes RVM on an IA-32 platform using the SPECjvm98 and SPECjbb2000 benchmarks. This is done according to a number of important criteria: the overhead during profiling, the variability within and between the phases, its applicability in Java workload characterization (measuring performance characteristics of the various VM components) and application bottleneck identification.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/method-level-phase-behavior-in-java-workloads/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How Java Programs Interact with Virtual Machines at the Microarchitectural Level</title>
		<link>http://www.itkovian.net/base/how-java-programs-interact-with-virtual-machines-at-the-microarchitectural-level/</link>
		<comments>http://www.itkovian.net/base/how-java-programs-interact-with-virtual-machines-at-the-microarchitectural-level/#comments</comments>
		<pubDate>Sat, 12 Jul 2003 07:59:59 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[paper]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[oopsla]]></category>
		<category><![CDATA[performance counters]]></category>
		<category><![CDATA[virtual machine]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/?p=152</guid>
		<description><![CDATA[The following paper has been accepted for publication at OOPSLA 2003.

How Java Programs Interact with Virtual Machines at the Microarchitectural Level,  <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, <a href="http://itkovian.net">Andy Georges</a>, and <a href="http://www.elis.ugent.be/~kdb">Koen De Bosschere</a>.

The paper abstract reads as follows.]]></description>
			<content:encoded><![CDATA[<p>The following paper has been accepted for publication at OOPSLA 2003.</p>
<p>How Java Programs Interact with Virtual Machines at the Microarchitectural Level,  <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, <a href="http://itkovian.net">Andy Georges</a>, and <a href="http://www.elis.ugent.be/~kdb">Koen De Bosschere</a>.</p>
<p>The paper abstract reads as follows.</p>
<p>Java workloads are becoming increasingly prominent on various platforms, ranging from embedded systems through general-purpose computers to high-end servers. Understanding the implications of all the aspects involved when running Java workloads is thus extremely important during the design of a system that will run such workloads. In other words, understanding the interaction between the Java application, its input and the virtual machine it runs on is key to a successful design. The goal of this paper is to study this complex interaction at the microarchitectural level, e.g., by analyzing the branch behavior, the cache behavior, etc. This is done by measuring a large number of performance characteristics using performance counters on an AMD K7 Duron microprocessor. These performance characteristics are measured for seven virtual machine configurations, and a collection of Java benchmarks with corresponding inputs coming from the SPECjvm98 benchmark suite, the SPECjbb2000 benchmark suite, the Java Grande Forum benchmark suite and an open-source raytracer, called Raja, with 19 scene descriptions. This large amount of data is further analyzed using statistical data analysis techniques, namely principal components analysis and cluster analysis.
These techniques provide useful insights in an understandable way. From our experiments, we conclude that (i) the behavior observed at the microarchitectural level is primarily determined by the virtual machine for small input sets, e.g., the SPECjvm98 s1 input set; (ii) the behavior can be quite different for various input sets, e.g., short-running versus long-running benchmarks; (iii) for long-running benchmarks with few hot spots, the behavior can be primarily determined by the Java program and not the virtual machine, i.e., all the virtual machines optimize the hot spots to similarly behaving native code; (iv) in general, the behavior of a Java application running on one virtual machine can be significantly different from running on another virtual machine. These conclusions warn researchers working on Java workloads to be careful when using a limited number of Java benchmarks or virtual machines since this might lead to biased conclusions.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/how-java-programs-interact-with-virtual-machines-at-the-microarchitectural-level/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

