<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Geek in progress &#187; performance counters</title>
	<atom:link href="http://www.itkovian.net/base/tag/performance-counters/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.itkovian.net/base</link>
	<description>I am not yet done.</description>
	<lastBuildDate>Thu, 20 Oct 2011 20:56:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Using performance counters with multi-threaded applications</title>
		<link>http://www.itkovian.net/base/using-performance-counters-with-multi-threaded-applications/</link>
		<comments>http://www.itkovian.net/base/using-performance-counters-with-multi-threaded-applications/#comments</comments>
		<pubDate>Fri, 23 May 2008 11:28:17 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[kernel]]></category>
		<category><![CDATA[multi-threaded applications]]></category>
		<category><![CDATA[patch]]></category>
		<category><![CDATA[performance counters]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/?p=214</guid>
		<description><![CDATA[For a few years now, Linux has had quite good support for performance counters. Examples are <a href="http://oprofile.sourceforge.net/">OProfile</a> (which has been included in the kernel since 2.6, I think), <a href="http://user.it.uu.se/~mikpe/linux/perfctr/">Perfctr</a>, and <a href="http://perfmon2.sourceforge.net/">Perfmon</a> (not to be confused with <a href="http://perfmon.sourceforge.net/">the other Perfmon</a>, which is an SNMP-based performance monitoring tool). I think Perfmon is destined to make it into the kernel source tree as well, or so I've heard. However, I have been using Perfctr since I started my research, so this post is only about that tool.

There has been talk on the Perfctr mailing list (which gets hopelessly spammed these days) about including support for multi-threaded processes, but thus far I've seen nothing that does what I want. So, without further ado, here's how to patch your kernel to support multi-threaded applications.]]></description>
			<content:encoded><![CDATA[<p>For a few years now, Linux has had quite good support for performance counters. Examples are <a href="http://oprofile.sourceforge.net/">OProfile</a> (which has been included in the kernel since 2.6, I think), <a href="http://user.it.uu.se/~mikpe/linux/perfctr/">Perfctr</a>, and <a href="http://perfmon2.sourceforge.net/">Perfmon</a> (not to be confused with <a href="http://perfmon.sourceforge.net/">the other Perfmon</a>, which is an SNMP-based performance monitoring tool). I think Perfmon is destined to make it into the kernel source tree as well, or so I&#8217;ve heard. However, I have been using Perfctr since I started my research, so this post is only about that tool.</p>
<p>There has been talk on the Perfctr mailing list (which gets hopelessly spammed these days) about including support for multi-threaded processes, but thus far I&#8217;ve seen nothing that does what I want. So, without further ado, here&#8217;s how to patch your kernel to support multi-threaded applications. <!--break--></p>
<p>I assume you know how to install the basic Perfctr driver and compile your kernel with support for it. If possible, compile it as a module, as this is easiest if you need to change things (unless you also change the kernel header files, in which case you probably want to recompile the complete kernel). Let&#8217;s further assume your kernel source lives in /usr/src/linux, referred to below as toplevel. I&#8217;ll also assume we&#8217;re building the Perfctr module here.</p>
<p>The first thing that needs to be done is to make sure that child processes set up their kernel data structures such that performance counter data can be stored at context switch (you want to use virtual counters, i.e., per-process counters). Therefore you need to add a function that does exactly this. We&#8217;ll call it __vperfctr_set_child_perfctr(struct task_struct*, struct task_struct*). You also want to be able to set up an empty vperfctr structure, by simply allocating space for it. So, declare both in your toplevel/include/linux/perfctr.h file (which should be there if you patched the kernel and copied the relevant files when installing Perfctr), which should then read:</p>
<div class="code">
<ol>
<li>#ifdef CONFIG_PERFCTR_MODULE
<li>extern struct vperfctr_stub {
<li class="indent">struct module *owner;
<li class="indent">void (*exit)(struct vperfctr*);
<li class="indent">void (*suspend)(struct vperfctr*);
<li class="indent">void (*resume)(struct vperfctr*);
<li class="indent">void (*sample)(struct vperfctr*);
<li class="newline indent">struct vperfctr* (*get_empty)(void);
<li class="newline indent">void (*set_child_perfctr) (struct task_struct*, struct task_struct*);
<li>#ifdef CONFIG_PERFCTR_CPUS_FORBIDDEN_MASK
<li class="indent">void (*set_cpus_allowed)(struct task_struct*, struct vperfctr*, cpumask_t);
<li>#endif
<li>} vperfctr_stub;
<li>
<li>extern void _vperfctr_exit(struct vperfctr*);
<li>#define _vperfctr_suspend(x)  vperfctr_stub.suspend((x))
<li>#define _vperfctr_resume(x) vperfctr_stub.resume((x))
<li>#define _vperfctr_sample(x) vperfctr_stub.sample((x))
<li class="newline">#define _vperfctr_get_empty()   vperfctr_stub.get_empty()
<li class="newline">#define _vperfctr_set_child_perfctr(x,y)  vperfctr_stub.set_child_perfctr((x),(y))
<li>#define _vperfctr_set_cpus_allowed(x,y,z) (*vperfctr_stub.set_cpus_allowed)((x),(y),(z))
<li>#else /* !CONFIG_PERFCTR_MODULE */
<li>#define _vperfctr_exit(x) __vperfctr_exit((x))
<li>#define _vperfctr_suspend(x)  __vperfctr_suspend((x))
<li>#define _vperfctr_resume(x) __vperfctr_resume((x))
<li>#define _vperfctr_sample(x) __vperfctr_sample((x))
<li class="newline">#define _vperfctr_get_empty()   __vperfctr_get_empty()
<li class="newline">#define _vperfctr_set_child_perfctr(x,y) __vperfctr_set_child_perfctr((x),(y))
<li>#define _vperfctr_set_cpus_allowed(x,y,z) __vperfctr_set_cpus_allowed((x),(y),(z))
<li>#endif  /* CONFIG_PERFCTR_MODULE */
</ol>
</div>
<p>In the same file you should add some code to the perfctr_copy_task(struct task_struct *, struct pt_regs *) function. As shipped, it only contains a comment stating that nothing should be done until inheritance is implemented, and it sets the vperfctr structure to NULL. The code for that function should become the following:</p>
<div class="code">
<ol>
<li>static inline void perfctr_copy_task(struct task_struct *child, struct pt_regs *regs) {
<li>
<li class="indent newline">if(current-&gt;thread.perfctr != NULL) {
<li class="indent2 newline">child-&gt;thread.perfctr = _vperfctr_get_empty();
<li class="indent2 newline">if(!child-&gt;thread.perfctr) {
<li class="indent3 newline">printk("PERFCTR::error activating child perfctr\n");
<li class="indent2 newline">}
<li class="indent2 newline">else {
<li class="indent3 newline">_vperfctr_set_child_perfctr(current, child);
<li class="indent2 newline">}
<li class="indent newline">}
<li class="indent newline">else {
<li class="indent2">child-&gt;thread.perfctr = NULL;
<li class="indent newline">}
<li class="indent">
<li>}
</ol>
</div>
<p>The above will get a new structure set up (through __vperfctr_get_empty) and copy the existing settings (i.e., for the control registers etc.) to that structure (through __vperfctr_set_child_perfctr). Before we can use these functions, we need to define them, which is done in toplevel/drivers/perfctr/virtual.c.</p>
<div class="code">
<ol>
<li class="indent newline">struct vperfctr* __vperfctr_get_empty(void) {
<li class="indent2 newline">  return get_empty_vperfctr();
<li class="indent newline">}
<li class="indent newline">
<li class="indent newline">void __vperfctr_set_child_perfctr(struct task_struct* parent, struct task_struct* child) {
<li class="indent newline">
<li class="indent2 newline">  int err;
<li class="indent2 newline">  struct vperfctr* parent_perfctr = parent-&gt;thread.perfctr;
<li class="indent2 newline">  struct vperfctr* child_perfctr = child-&gt;thread.perfctr;
<li class="indent newline">
<li class="indent2 newline">  if(!child_perfctr) { /* check should have been done before! */
<li class="indent3 newline">    return;
<li class="indent2 newline">  }
<li class="indent newline">
<li class="indent2 newline">  child_perfctr-&gt;owner = child;
<li class="indent2 newline">  memcpy(&#038;(child_perfctr-&gt;cpu_state.control), &#038;(parent_perfctr-&gt;cpu_state.control), sizeof(parent_perfctr-&gt;cpu_state.control));
<li class="indent2 newline">  child_perfctr-&gt;si_signo = parent_perfctr-&gt;si_signo;
<li class="indent newline">
<li class="indent newline">#ifdef CONFIG_SMP
<li class="indent2 newline">  child_perfctr-&gt;sampling_timer = parent_perfctr-&gt;sampling_timer;
<li class="indent newline">#endif
<li class="indent newline">
<li class="indent2 newline">  err = perfctr_cpu_update_control(&#038;child_perfctr-&gt;cpu_state, 0);
<li class="indent2 newline">  if(err &lt; 0) {
<li class="indent3 newline">    printk("perfctr::error::__vperfctr_set_child cstatus &lt; 0 ");
<li class="indent2 newline">  }
<li class="indent newline">
<li class="indent newline">}
</ol>
</div>
<p>I think the Perfctr patch uses p-&gt;thread as the first argument, so change this accordingly.</p>
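<p>The inheritance step can be tried out in user space before touching the kernel. The following is only a minimal sketch: the <code>demo_*</code> types and names are hypothetical stand-ins for the real vperfctr structures, and zeroing the child's sums is an assumption about what a freshly allocated structure looks like.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_CTRS 4  /* hypothetical limit, only for this sketch */

/* Hypothetical stand-in for the control part of a vperfctr. */
struct demo_control {
    unsigned int evntsel[DEMO_MAX_CTRS]; /* event selectors */
    unsigned int nractrs;                /* number of active counters */
};

/* Hypothetical stand-in for struct vperfctr. */
struct demo_vperfctr {
    struct demo_control control;
    long long sum[DEMO_MAX_CTRS];        /* accumulated counts */
    int si_signo;
    const void *owner;
};

/* Copy the parent's settings into an already allocated child structure. */
void demo_set_child(const struct demo_vperfctr *parent,
                    struct demo_vperfctr *child,
                    const void *child_task)
{
    assert(parent != NULL && child != NULL);

    child->owner = child_task;
    memcpy(&child->control, &parent->control, sizeof(parent->control));
    child->si_signo = parent->si_signo;

    /* Assumption: the child counts from zero; only the configuration is shared. */
    memset(child->sum, 0, sizeof(child->sum));
}
```

<p>The point of the sketch is that the child inherits what is being measured (the control settings), not what the parent has already measured.</p>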
<p>Now, to get the counter values assembled in one central spot, I&#8217;m using a second module that accumulates these values. Upon exit, each thread passes its values to that module through a hook. The accumulation is protected by a spin_lock so that it is safe on multicore processors.</p>
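<p>The hook mechanism itself is small. Below is a rough user-space sketch of the idea, assuming a pthread mutex in place of the kernel spin_lock; all <code>demo_*</code> names are hypothetical and not part of Perfctr. The summing loop mirrors the module's loop bound, so only the first <code>nractrs</code> entries are accumulated.</p>

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_CTRS 8  /* hypothetical limit, only for this sketch */

typedef void (*demo_hook_t)(int nractrs, const int *events,
                            const long long *values);

/* The hook is NULL until an accumulator registers itself. */
static demo_hook_t demo_hook = NULL;

static struct {
    long long ctrs[DEMO_MAX_CTRS];
    int nractrs;
    pthread_mutex_t lock;
} demo_totals = { .lock = PTHREAD_MUTEX_INITIALIZER };

/* Accumulator side: add one thread's values to the global totals. */
void demo_accumulate(int nractrs, const int *events, const long long *values)
{
    (void)events; /* event codes are not needed for summing */
    assert(nractrs >= 0 && nractrs <= DEMO_MAX_CTRS);

    pthread_mutex_lock(&demo_totals.lock);
    demo_totals.nractrs = nractrs;
    for (int i = 0; i < nractrs; ++i)
        demo_totals.ctrs[i] += values[i];
    pthread_mutex_unlock(&demo_totals.lock);
}

/* Register (or, with NULL, remove) the accumulator. */
void demo_set_hook(demo_hook_t f)
{
    demo_hook = f;
}

/* Driver side: called when a thread exits; a no-op without a hook. */
void demo_thread_exit(int nractrs, const int *events, const long long *values)
{
    if (demo_hook)
        demo_hook(nractrs, events, values);
}

/* Read one accumulated total under the lock. */
long long demo_total(int i)
{
    long long v;

    assert(i >= 0 && i < DEMO_MAX_CTRS);
    pthread_mutex_lock(&demo_totals.lock);
    v = demo_totals.ctrs[i];
    pthread_mutex_unlock(&demo_totals.lock);
    return v;
}
```

<p>Keeping the hook as a plain function pointer means the driver works unchanged when no accumulating module is loaded: exiting threads simply skip the call.</p>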
<p>Here&#8217;s the code you should add to toplevel/drivers/perfctr/virtual.c to be able to access the accumulating module through the hook.</p>
<div class="code">
<ol>
<li class="newline">void (*__read_counter_update)(int nrctrs, int* events, long long* counters_values) = NULL;
<li>
<li class="newline">int __vperfctr_set_read_counter_hook(void f_address(int, int*, long long*)) {
<li class="indent newline">  __read_counter_update = f_address;
<li class="indent newline">  return 0;
<li class="newline">}
<li class="newline">EXPORT_SYMBOL(__vperfctr_set_read_counter_hook);
<li class="newline">
<li class="newline">int __vperfctr_unset_read_counter_hook(void) {
<li class="indent newline">  __read_counter_update = NULL;
<li class="indent newline">  return 0;
<li class="newline">}
<li class="newline">EXPORT_SYMBOL(__vperfctr_unset_read_counter_hook);
<li class="indent">
<li class="indent">
<li>static void vperfctr_unlink(struct task_struct *owner, struct vperfctr *perfctr) {
<li class="indent">
<li class="indent">  /* this synchronises with vperfctr_ioctl() */
<li class="indent">  spin_lock(&amp;perfctr-&gt;owner_lock);
<li class="indent">  perfctr-&gt;owner = NULL;
<li class="indent">  spin_unlock(&amp;perfctr-&gt;owner_lock);
<li class="indent">
<li class="indent">  /* perfctr suspend+detach must be atomic wrt process suspend */
<li class="indent">  /* this also synchronises with perfctr_set_cpus_allowed() */
<li class="indent">  vperfctr_task_lock(owner);
<li class="indent">  if( IS_RUNNING(perfctr) &amp;&amp;owner == current )
<li class="indent">    vperfctr_suspend(perfctr);
<li class="indent">  owner-&gt;thread.perfctr = NULL;
<li class="indent">  vperfctr_task_unlock(owner);
<li class="indent">
<li class="indent newline">  if(__read_counter_update) {
<li class="indent2 newline">    int nractrs = perfctr_cstatus_nractrs(perfctr-&gt;cpu_state.cstatus);
<li class="indent2 newline">    long long counters [nractrs+1];
<li class="indent2 newline">    int events[nractrs+1];
<li class="indent2 newline">    int i = 0;
<li class="indent2 newline">    for(i = 0; i &lt; nractrs; i++) {
<li class="indent3 newline">      events[i] = perfctr-&gt;cpu_state.control.evntsel[i];
<li class="indent3 newline">      counters[i] = perfctr-&gt;cpu_state.pmc[i].sum;
<li class="indent2 newline">    }
<li class="indent2 newline">    events[nractrs] = -1;
<li class="indent2 newline">    counters[nractrs] = perfctr-&gt;cpu_state.tsc_sum;
<li class="indent2 newline">    __read_counter_update(nractrs, events, counters);
<li class="indent newline">  }
<li class="indent">
<li class="indent">  perfctr-&gt;cpu_state.cstatus = 0;
<li class="indent">  vperfctr_clear_iresume_cstatus(perfctr);
<li class="indent">  put_vperfctr(perfctr);
<li>}
</ol>
</div>
<p>The final piece then is the read_counter module where the data is accumulated.</p>
<div class="code">
<ol>
<li>#include &lt;linux/version.h&gt;
<li>#include &lt;linux/vermagic.h&gt;
<li>#include &lt;linux/init.h&gt;
<li>#include &lt;linux/module.h&gt;
<li>#include &lt;linux/kernel.h&gt;
<li>#include &lt;linux/fs.h&gt;
<li>#include &lt;linux/major.h&gt;
<li>#include &lt;linux/errno.h&gt;
<li>#include &lt;asm/uaccess.h&gt;
<li>#include &lt;asm/io.h&gt;
<li>#include &lt;linux/spinlock.h&gt;
<li>
<li>MODULE_INFO(vermagic, VERMAGIC_STRING);
<li>
<li>#define READ_COUNTERS_DEV 120 /* major number */
<li>
<li>#undef unix
<li>struct module __this_module
<li>__attribute__((section(".gnu.linkonce.this_module"))) = {
<li class="indent">     .name = __stringify(read_counters),
<li class="indent">     .init = init_module,
<li>#ifdef CONFIG_MODULE_UNLOAD
<li class="indent">     .exit = cleanup_module,
<li>#endif
<li>};
<li>
<li>struct counter_info_s {
<li class="indent">     long long ctrs[9];
<li class="indent">     int nractrs;
<li class="indent">     int events[9];
<li class="indent">     spinlock_t lock;
<li>};
<li>
<li>static struct counter_info_s counters;
<li>
<li>
<li>void __read_counters_update(int nractrs, int* events, long long* cntrs) {
<li class="indent">
<li class="indent">     int i = 0;
<li class="indent">     spin_lock(&amp;counters.lock);
<li class="indent">     counters.nractrs = nractrs;
<li class="indent">     for(i = 0; i &lt; nractrs; ++i) {
<li class="indent">             counters.ctrs[i] += cntrs[i];
<li class="indent">     }
<li class="indent">     spin_unlock(&amp;counters.lock);
<li>}
<li>
<li>
<li>static int read_counters_open(struct inode* inode, struct file* file) {
<li class="indent">     printk("read_counters OPEN\n");
<li class="indent">     return 0;
<li>}
<li>
<li>
<li>static int read_counters_close(struct file* file) {
<li class="indent">     printk("read_counters CLOSE\n");
<li class="indent">     return 0;
<li>}
<li>
<li>
<li>static ssize_t read_counters_read(struct file* file, char* buf, size_t count, loff_t *ppos) {
<li class="indent">     int i = 0;
<li class="indent">     int res = 0;
<li>
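<li class="indent">     /* never copy more than the structure holds */
<li class="indent">     if(count &gt; sizeof(counters)) {
<li class="indent2">            count = sizeof(counters);
<li class="indent">     }
<li class="indent">     res = copy_to_user((void*) buf, (void*) &amp;counters, count);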
<li class="indent">     if(!res) {
<li class="indent2">            /* we reset the counters values at this point */
<li class="indent2">            for(i = 0; i &lt;= counters.nractrs; ++i) {
<li class="indent3">                    counters.ctrs[i] = 0;
<li class="indent2">            }
<li class="indent2">            return 0;
<li class="indent">     }
<li class="indent">
<li class="indent">     return -EFAULT;
<li>}
<li>
<li>static ssize_t read_counters_write(struct file* file, const char* buf, size_t count, loff_t* ppos) {
<li class="indent">     return 0;
<li>}
<li>
<li>
<li>static int read_counters_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg) {
<li class="indent">
<li class="indent">     int i;
<li class="indent">     int res;
<li class="indent">
<li class="indent">     /* we have the following legal actions that can be asked
<li class="indent">      * read: reads the counters
<li class="indent">      * reset: sets the captured values to zero
<li class="indent">      *
<li class="indent">      * cmd is encoded to contain both the desired action (first byte)
<li class="indent">      * as well as the length of the buffer (last 3 bytes)
<li class="indent">      */
<li class="indent">
<li class="indent">     int action = (cmd &gt;&gt; 24) &amp; 0xff;
<li class="indent">     int count = cmd &amp; 0x00ffffff;
<li class="indent">
<li class="indent">     switch(action) {
<li class="indent2">            case 0: /* reset */
<li class="indent3">                    spin_lock(&amp;counters.lock);
<li class="indent3">                    for(i = 0; i &lt;= counters.nractrs; i++) {
<li class="indent4">                            counters.ctrs[i] = 0;
<li class="indent4">                            counters.events[i] = 0;
<li class="indent3">                    }
<li class="indent3">                    spin_unlock(&amp;counters.lock);
<li class="indent3">                    return 0;
<li class="indent2">            case 1: /* read */
<li class="indent3">                    spin_lock(&amp;counters.lock);
<li class="indent3">                    {
<li class="indent4">                            char tmp [(counters.nractrs+1)*sizeof(long long)+sizeof(int)];
<li class="indent4">                            long long*tmp_counters = (long long*) (tmp+sizeof(int));
<li class="indent4">
<li class="indent4">                            *(int*)tmp = counters.nractrs;
<li class="indent4">                            for(i = 0; i &lt;= counters.nractrs; ++i) {
<li class="indent5">                                    *(tmp_counters+i) = counters.ctrs[i];
<li class="indent4">                            }
<li class="indent4">
<li class="indent4">                            if(count &lt; (counters.nractrs+1)*sizeof(long long)+sizeof(int)) {
<li class="indent5">                                    res =  -EFAULT;
<li class="indent4">                            } else {
<li class="indent5">                                    /*
<li class="indent5">                                    * we expect arg to contain the address of a
<li class="indent5">                                    * structure where the counter values can be dropped
<li class="indent5">                                    */
<li class="indent5">                                    res = (copy_to_user((void*) arg, tmp, (counters.nractrs+1)*sizeof(long long)+sizeof(int)) ? -EFAULT : 0);
<li class="indent4">                            }
<li class="indent3">                    }
<li class="indent3">                    spin_unlock(&amp;counters.lock);
<li class="indent3">                    return res;
<li class="indent3">                    break;
<li class="indent">     }
<li class="indent">     return -1;
<li>}
<li>
<li>static struct file_operations read_counters_fops = {
<li class="indent">     .read  = read_counters_read,      /* read  */
<li class="indent">     .write = read_counters_write,     /* write */
<li class="indent">     .ioctl = read_counters_ioctl,     /* ioctl */
<li class="indent">     .open  = read_counters_open,      /* open */
<li class="indent">     .flush = read_counters_close,     /* close */
<li>};
<li>
<li>extern int __vperfctr_set_read_counter_hook(void (*f)(int, int*, long long*));
<li>extern int __vperfctr_unset_read_counter_hook(void);
<li>
<li>int init_module(void) {
<li class="indent">
<li class="indent">     int i = 0;
<li class="indent">
<li class="indent">     /* find a symbol called __vperfctr_set_read_counter_hook  */
<li class="indent">     /* if it doesn&#8217;t exist, let insmod handle it! :-)          */
<li class="indent">
<li class="indent">     __vperfctr_set_read_counter_hook(__read_counters_update);
<li class="indent">
<li class="indent">     for(i =0; i &lt; 9; ++i) {
<li class="indent2">            counters.ctrs[i] = 0LL;
<li class="indent2">            counters.events[i] = 0;
<li class="indent">     }
<li class="indent">
<li class="indent">     if(register_chrdev(READ_COUNTERS_DEV, "read_counters", &amp;read_counters_fops) &lt; 0) {
<li class="indent2">            printk("READ_COUNTERS device: unable to connect to major number %d\n", READ_COUNTERS_DEV);
<li class="indent2">            return -EIO;
<li class="indent">     }
<li class="indent">     else {
<li class="indent2">            printk("READ_COUNTERS device installed.\n");
<li class="indent">     }
<li class="indent">     return 0;
<li>}
<li>
<li>void cleanup_module(void) {
<li class="indent">     __vperfctr_unset_read_counter_hook();
<li class="indent">     if(unregister_chrdev(READ_COUNTERS_DEV, "read_counters")) {
<li class="indent2">            printk("READ_COUNTERS device: unable to release device %d\n", READ_COUNTERS_DEV);
<li class="indent">     }
<li class="indent">     else {
<li class="indent2">            printk("READ_COUNTERS device driver removed\n");
<li class="indent">     }
<li>}
<li>
<li>MODULE_LICENSE("GPL");
</ol>
</div>
<p>If you wish to read the data, you must create the device /dev/read_counters with major number 120 and minor number 0. Code that does the reading for you might look like this:</p>
<div class="code">
<ol>
<li>#include &lt;stdio.h&gt;
<li>#include &lt;stdlib.h&gt;
<li>#include &lt;unistd.h&gt;
<li>#include &lt;sys/types.h&gt;
<li>#include &lt;sys/stat.h&gt;
<li>#include &lt;fcntl.h&gt;
<li>#include &lt;linux/spinlock.h&gt;
<li>
<li>struct counter_info_s {
<li class="indent">  long long ctrs[9];
<li class="indent">  int nractrs;
<li class="indent">  int events[9];
<li class="indent">  spinlock_t lock;
<li>};
<li>
<li>struct counter_info_s counters;
<li>
<li>int main() {
<li class="indent">
<li class="indent">     int file = open("/dev/read_counters", O_RDONLY);
<li class="indent">
<li class="indent">     if(file &lt; 0) {
<li class="indent2">            perror("open");
<li class="indent2">            exit(-1);
<li class="indent">     }
<li class="indent">
<li class="indent">     /* the driver returns 0 on success, so any other value is an error */
<li class="indent">     if(read(file, (void*) &amp;counters, sizeof(struct counter_info_s))) {
<li class="indent2">            perror("oops. read error.");
<li class="indent2">            exit(-1);
<li class="indent">     }
<li class="indent">     else {
<li class="indent2">            int i = 0;
<li class="indent2">            printf("read successful\n");
<li class="indent2">            printf("counters.nractrs = %d\n", counters.nractrs);
<li class="indent">
<li class="indent2">            for(i = 0; i &lt; counters.nractrs; ++i) {
<li class="indent3">                    printf("counters.ctrs[%d] = %lld\n", i, counters.ctrs[i]);
<li class="indent2">            }
<li class="indent2">            exit(0);
<li class="indent">     }
<li>}
</ol>
</div>
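<p>The comment in read_counters_ioctl describes <code>cmd</code> as carrying the action in its first byte and the buffer length in the remaining three bytes. A caller therefore has to pack both into one integer. The sketch below shows one way to do that, assuming &#8220;first byte&#8221; means the most significant byte; the <code>demo_*</code> helpers and constants are hypothetical names, not part of the module.</p>

```c
#include <assert.h>

/* Actions as numbered in the ioctl switch: 0 resets, 1 reads. */
enum { DEMO_ACTION_RESET = 0, DEMO_ACTION_READ = 1 };

/* Pack an action (top byte) and a buffer length (low three bytes). */
unsigned int demo_encode_cmd(unsigned int action, unsigned int length)
{
    assert(action <= 0xffu);
    assert(length <= 0x00ffffffu);
    return (action << 24) | length;
}

/* Extract the action from the top byte. */
unsigned int demo_cmd_action(unsigned int cmd)
{
    return (cmd >> 24) & 0xffu;
}

/* Extract the buffer length from the low three bytes. */
unsigned int demo_cmd_length(unsigned int cmd)
{
    return cmd & 0x00ffffffu;
}
```

<p>For a read, the length has to be at least (nractrs+1)*sizeof(long long)+sizeof(int), since that is what the handler checks before copying the values out.</p>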
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/using-performance-counters-with-multi-threaded-applications/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Method-Level Phase Behavior in Java Workloads.</title>
		<link>http://www.itkovian.net/base/method-level-phase-behavior-in-java-workloads/</link>
		<comments>http://www.itkovian.net/base/method-level-phase-behavior-in-java-workloads/#comments</comments>
		<pubDate>Tue, 13 Jul 2004 07:59:59 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[paper]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[oopsla]]></category>
		<category><![CDATA[performance counters]]></category>
		<category><![CDATA[phases]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/?p=153</guid>
		<description><![CDATA[The following paper has been accepted for publication at OOPSLA 2004

Method-Level Phase Behavior in Java Workloads, <a href="http://itkovian.net/">Andy Georges</a>, <a href="http://buytaert.net/">Dries Buytaert</a>, <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, and <a href="http://www.elis.ugent.be/~kdb">Koen De Bosschere</a>

The paper abstract reads as follows.]]></description>
			<content:encoded><![CDATA[<p>The following paper has been accepted for publication at OOPSLA 2004</p>
<p>Method-Level Phase Behavior in Java Workloads, <a href="http://itkovian.net/">Andy Georges</a>, <a href="http://buytaert.net/">Dries Buytaert</a>, <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, and <a href="http://www.elis.ugent.be/~kdb">Koen De Bosschere</a></p>
<p>The paper abstract reads as follows.</p>
<p>Java workloads are becoming more and more prominent on various computing devices. Understanding the behavior of a Java workload, which includes the interaction between the application and the virtual machine (VM), is thus of primary importance during performance analysis and optimization. Moreover, as contemporary software projects are increasing in complexity, automatic performance analysis techniques are indispensable. This paper proposes an off-line method-level phase analysis approach for Java workloads that consists of three steps. In the first step, the execution time is computed for each method invocation. Using an off-line tool, we subsequently analyze the dynamic call graph (that is annotated with the method invocations&#8217; execution times) to identify method-level phases. Finally, we measure performance characteristics for each of the selected phases. This is done using hardware performance monitors. As such, our approach allows for linking microprocessor-level information at the individual methods in the Java application&#8217;s source code. This is extremely interesting information during performance analysis and optimization as programmers can use this information to optimize their code. We evaluate our approach in the Jikes RVM on an IA-32 platform using the SPECjvm98 and SPECjbb2000 benchmarks. This is done according to a number of important criteria: the overhead during profiling, the variability within and between the phases, its applicability in Java workload characterization (measuring performance characteristics of the various VM components) and application bottleneck identification.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/method-level-phase-behavior-in-java-workloads/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>How Java Programs Interact with Virtual Machines at the Microarchitectural Level</title>
		<link>http://www.itkovian.net/base/how-java-programs-interact-with-virtual-machines-at-the-microarchitectural-level/</link>
		<comments>http://www.itkovian.net/base/how-java-programs-interact-with-virtual-machines-at-the-microarchitectural-level/#comments</comments>
		<pubDate>Sat, 12 Jul 2003 07:59:59 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[paper]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[oopsla]]></category>
		<category><![CDATA[performance counters]]></category>
		<category><![CDATA[virtual machine]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/?p=152</guid>
		<description><![CDATA[The following paper has been accepted for publication at OOPSLA 2003

How Java Programs Interact with Virtual Machines at the Microarchitectural Level,  <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, <a href="http://itkovian.net">Andy Georges</a>, and <a href="http://www.elis.ugent.be/~kdb">Koen De Bosschere</a>.

The paper abstract reads as follows.]]></description>
			<content:encoded><![CDATA[<p>The following paper has been accepted for publication at OOPSLA 2003</p>
<p>How Java Programs Interact with Virtual Machines at the Microarchitectural Level,  <a href="http://www.elis.ugent.be/~leeckhou">Lieven Eeckhout</a>, <a href="http://itkovian.net">Andy Georges</a>, and <a href="http://www.elis.ugent.be/~kdb">Koen De Bosschere</a>.</p>
<p>The paper abstract reads as follows.</p>
<p>Java workloads are becoming increasingly prominent on various platforms ranging from embedded systems, over general-purpose computers to high-end servers. Understanding the implications of all the aspects involved when running Java workloads, is thus extremely important during the design of a system that will run such workloads. In other words, understanding the interaction between the Java application, its input and the virtual machine it runs on, is key to a successful design. The goal of this paper is to study this complex interaction at the microarchitectural level, e.g., by analyzing the branch behavior, the cache behavior, etc. This is done by measuring a large number of performance characteristics using performance counters on an AMD K7 Duron microprocessor. These performance characteristics are measured for seven virtual machine configurations, and a collection of Java benchmarks with corresponding inputs coming from the SPECjvm98 benchmark suite, the SPECjbb2000 benchmark suite, the Java Grande Forum benchmark suite and an open-source raytracer, called Raja with 19 scene descriptions. This large amount of data is further analyzed using statistical data analysis techniques, namely principal components analysis and cluster analysis.
These techniques provide useful insights in an understandable way. From our experiments, we conclude that (i) the behavior observed at the microarchitectural level is primarily determined by the virtual machine for small input sets, e.g., the SPECjvm98 s1 input set; (ii) the behavior can be quite different for various input sets, e.g., short-running versus long-running benchmarks; (iii) for long-running benchmarks with few hot spots, the behavior can be primarily determined by the Java program and not the virtual machine, i.e., all the virtual machines optimize the hot spots to similarly behaving native code; (iv) in general, the behavior of a Java application running on one virtual machine can be significantly different from running on another virtual machine. These conclusions warn researchers working on Java workloads to be careful when using a limited number of Java benchmarks or virtual machines since this might lead to biased conclusions.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/how-java-programs-interact-with-virtual-machines-at-the-microarchitectural-level/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

