Archive for June, 2007

Transport class for Python's XML-RPC lib

Thursday, June 21st, 2007

Given that the xmlrpclib.Transport class can be subclassed, it is perhaps easier to define a new transport class that implements the patch shown in the facelift post, though I still believe Python’s XML-RPC library is due a much-needed update.

Thus, I present the HTTPTransport class:

Update: It seems I forgot to parse the resulting payload. This is now fixed in the updated code below.

from xmlrpclib import Transport
from xmlrpclib import ProtocolError

class HTTPTransport(Transport):
    ##
    # Connect to server.
    #
    # @param host Target host.
    # @return A connection handle.

    def make_connection(self, host):
        # create a HTTP connection object from a host descriptor
        import httplib
        host, extra_headers, x509 = self.get_host_info(host)
        return httplib.HTTPConnection(host)

    ##
    # Send a complete request, and parse the response.
    #
    # @param host Target host.
    # @param handler Target RPC handler.
    # @param request_body XML-RPC request body.
    # @param verbose Debugging flag.
    # @return Parsed response values.

    def request(self, host, handler, request_body, verbose=0):
        # issue XML-RPC request

        h = self.make_connection(host)
        if verbose:
            h.set_debuglevel(1)

        self.send_request(h, handler, request_body)
        self.send_host(h, host)
        self.send_user_agent(h)
        self.send_content(h, request_body)

        response = h.getresponse()

        if response.status != 200:
            raise ProtocolError(host + handler, response.status,
                                response.reason, response.msg.headers)

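        # read the whole body, then feed it to the XML-RPC parser
        payload = response.read()
        parser, unmarshaller = self.getparser()
        parser.feed(payload)
        parser.close()

        # the unmarshaller returns the decoded response values
        return unmarshaller.close()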
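
A custom transport is handed to the client through the transport argument of ServerProxy; under Python 2.5 that would be xmlrpclib.ServerProxy(url, transport=HTTPTransport()). The sketch below is only an illustration under a few assumptions, not the class above: it targets Python 3, where xmlrpclib was renamed to xmlrpc.client, and it uses a hypothetical CannedTransport that skips the network entirely so that only the parsing step (getparser, feed, close) is exercised. The URL and the method name answer are made up.

```python
import xmlrpc.client


class CannedTransport(xmlrpc.client.Transport):
    """Hypothetical transport that never touches the network.

    It answers every call with the same pre-encoded XML-RPC response,
    which is enough to show how a transport parses a payload.
    """

    def __init__(self, canned_value):
        super().__init__()
        # Encode a method response carrying the canned value.
        xml = xmlrpc.client.dumps((canned_value,), methodresponse=True)
        self.payload = xml.encode("utf-8")
        self.calls = []

    def request(self, host, handler, request_body, verbose=False):
        # Record where the proxy wanted to send the request.
        self.calls.append((host, handler))
        # Same parsing steps as in the request method above.
        parser, unmarshaller = self.getparser()
        parser.feed(self.payload)
        parser.close()
        return unmarshaller.close()


def demo_custom_transport():
    """Call a made-up remote method through the canned transport."""
    transport = CannedTransport(42)
    proxy = xmlrpc.client.ServerProxy(
        "http://example.invalid/rpc", transport=transport
    )
    return proxy.answer(), transport.calls
```

Because ServerProxy only ever talks to the transport's request method, swapping in a different transport needs no change in the calling code.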

Statistically Rigorous Java Performance Evaluation

Monday, June 18th, 2007

The following paper has been accepted for OOPSLA 2007.

Statistically Rigorous Java Performance Evaluation, Andy Georges, Dries Buytaert, and Lieven Eeckhout.

The abstract reads as follows.

Java performance is far from being trivial to benchmark because it is affected by various factors such as the Java application, its input, the virtual machine, the garbage collector, the heap size, etc. In addition, non-determinism at run-time causes the execution time of a Java program to differ from run to run. There are a number of sources of non-determinism such as Just-In-Time (JIT) compilation and optimization in the virtual machine (VM) driven by timer-based method sampling, thread scheduling, garbage collection, and various system effects.

There exist a wide variety of Java performance evaluation methodologies used by researchers and benchmarkers. These methodologies differ from each other in a number of ways. Some report average performance over a number of runs of the same experiment; others report the best or second best performance observed; yet others report the worst. Some iterate the benchmark multiple times within a single VM invocation; others consider multiple VM invocations and iterate a single benchmark execution; yet others consider multiple VM invocations and iterate the benchmark multiple times.

This paper shows that prevalent methodologies can be misleading, and can even lead to incorrect conclusions. The reason is that the data analysis is not statistically rigorous. In this paper, we present a survey of existing Java performance evaluation methodologies and discuss the importance of statistically rigorous data analysis for dealing with non-determinism. We advocate approaches to quantify startup as well as steady-state performance, and, in addition, we provide the JavaStats software to automatically obtain performance numbers in a rigorous manner. Although this paper focuses on Java performance evaluation, many of the issues addressed in this paper also apply to other programming languages and systems that build on a managed runtime system.
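
To make the idea of reporting more than a single number concrete, here is a rough sketch of one common approach: computing a confidence interval for the mean execution time over several runs. It is not taken from the paper or from JavaStats. The function names are hypothetical, and the normal approximation used here is only reasonable for a fairly large number of runs; with few runs a Student's t critical value would be needed instead.

```python
import math
import statistics


def mean_confidence_interval(times, confidence=0.95):
    """Return (low, high) bounds for the mean of the measured times.

    Assumption: uses the normal approximation, which is only
    appropriate when there are many measurements.
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.fmean(times)
    # Standard error of the mean, from the sample standard deviation.
    stderr = statistics.stdev(times) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean - z * stderr, mean + z * stderr


def intervals_overlap(a, b):
    """True if two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

When the intervals of two alternatives overlap, the measurements alone do not justify claiming that one is faster than the other.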

This paper took quite some work, especially experimentation-wise. While the initial reviews were very positive, they required us to perform several extra experiments. But in the end, it was worth the effort. You can get a preprint version.

So, 2 out of X at OOPSLA for us! Yay!

Python XML-RPC needs a facelift.

Friday, June 15th, 2007

I have been experimenting with the Python XML-RPC implementation for a while now, and yesterday, I came across what is most accurately described as a bug. Let’s consider a nice figure to illustrate how the XML-RPC implementation handles things in the Python 2.5 release.

[Figure: Python xmlrpc madness]

So, basically the XML-RPC ServerProxy files a request with the Transport class to deliver the XML goodies to the remote server. However, in the current implementation, Transport uses httplib.HTTP. This is a wrapper class that uses HTTPConnection for most things, but not for receiving responses from the server. And that is exactly where the problem lies. The HTTP.getreply function fetches the HTTP status, reason and headers. But the XML-RPC Transport class does not check the headers for any indication of a content length. When it reads the response body, things really go haywire. No matter what, it asks a socket (or a file posing as a socket) to read 1024 bytes. The socket library tries to comply, but obviously when either the content is shorter, or the connection is closed after the content has been read, an error is raised.

So what are the options to correct this behaviour? I think one can do two things. First of all, fix the Transport function that asks the socket for data so that it takes an extra argument indicating the expected payload size. Obviously, once the headers are received they should be checked for the presence of a Content-Length field, and the requested size should correspond to the value in this field. I’ve implemented that and it works.
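
A minimal sketch of that first option, written for Python 3 and not taken from the actual patch: read exactly as many bytes as the Content-Length header announces instead of blindly asking for 1024 bytes. The helper names read_exact and read_body are hypothetical, and the headers are assumed to be available as a plain mapping.

```python
import io


def read_exact(stream, content_length):
    """Read exactly content_length bytes from a file-like object."""
    chunks = []
    remaining = content_length
    while remaining > 0:
        # Never ask for more than what is still expected.
        chunk = stream.read(min(1024, remaining))
        if not chunk:
            raise EOFError("stream ended %d bytes early" % remaining)
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_body(headers, stream):
    """Read a response body whose size is given by Content-Length."""
    length = int(headers["Content-Length"])
    return read_exact(stream, length)
```

For instance, read_body({"Content-Length": "5"}, io.BytesIO(b"helloEXTRA")) stops after the five announced bytes and leaves the rest of the stream untouched.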

However, I think a second option is perhaps better. Why remain with the HTTP class when a nice and shiny HTTPConnection class is available that does all we need and more? Let’s move the XML-RPC HTTP connection object to that class, and voila! Fixed.

In unified diff format, it boils down to this:

--- /sw/lib/python2.5/xmlrpclib.py      2006-11-29 02:46:38.000000000 +0100
+++ xmlrpclib.py        2007-06-15 15:59:02.000000000 +0200
@@ -1182,23 +1182,13 @@
         self.send_user_agent(h)
         self.send_content(h, request_body)

-        errcode, errmsg, headers = h.getreply()
+        response = h.getresponse()
+
+        if response.status != 200:
+          raise ProtocolError(host + handler, response.status, response.reason, response.msg.headers)

-        if errcode != 200:
-            raise ProtocolError(
-                host + handler,
-                errcode, errmsg,
-                headers
-                )
-
-        self.verbose = verbose
-
-        try:
-            sock = h._conn.sock
-        except AttributeError:
-            sock = None
-
-        return self._parse_response(h.getfile(), sock)
+        payload = response.read()
+        return payload

     ##
     # Create parser.
@@ -1250,7 +1240,7 @@
         # create a HTTP connection object from a host descriptor
         import httplib
         host, extra_headers, x509 = self.get_host_info(host)
-        return httplib.HTTP(host)
+        return httplib.HTTPConnection(host)
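
Why the switch helps can be checked without a server. The sketch below assumes Python 3, where httplib is named http.client, and it is an illustration rather than part of the patch. FakeSocket is a hypothetical stand-in that serves canned bytes, and begin() is the method that HTTPConnection.getresponse() normally calls for you to parse the status line and headers.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical socket stand-in that serves canned bytes."""

    def __init__(self, raw_bytes):
        self._file = io.BytesIO(raw_bytes)

    def makefile(self, *args, **kwargs):
        # HTTPResponse only needs a readable binary file object.
        return self._file


def parse_canned_response(raw_bytes):
    """Parse raw HTTP bytes the way an HTTPConnection response would."""
    response = http.client.HTTPResponse(FakeSocket(raw_bytes))
    response.begin()  # reads the status line and the headers
    # read() honours Content-Length and stops after that many bytes.
    return response.status, response.reason, response.read()


RAW = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/xml\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"helloTRAILING"
)
```

Calling parse_canned_response(RAW) yields only the five body bytes announced in the header, even though more data follows in the stream.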

WTF calculator competition

Tuesday, June 12th, 2007

There is a neat competition running at The Daily WTF (yeah, I dislike their new name) where one can present the worst possible implementations of a calculator app.

I think they might need one extra entry. A few years ago, Ghent University decided that it would be to the benefit of all, especially the employees, if we could use the glorious SAP system. At first, they decided a web application would be most suitable for introducing the system, and to let employees learn the ropes. Though of course, that would take little effort, because the system would ease the entire workflow for everybody involved.

So, the web application was installed (reportedly on a small PC acting as the front end to a giant SUN server). And, miraculously, it came with its very own calculator. I’ve no idea what the real reason might have been; it probably had something to do with either (a) making things more secure, or (b) filling the pockets of the consultants paid to incorporate the university workflow into the SAP system. Somehow, though, everything one could click on was an image, generated on the fly. So was the calculator. I think you can already see where this leads. Every time you entered a digit (by clicking) into the calculator, it refreshed the entire webpage to show you an updated image containing the digit you just entered or the result of the function you just indicated the calculator should execute. It might not particularly stand out in the WTF competition, I think.

As if that was not enough, they had a similar system for entering dates from a calendar.

Right now, we have a real application that comes with its own horrors.

Electricity symbol stencil for Omnigraffle

Tuesday, June 12th, 2007

I have made a stencil for Omnigraffle that allows one to create the layout for the electricity grid in one’s house. You can get it from a darcs repository, or just get a zipped version here.

Here’s a screenshot of what the stencil items look like:

[Screenshot: Omnigraffle stencil]

Plazes moves from beta to gamma

Wednesday, June 6th, 2007

Yesterday, I received an email from plazes.com stating that they finally moved out of beta. I immediately upgraded my plazer application to the latest and greatest. Much to my disappointment, however, it turns out that both the plazer app and the website are still somewhat buggy, to say the least. The website does not seem to manage keeping me logged into my account, so each time I do something, like invite somebody, I need to log in again. Setting ‘remember me’ helps, in that I no longer need to log in manually, but I still need to do some stuff on the site twice.

The plazer app itself also acts weirdly, sometimes showing actions from my contacts, only for some of them to have mysteriously disappeared on the next check. Yesterday, when I arrived home, it failed to recognize my own plaze. I’m quite confident I was at the right address: my key unlocked the door, my powerbook recognized my WiFi, and the pictures on the wall were those of my kids. OK, it may have been a database glitch; after a few hours I was finally found to be at my plaze. But I do not expect that from software that moves out of beta. I never had any trouble when plazes was still in its testing phase; I thought it worked fine. I hope the developers get round to squashing the remaining bugs, so we can all be happy at our plazes.