<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Geek in progress &#187; paul mckenney</title>
	<atom:link href="http://www.itkovian.net/base/tag/paul-mckenney/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.itkovian.net/base</link>
	<description>I am not yet done.</description>
	<lastBuildDate>Thu, 20 Oct 2011 20:56:59 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Performance, scalability, and real-time response from the Linux kernel</title>
		<link>http://www.itkovian.net/base/performance-scalability-and-real-time-response-from-the-linux-kernel/</link>
		<comments>http://www.itkovian.net/base/performance-scalability-and-real-time-response-from-the-linux-kernel/#comments</comments>
		<pubDate>Mon, 20 Jul 2009 16:19:37 +0000</pubDate>
		<dc:creator>Itkovian</dc:creator>
				<category><![CDATA[kernel]]></category>
		<category><![CDATA[linux]]></category>
		<category><![CDATA[paul mckenney]]></category>
		<category><![CDATA[real time]]></category>

		<guid isPermaLink="false">http://www.itkovian.net/base/?p=239</guid>
		<description><![CDATA[The most interesting course, as well as the one I enjoyed most, was on the
performance, scalability and real-time response of the Linux kernel.

<div class="figure">
<a href="http://www.flickr.com/photos/itkovian/3717236440/" title="Paul McKenney by Itkovian, on Flickr"><img src="http://farm4.static.flickr.com/3425/3717236440_950cf38df4.jpg" width="319" height="500" alt="Paul McKenney" /></a>
</div>]]></description>
			<content:encoded><![CDATA[<p>The most interesting course, as well as the one I enjoyed most, was on the<br />
performance, scalability and real-time response of the Linux kernel.</p>
<div class="figure">
<a href="http://www.flickr.com/photos/itkovian/3717236440/" title="Paul McKenney by Itkovian, on Flickr"><img src="http://farm4.static.flickr.com/3425/3717236440_950cf38df4.jpg" width="319" height="500" alt="Paul McKenney" /></a>
</div>
<p><!--break--></p>
<p>He opened the course by dropping a question into the unsuspecting audience:<br />
which decades-old technology can keep a multi-core machine busy and yet be easy to program<br />
against? I thought Paul had a time-sharing machine in mind, but<br />
the answer was far simpler than that: SQL. Given that the frequency increase<br />
of the CPUs is stabilising at naught, we need to find a good way to easily<br />
program against multi-core architectures. Something that rivals the ease of<br />
SQL, where under the hood a lot of stuff is going on, but to the user, it<br />
remains fairly simple. Unlike most of the other people talking about<br />
parallelism, Paul stressed multiple times that if one does not need it, it&#8217;s<br />
best to run single-threaded. I wholeheartedly agree! On the other hand, if we<br />
parallelise, we should be considering high-level approaches prior to trying to<br />
get the nitty-gritty details right. So, first: get your algorithm in shape. I<br />
think that&#8217;s a very good point, given that research papers publishing<br />
tweaks rather than new algorithms seldom succeed in increasing performance<br />
by a factor, or even by a large percentage. Conversely, any performance lost at<br />
the base OS level cannot be made up by the higher levels, no matter the<br />
algorithm. Context switches, locks, etc. take a (more or less) fixed amount of<br />
time, and that time will be spent regardless.</p>
<p>The major issue with RT processes seems to be that they need to interact with non-RT<br />
processes and with I/O (disk, network, etc.). As such, the RT approach has to be<br />
applied across the entire execution stack if we want to get it right.<br />
However, we still need to keep fair responsiveness for non-RT processes.<br />
Essentially, Paul argues for making tradeoffs, rather than going for the<br />
best-for-a-single-goal approach and ignoring the rest.</p>
<p>The question raised was why we need to enhance performance. The answer is that<br />
people time is much more costly than machine time these days. So it no<br />
longer pays off to have an engineer try to enhance each solution by hand. It should be<br />
done automagically as much as possible. Moreover, general solutions help to<br />
spread the cost over multiple users.</p>
<p>One of the major problems when parallelising programs is that people either do<br />
not grok the issues fully, or try to tackle the problems in the wrong order.<br />
Paul argued that we first need to understand how we can split up the problem<br />
into parts where there is little interaction between the data (so as to avoid<br />
excess locking). Only then can we partition the work that is done on that data.<br />
The final step then is to determine which parts can have actual access to the<br />
data, i.e., assign the locks. The mantra that was repeated here was that<br />
low-level details really do matter, and that it is important to get them right.<br />
Building on this, the point was raised that we rely on the people who implement<br />
things to have detailed knowledge of the underlying hardware. Unfortunately,<br />
this is not always the case.</p>
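<p>The ordering described above (data first, then work, then locks) can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names <code>partition</code>, <code>work</code> and <code>sum_of_squares</code> are hypothetical and were not part of the course. The data is split so that no two workers touch the same elements, which means the last step, assigning locks, turns out to need no lock at all.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Step 1: split the data into n_parts slices that share nothing."""
    return [items[i::n_parts] for i in range(n_parts)]


def work(part):
    """Step 2: the work on one partition reads only that partition."""
    return sum(x * x for x in part)


def sum_of_squares(items, n_parts=4):
    parts = partition(items, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(work, parts))
    # Step 3: the only shared step is combining the partial results.
    # It runs in a single thread, so no lock is needed here.
    return sum(partials)
```

<p>Under CPython's global interpreter lock the threads will not speed this particular computation up; the point of the sketch is the structure, not the timing.</p>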
<p>The takeaway lesson from the first lecture was this: parallel programming is<br />
bloody hard, because it was designed that way.</p>
<p>Lesson 2 discussed Linux kernel programming environments dealing with: response<br />
times, preemption inside the kernel, non-maskable interrupts, etc. Point made:<br />
if an algorithm runs at a low level, you need interruptible locks. The kernel<br />
comes with a broad array of synchronisation primitives, so it is important to<br />
use the right primitive for the right job. For example, use locks that allow<br />
looping in the reader if there are potentially (multiple) writers. Once more,<br />
Paul stressed that synchronisation primitives are not the first thing to<br />
decide on. We should associate locks and other primitives with each data<br />
partition (that was agreed upon earlier in the design stage). Clearly, it is<br />
not good to have too many data partitions, as that means more locks, and a<br />
higher risk of lock contention. The example used throughout this lesson was<br />
that of a linked list. Should we lock the header? Lock each node? Keep the<br />
locks in the data structure or in some hash array of locks? Key point: provide<br />
protection for each way in which the data can be accessed! A per-CPU locking<br />
mechanism can be used; if done right, it scales pretty well.</p>
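<p>To make the "hash array of locks" option a bit more concrete, here is a minimal sketch in Python rather than kernel C. The class name <code>StripedCounterTable</code> and its methods are my own invention for illustration. Each key hashes to one of a fixed number of locks, so unrelated keys rarely contend, while every access path to a given key goes through the same lock.</p>

```python
import threading


class StripedCounterTable:
    """Counters keyed by name, protected by a fixed array of locks."""

    def __init__(self, n_stripes=16):
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._buckets = [{} for _ in range(n_stripes)]

    def _stripe(self, key):
        # The same key always maps to the same stripe, so readers and
        # writers of that key always take the same lock.
        return hash(key) % len(self._locks)

    def add(self, key, amount=1):
        i = self._stripe(key)
        with self._locks[i]:
            bucket = self._buckets[i]
            bucket[key] = bucket.get(key, 0) + amount

    def get(self, key):
        i = self._stripe(key)
        with self._locks[i]:
            return self._buckets[i].get(key, 0)
```

<p>The number of stripes is the tradeoff knob: one stripe degenerates into a single global lock, while many stripes cost more memory for locks that are mostly idle.</p>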
<p>In lesson 3, Paul tackled the performance and scalability of Linux<br />
applications. Most frameworks (200+) that were once in use have by now either faded,<br />
merged, or been discontinued. The advice was not to use or rely on signal handlers.<br />
POSIX primitives were discussed, as were per-thread variables, spinlocks, etc.<br />
An important point was that using per-CPU state to lock on does not<br />
translate well from kernel to user space. Some remarks were made about the RT<br />
aspects of user-space applications. Should this be enforced? The issue here is<br />
that opening RT behaviour to user space clears the way for abuse. During the class, he used the (adapted) illustration of the blind philosophers and the elephant:</p>
<div class="figure">
<a href="http://www.flickr.com/photos/itkovian/3724554811/" title="The five blind penguins and the elephant by Itkovian, on Flickr"><img src="http://farm4.static.flickr.com/3495/3724554811_8da2811907.jpg" width="500" height="333" alt="The five blind penguins and the elephant" /></a>
</div>
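<p>Coming back to the per-thread variables from lesson 3: since per-CPU state does not carry over well to user space, per-thread state is one possible substitute. The sketch below is my own illustration, not code from the course, and the class name <code>PerThreadCounter</code> is hypothetical. Each thread increments its own slot without taking a lock; a lock is only taken when a thread registers its slot and when somebody reads the total.</p>

```python
import threading


class PerThreadCounter:
    """A counter with one private slot per thread."""

    def __init__(self):
        self._local = threading.local()
        self._slots = []                    # one single-element list per thread
        self._register = threading.Lock()   # taken once per thread and per read

    def _my_slot(self):
        slot = getattr(self._local, "slot", None)
        if slot is None:
            slot = [0]
            with self._register:
                self._slots.append(slot)
            self._local.slot = slot
        return slot

    def increment(self):
        # Fast path: only the owning thread ever writes this slot.
        self._my_slot()[0] += 1

    def read(self):
        # Slow path: sum a snapshot of all slots. The result is exact once
        # the writers have finished, and approximate while they still run.
        with self._register:
            slots = list(self._slots)
        return sum(slot[0] for slot in slots)
```

<p>The design choice is the usual one for statistics counters: make the frequent operation (increment) cheap and push the cost to the rare one (read).</p>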
<p>Lesson 4 was fully dedicated to real time systems, discussing some of the<br />
implementations in the Linux kernel for dealing with this. Main topics of the<br />
day were timers, high-resolution and others, interrupt handlers that can be<br />
threaded, etc. It was stressed that real time has a broad range of meanings,<br />
going from a few nanoseconds up to 10 ms, the latter amounting basically to the<br />
context switch time. Apparently, as a first step, some parts of the kernel can<br />
be preempted, some cannot. The consequence is a reduction of scheduler latency,<br />
but nowhere near enough for an RT system at the high end of the scale. Timer<br />
wheels were added to improve locality and queueing, but certain cascading<br />
operations on this data structure can still take a long time. Long enough to warrant<br />
implementing high-resolution timers using RB trees, along with preemptible<br />
spinlocks. Still: greater power means greater responsibility, so care must be<br />
taken. Priority inversion was discussed, and adequately illustrated using the<br />
dancing processes.</p>
<div class="figure">
<a href="http://www.flickr.com/photos/itkovian/3727450809/" title="Real time processes by Itkovian, on Flickr"><img src="http://farm4.static.flickr.com/3485/3727450809_9fef6d4eb1.jpg" width="500" height="333" alt="Real time processes" /></a>
</div>
<p>RCU once more came to the rescue, and it was shown how this can be used in the RT scenario, with priority inversion.</p>
<p>In the final lesson, Paul discussed RT Linux applications.</p>
<div class="figure">
<a href="http://www.flickr.com/photos/itkovian/3728456171/" title="Real time vs. the Hammer by Itkovian, on Flickr"><img src="http://farm4.static.flickr.com/3219/3728456171_c498e263a4.jpg" width="500" height="333" alt="Real time vs. the Hammer" /></a>
</div>
<p>I guess the above illustration really says it all. The class discussed the<br />
meaning of a hard RT system, and most of us were proven wrong. In some cases<br />
knowing that failure is imminent is more important than a guarantee of making the<br />
deadline. (This seems to have some resemblance to writing research papers.) A<br />
combination of an accurate system that is allowed to fail, and can indicate that it will,<br />
with a less accurate system that guarantees meeting the deadline seems to be the<br />
way to go. In any case, maths cannot describe RT systems in practice, and QoS<br />
is more important than the hard/soft RT distinction.</p>
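<p>The combination described above can be sketched as follows. This is a toy under stated assumptions: the names <code>DeadlineMissed</code>, <code>accurate_mean</code>, <code>rough_mean</code> and <code>answer</code> are hypothetical, the "computation" is just an average, and plain Python gives no real-time guarantees whatsoever. The accurate path checks the clock as it goes and signals as soon as it has run out of time; the caller then falls back to an estimate whose cost does not depend on the input size.</p>

```python
import time


class DeadlineMissed(Exception):
    """Raised by the accurate path as soon as it notices it is late."""


def accurate_mean(samples, deadline, clock):
    total = 0.0
    for value in samples:
        if clock() > deadline:
            raise DeadlineMissed()
        total += value
    return total / len(samples)


def rough_mean(samples):
    # Constant-time estimate: looks at two samples regardless of input size.
    return (samples[0] + samples[-1]) / 2


def answer(samples, budget, clock=time.monotonic):
    """Return (value, which_path) for a non-empty list of samples."""
    deadline = clock() + budget
    try:
        return accurate_mean(samples, deadline, clock), "accurate"
    except DeadlineMissed:
        return rough_mean(samples), "fallback"
```

<p>Passing the clock in as a parameter is only there to make the behaviour testable with a fake clock; a real system would also have to reserve part of the budget for the fallback itself.</p>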
<p>RT applications can be divided into three classes: search for life<br />
(medical/industrial control systems), search for death (military) and search<br />
for money (financial). In today&#8217;s interconnected web of machines, the slowest<br />
machine determines the RT nature of the complete system. Chaining multiple<br />
machines in series has a large impact on this. Fun fact: in the Linux kernel,<br />
real time used to mean real-life time, rather than meeting deadlines. So beware<br />
of the code you rely on!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.itkovian.net/base/performance-scalability-and-real-time-response-from-the-linux-kernel/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
	</channel>
</rss>

