
Linux-Kongress 2004
11th International Linux System Technology Conference
September 7-10, 2004 in Erlangen, Germany


Papers

The following papers will be presented at the Linux-Kongress on Thursday, September 9 and Friday, September 10. All talks are in English. (The original Call for Papers.)

Advanced virtualization techniques for FAUmachine by Martin Waitz Thursday 11:30

Our team developed a virtual PC formerly known as UMLinux, now called FAUmachine. One of our main targets was that the changes needed to port an original Linux kernel to our virtual environment should be minimal. Thus we replaced certain assembler instructions with calls to the virtual PC, which emulates these instructions. This approach has proven to work very well and has the important benefit that the virtual machine runs completely in user space. No special modules or extensions to the hosting Linux kernel are needed.

Nonetheless, two major drawbacks have shown up. First, we cannot run system-level binaries for which we have no source code. This includes binary-only Linux kernel modules and operating systems like Windows. Second, we use ptrace(2) to redirect system calls issued by user processes running on FAUmachine to the kernel running on FAUmachine. The overhead this introduces degrades the performance of the virtual system significantly. To remedy these problems we implemented a just-in-time compiler and an extension to the hosting Linux kernel.

Instead of being executed directly, instructions are filtered through the just-in-time compiler (JIT). Instructions that are special in some way have to be emulated; all other instructions are copied verbatim into a buffer. To invoke emulation, a call instruction into the virtual machine is inserted into the buffer before such a special instruction. When the buffer is filled, the code is executed directly. Instead of executing a special instruction, control flow moves to the virtual machine, which emulates the desired instruction. This technique allows arbitrary code to run without prior modification. First performance measurements and comparisons with the previous approach show that the JIT degrades the performance of FAUmachine by only about fifteen percent.
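
As a rough illustration of this scan-and-copy scheme, the following C sketch shows the shape of such a translation loop; is_special(), insn_length() and emit_call_to_vm() are hypothetical helpers, not functions from the FAUmachine source.

    /* Conceptual sketch only; not taken from the FAUmachine code base. */
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    extern int    is_special(const uint8_t *insn);   /* needs emulation?     */
    extern size_t insn_length(const uint8_t *insn);  /* x86 instruction size */
    extern size_t emit_call_to_vm(uint8_t *buf, const uint8_t *insn);

    /* Translate guest code at 'src' into 'buf'; returns bytes emitted. */
    size_t jit_translate(const uint8_t *src, size_t len, uint8_t *buf)
    {
        size_t in = 0, out = 0;

        while (in < len) {
            size_t n = insn_length(src + in);

            if (is_special(src + in)) {
                /* Insert a call into the virtual machine, which emulates
                   the instruction and then resumes execution in the buffer. */
                out += emit_call_to_vm(buf + out, src + in);
            } else {
                /* All other instructions are copied verbatim. */
                memcpy(buf + out, src + in, n);
                out += n;
            }
            in += n;
        }
        return out;
    }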

To improve system call redirection, we added a per-process virtual memory range from which system calls may be issued. Attempts to issue system calls from outside this range are converted into a signal instead. The virtual machine sets this range to its own code segment and can therefore execute system calls. System calls from outside this range generate a signal and can easily be redirected to the guest kernel. With this simple modification we were able to reduce the overhead of system call redirection using ptrace by about a factor of three.

One topic of our current research is the use of the full 4 GByte address space for FAUmachine. As long as the code segment of a Linux kernel running on the virtual machine does not overlap with the segment of the hosting kernel, Linux can run on top of Linux easily. When they overlap, the virtual machine has to emulate not only special instructions, but also all instructions referencing that overlapping segment. This has a huge performance impact. To get rid of this limitation, we are working on an enhanced version of the existing 4-GByte-patch, which will allow us to make use of the full 4 GByte address space.

About the author:

This paper was written by Martin Waitz, Hans-Jörg Höxer and Volkmar Sieh. All have graduated in computer science from the University of Erlangen. Volkmar Sieh also holds a PhD in computer science. All authors are members of the team working on the FAUmachine project.

The Active Block I/O Scheduling System (ABISS) - A framework for storage QoS by Dr. Benno van den Brink Thursday 11:30

In the near future, Consumer Electronics (CE) devices that are able to stream multiple A/V streams will be commonplace. In such an environment applications will have to be able to share system resources while providing a 'soft real-time' service. This sharing should be efficient because - even more than e.g. in the traditional PC environment - CE devices often have to meet other constraints like low power consumption, noise-free operation, minimum hardware cost, etc. Resource sharing can be accomplished by either making the applications aware of each other, or by making the system aware of the applications.

In this paper we will present the results of work done on the hard-disk storage subsystem of Linux, resulting in the Active Block I/O Scheduling System (ABISS). The main purpose of ABISS is to make the system application-aware by either providing a guaranteed reading and writing bit rate to any application that asks for it or denying access when the system is fully committed. Apart from these guaranteed real-time (RT) streams, we also included multiple priorities for best-effort (BE) disk traffic.

The system consists of a framework that is included in the kernel, with a policy and coordination unit implemented in user space. This approach ensures separation between the kernel infrastructure (the framework) and the policies (e.g. admission control) in user space.

The kernel part consists of our own elevator and a new 'read scheduler' in kernel space, communicating with a user-space daemon. The elevator implements the multiple priorities of the stream and the read scheduler is responsible for timely pre-loading and buffering of data. Apart from the elevator and read scheduler, some minor modifications were made to file system drivers. The ABISS extensions are controlled through ioctls applied to files accessed through the regular POSIX API. A small library with wrapper functions shaped after stdio (fopenrt(), freadrt() and fwritert()) is available for applications preferring a higher-level API.
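
For illustration, a minimal use of the stdio-style wrappers might look as follows; the abstract names fopenrt(), freadrt() and fwritert() but not their signatures, so the prototypes and the bit-rate parameter below are assumptions.

    /* Hypothetical sketch; the real ABISS wrapper signatures may differ. */
    #include <stdio.h>

    /* Assumed prototypes from the ABISS helper library. */
    FILE  *fopenrt(const char *path, const char *mode, unsigned long bytes_per_sec);
    size_t freadrt(void *buf, size_t size, size_t nmemb, FILE *stream);

    int main(void)
    {
        static char buf[64 * 1024];

        /* Request a guaranteed 1 MB/s read stream; if the system is fully
           committed, admission control would refuse the request. */
        FILE *f = fopenrt("/video/stream.mpg", "r", 1000 * 1000);
        if (!f)
            return 1;

        while (freadrt(buf, 1, sizeof(buf), f) == sizeof(buf))
            ;   /* feed the decoder at the guaranteed rate */

        fclose(f);
        return 0;
    }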

About the author:

I graduated in experimental physics at the Vrije Universiteit in Amsterdam in 1990 and received my Ph.D. in 1995 from the same university for research in the field of experimental nuclear physics.

In 1995 I joined Philips Research Laboratories. My fields of research at Philips have been simulations and design optimization of Cathode Ray Tubes (CRTs), maskless CRTs, and currently storage-enabled mobile consumer systems. In our research group we use Linux extensively to do research on and make prototypes of advanced Consumer Electronics systems.

Machine check handling on Linux by Andi Kleen Thursday 12:15

The number of transistors in common CPUs and memory chips is growing each year, and hardware buses are getting faster. This increases the chances of data corruption by arbitrary bit flips in hardware. Modern chips can detect and sometimes correct such events using ECC checksums and other techniques, but there are cases the hardware cannot hide completely, and software has to handle them. Such an event is called a machine check event (MCE).

As these events become more common, it is becoming more and more important that Linux recovers from them as well as possible. The traditional strategy is to either panic or ignore, but that is not good enough anymore. It is important to log these events well and to recover the machine as best as possible without corrupting data. The possibilities for handling machine checks vary with hardware and firmware support, but the kernel also has to participate to handle them well.

The paper will discuss some generic issues in handling MCEs in software.

The Linux/x86-64 port recently got a new machine check handler to address many problems in the old one, which was derived from the 32bit code.

This paper will discuss the problems in the traditional x86 Linux machine check handlers and cover the design and implementation of the new x86-64 handler. It will mainly focus on the capabilities of current x86-64 and IA32 machines, but also mention other architectures.

In addition, possible future changes to generic Linux code to improve recovery from machine checks on all architectures will be discussed.

About the author:

Andi Kleen is a kernel developer at SUSE Labs. He has worked over the years on many areas of the Linux kernel, such as the TCP/IP stack, NFS, file systems, NUMA tuning, device drivers, the x86 and x86-64 ports and other areas.

For the last few years he has served as the maintainer of the Linux kernel port to the x86-64 architecture, a 64-bit variant of the traditional PC x86 architecture.

Comparative study of the Asynchronous I/O programming interfaces on Linux by Chinmay Albal Thursday 12:15

The Linux 2.6 kernel integrates asynchronous I/O (AIO) for I/O operations. Asynchronous I/O overlaps I/O operations with application processing, thus improving utilization of the CPU and devices and aiding application performance. This can be used to boost performance in areas such as databases, web servers, proxy servers and streaming content servers for video/audio applications.

There are currently two programming libraries providing Asynchronous I/O interfaces to develop async I/O applications.

  1. libaio, which was developed at Red Hat.
  2. The POSIX glibc implementation of the async I/O calls.

libaio provides the native Linux API for async I/O and exploits the completion queue model that is the core of kernel AIO. In the Linux AIO implementation, each AIO request is associated with a completion queue on which application threads wait for completion notification.

libaio defines a set of syscalls and library wrappers, and provides the header files to include and the libraries to link with for the Linux-native asynchronous I/O facility. Creating and destroying entries in the completion queue, as well as submitting and cancelling async I/O jobs to the kernel, are done using these syscalls.
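
As a sketch of the native interface (not taken from the paper), a single asynchronous read with libaio looks roughly like this; compile with -laio, and note that error handling is omitted for brevity.

    /* Minimal libaio read using the completion-queue model described above. */
    #include <libaio.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        io_context_t ctx;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        char *buf;
        int fd = open("/tmp/data", O_RDONLY);

        memset(&ctx, 0, sizeof(ctx));
        io_setup(8, &ctx);                      /* create the completion queue  */

        posix_memalign((void **)&buf, 512, 4096);
        io_prep_pread(&cb, fd, buf, 4096, 0);   /* describe one async read      */
        io_submit(ctx, 1, cbs);                 /* hand it to the kernel        */

        io_getevents(ctx, 1, 1, &ev, NULL);     /* wait on the completion queue */
        printf("read %lu bytes\n", (unsigned long)ev.res);

        io_destroy(ctx);
        return 0;
    }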

The POSIX glibc AIO implementation, which works in conjunction with the pthreads interface, uses threads for handling the enqueued requests and similarly provides calls for performing AIO operations.
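
For comparison, the same read with the POSIX glibc interface might look like this (link with -lrt); again only a sketch with error handling omitted.

    /* Equivalent read via POSIX AIO; glibc services it with helper threads. */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        static char buf[4096];
        struct aiocb cb;
        int fd = open("/tmp/data", O_RDONLY);

        memset(&cb, 0, sizeof(cb));
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = sizeof(buf);
        cb.aio_offset = 0;

        aio_read(&cb);                          /* enqueue the request     */
        while (aio_error(&cb) == EINPROGRESS)   /* poll until it completes */
            ;
        printf("read %zd bytes\n", aio_return(&cb));
        return 0;
    }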

As async I/O is being increasingly deployed in large applications and gaining in popularity, we will focus on a case study that involves developing two similar applications, one using libaio and the other using the POSIX glibc model. We will then compare the differences and performance issues between the two async I/O models.

The outline of the talk is

  1. Introduction and Advantages of AIO.
  2. Design of libaio library.
  3. Design of POSIX glibc AIO library.
  4. Case study on building applications which could benefit from asynchronous IO.
  5. A comparison of the differences and performance issues between applications implemented using libaio and POSIX glibc.

About the author:

Chinmay Albal works in the Linux Technology Center at IBM India Software Labs, Bangalore. He has been a Linux user since 2001 and currently works in the Linux Change team handling Linux kernel and library components.

ct_sync: ip_conntrack state replication by Harald Welte Thursday 14:30

With traditional, stateless firewalling (such as ipfwadm, ipchains) there is no need for special HA support in the firewalling subsystem. As long as all packet filtering rules and routing table entries are configured in exactly the same way, one can use any available tool for IP address takeover to accomplish the goal of failing over from one node to the other. With stateful firewalling based on connection tracking, however, the state of established connections also has to be replicated to the backup node, otherwise those connections break during failover.

The presentation will cover the architectural design and implementation of the connection tracking failover system (ct_sync).

About the author:

Harald Welte is the chairman of the netfilter/iptables core team.

His main interest in computing has always been networking. In the little time left besides netfilter/iptables-related work, he writes obscure documents like the UUCP over SSL HOWTO. Other kernel-related projects he has contributed to are User Mode Linux and the international (crypto) kernel patch.

He has been working as an independent IT consultant on projects for various companies ranging from banks to manufacturers of networking gear. During 2001 he lived in Curitiba (Brazil), where he was sponsored for his Linux-related work by Conectiva Inc.

Since February 2002, Harald has been contracted part-time by Astaro AG, who sponsor his current netfilter/iptables work.

Aside from the Astaro sponsoring, he continues to work as a freelance kernel developer and network security consultant.

Harald lives in Berlin, Germany.

Problems and goals of the Linux Input layer in 2.6 and beyond by Vojtech Pavlik Thursday 14:30

The Linux Input layer is a new implementation of keyboard, mouse, joystick, and other human input device drivers for Linux. It has been long in development, and has been added to the Linux kernel in version 2.5.

Today, in the 2.6 kernel, it is in widespread use, and its strengths and weaknesses are showing.

The input layer allowed for a much-needed generalization and abstraction of human input handling in Linux, creating a unified interface for all such devices. This allowed merging and removing several reimplementations of drivers for identical hardware across different architectures, easing maintenance.

The input layer, by moving input data processing into the kernel, also allowed support of multiplexer and pass-through devices found in notebooks, a benefit not anticipated when it was designed.

The drawbacks are mainly that it was not designed to be 100% compatible with the previous implementation used in the 2.4 kernel. This angers many a user upgrading from 2.4 to 2.6, even though backward compatibility was sacrificed to enhance functionality.

Thus, the development of the input layer is an ongoing battle between innovation and compatibility, trying to preserve present behavior as closely as possible while not being stuck with the existing design.

Another unpleasant surprise was that while kernel development can move forward rather quickly, applications that interface with it (as in the case of X or GPM) lag behind. This was a hard-learned lesson, and for the future a closer cooperation with application developers appears to be an absolute necessity.

Last, there are plans for the future: adding full hotplug and sysfs capabilities that will allow interfacing with D-BUS, better integration with the new X.org X servers, adding more drivers, and enhancing 2.4 compatibility even further.

About the author:

Vojtech Pavlik, born 1976 in Prague, Czech Republic, is the main author of the Linux Input layer. He spent some years at the Charles University in Prague, and is now continuing his studies at the Czech Technical University in the same city. Since 1999 he has worked for SUSE, developing the Linux kernel. His primary area of interest in the kernel is drivers, namely in the IDE, USB, networking and input subsystems.

TCP Connection Passing by Dr. Werner Almesberger Thursday 15:15

TCP Connection Passing (tcpcp) is an experimental mechanism that allows cooperating applications to pass complete ownership of TCP connection endpoints from one Linux host to another one. tcpcp can be used between hosts using different architectures and does not need the other endpoint of the connection to cooperate (or even to know what's going on).

Such functionality should be useful in load-balancing, process migration, and possibly also failover applications. tcpcp is not a complete process migration or load-balancing solution, but rather a building block that can be integrated into such systems. tcpcp is currently a proof of concept implementation.

tcpcp consists of a kernel patch that implements the operations for dumping and restoring the TCP connection endpoint, a library with wrapper functions, and a few applications for debugging and demonstration.

The paper describes the actual implementation of tcpcp and the connection passing procedure. It also briefly discusses the constraints arising from being compatible with unmodified TCP/IP peers, and possible extension to checkpointing.
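
To illustrate the idea only: a pair of wrapper calls along the following lines could serialize a connection endpoint on the old owner and recreate it on the new one. The function names tcpcp_get() and tcpcp_set() are purely hypothetical; the paper describes the actual library and kernel interface.

    /* Illustrative sketch; not the actual tcpcp API. */
    #include <stddef.h>

    void *tcpcp_get(int sockfd, size_t *len);       /* assumed: dump TCP state    */
    int   tcpcp_set(const void *blob, size_t len);  /* assumed: recreate endpoint */

    /* Old owner: stop using the socket, dump its state, send it away. */
    int hand_over(int sockfd, int ctrl_fd)
    {
        size_t len;
        void *blob = tcpcp_get(sockfd, &len);
        if (!blob)
            return -1;
        /* write(ctrl_fd, blob, len); the new owner reads the blob and calls
           tcpcp_set(blob, len) to take over the connection, while the remote
           peer never notices that the endpoint changed hosts. */
        (void)ctrl_fd;
        return 0;
    }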

About the author:

Werner Almesberger got hooked on Linux in the days of the 0.12 kernel, when studying computer science at ETH Zurich, and has been hacking the kernel and related infrastructure components ever since, both as a recreational activity and as part of his work, first during his PhD in communications at EPF Lausanne, and later also in industry. Being a true Linux devotee, he moved closer to the home of the penguins in 2002, and now lives in Buenos Aires, Argentina.

Contributions to Linux include the LILO boot loader, the initial RAM disk (initrd), the MS-DOS file system, much of the ATM code, the tcng traffic control configurator, and the UML-based simulator umlsim.

Looking at Bluetooth mice and keyboards by Marcel Holtmann Thursday 15:15

In May 2003 the Bluetooth SIG announced the Human Interface Device (HID) profile. This profile defines how wireless devices can discover the feature sets of input devices and communicate with them using the HID protocol. Further it defines how a Bluetooth device can support HID services using the Logical Link Control and Adaptation Protocol (L2CAP) as lower layer. The original HID specification was written for the Universal Serial Bus (USB) and defines protocols, procedures and features for keyboards, pointing devices, gaming devices, remote monitoring devices etc. The Bluetooth HID profile is an adaptation of this specification into a generic HID driver that can be used to control USB and Bluetooth input devices.

This paper covers the integration of the Bluetooth HID profile into Linux and into the Bluetooth subsystem of the kernel. It demonstrates the current way of getting Bluetooth mice and keyboards working with Linux. It also shows the work of transforming the current USB HID driver into a hardware-independent HID driver which can be used by the USB and Bluetooth subsystems at the same time.
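
In practice, connecting a Bluetooth keyboard or mouse with the BlueZ user-space tools looks roughly as follows; the hidd options shown are those commonly found in bluez-utils of that era and may differ between versions.

    # Find the device and let the HID daemon connect to it.
    hcitool scan                       # discover the keyboard's Bluetooth address
    hidd --search                      # connect to HID devices in range, or
    hidd --connect 00:11:22:33:44:55   # connect one specific device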

About the author:

Marcel Holtmann is the maintainer of the official Linux Bluetooth stack BlueZ (www.bluez.org). He has been working with Bluetooth since 2001 and has written most of the hardware drivers and the protocol implementations for Linux.

TProxy: NAT-based Transparent Proxying With Netfilter by Krisztián Kovács Thursday 16:30

Caching proxies and proxy firewalls are often-used network applications nowadays. Unfortunately, to make use of these devices, some reconfiguration of the clients is usually necessary, which may be problematic given a large number of clients and/or a complicated network topology. Transparent proxying is a technique that makes the presence of the proxy invisible to the user; however, it needs kernel support to provide full transparency. Linux 2.2 had support code built into the network stack, but it was removed by the developers before Linux 2.4 was released.

This paper introduces TProxy, a NAT-based approach to transparent proxying. This NAT-based implementation is (almost totally) decoupled from the core networking code, yet provides an easy-to-use API for proxies built using the BSD socket library.

The first part introduces the general requirements of transparent proxying. First, the proxy needs to be able to intercept connections originally sent to other hosts, while retaining information about the original destination. To handle more complex protocols with secondary connections, the rules describing these redirections have to be dynamically configurable. Additionally, forging source addresses of connections initiated by the proxy on behalf of the original clients is necessary to provide server-side transparency.

The second part gives an overview of the actual implementation. It begins with the introduction of the basic concept of using dynamic NAT mappings to forge source and destination addresses, then goes on to a more detailed discussion of how TProxy integrates with Netfilter and iptables. The core of the implementation is the new 'tproxy' iptables table, which provides both the traditional "table of rules" interface for the administrators, and a setsockopt()/getsockopt()-based API for proxy applications. The operations defined in the API are designed to be similar to those provided by the sockets API, so that existing applications do not need to be restructured. A brief overview of the internal state table of the tproxy table and the rules governing packet processing are also given, for both TCP and UDP. Besides the "tproxy" iptables module, TProxy still requires some changes to the core network stack and Netfilter, which are also outlined in this section.
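
As an illustration of the rule-based interface, redirecting web traffic to a local proxy could look like the rule below; the exact target name and its options are assumptions based on later versions of the patch, not taken from this paper.

    # Divert outbound HTTP to a transparent proxy listening on local port 8080.
    iptables -t tproxy -A PREROUTING -i eth0 -p tcp --dport 80 \
             -j TPROXY --on-port 8080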

The next part discusses a few drawbacks of the network-stack-decoupled implementation, along with possible solutions: timeout and synchronization issues between the network stack and Netfilter's connection tracking subsystem, dynamic allocation of port numbers when creating NAT mappings and early detection of conflicting addresses.

The operation of a transparent TFTP proxy is described as an example. The broader applicability of the concept is demonstrated by an implementation of the same functionality on Solaris. Finally, the general experiences of implementing and using TProxy are summarized, along with possible future improvements.

About the author:

Krisztián Kovács is a software developer specializing in low-level network programming, especially on Linux and Netfilter. He is a frequent contributor to the Netfilter community and the current developer and maintainer of the transparent proxying patches. He created an early proof-of-concept implementation of Netfilter stateful failover, and is the author of various NAT-related patches and extensions in Netfilter. Krisztián received his Master's degree from the Budapest University of Technology.

Balázs Scheidler is a software developer specializing in transparent, proxy-based firewall development, who originally implemented the TProxy patches and handed over maintainership to Krisztián about a year ago. He is a frequent contributor to various kernel and non-kernel development projects and is the primary author of syslog-ng and Zorp. Balázs received his Master's degree in IT engineering from the University of Veszprém.

Samba 4 status update by Volker Lendecke Thursday 16:30

In the last months Samba 4 has seen dramatic improvements in functionality. This talk will give an update on these recent developments. As this abstract is written (early June 2004), the following areas are worth looking at:

Much of the development of Samba 4 has gone into getting the infrastructure right. One of the most interesting pieces of this infrastructure is the so-called ldb library. LDB is a database library that sits somewhere between the simple tdb hash tables and a full LDAP database, but without the maintenance overhead that comes with LDAP. Tdb was a great success and helped Samba a lot in many areas. For example, the whole byte-range locking, oplock and share mode subsystems would not be where they are now without an efficient communication mechanism. LDB extends tdb with multi-indexing capabilities and most of the query language that LDAP offers. LDB currently comes with two storage mechanisms: tdb and LDAP. This means that a transition from a simple tdb user database to an LDAP server is as simple as redirecting the SAM backend from tdb to LDAP. Data migration is smooth, as LDB can export ldif data.

Another important area where Samba 4 has improved a lot is the MSRPC subsystem. This subsystem is responsible for advanced features such as the Samba domain controller as well as the NT-style printing subsystem. MSRPC consists of hundreds of remote procedure calls with their individual parameters. In the Samba 3 environment each individual call was hand-coded; each individual byte had to be fetched from the wire and put back in the reply. Samba 4 brings us to the point where we should have been years ago: all the byte-stuffing is now auto-generated by an IDL (Interface Definition Language) compiler. This makes development of new RPC calls and subsystems vastly easier and less error-prone. The first target of development is the NT-compatible PDC functionality with an ldb-based user database backend. As fast as development is currently proceeding, at the time of this talk I hope to be able to present a Samba 4 PDC that is more capable than Samba 3.

The last part of my talk will cover architectural issues with Samba 4 as an Active Directory Domain Controller. Given that the database Samba 4 is using for its users is very close to an LDAP interface, you might hope that Samba 4 will be able to become something that persuades an XP workstation to download Group Policies. This means that XP sees Samba 4 as an AD Domain Controller. The missing pieces are Kerberos and LDAP. Both OpenLDAP and Heimdal Kerberos offer flexible backend interfaces for their data store, so it should be possible to have them agree upon a common user database. This talk will show the (then) current development in that area.

About the author:

Volker Lendecke has been a Samba core team member since about 1994 and is the original author of the Linux smbfs and ncpfs file systems. He is a co-founder of SerNet Service Network GmbH in Göttingen, actively doing consulting and development of Samba in customer environments, and has given many talks at various European conferences.

pktgen the linux packet generator by Robert Olsson Thursday 17:15

pktgen is a high-performance testing tool included in the Linux kernel. As part of the kernel it can test the TX path of the running system, including the device driver, chip, etc. Wired to other network devices it can work as an ordinary packet source to test any network device such as routers or bridges, including the Linux network stack.
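
A typical 2.6-kernel pktgen session is driven entirely through /proc/net/pktgen; the parameter names below follow the in-kernel documentation and may vary slightly between kernel versions.

    modprobe pktgen
    echo "add_device eth1"  > /proc/net/pktgen/kpktgend_0   # bind eth1 to the CPU 0 thread
    echo "count 1000000"    > /proc/net/pktgen/eth1         # number of packets to send
    echo "pkt_size 60"      > /proc/net/pktgen/eth1
    echo "dst 10.0.0.2"     > /proc/net/pktgen/eth1
    echo "dst_mac 00:11:22:33:44:55" > /proc/net/pktgen/eth1
    echo "start"            > /proc/net/pktgen/pgctrl       # run the test
    cat /proc/net/pktgen/eth1                                # per-device results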

About the author:

Robert Olsson has for a couple of years been involved in the development and testing of the Linux network stack architecture and device drivers, with a particular focus on packet forwarding. He also has experience from Linux router deployment in high-speed production environments.

Speeding up the Linux boot process with minit by Felix von Leitner Thursday 17:15

The typical Linux boot process consists of dozens of shell scripts spawning hundreds of processes, and service startup is serialized. This makes the Linux boot process very slow.

Switching to minit reduced post-kernel initialization time from over 20 seconds to about 2 seconds on my notebook.

minit also offers dependencies, the static binary is only 8k, it can work on read-only boot media, is reconfigurable at run time, has built-in logging support, can respawn services that died, and requires neither /proc to be mounted nor System V IPC nor Unix domain sockets. minit is primarily meant for embedded systems but is particularly useful for servers as well.
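
For orientation, a minit service is configured as a small directory of files rather than a shell script; the layout sketched below reflects typical minit setups and is not quoted from the paper, so details may differ.

    /etc/minit/sshd/run      -> /usr/sbin/sshd   # what to start (symlink or executable)
    /etc/minit/sshd/params                       # command-line arguments, one per line
    /etc/minit/sshd/depends                      # services to start first, e.g. "network"
    /etc/minit/sshd/respawn                      # empty flag file: restart the service if it dies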

Windows XP goes to great lengths to make the boot process look fast. It is time we do the same. The minit speedup is so substantial that it can cause a real change in how you use your computer. The author found minit boot times faster than suspend-to-disk (and even suspend-to-RAM) wake-up times, so he never uses those features.

About the author:

Felix started several free software projects geared towards small software for embedded environments (or just for people who are dogmatic about complexity reduction). The most well known project is the diet libc, a small libc for Linux.

Other noteworthy accomplishments are

  • libowfat (a small abstraction layer and replacement for the most insultingly bad libc APIs)
  • gatling (one of the fastest web servers around, if not the fastest)
  • research into scalability APIs for fast and scalable network programming (gatling is the testbed for this research)
  • tinyldap (a very fast and small read-only LDAP server in the style of tinydns from djbdns).

Felix also founded a small security consulting company called Code Blau GmbH with a friend. Call us if you are in need of a good security consultant.

Free and Open Source Software: A retrospective from 2091 by Jon "maddog" Hall Friday 9:30

It is 100 years since Linus Torvalds sent out the first few pieces of code for what became the world's most popular operating system. How has this changed the software climate around the world? What have we done differently in the last 100 years because of this? Could we have followed another path? We have pulled out of suspended animation one of the first users of Free and Open Source code to describe the issues of that day and how they were resolved.

Jon "maddog" Hall

About the author:

Jon "maddog" Hall was the Executive Director of Linux International, one of the early driving forces behind Linux. Deemed redundant in the year 2015, he was put into suspended animation, only to be revived every twenty-five years to speak about the evolution of Free and Open Source Software.

A new Cluster Resource Manager for Heartbeat by Lars Marowsky-Brée Friday 10:15

The Linux HA project has made a lot of progress since 1999. Its main application, heartbeat, is probably one of the most widely deployed two-node failover solutions on Linux, and has proven to be very robust. Linux HA not only has a large user base, but its modular nature has also attracted many developers and companies. However, a lot of work remains to be done and some is even ongoing (though patches are always accepted).

This talk outlines the key features and design of a cluster resource manager running on top of, and enhancing, the Open Clustering Framework infrastructure provided by heartbeat, and of course covers the current status of the sub-project.

The goal is to allow flexible resource allocation and globally ordered, dependency-based recovery actions in a cluster of N nodes and dynamic reallocation of resources in case of failures (fail-over) or in response to administrative changes to the cluster (switch-over).

It will provide insight for potential contributors looking for a great project to work on, and also explain how those key features are actually useful in real life to model complex HA scenarios.

About the author:

Lars Marowsky-Brée currently works as a developer for SUSE Labs. His main area of expertise is High Availability and cluster-related topics, ranging from storage (multipathing, RAID and replication) through network load balancing to cluster infrastructure services, resource management and administration. As a realist (people who call him a pessimist already do not want to know what that would be like) and natural paranoid, he enjoys these topics a lot.

Using Linux since 1994, his initial involvement with network operations (aka BOFH) provided him with lots of real-life experience about the various reasons for service outages and the one common factor. He soon began to appreciate the complexities in keeping a service running in the face of malicious software, possessed hardware, well-intentioned users and the world at large and loves to rant about it; this has kept him employed and invited to conferences ever since.

In early 2000, he took the chance to work on Linux High Availability exclusively and joined SuSE.

NX/NoMachine -- GPL software for fast, responsive remote GUI (even across modem) by Kurt Pfeifle Friday 10:15

NX from NoMachine needs only 40 kbit/s of modem bandwidth to display a complete KDE desktop with Konqueror (file manager), KMail (mail client), Mozilla (web browser) and OpenOffice (word processor), and still offers very good responsiveness.

It does so by using a new, extremely effective type of specialized X11 compression, an intelligent caching mechanism, and a new design that eliminates the dreaded X11 round trips which make normal remote GUI use via X11 so painfully slow.

The talk and presentation will show how each of these mechanisms works. We will see "live" different types of links to a remote Linux server displaying a GUI. Some sessions will use only part of the NX technologies (for example, with and without round-trip suppression). This will demonstrate the weight of each part of the NX technology in the complete setup.

Though NX can also access remote Windows Terminal Server sessions very fast, we will only touch on that feature briefly. We will also look only briefly at the commercial NX implementation by NoMachine and concentrate on the command-line way to run NX using only the GPL'd part.

If it is ready, we will also see the KDE implementation of the NX technology, which forms a Free and Open Source terminal server for cross-platform access, compatible with the commercial NX from NoMachine.

About the author:

Kurt works as a system engineer at Danka Deutschland GmbH. His job includes consulting and training related to network printing, IPP (Internet Printing Protocol), and migrating heterogeneous networks to Linux print servers (with the help of CUPS and Samba).

Kurt's original involvement in Free and Open Source Software was heavily based on printing problems. Lately this has changed somewhat. Ever since he first came across NX, he has been fascinated by the unique and new ways of using computers it offers. To him, NX represents a truly disruptive technology which will make its impact not only on the Unix, but also on the Windows and Mac OS X platforms.

NFS: The Greatest Networked File System by Olaf Kirch Friday 11:30

This is a general talk about NFS, its history, its problems, and future. The talk will be technical, discussing common problems in networked file systems, and how they were solved (or ignored) in various NFS protocol versions.

The talk will also cover NFSv4, and briefly touch on other networked file systems.

The target audience is people interested in network file systems.

About the author:

Olaf Kirch has been a Linux contributor for over 10 years, authoring the Linux Network Administrator's Guide, and large portions of the NFS kernel code.

Legal enforcement of the GPL by Harald Welte Friday 11:30

More and more vendors of various computing devices, especially network-related appliances such as routers, NAT gateways and 802.11 access points, are using Linux and other GPL-licensed free software in their products.

While the Linux community can look at this as a big success, there is a flip side to that coin: a large number of those vendors have no idea about the GPL license terms, and as a result do not fulfill their obligations under the GPL.

The netfilter/iptables project has been taking legal action against a number of companies in violation of the GPL since December 2003. Those legal proceedings have been quite successful so far, resulting in a number of amicable agreements and one granted preliminary injunction.

This presentation will cover

  1. techniques used for reverse engineering and discovering the use of Linux
  2. theoretical options an author has to enforce his copyright
  3. practical implications
  4. success stories so far
  5. how to make potential GPL enforcement easier while writing software

About the author:

Harald Welte is the chairman of the netfilter/iptables core team.

His main interest in computing has always been networking. In the little time left besides netfilter/iptables-related work, he writes obscure documents like the UUCP over SSL HOWTO. Other kernel-related projects he has contributed to are User Mode Linux and the international (crypto) kernel patch.

He has been working as an independent IT consultant on projects for various companies ranging from banks to manufacturers of networking gear. During 2001 he lived in Curitiba (Brazil), where he was sponsored for his Linux-related work by Conectiva Inc.

Since February 2002, Harald has been contracted part-time by Astaro AG, who sponsor his current netfilter/iptables work.

Aside from the Astaro sponsoring, he continues to work as a freelance kernel developer and network security consultant.

Harald lives in Berlin, Germany.

Journalled quota by Jan Kara Friday 12:15

As time goes on, processors get faster and faster, disks bigger and bigger, and the bandwidth of the buses falls behind. Hence a filesystem check on current disks often takes more than an hour, and it becomes uncomfortable even for a regular user. One way of eliminating the need for a long filesystem check is a method called "journalling". This method keeps a log of recently performed operations (the journal), and after an unclean shutdown it is enough to scan the log and finish (or cancel) the operations which are not complete (this is called "journal replay").

Problems similar to the long filesystem check times also arise in the case of quotas. After an unclean shutdown, the files holding quota information need to be corrected, which requires scanning the whole disk to compute the space and the number of inodes each user (or group) uses. Fortunately, journalling helps in this case as well.

In the paper we first describe basic VFS internals and how the quota subsystem is connected to the rest of the VFS. Then we show the internal structure of the quota subsystem, including some information about the "quota formats layer" implementing the writing of quota information to disk. Afterwards we explain the ideas of journalling and briefly describe its implementation in the Linux kernel. Finally, we present in detail the implementation of journalled quotas in recent 2.6 kernels and its interaction with filesystems (ext3 and reiserfs). We discuss some issues arising from the deletion or truncation of files during journal replay and also ways of avoiding deadlocks due to recursion into the filesystem.
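
For context, journalled quota on a 2.6 kernel is enabled through mount options rather than a separate tool; the option names below are those used by ext3, with the "vfsv0" quota format assumed.

    # Mount with journalled quota and turn it on once the quota files exist.
    mount -o usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0 /dev/sda5 /home
    quotacheck -cug /home     # create/initialize the quota files (first time only)
    quotaon /home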

In the end we present some ideas about new features the quota subsystem could provide.

About the author:

I first saw Linux installed on my friends' computers in 1996. Shortly after that I started using it myself. Somehow I was fascinated by filesystems, so I read the kernel sources and wrote my own filesystem. While reading the sources I found an area which was not maintained, wrote a few fixes and became the maintainer of the quota subsystem. In 1998 I got an offer from SuSE to work on the kernel and get a salary for it, so I accepted. Currently I am working in SuSE Labs as a part-time kernel developer and I am also a PhD student of Discrete Mathematics at Charles University in Prague.

The kexec Way to Lightweight Reliable System Crash Dumping by Hariprasad Nellitheertha Friday 12:15

Most Operating Systems have had reliable crash dumping solutions to suit OS and platform specifications. In the case of Linux, however, it has not been possible to develop a comprehensive solution. This is because Linux caters to a wide variety of platforms, devices and usage environments. A common dumping solution which addresses the needs and limitations of these situations is not practical.

The main issues with regard to existing solutions (LKCD, RedHat netdump & diskdump, etc) are:

  • Dependency on the "failed" kernel to perform dumping
  • No standardization with regard to dump formats leading to a plethora of analysis tools such as crash, lcrash, gdb, etc.
  • Installation overheads arising from the need for a pre-configured "dump device" such as a disk partition or a separate server and a specialized dump gathering tool
  • Size and complexity of code executed from the "failed" kernel

The availability of kernel-to-kernel bootloaders has opened up interesting possibilities in addressing these issues. One such feature available on the 2.6 kernel is "kexec". kexec is a fast reboot feature which makes it possible to load and boot a new kernel from an existing one. In addition, it avoids the firmware reboot stage and thus preserves memory across reboots. Crash dumping tools can exploit this behaviour of kexec.

In this paper, we talk about a new crash dumping tool we are developing as a next-generation first failure data capture (FFDC) facility for Linux. This new tool uses the kexec feature to reboot to a new kernel upon an OS failure such as an oops or panic. System memory is preserved across this reboot. Additionally, the second, possibly custom-built, kernel boots with a very small amount of system memory. The rest of the memory is accessible as a high memory device in the context of the second kernel. This memory can now be accessed as just another device. This allows a surprising amount of flexibility for independent utilities to gather or analyse specific FFDC data from the failed kernel, ranging from pending syslog buffers to specific subsystem memory state at the time of failure.

Additionally, we also provide interfaces in the /proc file system to view the preserved memory in the form of an ELF format file. Such an abstraction makes it possible to treat the crash dump as an ordinary file. Saving a dump is now a matter of a file copy (locally or across networks) using standard file transfer commands. This completely eliminates the need for a separate dump saving utility and also eliminates the need for a pre-configured dump device.

Further, presenting the dump as ELF format files means standard core dump analysis tools such as GDB or "crash" can be used to analyse these dumps, simplifying adoption.
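
A possible workflow with such a facility is sketched below; the option names and the /proc path are assumptions based on what this approach later became in kdump, not necessarily the exact tool described in the paper.

    # Preload a capture kernel that will be booted by kexec on a panic.
    kexec -p /boot/vmlinuz-capture --append="root=/dev/sda1 maxcpus=1"
    # ... the system crashes, kexec boots the capture kernel, memory is preserved ...
    cp /proc/vmcore /var/crash/vmcore     # saving the dump is a plain file copy
    gdb /boot/vmlinux /var/crash/vmcore   # analyse it with standard ELF tools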

Separate /proc views of filtered crash dump memory contents could be provided making it possible to extract a "kernel pages only", "entire memory" or "all except free pages" version of the dump without additional pre-dumping configuration or setup. This is especially important for bringing dumps down to manageable sizes on machines with several gigabytes of memory.

Our work-in-progress code indicates that this approach is only minimally invasive to the core kernel code and is small and lightweight compared to the other implementations in existence.

We discuss the status of this effort, related work, the benefits that this approach has over other implementations and the problems that it solves.

About the author:

Hariprasad Nellitheertha works at IBM India Software Labs as a member of the Linux Technology Center. An Electronics and Communications Engineer from the University of Mysore, Hari joined IBM in September 1999. Hari was first exposed to operating systems when he worked on OS/2 kernel and file systems, specifically on the HPFS386 and OS/2 JFS file systems. Hari has been with the Linux Technology Center since 2003. Initially, Hari was with the Linux Defect Fixing Support team, before joining the RAS team. Hari works in the Linux Kernel Crash Dumps project.

Linux Cluster Logical Volume Manager by Heinz Mauelshagen Friday 14:00

Red Hat open-sourced all formerly commercial products of Sistina Software, a Minneapolis-based storage and clustering solutions provider, at the end of July, following the acquisition in January 2004.

Besides GFS (the Global File System), the Cluster Logical Volume Manager was open-sourced under the (L)GPL, which allows logical volume management of shared Physical Volumes, Volume Groups and Logical Volumes in shared storage clusters.

In 2002 Sistina introduced LVM2 and Device-Mapper as an LVM1-compatible, feature-enhanced next-generation Logical Volume Manager with atomic metadata transactions for enhanced resilience and full configurability of device name filters, metadata file stores, logging levels, etc. The open-source CLVM is a superset of LVM2/Device-Mapper.
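
The command set is the familiar LVM2 one; in the clustered case the same commands are assumed to operate on shared storage with cluster-wide locking enabled in lvm.conf.

    # Ordinary LVM2 usage; CLVM runs the same commands against shared volumes.
    pvcreate /dev/sdb1 /dev/sdc1
    vgcreate vg_shared /dev/sdb1 /dev/sdc1
    lvcreate -L 20G -n data vg_shared
    lvextend -L +5G /dev/vg_shared/data    # online resize of the logical volume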

The talk will cover the architecture and use of CLVM/Device-Mapper and its relationship to Cluster Infrastructure components used.

About the author:

Heinz Mauelshagen is the author of LVM1 and works on Device-Mapper, LVM2 and CLVM.

mISDN - a modular ISDN driver stack by Karsten Keil Friday 14:00

mISDN is a new ISDN driver for passive ISDN controllers, based on the ITU ISDN stack layout. It is intended to become the successor of the HiSax driver.

It implements a CAPI 2.0 stack in a modular way. Layers of the stack can be realized in kernel or user space, so more complex protocols can be implemented and debugged in user space. The user-space layer interface may also be used for special applications, such as ISDN protocol analyzers and testers. The standard CAPI 2.0 application interface allows easy porting and development of applications.

I will talk about the current ISDN driver situation and the reasons for developing mISDN. I will give a description of the driver layout, the basic components and the implemented interfaces between the driver components. An overview of the already released parts and some applications will complete the talk.

About the author:

Karsten Keil has been a Linux developer since 1995; his special area is writing drivers for passive ISDN cards (HiSax). In 1997 he held the ISDN4Linux workshop with Fritz Elfert during Linux-Kongress in Würzburg. Since 1999 he has worked for SuSE Linux AG as a developer and member of SuSE Labs, with a main focus on ISDN and communication drivers/applications.

Cluster Snapshot Block Device by Daniel Phillips Friday 14:45

One of the central requirements of an enterprise-class filesystem is live backup, that is, the ability to snapshot a consistent image of a filesystem at any instant in time, then copy the snapshot to backup media while the filesystem continues to operate normally, possibly under heavy load. As part of its Device Mapper subsystem Linux provides a snapshot facility that works at the block device level.

This work extends the concept to cluster block devices, which may be accessed simultaneously by a cluster filesystem mounted on many separate cluster nodes. Some deficiencies of the existing single-node snapshot virtual device are corrected: data blocks are shared between individual snapshots; data for all snapshots is unified into a single snapshot store volume, and performance is kept nearly constant regardless of how many snapshots are held simultaneously.

This is accomplished by providing distributed access to a btree database resident within the snapshot store volume. For each node, a simple kernel client runs under the device mapper subsystem, while a single, relatively complex server runs in userspace, providing synchronization and device access information to clients over a network. Some interesting design techniques are applied in order to correctly simulate the reliability characteristics of a single, physical storage device under numerous possible cluster failure modes.
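
For reference, the existing single-node facility that this work extends is driven through device mapper; its snapshot target takes an origin device, a copy-on-write store, a persistence flag and a chunk size, as sketched below (device names are illustrative).

    # Snapshot /dev/vg/home into a previously created COW volume and mount it read-only.
    dmsetup create home-snap --table \
        "0 $(blockdev --getsz /dev/vg/home) snapshot /dev/vg/home /dev/vg/cow P 8"
    mount -o ro /dev/mapper/home-snap /mnt/backup   # copy a consistent image from here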

About the author:

Daniel Phillips has contributed to various filesystem and memory management subsystems of the Linux kernel over the past years. Lately he has acted as a liaison between the proprietary world of Sistina GFS and the Linux kernel world, including the OpenGFS project, which famously carried on from Sistina's early GPL release of GFS. Thanks to Sistina for funding it and to Red Hat for setting it free, the work described in this paper is now available to everyone.

Persistent device names with udev by Ihno Krumreich Friday 14:45

The advantages and shortfalls of persistent device nodes have been discussed several times over the last years, and as yet no generally accepted solution has been found.

This is partly due to the lack of a flexible framework which allows for a naming scheme to be implemented, partly due to a missing scheme which would allow for a persistent device naming of all connected hardware and which fits the most common requirements, namely desktop and enterprise installations.

The introduction of the Linux Device Model in Linux 2.6, with its automatic generation of hotplug events, together with the user-space daemon udev, now makes it possible to easily generate arbitrary device names which can be used to implement persistent device names.

This paper presents a persistent device naming scheme in which persistent device names are introduced as symlinks to the existing device nodes. These symlinks reside in subdirectories of the /dev tree so as not to interfere with existing device names. To cater for different installation requirements, two naming schemes (called 'access methods') are provided: one identifying a device by the hardware connected to it (access method 'by-id'), and a second identifying a device by the (hardware) path to the connected device (access method 'by-path').

The generation rules for the access-methods are presented in this paper, together with an analysis of the advantages and pitfalls for those methods.
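
A by-id rule in this scheme could look roughly like the following; the rule is illustrative only, since the exact udev rule keys and operators varied between early udev releases.

    # Create /dev/disk/by-id/<scsi id> as a symlink to the detected SCSI disk.
    BUS="scsi", KERNEL="sd*", PROGRAM="/sbin/scsi_id", \
        SYMLINK="disk/by-id/%c"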

This persistent device naming scheme has been implemented in the upcoming SUSE SLES9 release and is presented here as a base upon which a generally accepted persistent device naming scheme could be introduced.

