
Linux-Kongress 2005
12th International Linux System Technology Conference
October 11-14, 2005 at University of Hamburg, Germany


Papers

The following papers will be presented at the Linux-Kongress on Thursday, October 13 and Friday, October 14. All talks are in English. (The original Call for Papers.)

Linux-HA Release 2: World Class High-Availability Software by Alan Robertson Thursday 11:30

Linux-HA is the oldest, best-known, best-tested, and most widely written-about open source high-availability suite for Linux. For many years, it was limited to two nodes and didn't monitor services for correct operation.

The recently released Linux-HA Release 2 extends the capabilities of Linux-HA far beyond anything available in the past, and provides basic capabilities comparable to any commercial HA package. This release provides integrated resource monitoring, larger clusters, 1-node clusters, sophisticated resource dependency models, and integration with cluster filesystems like Red Hat's GFS and Oracle's OCFS2.

This talk will give an overview of release 2, explaining these new features and how to configure them, and provide an overview of features coming in the near future.

About the author:

Alan Robertson founded the High-Availability for Linux project where he has been an active developer, architect and project leader since about 1997. He maintains the Linux-HA project web site at http://linux-ha.org/, and has been a key developer for the open source heartbeat program. In the open source world, he worked for SuSE for a year, then joined IBM's Linux Technology Center in March 2001. Alan also jointly leads the Open Cluster Framework effort to define standard APIs for clustering, and provide an open source reference implementation of these APIs.

Before joining SuSE, he was a Distinguished Member of Technical Staff for Bell Labs. He worked for Bell Labs 21 years in a variety of roles. These included developing telecommunication products, designing communication controllers and providing leading-edge computing support.

He obtained an MS in Computer Science from Oklahoma State University in 1978 and a BS in Electrical Engineering from OSU in 1976.

Programming with the Netpoll API by Jeffrey Moyer Thursday 11:30

The netpoll API provides a framework for implementing kernel UDP clients and servers that operate outside of the Linux kernel's network stack. Because it does not use the network stack, netpoll is able to send and receive packets in situations where normal packet delivery would not be possible. An example of this is when the system is quiesced for debugging or when taking a crash dump.

Netpoll requires each underlying device driver to implement a poll_controller hook. The contents of this hook are essentially the same across all drivers, but will vary slightly depending on whether a particular driver is written to use the New API (NAPI). The one exception to this rule is the bonding driver. Because the bonding driver is a virtual device driver, ushering traffic to real devices based on policy, it requires further hooks into the netpoll code to send and receive packets.

This paper explores the design and implementation of the netpoll API. A necessary primer on relevant portions of the network driver interface is presented. A comparison is made between netpoll and the low level networking code that it emulates. The changes made to the core network stack to accommodate netpoll are also explained. Using the information presented, the netconsole implementation is extended to support reading input from a remote server.
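
To give a flavour of what such a hook looks like, here is a minimal sketch of a poll_controller implementation for a hypothetical 2.6-era driver ("mydrv" and its interrupt handler are invented; real drivers differ in detail):

  #include <linux/netdevice.h>
  #include <linux/interrupt.h>

  /* the driver's normal interrupt handler (2.6-era signature) */
  static irqreturn_t mydrv_interrupt(int irq, void *dev_id, struct pt_regs *regs);

  #ifdef CONFIG_NET_POLL_CONTROLLER
  /* Called by netpoll when normal interrupt delivery may be unavailable:
   * run the driver's own interrupt handler with the IRQ line disabled. */
  static void mydrv_poll_controller(struct net_device *dev)
  {
          disable_irq(dev->irq);
          mydrv_interrupt(dev->irq, dev, NULL);
          enable_irq(dev->irq);
  }
  #endif

  /* hooked up at initialisation time, next to the other net_device methods:
   *     dev->poll_controller = mydrv_poll_controller;                       */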

About the author:

Jeff Moyer is a senior software engineer at Red Hat, Inc., who has been using Linux since 1995. In his formative years, he worked on high performance cluster computing infrastructure at Worcester Polytechnic Institute. He then went on to implement high availability cluster software such as Kimberlite, Convolo Dataguard, Convolo Netguard, and other solutions in the embedded device space. Jeff has since moved on to a mixed bag of hacking, including the Linux automounter, the netpoll API, Red Hat's netdump utility, and the AIO subsystem.

DRBD 8 by Philipp Reisner Thursday 12:15

DRBD is a well-established Linux software component for building HA (high availability) clusters out of common off-the-shelf hardware. DRBD's key point is to replace a shared storage system with online mirroring. In the presentation and the paper we will describe DRBD 8's new features, which are a major step forward for shared-nothing HA systems.

The most outstanding new feature is certainly our new "shared disk" mode, i.e. support for shared-disk file systems and other shared storage-aware applications.

Provided that you use one of OpenGFS, GFS, or OCFS2, this means that applications have read and write access to the mirrored dataset on both nodes at the same time, since you can mount the storage on both peers.
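
For illustration only (not taken from the paper), a dual-primary DRBD 8 resource might be configured roughly as follows; host names, devices and addresses are invented:

  resource r0 {
    protocol C;
    net {
      allow-two-primaries;        # the new "shared disk" mode
    }
    on alpha {
      device    /dev/drbd0;
      disk      /dev/sda7;
      address   10.0.0.1:7788;
      meta-disk internal;
    }
    on bravo {
      device    /dev/drbd0;
      disk      /dev/sda7;
      address   10.0.0.2:7788;
      meta-disk internal;
    }
  }

After 'drbdadm primary r0' has been issued on both nodes, a shared-disk file system on /dev/drbd0 can be mounted on both peers at the same time.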

Besides that, DRBD 8 supports effective resource-level fencing. It prevents the divergence of the nodes' copies of the data when DRBD's communication path fails.

Although DRBD has been widely used and under active development for 5 years now, it turned out that the method to determine the node with the up-to-date data was suboptimal. A new scheme based on UUIDs will be presented. This new scheme also works correctly in multi-node setups.

About the author:

Philipp graduated from the Vienna University of Technology in computer science in 2000. Since November 2001 he has been MD at LINBIT, a provider of professional Linux services with a focus on high availability clustering.

Lars-Gunnar Ellenberg joined DRBD development in 2002. Since then he has become a co-author, and is now employed at LINBIT.

Status of IPv6 in Linux by Dr. Peter Bieringer Thursday 12:15

This talk will give an overview of the status of the IPv6 implementation in the Linux kernel, the C libraries, and client/server networking programs.

For the kernel, the status will be shown together with information about the history and ongoing work. In addition, the status of IPv6 firewalling, which is important nowadays, will be covered.

For the C libraries, the status concerns the DNS resolver and RPC support.

For client/server networking programs, examples will show which applications already support IPv6 natively or via additional patches.

Next, some examples will be given of how to enable IPv6 in Linux, get IPv6 connectivity, and set up a permanent configuration. Afterwards, some examples show how easy it is to enable IPv6 in applications (clients and servers).
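
As a hint of what "enabling IPv6 in applications" typically amounts to, the following minimal sketch (not from the talk) uses getaddrinfo() with AF_UNSPEC, so the same client code works over IPv6 and IPv4:

  #include <sys/types.h>
  #include <sys/socket.h>
  #include <netdb.h>
  #include <string.h>
  #include <unistd.h>

  /* connect to host/service over IPv6 or IPv4, whichever works first */
  int open_connection(const char *host, const char *service)
  {
          struct addrinfo hints, *res, *ai;
          int fd = -1;

          memset(&hints, 0, sizeof(hints));
          hints.ai_family   = AF_UNSPEC;    /* both IPv4 and IPv6 */
          hints.ai_socktype = SOCK_STREAM;

          if (getaddrinfo(host, service, &hints, &res) != 0)
                  return -1;

          for (ai = res; ai != NULL; ai = ai->ai_next) {
                  fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
                  if (fd < 0)
                          continue;
                  if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
                          break;            /* success */
                  close(fd);
                  fd = -1;
          }
          freeaddrinfo(res);
          return fd;
  }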

Last but not least, a summary and outlook describe what to expect in the future and point to additional informational resources.

About the author:

The author started his IPv6-related work in 1996. Since 2000 he has been employed at AERAsec Network Services and Security GmbH as a security consultant and trainer for several courses, including IPv6. He is also the publisher of several IPv6-related documents on the World Wide Web, such as the 'IPv6 & Linux - HowTo', its successor the 'Linux IPv6 HOWTO', and 'IPv6 & Linux - Current Status'. He also develops the tool 'ipv6calc' and implemented the IPv6 support in 'initscripts', which is used in Red Hat Linux / Fedora Core and clones. In addition, he is a co-founder and core member of 'Deep Space 6' and a member of the 'German IPv6 Task Force'.

Cluster synchronization with csync2 by Clifford Wolf Thursday 14:30

Csync2 is a cluster synchronization tool. It can be used to keep files on multiple hosts in a cluster in sync. Csync2 can handle complex setups with many more than just two hosts, handle file deletions, and detect conflicts. It is well suited for HA clusters, HPC clusters, COWs (clusters of workstations), and server farms.

Usually, the job csync2 does is seen as a trivial task, and so it is most often solved with small shell scripts and tools such as rsync or scp. But those solutions do not address the three most difficult issues in synchronizing files in a cluster:

1. Conflict detection

The trivial rsync-based method for syncing the files does not detect a conflict if a file has been modified on both hosts. Usually there also is no simple way of replicating file removals.

2. Complex setups

The two-node scenario is a very simple but not a very realistic one. Usually there are various intersecting groups of servers, some files being replicated between all hosts and other files only between a smaller set of servers.

3. Reacting to updates

In many cases it is not sufficient to simply replicate files. Instead, it may be necessary to execute arbitrary commands in reaction to files matching a pattern being updated. For example, in a web-server cluster the Apache configuration files should be synchronized between the servers, and an 'apachectl graceful' should be executed whenever the Apache configuration has changed (see the configuration sketch below).
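
A csync2 configuration for that web-server example might look roughly like the sketch below; host names, paths and option values are invented, so consult the csync2 documentation for the exact syntax:

  group webfarm {
          host web1 web2 web3;
          key  /etc/csync2.key_webfarm;

          include /etc/apache2;
          exclude *~;

          action {
                  pattern /etc/apache2/*;
                  exec    "/usr/sbin/apachectl graceful";
                  logfile "/var/log/csync2_action.log";
          }
  }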

Csync2 addresses these (and many other) issues and so provides an important foundation for professional Linux clustering.

I have developed csync2 as a complement to DRBD (which has also been developed at LINBIT Information Technologies). While DRBD does synchronous replication of block devices between two nodes for fail-over clusters and is usually used e.g. for databases, csync2 does asynchronous replication of files between many nodes for all kinds of clusters and is usually used for configuration files and application images.

About the author:

Clifford Wolf works as a software developer at LINBIT Information Technologies in Vienna, Austria. He is well known for his free software projects such as ROCK Linux, a tool-chain for building Linux distributions.

In his remaining spare time he is press relations officer for the Austrian branch of the Chaos Computer Club and is moderating his own radio show 'Nerd on Air' on 'Radio Orange'.

Dynamic Device Management - udev as a kernel event manager by Dr. Hannes Reinecke Thursday 14:30

Every new computer generation increases the amount of hardware that can be connected and disconnected at any time on a running system. This requires a migration from static system setups to an event-driven model, to dynamically adapt to the changing system environment.

The "Hotplug" Subsystem

The traditional Linux hotplug subsystem consists of a set of shell-scripts, called agents. If the kernel detects a new device or if a device is removed, the kernel forks an event handling program which dispatches the event to one of the agents. The agent will trigger the load of the appropriate kernel-module and possibly configure the device.

Restrictions

However, with full hotplug event support for all registered devices, this mechanism is showing its age and has some drawbacks. With the ever growing number of dynamic devices, the execution of the hotplug dispatcher can lead to serious problems. The events may arrive out of order, and the system may be left in the state the last event signified instead of representing the actual state of the kernel. Modern power management and suspend/resume support interacts very badly with kernel-forked helper processes while it tries to change the system's power state. A very large number of event processes can lead to memory shortage or even a machine freeze. The script-based system is known to be time and resource consuming, resulting in long processing times for a single event.

Integrated Device Management System

This situation can be vastly improved by integrating the hotplug event processing into the udev event handling.

The primary goal of udev is to keep the device nodes in /dev in sync with the currently known kernel devices. To create a device node, udev receives a uevent from the kernel and matches the event properties with a set of rules to decide if and how a resulting device node is created and named. As udev already has the capability to execute arbitrary programs for any kernel event, it is possible to design a hotplug subsystem integrated into udev. Such a system has far finer-grained control over the actions which have to be taken for a specific device than the traditional Linux hotplug subsystem ever could.
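
To give a flavour of such rules (an invented example, written in current udev rule syntax, which may differ slightly from the version discussed in the talk): a rule matches properties of the uevent and then names the node, adds symlinks, or runs a program:

  # create a convenience symlink and run a helper when the device appears
  KERNEL=="sdb1", SUBSYSTEM=="block", SYMLINK+="backup", RUN+="/usr/local/sbin/disk-added.sh"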

This paper presents a fully integrated udev-based device management system; a performance comparison against the traditional hotplug subsystem is given.

About the author:

Hannes Reinecke studied physics with a main focus on image processing in Heidelberg from 1990 until 1997, followed by a PhD at Heriot-Watt University in Edinburgh. He worked as a sysadmin during his studies, mainly at the Mathematical Institute in Heidelberg.

He has been a Linux addict since the earliest days (0.95), contributing various patches to get Linux up and running. He now works for SUSE Linux Products GmbH supporting IBM's S/390 architecture on Linux. His main points of interest are SCSI, multipathing, udev and device configuration. And S/390, naturally. Plus the odd obscure piece of hardware like DEC Alpha, for an ever decreasing number of machines.

High Availability and Load Balancing Cluster for Linux Terminal Services by Wolfgang Büch Thursday 15:15

Combining Linux Terminal Services with a high availability environment provides a centralized and redundant network management platform for Linux thin clients.

Three years ago the Regional Computer Center (RRZ) of the University of Hamburg started a project to migrate Windows-based computers to diskless Linux clients. Today over 250 diskless Linux clients, some of them several kilometers away, are managed in a WAN from a central point of administration in the RRZ. The deployment of Linux thin clients saved over 70 percent of the primary investment compared to traditional technical solutions.

The concept and all modifications developed at the RRZ are mainly derived from three Linux projects:

  • The implementation of the "Linux Terminal Server Project".
  • Linux cluster nodes based on the "Linux Virtual Server (LVS)" project act as load balancer and HA solution for all the services needed by thin clients.
  • The OpenLDAP project provides a centralized and redundantly replicated database for all network information used by the terminal servers and/or thin clients.

In this paper we will describe the modification of programs like LTSP and the general cluster configuration.

Clustering of terminal services can be achieved by eliminating all dependencies among the cluster nodes. Therefore, all services like DHCP, DNS and LTSP were modified to share a single, replicated LDAP database, which provides information like the IP address and hostname needed by a service, e.g. DHCP.

The load balancing is implemented according to LVS. One redundant node acts as an LVS director which distributes the requests for services to the available cluster nodes acting as terminal servers. As all relevant network information is stored in a replicated LDAP database on each cluster node, these terminal servers are able to reply to every service request from any thin client. This means that the terminal system as a whole acts as a load balancing and high availability Linux cluster for terminal services.
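
On the director, such an LVS configuration is expressed with ipvsadm; the following invented example defines one virtual service (SSH logins, purely as an example) backed by two real terminal servers:

  # virtual service on the director, weighted least-connection scheduling
  ipvsadm -A -t 192.168.1.100:22 -s wlc
  # two real servers, packets forwarded via direct routing
  ipvsadm -a -t 192.168.1.100:22 -r 192.168.1.11:22 -g
  ipvsadm -a -t 192.168.1.100:22 -r 192.168.1.12:22 -g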

About the author:

Wolfgang Büch works as a system administrator at the Regional Computer Center (RRZ) at the University of Hamburg.

dmraid - device-mapper RAID tool by Heinz Mauelshagen Thursday 15:15

Device-mapper, the new Linux 2.6 kernel generic device-mapping facility, is capable of mapping block devices in various ways (e.g. linear, striped, mirrored). The mappings are implemented in runtime loadable plugins called mapping targets.

These mappings can be used to support arbitrary software RAID solutions on Linux 2.6, such as ATARAID, without the need for a special low-level driver, as was required with Linux 2.4. This avoids code redundancy and reduces error rates.

Device-mapper runtime mappings (e.g. map sector N of a mapped device onto sector M of another device) are defined in mapping tables.
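
For illustration (not from the paper), a striped RAID0 mapping over two disks could be loaded by hand with a one-line table like this; the length and device names are invented:

  # <logical start sector> <length> <target> <target arguments>
  # here: stripe across 2 devices with a 128-sector (64 KiB) chunk size
  echo "0 625142448 striped 2 128 /dev/sda 0 /dev/sdb 0" | \
          dmsetup create ataraid_example

dmraid generates this kind of table automatically from the vendor metadata it finds on the disks.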

The dmraid application is capable of creating these for a variety of ATARAID solutions (e.g. Highpoint, NVidia, Promise, VIA). It uses an abstracted representation of RAID devices and RAID sets internally to keep properties such as paths, sizes, offsets into devices and layout types (e.g. RAID0). RAID sets can be of arbitrary hierarchical depth in order to reflect more complex RAID configurations such as RAID10.

Because the various vendor-specific metadata formats stored onto ATA devices by the ATARAID BIOS are all different, metadata format handlers are used to translate between the on-disk representation and the internal abstracted format.

The mapping tables which need to be loaded into device-mapper managed devices are derived from the internal abstracted format.

My talk will give a device-mapper architecture/feature overview and elaborate on the dmraid architecture and how it uses the device-mapper features to enable access to ATARAID devices.

About the author:

Heinz Mauelshagen is the original author of the Linux Logical Volume Manager. He develops and consults in various areas of storage management and has been employed by Red Hat GmbH since the acquisition of Sistina Inc. in early 2004.

Virtualization with Xen by Gerd Knorr Thursday 16:30

Xen is an open source virtualization project, maybe the most important one these days. Initially it was created by the Computer Laboratory of the University of Cambridge, but development is now done by a much larger community.

This talk will cover the technical aspects of Xen. After giving a short overview of the project's history and Xen's features, I'll introduce the concept of paravirtualization used by Xen, talk about memory management, and explain how hardware device access and virtual devices are handled in Xen.
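
To make this concrete, a paravirtualized guest is described to the xm management tool by a small configuration file (Python syntax) and then started with "xm create"; the following example is invented:

  # /etc/xen/vm1 -- minimal paravirtualized guest (illustrative)
  kernel = "/boot/vmlinuz-2.6-xenU"      # guest kernel built for Xen
  memory = 128                           # MB of RAM for the domain
  name   = "vm1"
  disk   = [ 'phy:vg0/vm1root,sda1,w' ]  # backend device, frontend name, mode
  vif    = [ '' ]                        # one virtual network interface
  root   = "/dev/sda1 ro"

Running "xm create -c vm1" then boots the guest with its console attached.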

About the author:

Gerd Knorr has worked in various areas of the Linux kernel. That includes, but isn't limited to, maintaining the video4linux subsystem and some v4l drivers over the last years. His current main focus is virtualization, mostly Xen but also UML. He is a member of SUSE Labs.

RFID, Biometric Passports and Linux by Harald Welte Thursday 16:30

Numerous countries around the globe are in the process of introducing passports with biometric information stored on RFID chips, so-called ICAO MRTDs (Machine Readable Travel Documents). The German authorities coincidentally plan to issue the first such passports at the time of LK2005 in October 2005.

As part of the CCC (Chaos Computer Club) working group on biometric passports, the author of this paper has followed the technical development and standardization process very closely. In order to gather first-hand experience with this new technology, he has implemented a GPL-licensed, Linux-based RFID stack.

The stack includes a device driver for the common Philips CL RC632 reader chipset, an implementation of the ISO 14443-1, 2, 3 and 4 protocols, as well as an example "border control application" that is able to read and verify information stored on an ICAO MRTD compliant passport.

The paper gives a high-level introduction to the technical standards, as well as a description of the "libmrtd" and "librfid" projects, and includes a live demonstration with some passport samples.

About the author:

Harald Welte is the chairman of the netfilter/iptables core team.

His main interest in computing has always been networking. In the little time left besides netfilter/iptables-related work, he's writing obscure documents like the "UUCP over SSL HOWTO" or "A packet's journey through the Linux network stack". Other kernel-related projects he has been contributing to are random networking hacks, some device driver work and the neighbour cache.

He has been working as an independent IT consultant on projects for various companies ranging from banks to manufacturers of networking gear. During the year 2001 he was living in Curitiba (Brazil), where his Linux-related work was sponsored by Conectiva Inc.

Since February 2002, Harald has been contracted part-time by Astaro AG, who are sponsoring him for his current netfilter/iptables work. Aside from the Astaro sponsoring, he continues to work as a freelance kernel developer and network security consultant.

He licenses his software under the terms of the GNU GPL. Sometimes users of his software are not compliant with the license, so he started enforcing the GPL with his gpl-violations.org project.

During the last year, Harald has started development of a free, GPL-licensed Linux RFID and electronic passport software suite.

Harald is living in Berlin, Germany.

Automating Xen Virtual Machine Deployment by Kris Buytaert Thursday 17:15

While consolidating physical machines to virtual machines using Xen, we wanted to be able to deploy and manage virtual machines in the same way we manage and deploy physical machines. For operators and support people there should be no difference between virtual and physical installations.

Integrating virtual machines with the rest of the infrastructure should have a low impact on the existing infrastructure. Typically, virtual machine vendors have their own tools to deploy and manage virtual machines. Apart from the vendor lock-in to that specific virtual machine platform, this requires administrators to learn yet another platform that they need to understand and manage, something we wanted to prevent.

This paper discusses how we integrated SystemImager with Xen, thus creating a fully open source deployment framework for the popular open source virtual machine monitor. We'll document the development of our tools and go into more depth on other infrastructure-related issues when using Xen.

System Imaging environments in combination with Virtual machines can also be used to ensure safe production deployments. By saving your current production image before updating to your new production image, you have a highly reliable contingency mechanism. If the new production environment is found to be flawed, simply roll-back to the last production image on the virtual machines with a simple update command!

Xen has become one of the most popular virtualisation platforms during the last nine months. Although it is not such a young project, it is now gaining acceptance in the corporate world as a valuable alternative to VMware.

About the author:

Kris Buytaert is a Linux and Open Source Consultant operating in the Benelux. He has consulting and development experience with multiple enterprise level clients and government agencies. He is a contributor to the Linux Documentation Project and author of different technical publications. Kris is maintainer of the openMosix HOWTO.

linuxprinting.org -- Software and Knowledge for Printing with Free Software by Dr. Till Kamppeter Thursday 17:15

Getting printing to work properly is a rather complicated task in the administration of a GNU/Linux or Unix system, especially when one wants to make use of all the capabilities of a modern printer. One needs a printer spooler which collects the print jobs from applications and network clients, filters to convert non-PostScript jobs to PostScript, and a printer driver which translates PostScript into the printer's native language.

All this is not trivial: First, one needs a printer for which a driver or enough knowledge about its language exists, and then one has to make the printing system call the correct filters with their long, cryptic, and often not well-documented command lines and to give the user the possibility to control the capabilities of the printer.

To improve this situation, Grant Taylor, the author of the former Printing-HOWTO, has set up a database for information about free software printer drivers as well as about printers and how they are supported with free software. This database, called Foomatic, is located at http://www.linuxprinting.org/ and is currently maintained by Till Kamppeter. The database now lists nearly 250 free software printer drivers and more than 1600 printers.

The database is implemented in XML and is accompanied by a universal, PPD (PostScript Printer Description) based print filter ("foomatic-rip") and Perl scripts which automatically create Adobe-compliant PPD files and even complete print queues for all known free spoolers: CUPS, LPRng, LPD, GNUlpr, PPR, CPS, PDQ, and spooler-less printing. With these queues the user has access to the full functionality of the printer driver in use, and thanks to the PPD files all the printer options can even be used from applications (such as OpenOffice.org) or from Windows/Mac clients.
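
As a rough sketch of the workflow (commands, printer and driver IDs are examples, not taken from the talk), a PPD can be generated from the database and used to set up a CUPS queue:

  # generate an Adobe-compliant PPD for a printer/driver pair from Foomatic
  foomatic-ppdfile -p HP-LaserJet_4 -d ljet4 > /tmp/laserjet4.ppd
  # create a CUPS queue that uses this PPD (and thereby foomatic-rip)
  lpadmin -p lj4 -E -v socket://192.168.0.50:9100 -P /tmp/laserjet4.ppd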

The system is used by the printer setup tools of most GNU/Linux distributions, such as Mandriva, Red Hat/Fedora, SuSE, ..., and several printer manufacturers are contributing to the database. For PostScript printers some manufacturers (HP, Ricoh, Epson, Kyocera, ...) even release the official PPD files which are part of their Windows/Mac software as free software and post them on linuxprinting.org. So it has turned into an unofficial standard, and around 10,000 people visit the linuxprinting.org web site every day.

With its database and its static pages, linuxprinting.org is the biggest knowledge base about printing with free software. To make it easier for users and printer manufacturers to add even more knowledge, it is planned to manage the site content with MediaWiki, the Wiki system successfully used by Wikipedia. Then static pages and discussion forums will be replaced by the Wiki system, and the Wiki will also serve as an input frontend for new printers.

The talk will cover

  • how linuxprinting.org evolved
  • what linuxprinting.org provides
  • how the database is structured
  • how PPD files and printer queues are generated with it
  • how the development of linuxprinting.org will go on
  • perhaps first linuxprinting.org-Wiki experience

This talk is aimed at system administrators and technically interested users who want to know what happens "behind the scenes".

About the author:

Till Kamppeter holds a PhD in Theoretical Physics. While working on his PhD he was the system administrator for Unix and GNU/Linux in the physics department. As a system administrator he got into free software with contributions to X-CD-Roast and later XPP, his first own project. XPP led him to Mandriva in Paris in August 2000, where he is responsible for printing and digital imaging in Mandriva Linux.

His main project now is maintaining the linuxprinting.org web site with its printer database and the Foomatic software. He has improved this system substantially, and currently it is the standard for printer driver integration in most major GNU/Linux distributions. He is also in the Open Printing Group of FreeStandards.org.

He has given many talks and tutorials and organized booths on free-software-related events. In addition he organized Printing Summits on the Libre Software Meeting 2004 and 2005 in France.

He has also written several articles about free software in German and Brazilian magazines.

The making of freenigma by Stefan Richter and Werner Koch Thursday 18:00

Security is the fastest growing segment of the IT industry. On the one hand, companies spend a lot of money on firewalls, virus scanners and spam filters. On the other hand, confidential information in emails is still sent in clear text, and therefore unprotected, 99% of the time.

Although it is obvious to everyone today that emails should be encrypted and signed in order to safeguard privacy and confidential information, client-side encryption and signature have still not caught on.

This is certainly due in part to the subject matter (cryptography), but also to usability aspects.

That was the motivation for freenigma, a joint research project of g10 Code and freiheit.com technologies, to begin developing a central proxy server for encrypting and signing emails based on GnuPG/GPGME and the OpenPGP and S/MIME standards. The goal was to create a server system that is under a free licence (GPL) and can be used in both private and company settings.

The presentation will provide an insight into the architecture and functionality of freenigma and, using practical examples, will demonstrate setup, configuration and use.

About the authors:

Stefan Richter, founder and managing partner of freiheit.com technologies, was born in 1966. He holds degrees in Computer Science (Dipl.-Inf.) and Engineering (Dipl.-Ing.) and has been programming computers for more than 22 years. After different positions, for example in the development of scientific software in the fields of meteorology and oceanography at the Alfred Wegener Institute for Polar and Marine Research, and in applied research in the aviation and aerospace industries and also the military at the Institute for Applied Systems Technology Bremen GmbH (ATB), he has now been working in commercial software development for 15 years. In his free time, he is a volunteer for the Free Software Foundation Europe, where he is involved with Free Software and digital civil rights.

Werner Koch, born 1961, has been a radio amateur since the late seventies and became interested in software development at about the same time. He has worked on systems ranging from CP/M systems to mainframes, languages from assembler to Smalltalk, and applications from drivers to financial analysis systems. He is a long-time GNU/Linux user, principal author of the GNU Privacy Guard and founding member of the FSF-Europe. In 2001 he founded g10 Code, a company specializing in the development of Free Software based security applications.

Tick-less Idle CPUs for Virtualization and Power Management by Srivatsa Vaddagiri Friday 9:30

Traditionally, operating systems have used a periodic timer as a heart beat to keep track of time, which is needed for activities like scheduling and accounting, as well as for implementing timers required by user applications and the OS itself. The Linux kernel uses a periodic timer on every CPU as this heart beat. The frequency of this timer tick varies from implementation to implementation. While the Linux kernel earlier used a frequency of 100 ticks/second, more recent distributions of the kernel use a frequency of 1000 ticks/second.

The overhead of such a housekeeping timer however becomes prominent when a CPU is idle and has no immediate housekeeping needs. In a virtualized environment, such a timer can reduce the amount of physical CPU time available to a busy partition by consuming physical CPU cycles in an idle partition. Such a timer also prevents idle CPUs from going into low-power states for long periods. It has, therefore, become necessary to find a way to avoid the periodic ticks under some circumstances.

Avoiding the ticks like this however poses a number of challenges. Since many kernel subsystems (like RCU, scheduler, timer, accounting) rely on these ticks, those subsystems need to be modified to deal with the lack of a periodic timer tick. The wall-time can stop getting updated when all CPUs become idle and consequently it has to be recovered upon the resumption of any CPU. Also, there are various short-timers (like slab reap timer) in use by kernel, which can restrict how long idle CPUs can switch off timer ticks. This paper will look at the implications of tick-less idle CPU in various kernel subsystems and how these subsystems need to be modified to deal with it. The paper also presents the results of a number of tests on virtualized platforms with tick-less idle CPUs.

About the author:

Srivatsa Vaddagiri is a Linux kernel hacker working at IBM's Linux Technology Center. He has been with IBM for 9 years now, focusing mainly on Unix-related technologies. Some of his most important contributions are AIX on IA64, Linux on a handheld, CPU Hotplug in Linux and lock-free socket hash table lookup. Currently he is looking at making the Linux kernel go tickless on idle CPUs.

Standardizing the Penguin: a Progress Report to the Community by Mats Wichmann Friday 9:30

The Linux Standard Base (LSB) project has been evolving for a number of years as an open-source effort to standardise the core functionality of GNU/Linux systems. The concept is to remove incompatibility from parts of the system where there's no real value-add in being different, leaving the rest of the space for innovation, and to make life easier for developers by providing a dependable base they can code to. Community-driven consensus standards are slow to evolve, but the LSB core is now very stable (and is in fact a pending ISO standard) and usable. The next major release will pull in lots of new capabilities including desktop, some work towards manageability and security interfaces, better developer tools, a new edition of the LSB Book, and more. This paper and talk serve as a report to the community on its standardization project: looking at the road that lies ahead and at some of the tools and tests that support the standard; providing a forum for input into future directions that would help make the LSB an even more useful standard for developers; and reviewing how the community can contribute.

About the author:

Mats Wichmann has been kicking around first UNIX, then the Linux / open source world for rather a long time until finding a home with the LSB project at Intel. At Intel Corporation he's the Linux Standards Architect with the Opensource Technology Center. Mats has been a developer with the LSB project since 2001, and was elected LSB Chairman in January 2004, which role he still holds. He has also worked as a consultant, trainer, and courseware developer. He has past standards/ABI experience with the MIPS ABI Group where he worked as technical director and is an Austin Group and an IEEE Standards Association member. Mats is co-author of the book Building Applications with the Linux Standard Base.

kboot - A Boot Loader Based on Kexec by Werner Almesberger Friday 10:15

Compared to the "consoles" found on traditional Unix workstations and mini-computers, the Linux boot process is feature-poor, and the addition of new functionality to boot loaders often results in massive code duplication. With the availability of kexec, this situation can be improved.

kboot is a proof-of-concept implementation of a Linux boot loader based on kexec. kboot uses a boot loader like LILO or GRUB to load a regular Linux kernel as its first stage. Then, the full capabilities of the kernel can be used to locate and to access the kernel to be booted, perform limited diagnostics and repair, etc.
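
The underlying mechanism is the kexec system call, driven from user space by the kexec tool; loading and booting a second kernel by hand looks roughly like this (paths and parameters are examples):

  # stage the target kernel and initrd in the running kernel's memory
  kexec -l /boot/vmlinuz-2.6.13 --initrd=/boot/initrd-2.6.13.img \
        --append="root=/dev/sda1 ro console=tty0"
  # ...then jump straight into it, skipping firmware and boot loader
  kexec -e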

kboot integrates the various components needed for a fully featured boot loader, and demonstrates their use. While the main focus is on core technical functionality, kboot can serve as a starting point for customized boot environments offering additional features. kboot can also be used as a platform for exploring architectural enhancements, such as pre-loading of device scan results to accelerate the boot process.

About the author:

Werner Almesberger got hooked on Linux in the days of the 0.12 kernel, when studying computer science at ETH Zurich, and has been hacking the kernel and related infrastructure components ever since, both as a recreational activity, and as part of his work, first during his PhD in communications at EPF Lausanne, and later also in industry. Being a true Linux devout, he moved closer to the home of the penguins in 2002, and now lives in Buenos Aires, Argentina.

Contributions to Linux include the LILO boot loader, the initial RAM disk (initrd), the MS-DOS file system, much of the ATM code, the tcng traffic control configurator, and the UML-based simulator umlsim.

Hacking the Linux Automounter: Current Limitations and Future Directions by Jeffrey Moyer Friday 10:15

The IT industry is experiencing a great move from proprietary operating systems to Linux. As a result, the features and functionality that customers have come to expect of these systems now must be provided for on Linux.

Many large scale enterprise deployments include an automounter implementation. The automounter provides a mechanism for automatically mounting file systems upon access, and unmounting them when they are no longer referenced. It turns out that the Linux automounter is not feature-complete, and there are cases where Linux is just plain incompatible with the implementations of other, proprietary vendors.

This paper takes a look at the common problems in large scale Linux autofs deployments, and offers solutions to these problems.

In order to solve the current automounter limitations, we must start with an understanding of how things work today. To this end, we will explain some basic information about the automounter, such as how to configure an autofs client machine. We will walk through the code for basic operations such as the mounting or lookup of a directory and the unmounting, or expiry, of a directory. Through this, we will see where autofs fits into the VFS layer.
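
For readers unfamiliar with autofs, a client configuration consists of a master map plus one map per autofs mount point; a small invented example:

  # /etc/auto.master -- mount the indirect map below on /misc
  /misc   /etc/auto.misc  --timeout=60

  # /etc/auto.misc -- key, mount options, location
  data    -rw,hard,intr        bigfiler:/export/data
  cd      -fstype=iso9660,ro   :/dev/cdrom

Accessing /misc/data then triggers the NFS mount on demand; after 60 idle seconds it is expired again.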

With a picture of the landscape in place, we take a look at major issues facing customer deployments. Currently, there are two main pain points. The first is that the Linux automounter implements direct maps in a way which is incompatible with that of every other implementation. We will discuss the desired behavior and compare it with that of the Linux automounter. We will then look at ways to overcome this incompatibility by extending the autofs kernel interface.

The second major pain point surrounds the use of multi-mount entries for the /net, or -hosts maps. Because of the nature of multi-mount maps, the Linux implementation mounts and unmounts these directory hierarchies as a single unit. As such, clients mounting several big filers can experience resource starvation, causing failed mounts. We will look at this problem from several different levels. We start at the root of the problem, and show how the kernel, glibc, and automount can be modified to address the issue.

We conclude with future directions for the automounter.

About the authors:

Jeff Moyer is a senior software engineer at Red Hat, Inc., who has been using Linux since 1995. In his formative years, he worked on high performance cluster computing infrastructure at Worcester Polytechnic Institute. He then went on to implement high availability cluster software such as Kimberlite, Convolo Dataguard, Convolo Netguard, and other solutions in the embedded device space. Jeff has since moved on to a mixed bag of hacking, including the Linux automounter, the netpoll API, Red Hat's netdump utility, and the AIO subsystem.

The paper is co-authored by Ian Kent, who obtained a degree majoring in Mathematics and Computer Science in 1983. He has worked in the computing industry since then. The first 5 years were spent doing software development and infrastructure work. Following this he has worked mostly in infrastructure, although he has always had software development projects of some sort as a sideline.

He has been using Linux, in one way or another, since 1994. Needing an automounter in most of the environments he worked in led him to work on autofs in Linux. After customising it for one site he took on maintaining the Version 4 code base. He has been maintaining this for about three years.

Extending Kprobes to Support User-Space Application Instrumentation by Prasanna Panchamukhi Friday 11:30

Extensive usage of Linux in the enterprise world creates a need for tools to analyse production systems non-disruptively. Kprobes provides such a simple and lightweight interface to analyse the Linux kernel with minimal disruption, preserving 24x7 availability. One can write a loadable kernel module using the kprobes facilities to trace the kernel. User-space tools are being developed which underneath use the kprobes feature to instrument the Linux kernel non-disruptively. SystemTap is one such user-space utility built on top of the kprobes interface to write simple scripts, insert probes into any kernel routine, and get formatted trace data. Trace data can be function arguments, function return values, stack traces, global variables etc. Now, with the kprobes feature in the mainline kernel and with the development of tools like SystemTap, the next step is to provide a user-space probe mechanism which can readily be used to instrument user-space applications non-disruptively.
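
As a minimal sketch of the existing kernel-side interface (illustrative only; the probed symbol, message and error handling are arbitrary), a module can register a probe roughly like this:

  #include <linux/module.h>
  #include <linux/kernel.h>
  #include <linux/errno.h>
  #include <linux/sched.h>
  #include <linux/kprobes.h>
  #include <linux/kallsyms.h>

  static struct kprobe kp;

  /* runs just before the probed instruction is executed */
  static int handler_pre(struct kprobe *p, struct pt_regs *regs)
  {
          printk(KERN_INFO "do_fork() entered by %s (pid %d)\n",
                 current->comm, current->pid);
          return 0;
  }

  static int __init probe_init(void)
  {
          kp.pre_handler = handler_pre;
          kp.addr = (kprobe_opcode_t *) kallsyms_lookup_name("do_fork");
          if (!kp.addr)
                  return -EINVAL;
          return register_kprobe(&kp);
  }

  static void __exit probe_exit(void)
  {
          unregister_kprobe(&kp);
  }

  module_init(probe_init);
  module_exit(probe_exit);
  MODULE_LICENSE("GPL");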

This paper does an extensive comparison of existing user-space application instrumentation, designs new interfaces to meet the requirements, and will demonstrate the usage of the user-space probe mechanism in the real world.

This paper talks about the user-space instrumentation mechanism that can be used to insert probes into user-space applications and collect tracing data non-disruptively. The paper discusses the following topics in detail:

  1. Motivation for dynamic user-space application instrumentation.
  2. An improved dynamic user-space application instrumentation mechanism and its interfaces.
  3. A comparison of existing user-space instrumentation mechanisms such as Dprobes, tools using Dyninst, etc.
  4. The current user-space probe mechanism may miss probes in a symmetric multiprocessor environment; modifications are required to make it efficient.
  5. An improved user-space probe mechanism that avoids missed probes in symmetric multiprocessor environments, along with various features that user-space probes provide, such as tracing function entry and exit points, multiple handlers at the same address, etc.
  6. Real-life examples showing the usage of user-space instrumentation.

About the author:

Prasanna is currently working in IBM's Linux Technology Center in the Reliability, Availability and Serviceability tools group. He is one of the developers of Kprobes in Linux and of the SystemTap tool for Linux. Prasanna is also involved in improving various probe and trace tools for Linux. You can reach Prasanna at prasanna@in.ibm.com.

VFS based Union Mounts for Linux by Jan Blunck Friday 11:30

Unlike a traditional mount, which hides the contents of the mount point, a union mount presents a single view as if the file systems were merged together. Although only the top layer file system of the union stack can be altered, it appears as if it is possible to delete or modify anything. Files in the lower layers may be deleted with whiteouts in the topmost layer. Modified files are automatically copied into the topmost layer first. For a virtual file system (VFS) based implementation, heavy changes to the VFS and some of the low-level file systems are necessary. This includes modification of lookup and directory reading operations as well as the introduction of a persistent whiteout file type.

Union mounts make the implementation of some applications easier, e.g. live-cds or sourcetree-management. In combination with an execute-in-place file system, union mounts are used for efficient software management on read-only file systems shared between Linux z/VM guests.

About the author:

Jan Blunck studied electrical engineering at the Technische Universität Hamburg-Harburg, specializing in computer engineering. In 2003 he studied abroad at the Nanyang Technological University, Singapore. He wrote his diploma thesis on a VFS-based implementation of transparent file system mounts, also referred to as union mounts, at the IBM Lab in Böblingen, Germany. His contributions to Linux development range from a USENET newsreader, through device drivers, to Linux VFS development.

Page Migration: Implementation and Uses by Mike Kravetz Friday 12:15

This paper explores the topic of page migration within the Linux kernel. Page migration is the act of moving data from one physical page to another. This action should be transparent to users of the data. The paper will explore an implementation of page migration. As one would expect, this will touch on modifications and enhancements to various pieces of the virtual memory subsystem.

After explaining the implementation of page migration, two known uses of page migration are discussed. These are memory hotplug and process migration on NUMA architectures. The paper will show how page migration is used in each of these projects.

About the author:

Mike is a member of IBM's Linux Technology Center. He has been working in OS design and development for the past 20+ years. Mike has made numerous contributions to UNIX OSs in the areas of process management, memory management, NUMA enablement, loadable kernel modules, shared library support and multi-system shared device accessibility. He started hacking on Linux in 2000.

Logfs - finally a scalable flash file system by Jörn Engel Friday 12:15

Manufacturers like Samsung are positioning flash media as replacements for hard disks as mass storage media and gaining strength in the marketplace, esp. in the embedded area. While bringing advantages in price, power consumption and reliability, flash technology is distinctly different to hard disks and requires special support.

Three differences are relevant to the design of filesystems. Flash requires updates to occur out of place, while hard disks work well with in-place updates. Lifetime of flash blocks is limited by the number of write accesses to them. And flash blocks are substantially larger than hard disk sectors, requiring blocks to be shared by several filesystem blocks and to be garbage collected under space pressure.

Two approaches exist to deal with these differences. One is to add an abstraction layer that emulates hard disk behaviour, commonly known as Flash Translation Layer (FTL, NFTL, INFTL, etc.). Any existing file system can be used on top of this abstraction, with FAT being the most common choice. The second approach is to create specialized file systems for flashes, like JFFS2 or YAFFS.

While Flash Translation Layers facilitate the integration of flash devices into existing systems, they also come with disadvantages. File systems working above the translation layer are usually built for hard disk drives. Empty space is never explicitly cleared, as that would waste IO cycles and complicate the undelete operations that most filesystems support. In combination with garbage collection (GC) in the translation layer, this causes empty space to be recognized as valid data by the translation layer. During GC, this "data" gets written into empty blocks, reducing performance and medium lifetime.

Current flash filesystems, combining filesystem and translation layer, have efficient garbage collection, as they know the state of their content. But they are based on a log-structured design, which does not easily map to the filesystem tree presented to users. JFFS2 and YAFFS both have to scan the medium at mount time and build a partial filesystem tree in memory. Their drawbacks, hence, are increased memory usage and long mount times. One of the authors has already experienced a 15-minute mount of an empty JFFS2 filesystem.

To free ourselves from this uncomfortable situation between a rock and a hard place, a new filesystem design is presented. Requirements for the new design are out-of-place updates of data combined with a tree structure on the medium. The design was started over Easter 2005 and has since convinced most MTD developers of its necessity.

About the author:

Jörn Engel has been working on embedded systems - most of them running Linux - since 2001. Since then, he has written several MTD drivers, added support for new hardware to JFFS2 and become an MTD fellow.

He is currently working for IBM in the development lab in Böblingen, Germany, where he completed his diploma thesis on Linux kernel code quality. The "make checkstack" build target emerged from this thesis and has since become a standard tool.

Robert Mertens is currently working on his PhD in computer science at the university of Osnabrück, Germany. His primary interests are eLearning and lecture recording systems. These days, he is primarily occupied with his new-born daughter, Alexandra Maria.

First steps towards the next generation netfilter subsystem by Harald Welte Friday 14:00

Until 2.6, every new kernel version came with its own incarnation of a packet filter: ipfw, ipfwadm, ipchains, iptables. 2.6.x still had iptables. What was wrong? Or was iptables good enough to last even two generations?

In reality the netfilter project is working on gradually transforming the existing framework into something new. Some of those changes are transparent to the user, so they slip into a kernel release almost unnoticed. However, for expert users and developers those changes are noteworthy anyway.

Some other changes just extend the existing framework, so most users again won't even notice them - they just don't take advantage of those new features.

The 2.6.14 kernel release will mark a milestone, since it is scheduled to contain nfnetlink, ctnetlink, nfnetlink_queue and nfnetlink_log - basically a totally new netlink-based kernel/userspace interface for most parts of the netfilter subsystem.

nf_conntrack, a generic layer-3 independent connection tracking subsystem, initially supporting IPv4 and IPv6, is also in the queue of pending patches. Chances are high that it will be included in the mainline kernel at the time this paper is presented at Linux Kongress.

Another new subsystem within the framework is the "ipset" filter, basically an alternative to using iptables in certain areas.
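
The idea is easiest to see from its command-line use (an invented example; exact set types and match options depend on the ipset version): addresses are kept in a named set, and a single iptables rule matches against the whole set:

  ipset -N blocked iphash                          # create a hash of IP addresses
  ipset -A blocked 192.0.2.17                      # entries can be added and
  ipset -A blocked 192.0.2.99                      # removed at any time
  iptables -A INPUT -m set --set blocked src -j DROP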

The presentation will cover a timeline of recent advances in the netfilter world, and describe each of the new features in detail. It will also summarize the results of the annual netfilter development workshop, which is scheduled just the week before Linux Kongress.

About the author:

Harald Welte is the chairman of the netfilter/iptables core team.

His main interest in computing has always been networking. In the little time left besides netfilter/iptables-related work, he's writing obscure documents like the "UUCP over SSL HOWTO" or "A packet's journey through the Linux network stack". Other kernel-related projects he has been contributing to are random networking hacks, some device driver work and the neighbour cache.

He has been working as an independent IT consultant on projects for various companies ranging from banks to manufacturers of networking gear. During the year 2001 he was living in Curitiba (Brazil), where his Linux-related work was sponsored by Conectiva Inc.

Since February 2002, Harald has been contracted part-time by Astaro AG, who are sponsoring him for his current netfilter/iptables work. Aside from the Astaro sponsoring, he continues to work as a freelance kernel developer and network security consultant.

He licenses his software under the terms of the GNU GPL. Sometimes users of his software are not compliant with the license, so he started enforcing the GPL with his gpl-violations.org project.

During the last year, Harald has started development of a free, GPL-licensed Linux RFID and electronic passport software suite.

Harald is living in Berlin, Germany.

A virtual filesystem on steroids: Mount anything, index and search it. by Ben Martin Friday 14:00

The libferris virtual filesystem has always sought to push the boundaries of what a filesystem should do in terms of what can be mounted and what metadata can be shown for files. Over the past 5 years it has extended from mounting more traditional things such as tar.gz, ssh, digital cameras and IPC primitives to being able to mount various ISAM files (including db4, tdb, edb, eet, gdbm), various relational databases (including ODBC, MySQL, PostgreSQL), various servers such as HTTP, FTP, LDAP and Evolution, RDF graphs, as well as XML files and Sleepycat's dbXML.

Recently support for indexing filesystem data using any combination of Lucene, ODBC, TSearch2, xapian, LDAP and PostgreSQL has been added with the ability to query these backends for matching files. Matches are naturally presented as a virtual filesystem.

To enable legacy clients to take advantage of libferris, a Samba VFS module has been created allowing parts of a libferris system to be exported as samba shares.

The talk will be about the things libferris can mount, the metadata it offers, searching with libferris and finally how to export things as Samba shares.

About the author:

I've been working on filesystem-related code for the past 10+ years, on libferris for the last 5. I have collected various degrees, including a Bachelor's and a Master's in InfoTech.

I am currently undertaking a PhD on the application of Formal Concept Analysis to Semantic File Systems to give a superior search and interaction interface to one's filesystem.

NoSE - easily building virtual honeynets by Andreas Görlach Friday 14:45

The communities developing in the areas of virtualization and simulation are in a state of flux. In the case of virtualization, even a renaissance is proclaimed. Yet there are only a few approaches combining simulation at the network layer with operating system virtualization, at least in the field of open source.

We developed a system called Network Simulation Environment (NoSE) to simulate arbitrary network environments on a single Linux machine. NoSE represents a high-interaction honeypot and has special support for honeynet applications and forensics.

The system uses different Open Source emulators, like User-Mode-Linux, Qemu, and Xen, to provide support for a broad range of guest operating systems. We have tested Linux, BSD, and Windows. Other emulators can be added easily.

The Linux kernel's bridging facilities are used to build a virtual ethernet to connect the virtual machines. NoSE provides a GUI and a management daemon that is capable of generating the whole network infrastructure with just a few clicks. Different virtual machines and network configurations can be archived in a library for later reuse. Starting and stopping of whole networks thus becomes a simple task.
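
Under the hood this amounts to standard bridge administration; by hand, connecting two guests' tap interfaces to a virtual segment would look like this (interface names invented):

  brctl addbr honeynet0          # create the virtual ethernet segment
  brctl addif honeynet0 tap0     # attach the first guest's interface
  brctl addif honeynet0 tap1     # attach the second guest's interface
  ip link set honeynet0 up

NoSE generates and tears down such configurations automatically from its GUI and management daemon.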

As the emulators run full-fledged operating systems, there are almost no restrictions on the applications and services that can be installed within the simulated network. Possible applications for our system include network simulation, testing, training, distributed application development, and analysis of security issues. Security tools for monitoring, intrusion detection, and sniffing are already integrated. Data capture (i.e. logging) and data control (preventing attackers from harming other machines) take place outside the honeynet.

About the author:

Andreas Görlach is a PhD student at the IT Transfer Office (ITO), a third-party funded research unit of the department of computer science at the TU Darmstadt.

His work focuses on network security. With his colleagues he regularly teaches the aspects of IT network security to computer professionals in a course called "Hacker Contest". Within that course a virtual lab built by means of NoSE forms the basis for the security training.

In practicals for students of computer science Andreas supervises the analysis of current technologies such as WLAN or VoIP.

Other fields of his work include privacy and security in ubiquitous computing.

Distributed Samba by Volker Lendecke Friday 14:45

Samba works fine as a file, print and authentication server on a single host. With the rise of distributed file systems like GFS, OCFS, Lustre and others, the wish might come up to share the same file space via different Samba nodes.

Right now this almost inevitably leads to data corruption, as Samba needs to present very special locking semantics to the Windows client. These locking semantics have absolutely nothing to do with anything posix can deliver, and thus GFS and the others can not coordinate Samba access across nodes.

In response to a particular customer request I'm in the process of fixing that. As the underlying file system cannot deliver the locking semantics Samba needs, on a single host coordination is done via shared databases. This does not work at all, or would be *very* inefficient, if it were ported 1:1 to a distributed environment.

This talk will present the locking semantics Samba needs:

  • Oplocks are a way to reliably allow a client to cache files
  • Share modes are complete-file locks with very peculiar semantics, in particular locking violations may not be answered immediately.
  • Byte range locks also have to be taken care of by Samba, as Posix has very weird semantics here as well.

The second part of the talk will be a description of the current state of development. If I happen to have something to show at the time of the talk, a live demonstration is inevitable.

About the author:

Volker Lendecke is a long-time member of the core Samba Team. Volker is co-founder of the Göttingen, Germany based SerNet Service Network GmbH and does consulting, training and development for Samba and other Open Source products.

