
Linux-Kongress 2002
9th International Linux System Technology Conference
September 4-6, 2002 in Cologne, Germany


Author: Fábio Olivé Leite
Title: Load-Balancing HA Clusters with No Single Point of Failure
Paper
Postscript: lk2002-leite.ps (163154 Bytes)
Abstract

This paper presents the author's work towards a fully distributed, load-balancing, high-availability cluster with no single point of failure. High-availability, load-balancing clusters are a very active area of Linux development, and several efforts worldwide pursue these goals. Most of them rely on a redirector machine that receives the clients' connections and distributes them among the working nodes (a server farm) according to one of several distribution policies, and that sometimes also routes the servers' answers. This redirector is a single point of failure: if it fails, the whole cluster becomes useless.

This work presents a new model for load-balancing clusters that completely eliminates the need for a redirector: all nodes are equal, and each decides, using a distributed heuristic, whether it is the one that should accept a new connection. The two key requirements are that every connection can reach every server and that exactly one server decides to take each incoming connection.
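The abstract does not spell out the distributed heuristic, so the following is only a minimal sketch of the "exactly one server accepts" property, not the author's method. It swaps in rendezvous (highest-random-weight) hashing as a stand-in and assumes every node sees every incoming connection and shares the same list of live nodes. The node names, the connection tuple layout and the function names are all hypothetical.

```python
import hashlib

def score(node_id, conn):
    """Deterministic pseudo-random score for a (node, connection) pair."""
    key = f"{node_id}|{conn}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")

def owner(nodes, conn):
    """Node with the highest score for this connection; ties broken by node id."""
    return max(nodes, key=lambda n: (score(n, conn), n))

def should_accept(my_id, nodes, conn):
    """Evaluated locally on each node, with no redirector involved."""
    return owner(nodes, conn) == my_id

# Hypothetical cluster membership and connection (src ip, src port, dst ip, dst port).
nodes = ["node-a", "node-b", "node-c"]
conn = ("192.0.2.10", 40123, "198.51.100.1", 80)

# Every node runs the same check; exactly one of them answers True.
acceptors = [n for n in nodes if should_accept(n, nodes, conn)]
print(acceptors)
```

Because each node evaluates the same deterministic function over the same membership list, all nodes agree on the owner without exchanging messages per connection. If a node is removed from the list, only the connections it owned are reassigned, which is in the spirit of the goal that a crashed server affects only its own connections.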

There are at least two other known attempts at achieving this goal, but both slice the connection space by client address and assign different slices to different servers. This does not ensure proper load balancing, and it leaves entire network address ranges unserved if a server fails and no other server takes over its slice. These attempts also cause the servers to lose their network identity, since they must all assume the same physical and network address, which makes remote administration impossible.
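To make the failure mode of address slicing concrete, here is a small illustrative sketch. It is not taken from either of the attempts mentioned; the modulo-based slicing rule, the node names and the client addresses are assumptions chosen only to show how a dead server's slice leaves some clients unserved.

```python
import ipaddress

def slice_of(client_ip, num_slices):
    """Map a client address to a slice index (hypothetical modulo rule)."""
    return int(ipaddress.ip_address(client_ip)) % num_slices

def serving_node(client_ip, slice_owner, alive):
    """Return the node serving this client, or None if its slice owner is down."""
    node = slice_owner[slice_of(client_ip, len(slice_owner))]
    return node if node in alive else None

# Static slice assignment: slice i belongs to slice_owner[i].
slice_owner = ["node-a", "node-b", "node-c"]
# node-b has crashed and nobody has taken over its slice.
alive = {"node-a", "node-c"}

clients = [f"192.0.2.{i}" for i in range(6)]
unserved = [c for c in clients if serving_node(c, slice_owner, alive) is None]
print(unserved)
```

Every client whose address falls into the failed node's slice gets no service until another server takes that slice over, regardless of how lightly loaded the surviving servers are.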

The author has developed a model for this type of clustering in which servers do not lose their network identity (remote administration remains possible), the servers in the farm can be connected directly to the backbone (no redirection), load balancing can be done effectively, and a crashed server affects only its own established connections. It involves some ARP magic, unicast and multicast confusion, and weird modules for IP Tables. The author has a working prototype that is missing only one module of a complete reference implementation. The paper will present the model, its goals, the techniques involved, the modules that achieve those goals, and some practical tests. It is being presented to the community in search of validation and criticism, as the author has so far failed to find any gotchas in the approach taken.

This research is being conducted in the development laboratories of Conectiva S.A., in Curitiba, Brazil.

About the Author

Fábio Olivé Leite is finishing his MSc in Fault Tolerance, and has a BSc degree in Computer Science and a Technician degree in Industrial Electronics. He has experience as a software developer in Linux High Availability projects, as a University Professor in subjects such as Computer Architecture, Operating Systems and Parallel Processing, as a Network Administrator and, more recently, in embedded applications. His areas of expertise include operating systems development, distributed systems and reliable communication. He has programming skills in C, Python, Perl, Bourne/Korn shell scripting and Assembly Language. Fábio coordinates the Linux Users Group of his home city; his latest lectures were given at the "8th International Linux Kongress", at "Unix en High Availability (Voorjaarsconferentie)" in the Netherlands, and at the "2nd Testing and Fault Tolerance Workshop" in Curitiba, Brazil.


Comments or Questions? Mail to contact@linux-kongress.org Last change: 2005-09-17