Mohsen Sharifi
Software Engineering Laboratory
Department of Computer Engineering
Iran University of Science & Technology
Narmak, Tehran
Iran
Linux is an easily available and powerful operating system for ordinary users, but it is based on a 70s design, making the need for the addition of more modern concepts apparent. Distributed Inter-Process Communication (DIPC) provides the programmers of Linux with distributed programming facilities, including transparent Distributed Shared Memory (DSM). It works by making UNIX System V IPC mechanisms (shared memory, message queues and semaphores) network transparent, thus integrating neatly with the rest of the system. The underlying network protocol used is TCP/IP. DIPC is targeted to work on Wide Area Networks (WANs) and in heterogeneous environments.
Building Multi-Computers [1] and programming them are among the more popular research subjects, and demand for them is rapidly rising. Any solution to distributed programming under Linux should keep up with one of Linux's more important features: availability to ordinary users.
Right from the start, it was decided that ease of application programming, and the simplicity of the DIPC itself should be among the most important factors in the system design, even if they meant that there will be some loss in performance. This decision was backed by the observation that computer and telecommunication equipment's speeds are improving very rapidly, while training and programming times for distributed applications seem not to be following a similar trend.
In DIPC, UNIX System V IPC mechanisms [5], consisting of semaphores, messages and shared memories, are modified to function in a network environment. Here the same system calls that are used to provide communication between processes running in the same computer could be used to allow the communication of processes running on different machines. There is no new system call for the application programmers' use. There is also no library to be linked to the application code, and no need for any modifications in compilers. DIPC could be used with any language that allows access to operating system's system calls. It is completely camouflaged in the kernel.
The above means that DIPC supports both the message passing and the distributed shared memory paradigms of distributed programming, which results in more options for application programmers [6]. Also, allowing the processes to share only selected parts of their address space helps to reduce the problems of false sharing.
It was decided to implement DIPC in the user space as much as possible, with minimal changes to the kernel. This can lead to a cleaner and simpler design, but in a monolithic operating system, such as Linux, has the drawback of requiring frequent copy operations between kernel and user address spaces [2]. As UNIX does not allow user space processes to access and change kernel data structures at will, DIPC has to have two parts: the more important part is a program named dipcd, which runs with superuser privileges. dipcd forks several processes to do its work. The other part is inside the kernel, giving dipcd work to do and also letting it to see and manipulate kernel data. The two parts use a private system call to exchange data. This system call should not be used by other processes in the system.
DIPC is only concerned with providing some mechanisms for distributed programming. The policies, i.e. how a program is parallelized, or where an application program's processes should run, are determined by the programmer or the end user. Considering the fact that in most cases the programs' codes change much less frequently than the data they use, DIPC provides easy data transfer over the network, and assumes that the code to use these data already resides at the suitable places.
In normal System V IPC, processes specify numerical keys to gain access to the same IPC structure [5]. They can then use these structures to communicate with each other. A key normally has a unique meaning only in one computer. DIPC makes the IPC keys globally known. Here, if the application programmer wants it, a key can have the same meaning in more than one machine. Processes on different computers can communicate with each other the same way they did in a single machine.
Information about all the IPC keys in use are kept by one of dipcd's processes called the referee. Each cluster has only one referee. In fact, it is having the same referee that places computers in the same cluster. All other processes in the cluster refer to this one to find out if a key is in use. This means that the referee is DIPC's name server. Beside many other duties, the referee also makes sure that only one computer at a time will attempt to create an IPC structure with a given key value, hence the name. Using a central entity simplifies the design and implementation, but can become a bottleneck in large configurations. Finding a remedy to this problem is left to the time when DIPC is actually running in such configurations.
Users may need to run some programs (e.g. utilities) in all the computers in the system at the same time, and these programs may need to use the same IPC keys. This could create interferences. So as to prevent any unwanted interactions, it was decided that distributed IPC structures should be declared by programmers as being so. The programmer just has to specify a flag to do this. The structures are local by default. The mentioned flag is the only thing that the programmer should do to create a distributed program. The rest is like ordinary System V IPC programming. Should it not have been the intention to make DIPC compatible with older programs, this system would be totally transparent to programmers.
DIPC's programming model is very simple, and quite like using ordinary System V IPC. A Typical scenario is this: a process first creates and initializes the needed IPC structures. After that other processes are started to collaborate on a job. All of them can access the same IPC structures and exchange data. These processes are usually executing in remote machines, but they all could also be running in the same computer, meaning that distributed programs can be written on a single machine and later run in real multi-computers.
One important point to keep in mind about DIPC is that no other UNIX facility is changed to work in a distributed environment. So programmers can not use system calls like fork(), which creates a process in the local computer.
The fact that DIPC programs use numerical keys to be able to transfer data means that they do not need to know where the corresponding IPC structures are. DIPC makes sure that processes find the needed resources just by using the specified keys. The resources could be located in different computers during different runs of a distributed program. This logical addressing of resources makes the programs independent of any physical network characteristics.
Simple techniques allow the mapping from logical computing resources needed by a program to physical resources to be done with no need to re-make the program. As DIPC programs do not need to use any physical network addresses, they do not need recompiling to run in new environments. Of course this does not prevent the programmer from choosing to make his/her program dependent on some physical system characteristics. (S)He could for example hard code a computer address in his code. DIPC programmers are discouraged to do so.
When dipcd is not running, the kernel parts of DIPC are short circuited, causing the system to behave like a normal Linux operating system. So users can easily disable the distributed system. Also, normal Linux kernels are not affected by DIPC programs, meaning that the there is no need to change and recompile these programs when they are to be executed in single computers with no DIPC support.
DIPC can be configured to provide a segment-based or a page-based DSM. In the first case, DIPC transfers the whole contents of the shared memory from computer to computer, with no regard to whether all that data are to be used or not. This could reduce the data transfer administration time. In the page-based mode, 4KB pages are transferred as needed. This makes multiple parallel writes to different pages possible.
In DIPC each computer is allowed to access the shared memory for at least a configurable time quantum. This lessens the chances of the shared memory being transferred frequently over the network, which could result in very bad performance.
[2] Andrew S. Tanenbaum, Operating Systems Design and Implementation, Prentice-Hall, 1987.
[3] Andrew S. Tanenbaum, Computer Networks, Prentice-Hall, 1989.
[4] Andrew S. Tanenbaum, Distributed Operating Systems, Prentice-Hall, 1995.
[5] W. Richard Stevens, Advanced Programming in the UNIX Environment, Addison-Wesley, 1992.
[6] Robert G. Babb II, editor, Programming Parallel Processors, Addison-Wesley, 1988.
[7] Bill Nitzberg and Virginia Lo, "Distributed Shared Memory: A Survey of Issues and Algorithms", Computer, August 1991, pp 52-60.
[8] Maurice J. Bach, The Design of the UNIX Operating System, Prentice-Hall, 1986.
[9] PVM web page: http://www.epm.ornl.gov/pvm/pvm_home.html