The Research Chair in Distributed Computing (the acronym CALDI comes from the words ``calcul distribué''
which is the French name of Distributed Computing) was created at the Université
du Québec à Hull in February 2001, in order to promote the specialization in a research domain
consistent with the technological environment of the National Capital Region of Canada.
The research work of the Chair concentrates on information processing in distributed systems
and on related problems of communication in networks.
Distributed computing is a domain of computer science whose growth significantly
accelerated in recent years. This is due to the increasing role that interconnection networks
and telecommunication play in the life of modern society. Distributed computing is a foundation
of these two fields of application of computer science. It concerns environments where
several processors situated in different sites have to collaborate in order to perform a computational task.
The nature of tasks to be accomplished in a distributed way may vary from numerical computations
too complex for one computer, to the coordination of the work of all branches of a bank, spread
over a continent, to the search for data in the Internet, and many others. Collaborating processors
may be computers of a local network of a university, situated close to each other, but may also be
Internet servers at distances of thousands of kilometers. Clearly, each of these situations yields
different problems and challenges concerning organization and division of work.
Every processor has a certain degree of autonomy and can carry out parts of the task locally,
independently of other processors. It has a memory and a control unit distinct from others.
However, in order to accomplish the entire task, processors must share some resources, which requires
coordination. This is done by means of communication among processors which exchange messages
during the execution of the distributed task. Hence problems of communication in networks
are an important issue in distributed computing. Sharing resources among processors concerns
the following aspects.
Data of the problem are initially distributed among processors and must be exchanged between them.
For example, bank operations performed by a client in different branches must be known at each of them.
Partial results of computations of one processor must be communicated to others to be used in turn
as data in their computations. For example, large tasks of matrix computations are performed in a distributed
way, assigning computations concerning different parts of the matrix to different processors. Since these
computations may be mutually dependent, processors must exchange partial results interactively in order
to accomplish the global task.
Final results of computation are distributed among processors, each of them often keeping only a part
of the global result. This is due either to the size of the result, too large for the local memory of
each processor, or to the fact that each processor will need only a part of the result relevant to it
in performing future tasks. For example, after accomplishing the fundamental task of finding a minimum
spanning tree of the network in a distributed fashion, every processor keeps in its local memory only
incident edges of the tree.
The above indicated crucial features of distributed systems explain specific challenges of distributed
computing, as compared to centralized computing, performed by a single processor. New problems, which do not
appear in a centralized environment, concern sharing tasks among processors in order to take maximum advantage
of the computational power of each of them, organizing communication among processors in order to
coordinate their work efficiently, and maintaining a balance between the load of local computations
in each of the processors and the load of communication in the entire system. These new challenges often
increase the complexity of distributed problems as compared to those concerning centralized processing:
many tasks for which centralized solutions are well known present major difficulties when they are to be
solved in a distributed way. However, the additional conceptual work invested in overcoming these new obstacles
is usually very profitable: distributed methods are in general much faster than those using centralized computing.
In our time, when the amount of processed information grows exponentially, distributed processing becomes
unavoidable: no individual computer has sufficient ressources to accomplish huge
information processing tasks required now and in the future. Distributed computing becomes an
indispensable tool and it is almost sure that its role will increase in time.
Research directions
The research activity of the Chair CALDI concentrates on algorithmic and software aspects of distributed computing.
It is important to underline the relations between these two domains, as well as relations between each of them
and hardware issues. On the one hand, research results in distributed algorithms can be used, e.g., in the
development of communication protocols or in the construction of distributed data bases. The complexity
analysis of communication algorithms conceived for various network topologies may and should influence
the choice of topologies for communication networks to be used in the future. These examples show the
impact of distributed algorithms on software and hardware issues in distributed systems. On the other hand,
the impact is also reciprocal. New technologies in the domain of telecommunication, both in its software
and hardware part, significantly influence theoretical models of communication and algorithmic problems
to be considered. Two particularly convincing examples of the impact of technology on distributed algorithms
are the following. Increasing use of optic fiber in communication networks yields growing interest in
communication models that consider characteristics of optical transmission channels. Likewise, growing
popularity of the Internet in modern society contributes to the creation of new subdomains of distributed
algorithms, such as exploration of unknown networks by mobile agents, or searching for data on the Web.
This shows that distributed algorithms which form a focus of the research activity of the Chair,
are tightly related to other aspects of distributed computing, as well as to problems of its
two main application domains: interconnection networks and telecommunication.
Below we describe the main research directions of the Chair CALDI. Each of them comprises a research project
spanning over several years. Most of them are tightly connected and will be developed in parallel.
This description should not be considered in a restrictive way: technological progress to be foreseen in
the next decade and the development of distributed computing following it, will doubtless modify some of
these directions and will add others. However, the following description gives representative examples
of types of problems that will be investigated in the Chair.
Communication in partially unknown environments
In many applications, the topology of the communication network and its parameters, such as the diameter,
the size, or the maximum degree, may be partially or totally unknown to processors. Often, information
available to every processor is limited to the part of the network situated in its vicinity. In such situations,
communication algorithms that can be used may only depend on locally available information. The main problem
that we want to study concerns optimization of communication algorithms working in partially unknown
networks, and the influence of the amount of available information on communication efficiency.
Communication in faulty networks
The growing size of communication networks and the huge amount of transmitted information increases
the vulnerability of these networks to component failures. Faults of links or nodes of the network
may be permanent or intermittent, may be restricted to a part of the network or randomly distributed
throughout it, and may alter the work of the components in various ways. The main problem in this context is
to design communication algorithms which work efficiently and reliably in spite of the faults and
without knowing a priori their location. One of important aspects of such algorithms is fault diagnosis
whose aim is to locate faults using distributed methods, without relying on a central monitor
coordonating actions of processors. These methods use results of tests that processors perform on their neighbors.
The difficulty of this task is augmented by the fact that information generated by faulty processors is
not reliable and it is not known a priori which processors are faulty and which are functional.
Refined methods of selection of tests and of their analysis permit, however, to extract from test results
information necessary to perform reliable diagnosis.
Network exploration
In many applications it is important to learn the topology of an unknown network, such as
a part of the Internet or a particular mobile network. In such cases, one or more mobile agents
(which are programs specifically designed for network exploration) have to construct a map of the network,
(i.e., to find the underlying graph), by traversing all nodes and links of the network. The problem is
to organize the work of the agents in the most efficient way, i.e., so that they complete the task in minimum
time, sharing work and avoiding redundant traversals. Scheduling trajectories of these agents in an unknown
environment and scheduling communication between them are important challenges that need to be faced.
Communication in wireless and mobile networks
Growing popularity of cellular phones yields increasing interest of the research community
in a particular type of communication networks: wireless and mobile networks.
This type of networks whose main application until quite recently was limited to military needs,
became indispensable in everyday life and its role will further grow when wireless access to the Internet becomes
popular. Wireless and mobile networks are a source of new important algorithmic problems. The choice of
wavelengths minimizing interference, dynamic localizing of nodes in a mobile network, communication
in an unknown environment that changes frequently, are only some examples. Distributed algorithms are
particularly well adapted to solve these problems. Indeed, classic networks are relatively stable
and their topology can be learned by nodes of the network after some time of functioning. Hence in this environment,
centralized algorithms can be sometimes used. On the other hand, the structure of mobile networks changes
frequently during the execution of a communication protocol. This does not permit the use of algorithms
requiring global knowledge of the network, and hence distributed methods based only on local knowledge
must be applied.
Optimization of distributed object-oriented systems
Advanced computing applications are more and more reliant on two fundamental
technologies: object-orientation and distribution (e.g., client/server).
Existing object-oriented design methods are of little help in the systematic
and efficient development of large distributed systems. One of the main
impediments to an effective distribution of objects is the high level of
network overload resulting from the numerous message exchanges characterizing
the object-oriented paradigm. One of our research subjects tackles the
quantification of coupling among objects and the uncovering of appropriate
object permutations and processor-centered groupings in order to minimize
interprocessor traffic.
Behavioral modeling of network components
Physical components of current networks are more and more complex, rendering their modeling
rather difficult and posing problems during the planning and management of such networks. Thus
modeling and management of these networks require more suitable tools and methodologies in
order to obtain a realistic outcome. Our goal in this research is to define a new representation
methodology to model these components and their behaviors in a realistic way. We focus first
on protocol specifications to be used to define architectures and communications of these
components in order to be specified formally and validated. Examples of components
considered include switches and routers since they are directly responsible for the network
performance and enable sustained quality of services.
Global performance management in complex networks
Existing communication networks are made of components distributed over large spaces.
These components must communicate in a continuous way to guarantee the required
performance even in dynamic contexts. The global performance of communication networks
operating in a dynamic environment is still an open problem. Our goal is to define
the most suitable strategies to deal with this problem in the context of network changes.
We will consider different routers as network components, and
will study the problem of load balancing between them, taking into account different network
topologies.