Distributed High-Performance Simulation using Java and Time-Warp

Peter Ashenden
Ken Hawick
Department of Computer Science, University of Adelaide, Australia
February 1999

Summary

This project will involve development of a kernel to support parallel discrete event simulation (PDES) using the Time-Warp PDES algorithm.  The kernel will be developed in Java, and build on the techniques used in the Warped PDES kernel from the University of Cincinnati.  The project will be undertaken in conjunction with the Distributed High Performance Computation (DHPC) group at Adelaide.

Project Description

Background

Discrete event simulation is a technique for simulating the behaviour of complex systems.  Objects in the system are modelled as processes that interact by scheduling the occurrence of events.  Processes respond to observed events by updating their state and scheduling further events.  Thus events are a form of message-passing communication between the simulation objects.

There are numerous applications of discrete event simulation, including:

Many simulation applications involve thousands of simulation objects and millions of events, and take a long time to execute on conventional computer systems.  It is desirable to improve the execution time by using parallel computers or distributed networks of computers.  However, this introduces the problem of synchronizing the progress of processes running on different processors.  For example, if one process proceeds past simulation time t, and another process that is still before time t schedules an event at time t, the first process would miss the event.  This would result in incorrect behaviour being simulated.

There are two techniques for ensuring that processes synchronize correctly.  Conservative synchronization involves processes not executing past some time t until they can be sure that they will not receive a message before time t.  Conservative synchronization has the advantage that no unneeded computation is performed, however, it may constrain the amount of parallelism achievable.  Furthermore, the mechanism that ensures that processes wait until no further messages will be received is prone to deadlock.  Thus conservative simulators must include a global deadlock detector and resolver.

Optimistic, or Time-Warp, synchronization allows processes to proceed  forward arbitrarily in simulation time.  However, the state of each process is regularly checkpointed, so that if a process receives a message in its past, the process rolls back and recomputes based on the data in the message.  Since the process may have incorrectly sent further messages to other processes, it must send anti-messages to cancel the effect of the incorrectly sent messages.  Cancellation is achieved by rolling back the recipients of incorrectly sent messages.  While rolled-back computation may appear to be wasted, if the receipt of a message does not cause the process to change state, the computation is in fact correct, and can thus be used.  When this occurs, Time-Warp synchronization can make faster process in simulation than Conservative synchronization.  The overall progress of a time-warp simulation is measured by Global Virtual Time (GVT), which is the global minimum of the simulation times to which processes have advanced.

Warped

Warped is a time warp simulation kernel written in C++ and MPI-based message passing. The kernel includes classes that represent simulation objects, and provides facilities for objects to schedule events.  Users develop a model of a system to be simulated by deriving objects specific to their application from the classes provided in Warped.  The simulation kernel can be configured to include various alternative algorithms for different aspects of time warp, including schedule of simulation objects, state saving, memory management, event list management, cancellation, GVT calculation, and others.

The Java Time-Warp Kernel

The project will involve development of a Java version of the warped simulation kernel, allowing simulations to be developed in Java and executed in a distributed computation environment..  Initially, only the simplest of the configurable modules for the time-warp algorithm will be implemented.  Next, the kernel will be extended to allow dynamic creation of processes and migration of simulation objects during the course of a simulation.  These facilities are not currently provided in the Warped kernel, and will be the main research contribution of this project.

Dynamic process creation will allow simulation of systems in which the number of simulation objects varies through the lifetime of the system.  Such systems include computer software systems, in which the simulation objects represent threads of computation.

Migration of simulation objects will allow a simulation to be dynamically repartitioned across the multicomputer or distributed network of computers hosting the simulation.  This feature will form the basis for future research into dynamic and adaptive partitioning and distribution.  Current distributed systems statically distribute simulation objects across the host platform, either randomly or based on estimates of computational and communication load.  However, it is difficult to determine reliable estimates in many cases.  It is expected that improved simulation performance be obtained by measuring computational and communication load of a running simulation, then redistributing the simulation more effectively and allowing it to continue.

Simulation Case Study

Researchers in the DHPC group are investigating architectures and applications for Beowulf style multicomputers: systems composed of an array of low-cost workstation computers interconnected by a network and switches.  The researchers will use the Java Time-Warp kernel to develop simulation models of application programs running on different Beowulf configurations.

Further Information

Peter Ashenden
Dept. Computer Science, University of Adelaide, SA 5005, Australia
Phone: +61 8 8303 4477
Fax: +61 8 8303 4366
Email: petera@cs.adelaide.edu.au