Research
I am working with Prof. Sayan Mitra within the Center for Reliable and High-Performance Computing at the Coordinated Science Laboratory of the University of Illinois at Urbana-Champaign.
Publications
2010
SSS 2010
Taylor Johnson and Sayan Mitra, "Safe Flocking in Spite of Actuator Faults", 12th International Symposium on Stabilization, Safety, and Security
of Distributed Systems (SSS 2010), September 2010.
The safe flocking problem requires a collection of N mobile agents to (a) converge to and maintain an equi-spaced lattice formation, (b) arrive at a destination, and (c) always maintain a minimum safe separation. Safe flocking in Euclidean spaces is a well-studied and difficult coordination problem. Motivated by real-world deployment of multi-agent systems, this paper studies one-dimensional safe flocking, where agents are afflicted by actuator faults. An actuator fault is a new type of failure that causes the affected agent to be stuck with an arbitrary velocity. In this setting, first, a self-stabilizing solution for the problem is presented. This relies on a failure detector for actuator faults. Next, it is shown that certain actuator faults cannot be detected, while others may require O(N) time for detection. Finally, a simple failure detector that achieves the latter bound is presented. Several simulation results are presented for illustrating the effects of failures on the progress towards flocking.
Taylor Johnson, "Fault-Tolerant Distributed Cyber-Physical Systems: Two Case Studies", Master's Thesis, University of Illinois at Urbana-Champaign, May 2010. <pdf>
Fault-tolerance in distributed computing systems has been investigated extensively in the literature and has a rich history and detailed theory. This thesis studies fault-tolerance for
distributed cyber-physical systems (DCPS), where distributed computation is combined with dynamics of physical processes. Due to their interaction with the physical world, DCPS may suffer
from failures that are qualitatively different from the types of failures studied in distributed computing. Failures of the components of DCPS which interact with the physical
processes---such as actuators and sensors---must be considered. Failures in the cyber domain may interact with failures of sensors and actuators in adverse ways.
This thesis takes a first
step in analyzing fault-tolerance in DCPS through the presentation of two case studies. In each case study, the DCPS are modeled as distributed algorithms executed by a set of agents, where
each agent acts independently based on information obtained from its communication neighbors and agents may suffer from various failures. The first case study is a distributed traffic
control problem, where agents control regions of roadway to move vehicles toward a destination, in spite of some agents' computers crashing permanently. The second case study is a
distributed flocking problem, where agents form a flock, or a roughly equally spaced distribution in one dimension, and move towards a destination, in spite of some agents' actuators
becoming stuck at some value.
Each algorithm incorporates self-stabilization in order to solve the problem in spite of failures. The traffic algorithm uses a local signaling mechanism to
guarantee safety and a self-stabilizing routing protocol to guarantee progress. The flocking algorithm uses a failure detector combined with an additional control strategy to ensure safety
and progress.
ICDCS 2010
Taylor Johnson, Sayan Mitra, and Karthikeyan Manamcheri Sukumar, "Safe and Stabilizing Distributed Cellular Flows", 30th IEEE International Conference on Distributed Computing Systems (ICDCS 2010), June 2010.
Advances in wireless vehicular networks present us with opportunities for developing new distributed traffic control algorithms that avoid phenomena such as abrupt phase-transitions. Towards this end, we study the problem of distributed traffic control in a partitioned plane where the movement of all entities (vehicles) within each partition (or cell) is controlled by a single process. We present a distributed traffic control protocol that guarantees minimum separation between vehicles at all times, even when some cell-processes fail by crashing. Once failures cease, the protocol is guaranteed to stabilize and the packets with feasible paths to the target cell make progress towards it. The algorithm relies on two general principles: local geographical routing, and temporary blocking for maintenance of safety. Our proofs use mostly assertional reasoning and may serve as a template for analyzing other distributed traffic control protocols. We also present simulation results which provide estimates of throughput as a function of velocity, safety separation, and path complexity. Further, we present simulation results to estimate throughput as a function of failure and recovery rates.
2009
RTSS 2009
Presented "Handling Failures in Cyber-Physical Systems: Potential Directions" at PhD Student Forum
<pdf (abstract), pdf (slides)>
The strong coupling of software and physical processes in the emerging field of cyber-physical systems (CPS) motivates the development of new methods to respond to failures in both the cyber and physical domains. To this end, we propose a study of existing work on handling failures from various disciplines. If these models and methods are applicable to CPS, appropriate extensions should be made to apply them. However, if they are not, then we should head off into uncharted territory, developing new methods, which we suggest to be drawn from fields such as formal methods and verification.
RTAS 2009
Presented "Power Usage of Time and Event-Triggered Paradigms: A Case Study" at Poster Session
<pdf (abstract), pdf (poster), source code>
We evaluate the time-triggered and event-triggered programming paradigms in the context of developing large-scale distributed real-time systems with fault-tolerance. To this end, we present a simple case study using the Lego Mindstorms NXT platform in an intrusion detection problem, and compare the power consumption and accuracy of event detection in time-triggered and event-triggered systems. We use this case study comparison as part of the motivation for the development of a new mixed event-triggered and time-triggered language.