Adaptive Fault Tolerance for Data Streaming Systems
André Martin (TU Dresden)
Event Stream Processing (ESP) systems are currently experiencing a renaissance in data processing, as they deliver results at lower latency than the traditional MapReduce approach. To ensure responsiveness, active replication is a commonly used fault tolerance approach for this class of applications. Although active replication provides quick recovery, it comes at a high price, since it consumes almost twice the resources. Besides active replication, several alternative mechanisms exist, such as active and passive standby as well as passive replication, that consume considerably fewer resources, but at the cost of longer recovery times. Because ESP applications are highly dynamic systems, such recovery times may nonetheless be acceptable to the user: the time to recover from a crash strongly depends on factors such as event throughput and state size, which vary greatly over the course of processing. In this talk, I will present an adaptive approach to fault tolerance that is tailored to ESP applications operating in cloud environments. Our evaluation shows that the overall resource footprint for fault tolerance can be considerably reduced using our adaptive approach without compromising recovery time.
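To make the trade-off concrete, here is a minimal Python sketch of how an adaptive choice between these schemes could look. It assumes a deliberately simple, hypothetical cost model: recovery time is a replica start-up delay, plus state transfer (state size divided by bandwidth), plus replay of the events buffered since the last checkpoint (throughput times checkpoint interval divided by replay rate). The `Scheme` fields, the formula, the function names, and every numeric value are illustrative assumptions, not taken from the talk; the actual approach may work quite differently. Given the current throughput and state size, the sketch picks the cheapest scheme whose estimated recovery time meets a target, and falls back to the fastest scheme if none does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    name: str
    overhead: float        # hypothetical extra resources, as a fraction of the operator's own
    startup_s: float       # hypothetical time to bring a replica online after a crash
    transfers_state: bool  # operator state must be shipped to the replica on recovery
    replays_events: bool   # events since the last checkpoint must be replayed


# Illustrative values only; NOT measurements from the talk.
SCHEMES = [
    Scheme("active replication", 1.00, 0.0, False, False),
    Scheme("active standby", 0.50, 0.0, False, True),
    Scheme("passive standby", 0.25, 0.0, True, True),
    Scheme("passive replication", 0.10, 30.0, True, True),
]


def recovery_time_s(s, throughput_eps, state_bytes, checkpoint_interval_s=10.0,
                    bandwidth_bytes_per_s=100e6, replay_eps=50_000.0):
    """Simplified worst-case recovery estimate (hypothetical model)."""
    t = s.startup_s
    if s.transfers_state:
        t += state_bytes / bandwidth_bytes_per_s
    if s.replays_events:
        # Replay everything buffered since the last checkpoint;
        # ignores events that keep arriving while replaying.
        t += throughput_eps * checkpoint_interval_s / replay_eps
    return t


def choose_scheme(throughput_eps, state_bytes, target_recovery_s, schemes=SCHEMES):
    """Cheapest scheme whose estimate meets the target; otherwise the fastest one."""
    feasible = [s for s in schemes
                if recovery_time_s(s, throughput_eps, state_bytes) <= target_recovery_s]
    if feasible:
        return min(feasible, key=lambda s: s.overhead)
    return min(schemes, key=lambda s: recovery_time_s(s, throughput_eps, state_bytes))


# Low load, small state: a cheap scheme suffices for a 5 s target.
print(choose_scheme(1_000, 100e6, 5.0).name)   # -> passive standby
# High load, large state: only active replication meets the same target.
print(choose_scheme(40_000, 2e9, 5.0).name)    # -> active replication
```

Because throughput and state size change over the course of processing, a system along these lines would re-run the selection periodically and switch schemes when the inputs drift. This is where the resource savings would come from: the expensive scheme is only used while it is actually needed to meet the target.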
André Martin has been a post-doctoral researcher in the Systems Engineering Group at TU Dresden, Germany, since January 2016. He holds a PhD (2015) and a Diploma (2008) in Computer Science, both from the Technical University of Dresden. His research interests are in distributed systems and cloud computing, with a focus on large-scale data processing systems and fault tolerance.