Troubleshooting In Distributed Systems

Accepted Session
Short Form
Beginner
Scheduled: Tuesday, June 23, 2015 from 3:45 – 4:30pm in B302/303

Excerpt

The shift to microservice and distributed architectures has made software products more flexible and scalable, and a lot more complex. With so many moving parts, ephemeral conditions and the spectre of partial failure, it can be much more difficult to pinpoint how and why things break. Learn how Logstash, Elasticsearch and Kibana can be used to monitor healthy systems and investigate issues as they pop up, and what we can do outside of software to improve our process of problem-solving.

Description

The shift to microservice and distributed architectures has made software products more flexible and scalable, and a lot more complex. With so many moving parts, ephemeral conditions and the spectre of partial failure, it can be much more difficult to pinpoint how and why things break.

Many tools have been created to help meet this challenge. We will provide an introduction to a trio of projects that form a highly extensible way to centralize, index, and view logs. Learn how to use Logstash to filter and transform your logs so you get the most out of the information they contain. We will discuss Elasticsearch and the powerful querying abilities its API provides. Finally, we will demonstrate how to use Kibana dashboards to keep tabs on systems that are running well, and diagnose those that aren’t. Kibana 4 adds powerful new ways to visualize, aggregate and analyze data, and we will explore what insights it can provide about the internal state of your system.
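As a rough illustration of the pipeline described above, the following Python sketch mimics the two ideas in miniature: turning a raw log line into structured fields (the job a Logstash filter does) and building a query body in Elasticsearch's Query DSL. It does not call Logstash or Elasticsearch. The log format, the field names (level, service, message) and the helper names parse_line and build_error_query are assumptions made up for this example, not part of the talk, and the bool/filter query form shown may not match every Elasticsearch version.

```python
import json
import re

# Hypothetical log format, assumed for illustration only:
# "<ISO timestamp> <LEVEL> <service> <free-text message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)


def parse_line(line):
    """Turn one raw log line into a structured event, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    event = match.groupdict()
    # Logstash conventionally stores the event time under "@timestamp".
    event["@timestamp"] = event.pop("timestamp")
    return event


def build_error_query(service, since="now-15m"):
    """Build a Query DSL body for recent ERROR events from one service.

    The field names are the hypothetical ones produced by parse_line.
    """
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"level": "ERROR"}},
                    {"match": {"service": service}},
                ],
                "filter": [
                    {"range": {"@timestamp": {"gte": since}}},
                ],
            }
        }
    }


if __name__ == "__main__":
    sample = "2015-06-23T15:45:00Z ERROR api-gateway upstream timed out after 30s"
    print(json.dumps(parse_line(sample), indent=2))
    print(json.dumps(build_error_query("api-gateway"), indent=2))
```

In a real deployment the parsing step would live in a Logstash filter configuration and the query body would be sent to an Elasticsearch search endpoint. Once logs are structured this way, a question like "which service logged errors in the last fifteen minutes" becomes a query rather than a text search across many machines.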

Of course, software isn’t the only solution to the difficulties here. Fixing problems within distributed systems is itself a problem within a distributed system: expertise, information and potential solutions are spread across teams of developers, testers, users, and the entire community that surrounds the software being used. We will discuss how we can level up our wider approach to solving problems to meet the challenges presented by systems of increasing complexity.

Tags

monitoring, troubleshooting, logging, analytics

Speaking experience

This is my first conference, but I have previously led workshops on science education.

Speaker

  • Megan Baker

    Workday

    Biography

    Megan is a Cloud Engineer at Workday, where she works on a team building an OpenStack-based private cloud. A recent graduate of Cornell University’s M.Eng. program, she is using her background in machine learning and analytics to tackle new problems in distributed systems.

    Outside of work, she’s interested in learning homebrewing, teaching her dog how to dance, and picking up new hobbies.
