Sarah is a data scientist at Bitly. She loves Python, machine learning, and the startup world. She is an accomplished conference speaker and an O’Reilly Media author, and is very involved in the Python community.

* Machine Learning at Scale: Using Apache Spark and MLlib

A common problem when working with large data sets is that many machine learning tools do not scale effectively. Apache Spark is a fast cluster computing engine that includes MLlib, a rich machine learning toolset designed to address this scaling problem.
