Machine Learning at Scale: Using Apache Spark and MLlib

Proposal
Short Form
Intermediate
Scheduled: Tuesday, June 23, 2015 from 11:00 – 11:45am

Excerpt

A common problem when working with large data sets is that many machine learning tools do not scale effectively. Apache Spark is a fast cluster computing engine that includes MLlib, a rich machine learning toolset designed to address this scaling problem.

Description

A common problem when working with large data sets is that many machine learning tools do not scale effectively. Apache Spark is a fast cluster computing engine that includes MLlib, a rich machine learning toolset designed to address this scaling problem.

In this talk you’ll learn the basic functionality of Spark’s MLlib, how to train a model using MLlib, and how machine learning with Spark might fit into a larger architecture. You’ll walk away with knowledge of the benefits of using Spark for machine learning and ideas for trying it out yourself.

Tags

spark, machine learning

Speaking experience

I've spoken at:

- PyData NYC 2013 http://pydata.org/nyc2013/abstracts/#92
- PyTennessee 2014 http://pytennessee.tumblr.com/post/75896855243/pytn-profiles-sarah-guido-and-spotify
- PyCon 2014 https://us.pycon.org/2014/schedule/presentation/244/
- PyData SV 2014 http://pydata.org/sv2014/abstracts/#212
- OSCON 2014 http://www.oscon.com/oscon2014/public/schedule/detail/34254 and also http://www.oscon.com/oscon2014/public/schedule/detail/34255
- PyGotham 2014
- PyTennessee 2015 https://www.pytennessee.org/schedule/presentation/72/
- Will be speaking at PyCon 2015 https://us.pycon.org/2015/schedule/presentation/320/ and https://us.pycon.org/2015/schedule/presentation/466/

I haven't given this particular talk before.

Speaker

  • Biography

    Sarah is a data scientist at Bitly. She loves Python, machine learning, and the startup world. She is an accomplished conference speaker and an O’Reilly Media author, and is very involved in the Python community.