Serialist: lazy web-crawling in Haskell

Accepted Session
Short Form
Scheduled: Wednesday, June 2, 2010 from 3:45 – 4:30pm in Fremont

Excerpt

Serialist (http://serialist.net/) provides a way to find, track and read serialized content (e.g., web comics). It's implemented entirely in Haskell and demonstrates functional web application development, crawling, scraping and distributed architecture. Serialist uses interesting graph algorithms to add and step through content lazily.
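The excerpt doesn't describe Serialist's internals, so what follows is only a minimal sketch of what "stepping through content lazily" can look like in Haskell. It assumes, hypothetically, that each page's "next" link has already been extracted into a function `next :: URL -> Maybe URL`. The names `URL` and `walkSerial` are illustrative and are not Serialist's API.

```haskell
-- Hypothetical representation: a page is identified by its URL.
type URL = String

-- | Lazily step through a serial, starting from one page and repeatedly
-- following its "next" link (supplied by the caller as `next`).
-- The list is built on demand, so `take n` only visits the first n pages,
-- even when the chain of "next" links is very long or loops back on itself.
walkSerial :: (URL -> Maybe URL) -> URL -> [URL]
walkSerial next start = start : maybe [] (walkSerial next) (next start)
```

Loaded into GHCi, `take 3 (walkSerial (\u -> Just (u ++ "'")) "p")` evaluates to `["p","p'","p''"]`: the chain is infinite, but only the requested prefix is ever computed.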

Description

We’ll present Serialist, our site for keeping track of the webcomics and stories that we read.

We implemented Serialist entirely in Haskell. Serialist demonstrates functional web-application development, web crawling and scraping, distributed architecture in Haskell, and interesting graph algorithms.

Other sites track webcomic updates, but they require manual intervention from a moderator or administrator, often including writing new page-scraping code for each serial. Our graph algorithms let us accept user submissions of new serials to crawl and make them available immediately. Haskell let us express our graph analyses concisely and run them over a lazy link-graph of the Internet.
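To give a feel for running a graph analysis over a lazy link-graph, here is a sketch of a breadth-first search whose result is a lazy list. It is not Serialist's code: the function `reachable` and its `links :: URL -> [URL]` argument are hypothetical stand-ins for whatever crawler supplies a page's outgoing links, and a real crawler would fetch pages in `IO` rather than through a pure function.

```haskell
import qualified Data.Set as Set

-- Hypothetical representation: a page is identified by its URL.
type URL = String

-- | Breadth-first enumeration of the pages reachable from a start page.
-- The link graph is passed in as a function, so it is never materialized;
-- because the result is a lazy list, a consumer that stops early only
-- explores the part of the graph it needed. Assumes each page has
-- finitely many outgoing links.
reachable :: (URL -> [URL]) -> URL -> [URL]
reachable links start = go (Set.singleton start) [start]
  where
    go _ [] = []
    go seen (u : queue) = u : go seen' (queue ++ reverse fresh)
      where
        -- Keep only links not seen before, marking each one as seen
        -- immediately so duplicate links on the same page are dropped.
        (seen', fresh) = foldl visit (seen, []) (links u)
        visit (s, acc) v
          | v `Set.member` s = (s, acc)
          | otherwise = (Set.insert v s, v : acc)
```

For example, with the infinite graph `\u -> [u ++ "/a", u ++ "/b"]`, `take 5 (reachable (\u -> [u ++ "/a", u ++ "/b"]) "")` yields `["","/a","/b","/a/a","/a/b"]`, while a finite graph containing cycles simply yields each page once. The plain-list queue keeps the sketch short; `Data.Sequence` would avoid the quadratic cost of `++` on large crawls.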

Speaking experience

Speakers

  • Jamey Sharp

    Biography

    Jamey Sharp was placed on Ritalin, briefly, in fifth grade. His interests and activities have been varied ever since. Today his day job involves a computer test for attention deficit disorder, but his biggest projects have been the Portland State Aerospace Society, a student rocketry club at Portland State University; XCB, a new low-level binding to the X protocol, in the process of replacing Xlib; and Serialist, because his other projects didn’t leave him enough time to read his favorite webcomics without tool support.

    Jamey’s interests span computer science fields including cryptography, combinatorial search, compilers, and computational complexity; systems-level programming, such as file format and network protocol implementations, Linux kernel development, and boot-loader hacking; computer architecture and its impact on software design; and functional programming, preferably in Haskell.

    Sessions

      • Title: Serialist: lazy web-crawling in Haskell
      • Track: Hacks
      • Room: Fremont
      • Time: 3:45 – 4:30pm
      • Speakers: Jamey Sharp, Josh Triplett
      • Title: Unlikely tools for pair programming
      • Track: Cooking
      • Room: Steel
      • Time: 3:45 – 4:30pm
      • Excerpt:

        Co-conspirators Jamey Sharp and Josh Triplett get up to a lot of miscellaneous hacking mischief together. Much of this hacking occurs while staring at the same screen, and tag-teaming the keyboard. Sometimes this happens with the two of them in different places. We’ll demo our favorite tools and invite audience contributions to the discussion.

      • Speakers: Jamey Sharp, Josh Triplett
  • Josh Triplett

    Biography

    Josh Triplett is a PhD student at Portland State University and a Free and Open Source Software hacker. Josh is involved in research on relativistic programming and advanced synchronization techniques for highly parallel systems. Josh builds and launches Linux-powered rockets with the Portland State Aerospace Society, and hacks on numerous other projects. Lately, Josh does a lot of his hacking in Haskell.
