On predicting predictors: hacking archive formats for fun and prophecy

Excerpt

We aim to inform you about the archive formats you use every day. We will include an in-depth look at the tar, ar, cpio, gzip, bzip2, and deb formats, as well as the internals of the Git object store. Armed with this information, we will show you a practical application: removing the redundancy between files in version control and distributions of source and binaries.
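
As a rough illustration of the redundancy mentioned above, the following Python sketch computes Git blob IDs for the files inside a tar archive. Git names a blob by the SHA-1 of the header "blob <size>\0" followed by the file content, so a tarball member whose ID already exists in the repository carries no new data. The helper names (git_blob_id, blob_ids_in_tar, make_example_tar) are made up for this example and are not taken from the talk.

```python
import hashlib
import io
import tarfile


def git_blob_id(content):
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def blob_ids_in_tar(tar_bytes):
    """Map each regular file in an uncompressed tar archive to its Git blob ID."""
    ids = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                ids[member.name] = git_blob_id(tar.extractfile(member).read())
    return ids


def make_example_tar(files):
    """Build a small in-memory tar archive from a {name: bytes} mapping (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, content in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()
```

Calling blob_ids_in_tar(make_example_tar({"README": b"hello\n"})) returns a one-entry mapping from "README" to its blob ID. Any ID that the repository already holds marks a file the archive duplicates.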

Description

Existing projects like pristine-tar treat the file formats as magic black boxes: they search for the compressor options (“gzip -9 --rsyncable”) that reproduce the original file from the uncompressed data. Our in-depth analysis of archive formats lets us record just enough information to reproduce any archive, regardless of the tool used to produce it.
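
To make the option-search idea concrete, here is a minimal Python sketch. It reads the fixed 10-byte gzip header defined in RFC 1952, then tries each compression level until the recompressed bytes match the original. This is not how pristine-tar is implemented. Python's gzip module stands in for the gzip command, and the names GzipHeader, parse_gzip_header, and find_level are hypothetical.

```python
import gzip
import struct
from collections import namedtuple

# Fixed header fields of a gzip member, in file order (RFC 1952).
GzipHeader = namedtuple("GzipHeader", "magic method flags mtime xfl os")


def parse_gzip_header(data):
    """Decode the fixed 10-byte gzip member header."""
    return GzipHeader(*struct.unpack("<2sBBIBB", data[:10]))


def find_level(original):
    """Return a compression level that reproduces `original` byte for byte, or None."""
    header = parse_gzip_header(original)
    payload = gzip.decompress(original)
    for level in range(1, 10):
        # Reuse the recorded timestamp so only the level varies between attempts.
        candidate = gzip.compress(payload, compresslevel=level, mtime=header.mtime)
        if candidate == original:
            return level
    return None
```

A file written by a different compressor implementation may match no level at all, in which case find_level returns None. That is the gap the format-level approach described above is meant to close.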

Speakers

  • Jamey Sharp

    Biography

    Jamey Sharp was placed on Ritalin, briefly, in fifth grade. His interests and activities have been varied ever since. Today his day job involves a computer test for attention deficit disorder, but his biggest projects have been the Portland State Aerospace Society, a student rocketry club at Portland State University; XCB, a new low-level binding to the X protocol, in the process of replacing Xlib; and Serialist, because his other projects didn’t leave him enough time to read his favorite webcomics without tool support.

    Jamey’s interests span computer science fields including cryptography, combinatorial search, compilers, and computational complexity; systems-level programming, such as file format and network protocol implementations, Linux kernel development, and boot-loader hacking; computer architecture and its impact on software design; and functional programming, preferably in Haskell.

    Sessions

      • Title: Serialist: lazy web-crawling in Haskell
      • Track: Hacks
      • Room: Fremont
      • Time: 3:45-4:30pm
      • Excerpt:

        Serialist (http://serialist.net/) provides a way to find, track and read serialized content (e.g., web comics). It’s implemented entirely in Haskell and demonstrates functional web application development, crawling, scraping and distributed architecture. Serialist uses interesting graph algorithms to add and step through content lazily.

      • Speakers: Jamey Sharp, Josh Triplett

      • Title: Unlikely tools for pair programming
      • Track: Cooking
      • Room: Steel
      • Time: 3:45-4:30pm
      • Excerpt:

        Co-conspirators Jamey Sharp and Josh Triplett get up to a lot of miscellaneous hacking mischief together. Much of this hacking occurs while staring at the same screen, and tag-teaming the keyboard. Sometimes this happens with the two of them in different places. We’ll demo our favorite tools and invite audience contributions to the discussion.

      • Speakers: Jamey Sharp, Josh Triplett

  • Josh Triplett

    Biography

    Josh Triplett is a PhD student at Portland State University and a Free and Open Source Software hacker. Josh is involved in research on relativistic programming and advanced synchronization techniques for highly parallel systems. Josh builds and launches Linux-powered rockets with the Portland State Aerospace Society, and hacks on numerous other projects. Lately, Josh does a lot of his hacking in Haskell.

    Sessions

      • Title: Unlikely tools for pair programming
      • Track: Cooking
      • Room: Steel
      • Time: 3:45-4:30pm
      • Excerpt:

        Co-conspirators Jamey Sharp and Josh Triplett get up to a lot of miscellaneous hacking mischief together. Much of this hacking occurs while staring at the same screen, and tag-teaming the keyboard. Sometimes this happens with the two of them in different places. We’ll demo our favorite tools and invite audience contributions to the discussion.

      • Speakers: Jamey Sharp, Josh Triplett

      • Title: Serialist: lazy web-crawling in Haskell
      • Track: Hacks
      • Room: Fremont
      • Time: 3:45-4:30pm
      • Excerpt:

        Serialist (http://serialist.net/) provides a way to find, track and read serialized content (e.g., web comics). It’s implemented entirely in Haskell and demonstrates functional web application development, crawling, scraping and distributed architecture. Serialist uses interesting graph algorithms to add and step through content lazily.

      • Speakers: Jamey Sharp, Josh Triplett