Spelunking with ǝpoɔᴉu∩

Accepted Session
Short Form
Scheduled: Thursday, June 23, 2016 from 3:45 – 4:30pm in B304


What do a fistbump emoji, Mandarin Chinese, and rocket ships have in common? They all have entries in Unicode, the biggest, baddest, and most widely used open standard. In this talk, we'll explore the messy and conflicting ideas that humans call "text", and how we represent those ideas in software.


Strings seem like they should be among the simplest, most open, and most transparent ideas ever represented in computing: an array of bytes that represents some text. The problem is that those bytes mask an enormous amount of complexity, accumulated over the many centuries that fickle human beings, with conflicting ideas about the nature of text, have been writing things down. Our best answer to this problem is a set of related standards collectively called Unicode.

In this talk, we’ll explore the complex inner workings of Unicode as implemented in different programming languages, the messy ways humans conceptualize what constitutes “text”, and how these ideas have collided inside our software. At the end, I hope you’ll leave with an appreciation for one of the most fascinating and detailed open, living standards in our field today.


standards, text, open-source, unicode, theory

Speaking experience

* O'Reilly talk one: https://www.youtube.com/watch?v=A8LV5fvNagY
* O'Reilly talk two: https://www.youtube.com/watch?v=ebLNPjPFaVk

I haven't given this exact talk before, but I enjoy giving talks in the general theme of "that thing that seems simple is actually quite complicated".

