Don't Fear Unicode

Accepted Session
Short Form
Intermediate
Scheduled: Tuesday, June 26, 2012 from 2:30 – 3:15pm in B201

Excerpt

Unicode isn’t new, but it still seems hard when you’re starting at the beginning and haven’t even been told the difference between a glyph, a codepoint, a character and a byte. Every year there are talks and tutorials at conferences about it, but if you haven’t grasped the basics, you can feel frustrated and lost much too quickly. This talk will cover the essentials of Unicode, locale and how they affect things like regular expressions, reading and writing files and sending data out to the world. Perl will be the programming language used to demonstrate these ideas, but much of the content should be accessible to all programmers.

Description

Unicode isn’t new, but it still seems hard when you’re starting at the beginning and haven’t even been told the difference between a glyph, a codepoint, a character and a byte. Every year there are talks and tutorials at conferences about it, but if you haven’t grasped the basics, you can feel frustrated and lost much too quickly.
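As a rough illustration of why codepoints, characters and bytes are different things, here is a minimal sketch. It is written in Python rather than the Perl used in the talk, and it is not taken from the talk itself; the helper name `describe` is made up for this example. It writes the same visible character, "é", in two ways and counts codepoints and UTF-8 bytes for each.

```python
import unicodedata

# The same visible character written two ways:
composed = "\u00e9"      # one codepoint: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two codepoints: "e" plus COMBINING ACUTE ACCENT


def describe(text):
    """Return (number of codepoints, number of UTF-8 bytes) for text."""
    return len(text), len(text.encode("utf-8"))


print(describe(composed))      # (1, 2)
print(describe(decomposed))    # (2, 3)

# The two strings look alike on screen but do not compare equal
# until they are normalised to the same form.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```

One character on screen can therefore be one or two codepoints and two or three bytes, depending on how it was written and encoded.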

Unicode sneaks into the most unexpected places. Do you ever wonder if your life would be much, much easier if your default encoding was not ASCII? Do you know what UTF-8 and Unicode strings are? Do you know what your default encoding is, or how to change it? Does it all seem too hard, and make you resent anything to do with the locale?

If 7-bit ASCII was good enough for me, it should be good enough for you! Have you been left behind by this whole Unicode thing to the point that you’re confused and resentful of it all? I know I was. When your name and everything you write work wonderfully in ASCII, it can be hard to summon the enthusiasm to learn about Unicode, even when you know that you should be handling your data better.

Imagine your code is using a logging library that expects strings. What does it do when you pass it a string containing Unicode? It’ll probably write it, encoding it in your default encoding (probably ASCII). And it’ll probably work on all of your test cases and on most of your data, until someone comes along with a non-ASCII character in their name and causes your code to throw an exception. You probably weren’t expecting it, and it might not even be your library that’s at fault. Unicode works implicitly just often enough that Unicode characters can sneak in well before you realise your code isn’t robust enough to handle them.
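The failure described above can be sketched in a few lines. This is a Python illustration under stated assumptions, not the Perl shown in the talk: `write_log` is a hypothetical stand-in for a logging library that encodes with an ASCII default, and the in-memory stream stands in for a log file.

```python
import io


def write_log(stream, message, encoding="ascii"):
    """Hypothetical logger: encode the message, then write the bytes."""
    stream.write(message.encode(encoding) + b"\n")


log = io.BytesIO()  # stands in for a log file opened in binary mode

# ASCII-only data works, so the tests pass.
write_log(log, "login: Jacinta")

# The first non-ASCII name raises an exception, and nothing is written.
try:
    write_log(log, "login: Zo\u00eb")
    failed = False
except UnicodeEncodeError:
    failed = True

# Choosing an encoding that can represent the data avoids the surprise.
write_log(log, "login: Zo\u00eb", encoding="utf-8")

print(failed)          # True
print(log.getvalue())  # b'login: Jacinta\nlogin: Zo\xc3\xab\n'
```

The point of the sketch is that the encoding step is where things break, so deciding on the encoding explicitly at the boundary (files, sockets, logs) is safer than relying on a default.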

This talk will cover the essentials of Unicode and how it affects things like regular expressions, reading and writing files, working with strings and sending data out to the world. Perl will be the programming language used to demonstrate these ideas, but much of the content should be accessible to all programmers.

Speaking experience

My job is essentially to talk to people. I run Perl training courses and speak at Perl Monger meetings and conferences.

I've presented an alpha version of this talk at a mini conference for linux.conf.au and to the Melbourne Perl Mongers. I've offered a much more technical version to OSCON, but I haven't received an acceptance yet.

Speaker

  • Jacinta Richardson

    Perl Training Australia

    Biography

    Jacinta Richardson runs Perl Training Australia, a micro-business offering courses throughout Australia. Both as part of her job and as a massive free-time sink, she is involved in running conferences (linux.conf.au 2007, Open Source Developers’ Conference (Australia) 2004-2008, Australian System Administrators Conference (SAGE-AU) 2008-2009), attending conferences, writing perl-tips, speaking at Perl Monger meetings whenever she’s in the right town, participating in on-line Perl forums and promoting women in IT. For her work in the Perl community, Jacinta was awarded the White Camel Award in 2008. When away from the computer, Jacinta enjoys scuba diving, cycling and baking.

    Sessions

      • Title: Don't Fear Unicode
      • Track: Cooking
      • Room: B201
      • Time: 2:30 – 3:15pm
      • Speakers: Jacinta Richardson