What's in a name? Phonetic Algorithms for Search and Similarity

Accepted Session
Short Form
Intermediate
Scheduled: Tuesday, June 23, 2015 from 10:00 – 10:45am in B204

Excerpt

Search can be as simple as returning a word or part of a word based on character similarity. LIKE and wildcard matches are sometimes sufficient, but they only match characters or strings, so they fail on misspelled words and names. Phonetic algorithms can help us find matches for misspellings and typo'd user data.

Description

While building a lost and found item database for a yearly convention, I found that I needed something a bit more reliable than string matching to search for names of con-goers who lost an item. Because of a short development timeline, and because this was a side project, I didn’t have time to observe or train the intended users of the system. I needed to build a querying interface that was easy to use and could handle imprecise input. Though I was using a database with full-text search capabilities, it didn’t account for name similarities, differences in letter case, or misspelled names. Enter phonetic algorithms, which break words and names down by pronunciation, so that similar and misspelled queries return more matches. In this talk, I’ll go over the following:

  • Full-text search: what it is, how it works, and where it is used
    • I’ll go over some examples with *SQL and Elasticsearch queries, digging into how they work under the hood.
  • Phonetic algorithms:
    • Soundex
    • Fuzzy
    • Metaphone
  • Phonetic Algorithm use cases and examples
    • Spell checkers
    • Libraries to use
    • Preprocessing and indexing search data in *SQL and NoSQL
    • How phonetic algorithms hold up against non-English words and names
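
To make the idea of matching by pronunciation concrete, here is a minimal sketch of the classic American Soundex scheme, the first algorithm in the outline above. It is an illustration only, not the code used in the lost and found project; the function name `soundex` and the helper table `_CODES` are assumptions chosen for this example.

```python
# Build the letter-to-digit table of the classic American Soundex scheme.
# Letters not listed here (vowels, y, h, w) carry no digit.
_CODES = {}
for digit, letters in enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return a 4-character Soundex code, or "" if name has no ASCII letters."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    first = letters[0]
    code = first.upper()          # the first letter is kept as-is
    prev = _CODES.get(first, "")  # its digit still suppresses an adjacent duplicate
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w are dropped and do not split duplicates
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit              # a vowel resets prev, so a repeat after it is kept
    return (code + "000")[:4]     # pad with zeros, then truncate to 4 characters


# Names that sound alike collapse to the same code, even when spelled differently.
print(soundex("Stephen"), soundex("Steven"))
print(soundex("Smith"), soundex("Smyth"))
```

Storing such a code in an indexed column next to each name lets a search compare codes instead of raw strings, which is one way to do the preprocessing mentioned above. Soundex was designed around English surnames, so results on other languages are typically weaker.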

Tags

Databases, programming, linguistics

Speaking experience

I've given several lightning talks and 30-40 minute talks at local meetups, and I gave a talk on Mentoring Apprentice Engineers at DevOpsDays Silicon Valley in 2013 and 2014. Video of my 2014 talk is online here: https://vimeo.com/115484860

Speaker