What's in a name? Phonetic Algorithms for Search and Similarity
Search can be as simple as returning a word or part of a word based on character similarity. LIKE and wildcard matches can be sufficient, but they only account for character or string matching, and they fail on misspelled words or names. Phonetic algorithms can help us find matches for misspellings and typo'd user data.
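To make the limitation concrete, here is a minimal sketch of a wildcard search using Python's built-in `sqlite3` module. The `attendees` table, its contents, and the `like_search` helper are made up for illustration and are not taken from the talk.

```python
import sqlite3

# Hypothetical in-memory table of attendee names (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendees (name TEXT)")
conn.executemany(
    "INSERT INTO attendees VALUES (?)",
    [("Catherine",), ("Kathryn",), ("Katherine",), ("Robert",)],
)


def like_search(term):
    """Return names containing `term` as a substring, via a LIKE wildcard match."""
    rows = conn.execute(
        "SELECT name FROM attendees WHERE name LIKE ? ORDER BY name",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


# A correctly spelled prefix finds the names that literally contain it...
print(like_search("Kath"))      # ['Katherine', 'Kathryn']
# ...but a plausible misspelling matches nothing, even though "Catherine" exists.
print(like_search("Cathrine"))  # []
```

The wildcard only helps when the query is a literal substring of the stored value, so one dropped letter is enough to lose the match.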
While building a lost and found item database for a yearly convention, I found that I needed something more reliable than string matching to search for the names of con-goers who had lost an item. Because the development timeline was short and this was a side project, I didn’t have time to observe or train the intended users of the system. I needed to build a querying interface that was easy to use and could handle imperfect input. Though I was using a database with full-text search capabilities, it didn’t account for name similarities, case differences, or misspelled names. Enter phonetic algorithms, which break words and names down by pronunciation, enabling similar and misspelled queries to return more matches. In this talk, I’ll go over the following:
- Full-text search: what it is, how it works, and where it is used
  - I’ll go over some examples with *SQL and Elasticsearch queries, digging into how they work under the hood.
- Phonetic algorithms:
  - Phonetic algorithm use cases and examples
    - Spell checkers
  - Libraries to use
  - Preprocessing and indexing search data in *SQL and NoSQL
  - How phonetic algorithms hold up to non-English words and names
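As a rough illustration of the phonetic-key idea and of preprocessing names for indexing, here is a minimal sketch of American Soundex paired with a precomputed key column in SQLite. The abstract does not say which algorithm or database the talk uses, so the choice of Soundex, the `attendees` table, the `name_key` column, and the `phonetic_search` helper are all assumptions made for this example.

```python
import sqlite3

# Soundex digit groups: letters in the same group sound alike.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character American Soundex key for an ASCII name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    encoded = []
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W are skipped and do not separate equal codes
        code = CODES.get(ch, "")  # vowels (and Y) have no code
        if code and code != prev:
            encoded.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it counts again
    return (first + "".join(encoded) + "000")[:4]


# Preprocess at write time: store the key next to the name and index it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attendees (name TEXT, name_key TEXT)")
conn.execute("CREATE INDEX idx_name_key ON attendees (name_key)")
names = ["Katherine", "Kathryn", "Catherine", "Robert"]
conn.executemany(
    "INSERT INTO attendees VALUES (?, ?)",
    [(n, soundex(n)) for n in names],
)


def phonetic_search(term):
    """Return stored names whose Soundex key equals the key of `term`."""
    rows = conn.execute(
        "SELECT name FROM attendees WHERE name_key = ? ORDER BY name",
        (soundex(term),),
    )
    return [row[0] for row in rows]


print(phonetic_search("Kathrine"))  # ['Katherine', 'Kathryn']
print(phonetic_search("Rupert"))    # ['Robert']
```

Note that "Catherine" is not returned for "Kathrine": Soundex keeps the first letter verbatim, so C365 and K365 never collide. Soundex was also designed around English pronunciation, which is one reason the question of non-English names in the outline matters.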
Databases, programming, linguistics
I've given several lightning talks and 30-40 minute talks at local meetups, and I gave a talk on Mentoring Apprentice Engineers at DevOpsDays Silicon Valley in 2013 and 2014. Video of my talk in 2014 is online here: https://vimeo.com/115484860
Mercedes Coyle is a Data Infrastructure Engineer with Ulive, where she builds and maintains systems and pipelines for Analytics. In her spare time, she’s learning how to run a backyard farm, including building a chicken coop monitoring system.
- Title: What's in a name? Phonetic Algorithms for Search and Similarity
- Track: Chemistry
- Room: B204
- Time: 10:00 – 10:45am
- Speakers: Mercedes Coyle