Open Source Library (2023-12-21)

When working on the Language Understanding Intelligence Service, we needed to efficiently find all the occurrences of multiple strings in a large body of text. Matthew Hurst told me about trie search and how the tree structure could match multiple search phrases in a single pass over the text. His team implemented the search on the server in C#.

Later, on my pretty-good-nlp project, I needed to find multiple phrases in a string and looked for a good trie search implementation in TypeScript. I found some partial JavaScript implementations, but they had a few bugs and failed to find matches where one search phrase was a sub-phrase of another.

I wrote my own implementation and decided it was good enough to extract into its own package. While working on examples, I realized the algorithm could handle more than just words (i.e. tokenized text) by working against the Iterator<T> interface.
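
To make the idea concrete, here is a minimal sketch of a trie search over generic tokens. It is not the package's actual code or API: the names `TrieNode`, `Match`, `buildTrie`, and `search` are hypothetical, and tokens are assumed to be comparable as `Map` keys (so strings and numbers work, but structurally equal objects do not). It keeps every partial match alive at once, which is one simple way to report a phrase even when it is a sub-phrase of another.

```typescript
// Hypothetical names throughout; an illustration, not the published package's API.
interface TrieNode<T> {
  children: Map<T, TrieNode<T>>;
  // Indices of the phrases that end exactly at this node.
  phraseIds: number[];
}

interface Match {
  phraseId: number; // index into the phrases array passed to buildTrie
  start: number;    // index of the first matched token
  end: number;      // index one past the last matched token
}

// Insert every phrase into the trie, one token per edge.
function buildTrie<T>(phrases: Iterable<T>[]): TrieNode<T> {
  const root: TrieNode<T> = { children: new Map(), phraseIds: [] };
  phrases.forEach((phrase, id) => {
    let node = root;
    for (const token of phrase) {
      let next = node.children.get(token);
      if (next === undefined) {
        next = { children: new Map(), phraseIds: [] };
        node.children.set(token, next);
      }
      node = next;
    }
    if (node !== root) node.phraseIds.push(id); // ignore empty phrases
  });
  return root;
}

// Walk the input once, advancing every partial match on each token.
function* search<T>(root: TrieNode<T>, input: Iterable<T>): Generator<Match> {
  let active: { start: number; node: TrieNode<T> }[] = [];
  let index = 0;
  for (const token of input) {
    // A new match may begin at every position.
    active.push({ start: index, node: root });
    const survivors: { start: number; node: TrieNode<T> }[] = [];
    for (const { start, node } of active) {
      const next = node.children.get(token);
      if (next === undefined) continue; // this partial match dies here
      for (const phraseId of next.phraseIds) {
        yield { phraseId, start, end: index + 1 };
      }
      // Keep going only if a longer phrase could still match.
      if (next.children.size > 0) survivors.push({ start, node: next });
    }
    active = survivors;
    index++;
  }
}

// Usage with word tokens; "york" is a sub-phrase of "new york".
const phrases = [["new", "york"], ["york"], ["new", "york", "city"]];
const trie = buildTrie(phrases);
const found = Array.from(search(trie, "i love new york city".split(" ")));
// found contains three matches:
//   phrase 0 at tokens [2, 4), phrase 1 at tokens [3, 4), phrase 2 at tokens [2, 5)
```

Because both functions accept any `Iterable<T>`, the same sketch works on a plain string (matching character by character) or on a lazily generated token stream. Tracking all partial matches is simple but can cost more than a single state per token; an Aho-Corasick automaton with failure links would be the usual way to tighten that, at the price of more code.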