Open Source Library (2023-12-21)

When working on the Language Understanding Intelligence Service, we needed to efficiently find all the occurrences of multiple strings in a large body of text. Matthew Hurst told me about Trie search and how the tree structure could match multiple search phrases at once. His team implemented the search on the server in C#.

Later, on my pretty-good-nlp project, I needed to find multiple phrases in a string and looked for a good Trie search implementation in TypeScript. I found some partial JavaScript implementations, but they had a few bugs and failed to find instances where one search phrase was a sub-phrase of another.

I wrote up my own implementation and decided that it was good enough to extract into its own package. Working on examples, I realized I could make the algorithm handle more than just words (i.e. tokenized text) and could leverage the Iterator<T>.
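As a rough illustration of the idea, here is a minimal sketch of a token-based Trie search. It is not the code from my package: the names `TrieNode`, `TrieMatch`, `buildTrie`, and `findAll` are hypothetical, and it assumes the input is already an array of tokens. It restarts the walk at every token position instead of using a more advanced automaton, which keeps it short at the cost of speed.

```typescript
// Hypothetical names for illustration only; not the API of any published package.
interface TrieNode<T> {
  children: Map<T, TrieNode<T>>;
  // Set when a complete search phrase ends at this node.
  phrase?: T[];
}

interface TrieMatch<T> {
  start: number; // index of the first matched token
  phrase: T[];   // the search phrase that matched
}

// Build one tree from all search phrases so they can be matched together.
function buildTrie<T>(phrases: T[][]): TrieNode<T> {
  const root: TrieNode<T> = { children: new Map() };
  for (const phrase of phrases) {
    let node = root;
    for (const token of phrase) {
      let next = node.children.get(token);
      if (!next) {
        next = { children: new Map() };
        node.children.set(token, next);
      }
      node = next;
    }
    node.phrase = phrase;
  }
  return root;
}

// Walk the tree from every start position in the token list.
function findAll<T>(root: TrieNode<T>, tokens: T[]): TrieMatch<T>[] {
  const matches: TrieMatch<T>[] = [];
  for (let start = 0; start < tokens.length; start++) {
    let node = root;
    for (let i = start; i < tokens.length; i++) {
      const next = node.children.get(tokens[i]);
      if (!next) break;
      node = next;
      // Record the phrase and keep walking, so a phrase that is a
      // sub-phrase of a longer one is still reported.
      if (node.phrase) matches.push({ start, phrase: node.phrase });
    }
  }
  return matches;
}

const exampleTrie = buildTrie([["new", "york"], ["new", "york", "city"], ["york"]]);
const exampleMatches = findAll(exampleTrie, "i love new york city".split(" "));
// exampleMatches holds "new york" and "new york city" at index 2, and "york" at index 3.
```

Because the functions are generic over `T`, the same sketch works for characters, words, or any other token type that can be used as a Map key.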

Story

One evening, when I was working on a large customer data store, there was an emergency team meeting to figure out how to handle a bug that had incorrectly hashed the Passport authentication identifiers in our SQL databases. About 20 of us were called into a 10 person conference room to discuss options.

A dev on my team reported he had already patched the code and had a SQL script to repair the existing data. However, some random guy from another team had been talking to my boss and followed us into the meeting room. He decided to take over the meeting and loudly declared that we first had to estimate how long running the script would take. My boss, normally outspoken and decisive, deferred to him. Everyone assumed he must be important and knew what he was doing.

He grabbed a whiteboard pen and started doing a calculation on the board. He stated that updating a row in SQL probably takes at best 1 millisecond when within a transaction. He also wrote out a bunch of hash algorithm statements that he said added up to 2.5 milliseconds per row. We had 400 million rows that needed updating. He then calculated this would take 400 million x 3.5 = 1.4 billion milliseconds => 1.4 million seconds => 23 thousand minutes => 388 hours => 16 days.

No one could use the system until all the rows were updated. The room was in a panic. I thought that we should get the script started ASAP. I pulled the dev and a service engineer into the hallway to confirm they had confidence in the script and to ask them to get it running.

In the meantime, random guy started drawing up a shift rotation schedule for each person to be on site watching the script run over the next 16 days. After 15 minutes, as random guy was finalizing the shift schedule, the dev and service engineer returned. The dev whispered the results in my ear.

Random guy was literally giving a pep talk to the room about how hard it was going to be working 24 hours a day, but that it would be a worthy and valiant effort. I tried to interrupt him, but he didn't give me any opportunity.

Finally, he finished. I let the room know that we had run the script and it completed in under 15 minutes. The script updated over 400 rows per millisecond. He muttered that was impossible as his chest caved and he hung his head.

While this guy was a bit of a blowhard, he followed how developers are taught to estimate algorithms. We mentally walk through the steps, estimate the cost of each instruction, account for loops, and multiply by the number of items to get the result.

The problem was that computation speed had exceeded human imagination. It is hard for humans to understand how fast 1/400th of a millisecond goes by. We like to think we can understand that kind of processing power, but we can't be precise enough to avoid errors of several orders of magnitude. This was back in 2005. Today, that script would likely complete in seconds.
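To make the gap concrete, here is a small sketch that redoes the whiteboard arithmetic and compares it with what we measured. The function and field names are made up for illustration, and the measured time is treated as a full 15 minutes even though the script finished in less, so the real gap was a bit larger.

```typescript
// Hypothetical helper; the inputs are the figures from the story above.
interface EstimateComparison {
  estimatedDays: number;      // the whiteboard estimate, in days
  measuredRowsPerMs: number;  // observed throughput
  overestimateFactor: number; // how many times too slow the estimate was
}

function compareEstimate(
  rows: number,
  estimatedMsPerRow: number,
  measuredMinutes: number
): EstimateComparison {
  const msPerDay = 24 * 60 * 60 * 1000;
  const estimatedMs = rows * estimatedMsPerRow;
  const measuredMs = measuredMinutes * 60 * 1000;
  return {
    estimatedDays: estimatedMs / msPerDay,
    measuredRowsPerMs: rows / measuredMs,
    overestimateFactor: estimatedMs / measuredMs,
  };
}

// 400 million rows, 1 ms for the update plus 2.5 ms for hashing, 15 minutes measured.
const comparison = compareEstimate(400_000_000, 1 + 2.5, 15);
console.log(comparison.estimatedDays.toFixed(1));      // about 16.2 days
console.log(Math.floor(comparison.measuredRowsPerMs)); // about 444 rows per millisecond
console.log(Math.round(comparison.overestimateFactor)); // roughly 1,556 times too slow
```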

We shouldn't manually think through estimates. Instead, we have to run the code in a production/test environment and measure the real performance. Don't be that (random) guy.

Open Source Component Library

sterling-svelte is a modern, accessible, lightweight component UI library for Svelte.

It took just over a year of personal time to create. I loved getting to learn Svelte and SvelteKit completely. I also enjoyed the challenge of making the architecture, design, and technology decisions.

I'm very proud of this achievement. sterling-svelte is high quality, easy to use, fast, and extensible. The documentation and examples are what you would expect from professional and mature UI libraries. I'm pleased it already has 4K+ downloads!

You can check it out here:

Now so much I know
that things just don't grow
if you don't bless them with your patience.
Emmylou - First Aid Kit

Open Source Library

pretty-good-nlp is a deterministic, match-based recognizer for natural language processing (NLP) scenarios.

I built it so that I could have a recognizer for my machine learning applications while waiting for data scientists to build a predictive model.

It has some really nice features including phrase and pattern matching, negations, weighting, order evaluation, and noise removal.

Story

Back in the days of Windows 95, my brother called me in a panic. He reported that his drive was out of space and he couldn't figure out what to do.

I walked him through selecting some files he didn't need anymore. When I told him to drag the files to the recycle bin in Windows Explorer, he couldn't find it. I asked him to look on his desktop and he reported it wasn't there either.

He then told me what he had done the day before while trying to clean up files on his desktop. He had somehow dragged the recycle bin icon around and ended up creating a shortcut to the recycle bin. He now had two recycle bins on his desktop. He dragged one onto the other and then emptied the recycle bin.

Turns out he dragged the real recycle bin to the recycle bin shortcut. This put the recycle bin control panel app into the recycle bin. When he emptied the recycle bin, the control panel app deleted itself!

The OS went through the recycle bin app for soft and hard deletes. Ctrl+Del didn't do anything. We were able to open a command prompt and delete files, but this was too tedious for him to do regularly without my help. He just kind of lived with a too-full drive until he bought his next computer.

I doubt any developer or tester could have predicted a user would use a shortcut to the recycle bin to delete the recycle bin.

Bookshelf

I've been studying software architecture throughout my career. I highly recommend these books. I reference them to solve tough problems or when I sense something is not quite right with existing code.

Most of the good programmers do programming not because they expect to get paid or get adulation by the public, but because it is fun to program.
Linus Torvalds

Open Source Examples

Recoil JS is my favorite React state management library. It is lightning-fast, minimal, and idiomatic React. It blends MobX's observability and Redux's flat data islands.

For some unknown reason, the Recoil examples on the official site are documentation only. I created a project for each example so you can build and debug them. Additionally, I provided an example of a dispatcher pattern using Recoil.

Open Source Library

I needed a binary search algorithm to find scroll offsets in a virtualized list. There didn't seem to be a package implementing a binary search in TypeScript, so I created this one. It has a bonus feature: if the value isn't found, it provides the nearest range of indices where the value would have been.
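Here is a minimal sketch of that idea, assuming a sorted array of numbers. The names `SearchResult` and `binarySearchRange` and the shape of the result are hypothetical and may not match the package's actual API.

```typescript
// Hypothetical result shape for illustration; not the package's real API.
interface SearchResult {
  found: boolean;
  index: number; // index of the match, or -1 when not found
  low: number;   // index of the nearest smaller element (-1 if none)
  high: number;  // index of the nearest larger element (array length if none)
}

function binarySearchRange(sorted: number[], value: number): SearchResult {
  let lo = 0;
  let hi = sorted.length - 1;
  while (lo <= hi) {
    const mid = lo + Math.floor((hi - lo) / 2);
    if (sorted[mid] === value) {
      return { found: true, index: mid, low: mid, high: mid };
    }
    if (sorted[mid] < value) lo = mid + 1;
    else hi = mid - 1;
  }
  // The loop exits with hi === lo - 1, so the value would sit between them.
  return { found: false, index: -1, low: hi, high: lo };
}

// Top offsets of rows in a virtualized list, and a scroll position of 100.
const rowOffsets = [0, 40, 90, 150];
const scrollHit = binarySearchRange(rowOffsets, 100);
// scrollHit.found is false; low is 2 and high is 3, so row 2 is at the top of the viewport.
```

For a virtualized list, the `low` index is usually the interesting one, since it identifies the row that contains the current scroll offset.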

Opinion

If you've never heard of them, check out svelte.dev and kit.svelte.dev.

I am more productive. Writing HTML, CSS, and JS is straightforward. Svelte's hot-module-reloading makes my inner dev loop very fast. I feel like I can code anything with Svelte.

I can properly architect and design. Svelte provides componentization, encapsulation, composition, and extensibility.

I don't have to make as many low-level decisions. I don't agonize over which state management, templating, or styling libraries to choose. Svelte meets all my needs for building applications and libraries.

I don't waste time trying to optimize my code by hand. There's no need to call memoization functions since the Svelte compiler makes the code fast and small.

I quickly became an expert. Svelte's documentation is concise and precise. The tutorial and examples get you started in a couple of hours. Additional capabilities like SvelteKit are built on top of rather than inside Svelte. This helps Svelte avoid collapsing under the weight of additional complexity.

Even if you are a React or Angular fanatic, I hope you'll check out Svelte. I've found that learning new languages and technologies helps me be better with the ones I know.

I was raised up believing
I was somehow unique.
Like a snowflake
distinct among snowflakes,
unique in each way you can see.
And now, after some thinking,
I'd say I'd rather be
a functioning cog
in some great machinery
serving something beyond me.
Helplessness Blues - Fleet Foxes

Open Source Library

A splitter control allows the user to resize two panes relative to each other. Splitters can be nested to create sophisticated application layouts.

The splitter is available for React and Svelte. Each is responsive, robust, and easy to use.

Everyone is afraid of losing
Even the ones that always win
Hey sleepwalker, when the mountain comes back to life
It doesn't come from without
It comes from within
Sleepwalker - The Killers