I was listening to some folks from Anthropic talking about interpretability whilst doing chores, and thinking about how totally strange and yet vaguely mundane the world is that we’re living in right now. I mean, it’s not mundane for many reasons, but I’m specifically talking about the way that language models are permeating every corner of Western culture.
We’ve invented this strange new thing, we don’t fully understand how it works, and we’re only now getting a clear enough look inside to begin steering it. And yet we’ve inserted it between ourselves and our everyday lives. Ourselves and our work. Sometimes, ourselves and our art. Are there any other examples of this, at such speed, anywhere in history?
I’m writing this in iA Writer whilst a Jupyter Notebook peeks out from behind the window. I’m writing words that will make a machine (eventually) write words, but not any that I’ve told it to write. I’m writing in a text editor that has a first-class feature for annotating text as authored by AI. I’m mostly writing stream of consciousness: just predicting the next word, and then typing it—but it’s more complicated than that.
It turns out that it’s more complicated than that for language models, too. There’s mounting evidence from interpretability research that a model may “think” a few words ahead, even though it only ever emits the next word. We’ve jumped so quickly to labeling these things as either “just another tool” or “sentient silicon-person” (because either is comforting in some way), but the truth (as it often is) might be somewhere in between.
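The “just predicting the next word” loop itself is simple enough to sketch, though. Here’s a toy next-word predictor in Python: a bigram counter over a made-up corpus, nothing like a real transformer (and certainly not the model I’m building), but the generation loop has the same shape: predict one word, append it, feed the result back in.

```python
import random
from collections import Counter, defaultdict

# A made-up corpus and a bigram model: for each word, count what follows it.
corpus = (
    "the model predicts the next word and then the next word after that"
).split()

following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Sample a next word in proportion to how often it followed `word`."""
    counts = following[word]
    if not counts:  # dead end: this word never appeared mid-corpus
        return None
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Generate one word at a time, each prediction conditioned only on the last.
word, output = "the", ["the"]
for _ in range(8):
    word = predict_next(word)
    if word is None:
        break
    output.append(word)

print(" ".join(output))
```

A real model swaps the counting for billions of learned parameters and conditions on the whole context instead of a single word, but generation is still recognizably this loop.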
Anyway, that’s part of why I’m building a (small) large language model (I know)—to better understand this thing that we’ve made. The other reason, admittedly, is that it’s just fun. Non-deterministic computers are goofy, and I want to play with them. Computers have done exactly what we told them to do for a long time. Probably good? Maybe not?
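For the curious: the non-determinism sneaks in at the sampling step. Here’s a minimal sketch, with entirely made-up logits (a real model scores tens of thousands of candidate tokens at every step):

```python
import math
import random

# Hypothetical scores a model might assign to candidate next words.
logits = {"work": 2.1, "art": 1.7, "chores": 0.4}

def sample(logits, temperature):
    """Softmax with temperature, then sample. Higher temperature flattens
    the distribution (more surprising picks); near zero, it collapses to
    always choosing the highest-scoring word."""
    scaled = {w: score / temperature for w, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {w: math.exp(s - peak) for w, s in scaled.items()}
    words = list(weights)
    return random.choices(words, weights=[weights[w] for w in words])[0]

# Same input, different runs, different answers: that's the goofiness.
print([sample(logits, temperature=1.0) for _ in range(5)])
```

Turn the temperature down toward zero and it behaves like the obedient, deterministic computers we’re used to; turn it up and it gets goofier.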