The AI Hype Is Real
In the wake of the release of GPT-4, hype around “generative AI”—that is, AI that makes “stuff” (images, music, text, code, etc.)—has gone mainstream. Lots of non-technical people (including me!) are now rightfully excited about the prospect of AI writing blog posts or generating realistic-sounding pop songs.
People are also noticing that copywriters and computer programmers are about to face a lot of competition (and perhaps receive a lot of help) from these tools. Just yesterday, on Reddit, I saw this “prompt engineering” guide (one of many) to making ChatGPT generate various kinds of written works for marketing, blogging, and more. On Twitter, I saw this thread about how a blogger with no coding experience spent his weekend using GPT-4 to create an app and a Chrome extension from scratch.
All of these use cases are well within the known capabilities of today’s generative AI. And honestly, I think they are all fine. But using AI in this way no longer inspires me. It focuses so much on the generative aspect of these tools, and so little on the AI. Indeed, the fixation on generation is the source of much recent pooh-poohing of AI. For instance, that ChatGPT routinely “hallucinates” facts is taken as proof that it doesn’t “know” anything, and is therefore actually useless for its ostensible purpose of “generating” factual text.
But, what if we didn’t care about using ChatGPT to generate blog posts or write apps? What if the astonishing thing about “generative AI” isn’t its capacity to generate things, but its intelligence?
Baristas versus Keurig Machines
When I used ChatGPT for the first time, my immediate reaction was: huh, I wonder if I can make this do my job for me.
It seems many people had this reaction, as evidenced by the sorts of projects I see people doing with ChatGPT every day—just see any of my examples above.
Then, a few weeks ago, this tweet changed my entire paradigm for what ChatGPT is capable of. I had heard about “steerability,” but this was the moment it clicked for me that ChatGPT is seriously different from Midjourney (the previous AI tool that I got very hyped about). It is not a Keurig machine, taking your carefully-crafted “prompt” (the K-cup, in this analogy) as input and producing a finished “work” (the hot beverage) as output. Not at all.
That tweet made me realize that ChatGPT has a mind.
Now, to be clear, ChatGPT definitely does not have a human mind! It does not deserve rights or labor protections! And while ChatGPT is obviously a machine, I want to argue that it is much, much more than an appliance.
Everywhere I look I see people using ChatGPT like an appliance—that is, like a Keurig machine. You could make a cup of coffee by hand, grinding and weighing the beans, heating and pouring the water by hand, all adjusted based on your taste and past experience. Or, you could use a Keurig to do the same job, faster and with somewhat crappier results, in one easy step. Likewise, you could think up your own marketing copy or React code and type it up yourself in a word processor or IDE. Or, you can have ChatGPT do it for you, faster and with somewhat crappier results, in one easy step.1
I don’t mean to impugn using ChatGPT in this way. It obviously works when used like an appliance. But saying ChatGPT is an appliance is like saying a barista is a Keurig machine. And what makes a barista better than a Keurig? You can simply tell them what kind of beverage you want, and they will do their best to act on that request. This works because the barista has a mind.
The Mind of ChatGPT
What kind of mind does ChatGPT have?
ChatGPT’s mind is flexible. It will try to be whatever you tell it to be (within the limits of a chatbot interface, of course). This feature is known as “steerability.” You can tell ChatGPT to act like Socrates, and it will try its best to act like its idea of Socrates. You can tell it to explain things in terms that would make sense to a five-year-old, and it will try to do that, too.
ChatGPT’s mind is (in a very limited way) aware. This is made possible by the “context window.” According to OpenAI, GPT-4 is able to “handle” a context of up to 25,000 words—that is, within a given conversation ChatGPT is “aware” not just of the initial prompt, but of the entire conversation thus far, up to 25,000 words. I don’t know precisely how this works. I suppose the “dumb” way would be to append each successive prompt and response in the conversation to a single context object that GPT-4 refers to over and over. Some people seem to think there is a little more engineering behind ChatGPT than this, which I certainly hope is the case, since the alternative implies that the quality of conversations longer than the context window could degrade precipitously, possibly to the point of becoming useless. Alas, OpenAI is not actually open, so we don’t know for sure, except through our own observations.
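To make that “dumb” approach concrete, here is a minimal Python sketch of it. Everything in it is my own guess for illustration, not OpenAI’s actual design: the `Conversation` class is hypothetical, the budget is counted in whitespace-separated words (real models count tokens), and dropping the oldest turns first is just one possible policy.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Hypothetical running transcript with a fixed word budget."""

    max_words: int = 25_000  # the figure quoted above; real limits are in tokens
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add(self, role: str, text: str) -> None:
        # Append every prompt and every response to the same transcript.
        self.turns.append((role, text))

    def context(self) -> list[tuple[str, str]]:
        """Return the most recent turns that fit in the budget, oldest first."""
        kept: list[tuple[str, str]] = []
        used = 0
        # Walk backwards from the newest turn and stop once the budget is spent.
        for role, text in reversed(self.turns):
            n_words = len(text.split())
            if used + n_words > self.max_words:
                break
            kept.append((role, text))
            used += n_words
        return list(reversed(kept))


# Tiny budget so the effect is visible: the earliest turn falls out.
conv = Conversation(max_words=5)
conv.add("user", "one two three")
conv.add("assistant", "four five")
conv.add("user", "six")
visible = conv.context()
```

Under these assumptions, the first thing to fall out of a long conversation is the initial prompt, which is exactly why a conversation longer than the window could degrade.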
And ChatGPT’s mind is knowledgeable. A key concept in understanding ChatGPT is “emergent behavior.” The simple example goes like this: OpenAI didn’t program ChatGPT to do math, or play chess, yet (to varying extents) it can do both. How? “Emergent behavior.” For whatever reason, when you make your large language model, er, large enough, it becomes able to, uh, model all sorts of things that can be described with, well, language. GPT-4 thus has concepts of all sorts of things. It “knows” what sonnets are and can produce them. It “knows” the rules of various games and can simulate them.
Thanks to its flexibility, awareness, and knowledge, ChatGPT is able to act in surprisingly intelligent ways.
The Parable of the Secretary
New topic: What is the point of having a human secretary in 2023? If you need to send correspondence, you can just fire off an email. If you need to manage your calendar, you can use Outlook or a phone app to do it yourself. Thanks to modern technology, busy adults have all sorts of tools at their disposal for managing their administrative lives. Secretaries have been made obsolete.
Except… they really haven’t. Law offices and doctors still have secretaries—when is the last time you scheduled a physical by having a phone call with an MD?
No, secretaries have not been made obsolete. To the contrary: technology has allowed all of us to have our own secretary—ourselves! All we need to do is use a bunch of apps.
Supposedly, this change is beneficial and convenient. Our lives, we believe, are made easier by such apps. But lived experience tells us this isn’t the whole truth. It is certainly easier to do secretarial work with a 2023 smartphone than a fully-outfitted 1966 office desk—so much so that all of us have added it to our other responsibilities without really noticing. But it is still a nonzero amount of work. Some people (e.g. ADHD-sufferers like myself) would argue that it is a lot of work! Technology has not gotten rid of this work. It has merely made it accessible to everyone.
It seems we can’t get rid of this work. But hey, at least we have convenient appliances for completing it—that is, apps. And really, what could be more convenient than an app?
Well, we actually know the answer to that question: it’s humans. Humans are more convenient than apps.
You may have heard of such a thing as a “personal assistant” (sometimes styled as an “executive assistant,” for those who consider themselves executives first and people second). This is a living, breathing human who will, among other things, interface with all of the secretarial apps for you. But… why? Those apps are fairly straightforward to use, and much cheaper than hiring a human worker.
Maybe we should conclude that human assistants are just an affectation of people who want to flaunt how rich or important they are. However, I think there’s something else going on here. If using secretarial apps yourself was truly the easiest, fastest, least cumbersome approach to life administration, rich and important people would use them. But clearly, in many cases, a calculation has been made that, no matter how easy the work of app-using may be, delegating to a human is easier.
Why might this be? The answer is simple. Apps are (virtual) appliances. Their interfaces are unnatural, inflexible, dead. You must learn to use them, adapt to their shortcomings, play by their rules.
But an assistant? You don’t need to learn how to “use” them in quite the same way. A good assistant will learn how you work, and how best to work for you. This is possible because an assistant has a mind.
As of April 2023, ChatGPT is in no way ready to act as anyone’s secretary or executive assistant.2 But I would submit to you that, much as a personal assistant is a human mind riding around in a human body, ChatGPT is a GPT-4 mind riding around in a chatbot body. It is no mere appliance, with inflexible rules of use you must obey. It can learn to serve you, in sophisticated and interesting ways, if only you deign to teach it.
Often, when people talk about “prompt engineering,” they are talking about engineering a singular prompt to set the chatbot up just-so to generate a desired output, as if engineering the perfect K-cup that anyone can place in their Keurig to produce the perfect cup of coffee. This is the job of “prompt engineer” at companies today: mess around with prompts until you get one that reliably produces the kind of output you want, so that other people in your company can paste that prompt into ChatGPT on their own machines and have the same kind of conversation there. For what it’s worth, long, complex, up-front prompts can definitely generate impressive results. But I argue that approaching ChatGPT in this way actually underestimates what it can do, what it can be.
In ChatGPT, unlike Midjourney, we aren’t restricted to “engineering” a single perfect prompt in order to generate a single perfect output.3 That’s what the “Chat” part of the name means: over the course of many prompts—AKA a conversation—you can teach ChatGPT to behave as a trained, knowledgeable assistant that can not only work for you, but work with you.
My point here is that, at least in the case of ChatGPT, the top-heavy “prompt engineering” paradigm lacks imagination. You can frontload a big, carefully worded prompt that sets ChatGPT up as a certain kind of appliance. But you can also have a dialogue with it, in which it learns about what you want, and—crucially—you learn about what it can do.
There Is No Executive Function App
I mentioned above that I suffer from ADHD. It is beyond the scope of this post to explain all my symptoms, but there is one symptom in particular that I suffer from every day and have never been able to solve. Namely, a shortfall of executive function. Essentially, when I set out to do any activity, invariably at some point within 1 to 45 minutes I will discover myself inexplicably doing some other activity, without having intentionally set out to do so, in a way that greatly slows down my progress on the original activity. (Obviously, every human has experienced this to some extent; what raises it to the level of a disorder is the severity.)
How does one fix this problem? Simply “trying harder” to stay focused doesn’t work—trust me, I’ve tried. Psychological science suggests various interventions, to wit: meditation (done it), medication (done it), and making a to-do list (done it more than everyone I know put together).
The to-do list idea has been especially powerful for me. Since college, I have sought to organize my life according to the workflow of Getting Things Done. Recognizing when a task has entered my life and “capturing it” (putting it on a list) has proven indispensable, as has a weekly review of my various lists and inboxes.
However, simply having a to-do list is not a solution in itself. You need to adhere to it. That’s the tricky part. I often find it overwhelming just to consult my to-do list, because the sheer awareness of multiple tasks I could be doing makes it harder to focus on the one I want to do first. So in addition to a to-do list, I need a priority list.
Like a good modern, I hypothesized that maybe apps can fix this problem for me! Thus, over the past five or so years, I tried every to-do app. Here’s my authoritative review: they don’t work.
Okay, that’s a little harsh. Prioritizing and time-blocking and tagging and filtering can help you identify the right task to work on in a given moment, but these systems do little to help you maintain focus once you’ve begun working, or restore it when you get off track. Moreover, as discussed above, every to-do app in existence requires you to conform to its opinion of what tasks “are” and how they should be organized and displayed.
Upon reflection, the problem I want to solve is not actually a problem of organization. The work I need to do is not a mystery to me, I’m just impeded from doing it by a mental condition. I’m not looking for a better list, I’m looking for better executive function.
My dream solution to this problem would be to have a human aide who quietly sits next to me all day and, when (not if!) I get distracted, gently taps me on the shoulder and redirects my attention back to the thing I was originally trying to do. Ideally, this aide would also encourage me when I’m feeling frustrated and help me think through why my attention might be drifting—perhaps the task I have set for myself is too big, or too poorly-defined, or not even the right thing to be working on right now.
I don’t want a Keurig, I want a barista. I don’t want to fiddle with my phone calendar, I want to delegate to a personal assistant. And I don’t want to use a to-do app, I want an executive function angel-on-my-shoulder that understands my weaknesses and helps me work through them.
You probably see where I’m going with this.
Proof of Concept: The ChatGPT Priority Tracker
Yeah so anyway I outsourced my executive function to ChatGPT:
I would say this is a successful, if quite a rudimentary, proof of concept. It does what I had in mind, better than I expected. This approach is obviously limited by the fact that ChatGPT is a chatbot and can only provide outputs on command—a far cry from a human sitting next to me and independently nudging me whenever I lose focus. But it’s closer to my vision than, say, a to-do list app. With this approach, all I have to do is keep ChatGPT open in a browser tab, and build the tiny habit of regularly looking at that tab. This feels like a big improvement over looking at a to-do list. Looking at a to-do list feels like opening the hatch on my mental submarine. ChatGPT is like a periscope.
You will notice that I did put some effort into “engineering” the initial prompt in order to define the general behavior I wanted from this assistant. However, I specifically did not worry about trying to imagine and nail down all the capabilities I wanted on the first try. Because the context window extends beyond the initial prompt, it is not necessary to get everything perfect right away. This is also more user-friendly to me, because at the outset I didn’t yet know all of the functionality I wanted from this assistant. I may yet add even more. That’s the beauty of ChatGPT having a mind—it can keep learning over time.
I encountered a couple of amusing limitations in this project. First, the bot loves to remind me about its willingness to help, despite my repeated requests that it not do so:
I haven’t used GPT-4 directly through the API, but I suspect this is an artifact of how OpenAI has programmed the ChatGPT chatbot’s default “personality,” rather than an inherent feature of the GPT-4 LLM.
Second, and more interesting to me, the logic that ChatGPT uses to construct its responses seems to prioritize mechanical literalness over any sort of “natural” communication style. This is seen in cases where it repeats the same information multiple times in short order:
I did instruct the bot to always end its messages by restating the current top priority—in this sense, the bot is functioning as intended. But these example messages are unnatural and clearly silly. This is a good reminder that, as intelligent as ChatGPT appears, it is still very much a computer program, and behaves in computer program-y ways.
Thus far, my attempted remedy to this unwanted behavior has just been to remind the bot not to do it, as if I were talking to a human assistant. That seems to work sometimes, and is definitely the easiest and most natural fix for me as a user. However, it probably isn’t the most efficient approach in general. I am sure you could prompt engineer this behavior away by explicitly defining the linguistic logic you want the bot to follow in composing its responses. I haven’t done that here, because the point of this project was to highlight ChatGPT’s intelligence by doing as little explicit prompt engineering as possible.
Okay, so earlier in this post I talked a lot of shit about “apps”. Obviously, ChatGPT is an app. This isn’t lost on me. But my contention is that, unlike the inflexible “appliance”-type apps that currently pervade our devices and lives, ChatGPT has a “mind” and thus provides a level of interactivity and intelligence that conventional apps do not. You can spend six hours fiddling with OmniFocus, but at the end of the day it still only has the (admittedly numerous) features that the Omni Group has built into it, and you have to work within those constraints.
What I have tried to illustrate here is that “generative AI” is useful not only for generation, but for actual, literal, artificial intelligence. I feel like I have, in a very rudimentary way, outsourced some of my executive function—a major feature of my human mind!—to ChatGPT. Using this chatbot feels less like using an app, and more like talking to an intelligent and cooperative, if somewhat alien, personal assistant.
In that light, I want to highlight the way I talk to ChatGPT. That is, I may seem to anthropomorphize the bot, speaking to it politely, using complete sentences, deploying correct grammar and syntax to the extent possible. I do not think this is in any way necessary, and I highly doubt it makes the results better. For all I know, those extra niceties are polluting my context and making the output worse.
In most prompt engineering I’ve seen, people talk to ChatGPT very differently than they do to humans—not rude, just very direct. This reminds me of a view I have occasionally encountered, that we should not say “please” or “thank you” to computer personalities—e.g. Siri, Alexa—because it is a shibboleth, because it implies they are more human than they actually are and devalues real humans somehow, or whatever.
All that might be true. But I don’t care. The whole point of this exercise was to illustrate the difference between ChatGPT and an “appliance” app. I specifically tried to get the functionality I wanted while communicating in the way that feels natural to me. The AI should adapt to and work for me, not the other way around. Because it’s intelligent.
Postscript: An, uh, Bicycle of the Mind?
I love computers. They are, as someone once said, “a bicycle of the mind.”4 When Jobs said this in 1990, he was talking about the effect of computers on the humans who use them, empowering them to accomplish things that simply aren’t possible otherwise.
Clearly he was onto something—computers now mediate our entire human existence. I am writing this blog post on a computer, and you’re reading it on one. But for all of history, the “true” power of computers—that is, the power to make them do precisely what you want—has been locked behind a knowledge of computer programming.
I am not a programmer. I understand what programmers do in the same way I understand what plumbers do. Just as I can unclog a toilet, I can paste code snippets from Stack Overflow into the terminal to do what needs doing. But I can’t design and build an app. I certainly can’t build an LLM.
I’ve been hearing about the imminent arrival of no-code development for years. Well, surprise! It’s here today, and it’s called ChatGPT. For $20 a month, you can skip right over learning to code and “program” a computer just by talking—within the constraints of a chatbot dialogue, of course.
All of this is to say, we are in the very, very early days of AI. The constraints are still so tight, the limitations so easy to hit. I expect that to change. And we should be excited. Everyone could benefit from having a tireless, cooperative, intelligent assistant helping them through everyday life.
That said, getting ChatGPT to spit out the text you want is not necessarily “easy.” ↩
I see no technical obstacle to such interfaces to LLMs being built in the relatively near future. Check back in 2026—I bet once they hook GPT-N up to Alexa and Siri we’ll see some real progress on this. ↩
Also notwithstanding that GPT-4 is “non-deterministic,” i.e. its outputs have a little bit of randomness so that if you give it the same prompt 10 times it might give you 10 slightly different responses. ↩
I swear to God I always thought the quote was “for the mind” but evidently not. ↩