Consensual Illusions

Category: Human Edited

Is “Prompt Engineering” just good communication? And why does it matter what we call it?

soundtrack by Deep Sea Current

Over the last couple of years the buzz phrase ‘prompt engineering’ has morphed into a widely accepted term for using language models, suggesting a specialized skill that you might need to pay someone to learn, like juggling fire or cooking soufflés. At first, I filed it away in the “continuing education” department of my brain, like I would with a new programming language or framework. When I finally took a closer look at what people were calling “prompt engineering,” my first thought was “what am I missing, isn’t this just writing?”

The more I see it used, the less I think we really need another fancy term for what is essentially clear thinking and effective communication. As we hurtle into a future where these core skills are becoming an endangered species – and the rift between the superpower-havers and the have-nots threatens to get bigger rather than smaller – I grow more convinced that the term does more harm than good.

In The Short Term

Granted, the various formalized techniques out there for structuring prompts can be super useful as a cheat sheet. This is especially true if you’re not already in the habit of thinking through problems systematically. But if you inspect these strategies more closely, you’ll notice that they follow a few common patterns. They all focus on clearly articulating what you’re trying to achieve, providing relevant context, and guiding the model’s thought process to make sure it considers the most important details. Sound familiar? That’s because these are fundamental skills we use in any form of communication, whether it’s with humans or machines.

One counterargument is that there are definitely some model-specific quirks and technical limitations that are invisible to a user lacking specialized knowledge. Proactively working around a model’s token limits and context window can lead to better and more consistent results. Understanding special parameters like temperature is useful if you have any control over them. And sure, when you’re building production systems that need to squeeze optimal performance out of these models, that specialized knowledge becomes much more relevant.

In The Long Term

But here’s the thing. As these models become more sophisticated, a lot of these limitations are becoming less relevant, even to the power users and enthusiast crowd. Modern models (or rather, ensembles of models a.k.a. ‘agents’) are increasingly good at understanding natural communication and intent. The “engineering” part of prompting is gradually being absorbed into the models themselves. And with the most recent ‘reasoning’ models (like OpenAI’s o1), trying to micro-manage or over-engineer them can actually make them worse.

Even in cases where we’re working with simpler, more specialized models, we’re likely heading toward a future where more advanced models orchestrate these interactions for us. In other words, the technical details of prompt creation get abstracted away, letting us focus on clearly communicating our goals rather than mastering specialized prompting techniques.

Why does it matter?

I worry that by mystifying these skills with fancy terminology, we’re creating real barriers for people who could benefit from this technology. Some engineers I know have been hesitant to try using AI tools because they don’t have time to learn a whole new set of skills. This is exactly the problem – we’re taking fundamental skills people already have and rebranding them in a way that makes them feel inaccessible.

In software development, we’ve already seen how intimidating terminology can shape behavior. For a minute everyone thought they needed a “DevOps Engineer” to use continuous deployment. “Data Science” still gets branded as a mysterious discipline, artificially separated from analysts and engineers who do similar work. The same pattern is emerging with AI interaction. People who are already excellent communicators and problem solvers are holding back either because they think they need specialized training first, or because they’ve been alienated by the overuse of buzzwords.

The irony is that the best results often come from clear, straightforward communication rather than technically complex prompts. I’ve watched ‘non-technical’ people get impressive results from these tools simply by explaining their needs clearly and iterating on the responses – the same skills they use when working with human teammates. By treating AI interaction as some specialized discipline, we risk overlooking the value of these fundamental communication skills that everyone already possesses.

So what do we call it instead?

Maybe instead of searching for the perfect catchphrase, we should think about who we’re trying to reach. Different groups naturally gravitate toward different metaphors and frameworks that make sense to them.

For the sci-fi crowd (my people), something like “Robopsychology” might actually be perfect, if a little on the wacky side. It captures the essence of understanding how these artificial entities process information, and helps emphasize the squishier parts vs the purely technical aspects of communicating with them.

The creative community might connect better with concepts like “AI Collaboration.” This framing acknowledges the partnership aspect of working with thinking machines, rather than treating them as just another system to be engineered (or something that autonomously replaces the artist and should be avoided). Writing this blog has been a fun experience in this area.

For educators, terms like “Learning Design” already exist to embed these tools into the context of existing teaching methodologies and strategies. In this context, prompting is a skill that grows naturally out of the work of forward-thinking educators like Lilach and Ethan Mollick.

The point here isn’t to create more buzzwords – we have enough of those already. Instead, it’s about finding ways to make these concepts more approachable and relevant to different communities. Just as good teachers adapt their language to their students’ understanding, we should be flexible in how we talk about AI interaction based on who we’re talking to. And knowing your audience is yet another skill which will never become obsolete.

December 29, 2024
The New Disposable Code Economy
soundtrack by Deep Sea Current

I’ve been thinking about how generative AI is changing not just how we write code, but how we value it.

Here’s a specific example of what I mean: I started building a UI for a personal note organization app months ago. While to-do lists are a popular tutorial project, making an app that matches your exact personal needs can become very complex and time-consuming. I eventually decided it wasn’t the best use of my time. Recently I dusted it off again, since I still don’t have a good off-the-shelf solution that fits my needs. I quickly blew away most of the existing code and used an AI coding extension for VS Code called Cline to recreate a better version within a couple of hours.

Another more general example calls back to all those gloriously ugly personal Geocities websites that popped up in the early days of the Internet. Except instead of HTML pages with tiled backgrounds, everyone is now spinning up entire frameworks and applications. This has already been a trend for a while, but what used to be a stream of DIY solutions has become a full-on flood now that anyone can generate working code quickly.

Why the ‘Why’

Think of it like the transition from hand-crafted furniture to IKEA. Master carpenters used to spend weeks perfecting joinery techniques. Now, the real value is in the design itself – figuring out how to make something functional, appealing, and mass-producible. The actual assembly became commoditized.

This isn’t just about AI making coding faster. It’s about a fundamental shift in what we consider valuable. When you can generate and regenerate implementation details almost instantly, the precious commodity becomes the product and architectural vision and the problem-solving approach. In other words, the “why” of solving the problem becomes even more important to understand and communicate clearly than the “how.”

Letting Go

Luckily, code is less environmentally damaging to dispose of than furniture. That still doesn’t make it easy. Even bad code quickly gets entangled in critical systems if it gets the job done. Creating new things is always more appealing and immediately gratifying than doing thankless surgery on a legacy codebase. This pattern leads to a deep, dark closet full of cruft and the sinking feeling that something in there is important but you don’t have the time to dig it out – until it starts to smell like smoke.

So just as it’s becoming easier to create code, we need to get better at letting it go. The ability to rapidly generate new code could make technical debt spiral out of control if we’re not careful. Every piece of code we keep around has a maintenance cost, whether it’s actively being worked on or not. It takes up mental space, requires security updates, and adds complexity to our systems. The more easily you can create new things, the more important it becomes to regularly clear out the old.

Implications

This shift has interesting implications for how we work, including:
1. Spending more time on problem definition and system design
2. Faster experimentation with different approaches
3. Less emotional attachment to specific implementations
4. Greater focus on business outcomes over technical perfection
5. Codebase pruning as a regular practice

These are all topics that deserve their own posts. But at a high level, what does this mean for engineers and other tech builders? I suspect the most valuable skills going forward won’t be memorizing language features or design patterns, but rather developing strong intuition about system architecture, trade-offs, and knowing when to let go. The ability to quickly evaluate different approaches, communicate their implications, and recognize when code has outlived its usefulness will matter more than ever. Encouraging each other to treat removal as a healthy part of the software lifecycle rather than a failure, and sharing success stories of code retirement, can help build positive momentum towards a more future-proof development process.
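
As one possible starting point for regular pruning, here is a minimal Python sketch that lists files nobody has touched in a while. The function name `stale_files` and the age threshold are my own illustrative choices, not part of any tool mentioned in this post, and file modification time is only a rough proxy: a fresh git checkout, for example, resets it, so commit history would be a better signal in a real repo.

```python
import os
import time


def stale_files(root, max_age_days, now=None):
    """Return sorted paths under `root` not modified in `max_age_days` days.

    Hypothetical helper for illustration. It relies on filesystem mtime,
    which is an assumption about what "untouched" means.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

Calling something like `stale_files("src", 365)` gives you a list of candidates to review, not a list to delete blindly. Old code that still quietly does its job will show up here too.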

The code itself might be disposable, but the thinking behind it – and the discipline to maintain a healthy codebase – certainly isn’t.
December 24, 2024
The AI Megapixel Wars – Benchmarks vs Practical Utility

soundtrack by Jason Sanders

The non-stop progression of Generative AI benchmarks over the past year has been both exciting and exhausting to follow. While big leaps in capabilities make for great headlines, I’m finding myself getting more skeptical about how much these improvements actually matter for everyday users. When I see reports about the latest model achieving better performance on some arcane academic test, I can’t help but think of my personal experiences where advanced models struggled with tasks like mastering CSS styling consistency, or ran themselves in circles trying to fix unit tests.

At times this disconnect between benchmark performance and practical utility feels like a repeat of the Great Digital Camera Megapixel Wars. More megapixels didn’t automatically translate to better photos, and I suspect that higher MMLU scores don’t always mean that a model will be more helpful for common tasks.

That said, there are cases where cutting-edge models can obviously shine – like complex code refactoring projects or handling nuanced technical discussions that require deep ‘understanding’ across multiple domains. The key is matching the tool to the task: I wouldn’t use a simpler model to help architect a distributed system, but I also wouldn’t pay premium rates to use o1 for basic text summarization.

Maybe instead of fixating on universal benchmarks, we need more personal metrics that reflect our very specific definitions of real-world usability. For example, how many attempts does it take to write a working Tabletop Simulator script so I can play a custom Magic: The Gathering game format? How well does the model maintain the most relevant context in longer conversations about building out my Pathfinder tabletop RPG character? I doubt that OpenAI researchers are focusing on benchmarks specific to these problems. (Side note: I think it’s interesting that while embellishing this blog post, Claude suggested I should avoid using examples that are ‘too niche.’ ‘Niche’ is real life. We are all a niche of one.)

I’d also hypothesize that a skilled verbal communicator working with an older model often outperforms an unfocused prompter using the latest frontier model, just like a pro with an old iPhone will still take better pictures than an amateur with the newest professional-grade digital camera. If this hypothesis is true, it suggests we should focus more on developing our own reasoning and communication skills, and choosing the right tool for each specific need, rather than chasing the latest breakthroughs.

The most practical benchmark for your own everyday use can be as simple as keeping notes about using different models for your real-world tasks. For example, this post was largely written with Claude 3.5 Sonnet v2 in a custom project, because I consistently prefer the style and tone I get from Claude using this method. Then I asked o1 to give technical feedback, because I prefer to use o1 as the ‘critic’ rather than the ‘creator.’ My own unscientific personal testing has revealed that while frontier models do often impress me with their ‘reasoning’ abilities, they’re not always the best fit for every step in every task. And as this technology continues to evolve, finding a balance between capability and practicality will become increasingly important for anyone just trying to get things done.
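
If you want something slightly more structured than a text file, the note-keeping idea could look roughly like this in Python. Everything here is a sketch under my own assumptions: the `Note` fields, the `summarize` function, and the placeholder model names are invented for illustration, and the sample entries are made-up data rather than real results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Note:
    model: str     # whatever label you use for the model
    task: str      # short description of the real-world task
    attempts: int  # prompts needed before you stopped
    worked: bool   # did you end up with something usable?


def summarize(notes):
    """Return {model: (success_rate, avg_attempts_on_successes)}."""
    by_model = defaultdict(list)
    for note in notes:
        by_model[note.model].append(note)
    summary = {}
    for model, items in by_model.items():
        wins = [n for n in items if n.worked]
        rate = len(wins) / len(items)
        # Average attempts only over tasks that actually worked out.
        avg = sum(n.attempts for n in wins) / len(wins) if wins else None
        summary[model] = (rate, avg)
    return summary


# Invented sample entries, purely to show the shape of the data.
notes = [
    Note("model-a", "tabletop script", 3, True),
    Note("model-a", "character build", 1, True),
    Note("model-b", "tabletop script", 5, False),
]
```

With a few weeks of entries, `summarize(notes)` gives a rough per-model success rate and attempt count for the tasks you personally care about, which is the whole point of a benchmark for a niche of one.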

December 21, 2024