Tag: real world not hello world

  • The New Disposable Code Economy

    soundtrack by Deep Sea Current

    I’ve been thinking about how generative AI is changing not just how we write code, but how we value it.

    Here’s a specific example of what I mean: I started building a UI for a personal note organization app months ago. While to-do lists are a popular tutorial project, making an app that matches your exact personal needs can become very complex and time-consuming. I eventually decided it wasn’t the best use of my time. Recently I dusted it off again, since I still don’t have a good off-the-shelf solution that fits my needs. I quickly blew away most of the existing code and used Cline, an AI coding assistant extension for VS Code, to recreate a better version within a couple of hours.

    Another more general example calls back to all those gloriously ugly personal Geocities websites that popped up in the early days of the Internet. Except instead of HTML pages with tiled backgrounds, everyone is now spinning up entire frameworks and applications. This has already been a trend for a while, but what used to be a stream of DIY solutions has become a full-on flood now that anyone can generate working code quickly.

    Why the ‘Why’

    Think of it like the transition from hand-crafted furniture to IKEA. Master carpenters used to spend weeks perfecting joinery techniques. Now, the real value is in the design itself – figuring out how to make something functional, appealing, and mass-producible. The actual assembly became commoditized.

    This isn’t just about AI making coding faster. It’s about a fundamental shift in what we consider valuable. When you can generate and regenerate implementation details almost instantly, the precious commodity becomes the product vision, the architecture, and the problem-solving approach. In other words, the “why” of solving the problem becomes even more important to understand and communicate clearly than the “how.”

    Letting Go

    Luckily, code is less environmentally damaging to dispose of than furniture. That still doesn’t make it easy. Even bad code quickly gets entangled in critical systems if it gets the job done. Creating new things is always more appealing and immediately gratifying than doing thankless surgery on a legacy codebase. This pattern leads to a deep, dark closet full of cruft and the sinking feeling that something in there is important but you don’t have the time to dig it out – until it starts to smell like smoke.

    So just as it’s becoming easier to create code, we need to get better at letting it go. The ability to rapidly generate new code could make technical debt spiral out of control if we’re not careful. Every piece of code we keep around has a maintenance cost, whether it’s actively being worked on or not. It takes up mental space, requires security updates, and adds complexity to our systems. The more easily you can create new things, the more important it becomes to regularly clear out the old.

    Implications

    This shift has interesting implications for how we work, including:

    1. Spending more time on problem definition and system design
    2. Faster experimentation with different approaches
    3. Less emotional attachment to specific implementations
    4. Greater focus on business outcomes over technical perfection
    5. Codebase pruning as a regular practice

    These are all topics that deserve their own posts. But at a high level, what does this mean for engineers and other tech builders? I suspect the most valuable skills going forward won’t be memorizing language features or design patterns, but rather developing strong intuition about system architecture and trade-offs, and knowing when to let go. The ability to quickly evaluate different approaches, communicate their implications, and recognize when code has outlived its usefulness will matter more than ever. Encouraging each other to treat removal as a healthy part of the software lifecycle rather than a failure, and sharing success stories of code retirement, can help build positive momentum towards a more future-proof development process.

    The code itself might be disposable, but the thinking behind it – and the discipline to maintain a healthy codebase – certainly isn’t.

  • The AI Megapixel Wars – Benchmarks vs Practical Utility

    soundtrack by Jason Sanders

    The non-stop progression of Generative AI benchmarks over the past year has been both exciting and exhausting to follow. While big leaps in capabilities make for great headlines, I’m finding myself getting more skeptical about how much these improvements actually matter for everyday users. When I see reports about the latest model achieving better performance on some arcane academic test, I can’t help but think of my personal experiences where advanced models struggled with tasks like mastering CSS styling consistency, or ran themselves in circles trying to fix unit tests.

    At times this disconnect between benchmark performance and practical utility feels like a repeat of the Great Digital Camera Megapixel Wars. More megapixels didn’t automatically translate to better photos, and I suspect that higher MMLU scores don’t always mean that a model will be more helpful for common tasks.

    That said, there are cases where cutting-edge models can obviously shine – like complex code refactoring projects or handling nuanced technical discussions that require deep ‘understanding’ across multiple domains. The key is matching the tool to the task: I wouldn’t use a simpler model to help architect a distributed system, but I also wouldn’t pay premium rates to use OpenAI’s o1 for basic text summarization.

    Maybe instead of fixating on universal benchmarks, we need more personal metrics that reflect our very specific definitions of real-world usability. For example, how many attempts does it take to write a working Tabletop Simulator script so I can play a custom Magic: the Gathering game format? How well does the model maintain the most relevant context in longer conversations about building out my Pathfinder tabletop RPG character? I doubt that OpenAI researchers are focusing on benchmarks specific to these problems. (Side note: I think it’s interesting that while helping embellish this blog post, Claude suggested I should avoid using examples that are ‘too niche.’ ‘Niche’ is real life. We are all a niche of one.)
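
    To make that concrete, here’s a minimal sketch of what such a personal benchmark could look like: nothing more than a script that appends one row per attempt to a CSV, so you can tally ‘attempts until it worked’ per task and per model later. The file name, fields, and example entries are purely illustrative, not a real tool or a workflow I’m prescribing.

    ```python
    # A minimal sketch of a personal benchmark log (illustrative only):
    # one CSV row per attempt, so "attempts until it worked" can be
    # tallied per task and per model later.
    import csv
    from datetime import date
    from pathlib import Path

    LOG = Path("model_notes.csv")  # hypothetical file name

    def log_attempt(task: str, model: str, worked: bool, notes: str = "") -> None:
        """Append a single attempt to the personal benchmark log."""
        write_header = not LOG.exists()
        with LOG.open("a", newline="") as f:
            writer = csv.writer(f)
            if write_header:
                writer.writerow(["date", "task", "model", "worked", "notes"])
            writer.writerow([date.today().isoformat(), task, model, worked, notes])

    # Example: tracking attempts at a Tabletop Simulator script for a custom MTG format.
    log_attempt("TTS custom MTG format script", "claude-3.5-sonnet-v2",
                worked=False, notes="card zones misaligned")
    log_attempt("TTS custom MTG format script", "claude-3.5-sonnet-v2",
                worked=True, notes="worked after clarifying the table layout")
    ```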

    I’d also hypothesize that a skilled verbal communicator working with an older model often outperforms an unfocused prompter using the latest frontier model, just like a pro with an old iPhone will still take better pictures than an amateur with the newest professional-grade digital camera. If this hypothesis holds, it suggests we should focus more on developing our own reasoning and communication skills and on choosing the right tool for each specific need, rather than chasing the latest breakthroughs.

    The most practical benchmark for your own everyday use can be as simple as keeping notes about using different models for your real-world tasks. For example, this post was largely written with Claude 3.5 Sonnet v2 in a custom project, because I consistently prefer the style and tone I get from Claude that way. Then I asked o1 to give technical feedback, because I prefer to use o1 as the ‘critic’ rather than the ‘creator.’ My own unscientific personal testing has revealed that while frontier models do often impress me with their ‘reasoning’ abilities, they’re not always the best fit for every step in every task. And as this technology continues to evolve, finding a balance between capability and practicality will become increasingly important for anyone just trying to get things done.