Building miniVoice: AI-Powered Audio Transcription in Just 8 Prompts



A year or two ago, I heard someone describe a future where software would be created on-demand, quickly assembled to solve a specific problem, and then just as quickly discarded. On-demand software. Just-in-time programming. This concept — software that exists briefly to meet immediate needs — has fascinated me ever since. When Claude 3.7 was released recently, I was curious how close this new model could get.
A friend was visiting and we were exploring Claude 3.7 together. He mentioned having a folder full of Apple voice notes he wanted to transcribe, but he felt uncomfortable uploading them to random online services. He also didn't have a convenient way to process multiple files efficiently. So, without any organization or planning, I attempted to cowboy-code the tool into existence.
It took us just eight prompts.
The experience was impressive. Not because it was perfect, but because of how close we got just by knowing what to ask for and how to handle the simple errors that came up. These were the kinds of errors that anyone can learn to deal with in a few attempts. They are not "spend years learning to code" problems; they are "learn the new paradigm, get a handle on a few concepts, get loose with some trial and error over a few days, and get comfortable" problems.
Yes. There were errors and inefficiencies, and they were mostly my fault. But even then, the AI effortlessly handled corrections. If something went wrong, I didn't need to diagnose it—I just pasted the error message directly back into Claude, and it quickly offered a fix.
Getting to an Idea in One Prompt
My initial prompt simply laid out the basic idea: an HTML page to record audio and send it to OpenAI's API for transcription. Claude immediately provided a working solution, although a visually unimpressive one.
I asked it to rebuild the page as a React app using Tailwind CSS and shadcn/ui components, which instantly upgraded it into a sleek, mobile-friendly interface, even though I couldn't have coded the underlying tech myself.
Errors emerged (the browser console warned against using CDN-hosted Tailwind and in-browser Babel in production, and an inline script failed with a syntax error), but Claude quickly guided me toward best practices. When I tried to move the API key into an environment file (.env) and broke something in the process, I again just pasted the error message into Claude. It promptly identified my missing quotes and resolved the issue.
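For readers curious what "send it to OpenAI's API" boils down to, here is a minimal sketch in TypeScript. It is not the code Claude generated for us. The endpoint and the `whisper-1` model name come from OpenAI's public API rather than from anything in this post, and the function names `buildTranscriptionForm` and `transcribe` are made up for illustration.

```typescript
// Sketch only: upload an audio Blob to OpenAI's transcription endpoint.
const TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions";

// Build the multipart form body. "whisper-1" is an assumed model choice.
function buildTranscriptionForm(
  audio: Blob,
  filename: string,
  model: string = "whisper-1",
): FormData {
  const form = new FormData();
  form.append("file", audio, filename); // the filename travels with the audio bytes
  form.append("model", model);
  return form;
}

// Send the audio and return the transcript text.
async function transcribe(
  audio: Blob,
  filename: string,
  apiKey: string,
): Promise<string> {
  const response = await fetch(TRANSCRIPTION_URL, {
    method: "POST",
    // No Content-Type header here: fetch sets the multipart boundary itself.
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audio, filename),
  });
  if (!response.ok) {
    throw new Error(`Transcription failed: ${response.status} ${await response.text()}`);
  }
  const data = (await response.json()) as { text: string };
  return data.text;
}
```

In the browser, the `audio` Blob would typically come from a `MediaRecorder` attached to the microphone stream, and the returned string is what gets rendered below the record button.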
MOAR FEATURES!!!!11
With the basics sorted, I casually requested additional features: support for uploading MP3s and other audio formats, and a function to download transcripts as markdown files. Claude incorporated these from a single prompt. I just asked it to add the functionality, and it did.
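The Markdown export is the simpler of the two features. A rough sketch of the idea, assuming a layout I picked for illustration (a title heading, a date line, then the transcript); the helper names `transcriptToMarkdown` and `markdownFilename` are hypothetical and not taken from the generated app:

```typescript
// Build the Markdown text for one transcript.
// The layout (heading, date line, body) is an assumption for this sketch.
function transcriptToMarkdown(title: string, transcript: string, recordedAt: Date): string {
  return [
    `# ${title}`,
    "",
    `*Transcribed: ${recordedAt.toISOString().slice(0, 10)}*`, // YYYY-MM-DD in UTC
    "",
    transcript.trim(),
    "",
  ].join("\n");
}

// Derive a download name from the audio file, e.g. "memo.m4a" becomes "memo.md".
function markdownFilename(audioFilename: string): string {
  const base = audioFilename.replace(/\.[^./\\]+$/, ""); // strip the last extension
  return `${base || "transcript"}.md`;
}
```

In the browser, the resulting string can be wrapped in a `Blob`, turned into an object URL, and offered through a temporary link that carries a `download` attribute.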
Finally, when we realized that Apple's voice notes (.m4a files) were being rejected, Claude suggested converting them on the fly to MP3 format, which cleared our final hurdle.
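To give a feel for what on-the-fly conversion involves, here is a sketch with one deliberate substitution: it writes WAV instead of MP3. MP3 encoding in the browser needs a third-party encoder, and I am not going to guess at which one the generated app used, whereas a WAV file is just a 44-byte header followed by raw samples. The sketch assumes the audio has already been decoded to mono float samples, for example with the Web Audio API's `decodeAudioData` and `getChannelData(0)`. The function name `encodeWav` is hypothetical.

```typescript
// Encode mono float samples (range -1..1) as a 16-bit PCM WAV file.
// Swapped-in technique: WAV rather than the MP3 conversion described in the post.
function encodeWav(samples: Float32Array, sampleRate: number): ArrayBuffer {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // file size minus the first 8 bytes
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // size of the fmt chunk
  view.setUint16(20, 1, true); // audio format 1 = uncompressed PCM
  view.setUint16(22, 1, true); // channel count: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // bytes per second
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  // Clamp each sample and scale it to the signed 16-bit range.
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, scaled, true);
  }
  return buffer;
}
```

The returned buffer can be wrapped as `new Blob([buffer], { type: "audio/wav" })` and uploaded in place of the original file. Uncompressed WAV is much larger than MP3, so this trade-off only makes sense for short recordings.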
This wasn't just a neat trick—it signified something more profound. Traditionally, software solutions required considerable time, money, or both. But here, a highly personalized tool emerged almost spontaneously, tailored exactly to our needs, and then quietly vanished when no longer needed.
We're entering an era of disposable software—where applications are instantly crafted, quickly adjusted, and gracefully deleted or saved for later once their job is done. This fundamentally shifts how we think about software development, problem-solving, and digital interaction.
I'm genuinely excited about what's next. We’re just scratching the surface, but the possibilities ahead are immense.
Prompts Used
Prompt 1: Initial Request
I want to create an HTML page so I can quickly open a browser and record a message. It should then send that message to the OpenAI API for audio and with the query "transcribe this." I want to then display the results of that query on the page below the record button. It should probably have a loading window, a record button, all of that. Let's make it.
Prompt 2: UI Enhancement Request
Ok, now I'd like to make this a React app with Tailwind CSS and use some nice components. Shadcn are great. Make it look modern and like a slick mobile app because I'm gonna use it on my phone a lot.
Prompt 3: Error Report
(index):64 cdn.tailwindcss.com should not be used in production. To use Tailwind CSS in production, install it as a PostCSS plugin or use the Tailwind CLI: https://tailwindcss.com/docs/installation
babel.min.js:24 You are using the in-browser Babel transformer. Be sure to precompile your scripts for production - https://babeljs.io/docs/setup/
babel.min.js:2 Uncaught SyntaxError: Inline Babel script: Unexpected token (114:56)
Prompt 4: Environment Variable Request
I want to put the API key in an .env file. Can you help me? Then I can remove the API key form from the UI.
Prompt 5: Error Report
Uncaught SyntaxError: Invalid or unexpected token
react-dom.production.min.js:121 ReferenceError: apiKey is not defined
Prompt 6: Fix Request
Fix it.
Prompt 7: Feature Request
Can you make it so that I can upload an .mp3 or other audio file? Or whatever is from an Apple notes file? Does OpenAI accept those? And I also want to be able to download a markdown file from the transcript. Can you provide that functionality also?
Prompt 8: File Format Support
However, the .m4a file was getting rejected. Can you make sure that is accepted by the API, and if not, is it possible to convert it on the fly before we send?
Join Copy Club
If exploring AI's potential excites you as much as it excites me, consider joining Copy Club. Whether you're a complete beginner or already building sophisticated solutions, you'll find a supportive community eager to explore AI’s power together. It’s fun, it's empowering, and I'd love to have you join us!