Building an AI-powered curriculum migration tool should be straightforward. The AI reads the old content, applies new formatting rules, and outputs the result. Except it only works about 60% of the time, and figuring out how to solve this has become my current obsession.
We’re building a tool to help staff refactor their course content for new curriculum guidelines. The AI takes existing course layouts and transforms them into the new required format. In theory, it’s simple. In practice, getting consistent results has been the real challenge.
The problem isn’t the technology itself. Moodle’s AI subsystem actually makes the technical side pretty painless: once you set up a plugin, it handles all the sending and receiving of data to whatever AI endpoint the user has connected, so you can focus on building features rather than wrestling with API calls. Different institutions can plug in their preferred service (OpenAI, Anthropic, or whatever else) when they install your plugin.
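For context, here is roughly what that looks like from plugin code. This is a sketch from memory of the core_ai API introduced in Moodle 4.5; the class names, named parameters, and response fields are assumptions to check against the AI subsystem documentation for your Moodle version, not a verified listing.

```php
// Rough sketch: asking the AI subsystem to generate text from plugin code.
// Class names, constructor parameters, and response fields are from memory
// of the Moodle 4.5 core_ai API and should be verified against the docs.
$action = new \core_ai\aiactions\generate_text(
    contextid: $context->id,
    userid: $USER->id,
    prompttext: $prompt,
);

$manager = \core\di::get(\core_ai\manager::class);
$response = $manager->process_action($action);

if ($response->get_success()) {
    // The generated text, regardless of which provider the site has configured.
    $generated = $response->get_response_data()['generatedcontent'] ?? '';
}
```

The subsystem routes the request to whichever provider the site administrator has configured, which is why the plugin never needs to know whether it is talking to OpenAI, Anthropic, or something else.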
The tricky part is the prompting. I’ve built the prompt to construct itself dynamically based on user selections, which gives us flexibility but makes troubleshooting a nightmare when the output goes wrong. There’s a sweet spot in prompt length and complexity: too long and the AI seems to lose track of important details; too short and it either fills in the gaps with guesses that might be wrong or doesn’t return enough content.
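To make the troubleshooting point concrete, here is the kind of dynamic assembly I mean. The option names and rule text below are purely illustrative, not the plugin’s actual selections; the point is that every option a user ticks changes the final prompt the AI sees.

```php
// Illustrative only: each user selection appends its own instruction, so the
// final prompt the AI receives differs from run to run.
function build_prompt(array $selections, string $courselayout): string {
    $rules = [];
    if (!empty($selections['renumbersections'])) {
        $rules[] = 'Renumber sections to match the new curriculum order.';
    }
    if (!empty($selections['mergeshortsections'])) {
        $rules[] = 'Merge sections that contain fewer than two activities.';
    }
    // ...one rule per selection the user can make...

    return "Transform the course layout below into the new curriculum format.\n"
        . "Apply only these rules:\n- " . implode("\n- ", $rules) . "\n"
        . "Return a single JSON object and no other text.\n\n"
        . "Course layout:\n" . $courselayout;
}
```

When a run goes wrong, the first question is which combination of selections produced the prompt that failed, which is why the flexibility cuts both ways.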
The prompt asks for the new module layout as JSON, and that’s what comes back, but sometimes (not every time) it arrives double encoded, and nothing I change in the prompt seems to stop it.
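Until the prompt side is solved, the decoding can at least be made tolerant of it. A minimal sketch: if the first decode yields a string rather than a structure, the payload was JSON-encoded twice, so decode it once more.

```php
// Tolerate double-encoded responses: json_decode() on a double-encoded payload
// returns a string rather than an array, so decode again before giving up.
function decode_module_layout(string $raw): ?array {
    $decoded = json_decode($raw, true);
    if (is_string($decoded)) {
        $decoded = json_decode($decoded, true);
    }
    return is_array($decoded) ? $decoded : null;
}
```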
I’d rather it returned data in the correct format with guesses than in the wrong format altogether, which at the moment breaks the output of the Moodle tool completely.
One approach would be to guard against bad responses and let users regenerate, which will probably feature in the final version. But that wastes AI tokens and creates friction. The real goal is to bump up that success rate by refining the prompt engineering itself.
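The guard itself doesn’t need to be elaborate. A minimal sketch, assuming a layout made of sections with names and activities (the field names are illustrative): validate the decoded structure before rendering, and offer a regenerate option when it fails instead of breaking the page.

```php
// Reject responses that decoded but don't have the shape we expect, so the UI
// can offer a regenerate option instead of rendering a broken layout.
// Field names here are illustrative, not the plugin's actual schema.
function is_valid_layout(?array $layout): bool {
    if (empty($layout['sections']) || !is_array($layout['sections'])) {
        return false;
    }
    foreach ($layout['sections'] as $section) {
        if (!isset($section['name'], $section['activities'])) {
            return false;
        }
    }
    return true;
}
```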
So that’s where I am, tweaking, testing, and trying to crack the code on reliable AI output for real-world use.