AI monthly insights #3

In the November edition, we briefly discussed our first production-related topic, focusing on quality assurance and metrics-based testing of AI applications. In this edition, we stay with production-related topics, but this time we open the topic of prompt management.
Let's get straight to the point. Managing your prompts in Git can be challenging, often involving workflows that are both time-consuming and inefficient. For production-ready solutions, especially those serving thousands of users and where core functionality depends on AI models, a faster, iterative, and reliable workflow for refining prompts is essential. We need an approach that ensures system prompts, security-related prompts, or any other prompts used by your agentic system can be improved with minimal friction.
So, what is the minimal set of features we really need for prompt management?
Versioning
- This is a feature that both Git and spreadsheets can handle, and a proper VCS like Git is the better choice of the two. The downside is that Git isn't particularly accessible to non-technical users, so the ideal solution should prioritize usability.
Editor
- Naturally, prompts need to be edited and refined repeatedly. This isn’t limited to development; non-technical roles, administrators, or domain experts may also need to engage in prompt engineering. A solution must provide an easy and intuitive way to edit prompts.
“Hot reload”
- We need a fast way to deploy the changes. Ideally, this process should require no manual steps, no application redeployment, and no downtime.
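The "hot reload" requirement can be sketched in a few lines. The example below is a hypothetical, file-based illustration (a production system would more likely pull prompts from a database or a prompt-management platform): the application re-reads a prompt whenever its file changes on disk, so edits take effect without a redeploy.

```python
from pathlib import Path


class PromptLoader:
    """Hot-reload a prompt from disk by checking its modification time.

    Illustrative sketch only: the file path stands in for whatever
    backing store (database, platform API) you actually use.
    """

    def __init__(self, path: str):
        self._path = Path(path)
        self._mtime = 0.0  # last seen modification time
        self._text = ""

    def get(self) -> str:
        """Return the current prompt, re-reading the file if it changed."""
        mtime = self._path.stat().st_mtime
        if mtime != self._mtime:  # file was edited -> reload
            self._text = self._path.read_text(encoding="utf-8")
            self._mtime = mtime
        return self._text
```

Every request calls `get()`, so an administrator editing the prompt file changes the running system's behavior immediately, with no manual deployment step.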
We highly recommend:
Adopting a more advanced solution
In short, if you are looking for production-ready platforms that cover all the above-mentioned features and more, you have several options:
- Langfuse (https://github.com/langfuse/langfuse)
- AgentaAI (https://github.com/Agenta-AI/agenta)
- Phoenix (prompt management coming soon) (https://github.com/Arize-ai/phoenix)
We tend to prefer open-source, self-hosted options due to the increased focus on data sensitivity in today’s AI world. However, many open-source platforms also offer SaaS cloud-hosted options, providing the flexibility to switch between hosting models as needed.
The tools listed above go far beyond simple prompt management:
- Trace end-to-end flows through your application
- Test and improve your prompting
- Evaluate the quality of your AI solution (evaluators similar to RAGAS, covered in the previous edition)
- Monitor usage
- Collect feedback from end users
- and more.
Building your own prompt management
Developing your own solution to meet the minimal set of features can save significant time and often prove more cost-effective in the long run.
The AI domain is filled with libraries, platforms, and tools, many of which struggle to keep up with evolving needs, lose contributors, or fail to survive over time. We are still at the very beginning of the generative AI era, where everyone is focused on building better tools, and the lifecycle of these tools is often very short. This leaves you weighing three options:
- Should we implement our own minimal solution?
- Should we take a chance on a promising tool that’s gaining traction on GitHub?
- Or should we adopt a pricey, established solution?
As Co-CEO, I bring together deep technical expertise and strategic vision to drive business growth. I enjoy solving problems through smart architecture, data, and a bit of math. Outside of work, you’ll probably find me on a bike, at the gym, or just tackling something new — because I don’t sit still for long.