I am building AI agents to do various tasks, as I outlined in a recent post, but cost is now starting to become an issue. The all-you-can-eat model was never going to be sustainable: the more popular AI becomes, the more demand there is and the higher the cost. It’s no different than everyone deciding to fly to Europe for the summer. Flights fill up, there is only so much capacity to go around, and prices go up or you do without.
It’s a bit challenging to understand pricing here. Each main provider (Anthropic Claude, Google Gemini, and OpenAI ChatGPT) has a monthly subscription for its chatbot, and for most people that’s probably fine. It’s like having a Netflix sub: you turn on the TV and watch shows or movies. When you turn off the TV, the subscription is still there, but you’re not using it and Netflix doesn’t have to stream anything to your TV. AI agents, however, are constantly running in the background doing scheduled tasks or activities, so they are always on and always consuming “tokens” to create the output you desire.
As I’ve written about many times, I am not an AI expert. I use AI to tell me how to do things and fix things and that’s how I’ve built everything that I have so far. The issue now is that asking all those questions and building tools and then having it carry out tasks is starting to get expensive. How do I solve this problem? Ask AI!
I asked AI to tell me how to optimize my spend on AI models and providers. It built the table below, and then I asked it to create an infographic that I can refer to as a guide.
OpenRouter.AI Provider/Model Cost Optimization
For Hermes Agent / agentic workflows, I’d use a tiered OpenRouter setup:
| Use case | Best value pick | Why |
|---|---|---|
| Default cheap agent model | deepseek/deepseek-v4-flash | Excellent value: $0.0983/M input, $0.1966/M output, 1M context, designed for coding assistants, chat, and agent workflows. |
| Ultra-cheap general/chat model | qwen/qwen3.5-flash-02-23 | Very low cost: $0.065/M input, $0.26/M output, 1M context. Good for summarizing, classifying, routine tool calls. |
| Cheap Google multimodal / PDF / long context | google/gemini-2.5-flash-lite | $0.10/M input, $0.40/M output, 1M context. Good when you need Google’s multimodal/document handling cheaply. |
| Better Google agent model | google/gemini-3.1-flash-lite | $0.25/M input, $1.50/M output, 1.05M context, designed for lightweight agentic workflows and high-volume use. |
| Cheap coding specialist | qwen/qwen3-coder or qwen/qwen3-coder-next-2025-02-03 | Qwen3 Coder is optimized for agentic coding, tool use, and repo-scale context; the main Qwen3 Coder page shows $0.22/M input, $1.80/M output and 1M context. |
| Cheap reasoning fallback | deepseek/deepseek-r1 or deepseek/deepseek-r1-0528 | R1 is stronger reasoning but costs more: original R1 is $0.70/M input, $2.50/M output. Use only when flash models fail. |
| Free experimentation | OpenRouter :free models | Useful for testing, but rate-limited. OpenRouter says free users have 50 requests/day and 20 rpm, while pay-as-you-go accounts with $10+ credits get 1000 requests/day on free models. |
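
The “use only when flash models fail” idea in the table can be sketched in a few lines of Python. This is a minimal sketch, not how Hermes Agent or OpenRouter actually routes requests. The model IDs come from the table, but the tier order and the `call_model` function are my assumptions: `call_model` is a hypothetical callable you would supply that sends the prompt to a model and returns the text, or `None` when the result is unusable.

```python
from typing import Callable, List, Optional, Tuple

# Model IDs copied from the table above. The order (cheap first,
# reasoning model last) is an assumption for illustration.
TIERS: List[str] = [
    "deepseek/deepseek-v4-flash",  # default cheap agent model
    "deepseek/deepseek-r1",        # reasoning fallback, costs more
]


def run_with_fallback(
    prompt: str,
    call_model: Callable[[str, str], Optional[str]],
    tiers: List[str] = TIERS,
) -> Tuple[str, str]:
    """Try each tier in order and return (model_id, answer) from the first success.

    call_model is hypothetical: it takes (model_id, prompt) and returns the
    answer text, or None if the model failed or the answer was unusable.
    """
    for model in tiers:
        answer = call_model(model, prompt)
        if answer is not None:
            return model, answer
    raise RuntimeError("all tiers failed for this prompt")
```

The point of the design is that the expensive model only gets called after the cheap one has already failed, so most requests never reach it.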

My Thoughts
This is what ChatGPT thinks is the best way to optimize my OpenRouter usage for the things I’m doing now. Depending on what you are doing, the best provider/model combination may be different. Each person needs to submit their own inventory of tasks and activities and have an AI evaluate the best pricing structure for models and providers.
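
To get a feel for what such an inventory might cost, here is a rough sketch of the arithmetic in Python. The per-million-token prices are copied from the table above. Everything in `INVENTORY` (task names, token counts, runs per month) is a made-up placeholder, not my real usage, so swap in your own numbers.

```python
# USD per million tokens as (input, output), copied from the table above.
PRICES = {
    "deepseek/deepseek-v4-flash": (0.0983, 0.1966),
    "qwen/qwen3.5-flash-02-23": (0.065, 0.26),
    "deepseek/deepseek-r1": (0.70, 2.50),
}


def monthly_cost(model, input_tokens, output_tokens, runs_per_month):
    """Estimated monthly USD cost for one task on one model."""
    in_price, out_price = PRICES[model]
    per_run = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return per_run * runs_per_month


# Hypothetical inventory:
# (task, model, input tokens per run, output tokens per run, runs per month)
INVENTORY = [
    ("daily summary", "qwen/qwen3.5-flash-02-23", 20_000, 1_000, 30),
    ("hourly agent check", "deepseek/deepseek-v4-flash", 5_000, 500, 720),
]


def estimate(inventory):
    """Return a dict of task name to estimated monthly cost."""
    return {task: monthly_cost(m, i, o, n) for task, m, i, o, n in inventory}


if __name__ == "__main__":
    for task, cost in estimate(INVENTORY).items():
        print(f"{task}: ${cost:.4f}/month")
```

Changing the model name on a row shows quickly how much a task would cost on a different tier.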
I have taken the recommendations and reconfigured my OpenRouter settings. I’ll let them run for the month of June and report back on whether there are significant savings or whether it didn’t help much.
Share The Wealth
Are you using AI to find cost optimizations for your own AI use yet?