I use VSCode and have been a GitHub Copilot subscriber for a few years. I didn't pay much attention to the plan I was on until I started using agentic coding in ~October 2025. I then realized, as others did too, that what made GitHub Copilot a ridiculously good value was its "per-request" billing model. In short, if you knew what you were doing (I didn't fully realize this at the time), you could use a high-end model like Opus 4.5, which costs 3 requests, and just have it rip for HOURS on a task, and it would still only cost 3 requests (lower-end models would be 1 request). The cheapest GitHub Copilot plan is (well, now WAS) $10/month, which gave you 300 requests. A lot of people took advantage of this... imagine paying $10/month and getting something like $5,000 to $10,000 worth of value (i.e., what per-token billing would really have cost) out of it per month! Absolutely insane.
Microsoft understandably put an end to that last week because they were losing their shirts:
https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/
In short, they are one of the first to go per-token/per-usage billing and I suspect others like Anthropic and OpenAI will eventually follow suit. It's only a matter of time given the economics of it all.
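To put rough numbers on that gap, here's a quick sketch comparing the two billing models. The $10/300-request plan and the 3-request multiplier come from the plan described above; the token counts and per-million-token prices in the example call are made-up placeholders, not real pricing for any model, so plug in your own.

```python
# Per-request vs. per-token billing, back-of-the-envelope.
PLAN_PRICE_USD = 10.0     # old cheapest Copilot plan, per month
PLAN_REQUESTS = 300       # requests included in that plan
PREMIUM_MULTIPLIER = 3    # a high-end model counted as 3 requests


def per_request_cost(multiplier=PREMIUM_MULTIPLIER):
    """Cost of one agentic task under per-request billing, however long it runs."""
    return PLAN_PRICE_USD / PLAN_REQUESTS * multiplier


def per_token_cost(input_tokens, output_tokens, usd_per_m_input, usd_per_m_output):
    """Cost of the same task when every token is billed."""
    total = input_tokens * usd_per_m_input + output_tokens * usd_per_m_output
    return total / 1_000_000


# Hypothetical long session: 20M input tokens, 1M output tokens,
# at placeholder prices of $5 and $25 per million tokens.
flat = per_request_cost()
metered = per_token_cost(20_000_000, 1_000_000, 5.0, 25.0)
print(f"per-request: ${flat:.2f}  per-token: ${metered:.2f}")
```

Under per-request billing the cost of a task is fixed no matter how many tokens the agent burns, which is exactly why long-running sessions were such a bargain.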
However, as you may know, the Chinese AI labs are extremely competitive with their AI offerings: they have both monthly plans and token-based plans (generally around 90% cheaper), and of course they release the models outright. Because of what Microsoft did, I've been experimenting with the various Chinese models via OpenRouter, still inside VSCode GitHub Copilot using its "Bring Your Own Key" support, so it's basically the same experience.
At the same time, there seem to be a lot of advances in high-density, low-parameter-count models (Qwen3.6 27B, Deepseek V4 Flash) which can be run on consumer hardware at good speeds, with output that is not far behind something like Sonnet or Opus, or at least catching up quickly. I haven't owned a discrete GPU in many, many years (I don't game), but I believe it can be done with an RTX 5090 and 64-128GB of RAM. Don't quote me on any of that, however... I haven't dived into this world yet and don't yet understand all the settings that determine how well an LLM runs and how they affect its intelligence.
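For what it's worth, the one piece of arithmetic that seems safe is the size of the weights themselves: parameters times bits per weight, divided by 8. Here's a rough sketch of that. It only counts the weights and ignores KV cache, context length, and runtime overhead, so treat it as a lower bound. The 32 GB VRAM figure for the RTX 5090 and the list of bit widths are assumptions I've plugged in for illustration.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(params_billions, bits_per_weight, memory_gb):
    """True if the weights alone fit in the given amount of memory."""
    return weight_memory_gb(params_billions, bits_per_weight) <= memory_gb


VRAM_GB = 32  # assumption: RTX 5090 VRAM

# A 27B-parameter model at a few common precisions.
for bits in (16, 8, 4):
    size = weight_memory_gb(27, bits)
    verdict = "fits" if fits(27, bits, VRAM_GB) else "does not fit"
    print(f"27B @ {bits}-bit: {size:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

If that's right, quantization is what decides whether a model of this size stays entirely on the GPU or spills over into system RAM.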
I'd be interested to hear from anyone who is playing with this idea: What models are you using? What hardware? What software are you using to run it?