Llama has Joined the Chat
My next enhancement to Verdion was to improve diversity by allowing local llama.cpp-hosted models to participate...
I said it would happen
As predicted, approximately twelve hours after the soft launch, I was back in code, wiring up features. So, here we are.
Llama.cpp Models are Live
Verdion now supports the addition of any number of neurons using llama-server-hosted LLMs to a Ring, in addition to any number of supported cloud neurons. Which, of course, brings the number of supported models backing a neuron from 3 to, well, probably hundreds of thousands.
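Verdion's actual wiring isn't shown here, but the idea is that a Ring no longer cares where a neuron lives: llama-server exposes an OpenAI-compatible chat endpoint, so a local model and a cloud model look identical to the caller. Here's a minimal sketch of that uniform treatment in Python; every name in it (Neuron, build_ring, the URLs, the token counts) is hypothetical, not Verdion's real API.

```python
from dataclasses import dataclass

@dataclass
class Neuron:
    """One competitor in a Ring: any OpenAI-compatible chat endpoint.

    llama-server serves /v1/chat/completions locally, so local and
    cloud neurons can be described the same way.
    """
    name: str
    base_url: str
    context_tokens: int

def build_ring(neurons):
    """A Ring is just the set of neurons entering each tournament."""
    return list(neurons)

# Hypothetical mix of one local and one cloud neuron.
ring = build_ring([
    Neuron("mistral-7b-local", "http://localhost:8080/v1", 4_096),
    Neuron("cloud-model-a", "https://api.example.com/v1", 200_000),
])
```

Because the endpoint shape is the same, adding "any number" of local neurons is just adding entries to that list.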
For debugging and testing, on my most portable dev “workstation” (affectionately nicknamed Craptop), I pitted a tiny 3.1GB Mistral model (running on an ancient NVIDIA Quadro) against the big-3 cloud models.
The first test
For testing, I used the C# class improvement task statement already written up in the MVP Thesis and the pre-print. It’s tested and reliable.
After resolving some simple context-management issues (900MB of remaining VRAM doesn’t exactly permit the 1M tokens of context granted by Opus 4.6), we were off to the races.
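What "context management" means in practice with 900MB of VRAM: the prompt has to be trimmed until it, plus room for the model's output, fits the context window. Verdion's real logic isn't published here, so this is just a generic sketch of the oldest-turns-first approach; fit_to_budget and the crude 4-characters-per-token counter are both assumptions.

```python
def fit_to_budget(system_prompt, history, tokens_for_output, budget, count_tokens):
    """Drop the oldest history turns until the prompt fits the context.

    count_tokens is whatever tokenizer-backed counter the host provides;
    here it only needs to be callable on a string.
    """
    keep = list(history)
    fixed = count_tokens(system_prompt) + tokens_for_output
    while keep and fixed + sum(count_tokens(t) for t in keep) > budget:
        keep.pop(0)  # sacrifice the oldest turn first
    return keep

# Crude stand-in counter: roughly 4 characters per token.
approx = lambda s: max(1, len(s) // 4)

history = ["x" * 12000, "y" * 4000, "z" * 2000]
kept = fit_to_budget("be brief", history,
                     tokens_for_output=512, budget=4096, count_tokens=approx)
# The oldest (largest) turn is dropped; the rest fit in 4K.
```

A 1M-token cloud context never hits that while-loop; a 4K local context hits it constantly, which is why this path only got exercised once Mistral joined.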
Mistral faces off against the Big 3
Poor Mistral. It got destroyed in competition - eliminated in the first bracket of every tournament in every test, and with good reason: it was hallucinating characters and making truly odd suggestions (HashSet&lt;T&gt;? yield return? Where are these coming from?). That said, I didn’t expect too much from a 3GB, heavily quantized 7B model.
Interesting but Expected
Mistral was a competent judge, though, which was interesting enough on its face, but also relevant to Verdion — introducing (far) less capable models does not compromise judging quality.
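Why can a weak generator still be a serviceable judge? Comparing two finished submissions is a much easier task than producing one. The details of Verdion's judging mechanic aren't spelled out here, so the following is only a generic sketch of one plausible shape: each match is scored by a majority vote of neurons not competing in it. Every name (judge_match, prefer, the toy scoring rule) is hypothetical.

```python
def judge_match(a, b, judges, prefer):
    """Majority vote among judges that aren't competing in this match.

    prefer(judge, a, b) returns whichever submission that judge ranks
    higher. The judging role only asks a model to compare, not create,
    which is the easier half of the problem.
    """
    panel = [j for j in judges if j not in (a, b)]
    votes = sum(1 if prefer(j, a, b) == a else -1 for j in panel)
    return a if votes >= 0 else b

# Toy scoring rule for illustration: every judge prefers the longer submission.
subs = {"mistral": "short", "cloud": "much longer answer"}
winner = judge_match(
    "mistral", "cloud",
    judges=["mistral", "cloud", "third", "fourth"],
    prefer=lambda j, a, b: max(a, b, key=lambda n: len(subs.get(n, ""))),
)
```

Under a scheme like this, a less capable neuron's vote is just one voice on the panel, which is consistent with the observation that adding it didn't compromise judging quality.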
The short story is that the four neurons converged on the same optimal solution that the dozen-plus prior tournaments suggested they would.
Actually Interesting
What was fascinating was that Mistral learned.
I expected the cloud models to be able to learn. Their context windows are large enough for Verdion to do some pretty sophisticated prompt engineering to all but guarantee it.
Mistral had 4K tokens available for input, thinking, and output - combined. But it still learned.
Its submissions grew in complexity and quality throughout the tournament. Not enough to become competitive with the cloud models; as I mentioned, it continued to be trounced in its first bracket. But Verdion’s adversarial architecture showed that even an unsophisticated model with a tiny context window can still improve from exposure to the process in play.
Other Tests - And Hallucinations
During the debugging phase, I had Mistral tackle some of the other prompts I used for testing - notably the literary summary of House of Leaves, by Mark Z. Danielewski.
The results were both horrifying and hilarious: Mistral hallucinated characters, failed to recall actual characters’ names, mischaracterized the ones it did recall, and generally missed the point of the novel. Not surprising. And again, Verdion was up to the task; the remaining neurons in the tournament knew Mistral was incorrect, and the tournament algorithm worked. A proper summary emerged, despite Mistral’s potential to seed the other models with hallucinations (a risk tempered by the judging mechanic in play).
All in all — a model that thinks the main character of House of Leaves is named John can still contribute to a tournament that produces production-quality code. I didn’t have to tinker for that to happen.
Truth through competition.
A little teaser: stay tuned. The next entry will cover a 30B model’s neuron running on my 4090, with results showing some absolutely captivating nuance.