I'm tired of LLM bullshitting. So I fixed it.

SuspciousCarrot78@lemmy.world · edit-2 6 hours ago


ThirdConsul@lemmy.zip · 3 months ago

I want to believe you, but that would mean you solved hallucination.

Either:

A) you’re lying

B) you’re wrong

C) your KB (knowledge base) is very small

SuspciousCarrot78@lemmy.world · edit-2 4 hours ago

deleted by creator

ThirdConsul@lemmy.zip · 3 months ago

So… RAG with extra steps and RAG summarization? What about facts that aren't covered by RAG retrieval?

SuspciousCarrot78@lemmy.world · edit-2 4 hours ago

deleted by creator

ThirdConsul@lemmy.zip · 3 months ago

The system summarizes and hashes docs. The model can only answer from those summaries in that mode

Oh boy. So a hallucination can occur at the summarization step, and all further retrievals will then be deterministically poisoned?

SuspciousCarrot78@lemmy.world · edit-2 4 hours ago

deleted by creator

ThirdConsul@lemmy.zip · 3 months ago

Huh? That is the literal opposite of what I said. Like, diametrically opposite.

The system summarizes and hashes docs. The model can only answer from those summaries in that mode. There’s no semantic retrieval step.

No, that’s exactly what you wrote.

Now, with this change:

SUMM -> human reviews

That would fix it, but it will only work for small KBs; otherwise reviewing the summaries would be exhausting.

Case in point: assume a Person model with 3-7 facts per Person. Assume a small set of 3,000 Persons, which is 9,000 to 21,000 facts. How would the SUMM of that work? Do you expect a human to verify that SUMM? How are you going to converse with your system to get data out of that Person KB? Because to me that sounds like case C: it only works for small KBs.

Again: the proposition is not “the model will never hallucinate.” It’s “it can’t silently propagate hallucinations without a human explicitly allowing it to, and when it does, you can trace it back to the source version.”

Fair. Except you’re still left with the original problem: if you missed an error at SUMM time, you don’t know WHEN the information is incorrect.
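A minimal sketch of the mechanism under debate, assuming a summary store keyed by a SHA-256 hash of each source document, with a human approval flag gating what the model may answer from. All names here (SummaryStore, approve, lookup, is_stale) are hypothetical; this is not the code of the project discussed in this thread.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Optional, Tuple


def sha256(text: str) -> str:
    """Content hash used as the provenance key for a source document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Summary:
    source_hash: str         # hash of the exact source text that was summarized
    text: str                # model-written summary (may itself be wrong)
    approved: bool = False   # flipped only by an explicit human review


class SummaryStore:
    """Hypothetical store: answers come only from human-approved summaries."""

    def __init__(self) -> None:
        self._by_hash: dict[str, Summary] = {}

    def add(self, source_text: str, summary_text: str) -> str:
        # New summaries start unapproved, so they cannot be served yet.
        h = sha256(source_text)
        self._by_hash[h] = Summary(source_hash=h, text=summary_text)
        return h

    def approve(self, source_hash: str) -> None:
        # The human gate: approval makes a summary servable; it does not prove it correct.
        self._by_hash[source_hash] = replace(self._by_hash[source_hash], approved=True)

    def lookup(self, source_hash: str) -> Optional[Tuple[str, str]]:
        # Refuse (None) rather than guess when no approved summary exists.
        s = self._by_hash.get(source_hash)
        if s is None or not s.approved:
            return None
        # Every answer carries the hash of the source version it came from.
        return s.text, s.source_hash

    def is_stale(self, source_hash: str, current_source_text: str) -> bool:
        # A changed source no longer matches the hash the summary was built from.
        return sha256(current_source_text) != source_hash
```

In this sketch, a summary that misstates its source but gets approved anyway is still served; the hash only lets you trace the bad answer back to a specific source version after the fact, which is the limitation raised in the reply above.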

SuspciousCarrot78@lemmy.world · edit-2 4 hours ago

deleted by creator

PolarKraken@lemmy.dbzer0.com · 3 months ago

Woof, after reading your “contributions” here, are you this fucking insufferable IRL or do you keep it behind a keyboard?

Goddamn. I’m assuming you work in tech in some capacity? Shout-out to anyone unlucky enough to white-knuckle through a workday with you; avoiding an HR incident would be a legitimate challenge, holy fuck.

Kobuster@feddit.dk · 3 months ago

Hallucination isn’t nearly as big a problem as it used to be. Newer models aren’t perfect but they’re better.

The problem this addresses isn’t hallucination; it’s training that avoids failure states. Instead of guessing (which is different from hallucinating), the system forces a negative response. That’s easy, and any company, big or small, could do it; big companies just like the bullshit.

SuspciousCarrot78@lemmy.world · edit-2 4 hours ago

deleted by creator

ThirdConsul@lemmy.zip · edit-2 3 months ago

A benchmark tailored very much to LLMs’ strengths calls you a liar.

https://artificialanalysis.ai/articles/gemini-3-flash-everything-you-need-to-know (A month ago the hallucination rate was ~50-70%)

Squizzy@lemmy.world · 3 months ago

Buuuuullshit. Asked different models about the ten highest-scoring summer transfers and got wildly different answers. They then tried to explain why and got even more numbers wrong.


llama-conductor