Hallucination isn’t nearly as big a problem as it used to be. Newer models aren’t perfect, but they’re better.
The problem addressed by this isn’t hallucination, it’s the training to avoid failure states. Instead of guessing (which is different from hallucinating), the system forces a negative response.
That’s easy, and any company, big or small, could do it. Big companies just like the bullshit.
Buuuuullshit. I asked different models about the ten highest-scoring summer transfers and got wildly different answers. They then tried to explain why and got even more numbers wrong.
deleted by creator
A benchmark tailored to LLMs’ strengths calls you a liar.
https://artificialanalysis.ai/articles/gemini-3-flash-everything-you-need-to-know (A month ago the hallucination rate was ~50–70%.)