https://nova-wiki.win/index.php/Sycophancy-Induced_Hallucination:_Why_Your_Frontier_Model_is_Lying_to_You_(And_How_to_Fix_It)
Hallucination benchmarks are messy in 2026. Reported error rates shift wildly depending on which test you choose. For instance, the HalluHard benchmark shows a 30.2% failure rate even with web search enabled.