Fun story from our internal testing on Claude 3 Opus. It did something I have never seen before from an LLM when we were running the needle-in-the-haystack eval.

For background, this tests a model’s recall ability by inserting a target sentence (the "needle") into a corpus of… pic.twitter.com/m7wWhhu6Fg

— Alex (@alexalbert__) March 4, 2024