Really insightful breakdown of the Vercel test. The fact that "You MUST invoke" performed worse than "Explore project first" shows how much prompt engineering is still voodoo even in 2026. Teh pull vs push context trade-off is interesting too, basically trading token efficiency for reliability. Seems like we're learning that reducing agent decision points (even if it means bloating context) beats hoping the model makes the right retrieval call.
It’s really hard to wrap your head around AI. On one side you have these amazing new tools and achievements, just look at Genie 3. On the other, you still get voodoo prompt shenanigans like this Vercel eval and image gen with misspelled words. 🤷♂️
Really insightful breakdown of the Vercel test. The fact that "You MUST invoke" performed worse than "Explore project first" shows how much prompt engineering is still voodoo even in 2026. Teh pull vs push context trade-off is interesting too, basically trading token efficiency for reliability. Seems like we're learning that reducing agent decision points (even if it means bloating context) beats hoping the model makes the right retrieval call.
It’s really hard to wrap your head around AI. On one side you have these amazing new tools and achievements, just look at Genie 3. On the other, you still get voodoo prompt shenanigans like this Vercel eval and image gen with misspelled words. 🤷♂️