How We Broke Top AI Agent Benchmarks: And What Comes Next
Who would have thought… “The benchmarks aren’t measuring what you think they’re measuring” https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/
Do we have any say in history? “The self you experience as yourself—the deliberate, choosing, narrating self—is not the primary phenomenon. It is a secondary formation, a surface effect produced by deeper processes. Your desires, your fears, your loves, your…” (Fate, Structure, Mimesis)
And leads to strange results.
Unless you think it will solve itself.
“The most interesting thing about the Moon is us.”
Usually we deceive ourselves. “We are systematically deceived — not primarily by external temptation but by our own affections, which dress themselves as reasons.” https://tantaman.com/2026-04-06-jesuit-formation.html
I am always surprised how many people do not understand this. “Models do not (broadly speaking) learn over time. They can be tuned by their operators, or periodically rebuilt with new inputs or feedback from users and experts. Models also…” (The Future of Everything is Lies, I Guess)
This is why virtue signaling is dumb. “virtus, channeled by the other virtues, leads to admirable deeds” https://acoup.blog/2024/03/29/fireside-friday-march-29-2024-on-roman-values/
More of this. “Christina Koch became the first woman to travel to the vicinity of the moon. The last time humans went to the moon, women could not have their own credit card” https://lizplank.substack.com/p/artemis-ii-is-competency-porn-and
Social media was a mistake. “People are living beings with beating hearts and live emotions. Social life has always been about engaging in the immediate physical presence of such beings. Social media avoids exactly that part, while allowing you to…” (Social Media is the Opposite of Social Life)