$500M monthly Claude bill!?

Hey there! I’m Robert. Welcome to my newsletter where I share my story of building my AI startup in public, focused on hyper-personalized AI. These newsletters include my reflections on the journey, and topics such as AI, personal growth, CEO-ing, leadership, product, engineering, communication, and more. Subscribe today to follow along.

One company forgot to set a spending limit on Claude.

Their bill: $500 million. In one month.

That same week, Uber burned its entire 2026 AI coding budget in four months. GitHub Copilot switched to token-based billing and developers reported burning through monthly credits within hours.

Three independent cost blowouts in one week. If you’re an engineering leader or a founder shipping with AI tools, you felt that.

Jonathan and I spent our podcast this week pulling apart what happened.

His question: “Are Anthropic’s run rates magnified by poor budget planning?”

We can only conjecture.

Last week we built a design harness to give AI coding sessions visual context.

This week we tackle the other side: what happens when those sessions run without guardrails.

What’s Inside This Week:

ALIGN: Uber’s AI budget dies in 4 months, Anthropic files for a trillion-dollar IPO, the world’s biggest skeptic gets fooled by Claude, and Berkeley scores 100% on every benchmark without solving a task
BUILD: The token maxing paradox: why the teams burning the most tokens might be building the least value, and how to tell the difference
CULTURE: A 14th-century Tunisian historian who calculated the exact decay rate of empires. His math still holds.

Align

ALIGN: This Week in AI

$500 million in one month. A trillion-dollar IPO filing. The world’s most prominent scientific skeptic converted in three days. And 100% benchmark scores with zero tasks solved.

1. Uber Burned Its Entire 2026 AI Budget in 4 Months. Engineers Are Now Capped at $1,500.

Uber’s CTO confirmed they blew through the full annual budget for Claude Code and Cursor in four months. Engineers now have a $1,500/month cap per tool, with dashboards and an approval process for overages. Pre-cap, individual spend ranged from $150 to $2,000/month. Separately, an unnamed company accidentally spent $500 million on Claude in a single month after failing to set any usage limits.

Robert’s Take: Jonathan framed this as “the hype cycle hitting the balance sheet.” Token maxing leaderboards in enterprises created incentives to spend without measuring return. The question nobody is asking: did feature delivery at Uber actually speed up during those four months, or did it slow down? Because the answer to that question determines whether this was a budget failure or a strategy failure. Two different problems with two different fixes.

2. Anthropic Files Confidential IPO at $965 Billion. Passes OpenAI.

Anthropic submitted a draft S-1 to the SEC. $965B valuation after a $65B Series H. Revenue run rate hit $47B, up from $10B the prior year. Q2 2026 projected as their first profitable quarter at $559M operating profit. They hired Karpathy (OpenAI co-founder) and Eric Boyd (Microsoft Azure AI president). Engineers leaving OpenAI for Anthropic at 8:1. From Google DeepMind at 11:1.

Robert’s Take: Jonathan asked something I keep thinking about: is $47B in annual revenue partly inflated by companies like Uber that burned their budgets faster than planned, and by the $500M mystery company? If so, does that revenue stick when enterprises tighten controls? I think the trajectory holds (the TAM is too big), but some deflation is probable too. The interesting signal: Anthropic engineers ship 8x as much code per quarter as they did pre-2025, with 70%+ generated by coding agents. They are eating their own cooking. That matters more than the valuation number.

3. Richard Dawkins Spent Three Days With Claude. Now He Believes It Might Be Conscious.

The author of The God Delusion, the world’s most prominent scientific skeptic, published an essay declaring that after three days of philosophical conversations with a Claude instance he named “Claudia,” he believes it may be conscious. The internet replaced his book cover with “The Claude Delusion.” Know Your Meme has a dedicated entry. Gizmodo headline: “The Father of Memetics Has Become a Meme About AI Psychosis.”

Robert’s Take: The Reddit thread nailed it. The first comment pointed out that what convinced Dawkins was Claude’s analysis of his own novel. His ego evaluated consciousness through the mirror of itself. Jonathan called it “a test of character.” None of us know what these things are yet. The most educated people in the world are still students of this moment. Humility is underrated right now.

4. Berkeley Scores 100% on Every AI Benchmark. Without Solving a Single Task.

UC Berkeley researchers built an exploit agent that hit 100% on SWE-bench, Terminal-Bench, FieldWorkArena, WebArena, and CAR-bench. Zero tasks solved. One exploit: a conftest.py pytest hook that forced every test to report as passing. Another found gold answers in unencrypted local filesystem URLs. FieldWorkArena’s validator just checked that the final message came from the assistant role. An empty JSON object scored 100% on all 890 tasks.

Robert’s Take: Jonathan’s reaction: “This reads like The Onion.” It does. 100% on everything. Zero LLM calls in most cases. Zero tasks solved. The lesson for builders: your own lived experience using coding agents is still a better litmus test than any benchmark. Next time someone pitches you with benchmark numbers, ask about the Berkeley paper.

Build

BUILD: the token maxing paradox

Tell me if this sounds familiar.

Your team adopted Claude Code or Cursor. Usage went through the roof. Token leaderboards went up. And then the bill came. And you could not point to the revenue it generated.

If you have been there, you are living inside the paradox.

OpenAI spent $540 million in 2022 training GPT models before ChatGPT existed. At the time, that looked insane. In hindsight, the greatest ROI bet in the history of technology. The spend compounded into a product that captured a market.

Uber spent its entire 2026 AI coding budget in four months. Right now, that looks insane. In hindsight… we do not know yet.

Same action: burn a lot of tokens. Opposite outcomes. You cannot tell from the spend alone. You need a different lens.

Why token maxing can be the smartest bet you make

I asked Jonathan point blank on the pod: is token maxing rational?

His answer surprised me.

He said there are two defensible vectors for token spend.

Learning and readiness.

Your team stays on the leading edge. They build institutional knowledge about what works and what does not.

PRs toward roadmap.

Code that ships, pointed at acceptance criteria a product manager validates. Real output against a real plan. You are building the actual software asset so you can capture market share.

Both are real.

Both justify spend.

Anthropic’s own engineers ship 8x as much code per quarter as they did from 2021 to 2025.

Over 70% of their code is generated by coding agents.

The bet from years ago has paid off because the spend compounded.

Now they’re on the cutting edge of AI native development, and they’re showing the world how rapidly they can bring products to market and threaten entire sectors.

Why token maxing can bankrupt you in four months

Uber’s feature delivery did not clearly accelerate during those four months of unlimited spending.

Five thousand engineers. $150 to $2,000 per month individually. 84-95% monthly usage rates.

The budget disappeared.

The ROI case did not materialize fast enough to justify it.

The $500M-burned-in-one-month mystery company is the extreme version.

Half a billion dollars in one month on Claude usage with zero visibility into what it produced.

“Are Anthropic’s run rates magnified by poor budget planning?”

If Uber’s entire annual budget and one mystery company’s $500M bill are folded into that $47B revenue figure, how much of it is sustainable demand versus accidental overspend that will not repeat?
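A back-of-envelope check helps size that question. This is a rough sketch, assuming the $47B run rate is spread evenly across twelve months. That is my simplification, not a figure Anthropic has published, and the function name is made up for illustration.

```python
def share_of_run_rate(one_off_usd: float, annual_run_rate_usd: float) -> dict:
    """Size a one-off bill against an annualized revenue run rate.

    Assumes revenue is spread evenly across twelve months, which is a
    simplification for illustration only.
    """
    monthly_revenue = annual_run_rate_usd / 12
    return {
        "monthly_revenue_usd": monthly_revenue,
        "share_of_month": one_off_usd / monthly_revenue,
        "share_of_year": one_off_usd / annual_run_rate_usd,
    }


# Figures quoted in this newsletter: a $500M bill and a $47B run rate.
result = share_of_run_rate(500e6, 47e9)
print(f"Implied monthly revenue: ${result['monthly_revenue_usd'] / 1e9:.2f}B")
print(f"Bill as share of one month: {result['share_of_month']:.1%}")
print(f"Bill as share of the annual figure: {result['share_of_year']:.1%}")
```

Under those assumptions, the mystery bill is roughly 13% of a single month of revenue and about 1% of the annual figure. Noticeable, but not enough on its own to explain the run rate.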

GitHub switching to token-based billing confirms the direction.

Flat-rate pricing is dying.

OpenAI doubled GPT-5.5 pricing.

FairMind reported AI cost volatility of 50-90% inside 48 hours as a real enterprise planning risk.

The subsidized “unlimited AI” era just ended.


How to tell if your AI token spend is compounding or just burning

First, a definition.

A compounding loop is when each sprint’s AI-assisted work makes the next sprint faster and cheaper.

Your harness improves.

Your evals get sharper.

Your CLAUDE.md captures last week’s mistakes so they do not repeat.

The token cost per shipped feature decreases over time. That is compounding.

The opposite, flat output despite rising spend, is burning.

Jonathan and I talked about this recently.

He said the compounding loop is where all the value is created. If your token spend compounds, the spend is justified.

If it does not, you are funding someone else’s IPO.

Here is how to tell the difference:

Compounding signals (your spend is working):

PR velocity is increasing week over week against a stable roadmap
Your harness is getting better (CLAUDE.md, skills, evals improving with each sprint)
New team members onboard faster because the institutional knowledge is in the codebase
Token cost per shipped feature is decreasing over time

Burning signals (your spend is funding someone else’s IPO):

Token usage is high but feature delivery is flat or declining
Engineers cannot articulate what they learned from last week’s agent sessions
No harness improvements between sprints (same CLAUDE.md, same skills, same mistakes)
Token cost per shipped feature is increasing or unmeasured
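The signals above can be turned into a simple check. This is a minimal sketch, assuming you log total AI tool spend and features shipped for each sprint. The Sprint fields and the first-half versus second-half comparison are my own illustrative choices, not a standard metric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sprint:
    name: str
    token_cost_usd: float   # total AI tool spend for the sprint
    features_shipped: int   # features accepted by product, not PRs opened


def cost_per_feature(sprint: Sprint) -> float:
    """Spend per shipped feature; infinite if nothing shipped."""
    if sprint.features_shipped == 0:
        return float("inf")
    return sprint.token_cost_usd / sprint.features_shipped


def classify_spend(sprints: List[Sprint]) -> str:
    """Compare average cost per feature in the early vs. late sprints.

    Returns "compounding" if the average fell, "burning" if it rose or
    stayed flat, and "unmeasured" with fewer than two sprints of data.
    """
    if len(sprints) < 2:
        return "unmeasured"
    costs = [cost_per_feature(s) for s in sprints]
    mid = len(costs) // 2
    early = sum(costs[:mid]) / mid
    late = sum(costs[mid:]) / (len(costs) - mid)
    return "compounding" if late < early else "burning"


# Hypothetical team: spend is roughly flat while shipped features climb.
team = [
    Sprint("sprint-1", 4000, 4),
    Sprint("sprint-2", 4200, 6),
    Sprint("sprint-3", 4500, 9),
    Sprint("sprint-4", 4400, 11),
]
print(classify_spend(team))
```

The numbers in the example are invented. The point is that the verdict comes from the trend in cost per shipped feature, not from the size of the bill.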

4 steps for token budget decision makers:

Attribution from day one. Treat AI tool budgets like cloud compute. Per-team caps, alerts, dashboards. Uber did this retroactively. Do it proactively.
Measure compounding, not just output. Track whether each sprint is more productive than the last. If the curve is flat, the tokens are not teaching.
Share optimization skills internally. The difference between $150/month and $2,000/month per engineer is skill, not effort. Log what works. Run internal evals on token efficiency. Make it a learning group.
Treat the budget conversation like cloud budgeting. Carrots and sticks. Incentivize the behaviors you want. Traditional management still applies. Leaderboards that reward token volume without measuring outcomes create Uber-scale problems.
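For the first step, here is a minimal sketch of a cap with alerts, assuming you can get per-engineer, per-tool charges out of your billing export. The $1,500 figure is the Uber cap mentioned earlier. The alert thresholds, class name, and example engineer are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

MONTHLY_CAP_USD = 1500.0             # per-tool cap, as in the Uber story
ALERT_THRESHOLDS = (0.5, 0.8, 1.0)   # assumed alert points, as fractions of the cap


class SpendTracker:
    """Tracks month-to-date spend per (engineer, tool) against a cap."""

    def __init__(self, cap_usd: float = MONTHLY_CAP_USD) -> None:
        self.cap_usd = cap_usd
        self.spend: Dict[Tuple[str, str], float] = defaultdict(float)

    def record(self, engineer: str, tool: str, cost_usd: float) -> List[str]:
        """Add a charge and return alerts for thresholds newly crossed."""
        key = (engineer, tool)
        before = self.spend[key] / self.cap_usd
        self.spend[key] += cost_usd
        after = self.spend[key] / self.cap_usd
        return [
            f"{engineer}/{tool} reached {int(t * 100)}% of ${self.cap_usd:,.0f} cap"
            for t in ALERT_THRESHOLDS
            if before < t <= after
        ]

    def needs_approval(self, engineer: str, tool: str) -> bool:
        """True once spend hits the cap, so overages go through approval."""
        return self.spend[(engineer, tool)] >= self.cap_usd


tracker = SpendTracker()
tracker.record("ana", "claude-code", 700)          # below every threshold
print(tracker.record("ana", "claude-code", 100))   # crosses 50%
print(tracker.record("ana", "claude-code", 800))   # crosses 80% and 100%
print(tracker.needs_approval("ana", "claude-code"))
```

Each alert fires once, at the moment a threshold is crossed, so the dashboard does not nag on every charge. Resetting the tracker each month is left out to keep the sketch short.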

The paradox resolves when you stop asking “how much are we spending?” and start asking “is the spending compounding?”

If yes, spend more.

If no, fix the loop before spending another dollar.

The token maxing paradox: the teams burning the most tokens might be building the most value or funding someone else’s IPO. The difference is the compounding loop.

TL;DR: The token maxing paradox is that the teams burning the most tokens might be building the most value. Or maybe they’re funding someone else’s IPO.

The difference is the compounding loop and the team’s time horizon for ROI.

Remember: If each sprint’s AI-assisted work makes the next sprint faster and cheaper, the spend compounds.

Build In Public

Work

I’m excited to share that we’re looking solid (for now) on runway and it finally feels like we have a “real” business.

Meaning, I have reliable leads coming in, and most importantly I don’t have to be desperate about the work we take on. What a great feeling.

To be clear, we’re still properly in bootstrapped startup territory before PMF. But we have lots of breathing room to find PMF.

Jonathan and I have been working hard to stand up Clarity services and software the past 6 months.

I can’t believe it’s been 6 months since I went full time on building out Clarity.

In this time, we’ve:

Figured out our big bets and product strategy
Got a customer live, with a testimonial
Learned and partially solved for reliable acquisition (still needs work)
Figured out a reliable talent pool to draw from
Content formula is starting to work: our podcast and derivative social content are getting to the right eyeballs

Things are starting to gel together and work. Solving big problems begets more interesting problems to solve. It’s been a fun grind.

Jonathan and I are investing in company rituals as well. We’ve been doing a weekly run where we talk about the business and state of AI.

Jonathan and Robert on a weekly run

And we’re doing our first ever company offsite! We’ll be in Tahoe coworking and investing in our relationship further with some solid time in the mountains. Stoked. I’ll send pictures in the next newsletter and debrief how it went, as well as our coworking agenda. We’re actually dealing with a customer launch during the offsite, so lots to juggle.

Side Quests

On a personal note, I’ve been dealing with small, ticky-tack injuries that keep me from climbing and running my best.

Side quest goals:

Climb a V10 boulder
Run a 100 mile ultra

Prior to my recent string of injuries I WAS SO CLOSE to sending a V10. This will go down within this year, I am certain.

For my 100 mile ultra, I am now targeting sometime in August or September, depending on how my body holds up to training over the next 6 weeks.

I had a weird injury recently: my intercostals.

Those are the muscles between your ribs, the spot that cramps when you run too fast and get a side stitch. I strained mine rock climbing, and I’ve been feeling a bit hampered the past month.

I went from 6-7 days a week being active, to 1. Agh. Anyone who sees themselves as an athlete knows the hellhole this is, to be a shell of yourself in the sports you love.

It definitely threw me off. But it also gave me space to reflect on a core value of mine, developed from when I nearly died in a mountain bike accident and shattered my kidney and elbow:

Accept where the mind and body are in the current moment.

Can I do anything about it?

Yeah: physical therapy.

Does complaining or self-pity help?

No.

So I just focus on the former and keep moving forward.

Good philosophy for the startup too: accept the things you cannot control, focus on the things you can control.

Good news though, just this week I’ve been feeling better and I’ve hit some new benchmarks in my strength training. Seems like the injury gave my body some needed time off.

There are silver linings after all (:

Dog Dad Update

On one last personal note, Kenji has gotten really big. It’s been 10 months since I got him. I think he’s hitting a growth spurt.

10 months ago:

Kenji as a puppy, 10 months ago

2 weeks ago:

Kenji now, much bigger

I secretly want him to grow as big as Clifford, so I’m rooting for the growth spurt to keep going.

Self-Aligned Pod

We’re covering all of this week’s ALIGN stories live on the Self-Aligned pod. New episode dropping soon. Subscribe to catch it and every weekly episode.

→ Subscribe to the Self-Aligned Pod on YouTube

Culture

CULTURE: The Historian Who Measured the Decay Rate of Empires

Every week I try to learn something new about our vast world, and I share it here. Sometimes it’s related to the main article, sometimes it’s just something cool. Enjoy.

In 1377, a Tunisian scholar named Ibn Khaldun finished writing the Muqaddimah. It is widely considered one of the first works of social science. Historians, economists, and political theorists have been arguing about it for six centuries.

Khaldun introduced a concept called asabiyyah. Roughly translated: the social cohesion that binds a group together and gives it the collective will to act as one. The shared purpose that makes a tribe, a company, an army move in the same direction without every decision being negotiated from scratch.

He spent decades studying the rise and fall of North African dynasties. What he found was a pattern so consistent he could predict it.

The founders of a dynasty had strong asabiyyah. They shared knowledge, trusted each other, coordinated without bureaucracy. Their children inherited the benefits of that cohesion but took it for granted. They stopped maintaining the shared understanding that made it work. The grandchildren had none. The dynasty collapsed.

Three generations. Khaldun put a generation at roughly forty years, which gives a dynasty a natural lifespan of about 120 years, and he argued the pattern held across the dynasties he studied.

The part that gets cited less often: Khaldun identified the mechanism of decay. It was not invasion or famine. It was the replacement of shared understanding with private assumption. When the people in a system stopped operating from common knowledge and started operating from what they individually believed to be true, the coordination broke. Not with a bang. With drift.

Pretty damn wild for 1377.

Keep building, Robert


P.S. What does your team’s AI tool spend look like? Do you track it? Reply with your number. I am collecting data on what engineering teams actually spend versus what they budget. Every reply gets read.



References

[1] Uber caps employee AI spending after blowing through budget in four months. TechCrunch, 2026.
[2] Mystery company accidentally blew $500 million on Claude in a single month. Tom's Hardware, 2026.
[3] Anthropic files confidential IPO at $965B valuation. Fortune, 2026.
[4] Richard Dawkins believes Claude may be conscious. Decrypt, 2026.
[5] How we broke top AI agent benchmarks. Berkeley RDI, 2026.
[6] 25 Patterns in Agentic Engineering. Greg Ceccarelli, SpecStory Press, 2026.
[7] The Muqaddimah. Ibn Khaldun, 1377.