
Cloudflare launched a crawler. The internet lost its mind. The internet is wrong.

5 minutes reading time

The /crawl endpoint respects every rule the "bad" bots ignore, and the outrage cycle hasn't bothered to check.

A viral LinkedIn post captured the mood in four lines: be Cloudflare, spend years protecting sites from crawlers, launch /crawl, become the crawler. The comment threads filled with accusations of protection rackets and mob tactics. "Selling the wall and the ladder," someone quipped. Upvotes poured in.

I've spent years building on Cloudflare's Developer Platform. I've written a book about it. I have a commercial relationship with the company through my employer, CDS, and I'm upfront about that. But what drew me to the platform was technical merit, not a sales quota, and what's drawing me to this argument is that the criticism, however satisfying it feels, collapses the moment you read the documentation.

The outrage rests on an irresistible analogy: the locksmith is selling lock picks. If true, it would be genuinely scandalous. A company routing roughly a fifth of global web traffic simultaneously selling protection from bots and the means to defeat that protection would be a conflict of interest so spectacular it would deserve every pitchfork aimed at it.

It isn't true.

What the endpoint actually is

Strip away the hot takes and /crawl is a managed headless browser. You give it a URL. It spins up a Chrome instance on Cloudflare's edge network, renders the page including JavaScript, follows links to a configurable depth, and hands back the content as HTML, Markdown, or structured JSON. The whole thing runs asynchronously: submit a request, get a job ID, check back later. For anyone who has nursed a fleet of Puppeteer scripts through 3am crashes and memory leaks, this is plumbing, not a weapon.

It sits on top of the Browser Rendering API that Cloudflare has offered for some time, packaged into a single call. Available on both free and paid Workers plans. Five dollars a month gets you the paid tier.
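The submit-and-poll workflow described above can be sketched in a few lines. Everything below is illustrative: the endpoint path, request fields, and response shape are assumptions based on the article's description (submit a URL, get a job ID, check back later), not a transcription of Cloudflare's actual API, so check the official documentation before relying on any of it.

```python
import json
from urllib import request

# Hypothetical base URL modelled on Cloudflare's API conventions.
API_BASE = "https://api.cloudflare.com/client/v4/accounts/{account_id}/browser-rendering"


def build_crawl_job(url: str, depth: int = 1, fmt: str = "markdown") -> dict:
    """Assemble a JSON body for a crawl job: start URL, link-following
    depth, and output format (HTML, Markdown, or structured JSON)."""
    return {"url": url, "depth": depth, "format": fmt}


def submit_crawl(account_id: str, token: str, job: dict) -> str:
    """Submit the job and return its ID for later polling (performs a
    real HTTP call; field names are placeholders)."""
    req = request.Request(
        API_BASE.format(account_id=account_id) + "/crawl",
        data=json.dumps(job).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["result"]["job_id"]
```

The asynchronous shape is the point: you never hold a browser open yourself. Submit, store the job ID, and fetch the rendered output whenever the job completes.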

Now, here is the part that everyone quoting the LinkedIn post apparently skipped.

The crawler announces itself as CloudflareBrowserRenderingCrawler/1.0. It is a signed agent, meaning its identity is cryptographically verifiable. It reads and obeys robots.txt, including crawl-delay directives. It respects Cloudflare's own AI Crawl Control settings. It does not bypass CAPTCHAs. It does not bypass Cloudflare's Bot Management. It does not bypass the Web Application Firewall. It does not bypass Turnstile challenges. If you have flipped the one-click "Block AI Scrapers and Crawlers" toggle that Cloudflare offers on every plan including the free tier, this crawler gets stopped at the door like any other bot.
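Because the crawler obeys robots.txt, blocking or throttling it is a one-line affair. The user-agent token below is taken from the identifier quoted above; verify the exact token Cloudflare matches against in its documentation before deploying.

```
# Block the crawler entirely:
User-agent: CloudflareBrowserRenderingCrawler
Disallow: /

# Or, instead, allow it but slow it down:
# User-agent: CloudflareBrowserRenderingCrawler
# Crawl-delay: 10
```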

Kenton Varda, a Cloudflare engineer, responded directly to the criticism online: you can block it via robots.txt, you do not need Cloudflare's bot protection to do so, and this is not a malicious bot. Mark Dembo from Cloudflare's developer relations team posted a similar clarification on the original LinkedIn thread, pointing to a public statement on X confirming that the crawler identifies as a bot, respects site owner preferences, and does not bypass protections.

The locksmith is not selling lock picks. The locksmith has built a delivery van that stops at red lights.

The bots nobody is angry about

What makes the outrage especially misplaced is the contrast with what is actually happening on the open web. AI companies have spent the past two years deploying crawlers that spoof their user agents, pretend to be Chrome, ignore robots.txt entirely, and hammer origin servers so aggressively they take sites offline. Cloudflare's own research from its 2024 "AIndependence" announcement documented bots deliberately disguising themselves as regular browsers to evade detection. Shared hosting providers have reported entire server infrastructure going down because a single company's crawlers hit hundreds of virtual hosts simultaneously, each one politely obeying the per-domain rate limit while collectively overwhelming the machine.

One shared hosting operator described months of disruption from exactly this pattern. The crawlers respected each individual site's robots.txt. They just happened to be crawling thousands of sites on the same physical server at the same time. The server didn't care about the politeness of each individual request. It cared that it was drowning.

Against that backdrop, Cloudflare releasing a crawler that honestly identifies itself, obeys every directive, and can be blocked with a single line in a text file is not the scandal. It is closer to a worked example of how crawling should be done.

The ecosystem the critics are ignoring

The /crawl endpoint did not appear in a vacuum. Over the past year Cloudflare has built a remarkably coherent set of tools around the relationship between websites and AI, and the crawler only makes sense when you see the full picture.

In July 2025, Cloudflare began blocking AI crawlers by default for all new sites on its network. It launched AI Crawl Control, giving site owners granular dashboards showing exactly which bots visit, how often, and whether they respect stated preferences. It built a robots.txt monitoring tool that flags crawlers violating directives. And it launched Pay Per Crawl, a private beta marketplace where publishers can charge AI companies for content access on a per-request basis, dusting off the long-dormant HTTP 402 "Payment Required" status code to do it.

The publishers paying attention have noticed. Condé Nast, The Atlantic, TIME, Gannett, BuzzFeed, Universal Music Group, and Stack Overflow have all backed the initiative. Under Pay Per Crawl, a site owner gets three choices for any given crawler: allow it for free, charge it, or block it. Cloudflare handles the billing, the identity verification, and the enforcement.
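The three outcomes map cleanly onto HTTP status codes, which is why the revived 402 matters. The sketch below is a toy model of that decision, not Cloudflare's implementation: the policy table, the function, and the `crawler-price` header are all invented placeholders, since Cloudflare handles the real negotiation and billing on the site owner's behalf.

```python
# Per-crawler policy a site owner might configure: allow, charge, or block.
POLICY = {
    "FriendlyResearchBot": "allow",
    "CloudflareBrowserRenderingCrawler": "charge",
    "RudeScraper": "block",
}


def respond_to_crawler(user_agent: str) -> tuple[int, dict]:
    """Return (status code, extra headers) for a crawler under an
    allow/charge/block policy. Unknown crawlers are blocked."""
    action = POLICY.get(user_agent, "block")
    if action == "allow":
        return 200, {}
    if action == "charge":
        # HTTP 402 Payment Required: the long-dormant status code
        # that Pay Per Crawl dusts off. Header name is hypothetical.
        return 402, {"crawler-price": "USD 0.001 per request"}
    return 403, {}
```

A well-behaved crawler that understands 402 can then decide whether the content is worth the asking price, which is exactly the negotiation the marketplace is built around.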

Seen in this context, /crawl is not a contradiction. It is the demand side of a marketplace whose supply side Cloudflare spent the previous year constructing. Publishers get tools to control and monetise access to their content. Developers get a compliant, well-behaved way to access public content that respects those controls. The two halves interlock.

If anything, /crawl makes the Pay Per Crawl model more viable. A managed crawling service that enforces robots.txt and responds to HTTP 402 creates exactly the kind of well-behaved consumer that publishers can negotiate with. The alternative, thousands of independent scrapers running custom code with varying degrees of compliance, is harder for everyone.

Where the real scepticism should land

None of this means Cloudflare deserves uncritical admiration. There are legitimate questions about what it means for one company to sit between so much of the internet's traffic, acting simultaneously as the proxy, the bot detector, the crawler, and the payment processor. Concentration of that kind warrants scrutiny regardless of how well any individual product behaves. When Cloudflare last went down, customers couldn't even access the dashboard to disable the proxy. That is a real structural concern.

But structural concerns about market power are a different conversation from accusing a specific product of hypocrisy. The /crawl endpoint is not hypocritical. It is a crawler that does what the bad crawlers refuse to do: announce itself, obey the rules, and leave when asked. The critics pattern-matched to a familiar narrative ("big company does contradictory-sounding thing") and stopped reading before they reached the part where the contradiction dissolves.

A company that makes locks is allowed to also make doors. The question worth asking is whether the doors respect the locks. Read the documentation. They do.

The internet's outrage would be better spent on the crawlers that don't announce themselves, don't respect robots.txt, and don't appear on any changelog. There are plenty of them. They just don't make for a good LinkedIn post.
