Cloudflare Will Block AI Crawlers by Default on September 15: What Site Owners Need to Do Now
Cloudflare blocks AI crawlers by default on September 15, 2026. Should you block, allow, or charge via Pay Per Use? Decision matrix + verification steps.
Why this matters
On September 15, 2026, Cloudflare changes its defaults: Training and Agent crawlers will be blocked on ad-supported pages for all new sites and all existing free customers. Site owners can opt out, and the cleanest time to decide is before the new defaults take effect. A new “Pay Per Use” program lets publishers get compensated when their content actually surfaces in an AI answer. This post walks you through the three options, a decision matrix, and how to verify what your setup actually does today before the deadline hits.
TL;DR: On September 15, 2026, Cloudflare will block AI training and agent crawlers by default on ad-supported pages for all new domains, new sites from existing customers, and all existing free-tier customers. Site owners can opt out of the block before the deadline. A new “Pay Per Use” program also lets publishers earn when their content shapes an AI answer, not just when it gets fetched. This post is the decision framework — block, allow, or charge — plus a step-by-step to verify what your crawler access looks like right now.
The problem lands whether you act or not
I audit AI crawler access for a living, and the single most common finding is not a bad configuration. It is a configuration nobody chose. A robots.txt that has never heard of GPTBot or ClaudeBot. Bot-protection settings inherited from a default that predates AI crawlers entirely. Site owners who assume they are open to AI, or closed to it, and have never once probed their own site with an AI user agent to check. When I built the crawler-access checks for my own audit tooling, I had to run them against my own domains first, and even there, the settings reflected whenever I had last touched the file rather than any actual decision.
That scenario is about to become much more common, at scale. Cloudflare, which sits in front of roughly 20% of the web, announced on July 1, 2026 that it is reclassifying all AI bot traffic into three distinct categories and changing what gets blocked by default. If you run a site with ads — a blog monetized with display ads, a media property, a publisher — and you are on Cloudflare’s free plan, your site will be subject to these new defaults automatically on September 15.
The two silent failure modes: (1) you do nothing and AI training crawlers get blocked, which may or may not matter to you depending on whether you were planning to charge them; (2) you do nothing and a legitimate AI agent crawler that users send to your site gets blocked, which actively breaks user workflows. Neither outcome is visible until someone complains or you check a log.
There is also a third path most site owners do not know exists yet: get paid.
What Cloudflare actually announced (the precise policy, not the headline)
The direct answer: Cloudflare is not blocking all AI bots. It is changing defaults for how three newly defined bot categories are handled on pages that display ads.
Cloudflare’s July 1, 2026 blog post defines three categories of AI traffic:
- Search: collects or indexes your content so it can answer questions about it later. Think Perplexity’s crawler or the indexing behind Google’s AI Overviews.
- Agent: acts in real time on a person’s behalf to accomplish a task right now. Think an AI assistant browsing your site for a user.
- Training: collects your content to train or fine-tune a model. Think the large-scale ingestion crawls that OpenAI and Anthropic run.
The new defaults, effective September 15 for new sites and all existing free customers:
| Bot category | Default on ad-supported pages (Sept 15) | Can opt out? |
|---|---|---|
| Search | Allowed | Yes |
| Agent | Blocked | Yes |
| Training | Blocked | Yes |
The nuance that most news coverage missed: multi-purpose crawlers — those that declare themselves as doing Search AND Training — are subject to their most restrictive declared behavior. A crawler that labels itself as a combined Search + Training bot will be blocked under Training rules, even if it also claims to index. Googlebot, Applebot, and BingBot are also affected if a site owner has opted to block Training, because they are classified as multi-purpose.
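To make the “most restrictive declared behavior” rule concrete, here is a minimal Python model of it. The function name, the category strings, and the idea of a crawler declaring a set of categories are assumptions for illustration, not Cloudflare’s implementation; the default values come from the table above.

```python
# Hypothetical model of the "most restrictive declared behavior" rule.
# Category names and the settings structure are illustrative, not Cloudflare's API.
ALLOW, BLOCK = "allow", "block"

# Defaults on ad-supported pages from September 15, per the table above.
AD_PAGE_DEFAULTS = {"search": ALLOW, "agent": BLOCK, "training": BLOCK}


def effective_action(declared: set[str], settings: dict[str, str]) -> str:
    """Return BLOCK if any category the crawler declares is blocked, else ALLOW."""
    if not declared:
        raise ValueError("a crawler must declare at least one category")
    # One blocked category is enough: the strictest declared behavior wins.
    if any(settings[category] == BLOCK for category in declared):
        return BLOCK
    return ALLOW


print(effective_action({"search"}, AD_PAGE_DEFAULTS))              # allow
print(effective_action({"search", "training"}, AD_PAGE_DEFAULTS))  # block
```

Under this model, a combined Search + Training crawler is blocked wherever Training is blocked, which is why the multi-purpose classification of Googlebot, Applebot, and BingBot matters.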
What “opt out” means here: site owners can change these defaults in their Cloudflare Security settings, either before September 15 or at any time afterward. The change is per-site, not per-account.
Matthew Prince, Cloudflare’s CEO, put the stakes plainly: “Now that the majority of traffic on the Internet is non-human, we must go further and act faster so that a sustainable ecosystem can emerge.”
Cloudflare’s own data backs that framing: automated bots now represent more than half of all web traffic.
Why this matters more than robots.txt ever did
The direct answer: robots.txt is advisory. Cloudflare’s enforcement happens at the network layer, before a request reaches your server, and it applies uniformly regardless of whether a crawler respects the robots standard.
Robots.txt has a fundamental weakness: it depends on crawlers choosing to comply. The major search engines do. Many AI training crawlers do not, and there is no legal enforcement mechanism outside litigation. Cloudflare’s approach enforces access control at the infrastructure layer — the request never reaches your origin if Cloudflare blocks it.
This matters for two reasons. First, AI training crawlers have historically not been great robots.txt citizens. Several major AI companies were sued by publishers precisely because they continued crawling despite explicit disallow rules. Second, the new three-category classification gives site owners something robots.txt never provided: the ability to treat “a bot that is indexing my content for user-facing search retrieval” differently from “a bot that is using my content to train a proprietary model.”
That is a meaningful distinction for anyone who has content worth protecting.
The traffic and revenue context: what a randomized trial found
The direct answer: A field experiment published in April 2026 found that when Google’s AI Overviews appeared on a results page, outbound organic clicks to publisher sites fell by 39.8%, and zero-click searches rose by 34.5%.
The study by Saharsh Agarwal of the Indian School of Business and Ananya Sen of Carnegie Mellon University’s Heinz College is the most rigorous measurement yet of AI-answer-layer click cannibalization. It used a randomized design: 1,065 US Chrome users were randomly assigned through a custom browser extension to either see or not see AI Overviews over a two-week period in January–February 2026. The experiment observed 68,089 unique searches. Unlike prior traffic-drop analyses, which relied on before-and-after comparisons of analytics data, this design isolates the causal effect.
The numbers that matter for a site owner weighing the Cloudflare question:
| Metric | With AI Overview | Without AI Overview | Change |
|---|---|---|---|
| Outbound organic clicks per search | 0.37 | 0.62 | -39.8% |
| Zero-click search probability | 0.73 | 0.54 | +34.5% |
| Sponsored click rate | Flat | Flat | No change |
The implication: an AI company that ingests your content and answers queries from it does not deliver a proportional traffic return. The user’s need is satisfied by the answer. If you run a monetized site, that is a direct revenue impact from training data you provided for free.
This is the economic context behind Cloudflare’s decision to build a payment layer.
Pay Per Crawl vs Pay Per Use: what changed and whether it is worth your time
The direct answer: Pay Per Use is an evolution of Cloudflare’s earlier Pay Per Crawl program. Instead of charging AI companies per fetch, publishers are compensated when their content actually surfaces in an AI-generated answer. Cloudflare acts as the settlement layer; early payment partners include Ceramic.ai and You.com.
The original Pay Per Crawl charged for the retrieval event — the moment a crawler fetched a page. Pay Per Use moves the compensation event to the value creation moment: when a user gets an answer that drew on your content. Cloudflare’s term for the discipline of optimizing for this is Answer Engine Optimization (AEO).
How the mechanics work in practice: when a publisher opts into Pay Per Use with a participating AI partner (currently Ceramic.ai for AI search citations, You.com for agent-driven premium content purchases), Cloudflare tracks citation events and settles payment through the Pay Per Use billing layer.
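The practical difference between the two programs is which event counts. A toy sketch, assuming a simple log of hypothetical “fetch” and “citation” events; real Cloudflare billing data and pricing will look different:

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event kinds: "fetch" (a crawler requested a URL)
    # or "citation" (the content was used in an AI answer).
    kind: str
    url: str


# Which event each program compensates, per the description above.
TRIGGER = {"pay_per_crawl": "fetch", "pay_per_use": "citation"}


def billable_events(events: list[Event], program: str) -> int:
    """Count the events that would be compensated under `program`."""
    trigger = TRIGGER[program]
    return sum(1 for event in events if event.kind == trigger)


events_seen = [
    Event("fetch", "/guide"),
    Event("fetch", "/guide"),
    Event("fetch", "/news"),
    Event("citation", "/guide"),
]
print(billable_events(events_seen, "pay_per_crawl"))  # 3
print(billable_events(events_seen, "pay_per_use"))    # 1
```

In this example the /news page is fetched but never cited, so under Pay Per Use it earns nothing; crawl volume alone does not translate into compensation.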
The honest caveat: this ecosystem is early. Two commercial partners at launch is a thin network. The value of opting in depends entirely on whether your content is the kind of content these AI systems will cite — specific, factual, structured, high-confidence source material. Thin ad-supported news aggregation content is unlikely to be a citation target. Original research, technical guides, and primary data are.
I have measured this gap directly. In June I ran a citation-rate baseline on one of my own properties: across 36 brand-and-topic test prompts spanning four AI engines, the content was cited in roughly 6% of answers, and the spread between engines was stark (Claude cited it in about 18% of relevant answers; ChatGPT, 0%). That is the honest starting point for most sites. Pay Per Use compensation flows from exactly that citation event, so if your measured citation rate is near zero, opting into the program changes nothing until the content itself becomes citable: structured, factual, and specific enough for an answer engine to lean on.
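A citation-rate baseline like this is simple arithmetic over a table of answers. A minimal sketch, assuming you record one row per (engine, prompt) answer with a flag for whether your domain was cited; the engine names and rows below are placeholders, not real measurements:

```python
from collections import defaultdict


def citation_rates(results: list[tuple[str, bool]]) -> tuple[float, dict[str, float]]:
    """Return the overall citation rate and a per-engine breakdown.

    `results` holds one (engine, cited) pair per answer collected.
    """
    if not results:
        raise ValueError("no answers recorded")
    per_engine: dict[str, list[bool]] = defaultdict(list)
    for engine, cited in results:
        per_engine[engine].append(cited)
    overall = sum(cited for _, cited in results) / len(results)
    by_engine = {engine: sum(flags) / len(flags) for engine, flags in per_engine.items()}
    return overall, by_engine


# Placeholder rows, not real measurements.
sample = [("engine_a", True), ("engine_a", False), ("engine_b", False), ("engine_b", False)]
overall, by_engine = citation_rates(sample)
print(f"{overall:.0%}", {engine: f"{rate:.0%}" for engine, rate in by_engine.items()})
# 25% {'engine_a': '50%', 'engine_b': '0%'}
```

The per-engine dictionary is where a spread between engines shows up; the overall rate on its own hides it.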
Should you block AI crawlers? The block, allow, or charge decision matrix
This is the actual decision a site owner needs to make before September 15. The matrix below is not about what Cloudflare does by default — it is about what you should actively configure.
| Your site type | Recommended stance | Why |
|---|---|---|
| Ad-supported blog, content publisher | Block Training by default; evaluate Agent case-by-case | Training ingestion directly cannibalizes ad revenue; Agent access is legitimate user workflow |
| Technical documentation, dev tool | Allow Search + Agent; block Training | Your content benefits from AI discoverability; training ingestion without compensation is the only risk |
| Original research, proprietary data | Block all; opt into Pay Per Use when your niche partner exists | High-value content warrants payment; give nothing away for free |
| E-commerce, transactional site | Allow Agent; block Training | Agents help users complete tasks on your site (direct value); training ingestion is low-risk but asymmetric |
| Personal portfolio, low-traffic blog | Default block is fine; reconsider when Pay Per Use expands | Too small to matter today; set a calendar reminder to revisit when the partner network grows |
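If you manage several sites, it can help to treat the matrix as data and diff it against what a site has today. A sketch under stated assumptions: the site-type keys are shorthand invented here, “default” means leave Cloudflare’s setting alone, “review” marks the case-by-case Agent call, and the comparison uses the ad-supported-page defaults from the earlier table.

```python
# The decision matrix as data. Site-type keys are hypothetical shorthand.
# "default" = leave Cloudflare's setting as is; "review" = decide case by case.
MATRIX = {
    "ad_supported_publisher": {"search": "default", "agent": "review", "training": "block"},
    "technical_docs": {"search": "allow", "agent": "allow", "training": "block"},
    "original_research": {"search": "block", "agent": "block", "training": "block"},
    "ecommerce": {"search": "default", "agent": "allow", "training": "block"},
    "personal_blog": {"search": "default", "agent": "default", "training": "default"},
}


def changes_needed(site_type: str, current: dict[str, str]) -> list[str]:
    """List categories whose current setting differs from a concrete recommendation."""
    todo = []
    for category, stance in MATRIX[site_type].items():
        if stance == "review":
            todo.append(f"{category}: decide case by case (now {current[category]})")
        elif stance != "default" and current[category] != stance:
            todo.append(f"{category}: {current[category]} -> {stance}")
    return todo


# Cloudflare's new defaults on ad-supported pages, from the earlier table.
ad_defaults = {"search": "allow", "agent": "block", "training": "block"}
print(changes_needed("technical_docs", ad_defaults))  # ['agent: block -> allow']
```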
The step that most people skip: verifying that the settings they configure actually match the bot response codes they see. Cloudflare’s configuration and your robots.txt can conflict. If your robots.txt disallows a crawler that honors it, the effective result is a block even when your Cloudflare dashboard says “allowed”; and if Cloudflare blocks a crawler at the network layer, a permissive robots.txt makes no difference.
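The interaction between the two layers reduces to a small truth table. A minimal sketch, assuming a crawler either fully honors robots.txt or ignores it entirely (real crawlers can be messier):

```python
def effective_access(cloudflare_allows: bool, robots_allows: bool, honors_robots: bool) -> bool:
    """Return True if a crawler actually ends up with the page."""
    if not cloudflare_allows:
        # Blocked at the edge: the request never reaches your origin.
        return False
    if honors_robots and not robots_allows:
        # Cloudflare would let it through, but the crawler skips the page itself.
        return False
    return True


# Dashboard says "allowed", robots.txt disallows, crawler is well behaved: still a block.
print(effective_access(cloudflare_allows=True, robots_allows=False, honors_robots=True))   # False
# A crawler that ignores robots.txt is stopped only by the Cloudflare layer.
print(effective_access(cloudflare_allows=True, robots_allows=False, honors_robots=False))  # True
```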
How to verify your actual crawler access today (before September 15)
Step 1: Check your Cloudflare Security settings. Log into your Cloudflare dashboard, select your domain, go to Security > Bots. Under the AI Scrapers and Crawlers section you will see the current per-category settings (Search, Agent, Training) and whether ad-supported page overrides are configured. This is your ground truth for Cloudflare-layer enforcement.
Step 2: Check your robots.txt. Fetch https://yourdomain.com/robots.txt directly. Look for Disallow rules that reference known AI crawler user agents: GPTBot, Google-Extended, CCBot, ClaudeBot, PerplexityBot, YouBot, anthropic-ai. If none of these tokens appear and there is no blanket Disallow under User-agent: *, your robots.txt is not blocking AI crawlers. If they are present, verify they sit in the right user-agent group and are not unintentionally broad.
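You can also check the file programmatically. This sketch uses Python’s standard-library urllib.robotparser to ask, per AI token, whether a path is fetchable. One caveat: robotparser applies the first matching rule in file order and does not support path wildcards, while RFC 9309 specifies the most specific (longest) match, so treat its answer as a first pass on complex files.

```python
from urllib.robotparser import RobotFileParser

# The AI crawler tokens listed in Step 2.
AI_TOKENS = ["GPTBot", "Google-Extended", "CCBot", "ClaudeBot", "PerplexityBot", "YouBot", "anthropic-ai"]


def ai_access(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Map each AI token to whether this robots.txt lets it fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {token: parser.can_fetch(token, path) for token in AI_TOKENS}


sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

for token, allowed in ai_access(sample).items():
    print(f"{token:<16} {'allowed' if allowed else 'blocked'}")
```

To check a live site instead of a string, create RobotFileParser("https://yourdomain.com/robots.txt"), call its read() method, and query can_fetch() the same way.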
Step 3: Run a bot simulation. Use Cloudflare’s Bot Management analytics (if on a paid plan) or check your server access logs filtered by known AI user agent strings. A practical command if you have log access:
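```bash
# Google-Extended is left out on purpose: it is a robots.txt control token, and Google
# fetches pages with its regular Googlebot user agents, so it never appears in access logs.
grep -E "(GPTBot|ClaudeBot|PerplexityBot|CCBot)" /var/log/nginx/access.log | tail -n 50
```

If you see these requests getting through with 200 responses on pages you expected to block, your Cloudflare configuration is not enforcing what you think. Requests that Cloudflare blocks at the edge never reach your origin, so they should not appear in this log at all.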
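The grep command lists matching lines; to see at a glance which bots received which status codes, you can tally them. A sketch assuming nginx’s default “combined” log format (adjust the regex for a custom format). User-agent strings are easy to spoof, so a hit here shows that something claiming to be the bot got through, not necessarily the real crawler.

```python
import re
from collections import Counter

# Google-Extended is omitted: it is a robots.txt token, not a crawler user agent.
AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "CCBot")

# Combined format: ... "request" status bytes "referer" "user agent"
COMBINED = re.compile(r'"[^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"')


def ai_hits(lines) -> Counter:
    """Count (bot, status) pairs for log lines whose user agent names an AI bot."""
    counts: Counter = Counter()
    for line in lines:
        match = COMBINED.search(line)
        if not match:
            continue  # skip lines in other formats
        for bot in AI_BOTS:
            if bot in match["ua"]:
                counts[(bot, match["status"])] += 1
    return counts


sample_lines = [
    '203.0.113.5 - - [01/Jul/2026:10:00:00 +0000] "GET /guide HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [01/Jul/2026:10:01:00 +0000] "GET / HTTP/1.1" 200 1024 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
for (bot, status), count in sorted(ai_hits(sample_lines).items()):
    print(bot, status, count)  # GPTBot 200 1
```

An open file object works as the lines argument too, so the same function can read your real access log.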
Step 4: Check whether your ads trigger the new default. The September 15 defaults apply to pages that “display ads.” This is detected by Cloudflare’s existing ad detection logic, which primarily looks for ad network tags (Google AdSense, Ezoic, Mediavine, etc.) in page HTML. If your monetization is through sponsorships or affiliate links rather than display ad tags, you may not be in scope for the automatic default change, but you can still configure the settings manually.
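If you are unsure whether your pages carry the kind of tags described in Step 4, a quick local check is to search the page HTML for well-known ad-network markers. This is a rough heuristic, not Cloudflare’s detection logic: the marker strings are assumptions to extend for your stack, plain substring matching can flag an article that merely mentions a network, and tags injected later by JavaScript will not appear in raw HTML.

```python
# Hypothetical marker strings; extend them for the networks you actually use.
AD_TAG_MARKERS = {
    "Google AdSense": ("adsbygoogle", "googlesyndication.com"),
    "Ezoic": ("ezojs.com", "ezoic"),
    "Mediavine": ("mediavine.com",),
}


def detect_ad_networks(html: str) -> list[str]:
    """Return ad networks whose marker strings appear in the page HTML (case-insensitive)."""
    lowered = html.lower()
    return [
        network
        for network, markers in AD_TAG_MARKERS.items()
        if any(marker in lowered for marker in markers)
    ]


page = '<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>'
print(detect_ad_networks(page))  # ['Google AdSense']
print(detect_ad_networks('<a href="https://example.com/?ref=affiliate">Buy</a>'))  # []
```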
Want a faster baseline before you start configuring? The free scan at citability.dev runs the infrastructure checks above in about fifteen seconds — no account required. It tests your robots.txt for AI-specific user-agent rules, probes your site live with ClaudeBot, GPTBot, PerplexityBot, and Google-Extended to verify actual HTTP response codes, and measures crawl responsiveness. That is the same infrastructure layer Cloudflare’s new defaults operate on. The paid audit covers the layer below: whether your content actually surfaces in and shapes AI-generated answers, which is what Pay Per Use compensation ultimately flows from.
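You can approximate a live user-agent probe yourself with the standard library. A minimal sketch, with two assumptions stated up front: the user-agent strings are shortened stand-ins for the real ones, and Cloudflare identifies verified bots by more than the User-Agent header, so a request from your own machine that merely claims to be GPTBot may be treated differently from the real crawler. It illustrates the idea only and is not the citability.dev implementation.

```python
import urllib.error
import urllib.request

# Shortened, hypothetical stand-ins for the real crawler user-agent strings.
PROBE_AGENTS = {
    "GPTBot": "Mozilla/5.0 (compatible; GPTBot/1.1)",
    "ClaudeBot": "Mozilla/5.0 (compatible; ClaudeBot/1.0)",
    "PerplexityBot": "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
}


def probe(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request `url` with the given User-Agent and return the final HTTP status code."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        # urlopen follows redirects, so this is the status of the final response.
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; a 403 here often means the request was blocked.
        return err.code


def probe_all(url: str) -> dict[str, int]:
    """Probe `url` once per simulated AI crawler."""
    return {name: probe(url, agent) for name, agent in PROBE_AGENTS.items()}
```

For example, probe_all("https://yourdomain.com/") returns one status per simulated bot; compare it with probe() using an ordinary browser user agent to see whether the bots are treated differently.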
FAQ
What happens to my site on September 15 if I do nothing?
If you are a new Cloudflare customer, a new site added by an existing customer, or on Cloudflare’s free tier, and your site displays ads: Training and Agent crawlers will be blocked by default on those pages starting September 15. Search crawlers remain allowed. You do not need to do anything to get this protection, but you should verify it is working as you expect (see the verification steps above) and decide whether the Agent block is the right call for your site’s use case.
Does this affect Googlebot or Bingbot?
It can. Googlebot, Applebot, and BingBot are classified by Cloudflare as multi-purpose crawlers because they combine Search and Training behavior. If you or Cloudflare’s default has Training blocked, these crawlers are subject to the Training block rules. The practical implication: if you block Training broadly, you may accidentally block the major search engine crawlers. Check that your Search category is set to Allowed and that major search bots are not in a blanket block.
What is the difference between Pay Per Crawl and Pay Per Use?
Pay Per Crawl (the original program) compensated publishers per page fetch — the moment a crawler requested a URL. Pay Per Use, announced July 1, 2026, compensates publishers when their content creates value — specifically, when it surfaces in an AI-generated answer. You get paid for the output event, not the retrieval event. Current partners: Ceramic.ai (AI search citations) and You.com (on-demand premium content access by agents).
Should I block AI crawlers on my site?
It depends on what your content is worth and how it is monetized. For ad-supported publishers, blocking Training crawlers is the right default — the ISB/CMU study found that AI Overviews alone cut organic clicks by 39.8%, and you are absorbing that traffic loss while providing training data for free. For technical documentation or developer tools, allowing Search and Agent crawlers makes sense because discoverability in AI answers is distribution. The binary “block everything or allow everything” framing is wrong; the three-category system (Search, Agent, Training) exists precisely so you can be precise. The decision matrix earlier in this post maps site type to recommended stance.
How do I set my Cloudflare AI crawler options?
In your Cloudflare dashboard: select your domain > Security > Bots > AI Scrapers and Crawlers. You will see toggle controls for the Search, Agent, and Training categories. You can configure these globally or with overrides for pages that display ads. Changes take effect immediately. The deadline to configure before Cloudflare changes the defaults for free-tier sites is September 15, 2026.
What to do this week
The September 15 deadline is not a cliff — it is a configuration window. The actual work is twenty minutes if you have Cloudflare access:
- Log into Cloudflare and check your current bot settings. Know what you have before the default changes around you.
- Read your robots.txt. Confirm it matches your intent, particularly around known AI user agent strings.
- Decide your stance on Agent traffic. This is the most nuanced call — Agent bots can be legitimate user-workflow tools (an AI assistant a visitor sends to research your site) or scraper-equivalents. Your call should be based on whether you want your content accessible to AI-assisted users.
- If your content is high-value and structured, investigate Pay Per Use. The partner network is thin now, but this is where the economics of AI-era content creation are heading. Being early to the conversation costs nothing.
If you want the verification done for you, the free scan linked in the section above covers steps 1 through 3 in about fifteen seconds, and it leaves you with a record of what your access looked like before the September defaults landed.
The alternative — doing nothing and assuming the defaults are the right choice — is how sites end up with misconfigured access and no record of what changed or when.
Sources: Cloudflare blog, July 1 2026 · Cloudflare press release, July 1 2026 · The Next Web, July 2 2026 · Agarwal & Sen, SSRN, April 2026 (revised June 2026) · PPC Land coverage of the study
Further reading
- Content Intent Signaling: The robots.txt Directive That Controls How AI Uses Your Content (/blog/content-intent-signaling-robots-txt): robots.txt controls access. Content Intent Signaling controls usage. Three new directives separate training from citation permission.
- How to Get Cited by ChatGPT: My 0/5 GEO Audit (/blog/how-to-get-cited-by-chatgpt-geo-guide): I ran my own site through 5 buyer-intent queries on Perplexity and got cited zero times. Here is what GEO actually is and the pattern the winners share.
- I Spent $10K on AEO and Got Zero AI Citations. Here Is the Audit Section That Would Have Caught Why (/blog/citability-section-5-off-site-authority-launch): citability.dev now scores Wikipedia, Wikidata, and JSON-LD sameAs presence. Free, opt-in, under 10s. Part of the AVR Framework, see chudi.dev/framework.
- Schema.org for Answer Engines, the 40 Properties That Matter (/blog/schema-org-answer-engines-guide): A tactical guide to the Schema.org properties answer engines actually read. Which fields move citation decisions, which are noise, and how sub-DR-20 operators compress a full JSON-LD graph into the forty that matter.
- Entity Optimization for Brands in AI Search (/blog/entity-optimization-brands-ai-search): Rank is a single-page game. Entity coherence is the compounding game. How sub-DR-20 brands engineer a Person + Organization graph that AI search engines actually cite.
What do you think?
I post about this stuff on LinkedIn every day and the conversations there are great. If this post sparked a thought, I'd love to hear it.