I Audited 6 Major Websites for AI Citability. Here Is What Actually Predicts Citations.

Chudi Nnorukam · Apr 6, 2026 · 8 min read

Original audit data from 6 websites reveals that domain authority does not predict AI citations. Answer-first content, structured data, and freshness matter more.

Why this matters

I ran AI visibility audits on 6 websites including Ahrefs (DA 92), Reddit, Medium, and my own sites. Domain authority had zero correlation with AI citation rates. The three strongest predictors were answer-first content structure, content freshness with dateModified schema, and structured data coverage.

Domain authority does not predict whether AI will cite your website. I audited 6 major websites for AI citability, and the results challenge nearly everything the SEO industry assumes about AI search visibility.

The site with the highest domain authority (DA 92) was cited by AI only 5% of the time. Sites with millions of daily visitors failed basic infrastructure checks. And the factors that actually predicted citations had nothing to do with backlinks or traffic.

Here is what the data showed.

TL;DR

AI citability is whether AI answer engines include your URL as a source, not just mention your brand.

  • Domain authority has zero correlation with AI citation rates
  • Ahrefs (DA 92) is 100% AI-visible but only 5% cited
  • Reddit, Medium, and X all failed basic AI infrastructure checks
  • The three strongest predictors: answer-first content, dateModified schema, structured data coverage
  • Only 12% of URLs cited by LLMs appear in Google’s top 10 results

The Audit: 6 Sites, 3 AI Platforms, 10 Infrastructure Checks

I used the AI Visibility Readiness (AVR) framework to run infrastructure audits on 6 websites. Each site was checked for 10 signals that AI crawlers use to discover and parse content: robots.txt, sitemap.xml, answer-first content, content freshness, structured data (JSON-LD), meta descriptions, canonical URLs, HTTPS, heading hierarchy, and social sharing readiness.
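The on-page portion of those checks is easy to automate. Here is a minimal sketch of a few of them; the signal names and pass criteria are my own simplifications for illustration, not the AVR framework's exact rules:

```python
import re

# Hypothetical sketch of a few on-page AVR-style checks.
# Pass criteria are simplified: real audits should parse the HTML properly.
def check_page_signals(url, html):
    return {
        "https": url.startswith("https://"),
        "meta_description": '<meta name="description"' in html,
        "canonical": '<link rel="canonical"' in html,
        "json_ld": '<script type="application/ld+json">' in html,
        "single_h1": len(re.findall(r"<h1[\s>]", html)) == 1,
    }

sample_html = (
    '<html><head>'
    '<meta name="description" content="An answer-first page.">'
    '<link rel="canonical" href="https://example.com/post">'
    '<script type="application/ld+json">{"@type": "Article"}</script>'
    '</head><body><h1>One heading</h1></body></html>'
)
signals = check_page_signals("https://example.com/post", sample_html)
print(f"{sum(signals.values())} of {len(signals)} checks passed")
```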

Then I queried ChatGPT, Perplexity, and Claude with questions each site should be able to answer. I tracked two metrics:

  • AI Visibility: Does the AI mention the brand when asked?
  • AI Citability: Does the AI include a URL from the site as a cited source?
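Tallying the two metrics is simple bookkeeping once the manual queries are logged. A minimal version, with invented log entries, might look like this:

```python
# Invented query log: one entry per (platform, question) pair, recording
# whether the brand was mentioned and whether a site URL was cited.
query_log = [
    {"platform": "chatgpt",    "mentioned": True,  "cited": False},
    {"platform": "perplexity", "mentioned": True,  "cited": True},
    {"platform": "claude",     "mentioned": True,  "cited": False},
    {"platform": "chatgpt",    "mentioned": False, "cited": False},
]

visibility = sum(q["mentioned"] for q in query_log) / len(query_log)
citability = sum(q["cited"] for q in query_log) / len(query_log)
print(f"visibility {visibility:.0%}, citability {citability:.0%}")
```

Note that citability can never exceed visibility: a cited URL implies the brand surfaced in the answer.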

The Results

Site          DA  AI Infrastructure  AI Visibility  AI Citability
ahrefs.com    92  Foundation-ready   100%           5%
semrush.com   91  Foundation-ready   Partial        Partial
chudi.dev     28  Foundation-strong  29%            0%
reddit.com    97  Not ready          Untested       Untested
medium.com    95  Not ready          Untested       Untested
x.com         96  Not ready          Untested       Untested

The three highest-DA sites (Reddit 97, X 96, Medium 95) all failed basic infrastructure readiness. They are missing structured data, answer-first content, or proper AI crawler permissions. These sites get cited constantly by AI, but not because of their infrastructure. They get cited because AI training data includes their content at massive scale.

For everyone else, infrastructure is the gate.

Does High Domain Authority Mean AI Will Cite You?

No. The data is clear: DA has zero predictive power for AI citations.

Ahrefs has a DA of 92, one of the highest in the SEO industry. Every AI platform recognizes the brand instantly. Ask ChatGPT “what is Ahrefs?” and you get a detailed, accurate answer. That is 100% AI visibility.

But ask ChatGPT “what tools should I use for keyword research?” and Ahrefs gets mentioned but rarely linked. The AI knows the brand exists. It does not need to cite the source. That is the visibility-citation gap, and it exists because AI systems already have the information internalized from training data.

Citation happens when AI needs your content as a source for a specific claim. That requires your content to be structured in a way the AI can extract and attribute.

What Infrastructure Do AI Crawlers Actually Need?

The 10-check audit revealed a clear pattern. Sites that passed 8+ infrastructure checks had measurably higher visibility scores. Sites that failed basic checks were invisible regardless of their authority.

The Baseline Signals

robots.txt and sitemap.xml are table stakes. Every site in the audit had these, but the content of each matters. Reddit’s robots.txt blocks several AI crawlers. Medium’s sitemap is auto-generated but does not include all content pages. Simply having the files is not enough.
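You can verify which AI crawlers a robots.txt actually admits with Python's standard library. The bot names below are the crawlers' published user agents; the robots.txt content is a made-up example that blocks one bot and allows the rest:

```python
from urllib.robotparser import RobotFileParser

# Published user agents of common AI crawlers.
AI_BOTS = ["GPTBot", "PerplexityBot", "ClaudeBot", "Google-Extended"]

# Example robots.txt: blocks GPTBot, allows everyone else.
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())
access = {bot: rp.can_fetch(bot, "https://example.com/post") for bot in AI_BOTS}
print(access)
```

In practice you would fetch the live file with `rp.set_url(...)` and `rp.read()` instead of parsing a string.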

HTTPS and canonical URLs are similarly baseline. Every audited site passed these. They are necessary but not differentiating.

The Differentiating Signals

Three signals separated the visible sites from the invisible ones:

Answer-first content. Pages that led with a direct answer in the first 100 words scored dramatically higher on AI extractability. This matches research showing AI systems extract the first clear, unqualified statement they find on a page. Generic marketing copy, hero images, and navigation-heavy layouts all push the answer down, making it harder for AI to extract.

Structured data (JSON-LD). Sites with Article, FAQPage, and HowTo schema gave AI systems explicit context about content purpose and structure. The chudi.dev audit showed 9 schema types across pages, including TechArticle with dateModified, FAQPage with 5+ questions per article, and Person schema with expertise signals. This machine-readable layer is what lets AI systems understand your content without parsing ambiguous HTML.
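A minimal version of that machine-readable layer is a single JSON-LD block in the page head. The `@type` and property names below are schema.org's; the values are placeholders:

```python
import json

# Minimal TechArticle JSON-LD with the freshness and author fields
# discussed above. Values are placeholders.
article_schema = {
    "@context": "https://schema.org",
    "@type": "TechArticle",
    "headline": "I Audited 6 Major Websites for AI Citability",
    "datePublished": "2026-04-06",
    "dateModified": "2026-04-06",
    "author": {"@type": "Person", "name": "Chudi Nnorukam"},
}

snippet = (
    '<script type="application/ld+json">'
    + json.dumps(article_schema, indent=2)
    + "</script>"
)
print(snippet)
```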

Content freshness. Pages with dateModified in their schema received 1.8x more AI citations than pages without, according to Semrush research. This aligns with another finding: 95% of ChatGPT citations come from recently published or updated content. Stale content without date signals gets deprioritized.

Which Sites Get Cited vs Just Mentioned?

The gap between being mentioned and being cited is the central problem in AI visibility.

Platform-Specific Citation Behavior

Each AI platform has different citation preferences:

  • Perplexity cites approximately 6.6 sources per answer and heavily favors Reddit (46.7% of its top cited sources)
  • ChatGPT cites only about 2.6 sources per answer and shows strong Wikipedia preference (7.8% of all citations)
  • Google Gemini cites about 6.1 sources per answer with 76% overlap with Google’s traditional top 10

This means the optimization strategy differs by platform. Perplexity rewards breadth of presence across forums and communities. ChatGPT rewards being on established reference sources. Google AI Overviews still correlates heavily with traditional SEO rankings.

The 12% Divergence

Only 12% of URLs cited by LLMs appear in Google’s top 10 search results for the same queries. This is the statistic that should reframe how you think about AI search: ranking on Google and getting cited by AI are largely separate problems.

The exception is Google AI Overviews, which shows 76% overlap with traditional rankings. ChatGPT and Perplexity operate on fundamentally different source-selection algorithms.
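You can measure this divergence for your own queries by comparing the set of LLM-cited URLs against Google's top 10 for the same query. The URL lists below are invented for illustration:

```python
# Fraction of LLM-cited URLs that also appear in Google's top 10
# for the same query. URL lists are invented for illustration.
def citation_overlap(llm_cited, google_top10):
    if not llm_cited:
        return 0.0
    return len(set(llm_cited) & set(google_top10)) / len(llm_cited)

llm_cited = ["a.com/guide", "b.com/post", "c.com/data", "d.com/faq"]
google_top10 = ["a.com/guide", "e.com", "f.com", "g.com"]
print(f"{citation_overlap(llm_cited, google_top10):.0%}")
```

Averaged across a query set, this gives you your own version of the 12% statistic.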

The Three Factors That Actually Predict AI Citations

Based on the audit data and corroborating research, three factors had the strongest predictive power:

1. Answer-First Content Structure

Pages where the direct answer appears in the first 100 words get extracted more often. This means:

  • Lead with the answer, not the question
  • Keep opening paragraphs to 25-40 words
  • Use clear, factual statements without qualifying language
  • Structure H2 headings as questions the reader would ask AI

The qualifying language point is critical. Phrases like “it depends,” “in many cases,” or “it can be argued” signal uncertainty. AI systems prefer definitive statements they can extract as answers.
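Screening an opening paragraph for hedging is mechanical enough to script. The phrase list below is illustrative, not an exhaustive catalogue of qualifying language:

```python
# Flags qualifying phrases in a page's first 100 words.
# The HEDGES list is an illustrative sample, not a complete catalogue.
HEDGES = ["it depends", "in many cases", "it can be argued", "arguably"]

def hedges_in_opening(text, first_n_words=100):
    opening = " ".join(text.split()[:first_n_words]).lower()
    return [h for h in HEDGES if h in opening]

weak = "It depends on your goals, but in many cases a keyword tool helps."
strong = "Ahrefs is a keyword research tool used for backlink analysis."
print(hedges_in_opening(weak))
print(hedges_in_opening(strong))
```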

2. dateModified Schema with Substantive Updates

The 1.8x citation lift from dateModified schema is real, but only when paired with actual content updates. Google penalizes fake freshness signals, meaning you cannot just bump the date without changing anything. The safe approach:

  • Update content quarterly with new data and statistics
  • Add at least 100 words of substantive new content per refresh
  • Reference current-year sources and data points
  • Only update dateModified when the refresh is genuine
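A quarterly cadence is easy to enforce with a small staleness check. The 90-day threshold below mirrors the cadence suggested above; adjust it to your own refresh schedule:

```python
from datetime import date

# Flags pages due for a refresh. The 90-day threshold is an assumption
# matching a quarterly update cadence.
def due_for_refresh(date_modified, today, max_age_days=90):
    return (today - date.fromisoformat(date_modified)).days > max_age_days

print(due_for_refresh("2026-01-02", date(2026, 4, 6)))  # 94 days old
```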

3. Inline Statistics and Original Data

Pages with inline statistics get 40%+ more AI citations. This makes sense: AI systems need claims they can attribute, and specific numbers are the easiest claims to attribute to a source.

Original data is even more powerful. If your page contains data that does not exist elsewhere, AI has no choice but to cite you when referencing it. This is why I publish audit results and benchmark data publicly. The comparison table at the top of this article is data that exists nowhere else.
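If you want a rough density metric for inline statistics, a regex pass is enough for a first cut. The pattern below is a sketch, not a formal definition of "statistic":

```python
import re

# Counts inline statistics (percentages, multipliers like "1.8x", and
# plain numbers) in a sentence. Rough sketch, not a formal definition.
STAT = re.compile(r"\d+(?:\.\d+)?\s*%|\b\d+(?:\.\d+)?x\b|\b\d+(?:\.\d+)?\b")

def count_stats(text):
    return len(STAT.findall(text))

claim = "Only 12% of URLs cited by LLMs appear in Google's top 10 results."
print(count_stats(claim))
```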

What This Means for Your Site

The path from invisible to cited is not about building more backlinks or increasing your DA. It is about making your content technically extractable by AI systems.

The checklist is short:

  1. Check your infrastructure. Run a free scan to verify the 10 baseline signals.
  2. Restructure your content. Lead with answers. Use question-based headings. Add FAQ and HowTo schema.
  3. Publish original data. Give AI systems something they can only get from you.
  4. Keep content fresh. Update quarterly with substantive changes and current statistics.
  5. Test across platforms. Query ChatGPT, Perplexity, and Claude with questions your site should answer. Track citation rates over time.

The sites that get cited in 2026 will not be the ones with the highest DA. They will be the ones whose content is structured so AI systems can extract, trust, and attribute it.

FAQ

Does high domain authority help with AI citations?

No. In our audit of 6 websites, domain authority showed zero correlation with AI citation rates. Ahrefs (DA 92) was 100% AI-visible but only 5% cited. Sites with lower DA but better content structure can outperform high-DA sites in AI citations.

What is the difference between AI visibility and AI citability?

AI visibility means the AI recognizes your brand when asked directly. AI citability is the higher bar where the AI includes your URL as a source in its response. Even major sites can be fully visible but rarely cited, and the two require different optimization approaches.

Which AI platforms cite the most sources per answer?

Perplexity cites approximately 6.6 sources per answer, Google Gemini cites about 6.1, and ChatGPT cites only about 2.6. This means Perplexity gives you more opportunities to be cited per query, while ChatGPT is more selective about its sources.

How do I check if AI can cite my website?

Run a free infrastructure scan at citability.dev to check 10 technical readiness signals. Then manually query ChatGPT, Perplexity, and Claude with questions your site should answer. Check whether your URL appears as a cited source in the AI responses.

What content structure gets the most AI citations?

Answer-first structure where the direct answer appears in the first 100 words, followed by supporting evidence. Use question-based H2 headings, keep paragraphs to 25-40 words, include inline statistics, and add FAQ and HowTo structured data markup.


What do you think?

I post about this stuff on LinkedIn every day and the conversations there are great. If this post sparked a thought, I'd love to hear it.

Discuss on LinkedIn