Chrome Prompt API: AI Visibility Scans at $0 Cost
Chrome 148 put the Prompt API and Gemini Nano in the browser. Here's what our citability.dev scan really costs, and which parts move client-side for $0.
Why this matters
We run citability.dev. It scans a site and tells you whether AI search engines can find, parse, and cite it. The free scan checks the infrastructure layer: robots.txt rules, structured data, heading hierarchy, Open Graph tags, sitemap presence. People assume that scan burns API tokens. It does not. Those checks are deterministic parsing, regex and DOM walks, so they already cost us $0 in model spend. The part that costs money is the judgment layer: asking a model “would you cite this page, and for what query.” That runs about $0.60 per scan on the server. Chrome 148 shipped the Prompt API to web pages with Gemini Nano built in, and the honest question it raises is not “can the free scan be free” (it already is) but “can the $0.60 judgment layer move into the browser for $0.” This post is the real math, with the numbers labeled.
What did Chrome 148 actually ship?
Chrome 148 reached stable in May 2026. The headline for anyone building web tools: the Prompt API, which had been limited to extensions since Chrome 138, is now exposed to regular web pages. That means a website can call a local language model with LanguageModel.create() and run inference on the user’s own machine, with no server round-trip and no per-token bill.
Two models sit behind it. Gemini Nano is the general on-device model. Gemma 197M is a smaller expert variant Google built for narrow, repeatable tasks like summarization and structured synthesis. Alongside the Prompt API, Chrome 148 also made the Summarizer, Translator, and Language Detector APIs stable. The Writer, Rewriter, and Proofreader APIs are still in origin trial, so I am not counting on them yet.
The catch is hardware. The model is a roughly 4GB download that installs silently the first time it is needed, and Google lists the requirements as Windows 10/11, macOS 13+, Linux or ChromeOS, with at least 22GB of free disk and either 16GB of RAM or a GPU with 4GB of VRAM. That is a real gate. Not every visitor clears it. I will come back to why that matters for a scanning tool specifically.
How much does an AI visibility scan really cost to run?
Here is the part most “AI audit” tools will not show you, because it makes the free tier look less impressive: the cheap parts were never the expensive parts.
A citability scan has two layers. The infrastructure layer is mechanical. Fetch robots.txt, parse it, check whether GPTBot and ClaudeBot and the other AI crawlers are allowed. Pull the HTML, walk the DOM, confirm the JSON-LD validates and the heading order is sane and the Open Graph image exists. None of that needs a language model. It is the same class of work a linter does. So it costs effectively nothing today, server or client.
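To show how mechanical the infrastructure layer is, here is a minimal sketch of the robots.txt check. It is not our production parser. The function names parseRobots and isCrawlerAllowed are made up for illustration, and the matching is simplified: plain path prefixes only, no wildcard patterns, longest matching rule wins.

```javascript
// Simplified robots.txt check: is a given crawler allowed to fetch a path?
// Assumptions: no "*" or "$" patterns inside paths; the longest matching rule wins
// and Allow wins a tie. A sketch, not a complete robots.txt implementation.
function parseRobots(text) {
  const groups = []; // each group: { agents: [...], rules: [{ allow, path }] }
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const idx = line.indexOf(":");
    if (idx === -1) continue;
    const field = line.slice(0, idx).trim().toLowerCase();
    const value = line.slice(idx + 1).trim();
    if (field === "user-agent") {
      // consecutive User-agent lines share one group
      if (!lastWasAgent) {
        current = { agents: [], rules: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((field === "allow" || field === "disallow") && current) {
      // an empty path matches nothing, so it adds no rule
      if (value !== "") current.rules.push({ allow: field === "allow", path: value });
      lastWasAgent = false;
    } else {
      lastWasAgent = false;
    }
  }
  return groups;
}

function isCrawlerAllowed(robotsText, userAgent, path = "/") {
  const groups = parseRobots(robotsText);
  const ua = userAgent.toLowerCase();
  // prefer groups that name this crawler; otherwise fall back to the "*" groups
  let matching = groups.filter((g) => g.agents.includes(ua));
  if (matching.length === 0) matching = groups.filter((g) => g.agents.includes("*"));
  let best = null;
  for (const rule of matching.flatMap((g) => g.rules)) {
    if (!path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieAllow = best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best ? best.allow : true; // no matching rule means allowed
}

// Usage: a site that blocks GPTBot but leaves everyone else alone.
const sampleRobots = "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n";
const gptBotAllowed = isCrawlerAllowed(sampleRobots, "GPTBot");
const claudeBotAllowed = isCrawlerAllowed(sampleRobots, "ClaudeBot");
```

No model call anywhere in that, which is the point: string splitting and prefix matching do not cost tokens.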
The judgment layer is different. To answer “does AI actually cite this site,” you have to ask a model real questions and read what it says. That is the visibility test (does the model know the brand) and the citation test (does the model link to the page for a relevant query). Each of those runs about $0.60 in API spend on our server. The full four-section audit runs about $2. Those are the numbers that scale with users, and those are the numbers Chrome’s Prompt API could in theory zero out.
| Check | Layer | Needs a model? | Server cost today | Client-side via Prompt API |
|---|---|---|---|---|
| robots.txt + AI crawler rules | Infrastructure | No | $0 | $0 (was already $0) |
| Structured data / JSON-LD validation | Infrastructure | No | $0 | $0 (was already $0) |
| Heading hierarchy + OG tags | Infrastructure | No | $0 | $0 (was already $0) |
| Sitemap presence + freshness | Infrastructure | No | $0 | $0 (was already $0) |
| Brand visibility test | Judgment | Yes | ~$0.60 | $0 (projected) |
| Citation / link test | Judgment | Yes | ~$0.60 | $0 (projected) |
| Full 4-section audit | Both | Yes | ~$2.00 | partial: judgment parts projected $0 |
Two things to read off that table. First, anyone selling you a “free AI infrastructure scan” as if it were a generous gift is selling you regex. It costs them nothing because it always cost nothing. Second, the genuinely interesting line is the judgment layer, and that is exactly where on-device inference changes the unit economics. The server numbers are real and current. The client-side column is labeled projected because we have not shipped it yet, and I am not going to pretend I have a benchmark I have not run.
Why move the judgment layer into the browser at all?
Run the arithmetic at scale. If a free tool offers one citation test per visitor and ten thousand people use it in a month, that is roughly $6,000 in model spend on a feature that earns nothing directly. That is the quiet reason most free AI audit tools either cap the free tier hard or quietly degrade it to the infrastructure checks that were already free. The expensive layer is the one people actually want, and it is the one that bleeds.
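The arithmetic above, written out as a tiny sketch. The function name is hypothetical and the inputs are just the figures from the paragraph: ten thousand visitors, one test each, about $0.60 per test. It works in cents to sidestep floating-point rounding.

```javascript
// Monthly model spend for a free judgment-layer test, computed in cents.
// The inputs below are the figures from the text, not new measurements.
function monthlySpendDollars(visitors, testsPerVisitor, costPerTestCents) {
  return (visitors * testsPerVisitor * costPerTestCents) / 100;
}

const serverSpend = monthlySpendDollars(10000, 1, 60); // 6000 dollars per month
```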
On-device inference flips that. The compute happens on the visitor’s machine. Your marginal cost per scan goes to zero because you are not paying for the tokens; the visitor’s laptop is doing the work. For a free tier, that is the difference between a loss leader you have to ration and one you can leave open.
There is a quality tradeoff and I want to be straight about it. Gemini Nano is not Claude Opus or Gemini Pro. It is a small model tuned for speed and footprint. For a judgment like “is this brand mentioned in a plausible answer to this query,” a small local model is probably good enough, because the task is closer to classification than to open reasoning. For nuanced citation analysis across competing sources, it is probably not good enough yet, and the right design is a local pre-filter that runs free on Nano and only escalates the hard cases to a server model. That keeps most scans at $0 and spends real money only where the small model is out of its depth.
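A rough sketch of that pre-filter design, under loud assumptions: localJudge and serverJudge are hypothetical functions the caller supplies, the idea of the local model reporting a confidence value is my stand-in for "this case is hard" (we have not settled how to detect that), and the 0.8 threshold is a placeholder, not a measured number.

```javascript
// Local-first routing: try the on-device model, escalate only the hard cases.
// localJudge and serverJudge are hypothetical async functions; each resolves to
// { mentioned: boolean, confidence: number between 0 and 1 }.
async function judgeWithEscalation(task, localJudge, serverJudge, threshold = 0.8) {
  try {
    const local = await localJudge(task);
    if (local.confidence >= threshold) {
      return { ...local, source: "local", costCents: 0 };
    }
  } catch (err) {
    // a local failure is treated like a hard case and falls through to the server
  }
  const remote = await serverJudge(task);
  return { ...remote, source: "server", costCents: 60 }; // ~$0.60 per test, from the text
}
```

The design choice worth noticing is that the server call is the fallback for both low confidence and outright local failure, so the paid path only fires when the free one cannot answer.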
What breaks when you try this?
I have not shipped the client-side version, so this section is the engineering risk register, not a victory lap.
The hardware gate is the first problem. A 4GB model that needs 16GB of RAM or a 4GB-VRAM GPU rules out a meaningful slice of visitors, especially on mobile, where almost none of this works today. A scanning tool has to feature-detect with LanguageModel.availability() and fall back gracefully: run the free infrastructure checks for everyone, offer the local judgment layer only to machines that clear the bar, and keep the server path as the option for the rest. You cannot make the local model the only path or you lock out half your traffic.
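One way to sketch that fallback logic is to map each availability state to a scan path. The path names ("local", "offer-download", "server") and the function chooseJudgmentPath are invented for illustration. The availability lookup is passed in as a function so the routing can be exercised outside Chrome.

```javascript
// Map the Prompt API availability state to a scan path.
// getAvailability is injected; in a supporting browser it would be
// () => LanguageModel.availability().
async function chooseJudgmentPath(getAvailability) {
  let state;
  try {
    state = await getAvailability();
  } catch (err) {
    state = "unavailable"; // API missing or it threw: treat as unsupported
  }
  switch (state) {
    case "available":
      return "local"; // model is already on the machine
    case "downloadable":
    case "downloading":
      return "offer-download"; // ask before triggering the multi-gigabyte pull
    default:
      return "server"; // "unavailable" or anything unexpected
  }
}

// Browser usage (assumes the global LanguageModel exists in supporting builds):
// const path = await chooseJudgmentPath(() =>
//   "LanguageModel" in self ? LanguageModel.availability() : "unavailable");
```

The infrastructure checks run regardless of which path comes back, so nobody is locked out of the free scan.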
The first-run download is the second problem. The model installs the first time a page asks for it, and that is a one-time multi-gigabyte pull on the visitor’s connection. You have to surface that honestly with a progress state, because a tool that silently triggers a 4GB download is a tool people close and do not come back to.
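A minimal sketch of the progress state, assuming the monitor option and downloadprogress event behave as Chrome's built-in AI documentation describes. The wrapper name createSessionWithProgress is hypothetical, and the session factory is injected so the wiring can be checked without a browser.

```javascript
// Create a session while reporting model download progress to the page.
// createSession is injected; in Chrome it would be (opts) => LanguageModel.create(opts).
// onProgress receives a whole-number percentage.
function createSessionWithProgress(createSession, onProgress) {
  return createSession({
    monitor(m) {
      m.addEventListener("downloadprogress", (e) => {
        // e.loaded is a fraction between 0 and 1
        onProgress(Math.round(e.loaded * 100));
      });
    },
  });
}

// Browser usage (progressBar is a hypothetical <progress> element):
// const session = await createSessionWithProgress(
//   (opts) => LanguageModel.create(opts),
//   (pct) => { progressBar.value = pct; }
// );
```

As I read Chrome's documentation, starting the download from create() also needs a user gesture when the model is not on disk yet, which is one more reason to put it behind an explicit button rather than page load.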
The output-reliability problem is third. Chrome 148 added structured output and JSON mode to the Prompt API, which matters a lot here: an audit needs parseable verdicts, not prose. That feature is the thing that makes a local model usable as a scan backend instead of a chatbot. It still needs schema validation and a retry path, because small models drift.
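Here is roughly what validation plus a retry path could look like. The verdict shape ({ mentioned, reason }), the function names, and the three-attempt limit are all illustrative choices, not something we have shipped. The session argument is anything with an async prompt(text, options) method, which is what a Prompt API session provides.

```javascript
// Validate a model verdict against the expected shape, retrying on drift.
// The verdict shape below is an illustrative assumption.
const verdictSchema = {
  type: "object",
  properties: { mentioned: { type: "boolean" }, reason: { type: "string" } },
  required: ["mentioned", "reason"],
};

function parseVerdict(raw) {
  let data;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null; // not JSON at all
  }
  const ok =
    data !== null &&
    typeof data === "object" &&
    typeof data.mentioned === "boolean" &&
    typeof data.reason === "string";
  return ok ? { mentioned: data.mentioned, reason: data.reason } : null;
}

async function promptForVerdict(session, text, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await session.prompt(text, { responseConstraint: verdictSchema });
    const verdict = parseVerdict(raw);
    if (verdict) return verdict;
  }
  return null; // caller can escalate to the server path
}
```

Checking the parsed output even when the model is schema-constrained is deliberate: the validation is cheap, and a null result gives the caller a clean signal to fall back.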
None of these are blockers. They are the build. The reason I am writing the analysis before the implementation is that the economics are clearly favorable and the failure modes are all known and manageable, which is exactly the point at which it is worth committing engineering time.
What does the code actually look like?
The reason this is a build and not a research project is that the API surface is small. You feature-detect, you create a session, you ask for a structured verdict, you validate it. The shape looks like this:
```javascript
// Wrapped in a function so the early return is valid.
// runServerScan is the existing server path; the schema is illustrative.
async function runJudgmentScan(url, brand) {
  // 1. Gate on hardware + model availability before promising anything
  const ready = await LanguageModel.availability();
  if (ready !== "available") {
    // fall back to the server path, or to infra-only checks
    return runServerScan(url);
  }

  // 2. Create a local session. This runs on the visitor's machine.
  const session = await LanguageModel.create({
    initialPrompts: [{ role: "system", content: "You audit AI search visibility." }]
  });

  // 3. Ask for a parseable verdict, not prose. Chrome 148 added JSON output.
  const schema = {
    type: "object",
    properties: { mentioned: { type: "boolean" }, reason: { type: "string" } },
    required: ["mentioned", "reason"]
  };
  const verdict = await session.prompt(
    `For the query "best AI visibility tool", would a model plausibly mention ${brand}? Answer yes or no with one reason.`,
    { responseConstraint: schema } // schema-constrained output
  );

  // prompt() resolves to a string, so parse it before scoring
  return JSON.parse(verdict);
}
```

That is the whole judgment loop. The availability() gate is doing the heavy lifting: it is what lets you keep the tool open to every visitor while only running local inference on machines that can take it. The responseConstraint is what turns a chatbot into a scan backend. Everything else is the same parsing and scoring logic the server version already runs, just pointed at a local model instead of a paid endpoint. Small surface, known failure modes, favorable economics. That is the case for building it.
How this fits the AVR framework
This is one input to a larger model. Our AVR framework (AI Visibility Readiness) splits a site’s readiness into infrastructure, content, and authority signals, and grades each on verifiable evidence rather than a single made-up score. Chrome’s Prompt API does not change what AVR measures. It changes how cheaply the measuring tool can run the judgment-grade checks, which is a delivery-cost question, not a methodology question. If you want the measurement methodology itself, the framework explainer is the canonical version, and the citability audit walkthrough covers what actually predicts whether AI cites a page.
The broader pattern is worth naming. Client-side inference is going to push a whole class of “send your data to our server, we run a model, we charge you” tools toward “the model runs in your browser and the tool is mostly free.” AI visibility scanning is a small, clean example because so much of it is either deterministic parsing or short classification. The expensive, nuanced reasoning stays on the server. The cheap, repeatable judgment moves local. That split is the actual story of on-device AI, and it is more boring and more useful than the demos suggest.
The honest bottom line
Chrome 148 is real and stable. The Prompt API on web pages with Gemini Nano and Gemma 197M is real. Our server-side scan costs are real: $0 for the infrastructure layer because it never needed a model, about $0.60 for each judgment-grade test, about $2 for the full audit. The $0 client-side version of that judgment layer is projected, not shipped, and it depends on a small local model being good enough for the easy cases and on a clean fallback for the machines and the hard cases that the local model cannot handle. If you are building anything that runs a model per visitor, this is the question to sit with: which of your model calls are actually classification in disguise, and could those run for free on the visitor’s own machine.
Sources: Chrome at I/O 2026; The Prompt API (Chrome for Developers); Built-in AI (Chrome for Developers).
Further reading
- I Spent $10K on AEO and Got Zero AI Citations. Here Is the Audit Section That Would Have Caught Why. (/blog/citability-section-5-off-site-authority-launch): citability.dev now scores Wikipedia, Wikidata, and JSON-LD sameAs presence. Free, opt-in, under 10s. Part of the AVR Framework, see chudi.dev/framework.
- Perplexity vs ChatGPT: Different Citation Rules (/blog/perplexity-vs-chatgpt-citation-rules): Perplexity quotes liberally. ChatGPT quotes selectively. The engine-level differences in citation behavior that change what a sub-DR-20 brand should optimize for, engine by engine.
- Why Domain Authority Is Irrelevant for AI Search (And What to Build Instead) (/blog/domain-authority-irrelevant-ai-search): Domain authority has zero correlation with AI citation rates. Data from 7 audits shows what predicts whether ChatGPT, Perplexity, and Claude cite you.