How I Lifted Five chudi.dev Pages to EXTRACTABLE on AVR v1.1.0.

Chudi Nnorukam May 23, 2026 6 min read

All 5 audited chudi.dev URLs now score EXTRACTABLE on AVR v1.1.0 Fact-Block Density. Two HTML traps (dt/dd Q/A pairs and icons before heading text) cost a follow-up commit each. The CI workflow now hard-fails any regression.

Why this matters

Five chudi.dev URLs went from a mix of NOT-EXTRACTABLE and PARTIALLY-EXTRACTABLE to all EXTRACTABLE on the AVR v1.1.0 Fact-Block Density audit. Two HTML traps cost a follow-up commit each, and both are now documented. A CI hard-fail keeps the baseline from regressing.

The earlier case-study post on this site explained the first AVR v1.1.0 audit and what was failing. This post documents the cleanup arc, which took under a day from first audit to all five URLs scoring EXTRACTABLE.

The starting state

The first audit, run on the morning of 2026-05-23, showed three distinct failure shapes:

| Page | Pre-remediation score | Verdict |
| --- | --- | --- |
| chudi.dev | 40/100 | NOT-EXTRACTABLE |
| chudi.dev/framework | 51/100 | PARTIALLY-EXTRACTABLE |
| chudi.dev/about | 72/100 | PARTIALLY-EXTRACTABLE |
| chudi.dev/blog | 20/100 | NOT-EXTRACTABLE |
| chudi.dev/topics | 0/100 | NOT-EXTRACTABLE |

Three of the five pages scored NOT-EXTRACTABLE, at 40 points or lower. The pattern was always the same three failing checks: F3 (40-60 word direct-answer band per H2), F4 (H2/H3 question-format rate), and F5 (FAQ section presence near the page tail). F1 (first-sentence standalone-answer) and F2 (first-200-tokens direct-answer) usually passed because the operator-authored sections already followed Patel’s inverted-pyramid rule.
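As a rough illustration of the F3 check, the sketch below counts the words in a section's opening paragraph and tests whether the count lands in the 40-60 word band. The names `word_count` and `in_answer_band` and the whitespace-based word count are assumptions made for this example; the real audit script may tokenize and scope the check differently.

```python
def word_count(text: str) -> int:
    """Count whitespace-separated words (an assumed tokenization)."""
    return len(text.split())


def in_answer_band(paragraph: str, low: int = 40, high: int = 60) -> bool:
    """Return True when the paragraph length falls inside the F3 band."""
    return low <= word_count(paragraph) <= high


short_answer = "The audit reads HTML the way an AI extractor would."
band_answer = " ".join(["word"] * 45)  # synthetic 45-word paragraph

print(in_answer_band(short_answer))  # 10 words, below the band
print(in_answer_band(band_answer))   # 45 words, inside the band
```

A paragraph that is too short fails the band just as a paragraph that is too long does, which is why terse one-line answers under an H2 can lose F3.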

The biggest single unlock

The largest score jump came from one mechanical edit applied across 5 Svelte components. JourneyCard, BlogCard, BlogCardFeatured, ProjectCard, and ProductCard all rendered their dynamic title text as <h2> or <h3> statement headings. Every card title on the page diluted the F4 question-rate denominator and the F3 word-band denominator. The remediation: change the tag from <h2>/<h3> to <p> with role="heading" and aria-level=N so screen readers still see the document outline, the CSS classes stay identical, and only the literal heading tag disappears from the audit’s heading extraction.
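The dilution effect can be shown with a little arithmetic. The sketch below assumes F4 is approximated by the share of headings that end in a question mark; that proxy, the function name `question_rate`, and the sample headings are all hypothetical and only illustrate why removing statement headings from the pool raises the rate.

```python
def question_rate(headings: list[str]) -> float:
    """Share of headings that end with a question mark (assumed F4 proxy)."""
    if not headings:
        return 0.0
    questions = sum(1 for h in headings if h.strip().endswith("?"))
    return questions / len(headings)


# Hypothetical page content, not taken from chudi.dev.
section_headings = ["What is AVR?", "How does the audit score a page?"]
card_titles = ["Shipping a Svelte blog", "Notes on CI", "Project: avr-pipeline"]

# Card titles rendered as <h2>/<h3> join the heading pool and dilute the rate.
before = question_rate(section_headings + card_titles)
# Rendered as <p role="heading">, they drop out of the heading extraction.
after = question_rate(section_headings)

print(f"before: {before:.2f}, after: {after:.2f}")
```

The question headings themselves never changed; only the denominator shrank.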

That single commit (chudi-blog 46668ff9) lifted chudi.dev root from 65 to 100/100 EXTRACTABLE. Same idea, six tag swaps, +35 points.

The two HTML traps

Two follow-up commits cost the most time relative to their impact. Both are now documented in the remediation plan at avr-pipeline/AVR-V1-1-0-FACT-BLOCK-REMEDIATION-PLAN.md. Future page remediations skip these detours by reading them upfront.

Trap one: dt/dd Q/A pairs

The first FAQ section I shipped on chudi.dev root used <dl><dt><dd> description-list elements. The Q/A pattern reads naturally that way and the markup is semantically correct for a glossary or definition list. The audit cannot see it. section_fact_block_density.py’s extract_sections function walks h1, h2, h3 tags only. Five dt-shaped questions contributed zero to F4. F5 cannot detect “Frequently asked” as a heading when it sits inside a <dt>. The fix: change <dt> to <h3>, <dd> to <p>, and the wrapping <dl> to <div>. Visual styling stays the same if the CSS attaches to the new elements; the audit sees five new question H3s and an FAQ-shaped H2.
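To make the blind spot concrete, here is a minimal sketch of a heading walker that only looks at h1, h2, and h3. It uses the standard library's `html.parser` as a stand-in for the BeautifulSoup-based audit, and `HeadingCollector` and `extract_headings` are names invented for this example, not the audit's own code.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3"}


class HeadingCollector(HTMLParser):
    """Collect the text of h1-h3 elements and ignore everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[str] = []
        self._depth = 0  # greater than 0 while inside a heading element
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Join the text nodes and collapse runs of whitespace.
                self.headings.append(" ".join("".join(self._parts).split()))
                self._parts = []

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_headings(html: str) -> list[str]:
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings


dl_faq = "<dl><dt>Why did the audit fail?</dt><dd>Answer text.</dd></dl>"
h3_faq = "<div><h3>Why did the audit fail?</h3><p>Answer text.</p></div>"

print(extract_headings(dl_faq))  # the dt question is not collected
print(extract_headings(h3_faq))  # the h3 question is collected
```

The same question text is invisible in the first variant and visible in the second, which matches the behavior described for the dt/dd markup.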

Trap two: icons before heading text

Material Symbols icons are commonly placed before heading text as a visual marker. The pattern looks like:

<h2>
  <span class="material-symbols-outlined">quiz</span>
  Frequently asked questions
</h2>

BeautifulSoup extracts heading content by concatenating children in document order. The extracted text becomes “quiz Frequently asked questions”. The F5 regex requires “frequently asked” at the very START of the heading text, so the leading icon name broke the match. /about scored 78/100 on the first remediation pass, short of EXTRACTABLE, because of this one ordering. The fix: move the icon AFTER the heading text and add aria-hidden="true" so screen readers do not double-read it:

<h2>
  Frequently asked questions
  <span aria-hidden="true" class="material-symbols-outlined">quiz</span>
</h2>

One line change. /about jumped from 78 to 88/100 EXTRACTABLE.
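The ordering problem is easy to reproduce. The sketch below concatenates a heading's text nodes in document order and tests the result against a pattern anchored at the start of the string. The exact F5 regex is not quoted in this post, so `FAQ_PATTERN` is an assumed shape, and the standard library's `html.parser` stands in for BeautifulSoup; `TextCollector` and `heading_text` are illustrative names.

```python
import re
from html.parser import HTMLParser

# Assumed shape of the F5 pattern: "frequently asked" anchored at the start.
FAQ_PATTERN = re.compile(r"^frequently asked", re.IGNORECASE)


class TextCollector(HTMLParser):
    """Concatenate every text node in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data):
        self.parts.append(data)


def heading_text(markup: str) -> str:
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    # Collapse whitespace so indentation and newlines do not matter.
    return " ".join("".join(parser.parts).split())


icon_first = """<h2>
  <span class="material-symbols-outlined">quiz</span>
  Frequently asked questions
</h2>"""
icon_last = """<h2>
  Frequently asked questions
  <span aria-hidden="true" class="material-symbols-outlined">quiz</span>
</h2>"""

for markup in (icon_first, icon_last):
    text = heading_text(markup)
    print(repr(text), bool(FAQ_PATTERN.search(text)))
```

Note that aria-hidden="true" affects assistive technology only. The icon name is still a text node, so it still shows up in the extracted string; moving it to the end is what lets the anchored pattern match.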

The final state

All 5 audited URLs reach EXTRACTABLE:

| Page | Final score | Verdict | Δ from baseline |
| --- | --- | --- | --- |
| chudi.dev | 100/100 | EXTRACTABLE | +60 |
| chudi.dev/framework | 100/100 | EXTRACTABLE | +49 |
| chudi.dev/about | 88/100 | EXTRACTABLE | +16 |
| chudi.dev/blog | 100/100 | EXTRACTABLE | +80 |
| chudi.dev/topics | 80/100 | EXTRACTABLE | +80 |

Average lift: +57 points per page. Three pages moved from NOT-EXTRACTABLE at baseline to EXTRACTABLE in the final audit.
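The deltas and the average can be re-derived from the two score tables. The snippet below is plain arithmetic on the published baseline and final scores; the variable names are only for this example.

```python
baseline = {
    "chudi.dev": 40,
    "chudi.dev/framework": 51,
    "chudi.dev/about": 72,
    "chudi.dev/blog": 20,
    "chudi.dev/topics": 0,
}
final = {
    "chudi.dev": 100,
    "chudi.dev/framework": 100,
    "chudi.dev/about": 88,
    "chudi.dev/blog": 100,
    "chudi.dev/topics": 80,
}

# Per-page lift is the final score minus the baseline score.
deltas = {page: final[page] - baseline[page] for page in baseline}
average = sum(deltas.values()) / len(deltas)

for page, delta in deltas.items():
    print(f"{page}: +{delta}")
print(f"average lift: +{average:.0f}")
```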

The CI hard-fail flip

The GitHub Action at .github/workflows/avr-fact-block-audit.yml triggers on every deployment_status event with state: success and environment: Production. It clones the avr-pipeline repo, audits the five URLs, and posts an avr-fact-block-density commit status check. After all five URLs reached EXTRACTABLE on commit d4dfbba8, I flipped the workflow from fail-soft (warning only) to hard-fail (exit 1) on commit c548df2d. Any deploy that regresses any URL below EXTRACTABLE now produces a failed commit status check visible in the commit timeline and the PR view.
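The difference between fail-soft and hard-fail comes down to the exit code the audit step returns. The sketch below is a hypothetical gate function, not the code in the actual workflow: `gate_exit_code`, its `hard_fail` flag, and the sample verdicts are assumptions that illustrate the two modes.

```python
def gate_exit_code(verdicts: dict[str, str], hard_fail: bool = True) -> int:
    """Return the process exit code for a set of per-URL verdicts."""
    regressions = [
        url for url, verdict in verdicts.items() if verdict != "EXTRACTABLE"
    ]
    for url in regressions:
        # In fail-soft mode this warning is the only signal.
        print(f"regression: {url} is {verdicts[url]}")
    if regressions and hard_fail:
        return 1  # a non-zero exit marks the step as failed
    return 0


# Hypothetical audit results for illustration.
all_pass = {"chudi.dev": "EXTRACTABLE", "chudi.dev/about": "EXTRACTABLE"}
one_fail = {"chudi.dev": "EXTRACTABLE", "chudi.dev/about": "PARTIALLY-EXTRACTABLE"}

print(gate_exit_code(all_pass))                   # no regressions
print(gate_exit_code(one_fail, hard_fail=False))  # warns but still passes
print(gate_exit_code(one_fail))                   # fails the step
```

A workflow step could pass the return value to `sys.exit` so that the failure propagates to the commit status check.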

The check is not blocking in the literal sense (no branch protection rule wired up yet) but it is loudly visible. The next step is to add the avr-fact-block-density status to the required-checks list on the main branch protection rule, which converts visibility into a true gate.

Frequently asked questions

Why does AVR v1.1.0 audit Fact-Block Density?

The audit reads HTML the way an AI extractor would. Pages that look well-structured to a human reader can still be unreadable to an AI retrieval engine that walks heading elements in isolation. Fact-Block Density measures whether the content’s heading + paragraph + FAQ structure survives chunk-extraction by an AI engine.

Why did the dt/dd FAQ pattern fail the audit?

section_fact_block_density.py’s extract_sections function walks h1, h2, h3 elements only. The dt element is semantically a description term (a Q/A label) but it is not a heading tag. The audit cannot see dt-shaped questions, so they contribute zero F4 question-rate signal and the F5 FAQ regex never finds them. Use h3 for Q and p for A.

Why did the icon-before-text pattern destroy F5?

BeautifulSoup extracts heading content by walking children in document order and concatenating their text. A leading icon span produces text like “quiz Frequently asked questions” because the icon name renders as text. The F5 regex looks for “frequently asked” at the START of the extracted heading text. The icon prepended its label and broke the match. Place icons AFTER the text with aria-hidden="true".

What does the CI hard-fail catch?

The GitHub Action at .github/workflows/avr-fact-block-audit.yml in chudi-blog runs on every Vercel deployment_status success event for the Production environment. It clones avr-pipeline, audits the 5 chudi.dev URLs, and exits non-zero if any URL drops below EXTRACTABLE. The avr-fact-block-density commit status check appears on every commit that triggers a Production deploy.

Where can I read the full remediation plan and gotchas?

The canonical document is AVR-V1-1-0-FACT-BLOCK-REMEDIATION-PLAN.md in the avr-pipeline repo. It walks through the three failing checks for each page, the before/after rewrite pattern, the per-page budget, and the two HTML traps documented above. The repo is at github.com/ChudiNnorukam/avr-pipeline.
