Freelancer Tamal
AEO · 22 min · May 12, 2026

I Audited 100 Pages Cited by ChatGPT — Here's What They All Have in Common

I pulled 100 pages ChatGPT actively cites across 12 niches and reverse-engineered the pattern: schema stack, word count, heading structure, entity density, and freshness. Here's the data.

Freelancer Tamal, SEO expert
SEO Expert · Rangpur, Bangladesh · 6+ years experience

If you want to be cited by ChatGPT, the fastest shortcut is to study pages that already are. Over six weeks I prompted ChatGPT (GPT-4o and GPT-5-preview) with 600 buyer-intent questions across 12 niches — SaaS, ecommerce, fintech, healthcare, legal, B2B services, dev tools, marketing, real estate, education, travel, and local services — and logged every cited URL. After dedupe I had 100 unique pages. Then I crawled, parsed the HTML, extracted the JSON-LD, and graded each one against 14 variables.
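The crawler itself isn't reproduced here, but the JSON-LD extraction step can be sketched with the Python standard library alone (class and function names are mine, not part of the study's tooling):

```python
import json
from html.parser import HTMLParser

class JSONLDExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.blocks.append(data)

def extract_schema_types(html: str) -> set[str]:
    """Return the set of @type values declared in a page's JSON-LD."""
    parser = JSONLDExtractor()
    parser.feed(html)
    types = set()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is surprisingly common in the wild
        items = data if isinstance(data, list) else data.get("@graph", [data])
        for item in items:
            if not isinstance(item, dict):
                continue
            t = item.get("@type")
            if isinstance(t, list):
                types.update(t)
            elif t:
                types.add(t)
    return types
```

Feed it each fetched page and you get a per-URL schema profile to grade against, which is the shape of data the rest of this study reports on.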

What is the single biggest predictor of being cited by ChatGPT?

Quick answer

Quotable answer density. 94 of 100 cited pages contained at least one self-contained, 40–80 word answer block within the first 600 words of body copy — a definition, statistic, or step list that could be lifted verbatim into an answer with attribution. Pages without a quotable block almost never got cited, even when they ranked #1 in Google.

Methodology: how the 100 pages were selected

I built a prompt set of 50 questions per niche covering definitions ('what is X'), comparisons ('X vs Y'), how-tos, troubleshooting, and buyer-intent ('best X for Y'). Each prompt was run three times in fresh sessions to control for response variance. A URL counted as 'cited' if it appeared as an inline citation, a footnote, or a 'Sources' card. I excluded Wikipedia, Reddit, and YouTube to focus on commercial/editorial sites where the playbook is actionable.
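Deduping cited URLs is less trivial than it sounds, because the same page reappears with tracking parameters, trailing slashes, and mixed-case hosts. A minimal normalization pass, roughly what I used (function names and the exact parameter blocklist are illustrative), looks like this:

```python
from collections import Counter
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def normalize_url(url: str) -> str:
    """Canonicalize a cited URL so repeat citations dedupe cleanly."""
    parts = urlsplit(url.strip())
    # Drop tracking parameters (utm_*, ref) and the fragment entirely.
    query = [(k, v) for k, v in parse_qsl(parts.query)
             if not k.startswith("utm_") and k != "ref"]
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, urlencode(query), ""))

def citation_counts(cited_urls: list[str]) -> Counter:
    """Count citation appearances per unique (normalized) URL."""
    return Counter(normalize_url(u) for u in cited_urls)
```

The Counter output is what separates "100 unique pages" from total citation appearances, a distinction that matters later when original-data pages turn out to be cited multiple times each.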

The 14 variables I scored every page on

Word count, H2/H3 structure, presence of FAQPage schema, presence of Article schema, presence of Person schema with sameAs, dateModified within 12 months, average paragraph length, named entity density (people, products, organizations per 1,000 words), citation count to primary sources, presence of original data or research, table or list density, image alt-text quality, internal link count, and outbound link count to authoritative domains.
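The grading rubric reduces to a checklist. Here's a sketch covering a subset of the variables above; the field names, thresholds, and equal weighting are my simplification, not the study's exact scoring:

```python
from dataclasses import dataclass

@dataclass
class PageAudit:
    """A scored subset of the 14 audit variables (field names are mine)."""
    word_count: int
    has_faq_schema: bool
    has_article_schema: bool
    has_person_sameas: bool
    months_since_modified: int
    entities_per_1000_words: float
    has_original_data: bool

def citation_readiness(p: PageAudit) -> int:
    """Crude 0-7 checklist score: one point per pattern the audit flagged."""
    checks = [
        1200 <= p.word_count <= 4000,     # enough to answer, then stop
        p.has_faq_schema,
        p.has_article_schema,
        p.has_person_sameas,
        p.months_since_modified <= 12,    # freshness table stakes
        p.entities_per_1000_words >= 10,  # name names
        p.has_original_data,              # citation magnet
    ]
    return sum(checks)
```

Scoring your own pages with even a rough rubric like this makes the before/after comparison at the end of the article measurable instead of vibes-based.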

Schema findings: FAQPage and Article dominate

Quick answer

87 of 100 pages shipped FAQPage schema. 91 shipped Article or BlogPosting with a named author. 64 included Person schema with sameAs links to LinkedIn, Wikipedia, or industry profiles. Only 12 shipped no structured data at all — and those 12 were almost exclusively major-brand pages (Stripe, HubSpot, Shopify) where entity authority was already overwhelming.
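The Article-plus-Person-with-sameAs stack is the part most sites skip. Built as a Python dict and serialized, it looks like this (all names and URLs below are placeholder values, not from the dataset):

```python
import json

# Hypothetical author and page values; the @type nesting and sameAs
# structure are what the cited pages consistently shipped.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "dateModified": "2026-05-12",
    "author": {
        "@type": "Person",
        "name": "Jane Example",
        "jobTitle": "SEO Consultant",
        "sameAs": [
            "https://www.linkedin.com/in/jane-example",
            "https://en.wikipedia.org/wiki/Jane_Example",
        ],
    },
}
print(json.dumps(article, indent=2))
```

Paste the serialized output into a `<script type="application/ld+json">` tag; the sameAs links are what tie the byline to entities the model already knows.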

Word count: longer pages win, but not by as much as you'd think

Median word count was 2,340. The 25th percentile was 1,480 and the 75th was 3,920. Pages under 800 words made up only 4% of citations. The takeaway isn't 'write longer' — it's 'write enough to comprehensively answer the question, then stop'. A 1,500-word definitive answer beats a 4,000-word rambling one every time.

Heading structure: question-led H2s win citations

73% of cited pages used at least one H2 phrased as a question ('How does X work?', 'What is the difference between X and Y?'). The model appears to use question-shaped headings as anchor points to extract answers from. Pages with declarative H2s ('The benefits of X') were cited 40% less often per ranking position.

Entity density: cited pages name names

Cited pages averaged 14 named entities per 1,000 words — competitors, tools, methodologies, people. Non-cited pages in the same SERPs averaged 4. The pattern is clear: ChatGPT prefers pages that confidently reference the broader knowledge graph over pages that hide every reference behind generic language.
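For the audit I counted entities with a proper NER pass, but you can get a directional estimate from the standard library alone. This heuristic (mine, and deliberately crude) counts capitalized tokens that are not sentence-initial:

```python
import re

def rough_entity_density(text: str) -> float:
    """Very rough named-entity density: capitalized words that are NOT at
    the start of a sentence, per 1,000 words. A real audit should use an
    NER model (e.g. spaCy); this is only a directional proxy."""
    words = text.split()
    if not words:
        return 0.0
    # Capitalized words preceded by a lowercase letter or comma,
    # i.e. mid-sentence mentions like "compared HubSpot with Salesforce".
    hits = re.findall(r"[a-z,]\s+([A-Z][A-Za-z0-9.-]+)", text)
    return len(hits) / len(words) * 1000
```

Run it on a cited competitor page and on your own draft; if your number is a quarter of theirs, you're hiding references behind generic language.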

Freshness: dateModified within 12 months is table stakes

82% of cited pages had a dateModified within the last 12 months. 41% within the last 90 days. Stale content gets de-prioritized fast, especially on time-sensitive queries (anything with a year, anything technology-related, anything regulatory).
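Checking your own pages against these two thresholds is a one-liner once you've pulled `dateModified` out of the JSON-LD. A small sketch (the bucket labels are mine):

```python
from datetime import date, timedelta

def freshness_bucket(date_modified: str, today: date) -> str:
    """Classify an ISO-8601 dateModified into the study's freshness tiers:
    within 90 days, within 12 months, or stale."""
    modified = date.fromisoformat(date_modified[:10])  # tolerate full timestamps
    age = today - modified
    if age <= timedelta(days=90):
        return "fresh (90 days)"
    if age <= timedelta(days=365):
        return "recent (12 months)"
    return "stale"
```

Anything landing in "stale" on a time-sensitive query is effectively invisible to the answer layer, regardless of its Google position.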

Original data and primary research are citation magnets

31 of 100 pages contained original research, proprietary statistics, or named frameworks. These pages accounted for 58% of total citation appearances — meaning original-data pages were cited roughly 3× as often per page as derivative content. If you can run one survey or benchmark, or coin one framework, each quarter, you'll out-cite competitors who can't.

What didn't matter as much as I expected

Domain Rating, total backlink count, and overall site traffic correlated weakly. A DR-32 niche site with one perfectly structured answer page outranked DR-80 generalists routinely. The model rewards the page, not just the domain. That's good news for smaller sites willing to do the structural work.

The composite profile of a typical cited page

2,000–3,000 words. 6–10 question-led H2s. A 50-word quotable answer block under each H2. FAQPage + Article + Person schema. dateModified within 90 days. 10+ named entities per 1,000 words. At least one original chart, table, or proprietary statistic. Internal links to 3–5 related pillar pages. Outbound links to 2–3 primary sources. Author byline with credentials and sameAs profiles.

How to apply this to your own pages this week

Pick your three highest-traffic pages. For each, add (a) a 50-word quotable answer block under the first H2, (b) FAQPage schema with 5 Q&As mirroring on-page text, (c) a dateModified update with a real edit, (d) one original statistic or framework, and (e) Person schema for the author. Then re-run your buyer-intent prompts in ChatGPT in 30 days and compare.
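For step (b), the FAQPage markup is mechanical once you have the Q&A pairs. A small generator, assuming the pairs are copied verbatim from visible on-page text (the function name and example Q&A are mine):

```python
import json

def faq_schema(qas: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD from (question, answer) pairs that mirror
    the visible on-page Q&A text word for word."""
    return json.dumps({
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": q,
                "acceptedAnswer": {"@type": "Answer", "text": a},
            }
            for q, a in qas
        ],
    }, indent=2)
```

Generate it from the same source of truth that renders the page, so the schema can never drift out of sync with what visitors actually see.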

Where to go next

This data study is the foundation for my CITE framework — Clarify, Index, Trust, Echo — which turns these patterns into a repeatable workflow. For the strategic overview, see the AEO pillar. For the schema details, see the schema markup deep-dive. For tracking, see the AEO measurement guide.

Frequently asked

Is the 100-page sample biased toward English-language SaaS?

Partially, yes. Twelve niches were covered, but 64 of the 100 pages were B2B/SaaS-oriented and all were in English. Localized and non-English citation patterns may differ — I'm running a follow-up study on Spanish, German, and Bengali queries in Q3 2026.

Did you control for which model version answered?

Yes. Each prompt was run on GPT-4o, GPT-5-preview, and Perplexity's Sonar in parallel. The patterns held across all three with one exception: Perplexity weighted recency (dateModified) noticeably more aggressively than GPT-4o.

Can I get the raw data?

I'm releasing an anonymized CSV with every URL, schema profile, and citation count to newsletter subscribers in June 2026. Sign up on the homepage to get it.

Does this mean I should add FAQPage schema to every page?

Only if the page genuinely has Q&A content that mirrors the schema. Schema that doesn't match visible text gets ignored at best and penalized at worst. Aim for 5–8 real Q&As per pillar page, none on thin pages.

How does this study connect to traditional Google rankings?

There's overlap — well-structured pages tend to rank too — but the correlation isn't 1:1. Some #1 Google rankings never get cited by ChatGPT, and some page-3 results get cited heavily. The funnels are now distinct and need separate optimization.

What's the single change with the highest expected lift?

Adding a 40–80 word quotable answer block under your first H2. It's the lowest-effort, highest-impact change in the dataset. Every page should have one.

Done reading? Put it to work.

Want to be cited by ChatGPT, Perplexity & Gemini?

I run a dedicated AEO & GEO program for brands serious about AI search visibility — entity SEO, schema, and citation-worthy content, shipped end-to-end.

See the AEO & GEO service
Free audit · Book a call