I Audited 100 Pages Cited by ChatGPT — Here's What They All Have in Common
I pulled 100 pages ChatGPT actively cites across 12 niches and reverse-engineered the pattern: schema stack, word count, heading structure, entity density, and freshness. Here's the data.
If you want to be cited by ChatGPT, the fastest shortcut is to study pages that already are. Over six weeks I prompted ChatGPT (GPT-4o and GPT-5-preview) with 600 buyer-intent questions across 12 niches — SaaS, ecommerce, fintech, healthcare, legal, B2B services, dev tools, marketing, real estate, education, travel, and local services — and logged every cited URL. After dedupe I had 100 unique pages. Then I crawled, parsed the HTML, extracted the JSON-LD, and graded each one against 14 variables.
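The crawl-and-extract step above can be sketched with the standard library alone. This is a minimal illustration, not the study's actual tooling: it pulls every `<script type="application/ld+json">` block out of a fetched page's HTML and reads off the declared schema types.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collects the contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            try:
                self.blocks.append(json.loads(data))
            except json.JSONDecodeError:
                pass  # skip malformed JSON-LD rather than crashing the crawl

def extract_schema_types(html: str) -> list[str]:
    """Return the @type of every JSON-LD block found in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    return [block.get("@type", "") for block in parser.blocks]
```

In practice you would feed this the response body from whatever HTTP client you crawl with; the parser itself needs nothing beyond the standard library.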
What is the single biggest predictor of being cited by ChatGPT?
Quotable answer density. 94 of 100 cited pages contained at least one self-contained, 40–80 word answer block within the first 600 words of body copy — a definition, statistic, or step list that could be lifted verbatim into an answer with attribution. Pages without a quotable block almost never got cited, even when they ranked #1 in Google.
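A rough check for this predictor is easy to automate. The sketch below treats blank-line-separated paragraphs as candidate blocks and asks whether any paragraph inside the first 600 words lands in the 40–80 word range — the thresholds are the ones from the audit, but the paragraph heuristic is my own simplification.

```python
def has_quotable_block(body: str, window_words: int = 600) -> bool:
    """True if any paragraph within the first `window_words` words
    of the body is 40-80 words long (a liftable answer block)."""
    seen = 0
    for para in body.split("\n\n"):
        if seen >= window_words:
            return False
        n = len(para.split())
        if 40 <= n <= 80:
            return True
        seen += n
    return False
```

Run it over your own drafts before publishing; a `False` on a page you want cited is the first thing to fix.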
Methodology: how the 100 pages were selected
I built a prompt set of 50 questions per niche covering definitions ('what is X'), comparisons ('X vs Y'), how-tos, troubleshooting, and buyer-intent ('best X for Y'). Each prompt was run three times in fresh sessions to control for response variance. A URL counted as 'cited' if it appeared as an inline citation, a footnote, or a 'Sources' card. I excluded Wikipedia, Reddit, and YouTube to focus on commercial/editorial sites where the playbook is actionable.
The 14 variables I scored every page on
Word count, H2/H3 structure, presence of FAQPage schema, presence of Article schema, presence of Person schema with sameAs, dateModified within 12 months, average paragraph length, named entity density (people, products, organizations per 1,000 words), citation count to primary sources, presence of original data or research, table or list density, image alt-text quality, internal link count, and outbound link count to authoritative domains.
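To make the grading concrete, the 14 variables can be captured as one record per page. The field names below are my own paraphrases of the list above, not the study's original rubric, and no weights are implied.

```python
from dataclasses import dataclass, fields

@dataclass
class PageScore:
    """One scored page; one field per audited variable (illustrative names)."""
    word_count: int
    question_h2_count: int              # H2/H3 structure
    has_faq_schema: bool
    has_article_schema: bool
    has_person_schema_with_sameas: bool
    modified_within_12_months: bool     # dateModified recency
    avg_paragraph_words: float
    entities_per_1000_words: float      # named entity density
    primary_source_citations: int
    has_original_data: bool
    table_or_list_blocks: int
    alt_text_coverage: float            # share of images with descriptive alt text
    internal_links: int
    authoritative_outbound_links: int

def variable_names() -> list[str]:
    return [f.name for f in fields(PageScore)]
```

Keeping the rubric as a dataclass means every crawled page produces one row, and the whole dataset drops straight into a DataFrame or CSV for the correlation work.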
Schema findings: FAQPage and Article dominate
87 of 100 pages shipped FAQPage schema. 91 shipped Article or BlogPosting with a named author. 64 included Person schema with sameAs links to LinkedIn, Wikipedia, or industry profiles. Only 12 shipped no structured data at all — and those 12 were almost exclusively major-brand pages (Stripe, HubSpot, Shopify) where entity authority was already overwhelming.
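For reference, the Article + Person (with `sameAs`) combination the winners ship looks like this when generated programmatically. Every name, date, and URL here is a placeholder, not data from the audit.

```python
import json

# Illustrative Article + Person JSON-LD stack; all values are placeholders.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "Example headline",
    "dateModified": "2026-05-01",
    "author": {
        "@type": "Person",
        "name": "Jane Doe",
        "sameAs": [
            "https://www.linkedin.com/in/janedoe",
            "https://en.wikipedia.org/wiki/Jane_Doe",
        ],
    },
}

# The tag you would embed in the page <head>:
snippet = (
    '<script type="application/ld+json">'
    + json.dumps(article)
    + "</script>"
)
```

The `sameAs` array is what ties the author to profiles the model already knows about — that link to an established entity is the point of shipping Person schema at all.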
Word count: longer pages win, but not by as much as you'd think
Median word count was 2,340. The 25th percentile was 1,480 and the 75th was 3,920. Pages under 800 words made up only 4% of citations. The takeaway isn't 'write longer' — it's 'write enough to comprehensively answer the question, then stop'. A 1,500-word definitive answer beats a 4,000-word rambling one every time.
Heading structure: question-led H2s win citations
73% of cited pages used at least one H2 phrased as a question ('How does X work?', 'What is the difference between X and Y?'). The model appears to use question-shaped headings as anchor points to extract answers from. Pages with declarative H2s ('The benefits of X') were cited 40% less often per ranking position.
Entity density: cited pages name names
Cited pages averaged 14 named entities per 1,000 words — competitors, tools, methodologies, people. Non-cited pages in the same SERPs averaged 4. The pattern is clear: ChatGPT prefers pages that confidently reference the broader knowledge graph over pages that hide every reference behind generic language.
Freshness: dateModified within 12 months is table stakes
82% of cited pages had a dateModified within the last 12 months. 41% within the last 90 days. Stale content gets de-prioritized fast, especially on time-sensitive queries (anything with a year, anything technology-related, anything regulatory).
Original data and primary research are citation magnets
31 of 100 pages contained original research, proprietary statistics, or named frameworks. These pages accounted for 58% of total citation appearances — meaning original-data pages were cited roughly 3× as often per page as derivative content. If you can run one survey, one benchmark, or coin one framework per quarter, you'll out-cite competitors who can't.
What didn't matter as much as I expected
Domain Rating, total backlink count, and overall site traffic correlated weakly. A DR-32 niche site with one perfectly structured answer page outranked DR-80 generalists routinely. The model rewards the page, not just the domain. That's good news for smaller sites willing to do the structural work.
The composite profile of a typical cited page
2,000–3,000 words. 6–10 question-led H2s. A 50-word quotable answer block under each H2. FAQPage + Article + Person schema. dateModified within 90 days. 10+ named entities per 1,000 words. At least one original chart, table, or proprietary statistic. Internal links to 3–5 related pillar pages. Outbound links to 2–3 primary sources. Author byline with credentials and sameAs profiles.
How to apply this to your own pages this week
Pick your three highest-traffic pages. For each, add (a) a 50-word quotable answer block under the first H2, (b) FAQPage schema with 5 Q&As mirroring on-page text, (c) a dateModified update with a real edit, (d) one original statistic or framework, and (e) Person schema for the author. Then re-run your buyer-intent prompts in ChatGPT in 30 days and compare.
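Step (b) — FAQPage schema mirroring visible text — is the one most often botched, so here is a minimal generator. It builds the JSON-LD from the exact Q&A pairs rendered on the page, which guarantees schema and copy never drift apart; the questions and answers are placeholders.

```python
import json

def faq_jsonld(qa_pairs: list[tuple[str, str]]) -> str:
    """Build FAQPage JSON-LD from the same Q&A pairs rendered on the page,
    so the structured data always mirrors the visible text."""
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    return json.dumps(doc, indent=2)
```

Drive both the on-page FAQ rendering and this function from one source of truth (a CMS field, a YAML file), and the mirroring requirement takes care of itself.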
Where to go next
This data study is the foundation for my CITE framework — Clarify, Index, Trust, Echo — which turns these patterns into a repeatable workflow. For the strategic overview, see the AEO pillar. For the schema details, see the schema markup deep-dive. For tracking, see the AEO measurement guide.
Frequently asked
Is the dataset representative across niches and languages?
Partially yes. 12 niches were covered, but 64 of 100 pages were B2B/SaaS-oriented and all were English. Localized and non-English citation patterns may differ — I'm running a follow-up study on Spanish, German, and Bengali queries in Q3 2026.
Did the patterns hold across different models?
Yes. Each prompt was run on GPT-4o, GPT-5-preview, and Perplexity's Sonar in parallel. The patterns held across all three with one exception: Perplexity weighted recency (dateModified) noticeably more aggressively than GPT-4o.
Will you release the raw data?
I'm releasing an anonymized CSV with every URL, schema profile, and citation count to newsletter subscribers in June 2026. Sign up on the homepage to get it.
Should every page get FAQPage schema?
Only if the page genuinely has Q&A content that mirrors the schema. Schema that doesn't match visible text gets ignored at best and penalized at worst. Aim for 5–8 real Q&As per pillar page, none on thin pages.
Does ranking well in Google guarantee ChatGPT citations?
There's overlap — well-structured pages tend to rank too — but the correlation isn't 1:1. Some #1 Google rankings never get cited by ChatGPT, and some page-3 results get cited heavily. The funnels are now distinct and need separate optimization.
What's the quickest win from this study?
Adding a 40–80 word quotable answer block under your first H2. It's the lowest-effort, highest-impact change in the dataset. Every page should have one.
Related services, guides & deep-dives
Want to be cited by ChatGPT, Perplexity & Gemini?
I run a dedicated AEO & GEO program for brands serious about AI search visibility — entity SEO, schema, and citation-worthy content, shipped end-to-end.
See the AEO & GEO service

Continue reading the AEO cluster
Start with the pillar: What is AEO? How to Get Cited by ChatGPT in 2026. Then keep going below.
- What is AEO? How to Get Cited by ChatGPT in 2026
- Schema Markup for AEO
- llms.txt Explained
- Entity SEO Signals for AEO
- How to Measure AEO Performance
- 7 Common AEO Mistakes to Avoid
- Google AI Overviews Citation Report 2026: Which Domains Win Which Niches
- ChatGPT vs Perplexity vs Gemini vs Google AI Overviews: Where Should You Optimize First?
- AEO vs GEO vs SEO vs LLMO: The 2026 Acronym Map (With Examples)
- The CITE Framework: My 4-Step System for Getting Brands Quoted by ChatGPT
- Entity Stacking: The Off-Page AEO Playbook Nobody Talks About
- From Zero to Wikipedia: How We Built an Entity Footprint for a B2B Brand in 6 Months
- ChatGPT Citation Drift: I Re-Ran 200 Prompts Weekly for 90 Days. Here's How Much Citations Move.
- AI Overviews YMYL Audit: Who's Cited in Health, Finance & Legal in 2026
- AEO for SaaS: The Complete Playbook for Getting Cited by ChatGPT, Perplexity & Gemini
- AEO for Ecommerce: The Product Schema Playbook for AI Shopping Citations
- AEO for Law Firms: The YMYL Trust Playbook for Earning AI Citations
- The Reddit AEO Playbook: Getting Cited from Threads (Without Astroturfing)
- YouTube AEO: Turning Transcripts into ChatGPT & Perplexity Citations in 2026
- The Podcast SEO Citation Playbook: Show Notes, Transcripts & Schema That Earn AI Citations
- In-House vs Agency vs Fractional AEO: Which Hiring Model Actually Works in 2026
