Freelancer Tamal
AEO · 18 min · May 15, 2026

ChatGPT Citation Drift: I Re-Ran 200 Prompts Weekly for 90 Days. Here's How Much Citations Move.

Citations from ChatGPT aren't stable — they drift run-to-run, day-to-day, and across model versions. I re-ran 200 buyer-intent prompts every week for 90 days. Here's the actual variance, and what it means for your AEO program.

Freelancer Tamal, SEO expert
SEO Expert · Rangpur, Bangladesh · 6+ years experience

Most AEO reporting tools take a snapshot — they ran your prompt set once last Tuesday, and that's the number on the dashboard. The dirty secret of AI citation tracking is that the same prompt, run twice in a row, often returns different cited sources. To quantify this, I ran the same 200 buyer-intent prompts in ChatGPT every week for 13 weeks. Same prompts. Same accounts. Same time of day. The variance was bigger than I expected.

Methodology / preview note: this is the first half of an ongoing study; the full anonymised dataset and per-prompt CSV is being released to newsletter subscribers in Q3 2026.

How much does ChatGPT citation share actually drift week-to-week?

Quick answer

Across 200 prompts re-run weekly for 13 weeks, mean week-over-week citation overlap was 71%. That means roughly 3 in every 10 cited URLs changed each week, even when the underlying pages didn't. Variance was highest on commercial comparison prompts (overlap as low as 48%) and lowest on definitional prompts (overlap up to 94%).
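The article doesn't spell out the overlap formula, so here's a minimal sketch of one reasonable reading: overlap is the share of one week's cited URLs that reappear the following week. The function name and sample URLs are hypothetical, not from the study.

```python
def citation_overlap(prev_week, next_week):
    """Fraction of prev_week's cited URLs that are still cited in next_week."""
    prev, nxt = set(prev_week), set(next_week)
    if not prev:
        return 0.0
    return len(prev & nxt) / len(prev)

# 2 of 3 week-1 URLs survive into week 2, so overlap is about 0.67
week1 = ["a.com/guide", "b.io/compare", "c.dev/faq"]
week2 = ["a.com/guide", "c.dev/faq", "d.io/new-post"]
print(round(citation_overlap(week1, week2), 2))  # 0.67
```

A 71% mean on this metric means that, on average, roughly 29% of a week's cited URLs are gone by the next run.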

Methodology

Prompt set: 200 questions across 10 niches (SaaS, fintech, legal, health, ecommerce, dev tools, marketing, real estate, education, local services). Each prompt was run three times in fresh sessions every Wednesday between 14:00 and 16:00 UTC. I logged every citation: URL, position, and snippet. A 'cited URL' counts if it appears in the inline citations or the Sources card. Models tested: GPT-4o, GPT-5-preview, and Sonar-Large via Perplexity (control group).
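If you want to replicate the logging step, a minimal sketch looks like this. The file layout and field names are my assumptions for illustration, not the study's actual schema.

```python
import csv
import os

FIELDS = ["run_date", "prompt_id", "model", "run_index",
          "cited_url", "position", "snippet"]

def log_citation(path, row):
    """Append one citation record to a CSV log, writing the header only once."""
    is_new = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow(row)

# One record per cited URL, per run, per prompt (example values)
log_citation("citations.csv", {
    "run_date": "2026-05-13", "prompt_id": "saas-007", "model": "gpt-4o",
    "run_index": 1, "cited_url": "https://example.com/guide",
    "position": 2, "snippet": "example snippet text",
})
```

One row per cited URL per run keeps the raw data replayable, so you can recompute overlap with a different definition later without re-running prompts.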

Three patterns that drive most of the drift

(1) Freshness rotation. New high-quality content published in the last 14 days enters the citation pool aggressively, often pushing month-old citations out. (2) Model-version updates. Across the 13 weeks, OpenAI shipped two silent model refreshes; both caused a measurable 24-hour drop in mid-tier brand citations and a corresponding lift for institutional sources. (3) Session randomness. Even with identical prompts, ChatGPT's retrieval samples slightly different document slices — about 8% of citation variance is pure noise.

Which kinds of pages are most stable across runs?

Pages that ranked in week 1 and were still cited in week 13 (the 'sticky' set) shared four traits: (a) FAQPage schema mirroring on-page Q&A; (b) at least one named statistic or original framework; (c) Person schema for the author with sameAs links; (d) at least 5 inbound citations from sources LLMs trust (Reddit, Wikipedia, major industry blogs).

Which pages drop fastest?

Pages cited in week 1 but absent by week 4 were almost always thin comparison content with no proprietary data, no schema, and no entity authority. They got pulled in once on a freshness signal and replaced as soon as a stronger source published a similar piece.

Implications for your AEO program

Single-snapshot citation reports are misleading. Aim for 4-week rolling averages, not point-in-time numbers. Optimize for the 'sticky' profile above so your pages survive freshness rotations. And don't panic when a citation disappears for a week — re-run the prompt 7 days later before changing strategy.
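The 4-week rolling average can be computed in a few lines. This is a sketch, assuming you already have a weekly citation-share series per prompt; early weeks average over whatever history exists so far.

```python
def rolling_average(weekly_values, window=4):
    """Trailing mean over the last `window` weeks for each point in the series."""
    out = []
    for i in range(len(weekly_values)):
        chunk = weekly_values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Noisy point-in-time citation share for one URL across six weekly runs...
weekly_share = [0.30, 0.10, 0.25, 0.35, 0.05, 0.40]
# ...smoothed into the rolling view you should actually report on.
print([round(v, 3) for v in rolling_average(weekly_share)])
```

Note how the raw series swings from 0.05 to 0.40 in one week while the smoothed series barely moves; that's the snapshot-vs-trend gap in miniature.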

What to do this month

Pick your top 10 priority prompts. Run them three times each, weekly, for the next 4 weeks. Build a simple spreadsheet of citation overlap. You'll discover which of your pages are stable citations versus drifting ones — and that's the only honest way to measure AEO right now.
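The spreadsheet step can also be done in code. A sketch, assuming you've collected one set of cited URLs per prompt per week (prompt IDs and URLs below are hypothetical):

```python
def weekly_overlap_table(history):
    """history: {prompt_id: [set of cited URLs per week, oldest first]}.
    Returns, per prompt, the share of each week's URLs still cited the next week."""
    table = {}
    for prompt_id, weeks in history.items():
        table[prompt_id] = [
            len(weeks[i] & weeks[i + 1]) / len(weeks[i]) if weeks[i] else 0.0
            for i in range(len(weeks) - 1)
        ]
    return table

history = {
    "best-crm-smb": [
        {"a.com", "b.io", "c.dev"},
        {"a.com", "c.dev", "d.org"},
        {"a.com", "d.org", "e.net"},
    ],
}
print(weekly_overlap_table(history))
```

Prompts whose overlap series hovers near 1.0 are your stable ground; prompts bouncing around 0.5 are where freshness rotation is churning the citation pool.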

Frequently asked

Is the drift bigger on Perplexity than ChatGPT?

No — slightly smaller. Perplexity's mean week-over-week overlap was 76% versus ChatGPT's 71%, mostly because Perplexity weights freshness more predictably.

Did time-of-day matter?

Marginally. Prompts run at 14:00 UTC vs 02:00 UTC had ~4% citation overlap difference — likely cache and load-balancing related, not a strategic lever.

Should I just track more prompts to average out the noise?

Yes. Below 50 prompts, weekly noise drowns out signal. Above 200, the rolling average is reliable enough to make decisions on.

How do I separate real drops from noise?

Re-run the prompt three times in fresh sessions on each of two days, a week apart. If your URL is missing on all six runs, it's a real drop, not noise.
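The six-run rule is trivial to encode. A sketch, where `runs` is the list of cited-URL sets from your re-runs (function name and example URLs are mine):

```python
def is_real_drop(url, runs):
    """True only if `url` is missing from every re-run
    (e.g. 3 fresh sessions on each of two days, a week apart)."""
    return all(url not in run for run in runs)

runs = [
    {"a.com", "b.io"}, {"a.com"}, {"b.io", "c.dev"},   # day 1, 3 sessions
    {"a.com", "c.dev"}, {"c.dev"}, {"a.com", "b.io"},  # day 2, a week later
]
print(is_real_drop("yoursite.com/page", runs))  # absent everywhere -> True
print(is_real_drop("a.com", runs))              # shows up in some runs -> False
```

The point of the all-or-nothing test: with ~8% of variance being pure session noise, appearing in even one of six runs means the page is still in the retrieval pool.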

Will the drift get worse as more brands ship AEO?

Likely yes. As more pages compete for the same citation slots, week-to-week churn will increase — making structural moats (schema + entity + original data) more valuable, not less.

Where can I get the full dataset?

I'm releasing the anonymised per-prompt CSV to newsletter subscribers in Q3 2026. Sign up on the homepage to get notified.

Done reading? Put it to work.

Want to be cited by ChatGPT, Perplexity & Gemini?

I run a dedicated AEO & GEO program for brands serious about AI search visibility — entity SEO, schema, and citation-worthy content, shipped end-to-end.

See the AEO & GEO service
Free audit · Book a call