AI search · April 22, 2026 · 11 min read

Product feeds are the new SEO: structuring catalogs for ChatGPT, Perplexity, and AI Overviews

When an LLM cites your product in a buying guide, it's not pulling from your blog post. It's pulling from structured data: the same kind of structured data you've been ignoring for ten years. Here's what that data needs to look like in 2026.

[Figure: What an AI buying-guide answer looks like. The citations are pulling from product feed data, not blog SEO.]

For the last fifteen years, “search visibility” for a product meant one thing: rank a blog post on Google. Write a “best running shoes 2024” listicle, get backlinks, hope your shoes get name-dropped, capture the click. Content marketing built entire careers on this loop.

That loop is collapsing. Not because Google stopped working (it works fine), but because a meaningful chunk of buying-intent traffic is moving to surfaces where the listicle was never the source of truth in the first place. AI Overviews on Google. ChatGPT shopping. Perplexity. When these surfaces recommend a product, they’re not citing a Reddit thread or a content site. They’re pulling from structured product data, what we used to call “the feed.”

If you’ve spent a decade thinking of your Merchant Center feed as plumbing for paid ads, this is the moment to reclassify it. It’s now your most-read piece of marketing copy, and the audience reading it is a model.

What LLMs actually consume

Specifics matter here, because a lot of what gets written about “AI SEO” is hand-wavy. Let’s be concrete about what each of the major AI shopping surfaces uses as input.

The common thread is structured product data, specifically four classes of structured data that LLMs are trained to read:

  1. Merchant Center / Commerce Catalog feeds — your XML or API feed.
  2. Schema.org Product markup — JSON-LD on your product detail pages.
  3. Open Graph & product OG tags — what shows up when a URL gets shared.
  4. Reviews schema — aggregated rating and review text in structured form.

None of this is new. Schema.org has been around since 2011. What changed is that the cost of leaving it incomplete is no longer “your rich snippet doesn’t show stars on Google.” It’s “you don’t get cited when an LLM answers a buying question.”

Why the feed beats the blog

Here’s the part that’s hard to internalize for content-marketing teams: when an LLM is asked “what’s the best merino base layer under $200,” it does not read your “Best Merino Base Layers 2026” listicle and copy your recommendations. It reads structured product data (title, description, price, attributes, reviews) and constructs an answer from facts.

Listicles still matter as a training-data signal: they help models learn that your brand is associated with the category. But the actual answer-time citation is generated from fresh, structured product data. Fresh, because the model’s retrieval layer is calling out to a live index. Structured, because that’s what the retrieval layer can parse cleanly.

Say your blog ranks #1 for “best merino base layer,” but your product feed’s description field reads, in full: “100% merino. Machine wash cold.” You’ve won the old game and lost the new one.

What “good” structured data looks like

Let’s compare two product description fields. Same SKU, same brand, two different feeds:

What most stores ship
"100% merino wool. Machine washable. Available in multiple colors and sizes."
What an LLM can actually use
"Lightweight 200gsm merino wool base layer designed for backcountry skiing, ski touring, and cold-weather trail running. Flatlock seams reduce chafing under pack straps. Fits true to size; size up for layering. Machine wash cold, tumble dry low. Made in Vietnam from RWS-certified merino. Fall 2025 collection."

The second one isn’t longer for the sake of it. It’s denser with the attributes a model needs to qualify the product against a query: weight (200gsm), use case (backcountry skiing, ski touring), fit and construction detail (flatlock seams, true to size with sizing guidance), material certification (RWS), and seasonality (Fall 2025). When someone asks “what’s a good 200gsm merino top for ski touring,” that product is now eligible to be cited. The first version is not.
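To make “eligible” concrete, here is a deliberately crude sketch. It assumes eligibility means nothing more than “every attribute the query asks for appears in the product text,” using a small hypothetical regex vocabulary (ATTRIBUTE_PATTERNS). No AI shopping surface publishes its actual matching logic, and real retrieval is far fuzzier than pattern matching; the point is only to show why the thin description can’t qualify.

```python
import re

# Hypothetical attribute vocabulary. Real retrieval layers use much richer
# parsing; this is a pattern-matching toy for illustration only.
ATTRIBUTE_PATTERNS = {
    "weight_200gsm": r"\b200\s?gsm\b",
    "merino": r"\bmerino\b",
    "ski_touring": r"\bski touring\b",
    "rws_certified": r"\bRWS\b",
}


def extract_attributes(text: str) -> set[str]:
    """Return the attribute keys whose pattern appears in the text."""
    return {
        key
        for key, pattern in ATTRIBUTE_PATTERNS.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    }


def is_eligible(query: str, description: str) -> bool:
    """Toy eligibility: every attribute the query asks for must be present."""
    return extract_attributes(query) <= extract_attributes(description)


thin = "100% merino wool. Machine washable. Available in multiple colors and sizes."
dense = (
    "Lightweight 200gsm merino wool base layer designed for backcountry skiing, "
    "ski touring, and cold-weather trail running. Made in Vietnam from "
    "RWS-certified merino."
)
query = "what's a good 200gsm merino top for ski touring"

print(is_eligible(query, thin))   # False
print(is_eligible(query, dense))  # True
```

The thin description only ever matches “merino,” so it drops out as soon as the query names a weight or a use case.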

If you’ve read our PMAX attribute checklist, the good news is that most of the same fields apply, since AI search and PMAX both consume the same feed. But the priorities are slightly different. Here’s what to focus on for AI visibility specifically:

1. Description: density over length

Aim for 200–500 words per product. Include use cases, material composition, fit notes, care instructions, certifications, dimensions where relevant, and seasonality. Don’t pad with marketing copy. Write for a model that’s parsing facts, not a shopper feeling a vibe.
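One way to audit this across a catalog is a small linter that checks the word-count target and looks for each content class. This is a hypothetical sketch: the keyword hints in CONTENT_HINTS are illustrative stand-ins you would tune to your own catalog, and plain substring matching will produce false positives (“fall” inside “fallback”), so treat the output as a to-do list rather than a score.

```python
import re
from dataclasses import dataclass, field

# Hypothetical keyword hints per content class. Illustrative, not a standard.
CONTENT_HINTS = {
    "use_case": ["designed for", "ideal for", "built for"],
    "material": ["merino", "cotton", "polyester", "nylon", "wool"],
    "fit": ["fits", "true to size", "size up", "relaxed fit", "slim fit"],
    "care": ["machine wash", "hand wash", "tumble dry", "dry clean"],
    "certification": ["rws", "bluesign", "oeko-tex", "gots"],
    "seasonality": ["spring", "summer", "fall", "winter", "collection"],
}


@dataclass
class DescriptionReport:
    word_count: int
    length_ok: bool
    missing: list[str] = field(default_factory=list)


def lint_description(text: str, min_words: int = 200, max_words: int = 500) -> DescriptionReport:
    """Check a description against the word target and the content classes."""
    word_count = len(re.findall(r"\S+", text))
    lowered = text.lower()
    # Plain substring matching: cheap, but expect some false positives.
    missing = [
        name
        for name, hints in CONTENT_HINTS.items()
        if not any(hint in lowered for hint in hints)
    ]
    return DescriptionReport(
        word_count=word_count,
        length_ok=min_words <= word_count <= max_words,
        missing=missing,
    )


report = lint_description(
    "100% merino wool. Machine washable. Available in multiple colors and sizes."
)
print(report.word_count, report.length_ok)  # 11 False
print(report.missing)  # ['use_case', 'fit', 'certification', 'seasonality']
```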

2. Title: brand + category + key attribute

“Smartwool Classic Thermal Merino 250 Crew — Men’s, Charcoal Heather” gets parsed cleanly. “The 250 Crew” does not. AI shopping surfaces use the title as their primary disambiguation signal between similar SKUs.
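If your titles are assembled from structured fields, the brand + category + key attribute pattern is easy to enforce in code. A minimal sketch with a hypothetical build_title helper; the 150-character cap is Google Merchant Center’s title limit, and the hyphen separator is just a choice.

```python
MAX_TITLE_CHARS = 150  # Google Merchant Center's maximum title length


def build_title(brand: str, product_name: str, key_attributes: list[str]) -> str:
    """Compose 'Brand + product/category + key attributes' for a feed title."""
    name = product_name.strip()
    # Prepend the brand unless the product name already starts with it.
    if not name.lower().startswith(brand.lower()):
        name = f"{brand} {name}"
    attrs = ", ".join(a.strip() for a in key_attributes if a.strip())
    title = f"{name} - {attrs}" if attrs else name
    if len(title) > MAX_TITLE_CHARS:
        raise ValueError(f"title is {len(title)} chars; limit is {MAX_TITLE_CHARS}")
    return title


print(build_title("Smartwool", "Classic Thermal Merino 250 Crew", ["Men's", "Charcoal Heather"]))
# Smartwool Classic Thermal Merino 250 Crew - Men's, Charcoal Heather
```

Raising on overlong titles, rather than silently truncating, keeps a human in the loop for the titles that need rewording.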

3. Schema.org Product on every PDP

Shopify ships basic Product schema by default. It’s not enough. You want full coverage: brand, sku, gtin, aggregateRating, review, offers (with price, priceCurrency, and availability), material, color, size, and audience. Validate it with Google’s Rich Results Test, then validate again against Schema.org’s own validator; they catch different things.
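A minimal sketch of generating fuller Product markup from your own product record. It assumes a flat dict with hypothetical keys (name, brand, gtin, in_stock, and so on) and placeholder values for an invented brand; the property names it emits (brand, sku, gtin, material, color, size, audience, and offers with price, priceCurrency, availability) are standard schema.org. Reviews and aggregateRating are left out because they usually come from a separate system.

```python
import json


def product_jsonld(p: dict) -> str:
    """Render a schema.org Product node as JSON-LD for a PDP script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": p["name"],
        "description": p["description"],
        "brand": {"@type": "Brand", "name": p["brand"]},
        "sku": p["sku"],
        "gtin": p["gtin"],
        "material": p["material"],
        "color": p["color"],
        "size": p["size"],
        "audience": {"@type": "PeopleAudience", "suggestedGender": p["gender"]},
        "offers": {
            "@type": "Offer",
            "url": p["url"],
            "price": p["price"],
            "priceCurrency": p["currency"],
            # schema.org availability values are full URLs.
            "availability": "https://schema.org/InStock"
            if p["in_stock"]
            else "https://schema.org/OutOfStock",
        },
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Hypothetical product record; every value is a placeholder.
example = {
    "name": "Northline 200 Merino Base Layer Crew - Men's, Charcoal",
    "description": "Lightweight 200gsm merino wool base layer designed for ski touring.",
    "brand": "Northline",
    "sku": "NL-200-CRW-M-CHR",
    "gtin": "0123456789012",
    "material": "Merino wool",
    "color": "Charcoal",
    "size": "M",
    "gender": "male",
    "url": "https://example.com/products/nl-200-crew",
    "price": "98.00",
    "currency": "USD",
    "in_stock": True,
}
print(product_jsonld(example))
```

The output goes inside a script tag of type application/ld+json on the PDP; run it through both validators before shipping.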

4. Reviews in structured form

If reviews live only inside a JS widget and never make it into the page’s JSON-LD, the model doesn’t see them. Most review apps support schema injection; turn it on. Include the review body text, not just the star rating, because LLMs are reading the actual sentiment.
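If your review app exports raw reviews but won’t inject them, you can build the structured form yourself. A sketch, assuming a hypothetical ReviewRecord export shape and invented sample reviews; it emits schema.org AggregateRating and Review objects (with reviewBody) that you would merge into the Product node.

```python
import json
from dataclasses import dataclass


@dataclass
class ReviewRecord:
    """Hypothetical shape of one review exported from a review app."""
    author: str
    rating: int  # 1 to 5 stars
    body: str


def review_properties(reviews: list[ReviewRecord]) -> dict:
    """Build aggregateRating and review properties to merge into a Product node."""
    if not reviews:
        return {}  # Omit both properties rather than publish an empty rating.
    average = sum(r.rating for r in reviews) / len(reviews)
    return {
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": round(average, 1),
            "reviewCount": len(reviews),
            "bestRating": 5,
            "worstRating": 1,
        },
        "review": [
            {
                "@type": "Review",
                "author": {"@type": "Person", "name": r.author},
                "reviewRating": {"@type": "Rating", "ratingValue": r.rating, "bestRating": 5},
                # The full text carries the sentiment; star-only markup drops it.
                "reviewBody": r.body,
            }
            for r in reviews
        ],
    }


# Hypothetical reviews for illustration.
sample = [
    ReviewRecord("Dana", 5, "Warm on the skin track, no chafing under pack straps."),
    ReviewRecord("Lee", 4, "True to size. Pills a little after a season of use."),
]
print(json.dumps(review_properties(sample), indent=2))
```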

5. Identifiers, identifiers, identifiers

gtin, mpn, brand. These are how a model knows that the product on your site is the same product on three other retailers’ sites, which is how it builds confidence to cite. Without them, you’re a stranger at the party.
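Identifiers only corroborate anything if they’re correct, and a mistyped GTIN fails the standard GS1 check-digit test. A small sketch of that check (mod 10 with alternating 3/1 weights), plus a hypothetical identifier_gaps helper for feed items shaped as plain dicts. The GTINs below are made-up test values, not real products.

```python
def is_valid_gtin(gtin: str) -> bool:
    """Validate a GTIN-8, -12, -13, or -14 with the GS1 mod-10 check digit."""
    if not (gtin.isascii() and gtin.isdigit()) or len(gtin) not in (8, 12, 13, 14):
        return False
    digits = [int(d) for d in gtin]
    body, check = digits[:-1], digits[-1]
    # Weights alternate 3, 1, 3, ... moving left from the digit beside the check digit.
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


def identifier_gaps(item: dict) -> list[str]:
    """List which of gtin, mpn, brand are missing (or invalid, for gtin)."""
    gaps = [key for key in ("gtin", "mpn", "brand") if not item.get(key)]
    if item.get("gtin") and not is_valid_gtin(item["gtin"]):
        gaps.append("gtin (invalid)")
    return gaps


print(is_valid_gtin("0123456789012"))  # True
print(is_valid_gtin("0123456789013"))  # False
print(identifier_gaps({"gtin": "0123456789013", "brand": "Northline"}))
# ['mpn', 'gtin (invalid)']
```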

6. Freshness signals

Models trust feeds that update. Set an explicit availability value and refresh it within a few hours of inventory changes. Populate availability_date for preorders. If your feed ships once a day with stale stock data, you’ll get filtered out of “in stock now” queries, which is most of them.
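A sketch of a freshness audit, assuming you track when each item’s inventory last changed and when the feed last went out. The availability values are Merchant Center’s; the three-hour MAX_LAG is a hypothetical stand-in for “within a few hours,” and the item shape is invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Merchant Center availability values.
VALID_AVAILABILITY = {"in_stock", "out_of_stock", "preorder", "backorder"}
# Hypothetical target for "within a few hours" of an inventory change.
MAX_LAG = timedelta(hours=3)


def freshness_issues(item: dict, last_feed_push: datetime, now: datetime) -> list[str]:
    """Flag a feed item whose availability data is invalid, incomplete, or stale."""
    issues = []
    availability = item.get("availability")
    if availability not in VALID_AVAILABILITY:
        issues.append(f"invalid availability: {availability!r}")
    if availability == "preorder" and not item.get("availability_date"):
        issues.append("preorder without availability_date")
    changed = item["inventory_changed_at"]
    # Stale: stock changed after the last push and has waited longer than MAX_LAG.
    if changed > last_feed_push and now - changed > MAX_LAG:
        issues.append(f"inventory change unpublished for {now - changed}")
    return issues


now = datetime(2026, 4, 22, 18, 0, tzinfo=timezone.utc)
last_push = datetime(2026, 4, 22, 6, 0, tzinfo=timezone.utc)  # once-daily feed
item = {
    "availability": "preorder",
    "inventory_changed_at": datetime(2026, 4, 22, 9, 30, tzinfo=timezone.utc),
}
for issue in freshness_issues(item, last_push, now):
    print(issue)
# preorder without availability_date
# inventory change unpublished for 8:30:00
```

With a once-daily push, the 9:30 stock change in the example sits unpublished for eight and a half hours.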

A note on hallucination defense

The denser and more accurate your structured data, the less likely an LLM is to fabricate facts about your product. Brands with thin feeds get hallucinated material claims, wrong sizes, and invented features. Brands with rich feeds tend to get cited accurately, often close to verbatim. You can think of feed quality as a hedge against LLM hallucination.

What you can’t do

You can’t optimize for “ChatGPT rank.” There’s no rank. There’s eligibility, and there’s confidence. Eligibility means your product matches the query’s attributes; confidence means the model has enough corroborating data to cite you without hedging. Both are functions of how complete and consistent your structured data is across surfaces.
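There’s no public confidence score to check, but you can audit the consistency half yourself. A minimal sketch, assuming you’ve already exported the same product’s fields from each surface (feed, JSON-LD, OG tags) and normalized them into a common shape; the surface names, field keys, and values are hypothetical. Any field whose values disagree, or that one surface omits, is a place where corroboration breaks down.

```python
from typing import Optional


def surface_mismatches(
    surfaces: dict[str, dict[str, str]], fields: list[str]
) -> dict[str, dict[str, Optional[str]]]:
    """Return, per field, each surface's value when the surfaces disagree."""
    mismatches: dict[str, dict[str, Optional[str]]] = {}
    for field_name in fields:
        values = {name: data.get(field_name) for name, data in surfaces.items()}
        present = [v for v in values.values() if v is not None]
        # Compare case- and whitespace-insensitively so trivial formatting is ignored.
        distinct = {v.strip().lower() for v in present}
        if len(distinct) > 1 or len(present) < len(values):
            mismatches[field_name] = values
    return mismatches


# Hypothetical, already-normalized exports of one product from three surfaces.
surfaces = {
    "merchant_feed": {"price": "98.00 USD", "availability": "in_stock", "brand": "Northline"},
    "json_ld": {"price": "98.00 USD", "availability": "in_stock", "brand": "Northline"},
    "og_tags": {"price": "89.00 USD", "availability": "in_stock"},
}
result = surface_mismatches(surfaces, ["price", "availability", "brand"])
for field_name, values in result.items():
    print(field_name, values)
```

Here the OG tags carry a stale price and no brand, so both fields are flagged while availability passes.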

You also can’t game it the way you could game blog SEO. There’s no keyword stuffing, no link-building equivalent, no PBN (private blog network) strategy. The only lever is making your product data factually rich and machine-readable. Which is, frankly, the lever marketing has been ignoring for a decade because it didn’t pay off in the old game.

Where this goes

It’s hard to predict the share of buying-intent traffic that ends up on AI surfaces five years out. Our working assumption is “a lot, and concentrated in the high-consideration mid-funnel”: the “what should I buy” stage where people used to read listicles. Transactional traffic (“buy {brand} {SKU}”) will keep going to Google. Discovery traffic is up for grabs.

The retailers who are going to win that grab are not the ones with the best content team. They’re the ones whose product feeds are richer, fresher, and more structurally consistent than their competitors’. Which is, again, a feed problem.