
SEO is being catapulted into AI-driven discovery at a breakneck pace.
Bear in mind, this is an evolution, not an outright replacement.
While the SEO funnel may have changed, there’s still a decision-making process involved in AI search that marketers can leverage in their favor.
The challenge is that the online discovery process is increasingly becoming multimodal and driven by autonomous AI agents.
What’s multimodal?
At first glance, it may seem like multimodal search simply means using images and audio (instead of just text) to look things up online.
While this is true, multimodal search goes deeper than that.
In a nutshell, multimodal search refers to all the ways that users discover content online, and that stretches far beyond what you house on your website.
That means multimodal search is also multi-channel, spanning:
- Social media (TikTok explainer videos, YouTube shorts)
- Generative AI platforms (ChatGPT, Perplexity, Claude)
- Traditional search engines (10 blue links)
- Community discussion on Reddit, Quora, and niche forums
- Amazon and other e-commerce listings
At the same time, LLM-powered platforms like ChatGPT have autonomous agents working behind the scenes.
The future of SEO encompasses all these factors.
In this guide, we’ll break down all the ways SEO is evolving, and what you should do about it right now.
What are AI Agents? How are They Changing Online Discovery?

First, let’s clearly define what AI agents are to remove any confusion.
An AI agent is a system capable of observation, decision-making, and taking action without requiring human intervention to guide every step.
In other words, true AI agents have a level of autonomy where they’re able to perform certain tasks without a human prompting them to.
By this definition, LLMs that power platforms like ChatGPT are not AI agents.
Why is that?
It’s because they’re reactive by nature. LLMs respond to human prompts and aren’t capable of striking up a conversation by themselves.
At the same time, ChatGPT and other platforms like Perplexity do make use of AI agents behind the scenes.
For example, Perplexity has a crawler agent that autonomously crawls and indexes the internet.
In the same vein, ChatGPT has search agents that work behind the scenes to:
- Choose sources
- Score authority
- Decide who gets cited
- Crawl internet sources
The LLM does the responding, but its AI agents handle the invisible work.
From Blue Links to AI Agents: The Evolution of Search
Before generative AI platforms like ChatGPT, most users relied on Google’s organic search results.
Whenever they wanted to research a purchase or learn something new, they’d ‘Google it.’ From there, they’d click on one of the top 10 ‘blue links’ to satisfy their search intent.

This drove traffic to a wide variety of websites, and it was the bread and butter of SEO up until very recently.
However, generative AI systems and agents are changing that fact more and more each day.
Why is that?
It’s mainly because AI platforms offer what we crave: convenience.
Instead of reading a few blogs or product round-up articles, it’s far easier to ask an AI chatbot which product or service it recommends.

The same is true for learning new definitions or concepts.
If users don’t turn to a tool like ChatGPT, Google’s AI Overviews will likely step in with a CliffsNotes version of what they’re looking for (which is sufficient for most).
Consider that nearly 60% of searches ended in zero clicks in 2024, or that the presence of an AI Overview can reduce clicks by over 30%.
Moreover, 60% of people already use AI for online shopping, and they trust its recommendations.
Because of this trust, lots of users are outsourcing shopping research to AI tools.
As a result, the traditional SEO sales funnel has morphed into a nonlinear loop.
Users can skip stages, like immediately deciding to purchase something because of an AI’s recommendation.
Also, since AI chatbots remember users’ past conversations, they can recommend related products or services, looping them into a brand-new shopping experience.
What is Multimodal Search? How is it Changing SEO?
Next, let’s take a closer look at multimodal search and all the ways it has been altering the search landscape.
Generative AI platforms like ChatGPT, Google’s AI Mode, and Perplexity now enable multimodal search features like:
- Image search
- Video search
- Audio search
An example would be uploading a photo of a book and asking Google for similar recommendations. Thanks to computer vision, the platform can ‘see’ the image, so there’s no need to type out the name of the book.

Yet, as mentioned in the intro, the concept of multimodal search stretches beyond image, text, audio, and video.
It also includes cross-platform and cross-format experiences.
Here’s how that’s possible.
Modern multimodal, AI-powered search platforms are able to fuse different streams of information (social media content, images, forum discussion, etc.) into a shared semantic space.
For instance, Google’s AI Mode can understand the similarities between:
- A picture of the Eiffel Tower.
- A voice memo talking about ‘the big tower in Paris.’
- A video of someone walking by the Eiffel Tower.
- A text prompt asking, “What’s the most famous landmark in Paris?”
While they’re a mix of different formats and terminology, a multimodal system like Google’s AI Mode will recognize that they all refer to the Eiffel Tower.
Entity recognition and vector embeddings are what make multimodal search possible, as they’re the glue that holds the entire ‘semantic space’ together.
How entity recognition enables multimodal search
Named entity recognition (NER) is the process that enables AI systems to understand what something is, regardless of its format.
Basically, it works by identifying entities (people, places, things, concepts, etc.) in text, images, and audio.
After the entity is identified, the system performs entity disambiguation to separate identical terms, like distinguishing apple the fruit from Apple the company.
Once the AI system is confident it has the correct entity, it gets linked to a corresponding entry in a knowledge graph or database.
This connection lets the AI pull additional context about the entity, such as facts, attributes, relationships, and the latest updates.
In the example mentioned above, the Eiffel Tower was the entity in question.
Thanks to NER, AI systems can correctly identify and present facts about all sorts of things, including your brand.
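To make those three steps (identify, disambiguate, link) a bit more concrete, here’s a minimal Python sketch. It’s a toy, not how any real platform does it: the `KNOWLEDGE_GRAPH` dictionary, the `kg:` IDs, and the context words are all made up for illustration, and production systems rely on trained models rather than simple word overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    entity_id: str            # hypothetical knowledge-graph ID
    name: str
    kind: str
    context_words: frozenset  # words that tend to appear near this entity


# Tiny stand-in for a knowledge graph. All entries are illustrative.
KNOWLEDGE_GRAPH = {
    "apple": [
        Entity("kg:apple-fruit", "apple", "fruit",
               frozenset({"eat", "pie", "orchard", "juice"})),
        Entity("kg:apple-inc", "Apple", "company",
               frozenset({"iphone", "stock", "ceo", "mac"})),
    ],
    "eiffel tower": [
        Entity("kg:eiffel-tower", "Eiffel Tower", "landmark",
               frozenset({"paris", "france", "tower"})),
    ],
}


def link_entities(text):
    """Find known mentions in text and link each one to a single entity."""
    lowered = text.lower()
    words = set(lowered.replace(",", " ").replace(".", " ").split())
    linked = []
    for mention, candidates in KNOWLEDGE_GRAPH.items():
        # Step 1: identify the mention (naive substring match).
        if mention not in lowered:
            continue
        # Step 2: disambiguate by picking the candidate whose context
        # words overlap most with the surrounding text. On a tie, the
        # first candidate listed wins.
        best = max(candidates, key=lambda e: len(e.context_words & words))
        # Step 3: "link" by returning the knowledge-graph entry itself.
        linked.append(best)
    return linked


for sentence in ("Apple reported record iPhone sales and its stock rose.",
                 "I baked an apple pie."):
    for entity in link_entities(sentence):
        print(sentence, "->", entity.entity_id)
```

The first sentence links to the company entry and the second to the fruit entry, because the surrounding words tip the balance each time. That’s the same reason clear, consistent context around your brand name helps AI systems pick the right entity.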
Meaning represented as vector embeddings

A vector embedding is a mathematical representation of a piece of content’s meaning, and it’s what all modalities (text, image, audio) get converted into.
With the Eiffel Tower example, an AI system would convert all the pieces of content (the video, the voice memo, the text prompt, etc.) into vector embeddings, which are just long lists of numbers.
Here’s what one might look like (abridged, of course):
[0.23, -1.04, 0.88, 2.11, -0.67, ...]
You can think of embeddings as coordinates that indicate an entity’s position in a high-dimensional knowledge map learned by the model.
The AI model is able to determine the relationship between concepts based on the geometric distance between the embeddings.
All the content mentioning the Eiffel Tower would cluster together in the same embedding space since their vectors would sit very close to one another. This signals to the AI system that all these references are referring to the same thing.
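Here’s a small Python sketch of that clustering idea. The four-number vectors are invented for illustration (real embeddings have hundreds or thousands of dimensions and come from a trained model), and cosine similarity is used here as one common way to measure closeness, not necessarily the measure any given platform uses.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up embeddings: two pieces of Eiffel Tower content in different
# formats, plus one unrelated piece of content.
EMBEDDINGS = {
    "photo of the Eiffel Tower": [0.90, 0.10, 0.80, 0.05],
    "voice memo about the big tower in Paris": [0.85, 0.15, 0.75, 0.10],
    "banana bread recipe": [0.05, 0.95, 0.10, 0.80],
}


def most_similar(query, embeddings):
    """Return the key whose vector sits closest to the query's vector."""
    query_vec = embeddings[query]
    others = {k: v for k, v in embeddings.items() if k != query}
    return max(others, key=lambda k: cosine_similarity(query_vec, others[k]))


print(most_similar("photo of the Eiffel Tower", EMBEDDINGS))
```

With these sample vectors, the photo’s nearest neighbor is the voice memo rather than the recipe, even though one is an image and the other is audio.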
Important note: Vector embeddings do not store important facts as knowledge base entries do. Instead, they store entity relationships as geometric distances.
It’s the combination of vector embeddings and entity recognition that enables AI models to fully engage in multimodal search.
Embeddings provide the semantic connections, while NER anchors multimodal results to real-world concepts.
How Can Brands Optimize for AI Agents and Multimodal Search?
Capitalizing on the new direction search is taking requires a shift in mindset. Tried-and-true SEO tactics won’t cut it on their own, so you’ll need a new playbook.
Optimize for entity clarity
First, you’ll need to focus on entity optimization over keyword optimization. The idea is to:
- Solidify your brand as a known entity in major knowledge graphs (Google, Wikidata, Crunchbase, etc.).
- Get LLMs to understand how your brand relates to your products, services, and area of expertise.
- Demonstrate enough credibility through content, brand mentions, and backlinks so that you’ll become an authority figure in your field.
Achieving these goals will have a more powerful effect than targeting one keyword at a time.
Once LLMs trust your brand’s entity, you’ll have the potential to appear for thousands of related keywords without having to optimize for them outright.
Ways to improve entity clarity include:
- Producing interlinked content clusters to cover your area of expertise in as much depth as possible.
- Ensuring knowledge database consistency for your brand (naming consistency is critical).
- Building brand mentions on credible news sites and media outlets.
- Improving your brand sentiment by managing your reviews and reputation.
Need expert help building authoritative brand mentions? Check out our Digital PR service!
Use structured data everywhere

Structured data has become even more important for GSO (generative search optimization) than it is for classic SEO.
Since LLMs and AI agents are taking over the web, making your content machine-readable is incredibly important. Schema markup and semantic HTML make your content easy for LLMs to parse, disambiguate, and cite.
Check out our guide on how LLMs read websites to learn more.
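As a rough illustration of what machine-readable markup can look like, here’s a short Python sketch that builds a schema.org `Organization` object and wraps it in a JSON-LD script tag. The brand name and URLs are placeholders, and the helper functions are hypothetical names chosen for this example. Which properties you actually need depends on your business and the schema types that fit it.

```python
import json


def build_organization_jsonld(name, url, same_as):
    """Build a schema.org Organization description as a plain dict."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs points to other profiles that describe the same entity.
        "sameAs": list(same_as),
    }


def to_script_tag(data):
    """Wrap the dict in the script tag used to embed JSON-LD in a page."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values, for illustration only.
organization = build_organization_jsonld(
    name="Example Brand",
    url="https://www.example.com",
    same_as=[
        "https://www.youtube.com/@examplebrand",
        "https://www.linkedin.com/company/examplebrand",
    ],
)
print(to_script_tag(organization))
```

The `sameAs` list is worth the extra effort, since it explicitly ties your website to your other profiles and gives machines one more signal that they all describe the same entity.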
Multi-channel identity consistency

Multimodal AI agents don’t just retrieve information about your brand from your website. Instead, they also build entity profiles from:
- YouTube videos
- Social media profiles
- Reviews
- Product descriptions
- Community discussion
This means your brand must remain consistent across various channels, modalities, and platforms.
Things like inconsistent names or inaccurate pricing risk breaking the entity link, so vigilance is key. Make sure that your brand’s name, identity, and voice stay the same across all touchpoints.
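A quick way to spot naming drift is to list how your brand appears on each channel and flag the outliers. The sketch below does that in Python under some loud assumptions: the channel names and brand strings are sample data, and the normalization rule (ignore case and punctuation, then treat the most common spelling as canonical) is just one reasonable choice.

```python
import re
from collections import Counter


def normalize(name):
    """Lowercase, drop punctuation, and collapse extra whitespace."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(cleaned.split())


def find_inconsistent_channels(names_by_channel):
    """Return channels whose brand name differs from the most common one."""
    normalized = {ch: normalize(n) for ch, n in names_by_channel.items()}
    # Assumption: the spelling used on the most channels is the canonical one.
    canonical, _ = Counter(normalized.values()).most_common(1)[0]
    return sorted(ch for ch, n in normalized.items() if n != canonical)


# Sample data, for illustration only.
brand_names = {
    "website": "Acme Widgets",
    "youtube": "ACME Widgets",
    "reddit": "Acme Widget Co.",
    "amazon": "Acme Widgets, Inc.",
}
print(find_inconsistent_channels(brand_names))
```

Here, the Amazon and Reddit listings get flagged, while the all-caps YouTube name passes because only capitalization differs. If you’d rather treat capitalization as a mismatch too, remove the `.lower()` call in `normalize`.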
Wrapping Up: AI Agents and Multimodal Search
The search world may be going through a radical metamorphosis, but that doesn’t mean you can’t market your products and services online.
The trick is to fold GSO tactics into your existing SEO strategy before it’s too late.
Keep doing what’s working for you right now, but don’t forget about the future (it’ll be here before you know it).
Are you ready to start embracing the power of AI search?
Sign up for AI Discover, our battle-tested AI optimization service!
The author
Rachel Hernandez
