At Google’s annual Search On conference, the advertising giant announced a range of new features that will affect all retailers. At the heart of these announcements is MUM, Google’s Multitask Unified Model. This technology will revolutionise the search experience, bringing shoppers further into Google’s ecosystem. Shoppers will be able to search with text and images, then make a purchase without ever leaving the Google search results page.
The announcement brings together two important fields in retail AI: semantic search and computer vision. Gartner’s Hype Cycle for AI in 2021 highlights these two areas as the most advanced, emerging into the ‘Slope of Enlightenment’.
Retailers need to ask: Do they want to own the customer experience, or are they happy to hand this off to Google?
If they want to keep control, now is the time to add multimodal search to their website.
What is multimodal search?
First, we should clarify exactly what we mean by multimodal search. “Multimodal” combines “multiple” and “modality”, with a modality being a type of data: text, images, video, or audio.
Google provided a short demo of what their multimodal search will look like, using text to narrow the focus of an image-based search:
Why does multimodal search matter for retailers?
Today’s websites have been built on text-based search, primarily because search engines can process these queries with a reasonable degree of accuracy. Retailers structure and label their websites with one eye on their potential presence in Google’s search results. Google then processes user queries and directs users towards the websites that best satisfy their intent.
Visual search, which uses an image as the stimulus for a search, is growing in popularity. It is perfect for those moments when we see something we would like to buy, but don’t know how to describe it. Take this chair, for example:
How would you search for this chair using text only?
Perhaps you would focus on its shape, or its pattern, or its materials.
It is unlikely that we would all search for it in the same way, yet such uniformity is needed for text-driven search to work optimally. After all, the retailer needs to know how to label the product to capture customer demand, and the search engine must be able to match that demand to supply.
Every mismatch between how a shopper describes a product and how the retailer labels it is a missed opportunity for the customer and the retailer alike. The average ecommerce conversion rate for furniture sits at just 0.68% today, so the industry has plenty of room for improvement.
Searching with an image removes ambiguity from the query. Computer vision technology can detect the shape, pattern, and materials, then identify the best match.
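Under the hood, visual search systems typically work by encoding each product image into an embedding vector and ranking the catalogue by similarity to the query image's embedding. The sketch below illustrates the ranking step only; the toy four-dimensional vectors are stand-ins for the output of a real vision model (the encoding step itself is assumed, not shown).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def visual_search(query_embedding: np.ndarray,
                  catalog: dict) -> list:
    """Rank catalogue products by visual similarity to the query image."""
    scores = {name: cosine_similarity(query_embedding, emb)
              for name, emb in catalog.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy "embeddings" standing in for real vision-model output.
catalog = {
    "rattan armchair":  np.array([0.9, 0.1, 0.0, 0.2]),
    "leather sofa":     np.array([0.1, 0.8, 0.3, 0.0]),
    "woven side chair": np.array([0.8, 0.2, 0.1, 0.3]),
}
query = np.array([0.85, 0.15, 0.05, 0.25])  # embedding of the shopper's photo
results = visual_search(query, catalog)
print(results[0][0])  # best visual match: "rattan armchair"
```

In production this brute-force loop would be replaced by an approximate nearest-neighbour index, but the principle is the same: the image itself, not a text label, defines what "similar" means.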
The retailer now knows what you want.
But do they know why you want it?
You might want more information about the chair, or to see similar chairs, or even complementary items if you already own the chair.
Again, there is a gap between consumer intent and retailer reality.
Multimodal search can simultaneously process multiple inputs to bridge this gap. Using different data types together means picking up on clues within the query to dig deeper into both what the customer wants – and why they want it. When retailers have that information, they can deliver better results and achieve better outcomes.
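One common way to process those inputs together, assuming a CLIP-style model that maps images and text into the same embedding space (an assumption here, not a description of any particular product), is to blend the image query with the text refinement into a single query vector before ranking:

```python
import numpy as np

def combine_query(image_emb: np.ndarray, text_emb: np.ndarray,
                  text_weight: float = 0.5) -> np.ndarray:
    """Blend an image query with a text refinement into one query vector.
    Assumes both embeddings live in the same shared vector space."""
    q = (1 - text_weight) * image_emb + text_weight * text_emb
    return q / np.linalg.norm(q)

# Toy embeddings: a photo of a chair, refined with the text "in green".
image_query = np.array([1.0, 0.0, 0.0])   # the chair's shape and pattern
text_query  = np.array([0.0, 1.0, 0.0])   # the colour refinement

catalog = {
    "same chair, original colour": np.array([1.0, 0.0, 0.0]),
    "same chair, green":           np.array([0.7, 0.7, 0.0]),
    "unrelated lamp":              np.array([0.0, 0.0, 1.0]),
}

q = combine_query(image_query, text_query)
best = max(catalog,
           key=lambda name: np.dot(q, catalog[name] / np.linalg.norm(catalog[name])))
print(best)  # "same chair, green"
```

The `text_weight` parameter (a hypothetical knob for this sketch) captures the trade-off the paragraph describes: how much the text clue should steer the result away from a literal visual match and towards what the customer actually wants.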
Of course, Google has always worked to improve its search algorithms, but there is a shift in the balance of power with these new multimodal search developments.
As we approach the end of the third-party cookie era, the big tech companies are all moving to take ownership of the customer journey. First-party data has become a precious commodity.
Google no longer wants to interpret queries and then refer the customer to the best result. It wants to encourage the user to browse and purchase within the search results, with no reason to leave the Google ecosystem. In essence, it wants to be more like Amazon.
How can retailers maintain control of their own customer experience?
The short answer is that retailers must embrace multimodal search as a more effective way of delighting customers. This was true even before Google’s announcements, but this news provides much-needed incentive to make these overdue upgrades.
If retailers continue to offer a static ecommerce experience with simple text-based search, while Google offers a multimodal and personalised ecommerce experience, retailers will be left behind. Consumers will have little reason to visit websites, leaving Google to capture highly valuable data and a cut of the sales. Retailer websites would become little more than data feeds for the giant aggregators.
After all, text has never been the ideal solution for the innately visual experience of shopping. Retailers were limited by technological constraints in the past, but these are no longer in place.
How Cadeera helps retailers take advantage of multimodal search
At Cadeera, we have built multimodal search technology that captures intent from text and image data. This can be applied by any retailer to personalise the customer experience. Whether the customer wants to make an instant purchase or explore similar styles, we guide them on that journey and deliver better outcomes for the retailer.
Cadeera has also developed the world’s largest ontology for the furniture and home décor industries. This means our technology can take the user’s query even further; we can match them to relevant designers, styles, and ideas that will inspire them. Perfect for when a shopper doesn’t yet know what they want, but will know it when they see it.
This technology is specialised for the industry in a way that an all-encompassing search engine will never be. Just as importantly, Cadeera gives the retailer control over their own customer experience, the accompanying data, and the resulting revenues. Google wants to take over this customer journey; it can only do so if retailers are too slow to respond.
Get in touch if you would like to find out how Cadeera can bring multimodal search to your business.