The relationship between people and their homes has changed markedly over the last two years.
Homes are now places to work and learn, as well as places to escape the daily grind. Unsurprisingly, many people are keen to freshen up their kitchens and bathrooms after a long period indoors.
This trend looks set to carry on into 2022: Harvard University’s Leading Indicator of Remodeling Activity predicts an 8.6% increase in home remodelling spending next year.
What role can visual search play, both in inspiring shoppers and in linking them to the right retailer inventory?
The consumer perspective
Consumers can approach a shopping journey in a variety of mindsets. Sometimes, they know exactly what they want and they need the simplest route to purchase. This consumer has a high level of purchase intent and a limited desire to browse.
As our example, let’s assume that a consumer, Sophia, wants to purchase this kitchen tap:
She knows its colour, she knows it is an instant hot water tap, and she can describe its shape. Using text-only keyword search, how quickly can Sophia find this exact product online?
According to new Harris Poll research for Google, retailers in the US alone lose over $300 billion to “search abandonment”. This occurs when the consumer has a product in mind but is unable to locate it on a retailer’s website, so they abandon the search. The research reports that for 76% of shoppers, an unsuccessful search resulted in a lost sale for the retail website.
All too often, consumers in a purchasing mindset leave a website frustrated and empty-handed. Forrester finds that 43% of consumers start their on-site journey at the text search box, which helps to explain the missed opportunity.
Text alone cannot describe products with enough precision to source that one matching item from an inventory of thousands of SKUs.
So what is the solution to Sophia’s challenge?
Multimodal search, which blends inputs from multiple sources including text and images, allows the consumer to express their intent in more detail.
In this instance, Sophia can use visual search to share an image of the kitchen tap. Computer vision can identify the defining characteristics of the image. Next, it can find the closest match within the retailer’s inventory and return that one result to the consumer.
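Under the hood, this kind of matching is typically a nearest-neighbour search over image embeddings. The sketch below is purely illustrative: the SKU names, the tiny three-dimensional “embeddings”, and the `closest_match` helper are all hypothetical stand-ins for a real computer-vision model and catalogue index, and are not Cadeera’s actual implementation.

```python
import numpy as np

def closest_match(query_vec, catalogue_vecs, sku_ids):
    """Return the SKU whose embedding is most similar to the query image."""
    # Normalise so the dot product equals cosine similarity
    q = query_vec / np.linalg.norm(query_vec)
    c = catalogue_vecs / np.linalg.norm(catalogue_vecs, axis=1, keepdims=True)
    scores = c @ q
    return sku_ids[int(np.argmax(scores))]

# Toy catalogue: three SKUs with hypothetical 3-dimensional "embeddings"
skus = ["tap-chrome", "tap-black-hotwater", "mixer-brass"]
catalogue = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.9, 0.3],
                      [0.0, 0.2, 0.9]])

query = np.array([0.15, 0.85, 0.25])  # embedding of Sophia's photo
print(closest_match(query, catalogue, skus))  # → tap-black-hotwater
```

In production the embeddings would come from a trained vision model and the search would run over thousands of SKUs via an approximate-nearest-neighbour index, but the principle is the same: the photo becomes a vector, and the closest catalogue vector wins.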
This is a much closer match to Sophia’s original intent.
If she wishes to browse more than one result, Cadeera can suggest close alternatives or open this up to an exploratory shopping journey.
Visual search caters very well to what we can call a “closed” mindset, where the consumer has a fixed idea of the outcome they desire from a website visit.
At the other end of the scale, consumers often visit kitchen and bathroom websites in the hope of discovering inspiration. We can consider this an “open” mindset.
Ecommerce websites do not serve this mindset successfully today. Rigid templates create a structure that rarely lends itself to a fluid shopping journey.
Yet multimodal search, in combination with new technologies like headless commerce, can help retailers deliver on this consumer expectation.
For example, Sophia may revisit the website and share this image of a bathroom she likes:
Visual search can identify not only the objects in the scene, but also their style. Cadeera can pick up on visual clues that will inform Sophia’s taste profile. From here, the technology can suggest appropriate styles and products for this individual consumer.
The retailer perspective
Naturally, the above is only one component of the full narrative. Visual search can help understand consumer intent, but that does not always mean retailers can serve this intent accurately.
The consumer has a product in mind, yet the retailer must also take into account the technical specifications of their inventory. Products may look very similar and still have very different applications.
Therefore, visual search technology can only work to its full potential if the retailer’s inventory is tagged correctly. This data remediation process is fundamental when preparing a website for multimodal search.
Cadeera begins with an inventory ingestion to assess the quantity and quality of data, before addressing any issues with an automated tagging solution.
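As a rough illustration of what that ingestion audit involves, the sketch below checks a toy inventory for the attributes a visual-search index would need. The record fields, the `REQUIRED` list, and the `audit` helper are hypothetical examples, not a description of Cadeera’s pipeline.

```python
# Hypothetical ingestion step: flag catalogue records that are missing
# the attributes a visual-search index would need.
REQUIRED = ("colour", "material", "style", "category")

def audit(records):
    """Return {sku: [missing attributes]} for records that need tagging."""
    issues = {}
    for rec in records:
        missing = [a for a in REQUIRED if not rec.get(a)]
        if missing:
            issues[rec["sku"]] = missing
    return issues

inventory = [
    {"sku": "tap-01", "colour": "black", "material": "steel",
     "style": "industrial", "category": "kitchen tap"},
    {"sku": "tap-02", "colour": "chrome", "material": "", "style": None,
     "category": "kitchen tap"},
]
print(audit(inventory))  # → {'tap-02': ['material', 'style']}
```

Records flagged here would then be passed to the automated tagging step so that every product can be matched on the attributes consumers actually search by.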
From here, retailers can start to match products with the consumer intent Cadeera has gathered.
Another frustration for consumers arises when they do track down the product they want, only to find that it is not in stock. In fact, Google searches for the term “in stock” are up 800% year-over-year. Cadeera’s real-time inventory tracking ensures that consumers are directed to suitable products that are ready to buy now.
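Conceptually, this is a filter applied after ranking: results with zero stock never reach the consumer. The function and data below are a minimal hypothetical sketch of that idea, not a real inventory API.

```python
def in_stock_results(ranked_skus, stock_levels):
    """Keep only results the consumer can buy right now."""
    return [sku for sku in ranked_skus if stock_levels.get(sku, 0) > 0]

# Toy example: the best visual match is out of stock, so it is dropped
ranked = ["tap-black-hotwater", "tap-chrome", "mixer-brass"]
stock = {"tap-black-hotwater": 0, "tap-chrome": 12, "mixer-brass": 3}
print(in_stock_results(ranked, stock))  # → ['tap-chrome', 'mixer-brass']
```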
This structural work is not the most glamorous element of visual search, but it is essential and often overlooked. Once this is in place, retailers can start to deliver true personalisation.
There is a misconception in ecommerce that personalisation only works when a large amount of customer data is available. According to this school of thought, retailers must build up a stock of customer data before they can start to tailor their experience. Until this point, consumers must make do with a static web experience.
Yet retailers can invite consumers to share their purchase intentions and preferences. They can achieve this through observed data, adapting to the consumer’s interactions with the website. They can use inferred data, drawing on past examples from similar users to tailor the experience for new users.
Or they can simply ask the consumer to share their intentions. This is known as zero-party data and it is increasingly useful in an era of reduced third-party tracking. Consumers can respond to short questions or prompts and the retailer can personalise content based on their answers.
Multimodal search can be very powerful in these instances. For instance, the retailer can let the consumer share a photo and then add further context.
Let’s return to Sophia’s example of the kitchen tap from earlier.
When Sophia shares this image, how can computer vision alone satisfy her search intent?
It recognises the exact product and serves up that result, but what if Sophia simply likes the style of this tap? What if she is more interested in replicating the look of this sink area?
Multimodal search allows her to add context to the image using a short text prompt. For example, she could add a comment like, “taps in this style”. This data is blended with the image to reshape the query, and therefore the results she will see.
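One common way to blend the two modalities is to combine the image and text embeddings into a single query vector before ranking. The sketch below shows how a prompt like “taps in this style” can shift the top result from the exact pictured product to a stylistic match. All the vectors, SKU names, and the blending weight are hypothetical, chosen only to make the effect visible.

```python
import numpy as np

def blend(image_vec, text_vec, alpha=0.5):
    """Combine image and text embeddings into one query vector."""
    v = alpha * np.asarray(image_vec) + (1 - alpha) * np.asarray(text_vec)
    return v / np.linalg.norm(v)

def rank(query, catalogue, sku_ids):
    """Order SKUs by cosine similarity to the query."""
    c = catalogue / np.linalg.norm(catalogue, axis=1, keepdims=True)
    order = np.argsort(-(c @ query))
    return [sku_ids[i] for i in order]

skus = ["exact-tap", "same-style-tap", "unrelated-sink"]
catalogue = np.array([[1.0, 0.0, 0.0],   # the pictured product
                      [0.6, 0.8, 0.0],   # different tap, same style
                      [0.0, 0.1, 1.0]])

image = np.array([1.0, 0.0, 0.0])        # photo alone → exact product first
style_text = np.array([0.0, 1.0, 0.0])   # "taps in this style"

print(rank(image, catalogue, skus)[0])              # → exact-tap
print(rank(blend(image, style_text), catalogue, skus)[0])  # → same-style-tap
```

The same photo produces different results depending on the text alongside it, which is exactly the behaviour the consumer expects: the words reshape the query rather than replace it.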
The future of personalisation is collaborative. It does not depend on invasive tracking and vague assumptions, but rather invites users to interact.