Semantic Similarity
Semantic similarity measures how closely products are related in meaning — based on their text descriptions, attributes, and metadata — rather than behavioral signals.
How it is calculated
Each product's description and metadata are encoded into a dense vector (embedding) using multilingual transformer models (e.g., SBERT, FastText). The algorithm then computes the cosine similarity between two product vectors:
$$\text{similarity}(v_i, v_j) = \frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}$$

Where:
- v_i, v_j = product embedding vectors
- Similarity ∈ [0, 1], with 1 meaning semantically identical.
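A minimal sketch of this computation, assuming a SentenceTransformer model is used for encoding; the model name, preprocessing, and helper functions shown here are illustrative assumptions, not the production implementation.

```python
from sentence_transformers import SentenceTransformer
import numpy as np

# Example multilingual embedding model (assumed for illustration).
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

def cosine_similarity(v_i: np.ndarray, v_j: np.ndarray) -> float:
    """Cosine similarity between two product embedding vectors."""
    return float(np.dot(v_i, v_j) / (np.linalg.norm(v_i) * np.linalg.norm(v_j)))

def product_similarity(text_a: str, text_b: str) -> float:
    """Encode two product descriptions and compare their embeddings."""
    v_i, v_j = model.encode([text_a, text_b])
    return cosine_similarity(v_i, v_j)
```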
Example
Source product: “Nike Air Zoom Pegasus 40 running shoe, red” → Semantically similar products might include:
“Adidas Ultraboost 22 running shoe, blue” (same purpose and category)
“Salomon Speedcross trail shoe” (related usage, similar function)
❌ “Red dress” (similar color but irrelevant meaning — model filters this out).
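Continuing the sketch above (reusing the hypothetical `product_similarity` helper), the same example can be reproduced by ranking candidate products against the source description; semantically unrelated items such as the red dress receive a noticeably lower score.

```python
source = "Nike Air Zoom Pegasus 40 running shoe, red"
candidates = [
    "Adidas Ultraboost 22 running shoe, blue",
    "Salomon Speedcross trail shoe",
    "Red dress",
]

# Rank candidates by semantic similarity to the source product.
scores = sorted(
    ((text, product_similarity(source, text)) for text in candidates),
    key=lambda pair: pair[1],
    reverse=True,
)
for text, score in scores:
    print(f"{score:.2f}  {text}")
```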
Multilingual model
The embedding model is trained on 109 languages, including those without spaces (Japanese, Chinese, Thai, etc.), allowing semantic matching across all markets. This differs from the Content Interest Criterion used in the Segment Builder, which only supports Western languages.
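Because the embedding space is shared across languages, descriptions in different languages can be compared directly, with no translation step. A hedged example, again reusing the `product_similarity` helper from the sketch above with an assumed multilingual model:

```python
# Cross-lingual check: an English and a Japanese description of the same
# product should score highly despite sharing no surface-level tokens.
score = product_similarity(
    "Nike Air Zoom Pegasus 40 running shoe, red",
    "ナイキ エア ズーム ペガサス 40 ランニングシューズ レッド",
)
print(f"{score:.2f}")
```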
Key takeaways
Works from day one — no behavioral data needed.
Ideal for new, long-tail, or low-traffic products.
Enables “Similar products” and “Alternative discovery” strategies.