# Semantic Similarity

**Semantic similarity** measures how closely products are related in *meaning* — based on their text descriptions, attributes, and metadata — rather than behavioral signals.

#### How it is calculated

Each product's description and metadata are encoded into a **dense vector (embedding)** using a multilingual embedding model (e.g., SBERT, FastText).\
The algorithm computes the **cosine similarity** between product vectors:

$$
\text{sim}(v_i, v_j) = \frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}
$$

Where:

* `v_i`, `v_j` = product embedding vectors
* Similarity ∈ \[-1, 1], with 1 meaning semantically identical; in practice, scores between product texts are typically non-negative.
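
As a minimal sketch of this computation (the `sentence-transformers` library and the checkpoint name are illustrative assumptions, not the production stack):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Hypothetical multilingual SBERT checkpoint; the production model may differ.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Encode two product descriptions into dense vectors v_i, v_j.
v = model.encode([
    "wireless over-ear headphones, noise cancelling",
    "Bluetooth over-ear headphones with ANC",
])

# Cosine similarity between the two embeddings.
score = util.cos_sim(v[0], v[1]).item()
print(f"sim(v_i, v_j) = {score:.3f}")
```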

#### Example

* Source product: *“Nike Air Zoom Pegasus 40 running shoe, red”*\
  → Semantically similar products might include:
  * *“Adidas Ultraboost 22 running shoe, blue”* (same purpose and category)
  * *“Salomon Speedcross trail shoe”* (related usage, similar function)
  * ❌ *“Red dress”* (same color but unrelated meaning; the model filters it out, as the ranking sketch below shows).
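
Continuing the sketch above (same assumed library and checkpoint), ranking these candidates against the source product reproduces the ordering described: both shoes score high, while the dress scores low despite the shared color.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed checkpoint

source = "Nike Air Zoom Pegasus 40 running shoe, red"
candidates = [
    "Adidas Ultraboost 22 running shoe, blue",
    "Salomon Speedcross trail shoe",
    "Red dress",
]

# One similarity score per candidate, against the source embedding.
scores = util.cos_sim(model.encode(source), model.encode(candidates))[0].tolist()

# Highest-scoring candidates first.
for text, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {text}")
```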

#### Multilingual model

The embedding model is trained on **109 languages**, including languages written without spaces (Japanese, Chinese, Thai, etc.), which allows semantic matching across all markets.\
This differs from the *Content Interest Criterion* used in the Segment Builder, which supports only Western languages.
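
As an illustration of cross-lingual matching, a multilingual checkpoint embeds different scripts into one vector space, so a Japanese description (written without word spaces) can be compared directly against an English one. LaBSE is one publicly available 109-language model; whether it is the model used here is an assumption.

```python
from sentence_transformers import SentenceTransformer, util

# LaBSE covers 109 languages; using it here is an assumption for illustration.
model = SentenceTransformer("sentence-transformers/LaBSE")

en = "running shoe, red"
ja = "ランニングシューズ、赤"  # the same description in Japanese, no word spaces

# Both descriptions land in the same embedding space, so cosine similarity applies.
score = util.cos_sim(model.encode(en), model.encode(ja)).item()
print(f"cross-lingual sim = {score:.3f}")
```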

#### Key takeaways

* Works from day one — no behavioral data needed.
* Ideal for new, long-tail, or low-traffic products.
* Enables “Similar products” and “Alternative discovery” strategies.
