# Semantic Similarity

**Semantic similarity** measures how closely products are related in *meaning* — based on their text descriptions, attributes, and metadata — rather than behavioral signals.

#### How it is calculated

Each product's description and metadata are encoded into a **dense vector (embedding)** using a multilingual embedding model (e.g., SBERT, FastText).\
The algorithm computes the **cosine similarity** between product vectors:

$$
\text{sim}(v_i, v_j) = \frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert}
$$

Where:

* `v_i`, `v_j` = product embedding vectors
* Similarity ∈ \[-1, 1], with 1 meaning semantically identical; in practice, scores between product texts are typically non-negative.
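
As a minimal sketch of this computation (the `sentence-transformers` library and the checkpoint name are illustrative assumptions, not the production stack):

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Hypothetical multilingual SBERT checkpoint; the production model may differ.
model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

# Encode two product descriptions into dense vectors v_i, v_j.
v = model.encode([
    "wireless over-ear headphones, noise cancelling",
    "Bluetooth over-ear headphones with ANC",
])

# Cosine similarity between the two embeddings.
score = util.cos_sim(v[0], v[1]).item()
print(f"sim(v_i, v_j) = {score:.3f}")
```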

#### Example

* Source product: *“Nike Air Zoom Pegasus 40 running shoe, red”*\
  → Semantically similar products might include:
  * *“Adidas Ultraboost 22 running shoe, blue”* (same purpose and category)
  * *“Salomon Speedcross trail shoe”* (related usage, similar function)
  * ❌ *“Red dress”* (same color but unrelated meaning; the model filters it out, as the ranking sketch below shows).
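
Continuing the sketch above (same assumed library and checkpoint), ranking these candidates against the source product reproduces the ordering described: both shoes score high, while the dress scores low despite the shared color.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed checkpoint

source = "Nike Air Zoom Pegasus 40 running shoe, red"
candidates = [
    "Adidas Ultraboost 22 running shoe, blue",
    "Salomon Speedcross trail shoe",
    "Red dress",
]

# One similarity score per candidate, against the source embedding.
scores = util.cos_sim(model.encode(source), model.encode(candidates))[0].tolist()

# Highest-scoring candidates first.
for text, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{score:.3f}  {text}")
```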

#### Multilingual model

The embedding model is trained on **109 languages**, including languages written without spaces (Japanese, Chinese, Thai, etc.), which allows semantic matching across all markets.\
This differs from the *Content Interest Criterion* used in the Segment Builder, which supports only Western languages.
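
As an illustration of cross-lingual matching, a multilingual checkpoint embeds different scripts into one vector space, so a Japanese description (written without word spaces) can be compared directly against an English one. LaBSE is one publicly available 109-language model; whether it is the model used here is an assumption.

```python
from sentence_transformers import SentenceTransformer, util

# LaBSE covers 109 languages; using it here is an assumption for illustration.
model = SentenceTransformer("sentence-transformers/LaBSE")

en = "running shoe, red"
ja = "ランニングシューズ、赤"  # the same description in Japanese, no word spaces

# Both descriptions land in the same embedding space, so cosine similarity applies.
score = util.cos_sim(model.encode(en), model.encode(ja)).item()
print(f"cross-lingual sim = {score:.3f}")
```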

#### Key takeaways

* Works from day one — no behavioral data needed.
* Ideal for new, long-tail, or low-traffic products.
* Enables “Similar products” and “Alternative discovery” strategies.
