Understanding embeddings - how machines see language

Embedding is the technique of representing words, concepts, and sentences as vectors of numbers so that computers can work with them. It’s fundamental to LLMs, and it also underpins things like semantic search, Netflix recommendations, and Google Translate. I want to play around with embeddings a bit to see how they work and how they can be used.

Using local LLMs

I’m more interested in running LLMs locally than using ChatGPT or Claude, mostly because I work with client data that can’t be sent out into the ether. I use Ollama, a free software tool that lets you run large language models like Llama, Phi, Mistral, and Gemma on your local machine.

After installing Ollama, open the terminal and type:

$ ollama pull llama3.1
$ ollama run llama3.1
>>> in one sentence, what is the meaning of life?

The meaning of life is a subjective and often debated concept, but it can be distilled to finding purpose, happiness, and fulfillment through personal growth, relationships, and contributions that bring value to oneself and others.

Seems about right. We now have an LLM running on our machine.

We should really be using a model specifically designed for embeddings, but the local ones available with Ollama give me worse results than Llama3.1. More on that later.

Embedding

When embedding language into a representation machines understand, we turn the words into vectors.

For example, if we were using two-dimensional vectors, we could plot each word as a point on a 2D graph.

Even though “arrow” and “sparrow” are spelled similarly and sound similar, their 2D vectors would sit further apart than those of “sparrow” and “eagle”. This distance is usually measured with cosine similarity: essentially, the angle between one vector and another.
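To make that concrete, cosine similarity can be computed by hand with NumPy: the dot product of two vectors divided by the product of their magnitudes. A minimal sketch with toy 2D vectors (the values here are made up for illustration):

```python
import numpy as np

def cosine_sim(a, b):
    # cosine of the angle between two vectors:
    # dot product divided by the product of their magnitudes
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# two toy vectors pointing in similar directions score close to 1,
# perpendicular vectors score 0, opposite vectors score -1
print(cosine_sim([1.0, 0.2], [0.9, 0.3]))  # close to 1.0
print(cosine_sim([1.0, 0.0], [0.0, 1.0]))  # 0.0
```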

Embedding with Ollama and LangChain

The easiest way I could find to play around with embeddings with Ollama was LangChain, a toolkit meant to make application development with LLMs easier.

from langchain_ollama import OllamaEmbeddings
embeddings = OllamaEmbeddings(
    model="llama3.1",
)

We are using Llama3.1 because it has a version with 8 billion parameters that runs acceptably fast on my Mac.

Now, let’s embed some words.

sparrow_vector = embeddings.embed_query("sparrow")
arrow_vector = embeddings.embed_query("arrow")
eagle_vector = embeddings.embed_query("eagle")

Here are the first 10 elements of the sparrow vector:

sparrow_vector[:10]
[-0.0054991185,
 -0.026986344,
 0.022912376,
 0.014657578,
 0.009402687,
 0.0089849755,
 -0.016890066,
 0.0144533245,
 0.017101986,
 0.0026423668]

And if we look at the dimensions of the vector, we get a bit more than two dimensions:

len(sparrow_vector)
4096
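So each word becomes a point in 4,096-dimensional space, far too many dimensions to draw. To actually plot embeddings on a 2D graph like the one described above, you would project them down first. A sketch using PCA from scikit-learn, with random stand-in vectors instead of real embeddings (purely illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# stand-in for real embeddings: three random 4096-dimensional vectors
rng = np.random.default_rng(0)
vectors = rng.normal(size=(3, 4096))

# project down to two dimensions for plotting
points_2d = PCA(n_components=2).fit_transform(vectors)
print(points_2d.shape)  # (3, 2)
```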

Calculating similarities between vectors

Next, we use sklearn to calculate the cosine similarity between the vectors we’ve just created.

from sklearn.metrics.pairwise import cosine_similarity
similarity_matrix = cosine_similarity([sparrow_vector, arrow_vector, eagle_vector])
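The result is an n-by-n matrix: entry (i, j) holds the similarity between vector i and vector j, so the diagonal is all ones (every vector is identical to itself). A quick check with toy 2D vectors:

```python
from sklearn.metrics.pairwise import cosine_similarity

toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
m = cosine_similarity(toy)
print(m.shape)            # (3, 3)
print(m[0][0])            # 1.0: each vector is identical to itself
print(round(m[0][2], 3))  # 0.707: the vectors are 45 degrees apart
```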

And to make it easier on the eyes, we create a dataframe to visualise it:

# code for compare_words hidden for clarity
df = compare_words(["sparrow", "arrow", "eagle"], similarity_matrix)
df
First   Second    Similarity
arrow   sparrow   0.36
eagle   sparrow   0.79
eagle   arrow     0.35
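For reference, the hidden compare_words helper could look something like the sketch below: pair up the labels and read the corresponding scores from the lower triangle of the similarity matrix. This is a hypothetical implementation, not the author’s actual code:

```python
import pandas as pd

def compare_words(labels, similarity_matrix):
    # build one row per unordered pair of labels, reading the
    # score from the lower triangle of the similarity matrix
    rows = []
    for i in range(len(labels)):
        for j in range(i):
            rows.append({
                "First": labels[i],
                "Second": labels[j],
                "Similarity": round(similarity_matrix[i][j], 2),
            })
    return pd.DataFrame(rows)
```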

As expected, the two birds are similar while the arrow is not. We can take this even further.

sentences = [
    "The man with the tie ran for office.",
    "The man with the tie ran for a bus.",
    "The woman in the dress became a politician."
]
vectors = []
for s in sentences:
    vectors.append(embeddings.embed_query(s))
similarity_matrix_v2 = cosine_similarity(vectors)
compare_words(sentences, similarity_matrix_v2)
First Second Similarity
The man with the tie ran for a bus. The man with the tie ran for office. 0.87
The woman in the dress became a politician. The man with the tie ran for office. 0.91
The woman in the dress became a politician. The man with the tie ran for a bus. 0.85

Again, the man running for office is closer to the woman politician than to the man running for a bus. Using our embeddings to find similar meanings works!

I usually give ChatGPT a chance to code for me, but in the case of using Ollama to have local LLMs do embeddings, it was useless. I had to use Google (gasp) and find the LangChain documentation on Ollama embeddings.