Running Hugging Face models in Ollama

I’m a big fan of Ollama, the easiest way to run LLMs on my Mac. It integrates nicely with Hugging Face, the GitHub of LLMs, which makes running a model published there very easy.

I came across a blog post on Hugging Face that shows how to do it:

  1. Install Ollama
  2. Log in to Hugging Face and enable Ollama in the local apps settings
  3. Click on the “Use this model” drop-down and select Ollama

That will show you a console command you can paste into your terminal. The model I want to try out is mxbai-embed-large-v1-Q4_K_M-GGUF, because it was the highest-scoring GGUF model I could find on the embeddings leaderboard and I’m experimenting with embeddings. One tweak: the suggested command uses ollama run, but since this is an embedding model with no chat support, ollama pull is the right verb here; run would download the model and then fail trying to start a chat session.
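The command just points Ollama at the full repository path on the Hub, following this pattern:

ollama pull hf.co/{username}/{repository}

For this model, that gives: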

ollama pull hf.co/elliotsayes/mxbai-embed-large-v1-Q4_K_M-GGUF

We now have the model available to Ollama. Note that the model’s name is the full path, i.e. hf.co/.../.
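You can confirm this with ollama list, which should show the model under that full name:

ollama list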

So, to create a vector embedding with the model from Python, using the ollama package, you would write:

import ollama

# the model name is the full hf.co/... path from the pull step above
embeddings = ollama.embed(
  model='hf.co/elliotsayes/mxbai-embed-large-v1-Q4_K_M-GGUF',
  input='The man with a tie ran for office.',
).embeddings

# inspect the first ten dimensions of the first (and only) embedding
embeddings[0][:10]
[0.04531903,
 -0.015188997,
 -0.01498892,
 0.0064618383,
 -0.026388915,
 0.052302107,
 -0.033618174,
 -0.0036809053,
 0.026822101,
 -0.0020189686]
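
Since the point of pulling this model is to experiment with embeddings, here’s a minimal sketch of a natural next step: embed two sentences with the same model and compare them with cosine similarity. The second sentence and the cosine_similarity helper are my own illustrative additions; only ollama.embed and the model name come from above.

import math

import ollama

MODEL = 'hf.co/elliotsayes/mxbai-embed-large-v1-Q4_K_M-GGUF'

def cosine_similarity(a, b):
  # dot(a, b) / (|a| * |b|)
  dot = sum(x * y for x, y in zip(a, b))
  norm_a = math.sqrt(sum(x * x for x in a))
  norm_b = math.sqrt(sum(x * x for x in b))
  return dot / (norm_a * norm_b)

# embed() also accepts a list of inputs and returns one vector per input
response = ollama.embed(
  model=MODEL,
  input=[
    'The man with a tie ran for office.',
    'A well-dressed candidate campaigned for a seat.',
  ],
)
a, b = response.embeddings
print(f'cosine similarity: {cosine_similarity(a, b):.3f}')

Semantically close sentences should score noticeably higher than unrelated ones, which is the usual quick sanity check for an embedding model.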