Ollama
Ollama is a way of running large language models locally: think ChatGPT, but on your own machine. Of course, this means the answers are generated on your machine, so you'll need a reasonably beefy one.
My screenfetch output:
██████████████████ ████████ trustm3@trustm3
██████████████████ ████████ OS: Manjaro 24.0.1 Wynsdey
██████████████████ ████████ Kernel: x86_64 Linux 6.9.2-1-MANJARO
██████████████████ ████████ Uptime: 8h 8m
████████ ████████ Packages: 1277
████████ ████████ ████████ Shell: zsh 5.9
████████ ████████ ████████ Resolution: 1920x1080
████████ ████████ ████████ DE: KDE
████████ ████████ ████████ WM: KWin
████████ ████████ ████████ GTK Theme: Breeze-Dark [GTK2], Breeze [GTK3]
████████ ████████ ████████ Icon Theme: breeze
████████ ████████ ████████ Disk: 280G / 920G (32%)
████████ ████████ ████████ CPU: 13th Gen Intel Core i5-13420H @ 12x 4.6GHz [42.0°C]
████████ ████████ ████████ GPU: NVIDIA GeForce RTX 3050 6GB Laptop GPU
RAM: 3954MiB / 31822MiB
[Image: an example output of running Ollama]
Installation
Go to the Ollama download page, choose your operating system, and follow the installation instructions.
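On Linux, the install is a one-line script at the time of writing (the download page always shows the current command):

curl -fsSL https://ollama.com/install.sh | sh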
Usage
After the installation, you should be able to use it immediately. You can find all available models in the Ollama library (https://ollama.com/library).
We'll be using Llama 3 in this example; pull it with ollama pull llama3. You should see something like this:
ollama pull llama3
pulling manifest
pulling 6a0746a1ec1a... 100% ▕███████████████████████████████████████▏ 4.7 GB
pulling 4fa551d4f938... 100% ▕███████████████████████████████████████▏ 12 KB
pulling 8ab4849b038c... 100% ▕███████████████████████████████████████▏ 254 B
pulling 577073ffcc6c... 100% ▕███████████████████████████████████████▏ 110 B
pulling 3f8eb4da87fa... 100% ▕███████████████████████████████████████▏ 485 B
verifying sha256 digest
writing manifest
removing any unused layers
success
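You can check that the download worked with ollama list, which prints every model installed locally.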
Now you can run the llama3 model:

ollama run llama3

and ask it whatever you want.
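Instead of the interactive session, you can also pass the prompt directly on the command line for a one-shot answer:

ollama run llama3 "Why is the sky blue?"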
Next steps
With this running, the possibilities are endless 🚀
REST API
Ollama also exposes a REST API on port 11434; the full reference is at https://github.com/ollama/ollama/blob/main/docs/api.md. For example:
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama3",
"prompt":"Why is the sky blue?"
}'
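By default this endpoint streams its answer as newline-delimited JSON objects (pass "stream": false to get a single response instead). Here's a minimal sketch of consuming the stream from Python with the requests library, reusing the model and prompt from the curl example above:

import json
import requests

# Stream tokens from a locally running Ollama server.
with requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?"},
    stream=True,
) as r:
    r.raise_for_status()
    for line in r.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk carries the next piece of the answer in "response".
        print(chunk.get("response", ""), end="", flush=True)

Run it while the server is up and the answer prints piece by piece, just like in the interactive CLI.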
GitHub Copilot alternative
There's a VS Code extension called Continue that you can configure to use Ollama as your GitHub Copilot replacement.
After installing it, you should see a new icon in your sidebar where you can ask questions about your code (give it time to index).
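As an example of the wiring, at the time of writing you could point Continue at a local Ollama model with an entry like this in its config.json (the format may have changed, so check Continue's docs):

{
  "models": [
    {
      "title": "Llama 3",
      "provider": "ollama",
      "model": "llama3"
    }
  ]
}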
ChatGPT UI
You can run your own web UI with Retrieval-Augmented Generation (RAG) support and more.
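One popular option is Open WebUI; if you have Docker, a setup along these lines worked at the time of writing (check the Open WebUI README for the current command):

docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui ghcr.io/open-webui/open-webui:main

It talks to your local Ollama server and gives you a ChatGPT-style chat interface in the browser.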
SDKs
There are official SDKs for Python and JavaScript if you want to integrate the models into your existing apps.
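For instance, here's a minimal sketch using the official ollama Python package (pip install ollama); it assumes the Ollama server is running locally and the llama3 model is already pulled:

import ollama

# Send a single chat message to the local Ollama server.
response = ollama.chat(
    model="llama3",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])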
Conclusion
It's so easy to run a complex AI model locally nowadays. Have a look at Hugging Face as well.