Golang The Series EP.145: How to Run Llama 3 Locally with Ollama and Go

In our previous episode, we explored how to connect to GPT-4o via Cloud APIs. However, for projects that demand high privacy or long-term cost control, Local LLMs are often the better fit. Today, we’re going to transform your computer into a private AI server using Ollama, a tool that is free to run, keeps your data on your own machine, and works without an internet connection once a model has been downloaded.

How to Install Ollama on Windows/Mac to Run AI Models Locally

Ollama is currently one of the easiest tools for downloading and running Large Language Models (LLMs) such as Llama 3, Mistral, or Phi-3. It hides the underlying runtime and library setup, allowing us to focus entirely on building our applications.

Installation Steps:

Download: Visit ollama.com to download the installer (supports macOS, Linux, and Windows).
Install: Run the installer and follow the standard installation process.
Run Model: Open your Terminal and type the command to download and run your desired model:

Bash

ollama run llama3

Note: The first time you run this command, the system will download the model to your machine (approximately 4.7GB for Llama 3 8B). Once the download is complete, you can start chatting with the AI directly through your Terminal.

Connecting Ollama with Go: Commanding AI via Code

Once Ollama is installed and running, it exposes an HTTP API server on port 11434 (localhost:11434) by default. This allows us to write Go programs that interact with the model immediately.
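A quick way to confirm the server is up before writing any real code is to send a plain GET request to its root URL. The sketch below uses only the standard library. The helper name pingOllama and the 3-second timeout are illustrative choices, and the address assumes the default port mentioned above. A running server normally answers with a short plain-text status message.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
	"time"
)

// pingOllama (hypothetical helper) sends a GET request to the server root
// and returns whatever short text the server replies with.
func pingOllama(baseURL string) (string, error) {
	client := &http.Client{Timeout: 3 * time.Second} // illustrative timeout

	resp, err := client.Get(baseURL)
	if err != nil {
		return "", fmt.Errorf("server not reachable at %q: %w", baseURL, err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading reply: %w", err)
	}
	return strings.TrimSpace(string(body)), nil
}

func main() {
	// Assumes the default address; change it if your setup differs.
	msg, err := pingOllama("http://localhost:11434")
	if err != nil {
		fmt.Println("Ollama does not seem to be running:", err)
		return
	}
	fmt.Println("Server replied:", msg)
}
```

If this prints an error, start Ollama (or run a model from the Terminal once) and try again.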

While you could write custom net/http calls to hit the API directly, we will use the official Ollama library for speed, organization, and to keep our code looking like a true Gopher.
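For comparison, here is roughly what the custom net/http route mentioned above could look like. This is a minimal sketch, not the library's implementation: it posts to the /api/generate endpoint with streaming turned off and reads back only the response field. The type and helper names (generateRequest, encodeRequest, decodeReply, generateOnce), the timeout, and the sample prompt are all illustrative.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// generateRequest mirrors the few JSON fields this sketch sends.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"` // false asks for one complete JSON reply
}

// generateResponse keeps only the field this sketch reads back.
type generateResponse struct {
	Response string `json:"response"`
}

// encodeRequest (hypothetical helper) builds the JSON body for /api/generate.
func encodeRequest(model, prompt string) ([]byte, error) {
	return json.Marshal(generateRequest{Model: model, Prompt: prompt, Stream: false})
}

// decodeReply (hypothetical helper) extracts the generated text from a reply body.
func decodeReply(data []byte) (string, error) {
	var out generateResponse
	if err := json.Unmarshal(data, &out); err != nil {
		return "", fmt.Errorf("decoding reply: %w", err)
	}
	return out.Response, nil
}

// generateOnce posts a single prompt and returns the full answer.
func generateOnce(baseURL, model, prompt string) (string, error) {
	body, err := encodeRequest(model, prompt)
	if err != nil {
		return "", err
	}

	client := &http.Client{Timeout: 2 * time.Minute} // generation can be slow
	resp, err := client.Post(baseURL+"/api/generate", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", fmt.Errorf("sending request: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}

	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", fmt.Errorf("reading reply: %w", err)
	}
	return decodeReply(data)
}

func main() {
	answer, err := generateOnce("http://localhost:11434", "llama3", "Say hello in one sentence.")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(answer)
}
```

Even this small version has to define its own request and response types and handle status codes by hand, which is the bookkeeping the official package takes off your plate.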

Library Installation

Run this command in your terminal to install the official Ollama SDK:

Bash

go get github.com/ollama/ollama

Code Example: Prompting the Model (Streaming)

The core of the Ollama client in Go is the Generate method, which takes a callback function that is invoked for each chunk of the response as it arrives.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	// 1. Create a client; the address is read from the OLLAMA_HOST environment
	//    variable and falls back to the local default (127.0.0.1:11434)
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatalf("Could not create the Ollama client: %v", err)
	}

	ctx := context.Background()

	// 2. Define the request details
	req := &api.GenerateRequest{
		Model:  "llama3", // Specify the model name downloaded on your machine
		Prompt: "Why is Docker important for AI work? Summarize briefly.",
	}

	// 3. Callback function to handle the streaming response
	respFunc := func(resp api.GenerateResponse) error {
		// Print each response chunk immediately as it arrives
		fmt.Print(resp.Response)
		return nil
	}

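	// 4. Send the request; Generate returns once the stream ends or fails
	err = client.Generate(ctx, req, respFunc)
	fmt.Println() // finish the streamed line before printing anything else
	if err != nil {
		log.Fatalf("Generation failed: %v", err)
	}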
}

Why use this approach?

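api.ClientFromEnvironment(): Convenient because it reads the server address from the OLLAMA_HOST environment variable and falls back to the default local address (127.0.0.1:11434), so you don't have to hard-code the IP or port. Note that it only builds the client; a connection problem will surface later, when Generate is called.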
respFunc: The Ollama SDK is designed for streaming by default. Using this callback function ensures your application feels "smooth" and responsive, as it displays the AI's output piece by piece rather than waiting for the entire generation to finish.
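To make the "piece by piece" idea concrete: a streamed reply arrives as a sequence of small JSON objects, and the callback fires roughly once per object. The sketch below imitates that loop with the standard library on a hand-written sample, so it runs without a server. The chunk struct, readStream, joinStream, and the sample text are illustrative and are not the library's internals.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// chunk holds the two fields of a streamed object that this sketch uses.
type chunk struct {
	Response string `json:"response"`
	Done     bool   `json:"done"`
}

// readStream decodes one JSON object at a time and hands each text
// fragment to onChunk, stopping at the object marked done or at end of input.
func readStream(r io.Reader, onChunk func(text string) error) error {
	dec := json.NewDecoder(r)
	for {
		var c chunk
		if err := dec.Decode(&c); err == io.EOF {
			return nil
		} else if err != nil {
			return err
		}
		if err := onChunk(c.Response); err != nil {
			return err
		}
		if c.Done {
			return nil
		}
	}
}

// joinStream is the "wait for everything" alternative: it collects every
// fragment of a captured stream into one string before returning.
func joinStream(raw string) (string, error) {
	var sb strings.Builder
	err := readStream(strings.NewReader(raw), func(text string) error {
		sb.WriteString(text)
		return nil
	})
	return sb.String(), err
}

func main() {
	// Hand-written sample in the shape of a streamed reply (not real model output).
	sample := `{"response":"Go ","done":false}
{"response":"is ","done":false}
{"response":"fun.","done":true}
`
	// Streaming style: print each fragment the moment it is decoded.
	err := readStream(strings.NewReader(sample), func(text string) error {
		fmt.Print(text)
		return nil
	})
	fmt.Println()
	if err != nil {
		fmt.Println("stream error:", err)
		return
	}

	// Waiting style: nothing is visible until the whole reply is joined.
	full, err := joinStream(sample)
	if err != nil {
		fmt.Println("stream error:", err)
		return
	}
	fmt.Println(full)
}
```

Both styles end up with the same text. The difference is only when the user first sees something on screen, which matters most for long answers.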

Local vs. Cloud: Comparing Ollama and OpenAI—Which One Should You Choose?

Deciding whether to run a model locally (Ollama) or use an external service (OpenAI) doesn't have a single "right" answer. It depends entirely on the requirements of your specific project. Here is a head-to-head comparison to help you decide.

Feature	Local LLM (Ollama)	Cloud API (OpenAI)
Cost	No subscription or token fees (you still pay for hardware and electricity)	Pay-as-you-go based on tokens
Privacy	High (prompts and responses stay on your machine)	Data is sent to and processed by the provider
Internet	Works offline once the model is downloaded	Connection required for every request
Performance	Dependent on your GPU/RAM	Dependent on internet and server load
Intelligence	Moderate (Depends on the model size)	Very High (Access to massive models)
Scalability	Hard (Requires hardware upgrades)	Very Easy (Just request more quota)

💡 Quick Summary for Decision Making:

Choose Local (Ollama) when: You are working on projects with highly sensitive data (e.g., customer records, internal accounting), want to save on long-term costs, and have access to decent hardware (especially a dedicated GPU).
Choose Cloud (OpenAI) when: You need the highest possible intelligence (complex reasoning), want to avoid server setup/maintenance, or are building an app for a large number of concurrent users (High Scalability).

🎯 Daily Mission

To see this in action, I want everyone to try installing Ollama and downloading a "Small Language Model" (SLM) that isn't too hardware-demanding, such as phi3 or gemma:2b.

Homework Challenge:

Modify the Go code from the previous section to accept input via the Command Line using os.Args. This will allow you to pass your questions directly when running the program.

Goal: You should be able to run: go run main.go "Summarize Go's top 3 features"
Tip: Don't forget to check the number of arguments (using len(os.Args)) to prevent the program from crashing (Panic) if the user forgets to type a question!
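If you get stuck, here is one possible starting point for the argument handling only. It is a sketch, not the full solution: promptFromArgs is a hypothetical helper name, and wiring the result into the request's Prompt field is left to you.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// promptFromArgs (hypothetical helper) returns the question typed after the
// program name, or an error when nothing usable was provided.
func promptFromArgs(args []string) (string, error) {
	// args[0] is the program name, so a question needs at least two entries.
	// Reading args[1] without this check would panic with an index out of range.
	if len(args) < 2 {
		return "", errors.New(`usage: go run main.go "your question"`)
	}

	// Join the remaining entries so unquoted multi-word questions still work.
	prompt := strings.TrimSpace(strings.Join(args[1:], " "))
	if prompt == "" {
		return "", errors.New("the question must not be empty")
	}
	return prompt, nil
}

func main() {
	prompt, err := promptFromArgs(os.Args)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}

	// In the real program this value would go into the request's Prompt field.
	fmt.Println("Prompt to send:", prompt)
}
```

Keeping the check in its own function, separate from main, also makes it easy to try different argument lists without running the whole program.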

Conclusion: A Major Step Towards Autonomous AI

Choosing Ollama paired with Go is about more than just cost-cutting. It’s about opening the door to building applications that are truly portable and that keep their data entirely on the user's machine. Even without an internet connection, your application remains intelligent and ready to serve your users.

However, there is no perfect choice between Local and Cloud. As programmers, our responsibility is to select the right tool for the specific problem at hand and extract the maximum performance from it.

Coming Up Next | EP.146: Prompt Engineering for Gophers — Mastering AI Commands

Whether you choose a world-class model like GPT-4o or a local model via Ollama, the single factor that determines if the AI is worth its cost is the Prompt.

In the next episode, we’ll move beyond just connecting to commanding. We will dive deep into Prompt Engineering within code—techniques to control output precision and push the AI's efficiency to its absolute limit!

Follow Superdev Academy on all platforms:

🔵 Facebook: Superdev Academy Thailand
🎬 YouTube: Superdev Academy Channel
📸 Instagram: @superdevacademy
🎬 TikTok: @superdevacademy
🌐 Website: superdevacademy.com