Golang The Series EP.149: Token Management - Counting Tokens & API Cost Control

Welcome to EP.149! In our last episode, we successfully implemented a Streaming system to make our application feel lightning-fast. But as your system scales and your user base grows, there is one thing every Backend Gopher cannot afford to ignore: "Cost Control."

When we connect directly to Cloud LLMs like OpenAI, Anthropic, or Google Gemini, every cent spent is calculated based on "Tokens" (both Input and Output). Today, we’re going to learn how to count tokens directly within our Go backend before hitting the API. This will help us manage budgets, prevent bill shocks, and estimate costs closely before the invoice arrives.

What is a Token? Why Can't We Just Count Words?

Before diving into the code, we must understand that an LLM doesn't measure text the way our code usually does. In Go, len(text) returns the number of bytes (not characters), and a simple word count is how a human would size a sentence. Instead, the model breaks text down into units called "Tokens." Different languages have different "exchange rates":

English: 1 Token is roughly 4 characters long, or about one short word.
Thai: Due to the complex structure of consonants, vowels, and tone marks, a single Thai word can be broken into 2–4 tokens! This is why Thai language processing consumes significantly more tokens than English.
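
To make the point about len(text) concrete, here is a minimal sketch using only the standard library. The helper name ByteAndRuneCount and the sample strings are illustrative assumptions, not part of any tokenizer; neither number it returns is a token count.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// ByteAndRuneCount returns the UTF-8 byte length and the number of
// Unicode code points (runes) in s. Neither value is a token count.
func ByteAndRuneCount(s string) (byteLen, runeLen int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// Illustrative samples: a short English word and a short Thai word.
	for _, s := range []string{"Hello", "สวัสดี"} {
		b, r := ByteAndRuneCount(s)
		fmt.Printf("%s: %d bytes, %d runes\n", s, b, r)
	}
}
```

The English sample is 5 bytes and 5 runes, while the Thai sample is 18 bytes for only 6 runes, so byte length alone already misleads before tokenization even enters the picture.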

⚠️ The Risk: If your backend lacks a pre-submission token check and allows users to send unlimited long prompts, your end-of-month API bill could skyrocket beyond your budget before the team even realizes it.
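
A pre-submission check can be sketched as a small guard that runs before any API call. Everything named here (TokenCounter, RoughEnglishEstimate, CheckPrompt, ErrPromptTooLarge) is hypothetical. The stand-in counter applies the rough 4-characters-per-token rule for English and would undercount Thai, so a real tokenizer should be plugged in through the TokenCounter hook.

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// ErrPromptTooLarge is returned when a prompt exceeds the configured limit.
var ErrPromptTooLarge = errors.New("prompt exceeds token limit")

// TokenCounter is a hypothetical hook: plug in any real tokenizer here.
type TokenCounter func(text string) int

// RoughEnglishEstimate applies the ~4 characters per token rule of thumb.
// It is only a placeholder and will undercount Thai text.
func RoughEnglishEstimate(text string) int {
	return (utf8.RuneCountInString(text) + 3) / 4 // ceiling division
}

// CheckPrompt rejects a prompt before any API call is made.
func CheckPrompt(prompt string, maxTokens int, count TokenCounter) error {
	if n := count(prompt); n > maxTokens {
		return fmt.Errorf("%w: %d > %d", ErrPromptTooLarge, n, maxTokens)
	}
	return nil
}

func main() {
	// A deliberately tiny limit so the guard triggers in this demo.
	err := CheckPrompt("Summarize this document for me, please.", 5, RoughEnglishEstimate)
	fmt.Println(err)
}
```

Passing the counter in as a function keeps the guard independent of any one provider's tokenizer.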

Counting Tokens in Go with Tiktoken

OpenAI open-sourced a tool for token counting called tiktoken. For Go developers, we have an excellent community-ported library: pkoukk/tiktoken-go.

First, install or update the library to the latest version via your terminal:

go get github.com/pkoukk/tiktoken-go@latest

Go Implementation for Token Counting:

package main

import (
	"fmt"
	"log"

	"github.com/pkoukk/tiktoken-go"
)

func main() {
	text := "Hello! Welcome to the Go for AI-First era series."

	// Recommended: Use EncodingForModel so the library selects the correct tokenizer
	// Note: gpt-4o / gpt-4o-mini use the 'o200k_base' encoding (more efficient for Thai than the older cl100k_base)
	modelName := "gpt-4o"
	tkm, err := tiktoken.EncodingForModel(modelName)
	if err != nil {
		// Fallback: the installed library version may not know this model name yet
		tkm, err = tiktoken.GetEncoding("o200k_base")
		if err != nil {
			log.Fatalf("Failed to get encoding: %v", err)
		}
	}

	// Encode raw text into a slice of Token IDs
	tokens := tkm.Encode(text, nil, nil)

	// Results
	fmt.Printf("Text: %s\n", text)
	fmt.Printf("Character count (Runes): %d\n", len([]rune(text)))
	fmt.Printf("Actual Token count: %d tokens\n", len(tokens))
}

Note: Tiktoken is built for OpenAI models. If you are using Google Gemini or Anthropic Claude, use that provider's own tokenizer or token-counting API, because each provider splits text differently.

Backend Production Implementation Techniques

Once you have the token count, you can enhance your backend's security and efficiency in three critical ways:

Rate Limiting / Quota Guard: Create a middleware to check if a user has exceeded their token quota (e.g., per day or per hour) before sending the request to OpenAI. This caps the damage from bot spam or abusive traffic that could otherwise result in a massive surprise bill.
Managing the Context Window: In chatbots, we must send Chat History back to the AI for continuity. Counting tokens on the fly tells the backend exactly when to Truncate or summarize old history so the data doesn't exceed the model's Context Window limit.
Cost Analytics & Billing: Log the number of tokens (both Input and Output) into your database (MySQL/PostgreSQL) linked to the User ID. This data can be used to build a Pay-per-use Billing system or a real-time cost dashboard for your management team.
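
The first two techniques can be sketched with the standard library alone. The types below (Message, QuotaGuard) and the function TrimHistory are hypothetical, token counts are assumed to be precomputed by whatever tokenizer you use, and the quota is held in memory only, so a real deployment would likely need shared storage and a reset schedule.

```go
package main

import (
	"fmt"
	"sync"
)

// Message is a hypothetical chat message; Tokens holds a precomputed count.
type Message struct {
	Role    string
	Content string
	Tokens  int
}

// QuotaGuard tracks tokens used per user against a fixed limit (in memory).
type QuotaGuard struct {
	mu    sync.Mutex
	limit int
	used  map[string]int
}

func NewQuotaGuard(limit int) *QuotaGuard {
	return &QuotaGuard{limit: limit, used: make(map[string]int)}
}

// Reserve records tokens for userID and reports whether the request fits.
func (q *QuotaGuard) Reserve(userID string, tokens int) bool {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.used[userID]+tokens > q.limit {
		return false
	}
	q.used[userID] += tokens
	return true
}

// TrimHistory returns the most recent messages whose combined Tokens fit
// within budget, preserving their original order.
func TrimHistory(history []Message, budget int) []Message {
	total, start := 0, len(history)
	for i := len(history) - 1; i >= 0; i-- {
		if total+history[i].Tokens > budget {
			break
		}
		total += history[i].Tokens
		start = i
	}
	return history[start:]
}

func main() {
	guard := NewQuotaGuard(100)
	fmt.Println(guard.Reserve("user-1", 70), guard.Reserve("user-1", 40))

	history := []Message{{"user", "first", 50}, {"assistant", "second", 30}, {"user", "third", 40}}
	fmt.Println(len(TrimHistory(history, 80)))
}
```

TrimHistory simply drops the oldest messages; summarizing them instead, or always keeping a system prompt, would be a design choice layered on top of this.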

Example: Cost Calculation Struct

type APILog struct {
	UserID           string  `json:"user_id"`
	PromptTokens     int     `json:"prompt_tokens"`
	CompletionTokens int     `json:"completion_tokens"`
	TotalCostUSD     float64 `json:"total_cost_usd"`
}

func CalculateCost(prompt, completion int) float64 {
	// Example pricing for GPT-4o (Check OpenAI for latest rates)
	inputPricePerMillion := 5.00
	outputPricePerMillion := 15.00
	
	inputCost := (float64(prompt) / 1000000) * inputPricePerMillion
	outputCost := (float64(completion) / 1000000) * outputPricePerMillion
	
	return inputCost + outputCost
}

⚠️ Production Note for Gophers: In this example, we use float64 for simplicity. However, for real financial systems, never use float64 to calculate money, because binary floating point cannot represent most decimal fractions exactly and the rounding errors accumulate. Use a decimal library like shopspring/decimal for exact decimal arithmetic.
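
One dependency-free alternative to a decimal library is integer arithmetic in a small fixed unit. The sketch below is an assumption-laden illustration: it uses micro-USD (one millionth of a dollar) as the unit, reuses the example rates from the code above (which are not current pricing), and rounds each component down. The name CostMicroUSD is hypothetical.

```go
package main

import "fmt"

// Prices in micro-USD (1 USD = 1,000,000 micro-USD) per one million tokens.
// These mirror the example rates used earlier and are not current pricing.
const (
	inputMicroUSDPerMillion  int64 = 5_000_000  // $5.00
	outputMicroUSDPerMillion int64 = 15_000_000 // $15.00
)

// CostMicroUSD returns the request cost in micro-USD using integer math only.
// Each component is rounded down to a whole micro-USD.
func CostMicroUSD(promptTokens, completionTokens int64) int64 {
	in := promptTokens * inputMicroUSDPerMillion / 1_000_000
	out := completionTokens * outputMicroUSDPerMillion / 1_000_000
	return in + out
}

func main() {
	cost := CostMicroUSD(1200, 350)
	// Prints: 11250 micro-USD ($0.011250)
	fmt.Printf("%d micro-USD ($%d.%06d)\n", cost, cost/1_000_000, cost%1_000_000)
}
```

Storing the integer value in the database and formatting it as dollars only at display time keeps every stored amount exact.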

🎯 Daily Mission

Challenge yourself to prove this theory: Copy one Thai sentence and one English sentence of similar length, then run them through the token-counting code above.

Homework: Observe the results—how many times more tokens does Thai consume compared to English?
Bonus Question: If we implemented a "Stop Words Removal" function (stripping words like "is/am/are" or Thai particles like "ครับ/ค่ะ") before hitting the API, what percentage of costs could we save? Share your findings in the comments!

FAQ (Frequently Asked Questions)

Q: Why does my Tiktoken count slightly differ from OpenAI's actual billed tokens?

A: Tiktoken only counts raw text. OpenAI bills for the complete chat structure (System, User, Assistant wrappers) and metadata from features like Function Calling. These add a base overhead of roughly 3–4 tokens per message, making the final bill slightly higher than raw text tokenization.
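
As a rough sketch of that overhead, the estimate below adds a fixed number of tokens per message plus a few tokens that prime the reply. The constants follow the heuristic OpenAI has published for its chat models and should be treated as assumptions that can change per model. ChatMessage and EstimateChatTokens are hypothetical names, and count stands in for a real tokenizer call such as len(tkm.Encode(text, nil, nil)).

```go
package main

import (
	"fmt"
	"strings"
)

// ChatMessage is a hypothetical, minimal chat message.
type ChatMessage struct {
	Role    string
	Content string
}

// Overhead constants are assumptions based on OpenAI's published heuristic;
// verify them against the model you actually call.
const (
	tokensPerMessage = 3 // wrapper tokens around each message
	replyPriming     = 3 // tokens that prime the assistant's reply
)

// EstimateChatTokens adds structural overhead to the raw token counts.
// count is a hook for a real tokenizer.
func EstimateChatTokens(msgs []ChatMessage, count func(string) int) int {
	total := replyPriming
	for _, m := range msgs {
		total += tokensPerMessage + count(m.Role) + count(m.Content)
	}
	return total
}

func main() {
	// Stand-in counter for the demo: one token per whitespace-separated word.
	wordCount := func(s string) int { return len(strings.Fields(s)) }
	msgs := []ChatMessage{
		{Role: "system", Content: "You are a helpful assistant."},
		{Role: "user", Content: "Hello there"},
	}
	fmt.Println(EstimateChatTokens(msgs, wordCount))
}
```

Function-calling definitions add further tokens that this sketch does not model, so the usage figures returned by the API remain the source of truth for billing.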

Q: Can I use Tiktoken to calculate token usage for Claude or Gemini?

A: No. Every LLM provider uses its own tokenizer and vocabulary, so using Tiktoken for Anthropic or Google models will produce inaccurate results. For Claude or Gemini, prefer the token-counting API endpoints those providers offer; a generic tokenizer library only helps if the provider has published the tokenizer for the exact model you call.

Q: How do I calculate completion tokens on the backend when using Streaming mode?

A: You can either concatenate all received message chunks into a single string after the stream ends and pass it through tkm.Encode(), or (recommended) set include_usage to true inside the stream_options object of your stream request, so the API returns the token usage in a final chunk at the end of the stream.
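
Both options can be sketched without any third-party package. The struct below models only the request fields relevant here (a real request also carries the messages array), with the usage flag nested under stream_options as in OpenAI's Chat Completions API. BuildStreamRequest and CountStreamed are hypothetical helpers, and count stands in for a real tokenizer.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// StreamOptions and ChatRequest model only the fields relevant to usage reporting.
type StreamOptions struct {
	IncludeUsage bool `json:"include_usage"`
}

type ChatRequest struct {
	Model         string         `json:"model"`
	Stream        bool           `json:"stream"`
	StreamOptions *StreamOptions `json:"stream_options,omitempty"`
}

// BuildStreamRequest returns a JSON body that asks for usage in the final chunk.
func BuildStreamRequest(model string) (string, error) {
	body, err := json.Marshal(ChatRequest{
		Model:         model,
		Stream:        true,
		StreamOptions: &StreamOptions{IncludeUsage: true},
	})
	return string(body), err
}

// CountStreamed joins the chunks received from a stream and counts the result.
// count is a hook for a real tokenizer.
func CountStreamed(chunks []string, count func(string) int) int {
	var sb strings.Builder
	for _, c := range chunks {
		sb.WriteString(c)
	}
	return count(sb.String())
}

func main() {
	body, err := BuildStreamRequest("gpt-4o")
	if err != nil {
		panic(err)
	}
	// Prints: {"model":"gpt-4o","stream":true,"stream_options":{"include_usage":true}}
	fmt.Println(body)

	// Stand-in counter for the demo: one token per whitespace-separated word.
	words := func(s string) int { return len(strings.Fields(s)) }
	fmt.Println(CountStreamed([]string{"Hel", "lo ", "wor", "ld"}, words))
}
```

Counting the joined text rather than each chunk separately matters because a token can span a chunk boundary.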

Conclusion

Congratulations! You’ve now collected the essential puzzle pieces: Docker, REST vs RPC, Local/Cloud LLM, Prompts, JSON Output, Streams, and now Cost Control.

Coming up next in EP.150: We are bringing all these pieces together for "Workshop 1: Building a Simple AI Chatbot Server with Gin Framework." It’s time to launch the biggest project of the season. Don't miss it!

Follow Superdev Academy on all platforms:

🔵 Facebook: Superdev Academy Thailand
🎬 YouTube: Superdev Academy Channel
📸 Instagram: @superdevacademy
🎬 TikTok: @superdevacademy
🌐 Website: superdevacademy.com