
30/06/2026 04:01am

Semantic Search Pipeline Diagram using Go, OpenAI Embedding, and Qdrant DB

Golang The Series EP.156: Semantic Search with Qdrant & OpenAI

#Golang

#RAG

#Vector Database

#Qdrant

#Go

#Semantic Search

Welcome to EP.156! In the last episode, we covered strategic chunking to keep our text segments contextually rich. Today, the pieces we’ve been assembling since the start of this season come together in Semantic Search: a search system that matches the intent and meaning behind a query rather than just its raw keywords.

Imagine a user searches for "how to fix internet connection issues". A traditional keyword search would miss a troubleshooting guide titled "Router Setup and Wi-Fi Dropouts Guide", simply because the words "internet" and "connection" never appear in it. Semantic search can still surface that guide, because the embeddings of the two phrases sit close together in vector space: both describe the same underlying problem.
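To make the keyword failure concrete, here is a minimal sketch of naive term matching: lowercase both strings, split on anything that isn't a letter or digit, and count shared terms. The tokenizer is a deliberate simplification (real keyword engines such as BM25 add stemming, stop words, and term weighting), and `keywordOverlap` is a hypothetical helper used only for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// tokenize lowercases text and splits it on anything that is not a letter or digit,
// roughly what a naive keyword index does before matching terms.
func tokenize(text string) map[string]bool {
	fields := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	set := make(map[string]bool, len(fields))
	for _, f := range fields {
		set[f] = true
	}
	return set
}

// keywordOverlap counts the distinct terms shared by a query and a document title.
func keywordOverlap(query, doc string) int {
	q, d := tokenize(query), tokenize(doc)
	shared := 0
	for term := range q {
		if d[term] {
			shared++
		}
	}
	return shared
}

func main() {
	query := "how to fix internet connection issues"
	title := "Router Setup and Wi-Fi Dropouts Guide"
	fmt.Println("shared keywords:", keywordOverlap(query, title))
}
```

Zero shared terms means a pure keyword matcher has nothing to rank this guide on, even though it answers the question.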

Let’s dive into Go and build this intelligent search pipeline.

The Semantic Search Pipeline Under the Hood

When a query hits our backend, the system processes it through 3 core steps:

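  1. Query Embedding: The raw user query (e.g., "how do I reset my password") is sent to OpenAI's Embeddings API, which converts it into a 1,536-dimensional vector ([]float32 in Go). The query must be embedded with the same model used for the stored chunks; otherwise the vectors are not comparable.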

  2. Vector Similarity Search: The query vector is sent to our Qdrant collection, which runs an approximate nearest-neighbor search (using its HNSW index) to find the stored chunks whose vectors are closest to the query's meaning.

  3. Payload Extraction: We extract the raw text and metadata stored within Qdrant's payload to serve it back to the user or pass it to an LLM for response generation in the next step.

Building Semantic Search in Go with Qdrant

Here is a practical Go example that takes a user query, converts it into an embedding, and queries the Qdrant collection we set up back in EP.154.

Go

package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/qdrant/go-client/qdrant"
	"github.com/sashabaranov/go-openai"
)

func main() {
	ctx := context.Background()

	// 1. Initialize the OpenAI and Qdrant clients.
	// Read the API key from the environment instead of hard-coding it.
	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		log.Fatal("OPENAI_API_KEY is not set")
	}
	openaiClient := openai.NewClient(apiKey)
	qdrantClient, err := qdrant.NewClient(&qdrant.Config{
		Host: "localhost",
		Port: 6334, // Qdrant's gRPC port; the Go client talks gRPC, not REST (6333)
	})
	if err != nil {
		log.Fatalf("Failed to connect to Qdrant: %v", err)
	}
	defer qdrantClient.Close()

	// Simulate a user query (Thai for "Where do I go to change my system password?")
	userQuery := "อยากเปลี่ยนพาสเวิร์ดระบบต้องทำตรงไหน"

	// 2. Transform the user query into a vector (Query Embedding).
	// Use the same model that embedded the stored chunks, or the scores are meaningless.
	embReq := openai.EmbeddingRequest{
		Input: []string{userQuery},
		Model: openai.SmallEmbedding3, // text-embedding-3-small, 1,536 dimensions by default
	}
	embResp, err := openaiClient.CreateEmbeddings(ctx, embReq)
	if err != nil {
		log.Fatalf("Failed to create query embedding: %v", err)
	}
	if len(embResp.Data) == 0 {
		log.Fatal("Embedding API returned no data")
	}
	queryVector := embResp.Data[0].Embedding // []float32

	// 3. Execute Semantic Search in Qdrant
	searchLimit := uint64(3) // Fetch the top 3 closest matches
	searchResp, err := qdrantClient.Query(ctx, &qdrant.QueryPoints{
		CollectionName: "ai_knowledge_base",
		Query:          qdrant.NewQuery(queryVector...), // Spread the []float32 into the variadic argument
		Limit:          &searchLimit,
		WithPayload:    qdrant.NewWithPayload(true), // Payloads are not returned unless requested
	})
	if err != nil {
		log.Fatalf("Qdrant search query failed: %v", err)
	}

	// 4. Extract and display the payload data
	fmt.Printf("🔍 Search results for: '%s'\n\n", userQuery)
	for i, point := range searchResp {
		// Safely extract the original text from the "content" key
		contentValue, exists := point.Payload["content"]
		if !exists {
			continue
		}

		content := contentValue.GetStringValue()
		score := point.Score // Cosine similarity if the collection uses Distance_Cosine (closer to 1.0 = more relevant)

		fmt.Printf("[%d] Similarity Score: %.4f\n", i+1, score)
		fmt.Printf("   Document Content: %s\n\n", content)
	}
}

Why Semantic Search with Go is Incredibly Powerful

  • Low-Latency Serialization: Qdrant's Go client talks to the server over gRPC, so vectors travel as compact Protocol Buffers binary (4 bytes per float32) instead of being formatted into and parsed out of JSON text. That removes a real serialization cost for large vectors, and a well-provisioned collection often answers in single-digit milliseconds, though actual latency depends on collection size, index settings, and network distance.

  • Ready for RAG: The raw text chunks we pulled based on their similarity scores are the exact "reference material" we need. In the upcoming RAG phase, we will bundle these chunks with the original user query and feed them to an LLM to generate precise, grounded answers.

🎯 Daily Mission

Try integrating this Semantic Search logic into the Gin Web Server we built together back in Workshop EP.150. Instead of having an endpoint that serves static or randomized text, turn it into an intelligent document retrieval API.

💡 Food for Thought: Test out queries that match your documents word-for-word versus queries using synonyms, and watch how the Score changes. If you were deploying this to a production environment, what minimum similarity score (Threshold) would you enforce to filter out irrelevant noise? Go ahead and experiment!

💬 FAQ

What similarity score should I consider "relevant enough" for real-world use?

There is no universal cutoff. For OpenAI's text-embedding-3-small with cosine similarity, relevant matches on Thai content typically score somewhere around 0.45 to 0.60 or higher, depending heavily on chunk size and query length. A threshold of 0.50 is a reasonable starting point, and results below it can usually be discarded as noise, but tune the value against a small set of real queries with known correct answers before relying on it.
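Applying the threshold client-side is a simple filter. This is a sketch assuming higher scores mean more similar (true for a Cosine-distance collection); the hits below are made-up examples, not measured scores. If you'd rather let Qdrant drop low scores server-side, the `QueryPoints` request also has a `ScoreThreshold` field (a `*float32`) intended for this; check it against your client version.

```go
package main

import "fmt"

// Hit pairs a similarity score with the retrieved chunk text.
type Hit struct {
	Score   float32
	Content string
}

// filterByThreshold keeps hits whose score is at least minScore, preserving order.
// Assumes higher scores mean more similar (Cosine distance in Qdrant).
func filterByThreshold(hits []Hit, minScore float32) []Hit {
	kept := make([]Hit, 0, len(hits))
	for _, h := range hits {
		if h.Score >= minScore {
			kept = append(kept, h)
		}
	}
	return kept
}

func main() {
	// Illustrative scores only.
	hits := []Hit{
		{Score: 0.71, Content: "Reset your password in Settings > Security"},
		{Score: 0.52, Content: "Account recovery via email"},
		{Score: 0.31, Content: "Office parking policy"},
	}
	for _, h := range filterByThreshold(hits, 0.50) {
		fmt.Printf("%.2f %s\n", h.Score, h.Content)
	}
}
```

When every hit falls below the threshold, returning an explicit "no relevant documents" response is usually better than passing weak matches to an LLM.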

Can we combine traditional keyword search and semantic search in a live system?

Yes. Combining them is a common production pattern called Hybrid Search: you run a keyword-based ranking (such as BM25) alongside vector search, then merge the two result lists with a fusion method such as RRF (Reciprocal Rank Fusion). Qdrant's Query API supports this natively: a single request can prefetch candidates from a sparse-vector index (for example, BM25-style sparse vectors) and a dense-vector index, then fuse them with RRF.


📝 Wrap-up

Stepping into semantic search brings us closer to building context-aware AI applications that genuinely understand what users are looking for, moving far beyond fragile keyword matching.

Coming up next in EP.157: Our search system is ready and our vector database is smart, but in reality, nobody manually types sentences into a payload map. Company documentation lives in dense, multi-page PDFs and Word files. In the next episode, we will build an automated "Document Ingestion Pipeline: Ingesting Data from PDFs/Word Documents Directly to Your Vector DB". Stay tuned!

Follow Superdev Academy on all platforms: