Building a RAG System with Golang: A Complete Guide from the OpenAI API to Vector Databases
📖 Introduction
RAG (Retrieval-Augmented Generation) has become one of the hottest AI application architectures in 2024-2025. By combining external knowledge base retrieval with large language model generation capabilities, it effectively addresses LLM hallucination issues and knowledge timeliness limitations.
This article will detail how to build a complete RAG system using Golang, including:
- 🔍 Document processing and chunking strategies
- 🧠 Embedding implementation
- 📊 Vector database integration (Qdrant)
- 🔎 Semantic search optimization
- 💬 Integration with OpenAI API for answer generation
- 🚀 Production environment best practices
🎯 What is RAG?
RAG Workflow
User Question → Vectorization → Semantic Search → Retrieve Relevant Documents → Build Prompt → LLM Generates Answer
RAG vs Traditional LLM
| Feature | Traditional LLM | RAG System |
|---|---|---|
| Knowledge Source | Training data (static) | External knowledge base (dynamic) |
| Timeliness | Poor (training time point) | Good (real-time updates) |
| Hallucination | Severe | Significantly reduced |
| Cost | High (stuffing knowledge into prompts consumes many tokens) | Medium (only relevant retrieved content enters the prompt) |
| Traceability | None | Yes (can cite sources) |
🛠️ Technology Stack Selection
Core Components
// We will use the following technology stack
- Go 1.21+ // Main development language
- OpenAI API (gpt-4) // LLM service
- text-embedding-ada-002 // Embedding model
- Qdrant // Vector database
- gin // Web framework
- go-openai // OpenAI Go SDK
Why Choose Golang?
- High Performance: Concurrent processing of large volumes of documents
- Simple Deployment: Single binary file
- Excellent Concurrency Model: Goroutines make parallel tasks easy (see the sketch after this list)
- Rich Ecosystem: AI-related libraries maturing rapidly
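To make the concurrency claim concrete, here is a minimal sketch of fanning embedding calls out to a bounded pool of goroutines. It assumes a hypothetical embedText wrapper around an embeddings API and uses golang.org/x/sync/errgroup:
import (
	"context"

	"golang.org/x/sync/errgroup"
)

// embedAll fans out embedding calls across a bounded worker pool.
// embedText is a hypothetical single-text embedding function.
func embedAll(ctx context.Context, texts []string, embedText func(context.Context, string) ([]float32, error)) ([][]float32, error) {
	out := make([][]float32, len(texts))
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(8) // cap in-flight API calls to respect rate limits
	for i, text := range texts {
		i, text := i, text // capture loop variables (needed before Go 1.22)
		g.Go(func() error {
			emb, err := embedText(ctx, text)
			if err != nil {
				return err // cancels the remaining goroutines via ctx
			}
			out[i] = emb
			return nil
		})
	}
	return out, g.Wait()
}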
📦 Environment Setup
1. Install Dependencies
# Initialize project
mkdir golang-rag-system
cd golang-rag-system
go mod init github.com/yourusername/golang-rag
# Install core dependencies
go get github.com/sashabaranov/go-openai
go get github.com/qdrant/go-client
go get github.com/gin-gonic/gin
go get github.com/joho/godotenv
2. Configure Environment Variables
# .env
OPENAI_API_KEY=sk-xxxxxxxxxx
# Qdrant gRPC endpoint (the REST API listens on 6333)
QDRANT_URL=localhost:6334
QDRANT_API_KEY=your-qdrant-key
COLLECTION_NAME=documents
3. Start Qdrant (using Docker)
docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage:z \
qdrant/qdrant
💻 Core Implementation
1. Project Structure
golang-rag/
├── cmd/
│ └── main.go # Entry file
├── internal/
│ ├── embedding/ # Embedding generation
│ │ └── openai.go
│ ├── vectordb/ # Vector database operations
│ │ └── qdrant.go
│ ├── chunker/ # Document chunking
│ │ └── text_splitter.go
│ ├── retriever/ # Retriever
│ │ └── retriever.go
│ └── rag/ # RAG core logic
│ └── pipeline.go
├── pkg/
│ └── models/ # Data models
│ └── document.go
├── .env
├── go.mod
└── go.sum
2. Data Model Definition
// pkg/models/document.go
package models
type Document struct {
ID string `json:"id"`
Content string `json:"content"`
Metadata map[string]interface{} `json:"metadata"`
Embedding []float32 `json:"embedding,omitempty"`
}
type SearchResult struct {
Document Document `json:"document"`
Score float64 `json:"score"`
}
type RAGResponse struct {
Answer string `json:"answer"`
Sources []SearchResult `json:"sources"`
TokenUsed int `json:"token_used"`
}
3. Embedding Generator
// internal/embedding/openai.go
package embedding
import (
"context"
"fmt"
"github.com/sashabaranov/go-openai"
)
type EmbeddingService struct {
client *openai.Client
model openai.EmbeddingModel
}
func NewEmbeddingService(apiKey string) *EmbeddingService {
return &EmbeddingService{
client: openai.NewClient(apiKey),
model: openai.AdaEmbeddingV2,
}
}
// GenerateEmbedding generates embedding for a single text
func (s *EmbeddingService) GenerateEmbedding(ctx context.Context, text string) ([]float32, error) {
req := openai.EmbeddingRequest{
Input: []string{text},
Model: s.model,
}
resp, err := s.client.CreateEmbeddings(ctx, req)
if err != nil {
return nil, fmt.Errorf("failed to create embedding: %w", err)
}
if len(resp.Data) == 0 {
return nil, fmt.Errorf("no embedding data returned")
}
return resp.Data[0].Embedding, nil
}
// BatchGenerateEmbeddings generates embeddings in batch (performance optimization)
func (s *EmbeddingService) BatchGenerateEmbeddings(ctx context.Context, texts []string) ([][]float32, error) {
// The embeddings endpoint accepts up to 2048 inputs per request;
// a conservative batch of 100 keeps payloads small
const batchSize = 100
var allEmbeddings [][]float32
for i := 0; i < len(texts); i += batchSize {
end := i + batchSize
if end > len(texts) {
end = len(texts)
}
batch := texts[i:end]
req := openai.EmbeddingRequest{
Input: batch,
Model: s.model,
}
resp, err := s.client.CreateEmbeddings(ctx, req)
if err != nil {
return nil, fmt.Errorf("batch embedding failed: %w", err)
}
for _, data := range resp.Data {
allEmbeddings = append(allEmbeddings, data.Embedding)
}
}
return allEmbeddings, nil
}
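A quick usage sketch of the batch API (hypothetical wiring; assumes OPENAI_API_KEY is set as in the setup section):
// Illustrative: embed two chunks in a single batched request
svc := embedding.NewEmbeddingService(os.Getenv("OPENAI_API_KEY"))
vecs, err := svc.BatchGenerateEmbeddings(context.Background(), []string{
	"Golang is an open-source programming language.",
	"Qdrant is a vector database.",
})
if err != nil {
	log.Fatal(err)
}
log.Printf("%d embeddings, %d dimensions each", len(vecs), len(vecs[0])) // 2, 1536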
4. Text Chunker
// internal/chunker/text_splitter.go
package chunker
import (
"strings"
"unicode/utf8"
)
type TextSplitter struct {
ChunkSize int // Characters per chunk
ChunkOverlap int // Overlap characters
}
func NewTextSplitter(chunkSize, overlap int) *TextSplitter {
return &TextSplitter{
ChunkSize: chunkSize,
ChunkOverlap: overlap,
}
}
// SplitText splits long text into multiple chunks
func (ts *TextSplitter) SplitText(text string) []string {
if utf8.RuneCountInString(text) <= ts.ChunkSize {
return []string{text}
}
var chunks []string
runes := []rune(text)
start := 0
for start < len(runes) {
end := start + ts.ChunkSize
if end > len(runes) {
end = len(runes)
}
// Try to split at sentence boundaries
if end < len(runes) {
// Find the nearest period, question mark, or newline
for i := end; i > start+ts.ChunkSize/2; i-- {
if runes[i] == '。' || runes[i] == '?' || runes[i] == '\n' || runes[i] == '.' {
end = i + 1
break
}
}
}
chunk := string(runes[start:end])
chunks = append(chunks, strings.TrimSpace(chunk))
// Stop after the final chunk so the tail is not re-processed endlessly
if end == len(runes) {
break
}
// Calculate next start position (considering overlap),
// making sure the window always advances
next := end - ts.ChunkOverlap
if next <= start {
next = end
}
start = next
}
return chunks
}
// SplitByParagraph splits by paragraph (suitable for structured documents)
func (ts *TextSplitter) SplitByParagraph(text string) []string {
paragraphs := strings.Split(text, "\n\n")
var chunks []string
currentChunk := ""
for _, para := range paragraphs {
para = strings.TrimSpace(para)
if para == "" {
continue
}
if utf8.RuneCountInString(currentChunk+para) <= ts.ChunkSize {
if currentChunk != "" {
currentChunk += "\n\n"
}
currentChunk += para
} else {
if currentChunk != "" {
chunks = append(chunks, currentChunk)
}
// If a single paragraph exceeds ChunkSize, split it further
if utf8.RuneCountInString(para) > ts.ChunkSize {
chunks = append(chunks, ts.SplitText(para)...)
currentChunk = "" // reset: the flushed chunk must not be appended again
} else {
currentChunk = para
}
}
}
if currentChunk != "" {
chunks = append(chunks, currentChunk)
}
return chunks
}
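A quick sanity check of the splitter; longDocument stands in for any long input string, and the sizes match the configuration used later in main.go:
// Illustrative: 500-rune chunks with a 50-rune overlap
splitter := chunker.NewTextSplitter(500, 50)
for i, c := range splitter.SplitText(longDocument) {
	fmt.Printf("chunk %d: %d runes\n", i, utf8.RuneCountInString(c))
}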
5. Vector Database Operations
// internal/vectordb/qdrant.go
package vectordb
import (
"context"
"fmt"
"github.com/google/uuid"
pb "github.com/qdrant/go-client/qdrant"
"google.golang.org/grpc"
"google.golang.org/grpc/credentials/insecure"
"your-project/pkg/models"
)
type QdrantClient struct {
client pb.PointsClient
collectionName string
}
// NewQdrantClient dials Qdrant's gRPC endpoint (e.g. localhost:6334)
func NewQdrantClient(address, collectionName string) (*QdrantClient, error) {
conn, err := grpc.Dial(address, grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
return nil, fmt.Errorf("failed to connect to Qdrant: %w", err)
}
return &QdrantClient{
client: pb.NewPointsClient(conn),
collectionName: collectionName,
}, nil
}
// UpsertDocuments inserts or updates documents
func (q *QdrantClient) UpsertDocuments(ctx context.Context, docs []models.Document) error {
points := make([]*pb.PointStruct, 0, len(docs))
for _, doc := range docs {
if doc.ID == "" {
doc.ID = uuid.New().String()
}
// Convert metadata to payload
payload := make(map[string]*pb.Value)
payload["content"] = &pb.Value{
Kind: &pb.Value_StringValue{StringValue: doc.Content},
}
// Note: only string metadata values are persisted here; other types are skipped
for k, v := range doc.Metadata {
if strVal, ok := v.(string); ok {
payload[k] = &pb.Value{
Kind: &pb.Value_StringValue{StringValue: strVal},
}
}
}
point := &pb.PointStruct{
Id: &pb.PointId{
PointIdOptions: &pb.PointId_Uuid{Uuid: doc.ID},
},
Vectors: &pb.Vectors{
VectorsOptions: &pb.Vectors_Vector{
Vector: &pb.Vector{Data: doc.Embedding},
},
},
Payload: payload,
}
points = append(points, point)
}
_, err := q.client.Upsert(ctx, &pb.UpsertPoints{
CollectionName: q.collectionName,
Points: points,
})
return err
}
// Search performs semantic search
func (q *QdrantClient) Search(ctx context.Context, queryVector []float32, topK int) ([]models.SearchResult, error) {
resp, err := q.client.Search(ctx, &pb.SearchPoints{
CollectionName: q.collectionName,
Vector: queryVector,
Limit: uint64(topK),
WithPayload: &pb.WithPayloadSelector{SelectorOptions: &pb.WithPayloadSelector_Enable{Enable: true}},
})
if err != nil {
return nil, fmt.Errorf("search failed: %w", err)
}
results := make([]models.SearchResult, 0, len(resp.Result))
for _, point := range resp.Result {
content := ""
metadata := make(map[string]interface{})
if payload := point.Payload; payload != nil {
if contentVal, ok := payload["content"]; ok {
if strVal := contentVal.GetStringValue(); strVal != "" {
content = strVal
}
}
for k, v := range payload {
if k != "content" {
if strVal := v.GetStringValue(); strVal != "" {
metadata[k] = strVal
}
}
}
}
results = append(results, models.SearchResult{
Document: models.Document{
ID: point.Id.GetUuid(),
Content: content,
Metadata: metadata,
},
Score: float64(point.Score),
})
}
return results, nil
}
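The client above assumes the collection already exists. A minimal sketch of creating it through the gRPC collections service (field names follow the qdrant/go-client protobufs; 1536 matches text-embedding-ada-002):
// EnsureCollection creates the collection with the expected vector size.
// The CollectionsClient can share the gRPC connection from NewQdrantClient:
//   collections := pb.NewCollectionsClient(conn)
func EnsureCollection(ctx context.Context, collections pb.CollectionsClient, name string) error {
	_, err := collections.Create(ctx, &pb.CreateCollection{
		CollectionName: name,
		VectorsConfig: &pb.VectorsConfig{
			Config: &pb.VectorsConfig_Params{
				Params: &pb.VectorParams{
					Size:     1536, // text-embedding-ada-002 dimension
					Distance: pb.Distance_Cosine,
				},
			},
		},
	})
	return err // Create fails if the collection already exists; check first in production
}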
6. RAG Core Pipeline
// internal/rag/pipeline.go
package rag
import (
"context"
"fmt"
"strings"
"github.com/sashabaranov/go-openai"
"your-project/internal/embedding"
"your-project/internal/vectordb"
"your-project/pkg/models"
)
type RAGPipeline struct {
embeddingService *embedding.EmbeddingService
vectorDB *vectordb.QdrantClient
llmClient *openai.Client
topK int
}
func NewRAGPipeline(
embService *embedding.EmbeddingService,
vdb *vectordb.QdrantClient,
openaiKey string,
topK int,
) *RAGPipeline {
return &RAGPipeline{
embeddingService: embService,
vectorDB: vdb,
llmClient: openai.NewClient(openaiKey),
topK: topK,
}
}
// Query executes RAG query
func (r *RAGPipeline) Query(ctx context.Context, question string) (*models.RAGResponse, error) {
// 1. Vectorize the question
queryVector, err := r.embeddingService.GenerateEmbedding(ctx, question)
if err != nil {
return nil, fmt.Errorf("failed to generate query embedding: %w", err)
}
// 2. Retrieve relevant documents
searchResults, err := r.vectorDB.Search(ctx, queryVector, r.topK)
if err != nil {
return nil, fmt.Errorf("failed to search documents: %w", err)
}
if len(searchResults) == 0 {
return &models.RAGResponse{
Answer: "Sorry, I couldn't find any relevant information.",
Sources: []models.SearchResult{},
}, nil
}
// 3. Build context
context := r.buildContext(searchResults)
// 4. Call LLM to generate answer
answer, tokenUsed, err := r.generateAnswer(ctx, question, context)
if err != nil {
return nil, fmt.Errorf("failed to generate answer: %w", err)
}
return &models.RAGResponse{
Answer: answer,
Sources: searchResults,
TokenUsed: tokenUsed,
}, nil
}
// buildContext builds context prompt
func (r *RAGPipeline) buildContext(results []models.SearchResult) string {
var sb strings.Builder
sb.WriteString("Here is the relevant reference information:\n\n")
for i, result := range results {
sb.WriteString(fmt.Sprintf("【Reference %d】\n%s\n\n", i+1, result.Document.Content))
}
return sb.String()
}
// generateAnswer calls LLM to generate final answer
func (r *RAGPipeline) generateAnswer(ctx context.Context, question, context string) (string, int, error) {
prompt := fmt.Sprintf(`You are a professional technical assistant. Please answer the user's question based on the following reference information.
%s
User Question: %s
Please note:
1. Answer only based on the reference information, do not fabricate content
2. If the reference information is insufficient, please clearly state so
3. Answers should be accurate, professional, and easy to understand
4. You can cite reference information numbers
Your answer:`, context, question)
req := openai.ChatCompletionRequest{
Model: openai.GPT4TurboPreview,
Messages: []openai.ChatCompletionMessage{
{
Role: openai.ChatMessageRoleSystem,
Content: "You are a professional technical assistant who excels at accurately answering questions based on provided reference materials.",
},
{
Role: openai.ChatMessageRoleUser,
Content: prompt,
},
},
Temperature: 0.7,
MaxTokens: 1000,
}
resp, err := r.llmClient.CreateChatCompletion(ctx, req)
if err != nil {
return "", 0, err
}
if len(resp.Choices) == 0 {
return "", 0, fmt.Errorf("no response from LLM")
}
return resp.Choices[0].Message.Content, resp.Usage.TotalTokens, nil
}
7. HTTP API Interface
// cmd/main.go
package main
import (
"context"
"log"
"net/http"
"os"
"github.com/gin-gonic/gin"
"github.com/joho/godotenv"
"your-project/internal/chunker"
"your-project/internal/embedding"
"your-project/internal/rag"
"your-project/internal/vectordb"
"your-project/pkg/models"
)
type Server struct {
ragPipeline *rag.RAGPipeline
embService *embedding.EmbeddingService
vectorDB *vectordb.QdrantClient
chunker *chunker.TextSplitter
}
func main() {
// Load environment variables
if err := godotenv.Load(); err != nil {
log.Println("Warning: .env file not found")
}
// Initialize services
embService := embedding.NewEmbeddingService(os.Getenv("OPENAI_API_KEY"))
vectorDB, err := vectordb.NewQdrantClient(
os.Getenv("QDRANT_URL"),
os.Getenv("COLLECTION_NAME"),
)
if err != nil {
log.Fatal("Failed to connect to Qdrant:", err)
}
ragPipeline := rag.NewRAGPipeline(embService, vectorDB, os.Getenv("OPENAI_API_KEY"), 5)
textSplitter := chunker.NewTextSplitter(500, 50)
server := &Server{
ragPipeline: ragPipeline,
embService: embService,
vectorDB: vectorDB,
chunker: textSplitter,
}
// Setup routes
r := gin.Default()
r.POST("/ingest", server.handleIngest)
r.POST("/query", server.handleQuery)
r.GET("/health", func(c *gin.Context) {
c.JSON(http.StatusOK, gin.H{"status": "healthy"})
})
log.Println("Server starting on :8080")
if err := r.Run(":8080"); err != nil {
log.Fatal("Failed to start server:", err)
}
}
// handleIngest handles document ingestion
func (s *Server) handleIngest(c *gin.Context) {
var req struct {
Text string `json:"text" binding:"required"`
Metadata map[string]interface{} `json:"metadata"`
}
if err := c.ShouldBindJSON(&req); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
ctx := c.Request.Context() // inherit cancellation from the HTTP request
// 1. Chunk the text
chunks := s.chunker.SplitText(req.Text)
// 2. Generate embeddings
embeddings, err := s.embService.BatchGenerateEmbeddings(ctx, chunks)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to generate embeddings"})
return
}
// 3. Build documents
docs := make([]models.Document, len(chunks))
for i, chunk := range chunks {
docs[i] = models.Document{
Content: chunk,
Metadata: req.Metadata,
Embedding: embeddings[i],
}
}
// 4. Store in vector database
if err := s.vectorDB.UpsertDocuments(ctx, docs); err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": "Failed to store documents"})
return
}
c.JSON(http.StatusOK, gin.H{
"message": "Documents ingested successfully",
"chunk_count": len(chunks),
})
}
// handleQuery handles query requests
func (s *Server) handleQuery(c *gin.Context) {
var req struct {
Question string `json:"question" binding:"required"`
}
if err := c.ShouldBindJSON(&req); err != nil {
c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
return
}
ctx := c.Request.Context() // inherit cancellation from the HTTP request
response, err := s.ragPipeline.Query(ctx, req.Question)
if err != nil {
c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
return
}
c.JSON(http.StatusOK, response)
}
🧪 Testing and Usage
1. Ingest Documents
curl -X POST http://localhost:8080/ingest \
-H "Content-Type: application/json" \
-d '{
"text": "Golang is an open-source programming language developed by Google, known for its simple syntax, excellent concurrency performance, and fast compilation speed. Go is particularly suitable for building high-performance server-side applications, microservice architectures, and cloud-native applications.",
"metadata": {
"source": "golang-introduction",
"category": "programming-language"
}
}'
2. Query Questions
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"question": "What are the characteristics of Golang?"
}'
Response Example
{
"answer": "Based on the reference information, Golang's main characteristics include:\n1. Simple syntax\n2. Excellent concurrency performance\n3. Fast compilation speed\n4. Particularly suitable for building high-performance server-side applications, microservice architectures, and cloud-native applications",
"sources": [
{
"document": {
"id": "xxx-xxx-xxx",
"content": "Golang is an open-source programming language developed by Google...",
"metadata": {
"source": "golang-introduction"
}
},
"score": 0.89
}
],
"token_used": 245
}
🚀 Production Environment Optimization
1. Performance Optimization
Using Connection Pooling
// Configure HTTP client connection pool for OpenAI API
httpClient := &http.Client{
Timeout: 30 * time.Second,
Transport: &http.Transport{
MaxIdleConns: 100,
MaxIdleConnsPerHost: 10,
IdleConnTimeout: 90 * time.Second,
},
}
config := openai.DefaultConfig(apiKey)
config.HTTPClient = httpClient
client := openai.NewClientWithConfig(config)
Caching Embeddings
// Use Redis to cache embeddings for common queries
type CachedEmbeddingService struct {
embService *embedding.EmbeddingService
cache *redis.Client
ttl time.Duration
}
func (c *CachedEmbeddingService) GetEmbedding(ctx context.Context, text string) ([]float32, error) {
// Generate cache key
key := fmt.Sprintf("emb:%s", hashText(text))
// Try to get from cache
cached, err := c.cache.Get(ctx, key).Bytes()
if err == nil {
return deserializeEmbedding(cached), nil
}
// Cache miss, generate new embedding
emb, err := c.embService.GenerateEmbedding(ctx, text)
if err != nil {
return nil, err
}
// Store in cache
c.cache.Set(ctx, key, serializeEmbedding(emb), c.ttl)
return emb, nil
}
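The cache above relies on three helpers that were not shown; one possible implementation using crypto/sha256 and encoding/binary:
import (
	"crypto/sha256"
	"encoding/binary"
	"encoding/hex"
	"math"
)

// hashText produces a stable, short cache key for arbitrary text
func hashText(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

// serializeEmbedding packs float32s into little-endian bytes for Redis
func serializeEmbedding(emb []float32) []byte {
	buf := make([]byte, 4*len(emb))
	for i, f := range emb {
		binary.LittleEndian.PutUint32(buf[i*4:], math.Float32bits(f))
	}
	return buf
}

// deserializeEmbedding is the inverse of serializeEmbedding
func deserializeEmbedding(data []byte) []float32 {
	emb := make([]float32, len(data)/4)
	for i := range emb {
		emb[i] = math.Float32frombits(binary.LittleEndian.Uint32(data[i*4:]))
	}
	return emb
}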
2. Observability
Adding Logs and Metrics
import (
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
queryDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "rag_query_duration_seconds",
Help: "RAG query duration in seconds",
},
[]string{"status"},
)
embeddingRequests = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "embedding_requests_total",
Help: "Total number of embedding requests",
},
[]string{"status"},
)
)
func (r *RAGPipeline) Query(ctx context.Context, question string) (resp *models.RAGResponse, err error) {
start := time.Now()
// Named return values let the deferred function see the final error
defer func() {
status := "success"
if err != nil {
status = "error"
}
queryDuration.WithLabelValues(status).Observe(time.Since(start).Seconds())
}()
// ... original logic
}
3. Error Handling and Retry
import "github.com/cenkalti/backoff/v4"
func (s *EmbeddingService) GenerateEmbeddingWithRetry(ctx context.Context, text string) ([]float32, error) {
var result []float32
operation := func() error {
var err error
result, err = s.GenerateEmbedding(ctx, text)
return err
}
// Exponential backoff retry strategy, honoring the caller's context
expBackoff := backoff.NewExponentialBackOff()
expBackoff.MaxElapsedTime = 30 * time.Second
err := backoff.Retry(operation, backoff.WithContext(expBackoff, ctx))
return result, err
}
📊 Advanced Techniques
1. Hybrid Search
Combining keyword search and semantic search:
type HybridRetriever struct {
embService *embedding.EmbeddingService
vectorDB *vectordb.QdrantClient
keywordDB *ElasticsearchClient // illustrative keyword-search client (e.g. Elasticsearch/BM25)
vectorWeight float64 // 0-1, weight of vector search in fusion
}
func (h *HybridRetriever) Search(ctx context.Context, query string, topK int) ([]models.SearchResult, error) {
// 1. Vectorize the query for the semantic leg
queryVector, err := h.embService.GenerateEmbedding(ctx, query)
if err != nil {
return nil, err
}
// 2. Execute both searches in parallel
var vectorResults, keywordResults []models.SearchResult
var wg sync.WaitGroup
wg.Add(2)
go func() {
defer wg.Done()
vectorResults, _ = h.vectorDB.Search(ctx, queryVector, topK)
}()
go func() {
defer wg.Done()
keywordResults, _ = h.keywordDB.Search(ctx, query, topK)
}()
wg.Wait()
// 3. Fuse results (RRF - Reciprocal Rank Fusion)
return h.fuseResults(vectorResults, keywordResults), nil
}
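fuseResults is referenced above but was not shown; a minimal weighted Reciprocal Rank Fusion sketch, where k = 60 is the conventional RRF constant:
// fuseResults merges both ranked lists with weighted RRF:
// score(d) = w/(k+rank_vec(d)) + (1-w)/(k+rank_kw(d))
func (h *HybridRetriever) fuseResults(vector, keyword []models.SearchResult) []models.SearchResult {
	const k = 60.0
	scores := make(map[string]float64)
	byID := make(map[string]models.SearchResult)
	for rank, r := range vector {
		scores[r.Document.ID] += h.vectorWeight / (k + float64(rank+1))
		byID[r.Document.ID] = r
	}
	for rank, r := range keyword {
		scores[r.Document.ID] += (1 - h.vectorWeight) / (k + float64(rank+1))
		byID[r.Document.ID] = r
	}
	fused := make([]models.SearchResult, 0, len(byID))
	for id, s := range scores {
		r := byID[id]
		r.Score = s
		fused = append(fused, r)
	}
	// Sort by fused score, descending (requires the "sort" import)
	sort.Slice(fused, func(i, j int) bool { return fused[i].Score > fused[j].Score })
	return fused
}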
2. ReRank
Using the Cohere Rerank API or a local cross-encoder to improve retrieval accuracy (the client below is illustrative; check the Cohere Go SDK for exact signatures):
func (r *RAGPipeline) ReRank(ctx context.Context, query string, docs []models.Document) ([]models.Document, error) {
// Call Cohere ReRank API
client := cohere.NewClient(os.Getenv("COHERE_API_KEY"))
response, err := client.Rerank(ctx, &cohere.RerankRequest{
Query: query,
Documents: extractContents(docs),
TopN: 5,
Model: "rerank-multilingual-v2.0",
})
if err != nil {
return nil, err
}
// Reorder based on ReRank scores
reranked := make([]models.Document, len(response.Results))
for i, result := range response.Results {
reranked[i] = docs[result.Index]
}
return reranked, nil
}
3. Multimodal RAG
Supporting multimodal content like images and tables:
// Extract image content using GPT-4 Vision
func (s *MultiModalService) ExtractImageContent(ctx context.Context, imageURL string) (string, error) {
req := openai.ChatCompletionRequest{
Model: openai.GPT4VisionPreview,
Messages: []openai.ChatCompletionMessage{
{
Role: openai.ChatMessageRoleUser,
MultiContent: []openai.ChatMessagePart{
{
Type: openai.ChatMessagePartTypeText,
Text: "Please describe in detail the content of this image, including text, charts, key information, etc.",
},
{
Type: openai.ChatMessagePartTypeImageURL,
ImageURL: &openai.ChatMessageImageURL{
URL: imageURL,
},
},
},
},
},
}
resp, err := s.client.CreateChatCompletion(ctx, req)
if err != nil {
return "", err
}
if len(resp.Choices) == 0 {
return "", fmt.Errorf("no response from vision model")
}
return resp.Choices[0].Message.Content, nil
}
🎓 Best Practices Summary
✅ DO
Set Chunk Size Appropriately:
- Technical documentation: 300-500 characters
- Conversational data: 200-300 characters
- Long articles: 500-800 characters
Add Metadata:
- Document source, timestamp, category
- Facilitates filtering and tracing
Monitor Costs:
- Track token usage
- Use caching to reduce API calls
Test Retrieval Quality:
- Prepare test sets
- Calculate metrics like MRR and NDCG (see the sketch after this list)
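As a concrete example of the last point, a minimal Mean Reciprocal Rank calculation over a hypothetical test set, where each entry is the 1-based rank of the first relevant chunk (0 if none was retrieved):
// meanReciprocalRank averages 1/rank across queries; higher is better
func meanReciprocalRank(ranks []int) float64 {
	if len(ranks) == 0 {
		return 0
	}
	sum := 0.0
	for _, r := range ranks {
		if r > 0 {
			sum += 1.0 / float64(r)
		}
	}
	return sum / float64(len(ranks))
}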
❌ DON'T
- Don't blindly increase topK (high cost, more noise)
- Don't ignore error handling (API calls can fail)
- Don't hardcode prompts (use configuration files)
- Don't ignore security (API Key management, input validation)
📝 Summary
This article details how to build a complete RAG system using Golang, from basic architecture to production optimization. RAG technology is rapidly evolving, and it's recommended to stay updated on the latest developments:
- GraphRAG: Knowledge graph-based retrieval
- Self-RAG: Self-reflective RAG system
- Adaptive RAG: Adaptively selecting strategies based on queries
I hope this article helps you quickly get started with Golang + RAG development and build excellent AI applications!
Keywords: #Golang #RAG #AI #LLM #VectorDatabase #OpenAI #Qdrant #SemanticSearch #Embedding #IntelligentQA