Revolutionizing Document Ingestion & RAG with Docling, Azure AI Search, and Azure OpenAI
In today's AI landscape, building reliable knowledge systems requires more than just powerful language models. Enter Retrieval-Augmented Generation (RAG) – a pattern that enhances AI responses with contextual knowledge. In this guide, we'll build a production-grade RAG pipeline using Docling, Azure AI Search, and Azure OpenAI, taking you from concept to deployment with practical examples and best practices.
Understanding the RAG Architecture
RAG has emerged as a crucial pattern for grounding AI responses in reliable information. Let's understand why this matters and how our tools work together to create a robust solution.
The RAG Pipeline at a Glance
Here's how our pipeline processes documents and generates responses:
Each component in this pipeline serves a specific purpose:
Docling handles document processing and chunking
Azure OpenAI creates semantic embeddings
Azure AI Search manages vector storage and retrieval
The RAG prompt combines retrieved context with user queries
Why These Tools?
Let's examine what makes each component essential for production systems:
Docling's Advanced Document Processing:
Handles complex formats (PDFs, PPTX, DOCX) with structure preservation
Provides OCR capabilities for image-heavy documents
Implements GPU-accelerated processing for speed
Maintains hierarchical document structure during chunking
Azure AI Search's Vector Capabilities:
Offers efficient HNSW-based vector search
Supports hybrid retrieval (combining semantic and keyword search)
Provides automatic scaling and maintenance
Integrates seamlessly with Azure OpenAI
Azure OpenAI's Features:
Delivers state-of-the-art embedding models
Ensures enterprise-grade reliability
Offers cost-effective API pricing
Provides managed inference endpoints
Implementation Deep Dive
Let's walk through each stage of the pipeline with concrete examples and implementation details.
1. Document Processing: From Raw Files to Structured Content
Here's the minimal code needed to process a document with Docling:
from docling.document_converter import DocumentConverter
# Initialize with GPU acceleration if available
converter = DocumentConverter()
# Process a document (supports local files or URLs)
result = converter.convert("path/to/document.pdf")
# Preview the structured content
print(result.document.export_to_markdown()[:500])
When processing documents, Docling handles various challenges:
Layout analysis for complex PDFs
Table structure preservation
Image extraction and OCR when needed (configurable; see the sketch after this list)
Metadata retention for context
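Docling's defaults handle most files well, but behaviors such as OCR and table-structure recovery can be enabled explicitly. Here's a minimal sketch assuming a recent docling release (option and class names may shift between versions):
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

# Turn on OCR and table-structure recovery for scanned or image-heavy PDFs
pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True

converter = DocumentConverter(
    format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)}
)
result = converter.convert("path/to/scanned_document.pdf")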
2. Hierarchical Chunking: Preserving Context and Structure
The chunking stage is crucial for effective retrieval. Here's how to implement it:
from docling_core.transforms.chunker import HierarchicalChunker
from rich.console import Console

console = Console()

chunker = HierarchicalChunker()
doc_chunks = list(chunker.chunk(result.document))

all_chunks = []
for idx, c in enumerate(doc_chunks):
    chunk_text = c.text
    all_chunks.append((f"chunk_{idx}", chunk_text))

console.print(f"Total chunks from PDF: {len(all_chunks)}")
3. Vector Search Setup: Optimizing for Retrieval
Setting up Azure AI Search vector index:
from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    SearchIndex,
    SearchField,
    SearchFieldDataType,
    SimpleField,
    SearchableField,
    VectorSearch,
    HnswAlgorithmConfiguration,
    VectorSearchProfile,
    AzureOpenAIVectorizer,
    AzureOpenAIVectorizerParameters,
)
from azure.core.credentials import AzureKeyCredential
# The AZURE_SEARCH_* and AZURE_OPENAI_* settings are assumed to be defined earlier, e.g. loaded from environment variables
VECTOR_DIM = 1536  # Adjust based on your chosen embeddings model

index_client = SearchIndexClient(AZURE_SEARCH_ENDPOINT, AzureKeyCredential(AZURE_SEARCH_KEY))
def create_search_index(index_name: str):
    fields = [
        SimpleField(name="chunk_id", type=SearchFieldDataType.String, key=True),
        SearchableField(name="content", type=SearchFieldDataType.String),
        SearchField(
            name="content_vector",
            type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
            searchable=True,
            filterable=False,
            sortable=False,
            facetable=False,
            vector_search_dimensions=VECTOR_DIM,
            vector_search_profile_name="default",
        ),
    ]
    vector_search = VectorSearch(
        algorithms=[HnswAlgorithmConfiguration(name="default")],
        profiles=[
            VectorSearchProfile(
                name="default",
                algorithm_configuration_name="default",
                vectorizer_name="default",
            )
        ],
        vectorizers=[
            AzureOpenAIVectorizer(
                vectorizer_name="default",
                parameters=AzureOpenAIVectorizerParameters(
                    resource_url=AZURE_OPENAI_ENDPOINT,
                    deployment_name=AZURE_OPENAI_EMBEDDINGS,
                    model_name="text-embedding-3-small",
                    api_key=AZURE_OPENAI_API_KEY,
                ),
            )
        ],
    )
    new_index = SearchIndex(
        name=index_name,
        fields=fields,
        vector_search=vector_search,
    )
    # Drop any existing index with the same name, then create it fresh
    try:
        index_client.delete_index(index_name)
    except Exception:
        pass
    index_client.create_or_update_index(new_index)
    console.print(f"Index '{index_name}' created.")
create_search_index(AZURE_SEARCH_INDEX_NAME)
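As an optional sanity check, you can fetch the index definition back and confirm its fields were registered as expected:
# Optional: fetch the index back and list its fields
created_index = index_client.get_index(AZURE_SEARCH_INDEX_NAME)
console.print(f"Index '{created_index.name}' fields: {[f.name for f in created_index.fields]}")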
4. Efficient Batch Processing
Next, we generate an embedding for each chunk and upsert the resulting documents to Azure AI Search in batches:
from openai import AzureOpenAI
from azure.search.documents import SearchClient
import uuid
search_client = SearchClient(AZURE_SEARCH_ENDPOINT, AZURE_SEARCH_INDEX_NAME, AzureKeyCredential(AZURE_SEARCH_KEY))
openai_client = AzureOpenAI(
api_key=AZURE_OPENAI_API_KEY,
api_version=AZURE_OPENAI_API_VERSION,
azure_endpoint=AZURE_OPENAI_ENDPOINT,
)
def embed_text(text: str):
    response = openai_client.embeddings.create(
        input=text,
        model=AZURE_OPENAI_EMBEDDINGS
    )
    return response.data[0].embedding

upload_docs = []
for chunk_id, chunk_text in all_chunks:
    embedding_vector = embed_text(chunk_text)
    upload_docs.append(
        {
            "chunk_id": str(uuid.uuid4()),
            "content": chunk_text,
            "content_vector": embedding_vector,
        }
    )

BATCH_SIZE = 250
for i in range(0, len(upload_docs), BATCH_SIZE):
    subset = upload_docs[i : i + BATCH_SIZE]
    resp = search_client.upload_documents(documents=subset)
    console.print(
        f"Uploaded batch {i} -> {i + len(subset)}; success: {resp[0].succeeded}, status code: {resp[0].status_code}"
    )
console.print("All chunks uploaded to Azure Search.")
5. RAG Query Implementation
Here's a complete example of implementing RAG queries:
from azure.search.documents.models import VectorizableTextQuery
from rich.panel import Panel

def generate_chat_response(prompt: str, system_message: str = None):
    messages = []
    if system_message:
        messages.append({"role": "system", "content": system_message})
    messages.append({"role": "user", "content": prompt})

    completion = openai_client.chat.completions.create(
        model=AZURE_OPENAI_CHAT_MODEL,
        messages=messages,
        temperature=0.7
    )
    return completion.choices[0].message.content
user_query = "in 2024, AI companies reached how many $$$ in value?"
# The index's Azure OpenAI vectorizer embeds the query text server-side,
# so we only pass raw text for the vector portion of the hybrid search
vector_query = VectorizableTextQuery(
    text=user_query,
    k_nearest_neighbors=5,
    fields="content_vector"
)
search_results = search_client.search(
search_text=user_query,
vector_queries=[vector_query],
select=["content"],
top=10
)
retrieved_chunks = []
for result in search_results:
    snippet = result["content"]
    retrieved_chunks.append(snippet)

context_str = "\n---\n".join(retrieved_chunks)
rag_prompt = f"""
You are an AI assistant that answers questions about the State of AI 2024 Report.
Use ONLY the text below to answer the user's question.
If the answer isn't in the text, say you don't know.
Context:
{context_str}
Question: {user_query}
Answer:
"""
final_answer = generate_chat_response(rag_prompt)
console.print(Panel(rag_prompt, title="RAG Prompt", style="bold red"))
console.print(Panel(final_answer, title="RAG Response", style="bold green"))
Answer: AI companies reached $9T in value in 2024.
Going Further
Want to explore more? Here are some advanced topics to consider:
Enhanced Retrieval:
Experiment with scoring profiles
Add re-ranking strategies such as Semantic Ranker in Azure AI Search (a sketch follows this list)
Try Query Rewriting (preview) in Azure AI Search
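For example, the semantic ranker needs a semantic configuration on the index plus a semantic query type at search time. A rough sketch, assuming azure-search-documents 11.4+ and reusing the content field and clients defined earlier (the configuration name "default-semantic" is just a placeholder):
from azure.search.documents.indexes.models import (
    SemanticConfiguration,
    SemanticField,
    SemanticPrioritizedFields,
    SemanticSearch,
)

# Index side: tell the semantic ranker which fields carry the main content
semantic_search = SemanticSearch(
    configurations=[
        SemanticConfiguration(
            name="default-semantic",
            prioritized_fields=SemanticPrioritizedFields(
                content_fields=[SemanticField(field_name="content")]
            ),
        )
    ]
)
# Pass semantic_search=semantic_search when constructing the SearchIndex above.

# Query side: request semantic re-ranking on top of the hybrid search
semantic_results = search_client.search(
    search_text=user_query,
    vector_queries=[vector_query],
    query_type="semantic",
    semantic_configuration_name="default-semantic",
    select=["content"],
    top=10,
)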
Quality Improvements:
Add relevance feedback loops
Implement chunk quality scoring
Monitor and tune retrieval performance (a score-logging sketch follows this list)
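As a simple starting point for monitoring, the relevance scores Azure AI Search returns with each hit can be logged alongside the retrieved text (the @search.reranker_score field is only populated when semantic ranking is enabled):
# Log relevance scores for each retrieved chunk to track retrieval quality over time
for hit in search_client.search(search_text=user_query, vector_queries=[vector_query], select=["content"], top=5):
    console.print(f"score={hit['@search.score']:.3f}  reranker_score={hit.get('@search.reranker_score')}")
    console.print(hit["content"][:120])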
Conclusion
This RAG pipeline streamlines the end-to-end process, from ingesting documents with Docling to generating final answers with Azure OpenAI. It's a robust, flexible foundation for content-heavy domains such as legal, medical, and finance, or any scenario where document ingestion is mission-critical.