Azure AI Search is a powerful tool for information retrieval, particularly in the context of Retrieval-Augmented Generation (RAG) and Generative AI models. Scoring profiles and document boosting are critical features that enhance the relevance of search results, allowing for sophisticated customizations. This blog will walk you through how to implement these features using Azure AI Search and Azure OpenAI.

Introduction

Businesses need to retrieve the most relevant information quickly and accurately to build RAG applications, agents, or whatever the AI use case may be. If you can't reliably retrieve relevant information from your own knowledge base, then no matter how powerful a new AI model may be, ask yourself whether you are truly getting business value out of these fancy AI language models.

Azure AI Search provides advanced functionalities like scoring profiles and document boosting to tune search relevance. This blog will explore these features and demonstrate their implementation through practical examples.

Setting up the Environment

First, let's set up the environment by installing the necessary libraries and downloading the dataset.

!pip install azure-identity
!pip install kaggle
!pip install python-dotenv
!pip install rich
!pip install openai
!pip install pandas
!pip install tenacity
!pip install tqdm
!pip install azure-search-documents --pre

Downloading and Preparing the Dataset

We will use a news category dataset from Kaggle for this demonstration. You can find more information about this dataset here: News Category Dataset (kaggle.com). For the purposes of this blog, I won't show all the data wrangling I did, but here is a high-level summary:

Concatenated headline and short_description as the input text for vectorization.
Created an id field as a string.
Converted date field to ISO 8601 format.
Added a view_count for each article.
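The wrangling steps above can be sketched roughly as follows. This is a minimal, standard-library illustration rather than the notebook's actual code: the prepare_record helper is hypothetical, the input date format (YYYY-MM-DD) is an assumption about the raw dataset, and the random view_count between 0 and 10000 is an assumption, since the blog does not say how the view counts were produced.

```python
import random
from datetime import datetime, timezone


def prepare_record(row_id, raw, rng):
    """Turn one raw dataset row (a dict) into the shape described above."""
    # Assumption: the raw date looks like "2022-04-01"; emit ISO 8601 in UTC.
    parsed = datetime.strptime(raw["date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return {
        "id": str(row_id),  # the id field is stored as a string
        "headline": raw["headline"],
        "short_description": raw["short_description"],
        # Text that will later be sent to the embedding model.
        "text_to_vectorize": raw["headline"] + " " + raw["short_description"],
        "date": parsed.isoformat().replace("+00:00", "Z"),
        # Assumption: synthetic view counts drawn uniformly at random.
        "view_count": rng.randint(0, 10000),
    }


rng = random.Random(42)
sample = {
    "headline": "Airline Adds Routes",
    "short_description": "New flights start in May.",
    "date": "2022-04-01",
}
record = prepare_record(0, sample, rng)
print(record["date"])
```

In the notebook the same steps are applied to a DataFrame; the dict-based version here is only meant to make each transformation explicit.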

💡

Find the full step-by-step notebook here: azure-ai-search-python-playground/azure-ai-search-document-boosting.ipynb at main · farzad528/azure-ai-search-python-playground (github.com)

Generate Embeddings

Authenticate to Azure OpenAI and generate embeddings for the text data.

# Generate embeddings
import os

from tqdm import tqdm
from tenacity import retry, stop_after_attempt, wait_exponential

# Retry up to 5 times with exponential backoff (4 to 60 seconds) if a call fails, e.g. on rate limits
@retry(stop=stop_after_attempt(5), wait=wait_exponential(multiplier=1, min=4, max=60))
def get_embeddings(openai_client, texts):
    response = openai_client.embeddings.create(
        input=texts,
        model=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME")
    )
    # Each item in response.data carries the embedding for one input text
    return [item.embedding for item in response.data]

def add_embeddings_to_df(df, text_column, vector_column, batch_size=1000):  
    embeddings = []  
    for i in tqdm(range(0, len(df[text_column]), batch_size)):  
        batch_texts = df[text_column][i:i+batch_size].tolist()  
        batch_embeddings = get_embeddings(openai_client, batch_texts)  
        embeddings.extend(batch_embeddings)  
    df[vector_column] = embeddings  
    return df  

df_vectors = add_embeddings_to_df(df, "text_to_vectorize", "vector") 
# Drop the text_to_vectorize column since we have the vector field 
df_vectors.drop(columns=['text_to_vectorize'], inplace=True)

💡

Fun Fact: If you are getting rate limited by Azure OpenAI, you can either request more quota for a higher rate limit or use the tenacity retry library in Python for an exponential backoff retry strategy. It takes only one line of code: the @retry decorator!
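To get a feel for what a capped exponential backoff does with the settings used in the decorator (multiplier=1, min=4, max=60, at most 5 attempts), here is a small standard-library sketch. The backoff_delays helper is hypothetical and only mirrors the general shape of the strategy; the exact exponent offset tenacity uses may differ between versions, so treat the numbers as illustrative.

```python
def backoff_delays(attempts, multiplier=1.0, min_wait=4.0, max_wait=60.0):
    """Return the wait in seconds before each retry for a capped exponential backoff."""
    delays = []
    # A wait happens only between attempts, so N attempts yield N - 1 waits.
    for retry_number in range(attempts - 1):
        raw = multiplier * (2 ** retry_number)
        # Clamp the raw exponential value into [min_wait, max_wait].
        delays.append(max(min_wait, min(raw, max_wait)))
    return delays


print(backoff_delays(5))
print(backoff_delays(10))
```

Under these assumptions the early waits sit at the 4-second floor, then double, and eventually flatten at the 60-second ceiling, which keeps a rate-limited batch job from hammering the endpoint.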

Create Azure AI Search Index

Authenticate to Azure AI Search and create an index with the necessary fields and scoring profiles.

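# Imports assume the preview azure-search-documents SDK installed above;
# some class names may differ in other SDK versions
import os

from azure.identity import DefaultAzureCredential
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.indexes.models import (
    AzureOpenAIModelName,
    AzureOpenAIParameters,
    AzureOpenAIVectorizer,
    FreshnessScoringFunction,
    FreshnessScoringParameters,
    HnswAlgorithmConfiguration,
    HnswParameters,
    MagnitudeScoringFunction,
    MagnitudeScoringParameters,
    ScoringFunctionInterpolation,
    ScoringProfile,
    SearchableField,
    SearchField,
    SearchFieldDataType,
    SearchIndex,
    SimpleField,
    TagScoringFunction,
    TagScoringParameters,
    TextWeights,
    VectorSearch,
    VectorSearchAlgorithmKind,
    VectorSearchAlgorithmMetric,
    VectorSearchProfile,
)

# Initialize the SearchIndexClient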
index_client = SearchIndexClient(  
    endpoint=os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT"),  
    credential=DefaultAzureCredential(),  
)  

# Define the fields  
fields = [  
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),  
    SimpleField(name="link", type=SearchFieldDataType.String),  
    SearchableField(name="headline", type=SearchFieldDataType.String),  
    SearchableField(  
        name="category",  
        type=SearchFieldDataType.String,  
        filterable=True,  
        facetable=True,  
    ),  
    SearchableField(name="short_description", type=SearchFieldDataType.String),  
    SearchableField(name="authors", type=SearchFieldDataType.String),  
    SearchField(  
        name="date",  
        type=SearchFieldDataType.DateTimeOffset,  
        filterable=True,  
        sortable=True,  
    ),  
    SimpleField(  
        name="view_count",  
        type=SearchFieldDataType.Int32,  
        filterable=True,  
        sortable=True,  
    ),  
    SearchField(  
        name="vector",  
        type="Collection(Edm.Single)",  
        vector_search_dimensions=3072,  
        vector_search_profile_name="my-vector-config",  
    ),  
]  

# Define the vector search  
vector_search = VectorSearch(  
    profiles=[  
        VectorSearchProfile(  
            name="my-vector-config",  
            algorithm_configuration_name="my-hnsw",  
            vectorizer="my-vectorizer",  
        )  
    ],  
    algorithms=[  
        HnswAlgorithmConfiguration(  
            name="my-hnsw",  
            kind=VectorSearchAlgorithmKind.HNSW,  
            parameters=HnswParameters(metric=VectorSearchAlgorithmMetric.COSINE),  
        )  
    ],  
    vectorizers=[  
        AzureOpenAIVectorizer(  
            name="my-vectorizer",  
            azure_open_ai_parameters=AzureOpenAIParameters(  
                resource_uri=os.getenv("AZURE_OPENAI_ENDPOINT"),  
                deployment_id=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYED_MODEL_NAME"),  
                model_name=AzureOpenAIModelName.TEXT_EMBEDDING3_LARGE,  
            ),  
        )  
    ],  
)  

# Define scoring profiles  
scoring_profiles = [  
    ScoringProfile(  
        name="boostCategory",  
        text_weights=TextWeights(  
            weights={  
                "category": 10.0,  
            }  
        ),  
    ),  
    ScoringProfile(  
        name="boostRecency",  
        functions=[  
            FreshnessScoringFunction(  
                field_name="date",  
                boost=10.0,  
                parameters=FreshnessScoringParameters(  
                    boosting_duration="P1095D",  
                ),  
                interpolation=ScoringFunctionInterpolation.LINEAR,  
            )  
        ],  
    ),  
    ScoringProfile(  
        name="boostByTag",  
        functions=[  
            TagScoringFunction(  
                field_name="category",  
                boost=10.0,  
                parameters=TagScoringParameters(  
                    tags_parameter="tags",  
                ),  
            )  
        ],  
    ),  
    ScoringProfile(  
        name="boostViewCount",  
        functions=[  
            MagnitudeScoringFunction(  
                field_name="view_count",  
                boost=10.0,  
                parameters=MagnitudeScoringParameters(  
                    boosting_range_start=0,  
                    boosting_range_end=10000,  
                ),  
                interpolation=ScoringFunctionInterpolation.LINEAR,  
            )  
        ],  
    ),  
]  

# Define the index  
index = SearchIndex(  
    name="news-category",  
    fields=fields,  
    scoring_profiles=scoring_profiles,  
    vector_search=vector_search,  
)  

# Create or update the index  
result = index_client.create_or_update_index(index)  
print(f"{result.name} created")

💡

You can save up to 75% on vector index size by using quantization in Azure AI Search: Announcing cost-effective RAG at scale with Azure AI Search (microsoft.com)

Deep Dive into Scoring Profiles

Let's look at how each of our scoring profiles is constructed. After index creation, you can obtain these definitions by viewing your index definition in the Azure Portal or by using the GET Index REST call.

Text Weighting

{  
  "name": "boostCategory",  
  "functionAggregation": null,  
  "text": {  
    "weights": {  
      "category": 10  
    }  
  },  
  "functions": []  
}

The boostCategory scoring profile uses text weights to boost the relevance of documents based on matches in the category field. With a weight of 10, a query term that matches in category counts for more than the same match in an unweighted field, so documents whose category matches the search query are ranked higher. More generally, text weights let you weight the fields in your search index differently, so that matches in the fields that matter most contribute more to the score.

💡

Since text weights apply only to full-text search, this scoring profile affects full-text and hybrid queries only. If you run a pure vector search query with this scoring profile, the text weights will NOT be applied.

Freshness Boosting

{  
  "name": "boostRecency",  
  "functionAggregation": "sum",  
  "text": null,  
  "functions": [  
    {  
      "fieldName": "date",  
      "interpolation": "linear",  
      "type": "freshness",  
      "boost": 10,  
      "freshness": {  
        "boostingDuration": "P1095D"  
      },  
      "magnitude": null,  
      "distance": null,  
      "tag": null  
    }  
  ]  
}

The boostRecency scoring profile prioritizes newer documents by boosting their relevance based on the date field. With a boost factor of 10 and a boosting duration of 1095 days (approximately 3 years), documents dated within that window are boosted, so recent content is ranked higher. This is useful for any platform where the timeliness of information is crucial, such as news search, because users reach the most up-to-date content first.
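As a rough mental model of the freshness window, the sketch below parses the P1095D duration and computes a linear weight that is 1.0 for a document dated right now and falls to 0.0 at the edge of the window. This is an illustrative simplification, not the formula Azure AI Search uses internally: the helper names, the fixed "now" timestamp, and the choice to ignore future-dated documents are all assumptions.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_day_duration(duration):
    """Parse a day-only ISO 8601 duration such as 'P1095D' into a timedelta."""
    match = re.fullmatch(r"P(\d+)D", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    return timedelta(days=int(match.group(1)))


def freshness_weight(doc_date, now, boosting_duration):
    """Illustrative linear weight: 1.0 at 'now', 0.0 at or beyond the window."""
    window = parse_day_duration(boosting_duration)
    age = now - doc_date
    # Assumption: future-dated and out-of-window documents get no boost.
    if age < timedelta(0) or age >= window:
        return 0.0
    return 1.0 - age / window


now = datetime(2024, 6, 1, tzinfo=timezone.utc)  # assumed reference time
recent = datetime(2022, 4, 1, tzinfo=timezone.utc)
old = datetime(2013, 1, 1, tzinfo=timezone.utc)
print(freshness_weight(recent, now, "P1095D"))
print(freshness_weight(old, now, "P1095D"))
```

The point of the sketch is the shape: an article from 2013 falls outside a three-year window and gets no freshness boost at all, while one from 2022 still gets a partial boost.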

Questions to Ask:

Do you need to prioritize the most recent content in your search results?
Is it important for your users to find the latest information quickly?
Should newer entries be more prominent in your search outcomes?

Tag Boosting

{  
  "name": "boostByTag",  
  "functionAggregation": "sum",  
  "text": null,  
  "functions": [  
    {  
      "fieldName": "category",  
      "interpolation": "linear",  
      "type": "tag",  
      "boost": 10,  
      "freshness": null,  
      "magnitude": null,  
      "distance": null,  
      "tag": {  
        "tagsParameter": "tags"  
      }  
    }  
  ]  
}

The boostByTag scoring profile boosts documents whose category field matches one of the tags passed along with the query. The tags are not taken from the search text itself: they are supplied at query time through the scoring parameter named tags, which is the name declared in tagsParameter. With a boost factor of 10, content that matches the supplied tags is promoted. This is useful for any platform where tagging is used to categorize and discover content, because users can more easily find content that aligns with their specific interests or requirements.
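Tag values reach the scoring function through a scoring parameter string of the form name-values, where the name must match tagsParameter ("tags" in this profile). The helper below is a hypothetical sketch of how such a string could be assembled. The quoting rule (single quotes around values containing commas, with embedded single quotes doubled) follows my reading of the REST documentation, so verify it against the current docs before relying on it.

```python
def build_scoring_parameter(name, values):
    """Format a scoring parameter as 'name-value1,value2' for a tag scoring function."""
    formatted = []
    for value in values:
        if "," in value or "'" in value:
            # Quote the value so an embedded comma is not read as a separator;
            # embedded single quotes are escaped by doubling them.
            value = "'" + value.replace("'", "''") + "'"
        formatted.append(value)
    return name + "-" + ",".join(formatted)


print(build_scoring_parameter("tags", ["BUSINESS"]))
print(build_scoring_parameter("tags", ["BUSINESS", "MONEY"]))
```

A string built this way would then be passed at query time, for example as one entry in the list given to the scoring_parameters argument of the Python SDK's search call.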

Questions to Ask:

Do you need to enhance the relevance of search results based on specific tags?
Is it important for your users to find categorized content quickly?
Should tagged entries be more prominent in your search outcomes?

Magnitude Boosting

{  
  "name": "boostViewCount",  
  "functionAggregation": "sum",  
  "text": null,  
  "functions": [  
    {  
      "fieldName": "view_count",  
      "interpolation": "linear",  
      "type": "magnitude",  
      "boost": 10,  
      "freshness": null,  
      "magnitude": {  
        "boostingRangeStart": 0,  
        "boostingRangeEnd": 10000,  
        "constantBoostBeyondRange": false  
      },  
      "distance": null,  
      "tag": null  
    }  
  ]  
}

The boostViewCount scoring profile increases the visibility of popular content by boosting documents based on their view count. With a boost factor of 10 and a range of 0 to 10000 views, highly viewed content is ranked higher. The magnitude function lets you specify the range within which the boost is applied and, through constantBoostBeyondRange, whether a constant boost continues to apply beyond that range (here it is false, so it does not). This is useful for any platform where user engagement metrics, such as view count, play a crucial role in determining the value and relevance of content.
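The effect of the range and of constantBoostBeyondRange can be illustrated with a small sketch. The magnitude_weight function is a hypothetical, simplified model that maps a field value to a weight between 0 and 1 with linear interpolation. It is not the service's internal formula, but it reflects the documented behavior that values above the range receive the full boost only when constantBoostBeyondRange is true.

```python
def magnitude_weight(value, range_start, range_end, constant_boost_beyond_range=False):
    """Illustrative linear weight in [0, 1] for a magnitude scoring function."""
    if value < range_start:
        return 0.0
    if value > range_end:
        # Above the range: full boost only if the constant boost flag is set.
        return 1.0 if constant_boost_beyond_range else 0.0
    # Inside the range: interpolate linearly between start and end.
    return (value - range_start) / (range_end - range_start)


print(magnitude_weight(5000, 0, 10000))
print(magnitude_weight(25000, 0, 10000))
print(magnitude_weight(25000, 0, 10000, constant_boost_beyond_range=True))
```

Under this model, an article with 25000 views would get no boost from the profile as defined above, which is worth keeping in mind when choosing boostingRangeEnd for a field with a long tail.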

Questions to Ask:

Do you need to boost the visibility of content based on user engagement metrics?
Is it important for your users to easily find the most popular items?
Should highly-viewed entries be more prominent in your search outcomes?

Evaluate Retrieval Quality with Different Document Boosting Techniques

We will evaluate the retrieval quality using different scoring profiles.

Text Weighting

Perform a hybrid search to apply category boosts.

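import os

from azure.identity import DefaultAzureCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizableTextQuery
from rich.console import Console
from rich.table import Table
from rich.text import Text

# Initialize a Rich console
console = Console()

# Query client for the index created above
# (skip this if you already created a SearchClient when uploading documents)
search_client = SearchClient(
    endpoint=os.getenv("AZURE_SEARCH_SERVICE_ENDPOINT"),
    index_name="news-category",
    credential=DefaultAzureCredential(),
)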

def search_and_print_results(scoring_profile=None):
    query = "Entertainment Industry Trends"
    vector_query = VectorizableTextQuery(
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True
    )
    results = search_client.search(
        search_text=query, # passing in text query for hybrid search
        vector_queries=[vector_query],
        scoring_profile=scoring_profile,
        top=3,
    )

    profile_name = scoring_profile if scoring_profile else 'Vanilla (No Scoring Profile)'
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")

    # Create a table for the results
    table = Table(show_header=True, header_style="bold magenta")
    table.add_column("Headline", style="dim", width=20)
    table.add_column("Score")
    table.add_column("Description", width=40)
    table.add_column("Category")
    table.add_column("Date")
    table.add_column("Link")

    for result in results:
        # Format the link as a clickable hyperlink
        link_text = Text(result['link'], style="link")
        link_text.stylize(f"link {result['link']}")

        table.add_row(
            result['headline'],
            str(result['@search.score']),
            result['short_description'],
            result['category'],
            result['date'],
            link_text  # Use the formatted link text here
        )
    console.print(table)

# Perform searches with and without the category boosting scoring profile
search_and_print_results()  # Vanilla query without any scoring profile
search_and_print_results("boostCategory")

💡

Note, for hybrid search, Azure AI Search returns RRF (Reciprocal Rank Fusion) scores. Learn more: Hybrid search scoring (RRF) - Azure AI Search | Microsoft Learn
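For intuition about where those hybrid scores come from, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. The document ids and rankings are made up for illustration, and k=60 is the commonly cited constant. This is a simplified model of the technique, not the service's exact implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one list of (id, score) pairs."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Each list contributes 1 / (k + rank) for every document it contains.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


text_ranking = ["a", "b", "c"]    # hypothetical full-text ranking
vector_ranking = ["b", "d", "a"]  # hypothetical vector ranking
fused = reciprocal_rank_fusion([text_ranking, vector_ranking])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Because RRF depends only on ranks, fused scores are small numbers in a narrow band, so they should not be compared directly with the scores of a pure full-text or pure vector query.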

When the boostCategory scoring profile is applied, the search results prioritize articles within the specified category, as demonstrated by the higher ranking of articles categorized under ENTERTAINMENT. For example, the top three results include articles like:

Hollywood & Vine: The Entertainment Industry Seeks The Future In Viral Video
Is Music Dead? (Thoughts on the Music Industry After SXSW 2015)

which are directly related to the entertainment industry. In contrast, a vanilla query without any scoring profile returns results based on default relevance, where articles from unrelated categories like WEDDINGS also appear in the top results. This illustrates the effectiveness of the category boosting scoring profile in promoting content that is more relevant to the specified category.

Freshness Boosting

Apply a scoring profile to prioritize newer articles.

def search_and_print_results(scoring_profile=None):  
    query = "latest news on airlines"  
    vector_query = VectorizableTextQuery(  
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True  
    )  
    results = search_client.search(  
        search_text=None,  
        vector_queries=[vector_query],  
        scoring_profile=scoring_profile,  
        top=3,  
    )  

    profile_name = scoring_profile if scoring_profile else 'Vanilla (No Scoring Profile)'  
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")  

    # Create a table for the results  
    table = Table(show_header=True, header_style="bold magenta")  
    table.add_column("Headline", style="dim", width=20)  
    table.add_column("Score")  
    table.add_column("Description", width=40)  
    table.add_column("Date", width=15)  
    table.add_column("Link")  

    for result in results:  
        # Format the link as a clickable hyperlink  
        link_text = Text(result['link'], style="link")  
        link_text.stylize(f"link {result['link']}")  

        table.add_row(  
            result['headline'],  
            str(result['@search.score']),  
            result['short_description'],  
            result['date'],  
            link_text  # Use the formatted link text here  
        )  
    console.print(table)  

# Perform searches with and without the freshness scoring profile  
search_and_print_results()  # Vanilla query without any scoring profile  
search_and_print_results("boostRecency")

The boostRecency scoring profile uses a boosting duration of P1095D, so when it is applied, articles published within the past 1095 days (approximately 3 years) of the current date are boosted. As a result, the search prioritizes newer articles. For example, the Alaska Airlines article dated April 1, 2022, ranks higher. In contrast, a vanilla query without any scoring profile ranks results based on default relevance, where older articles from 2013 and 2012 are ranked higher. This demonstrates the effectiveness of the freshness scoring profile in promoting more recent content.

Tag Boosting

Apply tag-based boosting to promote content aligned with specific tags.

def search_and_print_results(scoring_profile=None):  
    query = "what are the hottest trends in the banking business industry"  
    tags = "BUSINESS"  
    vector_query = VectorizableTextQuery(  
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True  
    )  

    # Prepare the search parameters  
    search_params = {  
        "search_text": None,  
        "vector_queries": [vector_query],  
        "scoring_profile": scoring_profile,  
        "top": 3,  
    }  

    # Conditionally add scoring_parameters if a scoring_profile is specified.
    # Each entry uses the "name-values" format; "tags" matches tags_parameter in the profile.
    if scoring_profile:
        search_params["scoring_parameters"] = [f"tags-{tags}"]

    results = search_client.search(**search_params)  

    profile_name = (  
        scoring_profile if scoring_profile else "Vanilla (No Scoring Profile)"  
    )  
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")  

    # Create a table for the results  
    table = Table(show_header=True, header_style="bold magenta")  
    table.add_column("Headline", style="dim", width=20)  
    table.add_column("Score")  
    table.add_column("Description", width=40)  
    table.add_column("Category")  
    table.add_column("Date")  
    table.add_column("Link")  

    for result in results:  
        # Format the link as a clickable hyperlink  
        link_text = Text(result["link"], style="link")  
        link_text.stylize(f"link {result['link']}")  

        table.add_row(  
            result["headline"],  
            str(result["@search.score"]),  
            result["short_description"],  
            result["category"],  
            result["date"],  
            link_text,   
        )  
    console.print(table)  

# Perform searches with and without the tag boosting scoring profile  
search_and_print_results()  # Vanilla query without any scoring profile  
search_and_print_results("boostByTag")

When the boostByTag scoring profile is applied with the tag "BUSINESS," the search results prioritize articles related to the specified tag. For example, articles such as:

What's the Future of Retail Banking?
Banking Saves Health Care

which are directly related to the business category, rank higher. In contrast, a vanilla query without any scoring profile returns results based on default relevance, where articles from other categories like MONEY also appear in the top results. This demonstrates the effectiveness of the tag boosting scoring profile in promoting content that is more relevant to the specified tag.

Magnitude Boosting

Apply magnitude boosting based on the view count field to promote popular content.

def search_and_print_results(scoring_profile=None):  
    query = "Entertainment Industry Trends"  
    vector_query = VectorizableTextQuery(  
        text=query, k_nearest_neighbors=50, fields="vector", exhaustive=True  
    )  
    results = search_client.search(  
        search_text=None,  
        vector_queries=[vector_query],  
        scoring_profile=scoring_profile,  
        top=3,  
    )  

    profile_name = scoring_profile if scoring_profile else 'Vanilla (No Scoring Profile)'  
    console.print(f"\nResults for {profile_name} Scoring Profile:", style="bold blue")  

    # Create a table for the results  
    table = Table(show_header=True, header_style="bold magenta")  
    table.add_column("Headline", style="dim", width=20)  
    table.add_column("Score")  
    table.add_column("Category")  
    table.add_column("Date")  
    table.add_column("View Count")  
    table.add_column("Link")  

    for result in results:  
        # Format the link as a clickable hyperlink  
        link_text = Text(result['link'], style="link")  
        link_text.stylize(f"link {result['link']}")  

        table.add_row(  
            result['headline'],  
            str(result['@search.score']),  
            result['category'],  
            result['date'],  
            str(result['view_count']),  
            link_text  
        )  
    console.print(table)  

# Perform searches with and without the view count boosting scoring profile  
search_and_print_results()  # Vanilla query without any scoring profile  
search_and_print_results("boostViewCount")

When the boostViewCount scoring profile is applied, the search results prioritize articles with higher view counts, even if they are not the most topically relevant or recent. For example, articles such as:

Millennials & The Music Business: Inverting the Hierarchy
The Biggest Food Trends Of 2015

with significant views rank higher despite not being directly related to entertainment industry trends. In contrast, a vanilla query without any scoring profile returns results based on default relevance, where more topically relevant articles like:

Hollywood & Vine: The Entertainment Industry Seeks The Future In Viral Video
5 Entertainment Events We Want To See Happen In 2015

are ranked higher regardless of their view counts. This illustrates the effectiveness of the magnitude boosting scoring profile in promoting more popular content based on view count.

Conclusion

In this blog, we explored the capabilities of Azure AI Search for enhancing search relevance through document boosting and scoring profiles. By implementing these techniques, you can tailor search results to meet specific business needs, improve user engagement, and ensure the most relevant and popular content is surfaced. Whether it's through text weighting, freshness boosting, tag boosting, magnitude boosting, or geolocation/distance boosting (not shown in this blog, but stay tuned for a future one!), Azure AI Search provides the flexibility and power needed for sophisticated relevance tuning.
