Mastering Fragment_Size in Elasticsearch for Optimized Search Results
Last Updated: 25 Jul, 2024
This article examines the relationship between the fragment_size option and search performance, a critical component of tuning Elasticsearch. The fragment_size parameter sets the maximum number of characters in each highlighted snippet that Elasticsearch returns for a document, which can have a significant impact on the user experience when searching.
This setting matters because the size of the highlighted payload, and with it response time, grows with the length and number of fragments returned, especially when searching large document collections. Organizations that depend on Elasticsearch for their full-text search requirements frequently run into this problem, which can result in slow, bloated responses or even abandoned searches.
Understanding Fragment_Size
The Elasticsearch fragment_size parameter controls the maximum character count that appears in the search result snippets. The goal is to achieve a compromise between managing the total response size, which can affect search performance, particularly for those with slower internet connections, and giving consumers enough context to evaluate relevancy. You may improve user experience by showing the most relevant information in the search results and optimize your Elasticsearch application's search performance by adjusting the fragment size.
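To build intuition for what fragment_size controls, here is a toy Python sketch of what a highlighter does conceptually. This is purely illustrative, not Elasticsearch's actual algorithm: it finds the first occurrence of a term and returns a window of at most fragment_size characters around it.

```python
def make_snippet(text, term, fragment_size):
    """Toy illustration of a highlighter: locate the term and return a
    window of at most fragment_size characters centered on it."""
    pos = text.lower().find(term.lower())
    if pos == -1:
        return ""
    # Center the window on the matched term, clamped to the text bounds.
    half = max((fragment_size - len(term)) // 2, 0)
    start = max(pos - half, 0)
    end = min(start + fragment_size, len(text))
    return text[start:end]

text = ("Our premium blend is made from the best coffee beans, "
        "harvested from high-altitude regions of Colombia.")
snippet = make_snippet(text, "coffee", 50)  # a window of at most 50 chars
```

The real highlighters are far more sophisticated (they score fragments and respect analysis), but the core trade-off is the same: a larger window gives more context at the cost of a larger response.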
Impact on Search Performance
You have an e-commerce website where customers can search for product descriptions. It is common for product descriptions to be quite long, often containing several paragraphs.
Consider the following scenario:
Java
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;

import java.io.IOException;
import java.util.Map;

public class ElasticsearchFragmentSizeExample {

    public static void main(String[] args) {
        // Create an Elasticsearch client
        RestHighLevelClient client = new RestHighLevelClient(
            // Your Elasticsearch client configuration
        );
        try {
            // Search for products with the keyword "coffee"
            searchWithFragmentSize(client, "products", "coffee", 50);
            searchWithFragmentSize(client, "products", "coffee", 100);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Close the Elasticsearch client
            try {
                client.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    private static void searchWithFragmentSize(RestHighLevelClient client, String index,
                                               String keyword, int fragmentSize) throws IOException {
        // Create a search request
        SearchRequest searchRequest = new SearchRequest(index);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Set the search query
        searchSourceBuilder.query(QueryBuilders.matchQuery("description", keyword));

        // Configure highlighting on the description field with the
        // fragment_size parameter
        HighlightBuilder highlightBuilder = new HighlightBuilder()
            .field("description")
            .fragmentSize(fragmentSize);
        searchSourceBuilder.highlighter(highlightBuilder);

        // Add the source builder to the search request
        searchRequest.source(searchSourceBuilder);

        // Execute the search request
        SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);

        // Process the search results
        SearchHits searchHits = searchResponse.getHits();
        System.out.println("Fragment size: " + fragmentSize);
        System.out.println("Total hits: " + searchHits.getTotalHits().value);

        for (SearchHit hit : searchHits.getHits()) {
            // Access the highlighted snippet for the description field
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            if (highlightFields.containsKey("description")) {
                String fragment = highlightFields.get("description").fragments()[0].string();
                System.out.println("Search result snippet: " + fragment);
            }
        }
        System.out.println();
    }
}
Output:
Fragment size: 50
Total hits: 123
Search result snippet: ... the best <em>coffee</em> beans are harvested from...
Fragment size: 100
Total hits: 123
Search result snippet: ... the best <em>coffee</em> beans are harvested from high-altitude regions...
From the output, it is evident that search results with fragment_size 50 are more concise, while results with fragment_size 100 provide more context around the keyword "coffee". Choose fragment_size based on your specific use case and your users' needs: a larger value suits users who need more information to judge the relevance of a result, while a smaller value is better for quick previews. Be sure to monitor the impact on search performance and user experience as you adjust fragment_size to fit your requirements.
Best Practices for Setting Fragment_Size
Considerations for Small vs Large Text Fields
Consider the nature of your text fields when setting 'fragment_size'. To provide the best search experience, different types of content require different fragment sizes.
Small Text Fields
For small text fields, such as titles, tags, or short descriptions:
- Smaller Fragment Sizes: Since the text is short, you can use smaller fragment sizes (e.g., 30-50 characters) to highlight the entire field.
- Highlight Entire Field: Often, you might want to highlight the entire content of small fields to ensure no relevant information is missed.
{
"highlight": {
"fields": {
"title": {
"fragment_size": 50
}
}
}
}
Large Text Fields
- Larger Fragment Sizes: For extensive text fields, such as articles, blogs, or product descriptions, use larger fragment sizes (e.g., 100-200 characters).
- This provides more context around the highlighted terms.
- Additionally, consider allowing multiple fragments per field so that different parts of the document where the search terms appear are all represented.
{
"highlight": {
"fields": {
"content": {
"fragment_size": 150,
"number_of_fragments": 3
}
}
}
}
Optimal Fragment_Size for Different Use Cases
News Articles or Blogs
- Context-Rich Fragments: Articles and blogs often contain detailed information. Using a fragment_size of 150-200 characters can help provide enough context to make the highlighted text meaningful.
- Multiple Fragments: Set number_of_fragments to 3-5 to ensure that different relevant sections of the document are covered.
{
"highlight": {
"fields": {
"content": {
"fragment_size": 200,
"number_of_fragments": 5
}
}
}
}
Product Descriptions
- Moderate Fragment Sizes: Product descriptions are typically shorter than articles but longer than titles. A fragment_size of 100-150 characters is usually sufficient.
- Few Fragments: Set number_of_fragments to 1-3 to avoid overloading users with too much highlighted text.
{
"highlight": {
"fields": {
"description": {
"fragment_size": 120,
"number_of_fragments": 2
}
}
}
}
Scientific Papers or Technical Documents
- Larger Fragment Sizes: These documents are usually dense with information. Use a larger fragment_size of 200-300 characters to ensure users get enough context.
- Multiple Fragments: Set number_of_fragments to 5 or more to capture various relevant sections of the document.
{
"highlight": {
"fields": {
"body": {
"fragment_size": 250,
"number_of_fragments": 5
}
}
}
}
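The presets above can be collected into a small helper that assembles the highlight clause of a request body. The preset names and numbers below are illustrative defaults distilled from this guidance, not Elasticsearch constants:

```python
# Hypothetical presets based on the use cases discussed above.
HIGHLIGHT_PRESETS = {
    "article":    {"fragment_size": 200, "number_of_fragments": 5},
    "product":    {"fragment_size": 120, "number_of_fragments": 2},
    "scientific": {"fragment_size": 250, "number_of_fragments": 5},
}

def build_highlight(field, use_case):
    """Build the 'highlight' clause of a search request body
    for the given field and use-case preset."""
    preset = HIGHLIGHT_PRESETS[use_case]
    return {"highlight": {"fields": {field: dict(preset)}}}

body = build_highlight("description", "product")
# body["highlight"]["fields"]["description"] now holds the preset values
```

Centralizing these numbers in one place makes it easy to tune them later as you measure their effect on response size and user behavior.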
Advanced Techniques for Fragmenting Text
Custom Fragmenters
Elasticsearch's default fragmenting generally works well, but sometimes you need more control over how text is broken into fragments, for example in languages with unusual sentence structure or in other special situations. The plain highlighter exposes a fragmenter option for this purpose, set per field in the highlight section of the search request (not in the index mapping). It accepts two values: simple, which cuts fragments at fixed sizes, and span (the default), which tries to keep runs of matched terms together in one fragment. For example:
{
"query": {
"match": { "my_text_field": "coffee" }
},
"highlight": {
"fields": {
"my_text_field": {
"type": "plain",
"fragmenter": "span"
}
}
}
}
This request works as follows:
- The fragmenter is configured per request in the highlight section, so no mapping changes are required.
- "type": "plain" selects the plain highlighter, which re-analyzes the field text at query time.
- "fragmenter": "span" tells the plain highlighter to avoid splitting a group of matched terms across fragments; "simple" would instead break fragments at the configured size regardless of where the matches fall.
- If you want fragments aligned to sentence boundaries, the default unified highlighter ("type": "unified") already breaks text on sentence boundaries using Java's BreakIterator.
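Conceptually, sentence-aware fragmenting packs whole sentences into fragments without exceeding fragment_size where possible. Here is a rough Python sketch of that idea; it is purely illustrative and not how Elasticsearch implements its fragmenters:

```python
import re

def sentence_fragments(text, fragment_size):
    """Sketch of sentence-aware fragmenting: greedily pack whole
    sentences into fragments of at most fragment_size characters."""
    # Split on whitespace that follows sentence-ending punctuation.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    fragments, current = [], ""
    for sentence in sentences:
        # Start a new fragment if adding this sentence would overflow.
        if current and len(current) + 1 + len(sentence) > fragment_size:
            fragments.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        fragments.append(current)
    return fragments

frags = sentence_fragments(
    "Coffee is popular. It grows at altitude. Beans are roasted.", 40)
# Each fragment ends on a sentence boundary and stays within 40 chars.
```

Note that a sentence longer than fragment_size still becomes its own oversized fragment here; real highlighters make similar pragmatic compromises.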
Highlighting Multiple Fields
In addition to custom fragmenters, Elasticsearch can highlight matches from multiple fields in the same search result. This is useful when several fields are relevant, such as a title and a body, and you want to show users where the match occurred in each. List each field under the fields parameter of the highlight options. For example, the following query highlights matches in both the title and content fields:
{
"query": {
"multi_match": {
"query": "your search query",
"fields": ["title", "content"]
}
},
"highlight": {
"fields": {
"title": {},
"content": {}
}
}
}
Output:
{
"took": 10,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 10,
"relation": "eq"
},
"max_score": 0.9808292,
"hits": [
{
"_index": "my_index",
"_type": "_doc",
"_id": "1",
"_score": 0.9808292,
"_source": {
"title": "This is the title of the document",
"content": "This is the content of the document, which matches the search query."
},
"highlight": {
"title": [
"This is the <em>title</em> of the document"
],
"content": [
"This is the <em>content</em> of the document, which matches the search query."
]
}
},
{
"_index": "my_index",
"_type": "_doc",
"_id": "2",
"_score": 0.81828034,
"_source": {
"title": "Another document with a title that matches",
"content": "This document has content that also matches the search query."
},
"highlight": {
"title": [
"Another document with a <em>title</em> that matches"
],
"content": [
"This document has content that also matches the search query."
]
}
}
]
}
}
Conclusion
The fragment_size parameter in Elasticsearch plays a vital role in determining the amount of text displayed in highlighted search results. By carefully tuning this setting, you can ensure that users receive enough context around the matching text without being overwhelmed by excessive or irrelevant content.
A well-chosen fragment_size improves the user experience by striking the right balance of detail, making it easier to quickly assess the relevance of each result. Mastering the fragment_size parameter is a key part of optimizing Elasticsearch search performance and relevance, allowing you to deliver a more polished and user-friendly search experience.