Open In App

Elasticsearch Group by Date

Last Updated : 31 May, 2024
Comments
Improve
Suggest changes
Like Article
Like
Report

Elasticsearch is a powerful search and analytics engine that allows you to store, search, and analyze big volumes of data quickly and in near real-time. One common requirement in data analysis is grouping data by date, which is especially useful for time-series data.

In this article, we will dive deep into how to perform group-by-date operations in Elasticsearch, including examples and expected outputs. Whether you're a beginner or someone looking to refine your Elasticsearch skills, this guide will help you understand the nuances of date aggregation in Elasticsearch.

Understanding Date Aggregations in Elasticsearch

Date aggregation in Elasticsearch allows you to group data based on date fields. This is particularly useful for tasks like generating reports, tracking trends, and creating dashboards. Elasticsearch provides several date-related aggregations to help with this:

  • Date Histogram Aggregation: Groups data into buckets based on specified intervals.
  • Date Range Aggregation: Groups data into buckets based on specified date ranges.
  • Date Histogram with Sub-Aggregations: Allows more complex grouping and analysis within each date bucket.

Setting Up Elasticsearch

Before we dive into the examples, let's make sure we have a running instance of Elasticsearch. If you haven't installed Elasticsearch yet, you can follow the official installation guide.

For our examples, we'll assume you have an Elasticsearch instance running locally on http://localhost:9200.

Indexing Sample Data

Let's start by indexing some sample data. We'll create an index called sales_data and insert a few documents representing sales transactions, each with a date field.

Creating the Index

PUT /sales_data
{
"mappings": {
"properties": {
"product": {
"type": "keyword"
},
"amount": {
"type": "float"
},
"date": {
"type": "date"
}
}
}
}

Indexing Documents

POST /sales_data/_doc/1
{
"product": "Laptop",
"amount": 1200.50,
"date": "2023-01-01T10:00:00Z"
}

POST /sales_data/_doc/2
{
"product": "Smartphone",
"amount": 650.75,
"date": "2023-01-02T12:30:00Z"
}

POST /sales_data/_doc/3
{
"product": "Tablet",
"amount": 300.00,
"date": "2023-01-01T15:00:00Z"
}

POST /sales_data/_doc/4
{
"product": "Laptop",
"amount": 1300.00,
"date": "2023-01-03T09:00:00Z"
}

POST /sales_data/_doc/5
{
"product": "Smartwatch",
"amount": 250.00,
"date": "2023-01-03T11:00:00Z"
}

Grouping Data by Date Using Date Histogram Aggregation

The date_histogram aggregation is the most commonly used method for grouping by date. It allows you to specify an interval (e.g., day, week, month) and groups documents into buckets based on that interval.

Example: Grouping by Day

Let's group our sales data by day to see the total sales amount for each day.

POST /sales_data/_search
{
"size": 0,
"aggs": {
"sales_per_day": {
"date_histogram": {
"field": "date",
"calendar_interval": "day"
},
"aggs": {
"total_sales": {
"sum": {
"field": "amount"
}
}
}
}
}
}

Explanation

  • size: 0: We set the size to 0 because we are only interested in the aggregation results, not the individual documents.
  • date_histogram: This is the main aggregation that groups documents by the date field.
  • field: "date": The field to group by.
  • calendar_interval: "day": The interval for grouping (in this case, daily).
  • total_sales: A sub-aggregation that calculates the sum of the amount field for each day.

Output:

{
"aggregations": {
"sales_per_day": {
"buckets": [
{
"key_as_string": "2023-01-01T00:00:00.000Z",
"key": 1672531200000,
"doc_count": 2,
"total_sales": {
"value": 1500.5
}
},
{
"key_as_string": "2023-01-02T00:00:00.000Z",
"key": 1672617600000,
"doc_count": 1,
"total_sales": {
"value": 650.75
}
},
{
"key_as_string": "2023-01-03T00:00:00.000Z",
"key": 1672704000000,
"doc_count": 2,
"total_sales": {
"value": 1550.0
}
}
]
}
}
}

Analysis

The output shows the total sales amount for each day:

  • On January 1, 2023, the total sales were $1500.50.
  • On January 2, 2023, the total sales were $650.75.
  • On January 3, 2023, the total sales were $1550.00.

Grouping Data by Month

Similarly, we can group the data by month. This is useful for generating monthly reports.

Example: Grouping by Month

POST /sales_data/_search
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"total_sales": {
"sum": {
"field": "amount"
}
}
}
}
}
}

Output:

{
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2023-01-01T00:00:00.000Z",
"key": 1672531200000,
"doc_count": 5,
"total_sales": {
"value": 3701.25
}
}
]
}
}
}

Analysis

The output shows the total sales amount for January 2023 was $3701.25.

Grouping Data by Custom Date Ranges

In some cases, you may want to group data by custom date ranges rather than fixed intervals like days or months. For this, you can use the date_range aggregation.

Example: Custom Date Ranges

Let's group our sales data into two custom ranges: before and after January 2, 2023.

POST /sales_data/_search
{
"size": 0,
"aggs": {
"sales_by_date_range": {
"date_range": {
"field": "date",
"ranges": [
{
"to": "2023-01-02T00:00:00Z"
},
{
"from": "2023-01-02T00:00:00Z"
}
]
},
"aggs": {
"total_sales": {
"sum": {
"field": "amount"
}
}
}
}
}
}

Explanation

  • date_range: The main aggregation that groups documents by custom date ranges.
  • field: "date": The field to group by.
  • ranges: An array of range definitions.
  • to: "2023-01-02T00:00:00Z": The first range is up to January 2, 2023.
  • from: "2023-01-02T00:00:00Z": The second range starts from January 2, 2023.

Output:

{
"aggregations": {
"sales_by_date_range": {
"buckets": [
{
"key": "*-2023-01-02T00:00:00.000Z",
"to": 1672617600000,
"to_as_string": "2023-01-02T00:00:00.000Z",
"doc_count": 2,
"total_sales": {
"value": 1500.5
}
},
{
"key": "2023-01-02T00:00:00.000Z-*",
"from": 1672617600000,
"from_as_string": "2023-01-02T00:00:00.000Z",
"doc_count": 3,
"total_sales": {
"value": 2200.75
}
}
]
}
}
}

Analysis

output:

  • Sales up to January 2, 2023, were $1500.50.
  • Sales from January 2, 2023, onwards were $2200.75.

Nested Date Aggregations

Sometimes, you may need more complex aggregations, such as grouping by month and then by day within each month. This is where nested aggregations come in handy.

Example: Grouping by Month and Then by Day

POST /sales_data/_search
{
"size": 0,
"aggs": {
"sales_per_month": {
"date_histogram": {
"field": "date",
"calendar_interval": "month"
},
"aggs": {
"sales_per_day": {
"date_histogram": {
"field": "date",
"calendar_interval": "day"
},
"aggs": {
"total_sales": {
"sum": {
"field": "amount"
}
}
}
}
}
}
}
}

Output:

{
"aggregations": {
"sales_per_month": {
"buckets": [
{
"key_as_string": "2023-01-01T00:00:00.000Z",
"key": 1672531200000,
"doc_count": 5,
"sales_per_day": {
"buckets": [
{
"key_as_string": "2023-01-01T00:00:00.000Z",
"key": 1672531200000,
"doc_count": 2,
"total_sales": {
"value": 1500.5
}
},
{
"key_as_string": "2023-01-02T00:00:00.000Z",
"key": 1672617600000,
"doc_count": 1,
"total_sales": {
"value": 650.75
}
},
{
"key_as_string": "2023-01-03T00:00:00.000Z",
"key": 1672704000000,
"doc_count": 2,
"total_sales": {
"value": 1550.0
}
}
]
}
}
]
}
}
}

Analysis

The total sales for each day within January 2023.

Conclusion

In this article, we've covered how to perform group-by-date operations in Elasticsearch using date histograms, custom date ranges, and nested aggregations. These techniques are powerful for analyzing time-series data, generating reports, and creating dashboards.

By mastering date aggregations, you can unlock the full potential of Elasticsearch for time-based data analysis, making it easier to spot trends, track performance, and make data-driven decisions. Whether you're analyzing sales data, website traffic, or any other time-stamped information, these methods will help you gain deeper insights from your data.


Next Article
Article Tags :

Similar Reads