Data Pagination Using Elasticsearch in Golang - by Eugene Nikolaev - Medium
Elasticsearch, a powerful search and analytics engine, provides robust
capabilities for data management and is well suited for cases where
non-relational databases are used. This article will explore how to use
Elasticsearch’s features within Golang to implement different data pagination
strategies.
Setting up Elasticsearch and Golang Environment
Before diving into the code, let’s set up the necessary environment. Ensure
Elasticsearch is installed on your machine along with the required Golang
client libraries.
You can find how to set up dockerized ES on your local machine in my other
article: https://satorsight.medium.com/setting-up-elasticsearch-with-
Also, you can find the entire project used in this article on GitHub:
https://github.com/SatorSight/go-elastic-w-pagination
// main.go
func prepareESClient() *es.Client {
	esHost := "http://localhost:4566/es/us-east-1/my-data"
	esUsername := ""
	esPassword := ""
	esIndex := "my-index"
	esCfg := elasticsearch.Config{
		Addresses:             []string{esHost},
		Username:              esUsername,
		Password:              esPassword,
		CloudID:               "",
		APIKey:                "",
		Header:                nil,
		CACert:                nil,
		RetryOnStatus:         nil, // List of status codes for retry. Default: 5
		DisableRetry:          false,
		EnableRetryOnTimeout:  true,
		MaxRetries:            3,
		DiscoverNodesOnStart:  false,
		DiscoverNodesInterval: 0,
		EnableMetrics:         true,
		EnableDebugLogger:     true,
		RetryBackoff:          nil,
		Transport:             t,
		Logger:                esLogger,
		Selector:              nil,
		ConnectionPoolFunc:    nil,
	}
	customStorageCfg := es.Config{
		DefaultIndex:          esIndex,
		MaxSearchQueryTimeout: maxTimeout,
		PathToMappingSchema:   "",
		IsTrackTotalHits:      true, // always needed for cnt operations.
	}
	esClient, err := es.New(lg, esCfg, customStorageCfg)
	if err != nil {
		log.Fatalln("failed to init esClient")
	}
	return esClient
}
...
// es.go
func New(log *logger.Logger, esCfg elasticsearch.Config, customCfg Config) (*Cli
	es, err := elasticsearch.NewClient(esCfg)
	if err != nil {
		log.Info("Could not create new ElasticSearch client due to error")
		return nil, err
	}
	c := &Client{
		log:                   log,
		esCfg:                 esCfg,
		esClient:              es,
		defaultIndex:          customCfg.DefaultIndex,
		maxSearchQueryTimeout: customCfg.MaxSearchQueryTimeout,
		isTrackTotalHits:      true,
	}
	return c, nil
}
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "created_at": {
        "type": "date"
      },
      "username": {
        "type": "keyword"
      }
    }
  }
}
Our ES index will contain users with id, created_at and username fields.
// main.go
func createIndex(client *es.Client, ctx context.Context, index string) {
	err := client.CreateIndex(ctx, index, "mapping.json")
	if err != nil {
		log.Fatalf("failed to create index: %v", err)
	}
}
// es.go
func (c *Client) CreateIndex(ctx context.Context, index string, mapping string)
	var file []byte
	file, err := os.ReadFile(mapping)
	if err != nil || file == nil {
		c.log.Fatal("Could not read file with mapping defaultIndex schema",
			zap.String("path_to_mapping_schema", mapping),
			zap.Error(err))
	}
	indexMappingSchema := string(file)
	// The request definition is missing from this listing; an
	// esapi.IndicesCreateRequest is assumed here.
	req := esapi.IndicesCreateRequest{
		Index: index,
		Body:  strings.NewReader(indexMappingSchema),
	}
	res, err := req.Do(ctx, c.esClient)
	if err != nil {
		return fmt.Errorf("err creating defaultIndex: %v", err)
	}
	defer func() {
		err = res.Body.Close()
		if err != nil {
			c.log.Error("res.Body.Close() problem", zap.Error(err))
		}
	}()
	if res.IsError() {
		return fmt.Errorf("err creating defaultIndex. res: %s", res.String())
	}
	return nil
}
Then in main.go:
func main() {
	esClient := prepareESClient() // described above
	ctx := prepareContext()
After running it we will have the index created. Next, let’s load 100k users
into it:
// es.go
func (c *Client) Create100kUsers(ctx context.Context, index string) {
	user := User{
		ID:        0,
		CreatedAt: time.Now(),
		Username:  "init",
	}
This will run for a while, but afterwards we will have an index with 100k users
to test pagination on.
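The loading helper is cut short in the listing above, so here is a minimal sketch of one common way to prepare many documents at once: a newline-delimited body for the Elasticsearch Bulk API. The `buildBulkBody` helper and the plain `User` struct (without JSON tags) are assumptions made for illustration, not the code from the original project.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"time"
)

// User mirrors the fields used in this article; the struct in the
// original project may differ (for example, it may carry JSON tags).
type User struct {
	ID        int
	CreatedAt time.Time
	Username  string
}

// buildBulkBody is a hypothetical helper. It encodes users as
// newline-delimited JSON: one action line followed by one document
// line per user, which is the shape the Bulk API expects.
func buildBulkBody(index string, users []User) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode appends "\n" after every value
	for _, u := range users {
		action := map[string]map[string]string{"index": {"_index": index}}
		if err := enc.Encode(action); err != nil {
			return nil, err
		}
		if err := enc.Encode(u); err != nil {
			return nil, err
		}
	}
	return &buf, nil
}

func main() {
	users := make([]User, 0, 3)
	for i := 0; i < 3; i++ {
		users = append(users, User{
			ID:        i,
			CreatedAt: time.Now(),
			Username:  fmt.Sprintf("user-%d", i),
		})
	}
	body, err := buildBulkBody("my-index", users)
	if err != nil {
		panic(err)
	}
	fmt.Print(body.String())
}
```

The resulting buffer would then be sent to the `_bulk` endpoint in batches; sending 100k documents in chunks of a few thousand keeps each request reasonably small.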
Fetching Data Without Pagination
Let’s start by looking at a simple data retrieval method from Elasticsearch in
Golang without pagination. The following code demonstrates a basic
retrieval mechanism. In the Load function I omit most of the boilerplate code,
like error handling, because it’s pretty wordy:
// main.go
func main() {
	esClient := prepareESClient()
	ctx := prepareContext()
	res := simpleLoad(esClient, ctx, "my-index")
	pp(res)
}
// es.go
func (c *Client) Load(
	ctx context.Context,
	index string,
	from int,
	size int,
	cursor float64,
) (SearchResult, error) {
	query := map[string]interface{}{
		"query": map[string]interface{}{
			"match_all": map[string]interface{}{},
		},
	}
	sortQuery := []map[string]map[string]interface{}{
		{"ID": {"order": "asc"}},
	}
	query["sort"] = sortQuery
	if cursor != 0 {
		query["search_after"] = []float64{cursor}
	}
	...
	res, err = c.esClient.Search(
		c.esClient.Search.WithContext(ctx),
		c.esClient.Search.WithTimeout(c.maxSearchQueryTimeout),
		c.esClient.Search.WithIndex(index),
		c.esClient.Search.WithBody(&buf),
		c.esClient.Search.WithFrom(from),
		c.esClient.Search.WithSize(size),
		c.esClient.Search.WithTrackTotalHits(c.isTrackTotalHits),
		c.esClient.Search.WithPretty(), // todo remove in case of performance degrada
	)
	...
	result :=
		func() SearchResult {
			totalCnt := int64(r["hits"].(map[string]interface{})["total"].(map[string]int
			if totalCnt == 0 {
				return SearchResult{}
			}
			cntFind := len(r["hits"].(map[string]interface{})["hits"].([]interface{}))
			docs := make([]User, 0, cntFind)
			var lastSort float64
			...
				if err = jsoniter.Unmarshal(jsonBody, &doc); err != nil {
					c.log.Error("es.client.Load()
						zap.Any("r['hits']", r["hits"]),
						zap.Error(err),
					)
					return SearchResult{Users: docs, TotalCount: totalCnt}
				}
				docs = append(docs, doc)
			}
			return SearchResult{Users: docs, TotalCount: totalCnt, LastSort: lastSort}
		}()
	return result, nil
}
This function will fetch the first 10 users from ES. The request will
contain _search?from=0&size=10. The same function will be reused
further on.
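To make the shape of that request concrete, here is a small sketch that builds the search path and query string with the standard library only. The `searchPath` helper is hypothetical and is not part of the go-elasticsearch client, which sets these parameters through its own options.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// searchPath is a hypothetical helper that builds the path and query
// string of a from/size search request against the given index.
func searchPath(index string, from, size int) string {
	q := url.Values{}
	q.Set("from", strconv.Itoa(from))
	q.Set("size", strconv.Itoa(size))
	// Encode sorts keys alphabetically, so "from" precedes "size".
	return "/" + url.PathEscape(index) + "/_search?" + q.Encode()
}

func main() {
	fmt.Println(searchPath("my-index", 0, 10))  // first page
	fmt.Println(searchPath("my-index", 10, 10)) // second page
}
```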
and size params. Let’s implement a basic paginated output iterating with
from and size parameters:
for i := from; i < 100; i += size {
	res, err2 := client.Load(ctx, index, i, size, 0)
	if err2 != nil {
		log.Fatalf("failed to fetch results: %v", err2)
	}
The problem
The problem begins if we have a lot of data and try to go deeper than 10,000
records. If we try to do that by making a request with something like
?from=10000&size=10, we will get the following error:
Result window is too large, from + size must be less than or equal to: [10000]
but was [10010]. See the scroll api for a more efficient way to request
large data sets. This limit can be set by changing the
[index.max_result_window] index level setting.
This means that we can use from&size pagination only for the first 10k
records under a given sort order. The reason is given in the ES docs:
Search requests take heap memory and time proportional to from + size and this
limits that memory.
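The limit can also be checked on the client side before a request is sent. The sketch below assumes the default window of 10,000; `checkWindow` is a hypothetical helper name, and the real value should come from the index settings if it has been changed.

```go
package main

import "fmt"

// defaultMaxResultWindow is the default value of index.max_result_window.
const defaultMaxResultWindow = 10000

// checkWindow is a hypothetical helper that returns an error when
// from + size would go past the configured result window.
func checkWindow(from, size, maxWindow int) error {
	if from+size > maxWindow {
		return fmt.Errorf("from + size must be <= %d but was %d", maxWindow, from+size)
	}
	return nil
}

func main() {
	fmt.Println(checkWindow(9990, 10, defaultMaxResultWindow))  // last allowed page
	fmt.Println(checkWindow(10000, 10, defaultMaxResultWindow)) // one page too deep
}
```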
In theory we can raise that limit to something like 100k:
PUT /my-index/_settings
{
  "index.max_result_window": 100000
}
To use the Search After API we first need to choose a sort field and direction.
In this article I will be using the simplest sort, by ID ASC:
sortQuery := []map[string]map[string]interface{}{
	{"ID": {"order": "asc"}},
}
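As a rough sketch of how such a sort ends up in the request body, the following builds the whole query map and encodes it to JSON. The helper names `searchQuery` and `encodeQuery` are assumptions for illustration; the zero-cursor check mirrors the one in Load, where a cursor of 0 means the first page.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// searchQuery is a hypothetical helper that builds a match_all query
// sorted by ID ascending. A non-zero cursor is passed as search_after.
func searchQuery(cursor float64) map[string]interface{} {
	query := map[string]interface{}{
		"query": map[string]interface{}{
			"match_all": map[string]interface{}{},
		},
		"sort": []map[string]map[string]interface{}{
			{"ID": {"order": "asc"}},
		},
	}
	if cursor != 0 {
		query["search_after"] = []float64{cursor}
	}
	return query
}

// encodeQuery serializes the query into a buffer that can be used as
// the body of a search request.
func encodeQuery(query map[string]interface{}) (*bytes.Buffer, error) {
	var buf bytes.Buffer
	if err := json.NewEncoder(&buf).Encode(query); err != nil {
		return nil, err
	}
	return &buf, nil
}

func main() {
	buf, err := encodeQuery(searchQuery(1000))
	if err != nil {
		panic(err)
	}
	fmt.Print(buf.String())
}
```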
In real projects integer IDs are not always present (for example, when using
UUIDs), and in that case I would consider using created_at + uuid or the
internal “_id” field for pagination. For each field in the sort subquery, ES
will give us cursors in the response, for example:
"search_after": [
"1O9tYowBuAaJdMU4BeRn",
1702465740836
],
"sort": {
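When a search is sorted, each hit in the response carries its own sort values, and the values of the last hit become the cursor for the next page. A minimal sketch of extracting them with the standard library follows; the `searchResponse` type models only the fields needed here, and the sample payload is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResponse models only the parts of a search response that are
// needed to read the per-hit sort values.
type searchResponse struct {
	Hits struct {
		Hits []struct {
			Sort []interface{} `json:"sort"`
		} `json:"hits"`
	} `json:"hits"`
}

// lastSortValues returns the sort values of the final hit, or nil
// when the page contains no hits.
func lastSortValues(body []byte) ([]interface{}, error) {
	var resp searchResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	hits := resp.Hits.Hits
	if len(hits) == 0 {
		return nil, nil
	}
	return hits[len(hits)-1].Sort, nil
}

func main() {
	// Illustrative response fragment with two hits.
	body := []byte(`{"hits":{"hits":[
		{"sort":[1702465740000,"a"]},
		{"sort":[1702465740836,"b"]}
	]}}`)
	cursor, err := lastSortValues(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(cursor)
}
```

Note that `encoding/json` decodes numbers into `float64` when the target is `interface{}`, which matches the float64 cursor type used by Load.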
		ls = res2.LastSort
		log.Printf("current cursor: %v\n", ls)
		users := res2.Users
		res = append(res, users...)
	}
	return res
}
This will consecutively scroll through the results using cursors.
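Since only the tail of that loop is shown, here is a self-contained sketch of the same idea that runs without Elasticsearch. The `page` type, the `fetchFunc` abstraction and the in-memory `fakeFetch` are all assumptions standing in for SearchResult and Load; the loop simply follows the last cursor until an empty page comes back.

```go
package main

import "fmt"

// page is a simplified stand-in for the SearchResult type.
type page struct {
	IDs      []int
	LastSort float64
}

// fetchFunc is a hypothetical abstraction over Load: it returns the
// page of results that follows the given cursor.
type fetchFunc func(cursor float64, size int) (page, error)

// loadAll follows cursors until an empty page is returned.
func loadAll(fetch fetchFunc, size int) ([]int, error) {
	var all []int
	var cursor float64 // zero means "start from the beginning"
	for {
		p, err := fetch(cursor, size)
		if err != nil {
			return nil, err
		}
		if len(p.IDs) == 0 {
			return all, nil
		}
		all = append(all, p.IDs...)
		cursor = p.LastSort
	}
}

// fakeFetch simulates an index holding IDs 1..total sorted ascending.
func fakeFetch(total int) fetchFunc {
	return func(cursor float64, size int) (page, error) {
		var p page
		for id := int(cursor) + 1; id <= total && len(p.IDs) < size; id++ {
			p.IDs = append(p.IDs, id)
			p.LastSort = float64(id)
		}
		return p, nil
	}
}

func main() {
	ids, err := loadAll(fakeFetch(25), 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(ids), ids[0], ids[len(ids)-1])
}
```

Stopping on an empty page costs one extra request at the end, but it avoids relying on the total count staying stable while the index is being written to.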
Another thing to be careful about is that when using multiple cursors, their
order matters and must match the order of the fields listed in the sort.
For example:
sortQuery := []map[string]map[string]interface{}{
	{"created_at": {"order": "asc"}},
	{"_id": {"order": "asc"}},
}
...
"search_after": [
	"1O9tYowBuAaJdMU4BeRn", // cursor for _id
	1702465740836 // cursor for created_at
],
// error!
This one will give a 400 error through the go-elasticsearch library, because
the cursors will be cast into the wrong types: the created_at cursor is a
float64 timestamp, while the _id cursor (an internal UUID-like value) is a
string.
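One way to avoid this mistake is to derive the search_after array from the list of sort fields instead of writing it by hand. The sketch below assumes the cursors are kept in a map keyed by field name; `searchAfter` is a hypothetical helper, not part of any library.

```go
package main

import "fmt"

// searchAfter is a hypothetical helper that arranges cursor values in
// the same order as the sort fields, failing if a cursor is missing.
func searchAfter(sortFields []string, cursors map[string]interface{}) ([]interface{}, error) {
	out := make([]interface{}, 0, len(sortFields))
	for _, field := range sortFields {
		v, ok := cursors[field]
		if !ok {
			return nil, fmt.Errorf("missing cursor for sort field %q", field)
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	sortFields := []string{"created_at", "_id"}
	cursors := map[string]interface{}{
		"_id":        "1O9tYowBuAaJdMU4BeRn",
		"created_at": int64(1702465740836),
	}
	after, err := searchAfter(sortFields, cursors)
	if err != nil {
		panic(err)
	}
	fmt.Println(after) // created_at cursor first, then _id
}
```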