I'm building an index on a really large dataset (the 96 repos with the most commits under Google's GitHub org).
Ran this:
CREATE INDEX file_by_lang ON files USING pilosa (LANGUAGE(file_path, blob_content));
And it started creating it right away, at 22:53:32.
time="2018-10-10T22:53:32Z" level=debug msg="executing query" query="\nCREATE INDEX file_by_lang ON files USING pilosa (LANGUAGE(file_path, blob_content));\n"
time="2018-10-10T22:53:32Z" level=info msg="starting to save the index" async=true driver=pilosa id=file_by_lang
time="2018-10-10T22:53:33Z" level=debug msg="still creating index" driver=pilosa duration=1.467891121s id=file_by_lang rows=10000
At 01:00:51 (that's over two hours later) I see this in the logs:
nohup.out:time="2018-10-11T01:00:51Z" level=debug msg="still creating index" driver=pilosa duration=1.394008128s id=file_by_lang rows=33070000
nohup.out:time="2018-10-11T01:00:59Z" level=debug msg="still creating index" driver=pilosa duration=5.915422321s id=file_by_lang rows=10000
WAT? After more than two hours of computing, the index simply restarted and the row count reset back to the start?
There are no other log entries in between, so no other query was made that could have dropped the previous index or anything like that.
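To find every point where this happens in a long log, a small script can flag each progress line whose rows value is lower than the previous one for the same index id. This is only a sketch: the regular expression and the `find_row_resets` helper are my own and assume the line layout of the three samples quoted above, nothing more.

```python
import re
from datetime import datetime

# Field layout assumed from the log lines quoted in this issue only.
LINE_RE = re.compile(
    r'time="(?P<time>[^"]+)".*msg="still creating index".*'
    r'id=(?P<id>\S+) rows=(?P<rows>\d+)'
)


def find_row_resets(lines):
    """Return (previous, current) pairs where the rows counter went backwards."""
    resets = []
    last = {}  # index id -> (timestamp, rows) of the last progress line seen
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a progress line
        ts = datetime.strptime(m.group("time"), "%Y-%m-%dT%H:%M:%SZ")
        rows = int(m.group("rows"))
        prev = last.get(m.group("id"))
        if prev is not None and rows < prev[1]:
            resets.append((prev, (ts, rows)))
        last[m.group("id")] = (ts, rows)
    return resets


# The three progress lines quoted in this issue.
sample = [
    'time="2018-10-10T22:53:33Z" level=debug msg="still creating index" '
    'driver=pilosa duration=1.467891121s id=file_by_lang rows=10000',
    'nohup.out:time="2018-10-11T01:00:51Z" level=debug msg="still creating index" '
    'driver=pilosa duration=1.394008128s id=file_by_lang rows=33070000',
    'nohup.out:time="2018-10-11T01:00:59Z" level=debug msg="still creating index" '
    'driver=pilosa duration=5.915422321s id=file_by_lang rows=10000',
]

resets = find_row_resets(sample)
for (prev_ts, prev_rows), (ts, rows) in resets:
    print(f"rows went from {prev_rows} at {prev_ts} to {rows} at {ts}")
```

On a full log, the same function could be fed the open file object instead of the `sample` list.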
I've also considered whether this is some kind of overflow, but I don't see how: 33070000 is close to 33554432 (2^25), but since the row count increases by 10000 per log line, I would have expected to see 33080000 next.
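The arithmetic behind that doubt can be checked quickly. This sketch only uses the numbers from the logs above; whether any counter in the indexer is actually bounded by 2^25 is an assumption I have not verified.

```python
LIMIT = 2 ** 25          # 33554432, the nearest power of two above the last count
LAST_ROWS = 33_070_000   # last rows value logged before the counter reset
STEP = 10_000            # increment between consecutive progress lines

gap = LIMIT - LAST_ROWS            # rows still missing to reach 2^25
lines_to_cross = -(-gap // STEP)   # ceiling division: progress lines needed to exceed it

print(f"gap to 2^25: {gap} rows")
print(f"progress lines still needed to cross it: {lines_to_cross}")
```

So the counter reset well before a 2^25 boundary would have been reached, which is why a plain overflow at that value does not seem to explain it.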
Any ideas on what might be going on here?
You can find the logs attached: file_by_lang.log
Also, if you prefer to see all of the logs in my session, they're right here: nohup.log