ALP implementation #3994
Some older benchmarks can be found in this PR
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3994      +/-   ##
==========================================
+ Coverage   83.80%   84.03%   +0.23%
==========================================
  Files        1321     1331      +10
  Lines       52228    53126     +898
  Branches     7302     7400      +98
==========================================
+ Hits        43772    44647     +875
- Misses       8301     8308       +7
- Partials      155      171      +16
Remove unused exception handling code
Refactor column write so different behaviour can be implemented for floats
Run clang-format
In place update WIP
Implement initial version of in place updates
Run clang-format
Optimize in-place updates
In-place update fixes
Fixes after rebase
Cache exception buffer in memory during transaction
Fix warnings
Compress chunk test cleanup
Code cleanup 1
Fix compile issues on other platforms
Add missing Column::initializeScanState() calls
Fix overflow error in ALP exception count
Add test TODOs
Ignore current exceptions if entire chunk is updated
Only flush used parts of exception buffer
Avoid searching among non-finalized exceptions
CI fix
In place update optimizations
Code Cleanup 1
Update exceptions in place if they are replaced by new exceptions
Fix in-place updates for vector inputs
Tests/modifications for constant/uncompressed compression type
Add tests
Code cleanup + tests
Rust build fix
Enable assertion during in place update
More code cleanup + tests
More code cleanup 2
Fix assertion failure in GetFloatCompressionMetadata when numValues is 0
Bump DB version
Pass std::function by reference
Allow use of constant compression for encoded floats
Update uncompressed test to not break on 32-bit
Bump extension version
Optimize
Pad compression metadata to multiple of 8 bytes
self-review
self-review 2
Add test for column chunk metadata serialize/deserialize
Run clang-format
Update values in compress chunk test to work on 32-bit system
Remove unneeded changes to hash index/disk array
Use existing serializer/deserializer for column chunk metadata
Change ALP_EXCEPTION_* to physical type
Fix test failures
Address misc review comments
Fix test failures again
Fix issues after rebase
Address review comments
Address review comments 2
    static LogicalType ANY(PhysicalTypeID physicalType) {
        auto ret = LogicalType(LogicalTypeID::ANY);
        ret.physicalType = physicalType;
        return ret;
    }
I added this constructor to treat ANY as an internal logical type that contains a physical type of our choice. This is because logical types are needed to construct Column and ColumnChunkData.
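A minimal, self-contained sketch of how such a factory might be used. The `LogicalType` class and both enums below are simplified, hypothetical stand-ins written for illustration; they are not the engine's real definitions, which carry many more members.

```cpp
#include <cstdint>

// Hypothetical, heavily trimmed stand-ins for the engine's type enums.
enum class LogicalTypeID : uint8_t { ANY, DOUBLE, INT64 };
enum class PhysicalTypeID : uint8_t { ANY, DOUBLE, INT64 };

class LogicalType {
public:
    explicit LogicalType(LogicalTypeID id)
        : typeID{id}, physicalType{defaultPhysicalType(id)} {}

    // Plain ANY: logical and physical type are both ANY.
    static LogicalType ANY() { return LogicalType(LogicalTypeID::ANY); }

    // ANY carrying a caller-chosen physical type, mirroring the diff above.
    static LogicalType ANY(PhysicalTypeID physicalType) {
        auto ret = LogicalType(LogicalTypeID::ANY);
        ret.physicalType = physicalType;
        return ret;
    }

    LogicalTypeID getLogicalTypeID() const { return typeID; }
    PhysicalTypeID getPhysicalType() const { return physicalType; }

private:
    // Assumed default mapping from logical to physical type (illustrative only).
    static PhysicalTypeID defaultPhysicalType(LogicalTypeID id) {
        switch (id) {
        case LogicalTypeID::DOUBLE:
            return PhysicalTypeID::DOUBLE;
        case LogicalTypeID::INT64:
            return PhysicalTypeID::INT64;
        default:
            return PhysicalTypeID::ANY;
        }
    }

    LogicalTypeID typeID;
    PhysicalTypeID physicalType;
};
```

With this sketch, `LogicalType::ANY(PhysicalTypeID::INT64)` yields a type whose logical ID is still `ANY` but whose physical type is `INT64`, which is the property the comment relies on when a logical type is required only to construct storage objects.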
    KUZU_API static std::vector<LogicalType> copy(const std::vector<LogicalType*>& types);

    static LogicalType ANY() { return LogicalType(LogicalTypeID::ANY); }
    static LogicalType ANY(PhysicalTypeID physicalType) {
Add a comment here saying this is a temporary hack and that this interface is NOT supposed to be used anywhere else, since we should get rid of it later.
updated
Benchmark Result
Master commit hash:
ALP implementation (cherry picked from commit d3abcd0)
Description
Implements [ALP compression for floating-point values](https://dl.acm.org/doi/pdf/10.1145/3626717). The general idea of this compression algorithm consists of the following steps:
1. Choose two exponents `e,f` (these can be found by sampling values, running the steps below, and picking the pair that provides the best compression ratio).
2. Encode each value as an integer: `encoded_value = float_value * 10^f * 10^(-e)`. We test whether encoding and then decoding a value results in a loss of data. For values where there is no loss of data, the encoded integers are bitpacked normally. Values where there is a loss of data ('exceptions') are stored separately in uncompressed form.

Generally, the compression ratio improves when all the values are in a similar range and have similar decimal precision.
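The encode, round-trip test, and exception steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `AlpEncoded`, `alpPow10`, `sameBits`, `alpEncode`, and `alpDecode` are hypothetical names, the exponents `e` and `f` are passed in rather than sampled, exception slots get a placeholder of 0, and bitpacking of the integers is left out. The formula follows the description as written, with the result rounded to the nearest integer.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical container for one encoded chunk (not a type from this PR).
struct AlpEncoded {
    std::vector<int64_t> encoded;                       // one integer per input value
    std::vector<std::pair<size_t, double>> exceptions;  // (position, original value)
};

// 10^n for a small non-negative n.
inline double alpPow10(int n) {
    double r = 1.0;
    for (int i = 0; i < n; ++i) {
        r *= 10.0;
    }
    return r;
}

// Bitwise equality, so that -0.0 and 0.0 count as different values.
inline bool sameBits(double a, double b) {
    uint64_t x = 0, y = 0;
    std::memcpy(&x, &a, sizeof(x));
    std::memcpy(&y, &b, sizeof(y));
    return x == y;
}

// encoded_value = round(float_value * 10^f * 10^(-e)); values that do not
// survive the round trip are recorded as exceptions.
AlpEncoded alpEncode(const std::vector<double>& values, int e, int f) {
    AlpEncoded out;
    out.encoded.reserve(values.size());
    const double up = alpPow10(f);
    const double down = alpPow10(e);
    for (size_t i = 0; i < values.size(); ++i) {
        const double scaled = values[i] * up / down;
        // Guard against NaN, infinity, and values too large to round safely.
        bool ok = std::isfinite(scaled) && std::fabs(scaled) < 9.0e15;
        int64_t d = 0;
        if (ok) {
            d = static_cast<int64_t>(std::llround(scaled));
            ok = sameBits(static_cast<double>(d) * down / up, values[i]);
        }
        if (!ok) {
            out.exceptions.emplace_back(i, values[i]);
            d = 0;  // placeholder; a simplification chosen for this sketch
        }
        out.encoded.push_back(d);
    }
    return out;
}

// Inverse transform, then patch the exception positions with the stored originals.
std::vector<double> alpDecode(const AlpEncoded& in, int e, int f) {
    const double up = alpPow10(f);
    const double down = alpPow10(e);
    std::vector<double> out(in.encoded.size());
    for (size_t i = 0; i < in.encoded.size(); ++i) {
        out[i] = static_cast<double>(in.encoded[i]) * down / up;
    }
    for (const auto& ex : in.exceptions) {
        assert(ex.first < out.size());
        out[ex.first] = ex.second;
    }
    return out;
}
```

For example, calling `alpEncode({1.23, 4.5, 1.0 / 3.0, -0.07}, 0, 2)` encodes three of the inputs as small integers and records `1.0 / 3.0` as the only exception, because two decimal digits cannot reproduce it. Small integers in a narrow range are what make the later bitpacking step effective.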
This implementation deviates from the reference implementation (duckdb) in a few ways:
The deviations generally negatively affect our performance and compression ratio; we should update our implementation once segmentation is done to follow the paper more faithfully.
Contributor agreement