Commit 02e665c
dakhub authored and gitster committed
diff-delta.c: Rationalize culling of hash buckets

The previous hash bucket culling resulted in a somewhat unpredictable
number of hash bucket entries in the order of magnitude of HASH_LIMIT.
Replace this with a Bresenham-like algorithm leaving us with exactly
HASH_LIMIT entries by uniform culling.

Signed-off-by: David Kastrup <[email protected]>
Acked-by: Nicolas Pitre <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>

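
The "somewhat unpredictable" count can be read off the removed loop: it keeps the first entry of every run of hash_count[i] / HASH_LIMIT entries, so the number left depends on how the integer division rounds. The following is a minimal sketch of that arithmetic, not code from the commit. It assumes HASH_LIMIT is 64 (the value is not shown in this diff), and old_remaining and new_remaining are hypothetical helper names.

```c
#define HASH_LIMIT 64	/* assumed value, for illustration only */

/*
 * Entries left in a bucket of n entries by the removed loop.
 * With skip = n / HASH_LIMIT, it keeps entries 0, skip, 2*skip, ...
 * which is ceil(n / skip) entries in total.
 */
static unsigned int old_remaining(unsigned int n)
{
	unsigned int skip;

	if (n < HASH_LIMIT)
		return n;	/* bucket was left untouched */
	skip = n / HASH_LIMIT;
	return (n + skip - 1) / skip;
}

/* Entries left by the new loop: never more than HASH_LIMIT. */
static unsigned int new_remaining(unsigned int n)
{
	return n <= HASH_LIMIT ? n : HASH_LIMIT;
}
```

Under that assumption a bucket of 127 entries gives skip == 1, so the old loop removed nothing, while a bucket of 128 entries was cut to 64. The old result therefore ranged from HASH_LIMIT up to nearly twice that.
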
1 parent d210086 commit 02e665c

1 file changed: +31 -10 lines changed

diff-delta.c

Lines changed: 31 additions & 10 deletions
@@ -207,19 +207,40 @@ struct delta_index * create_delta_index(const void *buf, unsigned long bufsize)
 	 * the reference buffer.
 	 */
 	for (i = 0; i < hsize; i++) {
-		if (hash_count[i] < HASH_LIMIT)
+		int acc;
+
+		if (hash_count[i] <= HASH_LIMIT)
 			continue;
+
+		entries -= hash_count[i] - HASH_LIMIT;
+		/* We leave exactly HASH_LIMIT entries in the bucket */
+
 		entry = hash[i];
+		acc = 0;
 		do {
-			struct unpacked_index_entry *keep = entry;
-			int skip = hash_count[i] / HASH_LIMIT;
-			do {
-				--entries;
-				entry = entry->next;
-			} while(--skip && entry);
-			++entries;
-			keep->next = entry;
-		} while(entry);
+			acc += hash_count[i] - HASH_LIMIT;
+			if (acc > 0) {
+				struct unpacked_index_entry *keep = entry;
+				do {
+					entry = entry->next;
+					acc -= HASH_LIMIT;
+				} while (acc > 0);
+				keep->next = entry->next;
+			}
+			entry = entry->next;
+		} while (entry);
+
+		/* Assume that this loop is gone through exactly
+		 * HASH_LIMIT times and is entered and left with
+		 * acc==0.  So the first statement in the loop
+		 * contributes (hash_count[i]-HASH_LIMIT)*HASH_LIMIT
+		 * to the accumulator, and the inner loop consequently
+		 * is run (hash_count[i]-HASH_LIMIT) times, removing
+		 * one element from the list each time.  Since acc
+		 * balances out to 0 at the final run, the inner loop
+		 * body can't be left with entry==NULL.  So we indeed
+		 * encounter entry==NULL in the outer loop only.
+		 */
 	}
 	free(hash_count);
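
The accumulator argument in the comment above can be tried outside of diff-delta.c. The following is a minimal sketch of the same loop on a bare singly linked list, not the code from the commit. struct node is a hypothetical stand-in for struct unpacked_index_entry, and cull_bucket, list_length and make_list are illustrative names. The sketch assumes limit >= 1 and that count really is the length of the list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct unpacked_index_entry. */
struct node {
	struct node *next;
};

/*
 * Unlink nodes from a list of `count` nodes so that exactly `limit`
 * remain, spread evenly over the list.  Returns the number of nodes
 * unlinked.  Nothing is freed; the caller owns the storage.
 */
static int cull_bucket(struct node *head, int count, int limit)
{
	struct node *entry = head;
	int removed = 0;
	int acc = 0;

	assert(limit >= 1);
	if (count <= limit)
		return 0;

	do {
		/* Each kept node adds the surplus to the accumulator. */
		acc += count - limit;
		if (acc > 0) {
			struct node *keep = entry;
			/* Each dropped node pays back `limit`. */
			do {
				entry = entry->next;
				acc -= limit;
				removed++;
			} while (acc > 0);
			keep->next = entry->next;
		}
		entry = entry->next;
	} while (entry);

	/* The accumulator balances out, as the comment argues. */
	assert(acc == 0);
	return removed;
}

/* Count the nodes reachable from head. */
static int list_length(const struct node *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}

/* Chain the first `count` elements of `pool` into a list. */
static struct node *make_list(struct node *pool, int count)
{
	int i;

	for (i = 0; i < count; i++)
		pool[i].next = (i + 1 < count) ? &pool[i + 1] : NULL;
	return count ? pool : NULL;
}
```

For example, building a 5-node list with make_list and calling cull_bucket(head, 5, 2) unlinks three nodes and leaves the first and fourth. The outer loop runs twice and the inner loop three times in total, matching the counts given in the comment.
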
