Commit 1aa00ff

mgorny authored and benjaminp committed
fixes bpo-31834: Use optimized code for BLAKE2 only with SSSE3+ (#4066)
Rework the selection of BLAKE2 code paths: instead of using the optimized variant on all x86_64 machines, use it only when SSSE3 or a better instruction set is available.

Firstly, this avoids the pure SSE2 code path on x86_64 machines. As reported in the bug, that code is slower than the reference code on all tested x86_64 machines. On an Athlon64, which lacks SSSE3, it is as much as 2.5 times slower than the reference code. Checking for SSSE3 therefore ensures that the optimized implementation is used only when it has a chance of performing better.

Secondly, this makes it possible to use the SSSE3+ optimizations on 32-bit x86 systems, which gives up to a 2x speedup on modern 32-bit x86 systems (tested in a 32-bit chroot).
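The compile-time selection described in the commit message can be sketched roughly as follows. This is an illustration only, not code from the commit: the macro `SKETCH_BLAKE2_OPTIMIZED` and the helper `sketch_blake2_path()` are hypothetical names. Only the preprocessor condition and the two included file names are taken from the patch.

```c
/* Same preprocessor condition as the patched blake2b_impl.c / blake2s_impl.c.
 * GCC and Clang predefine these macros when the compilation target enables
 * the corresponding instruction set. */
#if defined(__SSSE3__) || defined(__SSE4_1__) || defined(__AVX__) || defined(__XOP__)
#define SKETCH_BLAKE2_OPTIMIZED 1   /* hypothetical name, not in CPython */
#else
#define SKETCH_BLAKE2_OPTIMIZED 0
#endif

/* Hypothetical helper: names the source file this translation unit
 * would include under the new rule. */
static const char *sketch_blake2_path(void)
{
    return SKETCH_BLAKE2_OPTIMIZED ? "impl/blake2b.c"
                                   : "impl/blake2b-ref.c";
}
```

Note that the decision depends on the compiler's target flags, not on the CPU that later runs the binary. A default x86_64 GCC build, for example, enables only SSE2 and would therefore pick the reference file, while building with `-mssse3` or a suitable `-march=` value would pick the optimized one.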
1 parent 3b66ebe commit 1aa00ff

4 files changed: 8 additions and 11 deletions
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+Use optimized code for BLAKE2 only with SSSE3+. The pure SSE2 implementation
+is slower than the pure C reference implementation.

Modules/_blake2/blake2b_impl.c

Lines changed: 3 additions & 1 deletion
@@ -26,7 +26,9 @@
 #include "impl/blake2.h"
 #include "impl/blake2-impl.h" /* for secure_zero_memory() and store48() */
 
-#ifdef BLAKE2_USE_SSE
+/* pure SSE2 implementation is very slow, so only use the more optimized SSSE3+
+ * https://bugs.python.org/issue31834 */
+#if defined(__SSSE3__) || defined(__SSE4_1__) || defined(__AVX__) || defined(__XOP__)
 #include "impl/blake2b.c"
 #else
 #include "impl/blake2b-ref.c"

Modules/_blake2/blake2s_impl.c

Lines changed: 3 additions & 1 deletion
@@ -26,7 +26,9 @@
 #include "impl/blake2.h"
 #include "impl/blake2-impl.h" /* for secure_zero_memory() and store48() */
 
-#ifdef BLAKE2_USE_SSE
+/* pure SSE2 implementation is very slow, so only use the more optimized SSSE3+
+ * https://bugs.python.org/issue31834 */
+#if defined(__SSSE3__) || defined(__SSE4_1__) || defined(__AVX__) || defined(__XOP__)
 #include "impl/blake2s.c"
 #else
 #include "impl/blake2s-ref.c"

setup.py

Lines changed: 0 additions & 9 deletions
@@ -922,19 +922,10 @@ def detect_modules(self):
                                          'Modules/_blake2/impl/*'))
         blake2_deps.append('hashlib.h')
 
-        blake2_macros = []
-        if (not cross_compiling and
-                os.uname().machine == "x86_64" and
-                sys.maxsize > 2**32):
-            # Every x86_64 machine has at least SSE2. Check for sys.maxsize
-            # in case that kernel is 64-bit but userspace is 32-bit.
-            blake2_macros.append(('BLAKE2_USE_SSE', '1'))
-
         exts.append( Extension('_blake2',
                                ['_blake2/blake2module.c',
                                 '_blake2/blake2b_impl.c',
                                 '_blake2/blake2s_impl.c'],
-                               define_macros=blake2_macros,
                                depends=blake2_deps) )
 
         sha3_deps = glob(os.path.join(os.getcwd(), srcdir,
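
To make the behavioural difference between the removed `setup.py` check and the new preprocessor check concrete, the two rules can be modelled as plain predicates. This is a sketch under simplifying assumptions: the function and parameter names are hypothetical, and `has_ssse3_or_better` stands for "the compiler target defines one of `__SSSE3__`, `__SSE4_1__`, `__AVX__` or `__XOP__`".

```c
#include <stdbool.h>

/* Old rule (removed from setup.py): define BLAKE2_USE_SSE on any native
 * build with a 64-bit x86_64 userspace, regardless of SSSE3 support. */
static bool old_rule_uses_optimized(bool x86_64_userspace,
                                    bool cross_compiling,
                                    bool has_ssse3_or_better)
{
    (void)has_ssse3_or_better;          /* the old rule never looked at this */
    return !cross_compiling && x86_64_userspace;
}

/* New rule (in blake2b_impl.c / blake2s_impl.c): use the optimized code
 * whenever the compiler target has SSSE3 or better, on 32-bit or 64-bit x86. */
static bool new_rule_uses_optimized(bool x86_64_userspace,
                                    bool cross_compiling,
                                    bool has_ssse3_or_better)
{
    (void)x86_64_userspace;             /* no longer relevant */
    (void)cross_compiling;              /* no longer relevant */
    return has_ssse3_or_better;
}
```

The two cases named in the commit message fall out directly: an x86_64 target without SSSE3 (such as the Athlon64) moves from the optimized path to the reference path, and a 32-bit x86 target with SSSE3 moves from the reference path to the optimized one.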
