Skip to content

20231102-vector-register-dynamic-fallback-aes #6981

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

douzzer
Copy link
Contributor

@douzzer douzzer commented Nov 17, 2023

refactor AESNI implementations and *VECTOR_REGISTERS* macros to allow dynamic as-needed fallback to pure C, via WC_AES_C_DYNAMIC_FALLBACK.

wolfssl/wolfcrypt/aes.h: add key_C_fallback[] to struct Aes, and remove comment that "AESNI needs key first, rounds 2nd, not sure why yet" now that AES_128_Key_Expansion_AESNI no longer writes rounds after the expanded key.

wolfcrypt/src/aes.c:

  • add _AESNI or _aesni suffixes/infixes to AESNI implementations that were missing them: AES_CBC_encrypt(), AES_CBC_decrypt_by*(), AES_ECB_encrypt(), AES_*_Key_Expansion(), AES_set_encrypt_key(), AES_set_decrypt_key(), AES_GCM_encrypt(), AES_GCM_decrypt(), AES_XTS_encrypt(), and AES_XTS_decrypt().
  • move key size check from middle to start of wc_AesSetKeyLocal().
  • refactor pure-C AES setkey and cipher implementations to use aes->key_C_fallback when defined(WC_AES_C_DYNAMIC_FALLBACK).
  • refactor wc_AesSetKeyLocal() to set up both AESNI and pure-C expanded keys when defined(WC_AES_C_DYNAMIC_FALLBACK).
  • refactor all (haveAESNI && aes->use_aesni) conditions to just (aes->use_aesni).
  • add macros VECTOR_REGISTERS_PUSH and VECTOR_REGISTERS_POP, which do nothing but push a brace level when !defined(WC_AES_C_DYNAMIC_FALLBACK), but when defined(WC_AES_C_DYNAMIC_FALLBACK), they call SAVE_VECTOR_REGISTERS2() and on failure, temporarily clear aes->use_aesni and restore at _POP().
  • refactor all invocations of SAVE_VECTOR_REGISTERS() and RESTORE_VECTOR_REGISTERS() to VECTOR_REGISTERS_PUSH and VECTOR_REGISTERS_POP, except in wc_AesSetKeyLocal(), wc_AesXtsEncrypt(), and wc_AesXtsDecrypt(), which are refactored to use SAVE_VECTOR_REGISTERS2(), with graceful failure concealment if defined(WC_AES_C_DYNAMIC_FALLBACK).
  • orthogonalize cleanup code in wc_AesCbcEncrypt(), wc_AesCcmEncrypt() and wc_AesCcmDecrypt().
  • streamline fallthrough software definitions of wc_AesEncryptDirect() and wc_AesDecryptDirect(), and remove special-casing for defined(WOLFSSL_LINUXKM)&&defined(WOLFSSL_AESNI).

wolfcrypt/src/aes_asm.{S,asm}:

  • remove errant movl $10, 240(%rsi) from AES_128_Key_Expansion_AESNI.
  • add _AESNI suffixes/infixes to implementations that needed them.

wolfcrypt/src/{aes_gcm_asm.{S,asm},aes_xts_asm.S}: regenerate from revisions in https://github.com/wolfSSL/scripts/pull/357 -- adds _aesni suffixes to implementations that were missing them.

wolfssl/wolfcrypt/types.h: remove DEBUG_VECTOR_REGISTER_ACCESS macros, and add dummy fallthrough definitions for SAVE_VECTOR_REGISTERS2 and WC_DEBUG_SET_VECTOR_REGISTERS_RETVAL.

wolfssl/wolfcrypt/memory.h: adopt DEBUG_VECTOR_REGISTER_ACCESS code from types.h, and add definitions for WC_DEBUG_VECTOR_REGISTERS_RETVAL_INITVAL and WC_DEBUG_SET_VECTOR_REGISTERS_RETVAL.

linuxkm/linuxkm_wc_port.h: add arch-specific macro definitions for SAVE_VECTOR_REGISTERS2().

wolfcrypt/benchmark/benchmark.c: add missing gates around calls to RESTORE_VECTOR_REGISTERS().

configure.ac:

  • cover various interdependencies in enable-all/enable-all-crypto, for better behavior in combination with --disable-aesgcm, --disable-ecc, --disable-ocsp, --disable-hmac, --disable-chacha, --disable-ed25519, and --disable-ed448.
  • inhibit aesgcm_stream in enable-all/enable-all-crypto when ENABLED_LINUXKM_DEFAULTS, because it is currently incompatible with WC_AES_C_DYNAMIC_FALLBACK.
  • add -DWC_AES_C_DYNAMIC_FALLBACK when ENABLED_LINUXKM_DEFAULTS.
  • add 3 new interdependency checks: "ECCSI requires ECC.", "SAKKE requires ECC.", "WOLFSSH requires HMAC."

wolfcrypt/src/asn.c: tweak gating to accommodate defined(NO_RSA) && !defined(HAVE_ECC).

wolfcrypt/src/evp.c: tweak gating to accommodate defined(NO_HMAC).

wolfcrypt/src/logging.c: remove DEBUG_VECTOR_REGISTER_ACCESS code (moved to memory.c).

wolfcrypt/src/memory.c: change #include of settings.h to types.h; adopt DEBUG_VECTOR_REGISTER_ACCESS code from logging.c; add implementation of SAVE_VECTOR_REGISTERS2_fuzzer().

wolfcrypt/src/pwdbased.c: add explanatory #error scrypt requires HMAC.

wolfcrypt/test/test.c:

  • add DEBUG_VECTOR_REGISTER_ACCESS clauses to aes_xts_128_test(), aesecb_test(), aesctr_test(), aes_test() CBC section, aes256_test() CBC section, and aesgcm_default_test_helper()
  • remove duplicate wc_AesEcbDecrypt() call in aesecb_test().
  • add gating for pbkdf2_test().
  • fix cleanup code in dsa_test().
  • fix gating in pkcs7authenveloped_run_vectors() to accommodate !defined(HAVE_AESGCM).
  • fix gating in cryptocb_test() to accommodate defined(NO_HMAC).

wolfssl/wolfcrypt/cryptocb.h: remove gates around pk sub-struct of struct wc_CryptoInfo -- wc_CryptoInfo.pk.type (an int) is used unconditionally when --enable-debug, and is used with DH.

wolfssl/wolfcrypt/error-crypt.h: fix whitespace.

tested with wolfssl-multi-test.sh ... super-quick-check, and with over 4000 iterations of fuzzing thusly:

$ ./configure --quiet --disable-jobserver --enable-fips=disabled --enable-all-crypto --enable-cryptonly --disable-ocsp --disable-pkcs7 --disable-curve25519 --disable-ed25519 --disable-curve448 --disable-ed448 --disable-dh --disable-anon --disable-dsa --disable-rsa --disable-ecc --disable-chacha --disable-des3 --disable-hmac --disable-camellia --disable-poly1305 --enable-intelasm --disable-aesgcm-stream 'CFLAGS=-DWC_AES_C_DYNAMIC_FALLBACK -DDEBUG_VECTOR_REGISTER_ACCESS -DDEBUG_VECTOR_REGISTERS_ABORT_ON_FAIL -DDEBUG_VECTOR_REGISTER_ACCESS_FUZZING'
$ make
$ iter=0; while :; do iter=$((iter+1)); export WC_DEBUG_VECTOR_REGISTERS_FUZZING_SEED=$(randuint); saferun wolfcrypt/test/testwolfcrypt || break; echo "iter $iter OK, seed=${WC_DEBUG_VECTOR_REGISTERS_FUZZING_SEED}"; done; echo "$WC_DEBUG_VECTOR_REGISTERS_FUZZING_SEED"

… dynamic as-needed fallback to pure C, via WC_AES_C_DYNAMIC_FALLBACK.

wolfssl/wolfcrypt/aes.h: add key_C_fallback[] to struct Aes, and remove comment that "AESNI needs key first, rounds 2nd, not sure why yet" now that AES_128_Key_Expansion_AESNI no longer writes rounds after the expanded key.

wolfcrypt/src/aes.c:
* add _AESNI or _aesni suffixes/infixes to AESNI implementations that were missing them: AES_CBC_encrypt(), AES_CBC_decrypt_by*(), AES_ECB_encrypt(), AES_*_Key_Expansion(), AES_set_encrypt_key(), AES_set_decrypt_key(), AES_GCM_encrypt(), AES_GCM_decrypt(), AES_XTS_encrypt(), and AES_XTS_decrypt().
* move key size check from to start of wc_AesSetKeyLocal().
* refactor pure-C AES setkey and cipher implementations to use aes->key_C_fallback when defined(WC_AES_C_DYNAMIC_FALLBACK).
* refactor wc_AesSetKeyLocal() to set up both AESNI and pure-C expanded keys when defined(WC_AES_C_DYNAMIC_FALLBACK).
* refactor all (haveAESNI && aes->use_aesni) conditions to just (aes->use_aesni).
* add macros VECTOR_REGISTERS_PUSH and VECTOR_REGISTERS_POP, which do nothing but push a brace level when !defined(WC_AES_C_DYNAMIC_FALLBACK), but when defined(WC_AES_C_DYNAMIC_FALLBACK), they call SAVE_VECTOR_REGISTERS2() and on failure, temporarily clear aes->use_aesni and restore at _POP().
* refactor all invocations of SAVE_VECTOR_REGISTERS() and RESTORE_VECTOR_REGISTERS() to VECTOR_REGISTERS_PUSH and VECTOR_REGISTERS_POP, except in wc_AesSetKeyLocal(), wc_AesXtsEncrypt(), and wc_AesXtsDecrypt(), which are refactored to use SAVE_VECTOR_REGISTERS2(), with graceful failure concealment if defined(WC_AES_C_DYNAMIC_FALLBACK).
* orthogonalize cleanup code in wc_AesCbcEncrypt(),  wc_AesCcmEncrypt() and wc_AesCcmDecrypt().
* streamline fallthrough software definitions of wc_AesEncryptDirect() and wc_AesDecryptDirect(), and remove special-casing for defined(WOLFSSL_LINUXKM)&&defined(WOLFSSL_AESNI).

wolfcrypt/src/aes_asm.{S,asm}:
* remove errant "movl $10, 240(%rsi)" from AES_128_Key_Expansion_AESNI.
* add _AESNI suffixes/infixes to implementations that needed them.

wolfcrypt/src/{aes_gcm_asm.{S,asm},aes_xts_asm.S}: regenerate from revisions in scripts#357 -- adds _aesni suffixes to implementations that were missing them.

wolfssl/wolfcrypt/types.h: remove DEBUG_VECTOR_REGISTER_ACCESS macros, and add dummy fallthrough definitions for SAVE_VECTOR_REGISTERS2 and WC_DEBUG_SET_VECTOR_REGISTERS_RETVAL.

wolfssl/wolfcrypt/memory.h: adopt DEBUG_VECTOR_REGISTER_ACCESS code from types.h, and add definitions for WC_DEBUG_VECTOR_REGISTERS_RETVAL_INITVAL and WC_DEBUG_SET_VECTOR_REGISTERS_RETVAL.

linuxkm/linuxkm_wc_port.h: add arch-specific macro definitions for SAVE_VECTOR_REGISTERS2().

wolfcrypt/benchmark/benchmark.c: add missing gates around calls to RESTORE_VECTOR_REGISTERS().

configure.ac:
* cover various interdependencies in enable-all/enable-all-crypto, for better behavior in combination with --disable-aesgcm, --disable-ecc, --disable-ocsp, --disable-hmac, --disable-chacha, --disable-ed25519, and --disable-ed448.
* inhibit aesgcm_stream in enable-all/enable-all-crypto when ENABLED_LINUXKM_DEFAULTS, because it is currently incompatible with WC_AES_C_DYNAMIC_FALLBACK.
* add -DWC_AES_C_DYNAMIC_FALLBACK when ENABLED_LINUXKM_DEFAULTS.
* add 3 new interdependency checks: "ECCSI requires ECC.", "SAKKE requires ECC.", "WOLFSSH requires HMAC."

wolfcrypt/src/asn.c: tweak gating to accommodate defined(NO_RSA) && !defined(HAVE_ECC).

wolfcrypt/src/evp.c: tweak gating to accommodate defined(NO_HMAC).

wolfcrypt/src/logging.c: remove DEBUG_VECTOR_REGISTER_ACCESS code (moved to memory.c).

wolfcrypt/src/memory.c: change #include of settings.h to types.h; adopt DEBUG_VECTOR_REGISTER_ACCESS code from logging.c; add implementation of SAVE_VECTOR_REGISTERS2_fuzzer().

wolfcrypt/src/pwdbased.c: add explanatory #error scrypt requires HMAC.

wolfcrypt/test/test.c:
* add DEBUG_VECTOR_REGISTER_ACCESS clauses to aes_xts_128_test(), aesecb_test(), aesctr_test(), aes_test() CBC section, aes256_test() CBC section, and aesgcm_default_test_helper()
* remove duplicate wc_AesEcbDecrypt() in aesecb_test().
* add gating for pbkdf2_test().
* fix cleanup code in dsa_test().
* fix gating in pkcs7authenveloped_run_vectors() to accommodate !defined(HAVE_AESGCM).
* fix gating in cryptocb_test() to accommodate defined(NO_HMAC).

wolfssl/wolfcrypt/cryptocb.h: remove gates around "pk" sub-struct of struct wc_CryptoInfo -- wc_CryptoInfo.pk.type (an int) is used unconditionally when --enable-debug, and is used with DH.

wolfssl/wolfcrypt/error-crypt.h: fix whitespace.
@douzzer douzzer requested a review from SparkiDev November 17, 2023 07:34
@douzzer douzzer self-assigned this Nov 17, 2023
@douzzer douzzer assigned SparkiDev and unassigned douzzer Nov 18, 2023
@SparkiDev SparkiDev assigned wolfSSL-Bot and unassigned SparkiDev Nov 22, 2023
Copy link
Contributor

@JacobBarthelmeh JacobBarthelmeh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the command listed as testing with fuzzing I get the following build error:

  CC       wolfcrypt/src/src_libwolfssl_la-memory.lo
wolfcrypt/src/memory.c:1451:44: error: variable has incomplete type 'struct drand48_data'
    static THREAD_LS_T struct drand48_data wc_svr_fuzzing_state;
                                           ^
wolfcrypt/src/memory.c:1451:31: note: forward declaration of 'struct drand48_data'
    static THREAD_LS_T struct drand48_data wc_svr_fuzzing_state;
                              ^
wolfcrypt/src/memory.c:1463:15: error: call to undeclared function 'srand48_r'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
        (void)srand48_r(seed, &wc_svr_fuzzing_state);
              ^
wolfcrypt/src/memory.c:1463:15: note: did you mean 'srand48'?
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdlib.h:240:7: note: 'srand48' declared here
void     srand48(long);
         ^
wolfcrypt/src/memory.c:1466:11: error: call to undeclared function 'lrand48_r'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
    (void)lrand48_r(&wc_svr_fuzzing_state, &result);
          ^
wolfcrypt/src/memory.c:1466:11: note: did you mean 'lrand48'?
/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/stdlib.h:208:7: note: 'lrand48' declared here
long     lrand48(void) __swift_unavailable("Use arc4random instead.");
         ^
3 errors generated.

This was on a Mac

$ sysctl -a | grep cpu | grep brand
machdep.cpu.brand_string: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz

Removing these defines "-DDEBUG_VECTOR_REGISTER_ACCESS -DDEBUG_VECTOR_REGISTERS_ABORT_ON_FAIL -DDEBUG_VECTOR_REGISTER_ACCESS_FUZZING" avoided the build error.

@@ -6146,12 +6233,12 @@ void AES_GCM_encrypt_avx2(const unsigned char *in, unsigned char *out,
#endif /* HAVE_INTEL_AVX1 */

#ifdef HAVE_AES_DECRYPT
void AES_GCM_decrypt(const unsigned char *in, unsigned char *out,
void AES_GCM_decrypt_aesni(const unsigned char *in, unsigned char *out,
Copy link
Contributor

@JacobBarthelmeh JacobBarthelmeh Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier versions of AES functions were appending aesni using all caps AES_ECB_decrypt_AESNI, is there a reason to not have it be uniform for both CBC/ECB and GCM?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scripts/aes/x86/gcm_aesni.rb and scripts/aes/x86_64/gcm_aesni.rb already used "_aesni", and I stayed consistent with that. The tweaks in scripts/aes/x86_64/xts_aesni.rb are with "_aesni" for consistency with them. OTOH, I used "_AESNI" in wolfssl/wolfcrypt/src/aes_asm.S for consistency with bulk of existing code in wolfssl/wolfcrypt/src/aes.c. it should be harmonized at some point, which will be easy because none of these implementations are public, but I wanted to punt for now on the decision of whether to go with "_aesni" or "_AESNI". my preference is "_AESNI", FWIW.

@@ -12124,10 +12148,10 @@ int wc_AesXtsDecryptSector(XtsAes* aes, byte* out, const byte* in, word32 sz,
#define HAVE_INTEL_AVX2
#endif /* USE_INTEL_SPEEDUP */

void AES_XTS_encrypt(const unsigned char *in, unsigned char *out, word32 sz,
void AES_XTS_encrypt_aesni(const unsigned char *in, unsigned char *out, word32 sz,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit Picky... but same comment as with GCM, prefer all _AESNI naming or all _aesni.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(see prev comment)

@douzzer
Copy link
Contributor Author

douzzer commented Nov 28, 2023

The build errors on Mac with -DDEBUG_VECTOR_REGISTER_ACCESS_FUZZING are expected. In order to get deterministic behavior in the fuzzing, as needed for reproducibility, I had to use a reentrant PRNG from the C library, with thread-local PRNG state. These _r variant APIs are specific to glibc (no POSIX equiv), and the fuzzing mechanism won't work right without them. Given the main use case for the vector register save/restore mechanism (Linux kernel module) and the type of defect the fuzzing is intended to uncover (internal logic errors around error handling), it didn't seem worth the trouble to try to make it portable.

Btw I just rebased on current master and reran the test (on Linux obviously):

[sanitizer-all-intelasm-c-fallback-fuzzer] [1 of 1] [1538e20dff]
    seed=1990084130
    autogen.sh 1538e20dff...   real 0m17.580s  user 0m15.349s  sys 0m1.215s
    configure...   real 0m18.977s  user 0m10.563s  sys 0m9.720s
    build...   real 4m49.214s  user 13m41.423s  sys 0m10.859s
    check...   real 0m46.700s  user 0m49.507s  sys 0m14.717s
    sanitizer-all-intelasm-c-fallback-fuzzer OK

@JacobBarthelmeh JacobBarthelmeh merged commit 12ee732 into wolfSSL:master Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants