Arm A64 Instruction Set Architecture

Armv8, for Armv8-A architecture profile

Copyright © 2010-2021 Arm Limited (or its affiliates). All rights reserved.
DDI 0596 (ID121321)
Release Information

For information on the change history and known issues for this release, see the Release Notes in the A64 ISA XML for
Armv8.8 (2021-12).

Proprietary Notice

This document is protected by copyright and other related rights and the practice or implementation of the information contained
in this document may be protected by one or more patents or pending patent applications. No part of this document may be
reproduced in any form by any means without the express prior written permission of Arm. No license, express or implied, by
estoppel or otherwise to any intellectual property rights is granted by this document unless specifically stated.

Your access to the information in this document is conditional upon your acceptance that you will not use or permit others to use
the information for the purposes of determining whether implementations infringe any third party patents.

THIS DOCUMENT IS PROVIDED “AS IS”. ARM PROVIDES NO REPRESENTATIONS AND NO WARRANTIES,
EXPRESS, IMPLIED OR STATUTORY, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF
MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR
PURPOSE WITH RESPECT TO THE DOCUMENT. For the avoidance of doubt, Arm makes no representation with respect to,
and has undertaken no analysis to identify or understand the scope and content of, patents, copyrights, trade secrets, or other rights.

This document may include technical inaccuracies or typographical errors.

TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL ARM BE LIABLE FOR ANY DAMAGES,
INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR
CONSEQUENTIAL DAMAGES, HOWEVER CAUSED AND REGARDLESS OF THE THEORY OF LIABILITY, ARISING
OUT OF ANY USE OF THIS DOCUMENT, EVEN IF ARM HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.

This document consists solely of commercial items. You shall be responsible for ensuring that any use, duplication or disclosure
of this document complies fully with any relevant export laws and regulations to assure that this document or any portion thereof
is not exported, directly or indirectly, in violation of such export laws. Use of the word “partner” in reference to Arm’s customers
is not intended to create or refer to any partnership relationship with any other company. Arm may make changes to this document
at any time and without notice.

This document may be translated into other languages for convenience, and you agree that if there is any conflict between the
English version of this document and any translation, the terms of the English version of the Agreement shall prevail.

The Arm corporate logo and words marked with ® or ™ are registered trademarks or trademarks of Arm Limited (or its affiliates)
in the US and/or elsewhere. All rights reserved. Other brands and names mentioned in this document may be the trademarks of
their respective owners. You must follow Arm's trademark usage guidelines:
http://www.arm.com/company/policies/trademarks.

Arm Limited. Company 02557590 registered in England.

110 Fulbourn Road, Cambridge, England CB1 9NJ.

(LES-PRE-20349 version 21.0)

Confidentiality Status

This document is Non-Confidential. The right to use, copy and disclose this document may be subject to license restrictions in
accordance with the terms of the agreement entered into by Arm and the party that Arm delivered this document to.

Product Status

This release covers multiple versions of the architecture. The content relating to different versions is given different quality ratings.

The information related to the 2021 Architecture Extensions is at Alpha quality. Alpha quality means that most major features of
the specification are described in the manual, but some features and details might be missing.

The information related to the remaining Armv8-A features, which was also published in previous releases, is at Beta quality. Beta
quality means that all major features of the specification are described, but some details might be missing.

Web Address

http://www.arm.com

Progressive Terminology Commitment

Arm values inclusive communities. Arm recognizes that we and our industry have used terms that can be offensive. Arm strives
to lead the industry and create change.

Previous issues of this document included terms that can be offensive. We have replaced these terms. If you find offensive terms
in this document, please contact [email protected].

A64 -- Base Instructions (alphabetic order)

ADC: Add with Carry.

ADCS: Add with Carry, setting flags.

ADD (extended register): Add (extended register).

ADD (immediate): Add (immediate).

ADD (shifted register): Add (shifted register).

ADDG: Add with Tag.

ADDS (extended register): Add (extended register), setting flags.

ADDS (immediate): Add (immediate), setting flags.

ADDS (shifted register): Add (shifted register), setting flags.

ADR: Form PC-relative address.

ADRP: Form PC-relative address to 4KB page.

AND (immediate): Bitwise AND (immediate).

AND (shifted register): Bitwise AND (shifted register).

ANDS (immediate): Bitwise AND (immediate), setting flags.

ANDS (shifted register): Bitwise AND (shifted register), setting flags.

ASR (immediate): Arithmetic Shift Right (immediate): an alias of SBFM.

ASR (register): Arithmetic Shift Right (register): an alias of ASRV.

ASRV: Arithmetic Shift Right Variable.

AT: Address Translate: an alias of SYS.

AUTDA, AUTDZA: Authenticate Data address, using key A.

AUTDB, AUTDZB: Authenticate Data address, using key B.

AUTIA, AUTIA1716, AUTIASP, AUTIAZ, AUTIZA: Authenticate Instruction address, using key A.

AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, AUTIZB: Authenticate Instruction address, using key B.

AXFLAG: Convert floating-point condition flags from Arm to external format.

B: Branch.

B.cond: Branch conditionally.

BC.cond: Branch Consistent conditionally.

BFC: Bitfield Clear: an alias of BFM.

BFI: Bitfield Insert: an alias of BFM.

BFM: Bitfield Move.

BFXIL: Bitfield extract and insert at low end: an alias of BFM.

BIC (shifted register): Bitwise Bit Clear (shifted register).

BICS (shifted register): Bitwise Bit Clear (shifted register), setting flags.

BL: Branch with Link.

BLR: Branch with Link to Register.

BLRAA, BLRAAZ, BLRAB, BLRABZ: Branch with Link to Register, with pointer authentication.

BR: Branch to Register.

BRAA, BRAAZ, BRAB, BRABZ: Branch to Register, with pointer authentication.

BRK: Breakpoint instruction.

BTI: Branch Target Identification.

CAS, CASA, CASAL, CASL: Compare and Swap word or doubleword in memory.

CASB, CASAB, CASALB, CASLB: Compare and Swap byte in memory.

CASH, CASAH, CASALH, CASLH: Compare and Swap halfword in memory.

CASP, CASPA, CASPAL, CASPL: Compare and Swap Pair of words or doublewords in memory.

CBNZ: Compare and Branch on Nonzero.

CBZ: Compare and Branch on Zero.

CCMN (immediate): Conditional Compare Negative (immediate).

CCMN (register): Conditional Compare Negative (register).

CCMP (immediate): Conditional Compare (immediate).

CCMP (register): Conditional Compare (register).

CFINV: Invert Carry Flag.

CFP: Control Flow Prediction Restriction by Context: an alias of SYS.

CINC: Conditional Increment: an alias of CSINC.

CINV: Conditional Invert: an alias of CSINV.

CLREX: Clear Exclusive.

CLS: Count Leading Sign bits.

CLZ: Count Leading Zeros.

CMN (extended register): Compare Negative (extended register): an alias of ADDS (extended register).

CMN (immediate): Compare Negative (immediate): an alias of ADDS (immediate).

CMN (shifted register): Compare Negative (shifted register): an alias of ADDS (shifted register).

CMP (extended register): Compare (extended register): an alias of SUBS (extended register).

CMP (immediate): Compare (immediate): an alias of SUBS (immediate).

CMP (shifted register): Compare (shifted register): an alias of SUBS (shifted register).

CMPP: Compare with Tag: an alias of SUBPS.

CNEG: Conditional Negate: an alias of CSNEG.

CPP: Cache Prefetch Prediction Restriction by Context: an alias of SYS.

CPYFP, CPYFM, CPYFE: Memory Copy Forward-only.

CPYFPN, CPYFMN, CPYFEN: Memory Copy Forward-only, reads and writes non-temporal.

CPYFPRN, CPYFMRN, CPYFERN: Memory Copy Forward-only, reads non-temporal.

CPYFPRT, CPYFMRT, CPYFERT: Memory Copy Forward-only, reads unprivileged.

CPYFPRTN, CPYFMRTN, CPYFERTN: Memory Copy Forward-only, reads unprivileged, reads and writes non-temporal.

CPYFPRTRN, CPYFMRTRN, CPYFERTRN: Memory Copy Forward-only, reads unprivileged and non-temporal.

CPYFPRTWN, CPYFMRTWN, CPYFERTWN: Memory Copy Forward-only, reads unprivileged, writes non-temporal.

CPYFPT, CPYFMT, CPYFET: Memory Copy Forward-only, reads and writes unprivileged.

CPYFPTN, CPYFMTN, CPYFETN: Memory Copy Forward-only, reads and writes unprivileged and non-temporal.

CPYFPTRN, CPYFMTRN, CPYFETRN: Memory Copy Forward-only, reads and writes unprivileged, reads non-temporal.

CPYFPTWN, CPYFMTWN, CPYFETWN: Memory Copy Forward-only, reads and writes unprivileged, writes non-temporal.

CPYFPWN, CPYFMWN, CPYFEWN: Memory Copy Forward-only, writes non-temporal.

CPYFPWT, CPYFMWT, CPYFEWT: Memory Copy Forward-only, writes unprivileged.

CPYFPWTN, CPYFMWTN, CPYFEWTN: Memory Copy Forward-only, writes unprivileged, reads and writes non-temporal.

CPYFPWTRN, CPYFMWTRN, CPYFEWTRN: Memory Copy Forward-only, writes unprivileged, reads non-temporal.

CPYFPWTWN, CPYFMWTWN, CPYFEWTWN: Memory Copy Forward-only, writes unprivileged and non-temporal.

CPYP, CPYM, CPYE: Memory Copy.

CPYPN, CPYMN, CPYEN: Memory Copy, reads and writes non-temporal.

CPYPRN, CPYMRN, CPYERN: Memory Copy, reads non-temporal.

CPYPRT, CPYMRT, CPYERT: Memory Copy, reads unprivileged.

CPYPRTN, CPYMRTN, CPYERTN: Memory Copy, reads unprivileged, reads and writes non-temporal.

CPYPRTRN, CPYMRTRN, CPYERTRN: Memory Copy, reads unprivileged and non-temporal.

CPYPRTWN, CPYMRTWN, CPYERTWN: Memory Copy, reads unprivileged, writes non-temporal.

CPYPT, CPYMT, CPYET: Memory Copy, reads and writes unprivileged.

CPYPTN, CPYMTN, CPYETN: Memory Copy, reads and writes unprivileged and non-temporal.

CPYPTRN, CPYMTRN, CPYETRN: Memory Copy, reads and writes unprivileged, reads non-temporal.

CPYPTWN, CPYMTWN, CPYETWN: Memory Copy, reads and writes unprivileged, writes non-temporal.

CPYPWN, CPYMWN, CPYEWN: Memory Copy, writes non-temporal.

CPYPWT, CPYMWT, CPYEWT: Memory Copy, writes unprivileged.

CPYPWTN, CPYMWTN, CPYEWTN: Memory Copy, writes unprivileged, reads and writes non-temporal.

CPYPWTRN, CPYMWTRN, CPYEWTRN: Memory Copy, writes unprivileged, reads non-temporal.

CPYPWTWN, CPYMWTWN, CPYEWTWN: Memory Copy, writes unprivileged and non-temporal.

CRC32B, CRC32H, CRC32W, CRC32X: CRC32 checksum.

CRC32CB, CRC32CH, CRC32CW, CRC32CX: CRC32C checksum.

CSDB: Consumption of Speculative Data Barrier.

CSEL: Conditional Select.

CSET: Conditional Set: an alias of CSINC.

CSETM: Conditional Set Mask: an alias of CSINV.

CSINC: Conditional Select Increment.

CSINV: Conditional Select Invert.

CSNEG: Conditional Select Negation.

DC: Data Cache operation: an alias of SYS.

DCPS1: Debug Change PE State to EL1.

DCPS2: Debug Change PE State to EL2.

DCPS3: Debug Change PE State to EL3.

DGH: Data Gathering Hint.

DMB: Data Memory Barrier.

DRPS: Debug restore process state.

DSB: Data Synchronization Barrier.

DVP: Data Value Prediction Restriction by Context: an alias of SYS.

EON (shifted register): Bitwise Exclusive OR NOT (shifted register).

EOR (immediate): Bitwise Exclusive OR (immediate).

EOR (shifted register): Bitwise Exclusive OR (shifted register).

ERET: Exception Return.

ERETAA, ERETAB: Exception Return, with pointer authentication.

ESB: Error Synchronization Barrier.

EXTR: Extract register.

GMI: Tag Mask Insert.

HINT: Hint instruction.

HLT: Halt instruction.

HVC: Hypervisor Call.

IC: Instruction Cache operation: an alias of SYS.

IRG: Insert Random Tag.

ISB: Instruction Synchronization Barrier.

LD64B: Single-copy Atomic 64-byte Load.

LDADD, LDADDA, LDADDAL, LDADDL: Atomic add on word or doubleword in memory.

LDADDB, LDADDAB, LDADDALB, LDADDLB: Atomic add on byte in memory.

LDADDH, LDADDAH, LDADDALH, LDADDLH: Atomic add on halfword in memory.

LDAPR: Load-Acquire RCpc Register.

LDAPRB: Load-Acquire RCpc Register Byte.

LDAPRH: Load-Acquire RCpc Register Halfword.

LDAPUR: Load-Acquire RCpc Register (unscaled).

LDAPURB: Load-Acquire RCpc Register Byte (unscaled).

LDAPURH: Load-Acquire RCpc Register Halfword (unscaled).

LDAPURSB: Load-Acquire RCpc Register Signed Byte (unscaled).

LDAPURSH: Load-Acquire RCpc Register Signed Halfword (unscaled).

LDAPURSW: Load-Acquire RCpc Register Signed Word (unscaled).

LDAR: Load-Acquire Register.

LDARB: Load-Acquire Register Byte.

LDARH: Load-Acquire Register Halfword.

LDAXP: Load-Acquire Exclusive Pair of Registers.

LDAXR: Load-Acquire Exclusive Register.

LDAXRB: Load-Acquire Exclusive Register Byte.

LDAXRH: Load-Acquire Exclusive Register Halfword.

LDCLR, LDCLRA, LDCLRAL, LDCLRL: Atomic bit clear on word or doubleword in memory.

LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB: Atomic bit clear on byte in memory.

LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH: Atomic bit clear on halfword in memory.

LDEOR, LDEORA, LDEORAL, LDEORL: Atomic exclusive OR on word or doubleword in memory.

LDEORB, LDEORAB, LDEORALB, LDEORLB: Atomic exclusive OR on byte in memory.

LDEORH, LDEORAH, LDEORALH, LDEORLH: Atomic exclusive OR on halfword in memory.

LDG: Load Allocation Tag.

LDGM: Load Tag Multiple.

LDLAR: Load LOAcquire Register.

LDLARB: Load LOAcquire Register Byte.

LDLARH: Load LOAcquire Register Halfword.

LDNP: Load Pair of Registers, with non-temporal hint.

LDP: Load Pair of Registers.

LDPSW: Load Pair of Registers Signed Word.

LDR (immediate): Load Register (immediate).

LDR (literal): Load Register (literal).

LDR (register): Load Register (register).

LDRAA, LDRAB: Load Register, with pointer authentication.

LDRB (immediate): Load Register Byte (immediate).

LDRB (register): Load Register Byte (register).

LDRH (immediate): Load Register Halfword (immediate).

LDRH (register): Load Register Halfword (register).

LDRSB (immediate): Load Register Signed Byte (immediate).

LDRSB (register): Load Register Signed Byte (register).

LDRSH (immediate): Load Register Signed Halfword (immediate).

LDRSH (register): Load Register Signed Halfword (register).

LDRSW (immediate): Load Register Signed Word (immediate).

LDRSW (literal): Load Register Signed Word (literal).

LDRSW (register): Load Register Signed Word (register).

LDSET, LDSETA, LDSETAL, LDSETL: Atomic bit set on word or doubleword in memory.

LDSETB, LDSETAB, LDSETALB, LDSETLB: Atomic bit set on byte in memory.

LDSETH, LDSETAH, LDSETALH, LDSETLH: Atomic bit set on halfword in memory.

LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL: Atomic signed maximum on word or doubleword in memory.

LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB: Atomic signed maximum on byte in memory.

LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH: Atomic signed maximum on halfword in memory.

LDSMIN, LDSMINA, LDSMINAL, LDSMINL: Atomic signed minimum on word or doubleword in memory.

LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB: Atomic signed minimum on byte in memory.

LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH: Atomic signed minimum on halfword in memory.

LDTR: Load Register (unprivileged).

LDTRB: Load Register Byte (unprivileged).

LDTRH: Load Register Halfword (unprivileged).

LDTRSB: Load Register Signed Byte (unprivileged).

LDTRSH: Load Register Signed Halfword (unprivileged).

LDTRSW: Load Register Signed Word (unprivileged).

LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL: Atomic unsigned maximum on word or doubleword in memory.

LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB: Atomic unsigned maximum on byte in memory.

LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH: Atomic unsigned maximum on halfword in memory.

LDUMIN, LDUMINA, LDUMINAL, LDUMINL: Atomic unsigned minimum on word or doubleword in memory.

LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB: Atomic unsigned minimum on byte in memory.

LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH: Atomic unsigned minimum on halfword in memory.

LDUR: Load Register (unscaled).

LDURB: Load Register Byte (unscaled).

LDURH: Load Register Halfword (unscaled).

LDURSB: Load Register Signed Byte (unscaled).

LDURSH: Load Register Signed Halfword (unscaled).

LDURSW: Load Register Signed Word (unscaled).

LDXP: Load Exclusive Pair of Registers.

LDXR: Load Exclusive Register.

LDXRB: Load Exclusive Register Byte.

LDXRH: Load Exclusive Register Halfword.

LSL (immediate): Logical Shift Left (immediate): an alias of UBFM.

LSL (register): Logical Shift Left (register): an alias of LSLV.

LSLV: Logical Shift Left Variable.

LSR (immediate): Logical Shift Right (immediate): an alias of UBFM.

LSR (register): Logical Shift Right (register): an alias of LSRV.

LSRV: Logical Shift Right Variable.

MADD: Multiply-Add.

MNEG: Multiply-Negate: an alias of MSUB.

MOV (bitmask immediate): Move (bitmask immediate): an alias of ORR (immediate).

MOV (inverted wide immediate): Move (inverted wide immediate): an alias of MOVN.

MOV (register): Move (register): an alias of ORR (shifted register).

MOV (to/from SP): Move between register and stack pointer: an alias of ADD (immediate).

MOV (wide immediate): Move (wide immediate): an alias of MOVZ.

MOVK: Move wide with keep.

MOVN: Move wide with NOT.

MOVZ: Move wide with zero.

MRS: Move System Register.

MSR (immediate): Move immediate value to Special Register.

MSR (register): Move general-purpose register to System Register.

MSUB: Multiply-Subtract.

MUL: Multiply: an alias of MADD.

MVN: Bitwise NOT: an alias of ORN (shifted register).

NEG (shifted register): Negate (shifted register): an alias of SUB (shifted register).

NEGS: Negate, setting flags: an alias of SUBS (shifted register).

NGC: Negate with Carry: an alias of SBC.

NGCS: Negate with Carry, setting flags: an alias of SBCS.

NOP: No Operation.

ORN (shifted register): Bitwise OR NOT (shifted register).

ORR (immediate): Bitwise OR (immediate).

ORR (shifted register): Bitwise OR (shifted register).

PACDA, PACDZA: Pointer Authentication Code for Data address, using key A.

PACDB, PACDZB: Pointer Authentication Code for Data address, using key B.

PACGA: Pointer Authentication Code, using Generic key.

PACIA, PACIA1716, PACIASP, PACIAZ, PACIZA: Pointer Authentication Code for Instruction address, using key A.

PACIB, PACIB1716, PACIBSP, PACIBZ, PACIZB: Pointer Authentication Code for Instruction address, using key B.

PRFM (immediate): Prefetch Memory (immediate).

PRFM (literal): Prefetch Memory (literal).

PRFM (register): Prefetch Memory (register).

PRFUM: Prefetch Memory (unscaled offset).

PSB CSYNC: Profiling Synchronization Barrier.

PSSBB: Physical Speculative Store Bypass Barrier: an alias of DSB.

RBIT: Reverse Bits.

RET: Return from subroutine.

RETAA, RETAB: Return from subroutine, with pointer authentication.

REV: Reverse Bytes.

REV16: Reverse bytes in 16-bit halfwords.

REV32: Reverse bytes in 32-bit words.

REV64: Reverse Bytes: an alias of REV.

RMIF: Rotate, Mask Insert Flags.

ROR (immediate): Rotate right (immediate): an alias of EXTR.

ROR (register): Rotate Right (register): an alias of RORV.

RORV: Rotate Right Variable.

SB: Speculation Barrier.

SBC: Subtract with Carry.

SBCS: Subtract with Carry, setting flags.

SBFIZ: Signed Bitfield Insert in Zero: an alias of SBFM.

SBFM: Signed Bitfield Move.

SBFX: Signed Bitfield Extract: an alias of SBFM.

SDIV: Signed Divide.

SETF8, SETF16: Evaluation of 8 or 16 bit flag values.

SETGP, SETGM, SETGE: Memory Set with tag setting.

SETGPN, SETGMN, SETGEN: Memory Set with tag setting, non-temporal.

SETGPT, SETGMT, SETGET: Memory Set with tag setting, unprivileged.

SETGPTN, SETGMTN, SETGETN: Memory Set with tag setting, unprivileged and non-temporal.

SETP, SETM, SETE: Memory Set.

SETPN, SETMN, SETEN: Memory Set, non-temporal.

SETPT, SETMT, SETET: Memory Set, unprivileged.

SETPTN, SETMTN, SETETN: Memory Set, unprivileged and non-temporal.

SEV: Send Event.

SEVL: Send Event Local.

SMADDL: Signed Multiply-Add Long.

SMC: Secure Monitor Call.

SMNEGL: Signed Multiply-Negate Long: an alias of SMSUBL.

SMSUBL: Signed Multiply-Subtract Long.

SMULH: Signed Multiply High.

SMULL: Signed Multiply Long: an alias of SMADDL.

SSBB: Speculative Store Bypass Barrier: an alias of DSB.

ST2G: Store Allocation Tags.

ST64B: Single-copy Atomic 64-byte Store without Return.

ST64BV: Single-copy Atomic 64-byte Store with Return.

ST64BV0: Single-copy Atomic 64-byte EL0 Store with Return.

STADD, STADDL: Atomic add on word or doubleword in memory, without return: an alias of LDADD, LDADDA,
LDADDAL, LDADDL.

STADDB, STADDLB: Atomic add on byte in memory, without return: an alias of LDADDB, LDADDAB, LDADDALB,
LDADDLB.

STADDH, STADDLH: Atomic add on halfword in memory, without return: an alias of LDADDH, LDADDAH, LDADDALH,
LDADDLH.

STCLR, STCLRL: Atomic bit clear on word or doubleword in memory, without return: an alias of LDCLR, LDCLRA,
LDCLRAL, LDCLRL.

STCLRB, STCLRLB: Atomic bit clear on byte in memory, without return: an alias of LDCLRB, LDCLRAB, LDCLRALB,
LDCLRLB.

STCLRH, STCLRLH: Atomic bit clear on halfword in memory, without return: an alias of LDCLRH, LDCLRAH,
LDCLRALH, LDCLRLH.

STEOR, STEORL: Atomic exclusive OR on word or doubleword in memory, without return: an alias of LDEOR,
LDEORA, LDEORAL, LDEORL.

STEORB, STEORLB: Atomic exclusive OR on byte in memory, without return: an alias of LDEORB, LDEORAB,
LDEORALB, LDEORLB.

STEORH, STEORLH: Atomic exclusive OR on halfword in memory, without return: an alias of LDEORH, LDEORAH,
LDEORALH, LDEORLH.

STG: Store Allocation Tag.

STGM: Store Tag Multiple.

STGP: Store Allocation Tag and Pair of registers.

STLLR: Store LORelease Register.

STLLRB: Store LORelease Register Byte.

STLLRH: Store LORelease Register Halfword.

STLR: Store-Release Register.

STLRB: Store-Release Register Byte.

STLRH: Store-Release Register Halfword.

STLUR: Store-Release Register (unscaled).

STLURB: Store-Release Register Byte (unscaled).

STLURH: Store-Release Register Halfword (unscaled).

STLXP: Store-Release Exclusive Pair of registers.

STLXR: Store-Release Exclusive Register.

STLXRB: Store-Release Exclusive Register Byte.

STLXRH: Store-Release Exclusive Register Halfword.

STNP: Store Pair of Registers, with non-temporal hint.

STP: Store Pair of Registers.

STR (immediate): Store Register (immediate).

STR (register): Store Register (register).

STRB (immediate): Store Register Byte (immediate).

STRB (register): Store Register Byte (register).

STRH (immediate): Store Register Halfword (immediate).

STRH (register): Store Register Halfword (register).

STSET, STSETL: Atomic bit set on word or doubleword in memory, without return: an alias of LDSET, LDSETA,
LDSETAL, LDSETL.

STSETB, STSETLB: Atomic bit set on byte in memory, without return: an alias of LDSETB, LDSETAB, LDSETALB,
LDSETLB.

STSETH, STSETLH: Atomic bit set on halfword in memory, without return: an alias of LDSETH, LDSETAH, LDSETALH,
LDSETLH.

STSMAX, STSMAXL: Atomic signed maximum on word or doubleword in memory, without return: an alias of LDSMAX,
LDSMAXA, LDSMAXAL, LDSMAXL.

STSMAXB, STSMAXLB: Atomic signed maximum on byte in memory, without return: an alias of LDSMAXB,
LDSMAXAB, LDSMAXALB, LDSMAXLB.

STSMAXH, STSMAXLH: Atomic signed maximum on halfword in memory, without return: an alias of LDSMAXH,
LDSMAXAH, LDSMAXALH, LDSMAXLH.

STSMIN, STSMINL: Atomic signed minimum on word or doubleword in memory, without return: an alias of LDSMIN,
LDSMINA, LDSMINAL, LDSMINL.

STSMINB, STSMINLB: Atomic signed minimum on byte in memory, without return: an alias of LDSMINB, LDSMINAB,
LDSMINALB, LDSMINLB.

STSMINH, STSMINLH: Atomic signed minimum on halfword in memory, without return: an alias of LDSMINH,
LDSMINAH, LDSMINALH, LDSMINLH.

STTR: Store Register (unprivileged).

STTRB: Store Register Byte (unprivileged).

STTRH: Store Register Halfword (unprivileged).

STUMAX, STUMAXL: Atomic unsigned maximum on word or doubleword in memory, without return: an alias of
LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL.

STUMAXB, STUMAXLB: Atomic unsigned maximum on byte in memory, without return: an alias of LDUMAXB,
LDUMAXAB, LDUMAXALB, LDUMAXLB.

STUMAXH, STUMAXLH: Atomic unsigned maximum on halfword in memory, without return: an alias of LDUMAXH,
LDUMAXAH, LDUMAXALH, LDUMAXLH.

STUMIN, STUMINL: Atomic unsigned minimum on word or doubleword in memory, without return: an alias of
LDUMIN, LDUMINA, LDUMINAL, LDUMINL.

STUMINB, STUMINLB: Atomic unsigned minimum on byte in memory, without return: an alias of LDUMINB,
LDUMINAB, LDUMINALB, LDUMINLB.

STUMINH, STUMINLH: Atomic unsigned minimum on halfword in memory, without return: an alias of LDUMINH,
LDUMINAH, LDUMINALH, LDUMINLH.

STUR: Store Register (unscaled).

STURB: Store Register Byte (unscaled).

STURH: Store Register Halfword (unscaled).

STXP: Store Exclusive Pair of registers.

STXR: Store Exclusive Register.

STXRB: Store Exclusive Register Byte.

STXRH: Store Exclusive Register Halfword.

STZ2G: Store Allocation Tags, Zeroing.

STZG: Store Allocation Tag, Zeroing.

STZGM: Store Tag and Zero Multiple.

SUB (extended register): Subtract (extended register).

SUB (immediate): Subtract (immediate).

SUB (shifted register): Subtract (shifted register).

SUBG: Subtract with Tag.

SUBP: Subtract Pointer.

SUBPS: Subtract Pointer, setting Flags.

SUBS (extended register): Subtract (extended register), setting flags.

SUBS (immediate): Subtract (immediate), setting flags.

SUBS (shifted register): Subtract (shifted register), setting flags.

SVC: Supervisor Call.

SWP, SWPA, SWPAL, SWPL: Swap word or doubleword in memory.

SWPB, SWPAB, SWPALB, SWPLB: Swap byte in memory.

SWPH, SWPAH, SWPALH, SWPLH: Swap halfword in memory.

SXTB: Signed Extend Byte: an alias of SBFM.

SXTH: Sign Extend Halfword: an alias of SBFM.

SXTW: Sign Extend Word: an alias of SBFM.

SYS: System instruction.

SYSL: System instruction with result.

TBNZ: Test bit and Branch if Nonzero.

TBZ: Test bit and Branch if Zero.

TLBI: TLB Invalidate operation: an alias of SYS.

TSB CSYNC: Trace Synchronization Barrier.

TST (immediate): Test bits (immediate): an alias of ANDS (immediate).

TST (shifted register): Test (shifted register): an alias of ANDS (shifted register).

UBFIZ: Unsigned Bitfield Insert in Zero: an alias of UBFM.

UBFM: Unsigned Bitfield Move.

UBFX: Unsigned Bitfield Extract: an alias of UBFM.

UDF: Permanently Undefined.

UDIV: Unsigned Divide.

UMADDL: Unsigned Multiply-Add Long.

UMNEGL: Unsigned Multiply-Negate Long: an alias of UMSUBL.

UMSUBL: Unsigned Multiply-Subtract Long.

UMULH: Unsigned Multiply High.

UMULL: Unsigned Multiply Long: an alias of UMADDL.

UXTB: Unsigned Extend Byte: an alias of UBFM.

UXTH: Unsigned Extend Halfword: an alias of UBFM.

WFE: Wait For Event.

WFET: Wait For Event with Timeout.

WFI: Wait For Interrupt.

WFIT: Wait For Interrupt with Timeout.

XAFLAG: Convert floating-point condition flags from external format to Arm format.

XPACD, XPACI, XPACLRI: Strip Pointer Authentication Code.

YIELD: YIELD.

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

ADC

Add with Carry adds two register values and the Carry flag value, and writes the result to the destination register.
Bits:   31   30   29   28-21      20-16   15-10    9-5   4-0
Value:  sf   0    0    11010000   Rm      000000   Rn    Rd
Field:       op   S

32-bit (sf == 0)

ADC <Wd>, <Wn>, <Wm>

64-bit (sf == 1)

ADC <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
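
As an illustration of the encoding and decode fields above, the following Python sketch packs register numbers into a 32-bit ADC instruction word. The helper name `encode_adc` and its parameters are hypothetical and are not part of the architecture specification; the field positions are taken from the encoding diagram (sf at bit 31, Rm at bits 20-16, Rn at bits 9-5, Rd at bits 4-0).

```python
def encode_adc(rd: int, rn: int, rm: int, is_64bit: bool = True) -> int:
    """Pack an ADC instruction word (hypothetical helper, not an Arm API)."""
    for name, reg in (("rd", rd), ("rn", rn), ("rm", rm)):
        if not 0 <= reg <= 31:
            raise ValueError(f"{name} must be in 0..31, got {reg}")
    sf = 1 if is_64bit else 0
    word = sf << 31            # sf == 1 selects the 64-bit variant
    # op (bit 30) and S (bit 29) are both 0 for ADC, so nothing is OR-ed in.
    word |= 0b11010000 << 21   # fixed bits 28-21
    word |= rm << 16           # Rm, bits 20-16
    # bits 15-10 are fixed at 000000
    word |= rn << 5            # Rn, bits 9-5
    word |= rd                 # Rd, bits 4-0
    return word


print(hex(encode_adc(0, 1, 2)))                   # ADC X0, X1, X2 -> 0x9a020020
print(hex(encode_adc(0, 1, 2, is_64bit=False)))   # ADC W0, W1, W2 -> 0x1a020020
```

The only difference between the two variants is the sf bit, which is why the 32-bit and 64-bit words differ only in bit 31.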

Assembler Symbols

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];

(result, -) = AddWithCarry(operand1, operand2, PSTATE.C);

X[d] = result;
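
The operation above can be modelled in a few lines of Python. This is a minimal sketch, assuming that AddWithCarry (a shared pseudocode function that is not reproduced in this section) returns the sum truncated to datasize bits as its first element; the function name `adc` and its parameters are hypothetical. The flags that AddWithCarry also produces are discarded by ADC, so the sketch does not compute them.

```python
def adc(operand1: int, operand2: int, carry_in: int, datasize: int = 64) -> int:
    """Model ADC: operand1 + operand2 + carry, truncated to datasize bits."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    mask = (1 << datasize) - 1
    # Register values are datasize bits wide; the carry-in is a single bit.
    total = (operand1 & mask) + (operand2 & mask) + (carry_in & 1)
    # ADC ignores the flag output of AddWithCarry and keeps only the result.
    return total & mask


print(hex(adc(1, 2, 1)))                        # 0x4
print(hex(adc(0xFFFFFFFF, 0, 1, datasize=32)))  # wraps around to 0x0
```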

Operational information

If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.

ADCS

Add with Carry, setting flags, adds two register values and the Carry flag value, and writes the result to the destination
register. It updates the condition flags based on the result.
Bits:   31   30   29   28-21      20-16   15-10    9-5   4-0
Value:  sf   0    1    11010000   Rm      000000   Rn    Rd
Field:       op   S

32-bit (sf == 0)

ADCS <Wd>, <Wn>, <Wm>

64-bit (sf == 1)

ADCS <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
bits(4) nzcv;

(result, nzcv) = AddWithCarry(operand1, operand2, PSTATE.C);

PSTATE.<N,Z,C,V> = nzcv;

X[d] = result;
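
To make the flag update concrete, here is a hedged Python sketch of the operation. It assumes the commonly documented behaviour of the shared pseudocode function AddWithCarry, which is not reproduced in this section: N is the top bit of the result, Z is set when the result is zero, C is the unsigned carry out, and V is signed overflow. The names `add_with_carry` and `adcs` are hypothetical.

```python
def add_with_carry(x: int, y: int, carry_in: int, n_bits: int):
    """Return (result, (n, z, c, v)) for an n_bits-wide add with carry-in."""
    mask = (1 << n_bits) - 1
    sign_bit = 1 << (n_bits - 1)

    def to_signed(value: int) -> int:
        # Interpret an n_bits-wide pattern as a two's-complement integer.
        return value - (1 << n_bits) if value & sign_bit else value

    x &= mask
    y &= mask
    unsigned_sum = x + y + carry_in
    signed_sum = to_signed(x) + to_signed(y) + carry_in
    result = unsigned_sum & mask

    n = 1 if result & sign_bit else 0
    z = 1 if result == 0 else 0
    c = 0 if result == unsigned_sum else 1          # unsigned carry out
    v = 0 if to_signed(result) == signed_sum else 1  # signed overflow
    return result, (n, z, c, v)


def adcs(operand1: int, operand2: int, pstate_c: int, datasize: int = 64):
    """Model ADCS: same sum as ADC, but the NZCV flags are returned too."""
    return add_with_carry(operand1, operand2, pstate_c, datasize)


# 128-bit addition: add the low halves first, then feed the carry into ADCS.
low, (_, _, carry, _) = add_with_carry(0xFFFFFFFFFFFFFFFF, 1, 0, 64)
high, flags = adcs(0, 0, carry)
print(hex(high), hex(low), flags)
```

Chaining the carry from a lower-half addition into the upper-half addition, as in the last three lines, is the typical reason to use an add-with-carry instruction.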

Operational information

ADD (extended register)

Add (extended register) adds a register value and a sign or zero-extended register value, followed by an optional left
shift amount, and writes the result to the destination register. The argument that is extended from the <Rm> register
can be a byte, halfword, word, or doubleword.
Bits:   31   30   29   28-24   23-22   21   20-16   15-13    12-10   9-5   4-0
Value:  sf   0    0    01011   00      1    Rm      option   imm3    Rn    Rd
Field:       op   S

32-bit (sf == 0)

ADD <Wd|WSP>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}

64-bit (sf == 1)

ADD <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ExtendType extend_type = DecodeRegExtend(option);
integer shift = UInt(imm3);
if shift > 4 then UNDEFINED;

Assembler Symbols

<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn|WSP> Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.

<R> Is a width specifier, encoded in “option”:

option <R>
00x W
010 W
x11 X
10x W
110 W

<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.

<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX

If "Rd" or "Rn" is '11111' (WSP) and "option" is '010' then LSL is preferred, but may be omitted when
"imm3" is '000'. In all other cases <extend> is required and must be UXTW when "option" is '010'.

For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rd" or "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when
"imm3" is '000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
<amount> Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0, encoded in the
"imm3" field. It must be absent when <extend> is absent, is required when <extend> is LSL, and is
optional when <extend> is present but not LSL.

Operation

bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);

(result, -) = AddWithCarry(operand1, operand2, '0');

if d == 31 then
    SP[] = result;
else
    X[d] = result;
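
As an illustration of the extend-then-shift step, here is a small Python sketch. It assumes that ExtendReg takes the low byte, halfword, word or doubleword of the second operand, extends it to the data size, and shifts it left by 0 to 4 bits; the function names and the EXTEND_TYPES list are illustrative only.

```python
EXTEND_TYPES = ["UXTB", "UXTH", "UXTW", "UXTX", "SXTB", "SXTH", "SXTW", "SXTX"]


def extend_reg(value: int, option: int, shift: int, datasize: int) -> int:
    """Extend the low part of value as selected by option, then shift left."""
    if not 0 <= shift <= 4:
        raise ValueError("shift amounts above 4 are UNDEFINED")
    name = EXTEND_TYPES[option]
    src_bits = {"B": 8, "H": 16, "W": 32, "X": 64}[name[3]]
    length = min(src_bits, datasize - shift)  # source bits that survive the shift
    field = value & ((1 << length) - 1)
    if name.startswith("S") and field >> (length - 1):
        field -= 1 << length                  # sign-extend
    return (field << shift) & ((1 << datasize) - 1)


def add_extended(xn: int, xm: int, option: int, shift: int, datasize: int) -> int:
    """ADD (extended register) without the SP special cases."""
    return (xn + extend_reg(xm, option, shift, datasize)) & ((1 << datasize) - 1)
```

For example, option 0b110 (SXTW) with a shift of 3 is the common pattern for indexing an array of 8-byte elements with a signed 32-bit index.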

Operational information

ADD (immediate)

Add (immediate) adds a register value and an optionally-shifted immediate value, and writes the result to the
destination register.
This instruction is used by the alias MOV (to/from SP).
31   30   29   28:23    22   21:10   9:5   4:0
sf   0    0    100010   sh   imm12   Rn    Rd
     op   S

32-bit (sf == 0)

ADD <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}

64-bit (sf == 1)

ADD <Xd|SP>, <Xn|SP>, #<imm>{, <shift>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;

case sh of
    when '0' imm = ZeroExtend(imm12, datasize);
    when '1' imm = ZeroExtend(imm12:Zeros(12), datasize);

Assembler Symbols

<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #12

Alias Conditions

Alias               Is preferred when

MOV (to/from SP)    sh == '0' && imm12 == '000000000000' && (Rd == '11111' || Rn == '11111')

Operation

bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];

(result, -) = AddWithCarry(operand1, imm, '0');

if d == 31 then
    SP[] = result;
else
    X[d] = result;
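
The sh and imm12 fields can be illustrated with a small encoder. This is a sketch in Python with a hypothetical function name; it assumes the field layout shown in the encoding diagram for this instruction and simply picks sh so that the requested constant fits in 12 bits.

```python
def encode_add_imm(rd: int, rn: int, value: int, is64: bool = True) -> int:
    """Build the 32-bit instruction word for ADD (immediate)."""
    if 0 <= value <= 0xFFF:
        sh, imm12 = 0, value                  # LSL #0
    elif value & 0xFFF == 0 and 0 < value <= 0xFFF000:
        sh, imm12 = 1, value >> 12            # LSL #12
    else:
        raise ValueError("constant is not encodable in a single ADD (immediate)")
    return (
        (int(is64) << 31)    # sf
        | (0b100010 << 23)   # fixed bits 28:23 (op and S are both 0)
        | (sh << 22)
        | (imm12 << 10)
        | (rn << 5)
        | rd
    )
```

A constant such as 0x1001 fits neither form, so an assembler would need more than one instruction for it.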

Operational information

ADD (shifted register)

Add (shifted register) adds a register value and an optionally-shifted register value, and writes the result to the
destination register.
31   30   29   28:24   23:22   21   20:16   15:10   9:5   4:0
sf   0    0    01011   shift   0    Rm      imm6    Rn    Rd
     op   S

32-bit (sf == 0)

ADD <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

ADD <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

if shift == '11' then UNDEFINED;

if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED

<amount> For the 32-bit variant: is the shift amount, in the range 0 to 31, defaulting to 0 and encoded in the
"imm6" field.
For the 64-bit variant: is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the
"imm6" field.

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);

(result, -) = AddWithCarry(operand1, operand2, '0');

X[d] = result;
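
A short Python sketch of the decode checks and the shift step follows. The helper names are illustrative, and the model of ShiftReg assumes the usual meanings of LSL, LSR and ASR.

```python
SHIFT_TYPES = {0b00: "LSL", 0b01: "LSR", 0b10: "ASR"}


def shift_reg(value: int, shift_type: str, amount: int, datasize: int) -> int:
    """Shift value by amount within datasize bits."""
    mask = (1 << datasize) - 1
    value &= mask
    if shift_type == "LSL":
        return (value << amount) & mask
    if shift_type == "LSR":
        return value >> amount
    if shift_type == "ASR":
        if value >> (datasize - 1):
            value -= 1 << datasize            # reinterpret as signed
        return (value >> amount) & mask
    raise ValueError("unsupported shift type")


def add_shifted(xn: int, xm: int, shift: int, imm6: int, datasize: int) -> int:
    """ADD (shifted register): shift '11' and out-of-range amounts are UNDEFINED."""
    if shift == 0b11:
        raise ValueError("UNDEFINED: shift == '11'")
    if datasize == 32 and imm6 & 0b100000:
        raise ValueError("UNDEFINED: imm6<5> set for the 32-bit variant")
    operand2 = shift_reg(xm, SHIFT_TYPES[shift], imm6, datasize)
    return (xn + operand2) & ((1 << datasize) - 1)
```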

Operational information

ADDG

Add with Tag adds an immediate value scaled by the Tag granule to the address in the source register, modifies the
Logical Address Tag of the address using an immediate value, and writes the result to the destination register. Tags
specified in GCR_EL1.Exclude are excluded from the possible outputs when modifying the Logical Address Tag.

Integer
(FEAT_MTE)

31:22        21:16   15:14    13:10   9:5   4:0
1001000110   uimm6   (0)(0)   uimm4   Xn    Xd
                     op3

ADDG <Xd|SP>, <Xn|SP>, #<uimm6>, #<uimm4>

if !HaveMTEExt() then UNDEFINED;

integer d = UInt(Xd);
integer n = UInt(Xn);
bits(64) offset = LSL(ZeroExtend(uimm6, 64), LOG2_TAG_GRANULE);

Assembler Symbols

<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Xd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Xn" field.
<uimm6> Is an unsigned immediate, a multiple of 16 in the range 0 to 1008, encoded in the "uimm6" field.
<uimm4> Is an unsigned immediate, in the range 0 to 15, encoded in the "uimm4" field.

Operation

bits(64) operand1 = if n == 31 then SP[] else X[n];

bits(4) start_tag = AArch64.AllocationTagFromAddress(operand1);
bits(16) exclude = GCR_EL1.Exclude;
bits(64) result;
bits(4) rtag;

if AArch64.AllocationTagAccessIsEnabled(AccType_NORMAL) then
    rtag = AArch64.ChooseNonExcludedTag(start_tag, uimm4, exclude);
else
    rtag = '0000';

(result, -) = AddWithCarry(operand1, offset, '0');

result = AArch64.AddressWithAllocationTag(result, AccType_NORMAL, rtag);

if d == 31 then
    SP[] = result;
else
    X[d] = result;
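
The sketch below models this operation in Python under stated assumptions: the Logical Address Tag is taken to occupy address bits 59:56, the Tag granule is 16 bytes (LOG2_TAG_GRANULE = 4), and choose_non_excluded_tag is one reading of how excluded tags are skipped, stepping forward one tag at a time. All names are illustrative, and the uimm6 argument is the encoded field value (the assembler immediate divided by 16).

```python
LOG2_TAG_GRANULE = 4
MASK64 = (1 << 64) - 1


def choose_non_excluded_tag(start_tag: int, offset: int, exclude: int) -> int:
    """Advance start_tag by offset steps, skipping tags whose exclude bit is set."""
    if exclude == 0xFFFF:
        return 0                              # every tag excluded
    tag = start_tag
    if offset == 0:
        while (exclude >> tag) & 1:
            tag = (tag + 1) & 0xF
    while offset:
        offset -= 1
        tag = (tag + 1) & 0xF
        while (exclude >> tag) & 1:
            tag = (tag + 1) & 0xF
    return tag


def addg(xn: int, uimm6: int, uimm4: int, exclude: int = 0,
         tagging_enabled: bool = True) -> int:
    """Add a granule-scaled offset and replace the tag in bits 59:56."""
    start_tag = (xn >> 56) & 0xF
    rtag = choose_non_excluded_tag(start_tag, uimm4, exclude) if tagging_enabled else 0
    result = (xn + (uimm6 << LOG2_TAG_GRANULE)) & MASK64
    return (result & ~(0xF << 56)) | (rtag << 56)
```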

ADDS (extended register)

Add (extended register), setting flags, adds a register value and a sign-extended or zero-extended register value that
is optionally shifted left, and writes the result to the destination register. The value that is extended from the
<Rm> register can be a byte, halfword, word, or doubleword. It updates the condition flags based on the result.
This instruction is used by the alias CMN (extended register).
31   30   29   28:21      20:16   15:13    12:10   9:5   4:0
sf   0    1    01011001   Rm      option   imm3    Rn    Rd
     op   S

32-bit (sf == 0)

ADDS <Wd>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}

64-bit (sf == 1)

ADDS <Xd>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn|WSP> Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.

<R> Is a width specifier, encoded in “option”:

option <R>
00x W
010 W
x11 X
10x W
110 W

<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.

<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX

If "Rn" is '11111' (WSP) and "option" is '010' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTW when "option" is '010'.

For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

option <extend>
000 UXTB
001 UXTH
010 UXTW
011 LSL|UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rn" is '11111' (SP) and "option" is '011' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTX when "option" is '011'.
<amount> Is the left shift amount to be applied after extension in the range 0 to 4, defaulting to 0, encoded in the
"imm3" field. It must be absent when <extend> is absent, is required when <extend> is LSL, and is
optional when <extend> is present but not LSL.

Alias Conditions

Alias Is preferred when

CMN (extended register) Rd == '11111'

Operation

bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);
bits(4) nzcv;

(result, nzcv) = AddWithCarry(operand1, operand2, '0');

PSTATE.<N,Z,C,V> = nzcv;

X[d] = result;

Operational information

ADDS (immediate)

Add (immediate), setting flags, adds a register value and an optionally-shifted immediate value, and writes the result
to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias CMN (immediate).
31   30   29   28:23    22   21:10   9:5   4:0
sf   0    1    100010   sh   imm12   Rn    Rd
     op   S

32-bit (sf == 0)

ADDS <Wd>, <Wn|WSP>, #<imm>{, <shift>}

64-bit (sf == 1)

ADDS <Xd>, <Xn|SP>, #<imm>{, <shift>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;

case sh of
    when '0' imm = ZeroExtend(imm12, datasize);
    when '1' imm = ZeroExtend(imm12:Zeros(12), datasize);

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #12

Alias Conditions

Alias Is preferred when

CMN (immediate) Rd == '11111'

Operation

bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(4) nzcv;

(result, nzcv) = AddWithCarry(operand1, imm, '0');

PSTATE.<N,Z,C,V> = nzcv;

X[d] = result;
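
The alias condition can be illustrated with a small disassembler sketch in Python. The function name is hypothetical; the field positions follow the encoding diagram for this instruction, and the output prefers CMN whenever Rd is '11111'.

```python
def disassemble_adds_imm(word: int) -> str:
    """Render an ADDS (immediate) encoding, preferring CMN when Rd is 31."""
    if (word >> 23) & 0xFF != 0b01100010:     # bits 30:23: op=0, S=1, 100010
        raise ValueError("not an ADDS (immediate) encoding")
    sf = (word >> 31) & 1
    sh = (word >> 22) & 1
    imm12 = (word >> 10) & 0xFFF
    rn = (word >> 5) & 0x1F
    rd = word & 0x1F

    prefix = "X" if sf else "W"
    source = ("SP" if sf else "WSP") if rn == 31 else f"{prefix}{rn}"
    shift = ", LSL #12" if sh else ""
    if rd == 31:                              # result discarded: compare-negative
        return f"CMN {source}, #{imm12}{shift}"
    return f"ADDS {prefix}{rd}, {source}, #{imm12}{shift}"
```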

Operational information

ADDS (shifted register)

Add (shifted register), setting flags, adds a register value and an optionally-shifted register value, and writes the result
to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias CMN (shifted register).
31   30   29   28:24   23:22   21   20:16   15:10   9:5   4:0
sf   0    1    01011   shift   0    Rm      imm6    Rn    Rd
     op   S

32-bit (sf == 0)

ADDS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

ADDS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

if shift == '11' then UNDEFINED;

if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED

Alias Conditions

Alias Is preferred when

CMN (shifted register) Rd == '11111'

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(4) nzcv;

(result, nzcv) = AddWithCarry(operand1, operand2, '0');

PSTATE.<N,Z,C,V> = nzcv;

X[d] = result;

Operational information

ADR

Form PC-relative address adds an immediate value to the PC value to form a PC-relative address, and writes the result
to the destination register.
31   30:29   28:24   23:5    4:0
0    immlo   10000   immhi   Rd
op

ADR <Xd>, <label>

integer d = UInt(Rd);
bits(64) imm;

imm = SignExtend(immhi:immlo, 64);

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<label> Is the program label whose address is to be calculated. Its offset from the address of this instruction, in
the range +/-1MB, is encoded in "immhi:immlo".

Operation

bits(64) base = PC[];

X[d] = base + imm;

ADRP

Form PC-relative address to 4KB page adds an immediate value that is shifted left by 12 bits, to the PC value to form a
PC-relative address, with the bottom 12 bits masked out, and writes the result to the destination register.
31   30:29   28:24   23:5    4:0
1    immlo   10000   immhi   Rd
op

ADRP <Xd>, <label>

integer d = UInt(Rd);
bits(64) imm;

imm = SignExtend(immhi:immlo:Zeros(12), 64);

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<label> Is the program label whose 4KB page address is to be calculated. Its offset from the page address of
this instruction, in the range +/-4GB, is encoded as "immhi:immlo" times 4096.

Operation

bits(64) base = PC[];

base<11:0> = Zeros(12);

X[d] = base + imm;
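
The page arithmetic can be checked with a few lines of Python. This is a sketch with illustrative names: it concatenates immhi (19 bits) and immlo (2 bits), appends twelve zero bits, sign-extends the 33-bit value, and adds it to the PC with its low 12 bits cleared.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def adrp_target(pc: int, immhi: int, immlo: int) -> int:
    """Value written to Xd by ADRP executed at address pc."""
    imm = sign_extend(((immhi << 2) | immlo) << 12, 33)
    base = pc & ~0xFFF                        # base<11:0> = Zeros(12)
    return (base + imm) & MASK64
```

Because the low 12 bits of the result are always zero, code that needs the full address of a symbol typically follows ADRP with an instruction that supplies the offset within the page.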

AND (immediate)

Bitwise AND (immediate) performs a bitwise AND of a register value and an immediate value, and writes the result to
the destination register.
31   30:29   28:23    22   21:16   15:10   9:5   4:0
sf   00      100100   N    immr    imms    Rn    Rd
     opc

32-bit (sf == 0 && N == 0)

AND <Wd|WSP>, <Wn>, #<imm>

64-bit (sf == 1)

AND <Xd|SP>, <Xn>, #<imm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;
if sf == '0' && N != '0' then UNDEFINED;
(imm, -) = DecodeBitMasks(N, imms, immr, TRUE);

Assembler Symbols

<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr".
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr".

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];

result = operand1 AND imm;

if d == 31 then
    SP[] = result;
else
    X[d] = result;
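
The bitmask immediate is the least obvious part of this instruction. The Python sketch below models the first result of DecodeBitMasks as it is commonly described: N and the inverted imms select an element size, imms gives the number of set bits minus one, immr rotates the element right, and the element is replicated to fill the register. The function name is illustrative and this is not the architecture's own pseudocode.

```python
def decode_bit_mask(n: int, imms: int, immr: int, datasize: int) -> int:
    """Expand N:imms:immr into the datasize-bit logical immediate."""
    combined = (n << 6) | (~imms & 0x3F)
    length = combined.bit_length() - 1        # highest set bit of N:NOT(imms)
    if length < 1:
        raise ValueError("UNDEFINED: no valid element size")
    esize = 1 << length
    if esize > datasize:
        raise ValueError("UNDEFINED: element wider than the register")
    levels = esize - 1
    if imms & levels == levels:
        raise ValueError("UNDEFINED: all-ones element")
    s = imms & levels
    r = immr & levels
    element = (1 << (s + 1)) - 1              # s + 1 consecutive ones
    emask = (1 << esize) - 1
    element = ((element >> r) | (element << (esize - r))) & emask  # rotate right
    result = 0
    for position in range(0, datasize, esize):
        result |= element << position         # replicate across the register
    return result
```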

Operational information

AND (shifted register)

Bitwise AND (shifted register) performs a bitwise AND of a register value and an optionally-shifted register value, and
writes the result to the destination register.
31   30:29   28:24   23:22   21   20:16   15:10   9:5   4:0
sf   00      01010   shift   0    Rm      imm6    Rn    Rd
     opc                     N

32-bit (sf == 0)

AND <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

AND <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR

Operation

bits(datasize) operand1 = X[n];

bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(datasize) result;

result = operand1 AND operand2;

X[d] = result;

Operational information

If PSTATE.DIT is 1:

• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.

ANDS (immediate)

Bitwise AND (immediate), setting flags, performs a bitwise AND of a register value and an immediate value, and writes
the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias TST (immediate).
31   30:29   28:23    22   21:16   15:10   9:5   4:0
sf   11      100100   N    immr    imms    Rn    Rd
     opc

32-bit (sf == 0 && N == 0)

ANDS <Wd>, <Wn>, #<imm>

64-bit (sf == 1)

ANDS <Xd>, <Xn>, #<imm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;

bits(datasize) imm;
if sf == '0' && N != '0' then UNDEFINED;
(imm, -) = DecodeBitMasks(N, imms, immr, TRUE);

Assembler Symbols

Alias Conditions

Alias Is preferred when

TST (immediate) Rd == '11111'

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];

result = operand1 AND imm;

PSTATE.<N,Z,C,V> = result<datasize-1>:IsZeroBit(result):'00';

X[d] = result;
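
The flag update in the pseudocode above can be restated in Python. This is a minimal sketch with an illustrative function name: N is the top bit of the result, Z is set when the result is zero, and C and V are always cleared.

```python
def ands(operand1: int, imm: int, datasize: int):
    """Return (result, (n, z, c, v)) for a flag-setting bitwise AND."""
    mask = (1 << datasize) - 1
    result = operand1 & imm & mask
    n = result >> (datasize - 1)              # result<datasize-1>
    z = int(result == 0)                      # IsZeroBit(result)
    return result, (n, z, 0, 0)
```

When only the flags matter, as with the TST alias, the result itself is discarded.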

Operational information

ANDS (shifted register)

Bitwise AND (shifted register), setting flags, performs a bitwise AND of a register value and an optionally-shifted
register value, and writes the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias TST (shifted register).
31   30:29   28:24   23:22   21   20:16   15:10   9:5   4:0
sf   11      01010   shift   0    Rm      imm6    Rn    Rd
     opc                     N

32-bit (sf == 0)

ANDS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

ANDS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR

Alias Conditions

Alias Is preferred when

TST (shifted register) Rd == '11111'

Operation

bits(datasize) operand1 = X[n];

bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(datasize) result;

result = operand1 AND operand2;

PSTATE.<N,Z,C,V> = result<datasize-1>:IsZeroBit(result):'00';

X[d] = result;

Operational information

ASR (immediate)

Arithmetic Shift Right (immediate) shifts a register value right by an immediate number of bits, shifting in copies of
the sign bit in the upper bits, and writes the result to the destination register.

This is an alias of SBFM. This means:

• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.
31   30:29   28:23    22   21:16   15:10    9:5   4:0
sf   00      100110   N    immr    x11111   Rn    Rd
     opc                           imms

32-bit (sf == 0 && N == 0 && imms == 011111)

ASR <Wd>, <Wn>, #<shift>

is equivalent to

SBFM <Wd>, <Wn>, #<shift>, #31

and is always the preferred disassembly.

64-bit (sf == 1 && N == 1 && imms == 111111)

ASR <Xd>, <Xn>, #<shift>

is equivalent to

SBFM <Xd>, <Xn>, #<shift>, #63

and is always the preferred disassembly.
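
The equivalence with SBFM can be made concrete by building the instruction word. The Python sketch below uses a hypothetical function name and assumes the field layout in the encoding diagram for this alias: the shift amount goes in immr, and imms is fixed at 31 or 63.

```python
def encode_asr_imm(rd: int, rn: int, shift: int, is64: bool = False) -> int:
    """Encode ASR (immediate) as SBFM with immr = shift and imms = datasize - 1."""
    datasize = 64 if is64 else 32
    if not 0 <= shift < datasize:
        raise ValueError("shift amount out of range for this register width")
    sf = n = int(is64)                        # N must equal sf
    immr = shift
    imms = datasize - 1                       # 011111 or 111111
    return (
        (sf << 31)
        | (0b100110 << 23)                    # fixed bits 28:23, opc = 00
        | (n << 22)
        | (immr << 16)
        | (imms << 10)
        | (rn << 5)
        | rd
    )
```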

Assembler Symbols

Operation

The description of SBFM gives the operational pseudocode for this instruction.

Operational information

ASR (register)

Arithmetic Shift Right (register) shifts a register value right by a variable number of bits, shifting in copies of its sign
bit, and writes the result to the destination register. The remainder obtained by dividing the second source register by
the data size defines the number of bits by which the first source register is right-shifted.

This is an alias of ASRV. This means:

• The encodings in this description are named to match the encodings of ASRV.
• The description of ASRV gives the operational pseudocode for this instruction.
31   30   29   28:21      20:16   15:12   11:10   9:5   4:0
sf   0    0    11010110   Rm      0010    10      Rn    Rd
                                          op2

32-bit (sf == 0)

ASR <Wd>, <Wn>, <Wm>

is equivalent to

ASRV <Wd>, <Wn>, <Wm>

and is always the preferred disassembly.

64-bit (sf == 1)

ASR <Xd>, <Xn>, <Xm>

is equivalent to

ASRV <Xd>, <Xn>, <Xm>

and is always the preferred disassembly.

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding a shift amount from 0 to 31 in
its bottom 5 bits, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding a shift amount from 0 to 63 in
its bottom 6 bits, encoded in the "Rm" field.

Operation

The description of ASRV gives the operational pseudocode for this instruction.

Operational information

ASRV

Arithmetic Shift Right Variable shifts a register value right by a variable number of bits, shifting in copies of its sign
bit, and writes the result to the destination register. The remainder obtained by dividing the second source register by
the data size defines the number of bits by which the first source register is right-shifted.
This instruction is used by the alias ASR (register).
31   30   29   28:21      20:16   15:12   11:10   9:5   4:0
sf   0    0    11010110   Rm      0010    10      Rn    Rd
                                          op2

32-bit (sf == 0)

ASRV <Wd>, <Wn>, <Wm>

64-bit (sf == 1)

ASRV <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ShiftType shift_type = DecodeShift(op2);

Assembler Symbols

Operation

bits(datasize) result;
bits(datasize) operand2 = X[m];

result = ShiftReg(n, shift_type, UInt(operand2) MOD datasize);

X[d] = result;
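
A minimal Python sketch of this operation, with an illustrative function name, shows how the shift amount wraps: only the second operand modulo the data size is used, so a 32-bit shift by 33 behaves like a shift by 1.

```python
def asrv(xn: int, xm: int, datasize: int) -> int:
    """Arithmetic shift right of xn by (xm MOD datasize) bits."""
    mask = (1 << datasize) - 1
    amount = (xm & mask) % datasize
    value = xn & mask
    if value >> (datasize - 1):
        value -= 1 << datasize                # reinterpret as signed
    return (value >> amount) & mask           # Python's >> keeps the sign
```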

Operational information

AT

Address Translate. For more information, see op0==0b01, cache maintenance, TLB maintenance, and address
translation instructions.

This is an alias of SYS. This means:

• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.
31:22        21   20:19   18:16   15:12   11:8   7:5   4:0
1101010100   0    01      op1     0111    100x   op2   Rt
             L                    CRn     CRm

AT <at_op>, <Xt>

is equivalent to

SYS #<op1>, C7, <Cm>, #<op2>, <Xt>

and is the preferred disassembly when SysOp(op1,'0111',CRm,op2) == Sys_AT.

Assembler Symbols

<at_op> Is an AT instruction name, as listed for the AT system instruction group, encoded in
“op1:CRm<0>:op2”:

op1 CRm<0> op2 <at_op> Architectural Feature

000 0 000 S1E1R -
000 0 001 S1E1W -
000 0 010 S1E0R -
000 0 011 S1E0W -
000 1 000 S1E1RP FEAT_PAN2
000 1 001 S1E1WP FEAT_PAN2
100 0 000 S1E2R -
100 0 001 S1E2W -
100 0 100 S12E1R -
100 0 101 S12E1W -
100 0 110 S12E0R -
100 0 111 S12E0W -
110 0 000 S1E3R -
110 0 001 S1E3W -

<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.
<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.

Operation

The description of SYS gives the operational pseudocode for this instruction.
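
The mapping from an AT operation name to its SYS operands can be written as a lookup. The Python sketch below copies the op1, CRm<0> and op2 values from the <at_op> table above; the AT_OPS dictionary and at_to_sys function are illustrative names, and CRm is formed as '100' followed by CRm<0>, as in the encoding diagram.

```python
# name -> (op1, CRm<0>, op2), taken from the <at_op> table
AT_OPS = {
    "S1E1R": (0b000, 0, 0b000), "S1E1W": (0b000, 0, 0b001),
    "S1E0R": (0b000, 0, 0b010), "S1E0W": (0b000, 0, 0b011),
    "S1E1RP": (0b000, 1, 0b000), "S1E1WP": (0b000, 1, 0b001),
    "S1E2R": (0b100, 0, 0b000), "S1E2W": (0b100, 0, 0b001),
    "S12E1R": (0b100, 0, 0b100), "S12E1W": (0b100, 0, 0b101),
    "S12E0R": (0b100, 0, 0b110), "S12E0W": (0b100, 0, 0b111),
    "S1E3R": (0b110, 0, 0b000), "S1E3W": (0b110, 0, 0b001),
}


def at_to_sys(at_op: str, xt: str) -> str:
    """Return the equivalent SYS instruction text for AT <at_op>, <Xt>."""
    op1, crm0, op2 = AT_OPS[at_op]
    crm = 0b1000 | crm0                       # CRm = '100' : CRm<0>
    return f"SYS #{op1}, C7, C{crm}, #{op2}, {xt}"
```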

AUTDA, AUTDZA

Authenticate Data address, using key A. This instruction authenticates a data address, using a modifier and key A.
The address is in the general-purpose register that is specified by <Xd>.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTDA.
• The value zero, for AUTDZA.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the
authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.

Integer
(FEAT_PAuth)

31:14                13   12:10   9:5   4:0
110110101100000100   Z    110     Rn    Rd

AUTDA (Z == 0)

AUTDA <Xd>, <Xn|SP>

AUTDZA (Z == 1 && Rn == 11111)

AUTDZA <Xd>

boolean source_is_sp = FALSE;

integer d = UInt(Rd);
integer n = UInt(Rn);

if !HavePACExt() then
    UNDEFINED;

if Z == '0' then // AUTDA
    if n == 31 then source_is_sp = TRUE;
else // AUTDZA
    if n != 31 then UNDEFINED;

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the general-purpose source register or stack pointer, encoded in the "Rn" field.

Operation

if HavePACExt() then
    if source_is_sp then
        X[d] = AuthDA(X[d], SP[], FALSE);
    else
        X[d] = AuthDA(X[d], X[n], FALSE);

AUTDB, AUTDZB

Authenticate Data address, using key B. This instruction authenticates a data address, using a modifier and key B.
The address is in the general-purpose register that is specified by <Xd>.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTDB.
• The value zero, for AUTDZB.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the
authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.

Integer
(FEAT_PAuth)

31:14                13   12:10   9:5   4:0
110110101100000100   Z    111     Rn    Rd

AUTDB (Z == 0)

AUTDB <Xd>, <Xn|SP>

AUTDZB (Z == 1 && Rn == 11111)

AUTDZB <Xd>

boolean source_is_sp = FALSE;

integer d = UInt(Rd);
integer n = UInt(Rn);

if !HavePACExt() then
    UNDEFINED;

if Z == '0' then // AUTDB
    if n == 31 then source_is_sp = TRUE;
else // AUTDZB
    if n != 31 then UNDEFINED;

Assembler Symbols

Operation

if HavePACExt() then
    if source_is_sp then
        X[d] = AuthDB(X[d], SP[], FALSE);
    else
        X[d] = AuthDB(X[d], X[n], FALSE);

AUTIA, AUTIA1716, AUTIASP, AUTIAZ, AUTIZA

Authenticate Instruction address, using key A. This instruction authenticates an instruction address, using a modifier
and key A.
The address is:
• In the general-purpose register that is specified by <Xd> for AUTIA and AUTIZA.
• In X17, for AUTIA1716.
• In X30, for AUTIASP and AUTIAZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTIA.
• The value zero, for AUTIZA and AUTIAZ.
• In X16, for AUTIA1716.
• In SP, for AUTIASP.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the
authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.
It has encodings from 2 classes: Integer and System

Integer
(FEAT_PAuth)

31:14                13   12:10   9:5   4:0
110110101100000100   Z    100     Rn    Rd

AUTIA (Z == 0)

AUTIA <Xd>, <Xn|SP>

AUTIZA (Z == 1 && Rn == 11111)

AUTIZA <Xd>

boolean source_is_sp = FALSE;

integer d = UInt(Rd);
integer n = UInt(Rn);

if !HavePACExt() then
    UNDEFINED;

if Z == '0' then // AUTIA
    if n == 31 then source_is_sp = TRUE;
else // AUTIZA
    if n != 31 then UNDEFINED;

System
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 1 0 x 1 1 1 1 1
CRm op2

AUTIA1716 (CRm == 0001 && op2 == 100)

AUTIA1716

AUTIASP (CRm == 0011 && op2 == 101)

AUTIASP

AUTIAZ (CRm == 0011 && op2 == 100)

AUTIAZ

integer d;
integer n;
boolean source_is_sp = FALSE;

case CRm:op2 of
    when '0011 100' // AUTIAZ
        d = 30;
        n = 31;
    when '0011 101' // AUTIASP
        d = 30;
        source_is_sp = TRUE;
    when '0001 100' // AUTIA1716
        d = 17;
        n = 16;
    when '0001 000' SEE "PACIA";
    when '0001 010' SEE "PACIB";
    when '0001 110' SEE "AUTIB";
    when '0011 00x' SEE "PACIA";
    when '0011 01x' SEE "PACIB";
    when '0011 11x' SEE "AUTIB";
    when '0000 111' SEE "XPACLRI";
    otherwise SEE "HINT";
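
The System-class decode above is effectively a lookup on the CRm:op2 field of a hint-space instruction word. The following Python sketch illustrates that lookup for the three AUTIA system encodings only. The function name, the returned tuple layout, and the string labels for the modifier are assumptions made for illustration; they are not part of the architecture pseudocode.

```python
from typing import Optional, Tuple

# Fixed bits of the hint space shown in the encoding diagram: everything
# except CRm (bits 11:8) and op2 (bits 7:5).
HINT_MASK = 0xFFFFF01F
HINT_BASE = 0xD503201F

# (CRm, op2) -> (mnemonic, address register number, modifier source)
_AUTIA_SYSTEM = {
    (0b0011, 0b100): ("AUTIAZ", 30, "zero"),
    (0b0011, 0b101): ("AUTIASP", 30, "SP"),
    (0b0001, 0b100): ("AUTIA1716", 17, "X16"),
}


def decode_autia_system(word: int) -> Optional[Tuple[str, int, str]]:
    """Return the AUTIA system variant for a 32-bit word, or None."""
    if (word & HINT_MASK) != HINT_BASE:
        return None  # not in the hint space at all
    crm = (word >> 8) & 0xF
    op2 = (word >> 5) & 0x7
    # Other CRm:op2 values belong to PACIA, PACIB, AUTIB, XPACLRI or HINT.
    return _AUTIA_SYSTEM.get((crm, op2))
```

For AUTIAZ the pseudocode sets n = 31, so the modifier is read as zero; the sketch records that as the label "zero".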

Assembler Symbols

Operation

if HavePACExt() then
    if source_is_sp then
        X[d] = AuthIA(X[d], SP[], FALSE);
    else
        X[d] = AuthIA(X[d], X[n], FALSE);

AUTIB, AUTIB1716, AUTIBSP, AUTIBZ, AUTIZB

Authenticate Instruction address, using key B. This instruction authenticates an instruction address, using a modifier
and key B.
The address is:
• In the general-purpose register that is specified by <Xd> for AUTIB and AUTIZB.
• In X17, for AUTIB1716.
• In X30, for AUTIBSP and AUTIBZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for AUTIB.
• The value zero, for AUTIZB and AUTIBZ.
• In X16, for AUTIB1716.
• In SP, for AUTIBSP.
If the authentication passes, the upper bits of the address are restored to enable subsequent use of the address. If the
authentication fails, the upper bits are corrupted and any subsequent use of the address results in a Translation fault.
It has encodings from 2 classes: Integer and System

Integer
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 1 0 1 Rn Rd

AUTIB (Z == 0)

AUTIB <Xd>, <Xn|SP>

AUTIZB (Z == 1 && Rn == 11111)

AUTIZB <Xd>

boolean source_is_sp = FALSE;

integer d = UInt(Rd);
integer n = UInt(Rn);

if !HavePACExt() then
    UNDEFINED;

if Z == '0' then // AUTIB
    if n == 31 then source_is_sp = TRUE;
else // AUTIZB
    if n != 31 then UNDEFINED;

System
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 1 1 x 1 1 1 1 1
CRm op2

AUTIB1716 (CRm == 0001 && op2 == 110)

AUTIB1716

AUTIBSP (CRm == 0011 && op2 == 111)

AUTIBSP

AUTIBZ (CRm == 0011 && op2 == 110)

AUTIBZ

integer d;
integer n;
boolean source_is_sp = FALSE;

case CRm:op2 of
    when '0011 110' // AUTIBZ
        d = 30;
        n = 31;
    when '0011 111' // AUTIBSP
        d = 30;
        source_is_sp = TRUE;
    when '0001 110' // AUTIB1716
        d = 17;
        n = 16;
    when '0001 000' SEE "PACIA";
    when '0001 010' SEE "PACIB";
    when '0001 100' SEE "AUTIA";
    when '0011 00x' SEE "PACIA";
    when '0011 01x' SEE "PACIB";
    when '0011 10x' SEE "AUTIA";
    when '0000 111' SEE "XPACLRI";
    otherwise SEE "HINT";

Assembler Symbols

Operation

if HavePACExt() then
    if source_is_sp then
        X[d] = AuthIB(X[d], SP[], FALSE);
    else
        X[d] = AuthIB(X[d], X[n], FALSE);

AXFLAG

Convert floating-point condition flags from Arm to external format. This instruction converts the state of the
PSTATE.{N,Z,C,V} flags from a form representing the result of an Arm floating-point scalar compare instruction to an
alternative representation required by some software.

System
(FEAT_FlagM2)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 (0) (0) (0) (0) 0 1 0 1 1 1 1 1
CRm

AXFLAG

if !HaveFlagFormatExt() then UNDEFINED;

Operation

bit Z = PSTATE.Z OR PSTATE.V;

bit C = PSTATE.C AND NOT(PSTATE.V);

PSTATE.N = '0';
PSTATE.Z = Z;
PSTATE.C = C;
PSTATE.V = '0';
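
The flag conversion above is a pure function of the four input flags. The Python sketch below mirrors the pseudocode; the function name and the (N, Z, C, V) tuple ordering are illustrative assumptions.

```python
from typing import Tuple


def axflag(n: int, z: int, c: int, v: int) -> Tuple[int, int, int, int]:
    """Return the (N, Z, C, V) flags after AXFLAG, given the flags before."""
    new_z = z | v            # Z = PSTATE.Z OR PSTATE.V
    new_c = c & (v ^ 1)      # C = PSTATE.C AND NOT(PSTATE.V)
    return (0, new_z, new_c, 0)
```

For example, an input of N=0, Z=0, C=1, V=1 yields (0, 1, 0, 0): the V flag is folded into Z and clears C. The incoming N value does not affect the result.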

B

Branch causes an unconditional branch to a label at a PC-relative offset, with a hint that this is not a subroutine call or
return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 0 1 imm26
op

B <label>

bits(64) offset = SignExtend(imm26:'00', 64);

Assembler Symbols

<label> Is the program label to be unconditionally branched to. Its offset from the address of this instruction, in
the range +/-128MB, is encoded as "imm26" times 4.

Operation

BranchTo(PC[] + offset, BranchType_DIR, FALSE);
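
The offset computation SignExtend(imm26:'00', 64) can be checked with a small model. The Python sketch below computes the branch target from the address of the instruction and the raw imm26 field; the helper names are illustrative assumptions, and the result is wrapped to 64 bits.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def b_target(pc: int, imm26: int) -> int:
    """Target of B: PC plus the sign-extended 28-bit value imm26:'00'."""
    offset = sign_extend((imm26 & 0x3FFFFFF) << 2, 28)
    return (pc + offset) & 0xFFFFFFFFFFFFFFFF
```

The most negative imm26 value gives an offset of -128MB and the most positive gives +128MB minus 4 bytes, which matches the stated range for <label>.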

B.cond

Branch conditionally to a label at a PC-relative offset, with a hint that this is not a subroutine call or return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 1 0 0 imm19 0 cond

B.<cond> <label>

bits(64) offset = SignExtend(imm19:'00', 64);

Assembler Symbols

<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-1MB, is encoded as "imm19" times 4.

Operation

if ConditionHolds(cond) then
    BranchTo(PC[] + offset, BranchType_DIR, TRUE);
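
ConditionHolds() is defined elsewhere in the architecture pseudocode and is not reproduced in this section. The Python sketch below is an illustrative model of the standard condition encoding combined with the imm19 offset computation; the function names are assumptions, and returning None for a branch that is not taken is a choice made for this sketch only.

```python
from typing import Optional


def condition_holds(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate a 4-bit condition code against the N, Z, C, V flags."""
    base = (cond >> 1) & 0b111
    if base == 0b000:
        result = z == 1                 # EQ or NE
    elif base == 0b001:
        result = c == 1                 # CS or CC
    elif base == 0b010:
        result = n == 1                 # MI or PL
    elif base == 0b011:
        result = v == 1                 # VS or VC
    elif base == 0b100:
        result = c == 1 and z == 0      # HI or LS
    elif base == 0b101:
        result = n == v                 # GE or LT
    elif base == 0b110:
        result = n == v and z == 0      # GT or LE
    else:
        result = True                   # AL
    # Bit 0 inverts the test, except for the encoding '1111'.
    if (cond & 1) and cond != 0b1111:
        result = not result
    return result


def b_cond_target(pc: int, imm19: int, cond: int, flags) -> Optional[int]:
    """Return the branch target if the condition holds, else None."""
    if not condition_holds(cond, *flags):
        return None
    raw = (imm19 & 0x7FFFF) << 2        # imm19:'00', a 21-bit value
    sign = 1 << 20
    offset = (raw & (sign - 1)) - (raw & sign)
    return (pc + offset) & 0xFFFFFFFFFFFFFFFF
```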

BC.cond

Branch Consistent conditionally to a label at a PC-relative offset, with a hint that this branch will behave very
consistently and is very unlikely to change direction.

19-bit signed PC-relative branch offset

(FEAT_HBC)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 0 1 0 0 imm19 1 cond

BC.<cond> <label>

if !HaveFeatHBC() then UNDEFINED;

bits(64) offset = SignExtend(imm19:'00', 64);

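Assembler Symbols

<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-1MB, is encoded as "imm19" times 4.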

Operation

if ConditionHolds(cond) then
    BranchTo(PC[] + offset, BranchType_DIR, TRUE);

BFC

Bitfield Clear sets a bitfield of <width> bits at bit position <lsb> of the destination register to zero, leaving the other
destination bits unchanged.

This is an alias of BFM. This means:

• The encodings in this description are named to match the encodings of BFM.
• The description of BFM gives the operational pseudocode for this instruction.

Leaving other bits unchanged

(FEAT_ASMv8p2)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 1 0 N immr imms 1 1 1 1 1 Rd
opc Rn

32-bit (sf == 0 && N == 0)

BFC <Wd>, #<lsb>, #<width>

is equivalent to

BFM <Wd>, WZR, #(-<lsb> MOD 32), #(<width>-1)

and is the preferred disassembly when UInt(imms) < UInt(immr).

64-bit (sf == 1 && N == 1)

BFC <Xd>, #<lsb>, #<width>

is equivalent to

BFM <Xd>, XZR, #(-<lsb> MOD 64), #(<width>-1)

and is the preferred disassembly when UInt(imms) < UInt(immr).

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
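
The alias equivalence above maps <lsb> and <width> onto the BFM immediates as immr = (-<lsb> MOD regsize) and imms = <width>-1. The Python sketch below performs that mapping with the operand ranges given in the Assembler Symbols; the function name and the use of ValueError for out-of-range operands are illustrative assumptions. The same arithmetic applies to BFI.

```python
from typing import Tuple


def bfc_to_bfm(lsb: int, width: int, regsize: int = 64) -> Tuple[int, int]:
    """Return (immr, imms) of the BFM that is equivalent to BFC/BFI."""
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    if not 0 <= lsb < regsize:
        raise ValueError("lsb out of range")
    if not 1 <= width <= regsize - lsb:
        raise ValueError("width out of range")
    immr = (-lsb) % regsize   # Python's % is non-negative for a positive modulus
    imms = width - 1
    return immr, imms
```

Note that lsb = 0 gives immr = 0, so imms is not less than immr and the stated preferred-disassembly condition for this alias does not hold for the resulting encoding.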

Operation

The description of BFM gives the operational pseudocode for this instruction.

Operational information

BFI

Bitfield Insert copies a bitfield of <width> bits from the least significant bits of the source register to bit position
<lsb> of the destination register, leaving the other destination bits unchanged.

This is an alias of BFM. This means:

• The encodings in this description are named to match the encodings of BFM.
• The description of BFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 1 0 N immr imms != 11111 Rd
opc Rn

32-bit (sf == 0 && N == 0)

BFI <Wd>, <Wn>, #<lsb>, #<width>

is equivalent to

BFM <Wd>, <Wn>, #(-<lsb> MOD 32), #(<width>-1)

and is the preferred disassembly when UInt(imms) < UInt(immr).

64-bit (sf == 1 && N == 1)

BFI <Xd>, <Xn>, #<lsb>, #<width>

is equivalent to

BFM <Xd>, <Xn>, #(-<lsb> MOD 64), #(<width>-1)

and is the preferred disassembly when UInt(imms) < UInt(immr).

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the destination bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.

Operation

The description of BFM gives the operational pseudocode for this instruction.

Operational information

BFM

Bitfield Move is usually accessed via one of its aliases, which are always preferred for disassembly.
If <imms> is greater than or equal to <immr>, this copies a bitfield of (<imms>-<immr>+1) bits starting from bit
position <immr> in the source register to the least significant bits of the destination register.
If <imms> is less than <immr>, this copies a bitfield of (<imms>+1) bits from the least significant bits of the source
register to bit position (regsize-<immr>) of the destination register, where regsize is the destination register size of 32
or 64 bits.
In both cases the other bits of the destination register remain unchanged.
This instruction is used by the aliases BFC, BFI, and BFXIL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 1 0 N immr imms Rn Rd
opc

32-bit (sf == 0 && N == 0)

BFM <Wd>, <Wn>, #<immr>, #<imms>

64-bit (sf == 1 && N == 1)

BFM <Xd>, <Xn>, #<immr>, #<imms>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;

integer R;
bits(datasize) wmask;
bits(datasize) tmask;

if sf == '1' && N != '1' then UNDEFINED;

if sf == '0' && (N != '0' || immr<5> != '0' || imms<5> != '0') then UNDEFINED;

R = UInt(immr);
(wmask, tmask) = DecodeBitMasks(N, imms, immr, FALSE);

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<immr> For the 32-bit variant: is the right rotate amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the right rotate amount, in the range 0 to 63, encoded in the "immr" field.
<imms> For the 32-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 31,
encoded in the "imms" field.
For the 64-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 63,
encoded in the "imms" field.

Alias Conditions

Alias    Is preferred when
BFC      Rn == '11111' && UInt(imms) < UInt(immr)
BFI      Rn != '11111' && UInt(imms) < UInt(immr)
BFXIL    UInt(imms) >= UInt(immr)

Operation

bits(datasize) dst = X[d];

bits(datasize) src = X[n];

// perform bitfield move on low bits

bits(datasize) bot = (dst AND NOT(wmask)) OR (ROR(src, R) AND wmask);

// combine extension bits and result bits

X[d] = (dst AND NOT(tmask)) OR (bot AND tmask);
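
The two cases in the BFM description, and the Alias Conditions table, can be modelled without DecodeBitMasks(). The Python sketch below follows the prose description rather than the wmask/tmask pseudocode; the function names are illustrative assumptions and the decode-time UNDEFINED checks are not modelled.

```python
def bfm(dst: int, src: int, immr: int, imms: int, regsize: int = 64) -> int:
    """Return the new destination value after BFM, per the prose description."""
    full = (1 << regsize) - 1
    dst &= full
    src &= full
    if imms >= immr:
        # Copy (imms-immr+1) bits starting at source bit immr to bit 0.
        width, src_pos, dst_pos = imms - immr + 1, immr, 0
    else:
        # Copy the low (imms+1) source bits to bit (regsize-immr).
        width, src_pos, dst_pos = imms + 1, 0, regsize - immr
    ones = (1 << width) - 1
    field = (src >> src_pos) & ones
    # All destination bits outside the field are left unchanged.
    return (dst & ~(ones << dst_pos) & full) | (field << dst_pos)


def preferred_alias(rn: int, immr: int, imms: int) -> str:
    """Select the preferred disassembly according to the Alias Conditions."""
    if imms >= immr:
        return "BFXIL"
    return "BFC" if rn == 31 else "BFI"
```

For example, with regsize 32, immr = 24 and imms = 3 insert a 4-bit field at bit 8, which is the encoding that BFC or BFI with #8, #4 produces.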

Operational information

BFXIL

Bitfield Extract and Insert Low copies a bitfield of <width> bits starting from bit position <lsb> in the source register
to the least significant bits of the destination register, leaving the other destination bits unchanged.

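This is an alias of BFM. This means:

• The encodings in this description are named to match the encodings of BFM.
• The description of BFM gives the operational pseudocode for this instruction.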

32-bit (sf == 0 && N == 0)

BFXIL <Wd>, <Wn>, #<lsb>, #<width>

is equivalent to

BFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)

and is the preferred disassembly when UInt(imms) >= UInt(immr).

64-bit (sf == 1 && N == 1)

BFXIL <Xd>, <Xn>, #<lsb>, #<width>

is equivalent to

BFM <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)

and is the preferred disassembly when UInt(imms) >= UInt(immr).

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
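
For BFXIL the alias equivalence is immr = <lsb> and imms = <lsb>+<width>-1. The Python sketch below converts in both directions; the function names and the ValueError checks are illustrative assumptions.

```python
from typing import Tuple


def bfxil_to_bfm(lsb: int, width: int, regsize: int = 64) -> Tuple[int, int]:
    """Return (immr, imms) of the BFM that is equivalent to BFXIL."""
    if not 0 <= lsb < regsize:
        raise ValueError("lsb out of range")
    if not 1 <= width <= regsize - lsb:
        raise ValueError("width out of range")
    return lsb, lsb + width - 1


def bfm_to_bfxil(immr: int, imms: int) -> Tuple[int, int]:
    """Return (lsb, width) for a BFM whose preferred disassembly is BFXIL."""
    if imms < immr:
        raise ValueError("BFXIL is preferred only when imms >= immr")
    return immr, imms - immr + 1
```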

Operation

The description of BFM gives the operational pseudocode for this instruction.

Operational information

BIC (shifted register)

Bitwise Bit Clear (shifted register) performs a bitwise AND of a register value and the complement of an optionally-
shifted register value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 0 1 0 1 0 shift 1 Rm imm6 Rn Rd
opc N

32-bit (sf == 0)

BIC <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

BIC <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR

Operation

bits(datasize) operand1 = X[n];

bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(datasize) result;

operand2 = NOT(operand2);

result = operand1 AND operand2;

X[d] = result;
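
The operation above is an AND with the complement of the shifted second operand. The Python sketch below models it for both register sizes. ShiftReg() is defined elsewhere in the pseudocode; the shift_reg helper here is an illustrative stand-in that takes the register value directly, and the string names for the shift types are assumptions.

```python
def shift_reg(value: int, shift_type: str, amount: int, datasize: int) -> int:
    """Apply LSL, LSR, ASR or ROR to a datasize-bit value."""
    if not 0 <= amount < datasize:
        raise ValueError("shift amount out of range")
    mask = (1 << datasize) - 1
    value &= mask
    if shift_type == "LSL":
        return (value << amount) & mask
    if shift_type == "LSR":
        return value >> amount
    if shift_type == "ASR":
        if value & (1 << (datasize - 1)):
            value -= 1 << datasize      # reinterpret as a negative integer
        return (value >> amount) & mask
    if shift_type == "ROR":
        return ((value >> amount) | (value << (datasize - amount))) & mask
    raise ValueError("unknown shift type")


def bic(op1: int, op2: int, shift_type: str = "LSL", amount: int = 0,
        datasize: int = 64) -> int:
    """Return op1 AND NOT(shifted op2), truncated to datasize bits."""
    mask = (1 << datasize) - 1
    operand2 = shift_reg(op2, shift_type, amount, datasize)
    return op1 & ~operand2 & mask
```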

Operational information


BICS (shifted register)

Bitwise Bit Clear (shifted register), setting flags, performs a bitwise AND of a register value and the complement of an
optionally-shifted register value, and writes the result to the destination register. It updates the condition flags based
on the result.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 0 shift 1 Rm imm6 Rn Rd
opc N

32-bit (sf == 0)

BICS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

BICS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR

Operation

bits(datasize) operand1 = X[n];

bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(datasize) result;

operand2 = NOT(operand2);

result = operand1 AND operand2;

PSTATE.<N,Z,C,V> = result<datasize-1>:IsZeroBit(result):'00';

X[d] = result;
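
The flag update in the operation above sets N from the top bit of the result, Z from the zero test, and clears C and V. The Python sketch below models only that step, taking a second operand that has already been shifted; the function name and the returned tuple layout are illustrative assumptions.

```python
from typing import Tuple


def bics(op1: int, shifted_op2: int,
         datasize: int = 64) -> Tuple[int, Tuple[int, int, int, int]]:
    """Return (result, (N, Z, C, V)) for BICS with a pre-shifted operand."""
    mask = (1 << datasize) - 1
    result = op1 & ~shifted_op2 & mask
    n = (result >> (datasize - 1)) & 1   # result<datasize-1>
    z = 1 if result == 0 else 0          # IsZeroBit(result)
    return result, (n, z, 0, 0)          # C and V are always '00'
```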

Operational information


BL

Branch with Link branches to a PC-relative offset, setting the register X30 to PC+4. It provides a hint that this is a
subroutine call.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 1 imm26
op

BL <label>

bits(64) offset = SignExtend(imm26:'00', 64);

Assembler Symbols

<label> Is the program label to be unconditionally branched to. Its offset from the address of this instruction, in
the range +/-128MB, is encoded as "imm26" times 4.

Operation

X[30] = PC[] + 4;

BranchTo(PC[] + offset, BranchType_DIRCALL, FALSE);
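
The encoding diagram and the <label> description together determine the instruction word for a given call site and target. The Python sketch below assembles that word and computes the link value written to X30; the function names and the ValueError checks are illustrative assumptions.

```python
def encode_bl(pc: int, target: int) -> int:
    """Return the 32-bit BL instruction word for a call from pc to target."""
    offset = target - pc
    if offset % 4 != 0:
        raise ValueError("offset must be a multiple of 4")
    if not -(1 << 27) <= offset < (1 << 27):
        raise ValueError("target outside the +/-128MB range")
    imm26 = (offset >> 2) & 0x3FFFFFF    # two's complement, 26 bits
    return 0x94000000 | imm26            # op = 1, fixed bits 0 0 1 0 1


def bl_link_value(pc: int) -> int:
    """Value written to X30: the address of the following instruction."""
    return (pc + 4) & 0xFFFFFFFFFFFFFFFF
```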

BLR

Branch with Link to Register calls a subroutine at an address in a register, setting register X30 to PC+4.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 Rn 0 0 0 0 0
Z op A M Rm

BLR <Xn>

integer n = UInt(Rn);

Assembler Symbols

<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field.

Operation

bits(64) target = X[n];

X[30] = PC[] + 4;

// Value in BTypeNext will be used to set PSTATE.BTYPE

BTypeNext = '10';
BranchTo(target, BranchType_INDCALL, FALSE);

BLRAA, BLRAAZ, BLRAB, BLRABZ

Branch with Link to Register, with pointer authentication. This instruction authenticates the address in the general-
purpose register that is specified by <Xn>, using a modifier and the specified key, and calls a subroutine at the
authenticated address, setting register X30 to PC+4.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xm|SP> for BLRAA and BLRAB.
• The value zero, for BLRAAZ and BLRABZ.
Key A is used for BLRAA and BLRAAZ, and key B is used for BLRAB and BLRABZ.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to the general-purpose register.

Integer
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 Z 0 0 1 1 1 1 1 1 0 0 0 0 1 M Rn Rm
op A

Key A, zero modifier (Z == 0 && M == 0 && Rm == 11111)

BLRAAZ <Xn>

Key A, register modifier (Z == 1 && M == 0)

BLRAA <Xn>, <Xm|SP>

Key B, zero modifier (Z == 0 && M == 1 && Rm == 11111)

BLRABZ <Xn>

Key B, register modifier (Z == 1 && M == 1)

BLRAB <Xn>, <Xm|SP>

integer n = UInt(Rn);
integer m = UInt(Rm);
boolean use_key_a = (M == '0');
boolean source_is_sp = ((Z == '1') && (m == 31));

if !HavePACExt() then
    UNDEFINED;

if Z == '0' && m != 31 then
    UNDEFINED;
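
The decode above selects the variant from the Z and M bits and constrains Rm for the zero-modifier forms. The Python sketch below models that selection; the function name, the returned tuple, and the use of ValueError to stand in for UNDEFINED are illustrative assumptions. The HavePACExt() check is not modelled.

```python
from typing import Tuple


def decode_blra(z: int, m: int, rm: int) -> Tuple[str, str]:
    """Return (mnemonic, modifier source) for the BLRA* encodings."""
    key = "A" if m == 0 else "B"          # use_key_a = (M == '0')
    if z == 0:
        if rm != 31:
            raise ValueError("UNDEFINED: zero-modifier form requires Rm == 31")
        return f"BLRA{key}Z", "zero"
    # Register-modifier form: Rm == 31 selects the stack pointer.
    modifier = "SP" if rm == 31 else f"X{rm}"
    return f"BLRA{key}", modifier
```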

Assembler Symbols

<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field.
<Xm|SP> Is the 64-bit name of the general-purpose source register or stack pointer holding the modifier, encoded
in the "Rm" field.

Operation

bits(64) target = X[n];

bits(64) modifier = if source_is_sp then SP[] else X[m];

if use_key_a then
    target = AuthIA(target, modifier, TRUE);
else
    target = AuthIB(target, modifier, TRUE);

X[30] = PC[] + 4;

// Value in BTypeNext will be used to set PSTATE.BTYPE

BTypeNext = '10';
BranchTo(target, BranchType_INDCALL, FALSE);

BR

Branch to Register branches unconditionally to an address in a register, with a hint that this is not a subroutine return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 Rn 0 0 0 0 0
Z op A M Rm

BR <Xn>

integer n = UInt(Rn);

Assembler Symbols

<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field.

Operation

bits(64) target = X[n];

// Value in BTypeNext will be used to set PSTATE.BTYPE

if InGuardedPage then
    if n == 16 || n == 17 then
        BTypeNext = '01';
    else
        BTypeNext = '11';
else
    BTypeNext = '01';
BranchTo(target, BranchType_INDIR, FALSE);
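
The BTypeNext selection above depends only on the register number and on whether the branch instruction is in a guarded page. The Python sketch below restates that selection; the function name and the boolean parameter are illustrative assumptions.

```python
def br_btype_next(n: int, in_guarded_page: bool) -> int:
    """Return the 2-bit BTypeNext value chosen by BR <Xn>."""
    if in_guarded_page and n not in (16, 17):
        return 0b11
    # Outside a guarded page, or when X16 or X17 holds the target.
    return 0b01
```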

BRAA, BRAAZ, BRAB, BRABZ

Branch to Register, with pointer authentication. This instruction authenticates the address in the general-purpose
register that is specified by <Xn>, using a modifier and the specified key, and branches to the authenticated address.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xm|SP> for BRAA and BRAB.
• The value zero, for BRAAZ and BRABZ.
Key A is used for BRAA and BRAAZ, and key B is used for BRAB and BRABZ.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to the general-purpose register.

Integer
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 Z 0 0 0 1 1 1 1 1 0 0 0 0 1 M Rn Rm
op A

Key A, zero modifier (Z == 0 && M == 0 && Rm == 11111)

BRAAZ <Xn>

Key A, register modifier (Z == 1 && M == 0)

BRAA <Xn>, <Xm|SP>

Key B, zero modifier (Z == 0 && M == 1 && Rm == 11111)

BRABZ <Xn>

Key B, register modifier (Z == 1 && M == 1)

BRAB <Xn>, <Xm|SP>

integer n = UInt(Rn);
integer m = UInt(Rm);
boolean use_key_a = (M == '0');
boolean source_is_sp = ((Z == '1') && (m == 31));

if !HavePACExt() then
    UNDEFINED;

if Z == '0' && m != 31 then
    UNDEFINED;

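Assembler Symbols

<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field.
<Xm|SP> Is the 64-bit name of the general-purpose source register or stack pointer holding the modifier, encoded
in the "Rm" field.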

Operation

bits(64) target = X[n];

bits(64) modifier = if source_is_sp then SP[] else X[m];

if use_key_a then
    target = AuthIA(target, modifier, TRUE);
else
    target = AuthIB(target, modifier, TRUE);

// Value in BTypeNext will be used to set PSTATE.BTYPE

if InGuardedPage then
    if n == 16 || n == 17 then
        BTypeNext = '01';
    else
        BTypeNext = '11';
else
    BTypeNext = '01';
BranchTo(target, BranchType_INDIR, FALSE);


BRK

Breakpoint instruction. A BRK instruction generates a Breakpoint Instruction exception. The PE records the exception
in ESR_ELx, using the EC value 0x3c, and captures the value of the immediate argument in ESR_ELx.ISS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 0 1 imm16 0 0 0 0 0

BRK #<imm>

if HaveBTIExt() then
    SetBTypeCompatible(TRUE);

Assembler Symbols

<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.

Operation

AArch64.SoftwareBreakpoint(imm16);
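
The encoding diagram places imm16 in bits 20:5 between fixed high and low bit patterns. The Python sketch below assembles a BRK word and recovers the immediate from one; the function names and the ValueError checks are illustrative assumptions.

```python
BRK_BASE = 0xD4200000   # fixed bits 31:21 and 4:0 from the encoding diagram
BRK_MASK = 0xFFE0001F   # all bits except imm16


def encode_brk(imm: int) -> int:
    """Return the 32-bit instruction word for BRK #imm."""
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("imm must be in the range 0 to 65535")
    return BRK_BASE | (imm << 5)


def decode_brk(word: int) -> int:
    """Return imm16 from a BRK instruction word."""
    if (word & BRK_MASK) != BRK_BASE:
        raise ValueError("not a BRK instruction")
    return (word >> 5) & 0xFFFF
```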

BTI

Branch Target Identification. A BTI instruction is used to guard against the execution of instructions which are not the
intended target of a branch.
Outside of a guarded memory region, a BTI instruction executes as a NOP. Within a guarded memory region while
PSTATE.BTYPE != 0b00, a BTI instruction compatible with the current value of PSTATE.BTYPE will not generate a
Branch Target Exception and will allow execution of subsequent instructions within the memory region.
The operand <targets> passed to a BTI instruction determines the values of PSTATE.BTYPE which the BTI instruction
is compatible with.

Note

Within a guarded memory region, when PSTATE.BTYPE != 0b00, all instructions will generate a Branch Target
Exception, other than BRK, BTI, HLT, PACIASP, and PACIBSP, which might not. See the individual instructions for
more information.

System
(FEAT_BTI)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 0 x x 0 1 1 1 1 1
CRm op2

BTI {<targets>}

SystemHintOp op;

if CRm:op2 == '0100 xx0' then
    op = SystemHintOp_BTI;
    // Check branch target compatibility between BTI instruction and PSTATE.BTYPE
    SetBTypeCompatible(BTypeCompatible_BTI(op2<2:1>));
else
    EndOfInstruction();

Assembler Symbols

<targets> Is the type of indirection, encoded in “op2<2:1>”:

op2<2:1> <targets>
00 (omitted)
01 c
10 j
11 jc
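
The <targets> table maps directly onto bits 7:6 of the instruction word (op2<2:1>). The Python sketch below builds the four BTI encodings from the diagram above; the function name and the use of an empty string for the omitted operand are illustrative assumptions.

```python
BTI_BASE = 0xD503241F   # CRm = 0100, op2 = 000, from the encoding diagram

_TARGETS = {"": 0b00, "c": 0b01, "j": 0b10, "jc": 0b11}


def encode_bti(targets: str = "") -> int:
    """Return the 32-bit instruction word for BTI {<targets>}."""
    try:
        code = _TARGETS[targets]
    except KeyError:
        raise ValueError("targets must be '', 'c', 'j' or 'jc'") from None
    return BTI_BASE | (code << 6)   # op2<2:1> occupies bits 7:6
```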

Operation

case op of
    when SystemHintOp_YIELD
        Hint_Yield();

    when SystemHintOp_DGH
        Hint_DGH();

    when SystemHintOp_WFE
        Hint_WFE(1, WFxType_WFE);

    when SystemHintOp_WFI
        Hint_WFI(1, WFxType_WFI);

    when SystemHintOp_SEV
        SendEvent();

    when SystemHintOp_SEVL
        SendEventLocal();

    when SystemHintOp_ESB
        SynchronizeErrors();
        AArch64.ESBOperation();
        if PSTATE.EL IN {EL0, EL1} && EL2Enabled() then AArch64.vESBOperation();
        TakeUnmaskedSErrorInterrupts();

    when SystemHintOp_PSB
        ProfilingSynchronizationBarrier();

    when SystemHintOp_TSB
        TraceSynchronizationBarrier();

    when SystemHintOp_CSDB
        ConsumptionOfSpeculativeDataBarrier();

    when SystemHintOp_BTI
        SetBTypeNext('00');

    otherwise // do nothing

CAS, CASA, CASAL, CASL

Compare and Swap word or doubleword in memory reads a 32-bit word or 64-bit doubleword from memory, and
compares it against the value held in a first register. If the comparison is equal, the value in a second register is
written to memory. If the write is performed, the read and write occur atomically such that no other modification of
the memory location can take place between the read and write.
• CASA and CASAL load from memory with acquire semantics.
• CASL and CASAL store to memory with release semantics.
• CAS has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
The architecture permits that the data read clears any exclusive monitors associated with that location, even if the
compare subsequently fails.
If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is <Ws>, or
<Xs>, is restored to the value held in the register before the instruction was executed.

No offset
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 L 1 Rs o0 1 1 1 1 1 Rn Rt
size

32-bit CAS (size == 10 && L == 0 && o0 == 0)

CAS <Ws>, <Wt>, [<Xn|SP>{,#0}]

32-bit CASA (size == 10 && L == 1 && o0 == 0)

CASA <Ws>, <Wt>, [<Xn|SP>{,#0}]

32-bit CASAL (size == 10 && L == 1 && o0 == 1)

CASAL <Ws>, <Wt>, [<Xn|SP>{,#0}]

32-bit CASL (size == 10 && L == 0 && o0 == 1)

CASL <Ws>, <Wt>, [<Xn|SP>{,#0}]

64-bit CAS (size == 11 && L == 0 && o0 == 0)

CAS <Xs>, <Xt>, [<Xn|SP>{,#0}]

64-bit CASA (size == 11 && L == 1 && o0 == 0)

CASA <Xs>, <Xt>, [<Xn|SP>{,#0}]

64-bit CASAL (size == 11 && L == 1 && o0 == 1)

CASAL <Xs>, <Xt>, [<Xn|SP>{,#0}]

64-bit CASL (size == 11 && L == 0 && o0 == 1)

CASL <Xs>, <Xt>, [<Xn|SP>{,#0}]

if !HaveAtomicExt() then UNDEFINED;

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);

integer datasize = 8 << UInt(size);

integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if L == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if o0 == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

<Ws> Is the 32-bit name of the general-purpose register to be compared and loaded, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be conditionally stored, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register to be compared and loaded, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be conditionally stored, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Operation

bits(64) address;
bits(datasize) comparevalue;
bits(datasize) newvalue;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

comparevalue = X[s];
newvalue = X[t];

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomicCompareAndSwap(address, comparevalue, newvalue, ldacctype, stacctype);

X[s] = ZeroExtend(data, regsize);
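
The data flow of the operation above can be illustrated with a single-threaded model. The Python sketch below is not atomic and does not model acquire or release ordering, tag checking, or alignment checks; the function name, the bytearray memory, and the little-endian byte order are all assumptions made for illustration.

```python
def cas(memory: bytearray, address: int, compare: int, new: int,
        datasize: int = 32) -> int:
    """Model CAS: return the value loaded, which is written back to <Ws>/<Xs>."""
    nbytes = datasize // 8
    mask = (1 << datasize) - 1
    old = int.from_bytes(memory[address:address + nbytes], "little")
    if old == (compare & mask):
        # Comparison is equal: store the second register to memory.
        memory[address:address + nbytes] = (new & mask).to_bytes(nbytes, "little")
    # The loaded value is returned whether or not the store happened.
    return old
```

A caller can tell whether the swap took place by comparing the returned value with the compare value it supplied.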


CASB, CASAB, CASALB, CASLB

Compare and Swap byte in memory reads an 8-bit byte from memory, and compares it against the value held in a first
register. If the comparison is equal, the value in a second register is written to memory. If the write is performed, the
read and write occur atomically such that no other modification of the memory location can take place between the
read and write.
• CASAB and CASALB load from memory with acquire semantics.
• CASLB and CASALB store to memory with release semantics.
• CASB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
The architecture permits that the data read clears any exclusive monitors associated with that location, even if the
compare subsequently fails.
If the instruction generates a synchronous Data Abort, the register which is compared and loaded, that is <Ws>, is
restored to the value held in the register before the instruction was executed.

No offset
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 L 1 Rs o0 1 1 1 1 1 Rn Rt
size

CASAB (L == 1 && o0 == 0)

CASAB <Ws>, <Wt>, [<Xn|SP>{,#0}]

CASALB (L == 1 && o0 == 1)

CASALB <Ws>, <Wt>, [<Xn|SP>{,#0}]

CASB (L == 0 && o0 == 0)

CASB <Ws>, <Wt>, [<Xn|SP>{,#0}]

CASLB (L == 0 && o0 == 1)

CASLB <Ws>, <Wt>, [<Xn|SP>{,#0}]

if !HaveAtomicExt() then UNDEFINED;

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);

AccType ldacctype = if L == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if o0 == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

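Assembler Symbols

<Ws> Is the 32-bit name of the general-purpose register to be compared and loaded, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be conditionally stored, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.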

Operation

bits(64) address;
bits(8) comparevalue;
bits(8) newvalue;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

comparevalue = X[s];
newvalue = X[t];

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomicCompareAndSwap(address, comparevalue, newvalue, ldacctype, stacctype);

X[s] = ZeroExtend(data, 32);
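The data flow of the Operation pseudocode above can be sketched in Python. This is an illustrative model only: the function name casb, the bytearray standing in for memory, and the dict standing in for the register file are assumptions made for the example, and the sketch does not model atomicity, acquire/release ordering, tag checking, or aborts.

```python
def casb(memory, regs, s, t, address):
    """Model the data flow of CASB <Ws>, <Wt>, [<Xn|SP>].

    memory is a bytearray, regs maps a register index to its 64-bit value.
    Returns True if the store was performed.
    """
    comparevalue = regs[s] & 0xFF   # X[s] read as bits(8)
    newvalue = regs[t] & 0xFF       # X[t] read as bits(8)
    data = memory[address]          # the byte read from memory
    swapped = data == comparevalue
    if swapped:
        memory[address] = newvalue  # written only when the compare matches
    regs[s] = data                  # X[s] = ZeroExtend(data, 32)
    return swapped
```

Whether or not the store happens, the register named by <Ws> receives the byte that was read, zero-extended, so software can compare it with the expected value to detect a failed swap.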

CASH, CASAH, CASALH, CASLH

Compare and Swap halfword in memory reads a 16-bit halfword from memory, and compares it against the value held
in a first register. If the comparison is equal, the value in a second register is written to memory. If the write is
performed, the read and write occur atomically such that no other modification of the memory location can take place
between the read and write.
• CASAH and CASALH load from memory with acquire semantics.
• CASLH and CASALH store to memory with release semantics.
• CASH has neither acquire nor release semantics.
For more information about memory ordering semantics, see Load-Acquire, Store-Release.
For information about memory accesses, see Load/Store addressing modes.
The architecture permits the data read to clear any exclusive monitors associated with that location, even if the
compare subsequently fails.
If the instruction generates a synchronous Data Abort, the register that is compared and loaded, that is <Ws>, is
restored to the value held in the register before the instruction was executed.

No offset
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 1 L 1 Rs o0 1 1 1 1 1 Rn Rt
size

CASAH (L == 1 && o0 == 0)

CASAH <Ws>, <Wt>, [<Xn|SP>{,#0}]

CASALH (L == 1 && o0 == 1)

CASALH <Ws>, <Wt>, [<Xn|SP>{,#0}]

CASH (L == 0 && o0 == 0)

CASH <Ws>, <Wt>, [<Xn|SP>{,#0}]

CASLH (L == 0 && o0 == 1)

CASLH <Ws>, <Wt>, [<Xn|SP>{,#0}]

if !HaveAtomicExt() then UNDEFINED;

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);

AccType ldacctype = if L == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if o0 == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

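Assembler Symbols

<Ws> Is the 32-bit name of the general-purpose register to be compared and loaded, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be conditionally stored, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.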

Operation

bits(64) address;
bits(16) comparevalue;
bits(16) newvalue;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

comparevalue = X[s];
newvalue = X[t];

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomicCompareAndSwap(address, comparevalue, newvalue, ldacctype, stacctype);

X[s] = ZeroExtend(data, 32);

CASP, CASPA, CASPAL, CASPL

Compare and Swap Pair of words or doublewords in memory reads a pair of 32-bit words or 64-bit doublewords from
memory, and compares them against the values held in the first pair of registers. If the comparison is equal, the values
in the second pair of registers are written to memory. If the writes are performed, the reads and writes occur
atomically such that no other modification of the memory location can take place between the reads and writes.
• CASPA and CASPAL load from memory with acquire semantics.
• CASPL and CASPAL store to memory with release semantics.
• CASP has neither acquire nor release semantics.
For more information about memory ordering semantics, see Load-Acquire, Store-Release.
For information about memory accesses, see Load/Store addressing modes.
The architecture permits the data read to clear any exclusive monitors associated with that location, even if the
compare subsequently fails.
If the instruction generates a synchronous Data Abort, the registers that are compared and loaded, that is <Ws>
and <W(s+1)>, or <Xs> and <X(s+1)>, are restored to the values held in the registers before the instruction was
executed.

No offset
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 sz 0 0 1 0 0 0 0 L 1 Rs o0 1 1 1 1 1 Rn Rt
Rt2

32-bit CASP (sz == 0 && L == 0 && o0 == 0)

CASP <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{,#0}]

32-bit CASPA (sz == 0 && L == 1 && o0 == 0)

CASPA <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{,#0}]

32-bit CASPAL (sz == 0 && L == 1 && o0 == 1)

CASPAL <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{,#0}]

32-bit CASPL (sz == 0 && L == 0 && o0 == 1)

CASPL <Ws>, <W(s+1)>, <Wt>, <W(t+1)>, [<Xn|SP>{,#0}]

64-bit CASP (sz == 1 && L == 0 && o0 == 0)

CASP <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{,#0}]

64-bit CASPA (sz == 1 && L == 1 && o0 == 0)

CASPA <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{,#0}]

64-bit CASPAL (sz == 1 && L == 1 && o0 == 1)

CASPAL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{,#0}]

64-bit CASPL (sz == 1 && L == 0 && o0 == 1)

CASPL <Xs>, <X(s+1)>, <Xt>, <X(t+1)>, [<Xn|SP>{,#0}]

if !HaveAtomicExt() then UNDEFINED;

if Rs<0> == '1' then UNDEFINED;
if Rt<0> == '1' then UNDEFINED;

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);

integer datasize = 32 << UInt(sz);

AccType ldacctype = if L == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if o0 == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

<Ws> Is the 32-bit name of the first general-purpose register to be compared and loaded, encoded in the "Rs"
field. <Ws> must be an even-numbered register.
<W(s+1)> Is the 32-bit name of the second general-purpose register to be compared and loaded.
<Wt> Is the 32-bit name of the first general-purpose register to be conditionally stored, encoded in the "Rt"
field. <Wt> must be an even-numbered register.
<W(t+1)> Is the 32-bit name of the second general-purpose register to be conditionally stored.
<Xs> Is the 64-bit name of the first general-purpose register to be compared and loaded, encoded in the "Rs"
field. <Xs> must be an even-numbered register.
<X(s+1)> Is the 64-bit name of the second general-purpose register to be compared and loaded.
<Xt> Is the 64-bit name of the first general-purpose register to be conditionally stored, encoded in the "Rt"
field. <Xt> must be an even-numbered register.
<X(t+1)> Is the 64-bit name of the second general-purpose register to be conditionally stored.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Operation

bits(64) address;
bits(2*datasize) comparevalue;
bits(2*datasize) newvalue;
bits(2*datasize) data;

bits(datasize) s1 = X[s];
bits(datasize) s2 = X[s+1];
bits(datasize) t1 = X[t];
bits(datasize) t2 = X[t+1];
comparevalue = if BigEndian(ldacctype) then s1:s2 else s2:s1;
newvalue = if BigEndian(stacctype) then t1:t2 else t2:t1;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomicCompareAndSwap(address, comparevalue, newvalue, ldacctype, stacctype);

if BigEndian(ldacctype) then
    X[s] = data<2*datasize-1:datasize>;
    X[s+1] = data<datasize-1:0>;
else
    X[s] = data<datasize-1:0>;
    X[s+1] = data<2*datasize-1:datasize>;
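The endianness-dependent concatenation in the pseudocode above has a simple consequence: in either byte order, the first register of each pair corresponds to the element at the lower address. The following Python sketch illustrates this under stated assumptions: the function name casp, the bytearray memory, and the dict register file are hypothetical, and atomicity, ordering, tag checking, and aborts are not modelled.

```python
def casp(memory, regs, s, t, address, datasize=64, big_endian=False):
    """Model the data flow of CASP on a bytearray; returns True if the pair was stored."""
    if s % 2 or t % 2:
        # Decode treats odd Rs or Rt as UNDEFINED.
        raise ValueError("Rs and Rt must be even-numbered registers")
    nbytes = datasize // 8
    mask = (1 << datasize) - 1
    order = "big" if big_endian else "little"

    s1, s2 = regs[s] & mask, regs[s + 1] & mask
    t1, t2 = regs[t] & mask, regs[t + 1] & mask

    # First element lives at address, second at address + nbytes.
    lo = slice(address, address + nbytes)
    hi = slice(address + nbytes, address + 2 * nbytes)
    old1 = int.from_bytes(memory[lo], order)
    old2 = int.from_bytes(memory[hi], order)

    swapped = (old1, old2) == (s1, s2)
    if swapped:
        memory[lo] = t1.to_bytes(nbytes, order)
        memory[hi] = t2.to_bytes(nbytes, order)

    # The compare registers always receive the values that were read.
    regs[s], regs[s + 1] = old1, old2
    return swapped
```

A second call with the same, now stale, expected values fails the comparison and returns the current memory contents in the register pair.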

CBNZ

Compare and Branch on Nonzero compares the value in a register with zero, and conditionally branches to a label at a
PC-relative offset if the comparison is not equal. It provides a hint that this is not a subroutine call or return. This
instruction does not affect the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 1 0 1 imm19 Rt
op

32-bit (sf == 0)

CBNZ <Wt>, <label>

64-bit (sf == 1)

CBNZ <Xt>, <label>

integer t = UInt(Rt);
integer datasize = if sf == '1' then 64 else 32;
bits(64) offset = SignExtend(imm19:'00', 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-1MB, is encoded as "imm19" times 4.

Operation

bits(datasize) operand1 = X[t];

if IsZero(operand1) == FALSE then
    BranchTo(PC[] + offset, BranchType_DIR, TRUE);
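As an illustration of the encoding and the offset calculation above, the following Python sketch decodes a CBZ or CBNZ instruction word and computes the next program counter. The helper names sign_extend, decode_cb, and next_pc are hypothetical, and the model ignores everything except the fields shown in the encoding diagram.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_cb(instr):
    """Decode a 32-bit CBZ/CBNZ word into (mnemonic, datasize, t, offset)."""
    if (instr >> 25) & 0x3F != 0b011010:
        raise ValueError("not a CBZ/CBNZ encoding")
    sf = (instr >> 31) & 1
    op = (instr >> 24) & 1              # 0 selects CBZ, 1 selects CBNZ
    imm19 = (instr >> 5) & 0x7FFFF
    t = instr & 0x1F
    offset = sign_extend(imm19 << 2, 21)  # SignExtend(imm19:'00', 64)
    return ("CBNZ" if op else "CBZ", 64 if sf else 32, t, offset)


def next_pc(instr, pc, reg_value):
    """Return the address of the next instruction given the tested register value."""
    mnemonic, datasize, _, offset = decode_cb(instr)
    operand = reg_value & ((1 << datasize) - 1)
    if mnemonic == "CBNZ":
        taken = operand != 0
    else:
        taken = operand == 0
    if taken:
        return (pc + offset) & 0xFFFFFFFFFFFFFFFF
    return pc + 4
```

Because imm19 is scaled by 4 and sign-extended from 21 bits, the reachable range is the +/-1MB stated for <label>. Note that the 32-bit form tests only the low 32 bits of the register.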

CBZ

Compare and Branch on Zero compares the value in a register with zero, and conditionally branches to a label at a
PC-relative offset if the comparison is equal. It provides a hint that this is not a subroutine call or return. This
instruction does not affect the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 1 0 0 imm19 Rt
op

32-bit (sf == 0)

CBZ <Wt>, <label>

64-bit (sf == 1)

CBZ <Xt>, <label>

integer t = UInt(Rt);
integer datasize = if sf == '1' then 64 else 32;
bits(64) offset = SignExtend(imm19:'00', 64);

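Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be tested, encoded in the "Rt" field.
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-1MB, is encoded as "imm19" times 4.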

Operation

bits(datasize) operand1 = X[t];

if IsZero(operand1) == TRUE then
    BranchTo(PC[] + offset, BranchType_DIR, TRUE);

CCMN (immediate)

Conditional Compare Negative (immediate) sets the value of the condition flags to the result of the comparison of a
register value and a negated immediate value if the condition is TRUE, and to the immediate value <nzcv> otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 1 0 1 0 0 1 0 imm5 cond 1 0 Rn 0 nzcv
op

32-bit (sf == 0)

CCMN <Wn>, #<imm>, #<nzcv>, <cond>

64-bit (sf == 1)

CCMN <Xn>, #<imm>, #<nzcv>, <cond>

integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(4) flags = nzcv;
bits(datasize) imm = ZeroExtend(imm5, datasize);

Assembler Symbols

<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<imm> Is a 5-bit unsigned immediate, in the range 0 to 31, encoded in the "imm5" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.

Operation

if ConditionHolds(cond) then
    bits(datasize) operand1 = X[n];
    (-, flags) = AddWithCarry(operand1, imm, '0');
PSTATE.<N,Z,C,V> = flags;

Operational information

CCMN (register)

Conditional Compare Negative (register) sets the value of the condition flags to the result of the comparison of a
register value and the negated value of another register if the condition is TRUE, and to the immediate value <nzcv>
otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 1 0 1 0 0 1 0 Rm cond 0 0 Rn 0 nzcv
op

32-bit (sf == 0)

CCMN <Wn>, <Wm>, #<nzcv>, <cond>

64-bit (sf == 1)

CCMN <Xn>, <Xm>, #<nzcv>, <cond>

integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
bits(4) flags = nzcv;

Assembler Symbols

<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.

Operation

if ConditionHolds(cond) then
    bits(datasize) operand1 = X[n];
    bits(datasize) operand2 = X[m];
    (-, flags) = AddWithCarry(operand1, operand2, '0');
PSTATE.<N,Z,C,V> = flags;

Operational information

CCMP (immediate)

Conditional Compare (immediate) sets the value of the condition flags to the result of the comparison of a register
value and an immediate value if the condition is TRUE, and to the immediate value <nzcv> otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 1 0 imm5 cond 1 0 Rn 0 nzcv
op

32-bit (sf == 0)

CCMP <Wn>, #<imm>, #<nzcv>, <cond>

64-bit (sf == 1)

CCMP <Xn>, #<imm>, #<nzcv>, <cond>

integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(4) flags = nzcv;
bits(datasize) imm = ZeroExtend(imm5, datasize);

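Assembler Symbols

<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<imm> Is a 5-bit unsigned immediate, in the range 0 to 31, encoded in the "imm5" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.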

Operation

if ConditionHolds(cond) then
    bits(datasize) operand1 = X[n];
    bits(datasize) operand2;
    operand2 = NOT(imm);
    (-, flags) = AddWithCarry(operand1, operand2, '1');
PSTATE.<N,Z,C,V> = flags;
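The flag computation above can be illustrated with a small Python model. The sketch assumes the usual two's complement definition of AddWithCarry (N from the top result bit, Z for a zero result, C for unsigned carry out, V for signed overflow). The names add_with_carry and ccmp_imm are hypothetical, and the outcome of ConditionHolds(cond) is passed in as a plain boolean, cond_holds, rather than evaluated from PSTATE.

```python
def add_with_carry(x, y, carry_in, datasize):
    """Return (result, nzcv) for x + y + carry_in on datasize-bit operands."""
    mask = (1 << datasize) - 1
    x &= mask
    y &= mask

    def signed(v):
        # Reinterpret a datasize-bit pattern as a two's complement integer.
        return v - (1 << datasize) if v >> (datasize - 1) else v

    unsigned_sum = x + y + carry_in
    signed_sum = signed(x) + signed(y) + carry_in
    result = unsigned_sum & mask
    n = result >> (datasize - 1)
    z = int(result == 0)
    c = int(unsigned_sum != result)        # carry out of the top bit
    v = int(signed(result) != signed_sum)  # signed overflow
    return result, (n << 3) | (z << 2) | (c << 1) | v


def ccmp_imm(xn, imm5, nzcv, cond_holds, datasize=64):
    """Return the new NZCV value for CCMP (immediate)."""
    flags = nzcv                           # used as-is when the condition fails
    if cond_holds:
        mask = (1 << datasize) - 1
        operand2 = ~imm5 & mask            # NOT(imm)
        _, flags = add_with_carry(xn, operand2, 1, datasize)
    return flags
```

Adding NOT(imm) with a carry-in of 1 is the two's complement subtraction xn - imm, so a passing condition yields the same flags as a CMP of the same operands, while a failing condition yields <nzcv> unchanged.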

Operational information

CCMP (register)

Conditional Compare (register) sets the value of the condition flags to the result of the comparison of two registers if
the condition is TRUE, and to the immediate value <nzcv> otherwise.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 1 0 Rm cond 0 0 Rn 0 nzcv
op

32-bit (sf == 0)

CCMP <Wn>, <Wm>, #<nzcv>, <cond>

64-bit (sf == 1)

CCMP <Xn>, <Xm>, #<nzcv>, <cond>

integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
bits(4) flags = nzcv;

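Assembler Symbols

<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.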

Operation

if ConditionHolds(cond) then
    bits(datasize) operand1 = X[n];
    bits(datasize) operand2 = X[m];
    operand2 = NOT(operand2);
    (-, flags) = AddWithCarry(operand1, operand2, '1');
PSTATE.<N,Z,C,V> = flags;

Operational information

CFINV

Invert Carry Flag. This instruction inverts the value of the PSTATE.C flag.

System
(FEAT_FlagM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 1 0 0 (0) (0) (0) (0) 0 0 0 1 1 1 1 1
CRm

CFINV

if !HaveFlagManipulateExt() then UNDEFINED;

Operation

PSTATE.C = NOT(PSTATE.C);

Operational information

CFP

Control Flow Prediction Restriction by Context prevents control flow predictions that predict execution addresses
based on information gathered from earlier execution within a particular execution context. Control flow predictions
determined by the actions of code in the target execution context or contexts appearing in program order before the
instruction cannot be used to exploitatively control speculative execution occurring after the instruction is complete
and synchronized.
For more information, see CFP RCTX, Control Flow Prediction Restriction by Context.

This is an alias of SYS. This means:

• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.

System
(FEAT_SPECRES)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 0 Rt
L op1 CRn CRm op2

CFP RCTX, <Xt>

is equivalent to

SYS #3, C7, C3, #4, <Xt>

and is always the preferred disassembly.

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.

Operation

The description of SYS gives the operational pseudocode for this instruction.

CINC

Conditional Increment returns, in the destination register, the value of the source register incremented by 1 if the
condition is TRUE, and otherwise returns the value of the source register.

This is an alias of CSINC. This means:

• The encodings in this description are named to match the encodings of CSINC.
• The description of CSINC gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 0 0 != 11111 != 111x 0 1 != 11111 Rd
op Rm cond o2 Rn

32-bit (sf == 0)

CINC <Wd>, <Wn>, <cond>

is equivalent to

CSINC <Wd>, <Wn>, <Wn>, invert(<cond>)

and is the preferred disassembly when Rn == Rm.

64-bit (sf == 1)

CINC <Xd>, <Xn>, <cond>

is equivalent to

CSINC <Xd>, <Xn>, <Xn>, invert(<cond>)

and is the preferred disassembly when Rn == Rm.
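The alias rule above can be sketched as a disassembler decision in Python. The function disassemble_csinc and the helper reg_name are hypothetical names for this example. The sketch applies only the conditions visible in the encoding diagram (Rm != 11111, Rn != 11111, cond != 111x) together with Rn == Rm, and prints plain CSINC in every other case, so other aliases of CSINC are deliberately not modelled.

```python
COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]


def reg_name(sf, index):
    """Name a general-purpose register, using the zero register for index 31."""
    prefix = "X" if sf else "W"
    return f"{prefix}ZR" if index == 31 else f"{prefix}{index}"


def disassemble_csinc(sf, rm, cond, rn, rd):
    """Return CINC when it is the preferred disassembly, else plain CSINC."""
    is_cinc = (rm != 31 and rn != 31 and rn == rm and (cond >> 1) != 0b111)
    if is_cinc:
        # The alias names the inverse of the encoded condition.
        alias_cond = COND_NAMES[cond ^ 1]
        return f"CINC {reg_name(sf, rd)}, {reg_name(sf, rn)}, {alias_cond}"
    return (f"CSINC {reg_name(sf, rd)}, {reg_name(sf, rn)}, "
            f"{reg_name(sf, rm)}, {COND_NAMES[cond]}")
```

Flipping the least significant bit of the 4-bit condition field yields the inverse condition, which is why the encoded condition NE is displayed as CINC ... EQ.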

Assembler Symbols

Operation

The description of CSINC gives the operational pseudocode for this instruction.

Operational information

CINV

Conditional Invert returns, in the destination register, the bitwise inversion of the value of the source register if the
condition is TRUE, and otherwise returns the value of the source register.

This is an alias of CSINV. This means:

• The encodings in this description are named to match the encodings of CSINV.
• The description of CSINV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 0 0 != 11111 != 111x 0 0 != 11111 Rd
op Rm cond o2 Rn

32-bit (sf == 0)

CINV <Wd>, <Wn>, <cond>

is equivalent to

CSINV <Wd>, <Wn>, <Wn>, invert(<cond>)

and is the preferred disassembly when Rn == Rm.

64-bit (sf == 1)

CINV <Xd>, <Xn>, <cond>

is equivalent to

CSINV <Xd>, <Xn>, <Xn>, invert(<cond>)

and is the preferred disassembly when Rn == Rm.

Assembler Symbols

Operation

The description of CSINV gives the operational pseudocode for this instruction.

Operational information

CLREX

Clear Exclusive clears the local monitor of the executing PE.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm 0 1 0 1 1 1 1 1

CLREX {#<imm>}

// CRm field is ignored

Assembler Symbols

<imm> Is an optional 4-bit unsigned immediate, in the range 0 to 15, defaulting to 15 and encoded in the
"CRm" field.

Operation

ClearExclusiveLocal(ProcessorID());

CLS

Count Leading Sign bits counts the number of leading bits of the source register that have the same value as the most
significant bit of the register, and writes the result to the destination register. This count does not include the most
significant bit of the source register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 1 Rn Rd
op

32-bit (sf == 0)

CLS <Wd>, <Wn>

64-bit (sf == 1)

CLS <Xd>, <Xn>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

Operation

integer result;
bits(datasize) operand1 = X[n];

result = CountLeadingSignBits(operand1);

X[d] = result<datasize-1:0>;
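A minimal Python sketch of the count described above, assuming the prose definition at the start of this section. The function name count_leading_sign_bits mirrors the pseudocode helper, but the body is an illustration, not the architectural definition.

```python
def count_leading_sign_bits(value, datasize):
    """Count bits below the top bit that match the top bit, stopping at the first mismatch."""
    value &= (1 << datasize) - 1
    sign = value >> (datasize - 1)
    count = 0
    for bit in range(datasize - 2, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count
```

Because the most significant bit itself is excluded, the result ranges from 0 to datasize-1. All-zeros and all-ones inputs both give the maximum.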

Operational information

CLZ

Count Leading Zeros counts the number of binary zero bits before the first binary one bit in the value of the source
register, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 Rn Rd
op

32-bit (sf == 0)

CLZ <Wd>, <Wn>

64-bit (sf == 1)

CLZ <Xd>, <Xn>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

Operation

integer result;
bits(datasize) operand1 = X[n];

result = CountLeadingZeroBits(operand1);
X[d] = result<datasize-1:0>;
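For illustration, the same count can be expressed in Python using the built-in int.bit_length. The function name count_leading_zero_bits mirrors the pseudocode helper, but the body is a sketch rather than the architectural definition.

```python
def count_leading_zero_bits(value, datasize):
    """Number of zero bits above the most significant set bit of a datasize-bit value."""
    value &= (1 << datasize) - 1
    # bit_length() is the index of the highest set bit plus one, or 0 for zero.
    return datasize - value.bit_length()
```

A zero input gives datasize, since no one bit is found. This is the only case in which the result equals the register width.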

Operational information

CMN (extended register)

Compare Negative (extended register) adds a register value and a sign-extended or zero-extended register value that is
optionally shifted left. The argument that is extended from the <Rm> register can be a byte, halfword, word, or
doubleword. It updates the condition flags based on the result, and discards the result.

This is an alias of ADDS (extended register). This means:

• The encodings in this description are named to match the encodings of ADDS (extended register).
• The description of ADDS (extended register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 1 0 0 1 Rm option imm3 Rn 1 1 1 1 1
op S Rd

32-bit (sf == 0)

CMN <Wn|WSP>, <Wm>{, <extend> {#<amount>}}

is equivalent to

ADDS WZR, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}

and is always the preferred disassembly.

64-bit (sf == 1)

CMN <Xn|SP>, <R><m>{, <extend> {#<amount>}}

is equivalent to

ADDS XZR, <Xn|SP>, <R><m>{, <extend> {#<amount>}}

and is always the preferred disassembly.

Assembler Symbols

<Wn|WSP> Is the 32-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Rn"
field.

<R> Is a width specifier, encoded in “option”:

option <R>
00x W
010 W
x11 X
10x W
110 W

<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.

<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX
If "Rn" is '11111' (WSP) and "option" is '010' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTW when "option" is '010'.

For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

Operation

The description of ADDS (extended register) gives the operational pseudocode for this instruction.

Operational information

CMN (immediate)

Compare Negative (immediate) adds a register value and an optionally-shifted immediate value. It updates the
condition flags based on the result, and discards the result.

This is an alias of ADDS (immediate). This means:

• The encodings in this description are named to match the encodings of ADDS (immediate).
• The description of ADDS (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 0 1 0 sh imm12 Rn 1 1 1 1 1
op S Rd

32-bit (sf == 0)

CMN <Wn|WSP>, #<imm>{, <shift>}

is equivalent to

ADDS WZR, <Wn|WSP>, #<imm> {, <shift>}

and is always the preferred disassembly.

64-bit (sf == 1)

CMN <Xn|SP>, #<imm>{, <shift>}

is equivalent to

ADDS XZR, <Xn|SP>, #<imm> {, <shift>}

and is always the preferred disassembly.

Assembler Symbols

<Wn|WSP> Is the 32-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is an unsigned immediate, in the range 0 to 4095, encoded in the "imm12" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #12
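The field layout in the encoding diagram above can be exercised with a short Python encoder. The function name encode_cmn_imm is hypothetical, and the choice to pick LSL #12 automatically when the value is a multiple of 4096 is an assumption of this sketch, not a rule stated in this description.

```python
def encode_cmn_imm(n, imm, sf=1):
    """Encode CMN <Xn|SP>, #imm (or the 32-bit form when sf=0) as a 32-bit word."""
    if not 0 <= n <= 31:
        raise ValueError("register number out of range")
    if 0 <= imm <= 0xFFF:
        sh, imm12 = 0, imm                 # LSL #0
    elif imm & 0xFFF == 0 and 0 < imm <= 0xFFF000:
        sh, imm12 = 1, imm >> 12           # LSL #12
    else:
        raise ValueError("immediate not encodable")
    return ((sf << 31)                     # sf
            | (0b01 << 29)                 # op = 0, S = 1 (ADDS)
            | (0b100010 << 23)             # fixed bits 28:23
            | (sh << 22)
            | (imm12 << 10)
            | (n << 5)                     # Rn, where 31 means SP or WSP
            | 0b11111)                     # Rd = 11111, result discarded
```

For example, encode_cmn_imm(0, 1) produces the word for CMN X0, #1. Only the fixed Rd field of 11111 separates this alias from a general ADDS (immediate).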

Operation

The description of ADDS (immediate) gives the operational pseudocode for this instruction.

Operational information

CMN (shifted register)

Compare Negative (shifted register) adds a register value and an optionally-shifted register value. It updates the
condition flags based on the result, and discards the result.

This is an alias of ADDS (shifted register). This means:

• The encodings in this description are named to match the encodings of ADDS (shifted register).
• The description of ADDS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 1 shift 0 Rm imm6 Rn 1 1 1 1 1
op S Rd

32-bit (sf == 0)

CMN <Wn>, <Wm>{, <shift> #<amount>}

is equivalent to

ADDS WZR, <Wn>, <Wm> {, <shift> #<amount>}

and is always the preferred disassembly.

64-bit (sf == 1)

CMN <Xn>, <Xm>{, <shift> #<amount>}

is equivalent to

ADDS XZR, <Xn>, <Xm> {, <shift> #<amount>}

and is always the preferred disassembly.

Assembler Symbols

<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED

Operation

The description of ADDS (shifted register) gives the operational pseudocode for this instruction.

Operational information

CMP (extended register)

Compare (extended register) subtracts a sign-extended or zero-extended register value, optionally shifted left, from a
register value. The argument that is extended from the <Rm> register can be a byte, halfword, word, or doubleword. It
updates the condition flags based on the result, and discards the result.

This is an alias of SUBS (extended register). This means:

• The encodings in this description are named to match the encodings of SUBS (extended register).
• The description of SUBS (extended register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 1 0 0 1 Rm option imm3 Rn 1 1 1 1 1
op S Rd

32-bit (sf == 0)

CMP <Wn|WSP>, <Wm>{, <extend> {#<amount>}}

is equivalent to

SUBS WZR, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}

and is always the preferred disassembly.

64-bit (sf == 1)

CMP <Xn|SP>, <R><m>{, <extend> {#<amount>}}

is equivalent to

SUBS XZR, <Xn|SP>, <R><m>{, <extend> {#<amount>}}

and is always the preferred disassembly.

Assembler Symbols

<R> Is a width specifier, encoded in “option”:

option <R>
00x W
010 W
x11 X
10x W
110 W

<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.

<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

Operation

The description of SUBS (extended register) gives the operational pseudocode for this instruction.

Operational information

CMP (immediate)

Compare (immediate) subtracts an optionally-shifted immediate value from a register value. It updates the condition
flags based on the result, and discards the result.

This is an alias of SUBS (immediate). This means:

• The encodings in this description are named to match the encodings of SUBS (immediate).
• The description of SUBS (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 0 1 0 sh imm12 Rn 1 1 1 1 1
op S Rd

32-bit (sf == 0)

CMP <Wn|WSP>, #<imm>{, <shift>}

is equivalent to

SUBS WZR, <Wn|WSP>, #<imm> {, <shift>}

and is always the preferred disassembly.

64-bit (sf == 1)

CMP <Xn|SP>, #<imm>{, <shift>}

is equivalent to

SUBS XZR, <Xn|SP>, #<imm> {, <shift>}

and is always the preferred disassembly.

Assembler Symbols

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #12

Operation

The description of SUBS (immediate) gives the operational pseudocode for this instruction.

Operational information

CMP (shifted register)

Compare (shifted register) subtracts an optionally-shifted register value from a register value. It updates the condition
flags based on the result, and discards the result.

This is an alias of SUBS (shifted register). This means:

• The encodings in this description are named to match the encodings of SUBS (shifted register).
• The description of SUBS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 1 shift 0 Rm imm6 Rn 1 1 1 1 1
op S Rd

32-bit (sf == 0)

CMP <Wn>, <Wm>{, <shift> #<amount>}

is equivalent to

SUBS WZR, <Wn>, <Wm> {, <shift> #<amount>}

and is always the preferred disassembly.

64-bit (sf == 1)

CMP <Xn>, <Xm>{, <shift> #<amount>}

is equivalent to

SUBS XZR, <Xn>, <Xm> {, <shift> #<amount>}

and is always the preferred disassembly.

Assembler Symbols

<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED

Operation

The description of SUBS (shifted register) gives the operational pseudocode for this instruction.

Operational information

CMPP

Compare with Tag subtracts the 56-bit address held in the second source register from the 56-bit address held in the
first source register, updates the condition flags based on the result of the subtraction, and discards the result.

This is an alias of SUBPS. This means:

• The encodings in this description are named to match the encodings of SUBPS.
• The description of SUBPS gives the operational pseudocode for this instruction.

Integer
(Armv8.5)

Bits:   31   30   29   28:21      20:16   15:10    9:5   4:0
Value:  1    0    1    11010110   Xm      000000   Xn    11111
Field:                                                   Xd

CMPP <Xn|SP>, <Xm|SP>

is equivalent to

SUBPS XZR, <Xn|SP>, <Xm|SP>

and is always the preferred disassembly.

Assembler Symbols

<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm|SP> Is the 64-bit name of the second general-purpose source register or stack pointer, encoded in the "Xm"
field.

Operation

The description of SUBPS gives the operational pseudocode for this instruction.
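
The following Python sketch shows one way to picture the comparison: only bits <55:0> of each pointer take part, so two pointers that differ only in their upper tag bits compare as equal. The sketch assumes that each 56-bit address is sign-extended to 64 bits before the subtraction, which is an assumption about SUBPS and is not stated in the text above. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sign_extend_56(value):
    """Keep bits <55:0> and sign-extend from bit 55."""
    low = value & ((1 << 56) - 1)
    return low - (1 << 56) if low & (1 << 55) else low


def cmpp_flags(xn, xm):
    """Return (N, Z, C, V) for CMPP <Xn|SP>, <Xm|SP>."""
    a = sign_extend_56(xn)
    b = sign_extend_56(xm)
    ua, ub = a & MASK64, b & MASK64
    result = (ua - ub) & MASK64
    n = result >> 63
    z = int(result == 0)
    c = int(ua >= ub)                        # no borrow means carry set
    diff = a - b
    v = int(not -(1 << 63) <= diff < (1 << 63))  # cannot overflow for 56-bit inputs
    return (n, z, c, v)
```

In this model, the upper byte of each register has no effect on the flags.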

CNEG

Conditional Negate returns, in the destination register, the negated value of the source register if the condition is
TRUE, and otherwise returns the value of the source register.

This is an alias of CSNEG. This means:

• The encodings in this description are named to match the encodings of CSNEG.
• The description of CSNEG gives the operational pseudocode for this instruction.

Bits:   31   30   29   28:21      20:16   15:12     11   10   9:5   4:0
Value:  sf   1    0    11010100   Rm      != 111x   0    1    Rn    Rd
Field:       op                           cond           o2

32-bit (sf == 0)

CNEG <Wd>, <Wn>, <cond>

is equivalent to

CSNEG <Wd>, <Wn>, <Wn>, invert(<cond>)

and is the preferred disassembly when Rn == Rm.

64-bit (sf == 1)

CNEG <Xd>, <Xn>, <cond>

is equivalent to

CSNEG <Xd>, <Xn>, <Xn>, invert(<cond>)

and is the preferred disassembly when Rn == Rm.

Assembler Symbols

Operation

The description of CSNEG gives the operational pseudocode for this instruction.
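
As an informal illustration, the following Python sketch models the alias relationship. It assumes that CSNEG returns the first source register when its condition holds and the negated second source register otherwise, and that a condition code is inverted by flipping bit 0, which is why cond values of the form 111x are excluded. The function names are hypothetical.

```python
def invert_cond(cond):
    """Invert a 4-bit condition code by flipping bit 0."""
    if cond & 0b1110 == 0b1110:
        raise ValueError("cond == '111x' has no inverse and is excluded")
    return cond ^ 1


def csneg(rn_value, rm_value, cond_holds, width=64):
    """CSNEG: Rn if the condition holds, otherwise the two's complement of Rm."""
    mask = (1 << width) - 1
    return rn_value & mask if cond_holds else -rm_value & mask


def cneg(rn_value, cond_holds, width=64):
    """CNEG Rd, Rn, cond, expressed as CSNEG Rd, Rn, Rn, invert(cond)."""
    return csneg(rn_value, rn_value, not cond_holds, width)
```

For example, cneg(5, True) yields the 64-bit two's complement of 5, and cneg(5, False) yields 5 unchanged.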

Operational information

CPP

Cache Prefetch Prediction Restriction by Context prevents cache allocation predictions that predict execution
addresses based on information gathered from earlier execution within a particular execution context. Cache prefetch
predictions determined by the actions of code in the target execution context or contexts appearing in program order
before the instruction cannot influence speculative execution occurring after the instruction is complete and
synchronized.
For more information, see CPP RCTX, Cache Prefetch Prediction Restriction by Context.

This is an alias of SYS. This means:

• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.

System
(FEAT_SPECRES)

Bits:   31:22        21   20:19   18:16   15:12   11:8   7:5   4:0
Value:  1101010100   0    01      011     0111    0011   111   Rt
Field:               L            op1     CRn     CRm    op2

CPP RCTX, <Xt>

is equivalent to

SYS #3, C7, C3, #7, <Xt>

and is always the preferred disassembly.
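
The equivalence can be checked by assembling the instruction word by hand. The following Python sketch packs the SYS fields from the encoding diagram above into a 32-bit word. The function names encode_sys and encode_cpp_rctx are hypothetical, and the base constant is derived from the fixed bits shown in the diagram.

```python
def encode_sys(op1, crn, crm, op2, rt):
    """Pack SYS #<op1>, C<crn>, C<crm>, #<op2>, X<rt> into a 32-bit word."""
    fields = (("op1", op1, 7), ("CRn", crn, 15), ("CRm", crm, 15),
              ("op2", op2, 7), ("Rt", rt, 31))
    for name, value, limit in fields:
        if not 0 <= value <= limit:
            raise ValueError(f"{name} out of range")
    base = 0xD5080000        # bits<31:22> = 1101010100, L = 0, bits<20:19> = 01
    return base | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5) | rt


def encode_cpp_rctx(rt):
    """CPP RCTX, X<rt> is SYS #3, C7, C3, #7, X<rt>."""
    return encode_sys(3, 7, 3, 7, rt)
```

A disassembler that sees the resulting word would be expected to print the CPP RCTX form, because the alias is always the preferred disassembly.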

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.

Operation

The description of SYS gives the operational pseudocode for this instruction.

CPYFP, CPYFM, CPYFE

Memory Copy Forward-only. These instructions perform a memory copy. The prologue, main, and epilogue instructions
are expected to be run in succession and to appear consecutively in memory: CPYFP, then CPYFM, and then CPYFE.
CPYFP performs some preconditioning of the arguments suitable for using the CPYFM instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYFM performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYFE performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.

The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.

The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.

After execution of CPYFP, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFP, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
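
The two register layouts left by the prologue can be compared with a small model. The following Python sketch is illustrative only: prologue_state is a hypothetical helper, and its copied argument stands in for the IMPLEMENTATION DEFINED number of bytes that the prologue copies.

```python
MASK64 = (1 << 64) - 1
SIZE_MAX = 0x7FFFFFFFFFFFFFFF


def saturate_size(xn):
    """If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF."""
    return SIZE_MAX if (xn >> 63) & 1 else xn


def prologue_state(xd, xs, xn, copied, option_a):
    """Return (Xd, Xs, Xn, PSTATE.C) as left by CPYFP."""
    size = saturate_size(xn & MASK64)
    if not 0 <= copied <= size:
        raise ValueError("cannot copy more than the requested size")
    if option_a:
        # Addresses move to the end of the buffers; Xn becomes a negative count.
        return ((xd + size) & MASK64, (xs + size) & MASK64,
                (copied - size) & MASK64, 0)
    # Option B: addresses advance and Xn counts down as bytes are copied.
    return ((xd + copied) & MASK64, (xs + copied) & MASK64, size - copied, 1)
```

For a 100-byte copy in which the prologue copies 16 bytes, option A leaves Xn holding -84 and both addresses pointing past the end of their buffers, while option B leaves Xn holding 84 and both addresses advanced by 16.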
For CPYFM, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
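• Xs holds the lowest address that remains to be copied from, minus Xn. Because Xn is negative, Xs points
immediately after the end of the source.
• Xd holds the lowest address that remains to be copied to, minus Xn. Because Xn is negative, Xd points
immediately after the end of the destination.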
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFM, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFE, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
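• Xs holds the lowest address that remains to be copied from, minus Xn. Because Xn is negative, Xs points
immediately after the end of the source.
• Xd holds the lowest address that remains to be copied to, minus Xn. Because Xn is negative, Xd points
immediately after the end of the destination.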
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFE, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
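
Software that inspects the registers between stages, for example in an exception handler, has to interpret them according to PSTATE.C. The following Python sketch decodes either format into the same logical view of the copy. The function name remaining_copy is hypothetical and the sketch is a reading of the argument formats above, not architecture pseudocode.

```python
MASK64 = (1 << 64) - 1


def remaining_copy(xd, xs, xn, carry):
    """Return (next destination address, next source address, bytes remaining)."""
    if carry == 0:
        # Option A: Xn is a negative count and the addresses are offset by -Xn.
        signed_n = xn - (1 << 64) if xn >> 63 else xn
        return ((xd + signed_n) & MASK64, (xs + signed_n) & MASK64, -signed_n)
    # Option B: the registers already hold the next addresses and the count.
    return (xd, xs, xn)
```

Both encodings of a copy with 84 bytes remaining decode to the same triple.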

Integer
(FEAT_MOPS)

Bits:   31:30   29:27   26   25:24   23:22   21   20:16   15:12   11:10   9:5   4:0
Value:  sz      011     0    01      op1     0    Rs      0000    01      Rn    Rd
Field:                                                    op2

Epilogue (op1 == 10)

CPYFE [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFM [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFP [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
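
The decode rules can be exercised with a small field extractor. The following Python sketch is a simplified model that assumes FEAT_MOPS is implemented. The names decode_cpyf and Undefined are hypothetical, and the bit positions are taken from the encoding diagram above.

```python
STAGES = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}


class Undefined(Exception):
    """Stands in for the UNDEFINED outcome of the decode pseudocode."""


def decode_cpyf(word):
    """Decode a CPYF* instruction word into its stage, registers, and options."""
    fixed_ok = ((word >> 24) & 0x3F == 0b011001 and (word >> 21) & 1 == 0
                and (word >> 10) & 0b11 == 0b01)
    if not fixed_ok:
        raise ValueError("not a CPYF* encoding")
    sz = (word >> 30) & 0b11
    op1 = (word >> 22) & 0b11
    s = (word >> 16) & 0x1F
    options = (word >> 12) & 0xF
    n = (word >> 5) & 0x1F
    d = word & 0x1F
    if sz != 0b00:
        raise Undefined("sz != '00'")
    if op1 not in STAGES:
        raise ValueError('SEE "Memory Copy and Memory Set"')
    if len({d, s, n}) != 3:
        raise Undefined("Rd, Rs and Rn must all differ")
    if 31 in (d, s, n):
        raise Undefined("register 31 is not permitted")
    return {"stage": STAGES[op1], "d": d, "s": s, "n": n, "options": options}
```

For example, the word 0x19010440 decodes as the prologue with Rd = 0, Rs = 1, and Rn = 2, which corresponds to CPYFP [X0]!, [X1]!, X2!.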

Assembler Symbols

<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
address and is updated by the instruction, encoded in the "Rd" field.
<Xs> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the source address, encoded in the "Rs" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the source
address and is updated by the instruction, encoded in the "Rs" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the
"Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be transferred, encoded in the "Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be transferred and is updated by the instruction to encode the remaining size and destination,
encoded in the "Rn" field.

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    if supports_option_a then
        PSTATE.C = '0';
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
    else
        PSTATE.C = '1';
    PSTATE.N = '0';
    PSTATE.V = '0';
    PSTATE.Z = '0';

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();
    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.
    if zero_size_exceptions || SInt(cpysize) != 0 then
        if supports_option_a then
            if PSTATE.C == '1' then
                boolean wrong_option = TRUE;
                boolean from_epilogue = stage == MOPSStage_Epilogue;
                MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
        else
            if PSTATE.C == '0' then
                boolean wrong_option = TRUE;
                boolean from_epilogue = stage == MOPSStage_Epilogue;
                MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

    bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
    assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

    if stage == MOPSStage_Main then
        stagecpysize = cpysize - postsize;

        // Check if the parameters to this instruction are valid.
        if MemCpyParametersIllformedM(toaddress, fromaddress, cpysize) then
            boolean wrong_option = FALSE;
            boolean from_epilogue = FALSE;
            MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
    else
        stagecpysize = postsize;

        // Check if the parameters to this instruction are valid for the epilogue.
        if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
            boolean wrong_option = FALSE;
            boolean from_epilogue = TRUE;
            MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);

        readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
        Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;

        if stage != MOPSStage_Prologue then
            X[n] = cpysize;
else
    while UInt(stagecpysize) > 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= UInt(stagecpysize);

        readdata<B*8-1:0> = Mem[fromaddress, B, racctype];
        Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;

        fromaddress = fromaddress + B;
        toaddress = toaddress + B;
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;

        if stage != MOPSStage_Prologue then
            X[n] = cpysize;
            X[d] = toaddress;
            X[s] = fromaddress;

if stage == MOPSStage_Prologue then
    X[n] = cpysize;
    X[d] = toaddress;
    X[s] = fromaddress;
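
To see how the three stages cooperate, the following Python sketch runs a simplified model of the copy over a bytearray. It is not the architecture pseudocode: exception checks are omitted, registers are written back once per stage rather than once per block, and the pre, post, and block arguments are hypothetical stand-ins for the IMPLEMENTATION DEFINED choices made by CPYPreSizeChoice, CPYPostSizeChoice, and CPYSizeChoice.

```python
MASK64 = (1 << 64) - 1
SIZE_MAX = 0x7FFFFFFFFFFFFFFF


def signed64(value):
    return value - (1 << 64) if value >> 63 else value


def run_stage(mem, regs, stage, option_a, stage_bytes, block):
    """Copy stage_bytes bytes for one stage, then write back Xd, Xs and Xn."""
    d, s, n = regs["d"], regs["s"], regs["n"]
    if stage == "prologue":
        if n >> 63:
            n = SIZE_MAX                     # saturate the requested size
        if option_a:                         # offset the arguments, negate the size
            d, s, n = (d + n) & MASK64, (s + n) & MASK64, -n & MASK64
    remaining = stage_bytes
    while remaining > 0:
        b = min(block, remaining)
        if option_a:
            offset = signed64(n)             # negative count of bytes left
            src, dst = (s + offset) & MASK64, (d + offset) & MASK64
            mem[dst:dst + b] = mem[src:src + b]   # read the block, then write it
            n = (n + b) & MASK64
        else:
            mem[d:d + b] = mem[s:s + b]
            d, s, n = d + b, s + b, n - b
        remaining -= b
    regs.update(d=d, s=s, n=n)


def memcpy_forward(mem, dst, src, size, option_a, pre=0, post=0, block=16):
    """Run prologue, main and epilogue in order and return the final registers."""
    regs = {"d": dst, "s": src, "n": size}
    pre = min(pre, size)
    post = min(post, size - pre)
    run_stage(mem, regs, "prologue", option_a, pre, block)
    run_stage(mem, regs, "main", option_a, size - pre - post, block)
    run_stage(mem, regs, "epilogue", option_a, post, block)
    return regs
```

In this model both options finish with Xn equal to 0 and with Xd and Xs pointing past the end of their buffers, even though the intermediate register values differ. The example copy in the test below overlaps, with the source above the destination, which is the overlapping case that a forward-only copy supports.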

CPYFPN, CPYFMN, CPYFEN

Memory Copy Forward-only, reads and writes non-temporal. These instructions perform a memory copy. The prologue,
main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPN,
then CPYFMN, and then CPYFEN.
CPYFPN performs some preconditioning of the arguments suitable for using the CPYFMN instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYFMN performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYFEN performs the last part of the memory copy.

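Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.

The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.

The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.

After execution of CPYFPN, option A (which results in encoding PSTATE.C = 0):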
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
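• Xs holds the lowest address that remains to be copied from, minus Xn. Because Xn is negative, Xs points
immediately after the end of the source.
• Xd holds the lowest address that remains to be copied to, minus Xn. Because Xn is negative, Xd points
immediately after the end of the destination.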
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
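• Xs holds the lowest address that remains to be copied from, minus Xn. Because Xn is negative, Xs points
immediately after the end of the source.
• Xd holds the lowest address that remains to be copied to, minus Xn. Because Xn is negative, Xd points
immediately after the end of the destination.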
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

Bits:   31:30   29:27   26   25:24   23:22   21   20:16   15:12   11:10   9:5   4:0
Value:  sz      011     0    01      op1     0    Rs      1100    01      Rn    Rd
Field:                                                    op2

Epilogue (op1 == 10)

CPYFEN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

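CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    if supports_option_a then
        PSTATE.C = '0';
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
    else
        PSTATE.C = '1';
    PSTATE.N = '0';
    PSTATE.V = '0';
    PSTATE.Z = '0';

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();
    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.
    if zero_size_exceptions || SInt(cpysize) != 0 then
        if supports_option_a then
            if PSTATE.C == '1' then
                boolean wrong_option = TRUE;
                boolean from_epilogue = stage == MOPSStage_Epilogue;
                MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
        else
            if PSTATE.C == '0' then
                boolean wrong_option = TRUE;
                boolean from_epilogue = stage == MOPSStage_Epilogue;
                MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

    bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
    assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

    if stage == MOPSStage_Main then
        stagecpysize = cpysize - postsize;

        // Check if the parameters to this instruction are valid.
        if MemCpyParametersIllformedM(toaddress, fromaddress, cpysize) then
            boolean wrong_option = FALSE;
            boolean from_epilogue = FALSE;
            MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
    else
        stagecpysize = postsize;

        // Check if the parameters to this instruction are valid for the epilogue.
        if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
            boolean wrong_option = FALSE;
            boolean from_epilogue = TRUE;
            MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);

        readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
        Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;

        if stage != MOPSStage_Prologue then
            X[n] = cpysize;
else
    while UInt(stagecpysize) > 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= UInt(stagecpysize);

        readdata<B*8-1:0> = Mem[fromaddress, B, racctype];
        Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;

        fromaddress = fromaddress + B;
        toaddress = toaddress + B;
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;

        if stage != MOPSStage_Prologue then
            X[n] = cpysize;
            X[d] = toaddress;
            X[s] = fromaddress;

if stage == MOPSStage_Prologue then
    X[n] = cpysize;
    X[d] = toaddress;
    X[s] = fromaddress;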

CPYFPRN, CPYFMRN, CPYFERN

Memory Copy Forward-only, reads non-temporal. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPRN, then
CPYFMRN, and then CPYFERN.
CPYFPRN performs some preconditioning of the arguments suitable for using the CPYFMRN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMRN performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYFERN performs the last part of the memory copy.

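Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.

The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.

The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.

After execution of CPYFPRN, option A (which results in encoding PSTATE.C = 0):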
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPRN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
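• Xs holds the lowest address that remains to be copied from, minus Xn. Because Xn is negative, Xs points
immediately after the end of the source.
• Xd holds the lowest address that remains to be copied to, minus Xn. Because Xn is negative, Xd points
immediately after the end of the destination.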
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFERN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
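• Xs holds the lowest address that remains to be copied from, minus Xn. Because Xn is negative, Xs points
immediately after the end of the source.
• Xd holds the lowest address that remains to be copied to, minus Xn. Because Xn is negative, Xd points
immediately after the end of the destination.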
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFERN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

Bits:   31:30   29:27   26   25:24   23:22   21   20:16   15:12   11:10   9:5   4:0
Value:  sz      011     0    01      op1     0    Rs      1000    01      Rn    Rd
Field:                                                    op2

Epilogue (op1 == 10)

CPYFERN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMRN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPRN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

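CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    if supports_option_a then
        PSTATE.C = '0';
        // Copy in the forward direction offsets the arguments.
        toaddress = toaddress + cpysize;
        fromaddress = fromaddress + cpysize;
        cpysize = Zeros(64) - cpysize;
    else
        PSTATE.C = '1';
    PSTATE.N = '0';
    PSTATE.V = '0';
    PSTATE.Z = '0';

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();
    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.
    if zero_size_exceptions || SInt(cpysize) != 0 then
        if supports_option_a then
            if PSTATE.C == '1' then
                boolean wrong_option = TRUE;
                boolean from_epilogue = stage == MOPSStage_Epilogue;
                MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
        else
            if PSTATE.C == '0' then
                boolean wrong_option = TRUE;
                boolean from_epilogue = stage == MOPSStage_Epilogue;
                MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

    bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
    assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

    if stage == MOPSStage_Main then
        stagecpysize = cpysize - postsize;

        // Check if the parameters to this instruction are valid.
        if MemCpyParametersIllformedM(toaddress, fromaddress, cpysize) then
            boolean wrong_option = FALSE;
            boolean from_epilogue = FALSE;
            MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);
    else
        stagecpysize = postsize;

        // Check if the parameters to this instruction are valid for the epilogue.
        if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
            boolean wrong_option = FALSE;
            boolean from_epilogue = TRUE;
            MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

if supports_option_a then
    while SInt(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= -1 * SInt(stagecpysize);

        readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
        Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;

        if stage != MOPSStage_Prologue then
            X[n] = cpysize;
else
    while UInt(stagecpysize) > 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = CPYSizeChoice(toaddress, fromaddress, cpysize);
        assert B <= UInt(stagecpysize);

        readdata<B*8-1:0> = Mem[fromaddress, B, racctype];
        Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;

        fromaddress = fromaddress + B;
        toaddress = toaddress + B;
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;

        if stage != MOPSStage_Prologue then
            X[n] = cpysize;
            X[d] = toaddress;
            X[s] = fromaddress;

if stage == MOPSStage_Prologue then
    X[n] = cpysize;
    X[d] = toaddress;
    X[s] = fromaddress;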

CPYFPRT, CPYFMRT, CPYFERT

Memory Copy Forward-only, reads unprivileged. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPRT, then
CPYFMRT, and then CPYFERT.
CPYFPRT performs some preconditioning of the arguments suitable for using the CPYFMRT instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMRT performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYFERT performs the last part of the memory copy.

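Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.

The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a
memory copy only where there is no overlap between the source and destination locations, or where the source
address is greater than the destination address.

The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.

After execution of CPYFPRT, option A (which results in encoding PSTATE.C = 0):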
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPRT, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMRT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
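• Xs holds the lowest address that remains to be copied from, minus Xn. Because Xn is negative, Xs points
immediately after the end of the source.
• Xd holds the lowest address that remains to be copied to, minus Xn. Because Xn is negative, Xd points
immediately after the end of the destination.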
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMRT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFERT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
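• Xs holds the lowest address that remains to be copied from, minus Xn. Because Xn is negative, Xs points
immediately after the end of the source.
• Xd holds the lowest address that remains to be copied to, minus Xn. Because Xn is negative, Xd points
immediately after the end of the destination.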
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFERT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

Bits:   31:30   29:27   26   25:24   23:22   21   20:16   15:12   11:10   9:5   4:0
Value:  sz      011     0    01      op1     0    Rs      0010    01      Rn    Rd
Field:                                                    op2

Epilogue (op1 == 10)

CPYFERT [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMRT [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPRT [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
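
The forward-only copy variants differ only in the op2 field, which the decode passes on as options. The following Python sketch relates the op2 values shown in the encoding diagrams to the mnemonic suffixes. The meaning given to bits 3, 2, and 1 is inferred from those diagrams and from the instruction titles. The meaning given to bit 0 is an assumption, because no variant that sets it is described here. The names OPTION_BITS, SUFFIX, describe_options, and mnemonic are hypothetical.

```python
# Bit meanings inferred from the variants: N = 1100, RN = 1000, RT = 0010.
OPTION_BITS = (
    (0b1000, "reads non-temporal"),
    (0b0100, "writes non-temporal"),
    (0b0010, "reads unprivileged"),
    (0b0001, "writes unprivileged"),   # assumption: not shown in this section
)

# Only the op2 values that appear in the encoding diagrams are listed.
SUFFIX = {0b0000: "", 0b1100: "N", 0b1000: "RN", 0b0010: "RT"}


def describe_options(op2):
    """List the access attributes selected by a 4-bit op2 value."""
    return [text for bit, text in OPTION_BITS if op2 & bit]


def mnemonic(stage_letter, op2):
    """Build the mnemonic for a stage letter (P, M or E) and an op2 value."""
    if stage_letter not in ("P", "M", "E"):
        raise ValueError("stage letter must be P, M or E")
    return "CPYF" + stage_letter + SUFFIX[op2]
```

For example, op2 == '1000' with the prologue stage gives CPYFPRN, and op2 == '0010' with the epilogue stage gives CPYFERT.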

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.

    bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
    assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

    if stage == MOPSStage_Main then
        stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
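The second group of statements in the listing (read B bytes at fromaddress, write them at toaddress, then advance both addresses and decrease both sizes) corresponds to the option B register format. The sketch below models that step over a Python `bytearray`. The loop condition and the block-size rule `min(max_block, stagecpysize)` are assumptions made for this example, standing in for the IMPLEMENTATION DEFINED choice that the listing does not show, and `copy_forward_option_b` is a hypothetical name.

```python
def copy_forward_option_b(mem, toaddress, fromaddress, stagecpysize, cpysize,
                          max_block=16):
    """Copy stagecpysize bytes forward; return (cpysize, toaddress, fromaddress)."""
    while stagecpysize > 0:
        # Assumed stand-in for the IMPLEMENTATION DEFINED block size B.
        b = min(max_block, stagecpysize)
        readdata = bytes(mem[fromaddress:fromaddress + b])
        mem[toaddress:toaddress + b] = readdata
        fromaddress += b
        toaddress += b
        cpysize -= b
        stagecpysize -= b
    return cpysize, toaddress, fromaddress


mem = bytearray(range(64)) + bytearray(64)
print(copy_forward_option_b(mem, 64, 0, 40, 40))
```

The returned triple corresponds to the values written back to Xn, Xd, and Xs.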

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

CPYFPRTN, CPYFMRTN, CPYFERTN

Memory Copy Forward-only, reads unprivileged, reads and writes non-temporal. These instructions perform a memory
copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively
in memory: CPYFPRTN, then CPYFMRTN, and then CPYFERTN.
CPYFPRTN performs some preconditioning of the arguments suitable for using the CPYFMRTN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMRTN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFERTN performs the last part of the memory copy.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPRTN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPRTN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMRTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMRTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFERTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFERTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.

◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
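The option A register state after the prologue can be worked through numerically. The following Python sketch applies the rules listed for CPYFPRTN under option A. The function name and the `copied` parameter, which stands for the IMPLEMENTATION DEFINED number of bytes copied by the prologue, are hypothetical, and registers are modelled as unsigned 64-bit integers.

```python
MASK64 = (1 << 64) - 1
MAX_SIZE = 0x7FFFFFFFFFFFFFFF


def prologue_option_a(xd, xs, xn, copied):
    """Return (Xd, Xs, Xn, flags) after the prologue under option A."""
    # If Xn<63> == 1, the copy size is saturated.
    size = MAX_SIZE if (xn >> 63) & 1 else xn
    assert 0 <= copied <= size
    new_xs = (xs + size) & MASK64        # original Xs + saturated Xn
    new_xd = (xd + size) & MASK64        # original Xd + saturated Xn
    new_xn = (-size + copied) & MASK64   # -1 * saturated Xn + bytes copied
    flags = {"N": 0, "Z": 0, "C": 0, "V": 0}  # C = 0 encodes option A
    return new_xd, new_xs, new_xn, flags


xd, xs, xn, flags = prologue_option_a(0x2000, 0x1000, 100, copied=36)
print(hex(xd), hex(xs), hex(xn), flags)
```

With these inputs Xn holds the two's complement encoding of -64, which is the signed count that the main and epilogue stages then move towards zero.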

Integer
(FEAT_MOPS)

Bits    31:30  29:27  26  25:24  23:22  21  20:16  15:12  11:10  9:5  4:0
Field   sz     011    0   01     op1    0   Rs     op2    01     Rn   Rd

op2 = 1110

Epilogue (op1 == 10)

CPYFERTN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMRTN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPRTN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.

    bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
    assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

    if stage == MOPSStage_Main then
        stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
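The first group of statements in the listing (accesses at fromaddress+cpysize and toaddress+cpysize, with both sizes increasing by B) matches the option A register format, where the size is negative and counts up towards zero. A minimal Python model follows. It uses plain signed integers instead of 64-bit registers, and the loop condition and block-size rule are assumptions for this example because the listing does not show them. `copy_forward_option_a` is a hypothetical name.

```python
def copy_forward_option_a(mem, toaddress, fromaddress, stagecpysize, cpysize,
                          max_block=16):
    """Copy -stagecpysize bytes forward; return the updated (negative) cpysize.

    toaddress and fromaddress are the end-relative bases used by option A:
    each access is made at base + cpysize, and the bases never change.
    """
    while stagecpysize != 0:
        # Assumed stand-in for the IMPLEMENTATION DEFINED block size B.
        b = min(max_block, -stagecpysize)
        src = fromaddress + cpysize
        dst = toaddress + cpysize
        mem[dst:dst + b] = bytes(mem[src:src + b])
        cpysize += b
        stagecpysize += b
    return cpysize


mem = bytearray(range(32)) + bytearray(32)
# Copy 32 bytes from address 0 to address 32: bases are 0 + 32 and 32 + 32.
print(copy_forward_option_a(mem, 64, 32, -32, -32))
```

Only the size changes during the copy, which is consistent with the option A descriptions stating a write-back for Xn only.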

CPYFPRTRN, CPYFMRTRN, CPYFERTRN

Memory Copy Forward-only, reads unprivileged and non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYFPRTRN, then CPYFMRTRN, and then CPYFERTRN.
CPYFPRTRN performs some preconditioning of the arguments suitable for using the CPYFMRTRN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMRTRN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFERTRN performs the last part of the memory copy.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPRTRN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPRTRN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMRTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMRTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFERTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFERTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.

◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
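Both register formats describe the same remaining region of the copy. The sketch below converts the main-stage arguments of either format into a common form: bytes remaining, lowest source address, and lowest destination address. It assumes registers are passed as unsigned 64-bit integers and that PSTATE.C is passed as 0 or 1. `remaining_region` is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1


def remaining_region(pstate_c, xd, xs, xn):
    """Return (bytes_remaining, lowest_src, lowest_dst) for a main-stage call."""
    if pstate_c == 0:
        # Option A: Xn is a signed 64-bit value equal to -1 * bytes remaining,
        # and Xs/Xd hold the lowest address minus Xn.
        signed_xn = xn - (1 << 64) if xn >> 63 else xn
        return -signed_xn, (xs + signed_xn) & MASK64, (xd + signed_xn) & MASK64
    # Option B: the registers hold the values directly.
    return xn, xs, xd


option_a = remaining_region(0, 0x2064, 0x1064, (-64) & MASK64)
option_b = remaining_region(1, 0x2024, 0x1024, 64)
print(option_a == option_b)
```

Because the choice of format is not constant, software that inspects these registers between stages would need this kind of PSTATE.C test first.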

Integer
(FEAT_MOPS)

Bits    31:30  29:27  26  25:24  23:22  21  20:16  15:12  11:10  9:5  4:0
Field   sz     011    0   01     op1    0   Rs     op2    01     Rn   Rd

op2 = 1010

Epilogue (op1 == 10)

CPYFERTRN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMRTRN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPRTRN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.

    bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
    assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

    if stage == MOPSStage_Main then
        stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
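The assertions in the prologue branch constrain the amount the prologue may choose to cover: it must have the same sign as the total size (or be zero) and must not exceed the total in magnitude. The Python sketch below restates those assertions as a predicate. The names `sint64` and `pre_size_is_consistent` are hypothetical, and the choice function itself is IMPLEMENTATION DEFINED and is not modelled.

```python
MASK64 = (1 << 64) - 1


def sint64(value):
    """Interpret a 64-bit pattern as a signed integer, like SInt()."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def pre_size_is_consistent(cpysize, stagecpysize):
    """Return True when the prologue assertions would hold."""
    cpysize &= MASK64
    stagecpysize &= MASK64
    same_sign = (stagecpysize >> 63) == (cpysize >> 63) or stagecpysize == 0
    if sint64(cpysize) > 0:
        bounded = sint64(stagecpysize) <= sint64(cpysize)
    else:
        bounded = sint64(stagecpysize) >= sint64(cpysize)
    return same_sign and bounded


print(pre_size_is_consistent(100, 36), pre_size_is_consistent(-100, -36))
```

Positive sizes correspond to the option B format and negative sizes to option A, which is why the bound flips direction with the sign.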

CPYFPRTWN, CPYFMRTWN, CPYFERTWN

Memory Copy Forward-only, reads unprivileged, writes non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYFPRTWN, then CPYFMRTWN, and then CPYFERTWN.
CPYFPRTWN performs some preconditioning of the arguments suitable for using the CPYFMRTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMRTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFERTWN performs the last part of the memory copy.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPRTWN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPRTWN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMRTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMRTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFERTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFERTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.

◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
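To see how the option B register format evolves across the three instructions, the sketch below traces Xd, Xs, and Xn after each stage. The per-stage byte counts are inputs here because the split between prologue and main is IMPLEMENTATION DEFINED, and the epilogue is taken to copy whatever remains. `trace_option_b` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1
MAX_SIZE = 0x7FFFFFFFFFFFFFFF


def trace_option_b(xd, xs, xn, prologue_bytes, main_bytes):
    """Return [(Xd, Xs, Xn)] after the prologue, main and epilogue stages."""
    size = MAX_SIZE if xn >> 63 else xn  # saturation applied by the prologue
    epilogue_bytes = size - prologue_bytes - main_bytes
    assert prologue_bytes >= 0 and main_bytes >= 0 and epilogue_bytes >= 0
    trace = []
    remaining = size
    for copied in (prologue_bytes, main_bytes, epilogue_bytes):
        xd = (xd + copied) & MASK64  # lowest address not yet copied to
        xs = (xs + copied) & MASK64  # lowest address not yet copied from
        remaining -= copied          # bytes still to be copied in total
        trace.append((xd, xs, remaining))
    return trace


for state in trace_option_b(0x2000, 0x1000, 100, prologue_bytes=4, main_bytes=64):
    print([hex(v) for v in state])
```

The last entry always has Xn equal to 0, matching the epilogue write-back described for option B.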

Integer
(FEAT_MOPS)

Bits    31:30  29:27  26  25:24  23:22  21  20:16  15:12  11:10  9:5  4:0
Field   sz     011    0   01     op1    0   Rs     op2    01     Rn   Rd

op2 = 0110

Epilogue (op1 == 10)

CPYFERTWN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMRTWN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPRTWN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.

    bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
    assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

    if stage == MOPSStage_Main then
        stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
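The listing splits the remaining work between the main and epilogue stages using postsize, the amount reserved for the epilogue. The sketch below shows that split with plain integers. Treating the epilogue's share as exactly postsize is an assumption based on the `cpysize != postsize` check, the `MemCpyParametersIllformedE` test is not modelled, and `MismatchedMemCpy` is a hypothetical stand-in for the exception raised by the pseudocode.

```python
class MismatchedMemCpy(Exception):
    """Hypothetical stand-in for MismatchedMemCpyException."""


def stage_copy_size(stage, cpysize, postsize):
    """Return the number of bytes a main or epilogue stage is responsible for."""
    if stage == "main":
        # Main covers everything except what is left for the epilogue.
        return cpysize - postsize
    if stage == "epilogue":
        if cpysize != postsize:
            raise MismatchedMemCpy("epilogue entered with an unexpected size")
        return postsize  # assumption: the epilogue copies all that remains
    raise ValueError("stage must be 'main' or 'epilogue'")


print(stage_copy_size("main", 96, 32), stage_copy_size("epilogue", 32, 32))
```

The same arithmetic holds for negative option A sizes, since both cpysize and postsize then carry the same sign.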

CPYFPT, CPYFMT, CPYFET

Memory Copy Forward-only, reads and writes unprivileged. These instructions perform a memory copy. The prologue,
main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPT,
then CPYFMT, and then CPYFET.
CPYFPT performs some preconditioning of the arguments suitable for using the CPYFMT instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYFMT performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYFET performs the last part of the memory copy.
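"Forward-only" means the copy proceeds from the lowest address upwards. The following Python sketch is a byte-at-a-time model of a forward copy, written only to illustrate why direction matters when the source and destination ranges overlap. It does not model the IMPLEMENTATION DEFINED block sizes used by these instructions, so it should not be read as a statement of the architectural result for overlapping buffers. `copy_forward_bytewise` is a hypothetical name.

```python
def copy_forward_bytewise(buf, dst, src, count):
    """Copy count bytes within buf, lowest address first."""
    for i in range(count):
        buf[dst + i] = buf[src + i]
    return buf


# Destination below the source: a forward copy preserves the data.
print(copy_forward_bytewise(bytearray(b"__abcdef"), 0, 2, 6))

# Destination above the source with overlap: bytes written early are
# read again later, so the source pattern repeats.
print(copy_forward_bytewise(bytearray(b"abcdef__"), 2, 0, 6))
```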

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPT, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPT, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFET, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFET, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.


◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

Bits    31:30  29:27  26  25:24  23:22  21  20:16  15:12  11:10  9:5  4:0
Field   sz     011    0   01     op1    0   Rs     op2    01     Rn   Rd

op2 = 0011

Epilogue (op1 == 10)

CPYFET [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMT [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPT [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols


Operation


CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.

    bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
    assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

    if stage == MOPSStage_Main then
        stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.


readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;


CPYFPTN, CPYFMTN, CPYFETN

Memory Copy Forward-only, reads and writes unprivileged and non-temporal. These instructions perform a memory
copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively
in memory: CPYFPTN, then CPYFMTN, and then CPYFETN.
CPYFPTN performs some preconditioning of the arguments suitable for using the CPYFMTN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMTN performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYFETN performs the last part of the memory copy.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPTN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPTN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFETN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFETN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.

◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

Bits    31:30  29:27  26  25:24  23:22  21  20:16  15:12  11:10  9:5  4:0
Field   sz     011    0   01     op1    0   Rs     op2    01     Rn   Rd

op2 = 1111
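The encodings for CPYF*RT, CPYF*RTN, CPYF*RTRN, CPYF*RTWN, CPYF*T, and CPYF*TN differ only in the op2 field, and op1 selects the stage. The Python sketch below collects the six op2 values shown in these encoding diagrams into a lookup table and rebuilds the mnemonic from op1 and op2. The table is limited to those six values, so other op2 values raise `KeyError` rather than being guessed. `mnemonic` is a hypothetical helper name.

```python
# op2 (bits 15:12) values taken from the six encoding diagrams.
OP2_SUFFIX = {
    0b0010: "RT",
    0b1110: "RTN",
    0b1010: "RTRN",
    0b0110: "RTWN",
    0b0011: "T",
    0b1111: "TN",
}

# op1 (bits 23:22) selects the stage letter in the mnemonic.
STAGE_LETTER = {0b00: "P", 0b01: "M", 0b10: "E"}


def mnemonic(op1, op2):
    """Return the instruction mnemonic for the given op1 and op2 fields."""
    return "CPYF" + STAGE_LETTER[op1] + OP2_SUFFIX[op2]


print(mnemonic(0b00, 0b1111), mnemonic(0b10, 0b1010))
```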

Epilogue (op1 == 10)

CPYFETN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMTN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPTN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);

assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

CPYFPTRN, CPYFMTRN, CPYFETRN

Memory Copy Forward-only, reads and writes unprivileged, reads non-temporal. These instructions perform a memory
copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively
in memory: CPYFPTRN, then CPYFMTRN, and then CPYFETRN.
CPYFPTRN performs some preconditioning of the arguments suitable for using the CPYFMTRN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMTRN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFETRN performs the last part of the memory copy.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPTRN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1 * saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPTRN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
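
The two post-prologue register formats listed above can be compared side by side with a small model. This Python sketch is a reading aid, not the architected behavior: after_prologue is a hypothetical name, and the copied argument stands in for the IMPLEMENTATION DEFINED number of bytes that the prologue chooses to copy.

```python
MASK64 = (1 << 64) - 1
SAT_MAX = 0x7FFFFFFFFFFFFFFF


def after_prologue(xs: int, xd: int, xn: int, copied: int, option_a: bool) -> dict:
    """Return the register and flag state described for the end of the prologue."""
    # If Xn<63> == 1, the copy size is saturated.
    size = SAT_MAX if (xn >> 63) & 1 else xn
    assert 0 <= copied <= size, "the prologue cannot copy more than the total"
    if option_a:
        state = {
            "Xs": (xs + size) & MASK64,       # original Xs + saturated Xn
            "Xd": (xd + size) & MASK64,       # original Xd + saturated Xn
            "Xn": (-size + copied) & MASK64,  # negative count of bytes remaining
            "C": 0,
        }
    else:
        state = {
            "Xs": (xs + copied) & MASK64,
            "Xd": (xd + copied) & MASK64,
            "Xn": (size - copied) & MASK64,   # bytes remaining
            "C": 1,
        }
    state.update(N=0, Z=0, V=0)
    return state


print(after_prologue(0x1000, 0x2000, 100, copied=16, option_a=True))
print(after_prologue(0x1000, 0x2000, 100, copied=16, option_a=False))
```

In option A the address registers point just past the end of each buffer and Xn counts up towards zero; in option B the address registers advance and Xn counts down.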
For CPYFMTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1 * the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address to be copied from, minus Xn.
• Xd holds the lowest address to be copied to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1 * the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFETRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1 * the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address to be copied from, minus Xn.
• Xd holds the lowest address to be copied to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFETRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

31:30  29:24   23:22  21  20:16  15:12     11:10  9:5  4:0
sz     011001  op1    0   Rs     op2=1011  01     Rn   Rd

Epilogue (op1 == 10)

CPYFETRN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMTRN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPTRN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
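
The decode steps above can be mirrored in a few lines of Python. This is a sketch under stated assumptions: FEAT_MOPS is taken to be implemented, the fixed opcode bits are not re-checked, UNDEFINED is modeled as a ValueError, and decode_cpyf and Decoded are hypothetical names.

```python
from typing import NamedTuple


class Decoded(NamedTuple):
    stage: str
    options: int
    d: int
    s: int
    n: int


STAGES = {0b00: "Prologue", 0b01: "Main", 0b10: "Epilogue"}


def decode_cpyf(word: int) -> Decoded:
    """Extract the CPYF* fields and apply the decode-time checks."""
    sz = (word >> 30) & 0b11
    op1 = (word >> 22) & 0b11
    s = (word >> 16) & 0b11111
    options = (word >> 12) & 0b1111   # op2
    n = (word >> 5) & 0b11111
    d = word & 0b11111
    if sz != 0b00:
        raise ValueError("UNDEFINED: sz != '00'")
    if op1 not in STAGES:
        raise ValueError("op1 == '11': see Memory Copy and Memory Set")
    if d == s or s == n or d == n:
        raise ValueError("UNDEFINED: Rd, Rs and Rn must be distinct")
    if 31 in (d, s, n):
        raise ValueError("UNDEFINED: register 31 is not permitted")
    return Decoded(STAGES[op1], options, d, s, n)


# CPYFPTRN [X0]!, [X1]!, X2!
print(decode_cpyf(0x1901B440))
```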

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);

assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYFPTWN, CPYFMTWN, CPYFETWN

Memory Copy Forward-only, reads and writes unprivileged, writes non-temporal. These instructions perform a memory
copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively
in memory: CPYFPTWN, then CPYFMTWN, and then CPYFETWN.
CPYFPTWN performs some preconditioning of the arguments suitable for using the CPYFMTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFETWN performs the last part of the memory copy.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPTWN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1 * saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPTWN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1 * the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address to be copied from, minus Xn.
• Xd holds the lowest address to be copied to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1 * the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
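
Because the option in use is only known from PSTATE.C, code that inspects the registers between stages (for example, a debugger or an exception handler) has to interpret them according to that flag. The following Python sketch shows one way to reduce both formats to the same view. The name remaining_view is hypothetical, and the model only restates the argument formats listed above.

```python
MASK64 = (1 << 64) - 1


def remaining_view(xs: int, xd: int, xn: int, pstate_c: int) -> tuple:
    """Return (next source address, next destination address, bytes remaining)."""
    if pstate_c == 0:
        # Option A: Xn is a signed, non-positive count; Xs and Xd are offset by -Xn.
        signed_xn = xn - (1 << 64) if xn >> 63 else xn
        return ((xs + signed_xn) & MASK64, (xd + signed_xn) & MASK64, -signed_xn)
    # Option B: the registers already hold the next addresses and the count.
    return (xs, xd, xn)


# The same in-flight copy (84 bytes left) seen through each option.
print(remaining_view(0x1064, 0x2064, (-84) & MASK64, pstate_c=0))
print(remaining_view(0x1010, 0x2010, 84, pstate_c=1))
```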
For CPYFETWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1 * the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address to be copied from, minus Xn.
• Xd holds the lowest address to be copied to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFETWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

31:30  29:24   23:22  21  20:16  15:12     11:10  9:5  4:0
sz     011001  op1    0   Rs     op2=0111  01     Rn   Rd

Epilogue (op1 == 10)

CPYFETWN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMTWN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPTWN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);

assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYFPWN, CPYFMWN, CPYFEWN

Memory Copy Forward-only, writes non-temporal. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPWN, then
CPYFMWN, and then CPYFEWN.
CPYFPWN performs some preconditioning of the arguments suitable for using the CPYFMWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFEWN performs the last part of the memory copy.
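
The division of work between the three instructions can be pictured with a software model. The Python sketch below is only an illustration of the option B flow: cpyf_model is a hypothetical name, and the pre, post, and block parameters are arbitrary stand-ins for the IMPLEMENTATION DEFINED choices that real hardware makes. Memory is modeled as a bytearray indexed by address.

```python
SAT_MAX = 0x7FFFFFFFFFFFFFFF


def cpyf_model(mem: bytearray, xd: int, xs: int, xn: int,
               pre: int = 3, post: int = 2, block: int = 4) -> tuple:
    """Run prologue, main and epilogue in order; return the final (Xd, Xs, Xn)."""

    def copy(count: int) -> None:
        nonlocal xd, xs, xn
        while count > 0:
            b = min(block, count)
            mem[xd:xd + b] = mem[xs:xs + b]   # read the block, then write it
            xd += b
            xs += b
            xn -= b
            count -= b

    # Prologue: saturate the size, then copy an implementation-chosen amount.
    if (xn >> 63) & 1:
        xn = SAT_MAX
    copy(min(pre, xn))
    # Main: copy everything except what is left for the epilogue.
    copy(xn - min(post, xn))
    # Epilogue: copy the last part, leaving Xn == 0.
    copy(xn)
    return xd, xs, xn


mem = bytearray(range(32)) + bytearray(32)
print(cpyf_model(mem, xd=32, xs=0, xn=20))
```

Changing pre, post, or block changes how many bytes each stage copies but not the final result, which is why portable software should not depend on the split.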

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPWN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1 * saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPWN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1 * the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address to be copied from, minus Xn.
• Xd holds the lowest address to be copied to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1 * the number of bytes remaining to be
copied in the memory copy in total.
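
In option A the address registers stay fixed during a stage and only Xn moves, so each access is made at the register value plus the (negative) count. The Python sketch below models that indexing for one stage. It is an assumption-laden illustration: option_a_stage is a hypothetical name, Xn is held as a plain signed integer, and stage_bytes and block stand in for IMPLEMENTATION DEFINED choices.

```python
def option_a_stage(mem: bytearray, xd_end: int, xs_end: int, xn: int,
                   stage_bytes: int, block: int = 4) -> int:
    """Copy stage_bytes bytes using option A addressing; return the updated Xn."""
    assert xn <= 0 and 0 <= stage_bytes <= -xn
    stagecpysize = -stage_bytes            # negative, counts up to zero
    while stagecpysize != 0:
        b = min(block, -stagecpysize)
        src = xs_end + xn                  # lowest address not yet copied from
        dst = xd_end + xn                  # lowest address not yet copied to
        mem[dst:dst + b] = mem[src:src + b]
        xn += b
        stagecpysize += b
    return xn


mem = bytearray(b"abcdefgh") + bytearray(8)
xn = option_a_stage(mem, xd_end=16, xs_end=8, xn=-8, stage_bytes=5, block=2)
xn = option_a_stage(mem, xd_end=16, xs_end=8, xn=xn, stage_bytes=3, block=2)
print(xn, bytes(mem[8:]))
```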
For CPYFMWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1 * the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address to be copied from, minus Xn.
• Xd holds the lowest address to be copied to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

31:30  29:24   23:22  21  20:16  15:12     11:10  9:5  4:0
sz     011001  op1    0   Rs     op2=0100  01     Rn   Rd

Epilogue (op1 == 10)

CPYFEWN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMWN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPWN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);

assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYFPWT, CPYFMWT, CPYFEWT

Memory Copy Forward-only, writes unprivileged. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPWT, then
CPYFMWT, and then CPYFEWT.
CPYFPWT performs some preconditioning of the arguments suitable for using the CPYFMWT instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWT performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFEWT performs the last part of the memory copy.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPWT, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1 * saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPWT, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMWT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1 * the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address to be copied from, minus Xn.
• Xd holds the lowest address to be copied to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1 * the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMWT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEWT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1 * the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address to be copied from, minus Xn.
• Xd holds the lowest address to be copied to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEWT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

31:30  29:24   23:22  21  20:16  15:12     11:10  9:5  4:0
sz     011001  op1    0   Rs     op2=0001  01     Rn   Rd

Epilogue (op1 == 10)

CPYFEWT [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMWT [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPWT [<Xd>]!, [<Xs>]!, <Xn>!
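
The mnemonic suffix follows from op2 in a regular way, which can be seen by comparing the op2 values of the forward-only copy forms in this document. The Python sketch below records that pattern. It is an inference rather than an architected table: mnemonic is a hypothetical name, and the RT and RN entries are assumed by analogy with the WT and WN forms.

```python
STAGE = {0b00: "P", 0b01: "M", 0b10: "E"}
# op2<1:0>: unprivileged reads and writes; op2<3:2>: non-temporal reads and writes.
UNPRIV = {0b00: "", 0b01: "WT", 0b10: "RT", 0b11: "T"}      # RT is an assumption
NONTEMP = {0b00: "", 0b01: "WN", 0b10: "RN", 0b11: "N"}     # RN is an assumption


def mnemonic(op1: int, op2: int) -> str:
    """Build the CPYF* mnemonic for a stage (op1) and option field (op2)."""
    return "CPYF" + STAGE[op1] + UNPRIV[op2 & 0b11] + NONTEMP[(op2 >> 2) & 0b11]


for op2 in (0b0001, 0b0100, 0b0111, 0b1011, 0b1101, 0b1111):
    print(format(op2, "04b"), mnemonic(0b00, op2))
```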

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);

assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYFPWTN, CPYFMWTN, CPYFEWTN

Memory Copy Forward-only, writes unprivileged, reads and writes non-temporal. These instructions perform a memory
copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively
in memory: CPYFPWTN, then CPYFMWTN, and then CPYFEWTN.
CPYFPWTN performs some preconditioning of the arguments suitable for using the CPYFMWTN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWTN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFEWTN performs the last part of the memory copy.
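
The attributes named in the instruction description (unprivileged or non-temporal, for reads or writes) correspond to individual bits of op2. The Python sketch below spells out that correspondence as inferred from the op2 values of the forms in this document. CopyAttrs and attrs_from_op2 are hypothetical names, and the sketch does not model MemCpyAccessTypes, which also takes the current Exception level into account when choosing the effective access types.

```python
from typing import NamedTuple


class CopyAttrs(NamedTuple):
    read_unpriv: bool
    write_unpriv: bool
    read_nontemporal: bool
    write_nontemporal: bool


def attrs_from_op2(op2: int) -> CopyAttrs:
    """Split the 4-bit options field into the four requested attributes."""
    return CopyAttrs(
        write_unpriv=bool(op2 & 0b0001),       # op2<0>
        read_unpriv=bool(op2 & 0b0010),        # op2<1>
        write_nontemporal=bool(op2 & 0b0100),  # op2<2>
        read_nontemporal=bool(op2 & 0b1000),   # op2<3>
    )


# CPYFPWTN: writes unprivileged, reads and writes non-temporal (op2 = 1101).
print(attrs_from_op2(0b1101))
```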

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPWTN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1 * saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPWTN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMWTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1 * the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address to be copied from, minus Xn.
• Xd holds the lowest address to be copied to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1 * the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMWTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEWTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1 * the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address to be copied from, minus Xn.
• Xd holds the lowest address to be copied to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEWTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
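The option B convention across all three stages can be sketched as a byte-by-byte model over a `bytearray`. This is a simplified illustration, not the architectural behavior: the function name and the `pre` and `post` parameters (hypothetical stand-ins for the IMPLEMENTATION DEFINED amounts handled by the prologue and epilogue) are assumptions, and exceptions, access types, and block sizes are ignored.

```python
def copy_forward_option_b(mem, xd, xs, xn, pre=3, post=2):
    """Model prologue, main, and epilogue under option B on `mem`.

    `pre` and `post` are hypothetical stand-ins for the IMPLEMENTATION
    DEFINED amounts copied by the prologue and left for the epilogue.
    """
    def step(xd, xs, xn, count):
        # Copy `count` bytes upward, lowest address first (forward-only).
        for _ in range(count):
            mem[xd] = mem[xs]
            xd, xs, xn = xd + 1, xs + 1, xn - 1
        return xd, xs, xn

    pre = min(pre, xn)
    post = min(post, xn - pre)
    xd, xs, xn = step(xd, xs, xn, pre)          # prologue
    xd, xs, xn = step(xd, xs, xn, xn - post)    # main
    xd, xs, xn = step(xd, xs, xn, xn)           # epilogue, Xn ends at 0
    return xd, xs, xn
```

After the epilogue, Xn is 0 and Xs and Xd hold the lowest addresses that were not copied from and to.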

Integer
(FEAT_MOPS)

31:30   29:24    23:22   21   20:16   15:12        11:10   9:5   4:0
sz      011001   op1     0    Rs      op2 = 1101   01      Rn    Rd

Epilogue (op1 == 10)

CPYFEWTN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMWTN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPWTN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
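The decode checks above can be mirrored in a short Python sketch. The field positions are read from the encoding diagram for this instruction; the function name `decode_cpyf` and the choice to return `None` for both UNDEFINED encodings and encodings that belong to another instruction are assumptions for illustration. The HaveFeatMOPS() check and the fixed bits of the encoding are not modeled.

```python
def decode_cpyf(word):
    """Split a 32-bit instruction word into the fields of the diagram."""
    sz  = (word >> 30) & 0x3
    op1 = (word >> 22) & 0x3
    rs  = (word >> 16) & 0x1F
    op2 = (word >> 12) & 0xF
    rn  = (word >> 5) & 0x1F
    rd  = word & 0x1F
    stages = {0b00: "Prologue", 0b01: "Main", 0b10: "Epilogue"}
    if sz != 0 or op1 not in stages:
        return None            # UNDEFINED, or a different instruction
    if len({rd, rs, rn}) != 3 or 31 in (rd, rs, rn):
        return None            # registers must be distinct and not 31
    return stages[op1], rd, rs, rn, op2
```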

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);

assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYFPWTRN, CPYFMWTRN, CPYFEWTRN

Memory Copy Forward-only, writes unprivileged, reads non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYFPWTRN, then CPYFMWTRN, and then CPYFEWTRN.
CPYFPWTRN performs some preconditioning of the arguments suitable for using the CPYFMWTRN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWTRN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFEWTRN performs the last part of the memory copy.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPWTRN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPWTRN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMWTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
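Under option A the base registers stay fixed and each access is made at base + Xn, with Xn negative and moving toward zero. The Python sketch below models one such step on a `bytearray`; the function names and the `block` parameter (a hypothetical stand-in for the number of bytes B copied in one iteration) are assumptions, and access types and exceptions are not modeled.

```python
MASK64 = (1 << 64) - 1

def to_signed(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value >> 63 else value

def option_a_forward_step(mem, xd, xs, xn, block):
    """Copy `block` bytes; Xd and Xs stay fixed, only Xn moves toward 0."""
    remaining = -to_signed(xn)
    assert 0 < block <= remaining
    for i in range(block):
        # Effective address is base + Xn, with Xn negative.
        mem[xd - remaining + i] = mem[xs - remaining + i]
    return (xn + block) & MASK64
```

For an 8-byte copy from address 0 to address 8, the prologue would leave Xs = 8, Xd = 16, and Xn = -8; two steps of 3 and 5 bytes then bring Xn to 0.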
For CPYFMWTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEWTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEWTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

31:30   29:24    23:22   21   20:16   15:12        11:10   9:5   4:0
sz      011001   op1     0    Rs      op2 = 1001   01      Rn    Rd

Epilogue (op1 == 10)

CPYFEWTRN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMWTRN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPWTRN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);

assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYFPWTWN, CPYFMWTWN, CPYFEWTWN

Memory Copy Forward-only, writes unprivileged and non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYFPWTWN, then CPYFMWTWN, and then CPYFEWTWN.
CPYFPWTWN performs some preconditioning of the arguments suitable for using the CPYFMWTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYFMWTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYFEWTWN performs the last part of the memory copy.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYFPWTWN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + saturated Xn.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of CPYFPWTWN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the copy size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For CPYFMWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be
copied in the memory copy in total.
For CPYFMWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be copied in the memory
copy in total.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.
For CPYFEWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number and holds -1* the number of bytes remaining to be copied in the
memory copy in total.
• Xs holds the lowest address that the copy is copied from -Xn.
• Xd holds the lowest address that the copy is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For CPYFEWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be copied in the memory copy in total.
• Xs holds the lowest address that the copy is copied from.
• Xd holds the lowest address that the copy is copied to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xs is written back with the lowest address that has not been copied from.
◦ the value of Xd is written back with the lowest address that has not been copied to.

Integer
(FEAT_MOPS)

31:30   29:24    23:22   21   20:16   15:12        11:10   9:5   4:0
sz      011001   op1     0    Rs      op2 = 0101   01      Rn    Rd

Epilogue (op1 == 10)

CPYFEWTWN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYFMWTWN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYFPWTWN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
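Comparing the op2 values of the WTN, WTRN, and WTWN variants (1101, 1001, and 0101) with their one-line descriptions suggests a per-bit reading of the options field. The Python sketch below encodes that reading. The bit assignments are inferred from those three encodings, the meaning given to bit 1 is an assumption because none of the three exercises it, and the function name is hypothetical; it is an aid to reading the diagrams, not a definition of MemCpyAccessTypes().

```python
def describe_options(op2):
    """Describe a 4-bit options value using the inferred bit meanings."""
    hints = []
    if op2 & 0b0001:
        hints.append("writes unprivileged")
    if op2 & 0b0010:
        hints.append("reads unprivileged")   # assumed, not shown above
    if op2 & 0b0100:
        hints.append("writes non-temporal")
    if op2 & 0b1000:
        hints.append("reads non-temporal")
    return hints
```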

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);

assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to this instruction are valid for the epilogue.

if (cpysize != postsize || MemCpyParametersIllformedE(toaddress, fromaddress, cpysize)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemCpyException(supports_option_a, d, s, n, wrong_option, from_epilogue, options);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;

if stage != MOPSStage_Prologue then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];

Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYP, CPYM, CPYE

Memory Copy. These instructions perform a memory copy. The prologue, main, and epilogue instructions are expected
to be run in succession and to appear consecutively in memory: CPYP, then CPYM, and then CPYE.
CPYP performs some preconditioning of the arguments suitable for using the CPYM instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYM performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYE performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYP, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
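The saturation and direction rules can be written out as a short Python sketch. It follows the prose above and compares whole register values, whereas the Operation pseudocode compares only address bits <55:0>. The function names and the `tie_break` parameter (a stand-in for the IMPLEMENTATION DEFINED choice) are assumptions for illustration.

```python
SATURATED = 0x007FFFFFFFFFFFFF

def saturate_size(xn):
    """Clamp the copy size if any of Xn<63:55> is set."""
    return SATURATED if (xn >> 55) != 0 else xn

def copy_direction(xs, xd, xn, tie_break="forward"):
    """Return 'forward' or 'backward' for a source, destination, and size.

    `tie_break` stands in for the IMPLEMENTATION DEFINED choice.
    """
    size = saturate_size(xn)
    if xs > xd and xd + size > xs:
        return "forward"     # source above destination and overlapping
    if xs < xd and xs + size > xd:
        return "backward"    # source below destination and overlapping
    return tie_break         # no overlap: either direction is acceptable
```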
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYP, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYP, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
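The four prologue outcomes listed above (option A or B, forward or backward) can be tabulated with a small model. This Python sketch is illustrative only: `cpyp_result` is a hypothetical name, `size` is assumed to be the already saturated Xn, and `copied` stands in for the IMPLEMENTATION DEFINED number of bytes copied by the prologue.

```python
MASK64 = (1 << 64) - 1

def cpyp_result(xd, xs, size, forward, copied, option_a):
    """Return (Xd, Xs, Xn, N, C) after the prologue.

    `size` is the saturated Xn; `copied` is a hypothetical stand-in for
    the IMPLEMENTATION DEFINED number of bytes the prologue copies.
    """
    assert 0 <= copied <= size
    if option_a:
        if forward:
            # Addresses move past the end; Xn becomes negative.
            return ((xd + size) & MASK64, (xs + size) & MASK64,
                    (copied - size) & MASK64, 0, 0)
        # Backward: addresses unchanged, Xn stays positive.
        return xd, xs, size - copied, 0, 0
    if forward:
        return ((xd + copied) & MASK64, (xs + copied) & MASK64,
                size - copied, 0, 1)
    # Backward under option B: addresses start past the end and move down.
    return ((xd + size - copied) & MASK64, (xs + size - copied) & MASK64,
            size - copied, 1, 1)
```

Under option A the sign of Xn carries the direction, while under option B the direction is carried by PSTATE.N.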
For CPYM, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is copied to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYM, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
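One option B step in either direction can be modeled on a `bytearray`. In this Python sketch the function name and the `block` parameter (a hypothetical stand-in for the number of bytes B copied in one iteration) are assumptions; access types and exceptions are not modeled. The slice on the right-hand side is read in full before the write, so each block is read and then stored as a unit.

```python
def option_b_step(mem, xd, xs, xn, block, backward):
    """Copy `block` bytes and return the written-back (Xd, Xs, Xn)."""
    assert 0 < block <= xn
    if backward:
        # Xd and Xs hold "highest address + 1", so the block sits just below.
        xd, xs = xd - block, xs - block
        mem[xd:xd + block] = mem[xs:xs + block]
    else:
        mem[xd:xd + block] = mem[xs:xs + block]
        xd, xs = xd + block, xs + block
    return xd, xs, xn - block
```

For example, moving 6 bytes from address 0 to address 2 in the backward direction starts with Xs = 6, Xd = 8, and Xn = 6, and steps of 4 and then 2 bytes finish with Xs = 0, Xd = 2, and Xn = 0.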
For CPYE, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYE, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

31:30   29:24    23:22   21   20:16   15:12        11:10   9:5   4:0
sz      011101   op1     0    Rs      op2 = 0000   01      Rn    Rd

Epilogue (op1 == 10)

CPYE [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYM [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYP [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

    boolean forward;
    if ((UInt(fromaddress<55:0>) > UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0>) < UInt(toaddress<55
        forward = TRUE;
    elsif ((UInt(fromaddress<55:0>) < UInt(toaddress<55:0>)) && (UInt(fromaddress<55:0> + cpysize<55:0>)
        forward = FALSE;
    else
        forward = MemCpyDirectionChoice(fromaddress, toaddress, cpysize);

    if supports_option_a then
        PSTATE.C = '0';
        PSTATE.N = '0';
        if forward then
            // Copy in the forward direction offsets the arguments.
            toaddress = toaddress + cpysize;
            fromaddress = fromaddress + cpysize;
            cpysize = Zeros(64) - cpysize;
    else
        PSTATE.C = '1';
        if !forward then
            // Copy in the reverse direction offsets the arguments.
            toaddress = toaddress + cpysize;
            fromaddress = fromaddress + cpysize;
            PSTATE.N = '1';
        else
            PSTATE.N = '0';
    PSTATE.V = '0';
    PSTATE.Z = '0';

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

readdata<B*8-1:0> = Mem[fromaddress, B, racctype];
Mem[toaddress, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress + B;
toaddress = toaddress + B;
else
readdata<B*8-1:0> = Mem[fromaddress-B, B, racctype];
Mem[toaddress-B, B, wacctype] = readdata<B*8-1:0>;
fromaddress = fromaddress - B;
toaddress = toaddress - B;

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYPN, CPYMN, CPYEN

Memory Copy, reads and writes non-temporal. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYPN, then
CPYMN, and then CPYEN.
CPYPN performs some preconditioning of the arguments suitable for using the CPYMN instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYMN performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYEN performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYEN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYEN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
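The argument formats above can be read as two encodings of the same information: the range of bytes that still has to be copied. The following Python sketch is illustrative only and is not part of the architecture pseudocode. The function names are hypothetical, and the handling of Xn equal to 0 under option A is an assumption, because the text only describes negative and positive values.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit register value as a signed number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def remaining_source_range(option, xs, xn, n_flag=0):
    """Return (lowest_address, byte_count, direction) for the bytes still to be read.

    option is "A" (PSTATE.C = 0) or "B" (PSTATE.C = 1). n_flag is PSTATE.N and
    is only consulted for option B.
    """
    if option == "A":
        count = to_signed64(xn)
        if count < 0:
            # Forward: Xs holds the lowest source address minus Xn.
            return ((xs + count) & MASK64, -count, "forward")
        if count > 0:
            # Backward: Xs holds the highest source address minus Xn plus 1.
            return (xs & MASK64, count, "backward")
        # Assumption: with nothing left to copy, no direction is encoded.
        return (xs & MASK64, 0, None)
    if option == "B":
        count = xn & MASK64
        if n_flag == 0:
            # Forward: Xs holds the lowest source address.
            return (xs & MASK64, count, "forward")
        # Backward: Xs holds the highest source address plus 1.
        return ((xs - count) & MASK64, count, "backward")
    raise ValueError("option must be 'A' or 'B'")
```

Applying the same function to Xd instead of Xs gives the destination range, because both registers follow the same convention.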

Integer
(FEAT_MOPS)

Bits:   31:30 | 29:24  | 23:22 | 21 | 20:16 | 15:12      | 11:10 | 9:5 | 4:0
Field:  sz    | 011101 | op1   | 0  | Rs    | op2 = 1100 | 01    | Rn  | Rd

Epilogue (op1 == 10)

CPYEN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
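
The decode rules above can be sketched in Python as follows. This is an illustration, not the architecture pseudocode: the function name and return convention are hypothetical, the field positions are taken from the encoding diagram, and the sketch assumes that FEAT_MOPS is implemented.

```python
STAGES = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}


def decode_cpy(word):
    """Decode a 32-bit instruction word.

    Returns a dict of decoded fields, the string "UNDEFINED", or None when the
    word does not belong to this encoding (including op1 == '11', which is
    described under "Memory Copy and Memory Set").
    """
    fixed_ok = (
        ((word >> 24) & 0x3F) == 0b011101   # bits 29:24
        and ((word >> 21) & 0x1) == 0       # bit 21
        and ((word >> 10) & 0x3) == 0b01    # bits 11:10
    )
    op1 = (word >> 22) & 0x3
    if not fixed_ok or op1 not in STAGES:
        return None
    if (word >> 30) & 0x3 != 0:             # sz != '00'
        return "UNDEFINED"
    d = word & 0x1F
    s = (word >> 16) & 0x1F
    n = (word >> 5) & 0x1F
    if d == s or s == n or d == n:          # registers must be distinct
        return "UNDEFINED"
    if 31 in (d, s, n):                     # register 31 is not permitted
        return "UNDEFINED"
    return {"stage": STAGES[op1], "d": d, "s": s, "n": n,
            "options": (word >> 12) & 0xF}
```

For example, the word 0x1D01C440 has op1 == '00', Rs == 1, op2 == '1100', Rn == 2, and Rd == 0, so the sketch decodes it as the prologue instruction with d = 0, s = 1, and n = 2.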

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

CPYPRN, CPYMRN, CPYERN

Memory Copy, reads non-temporal. These instructions perform a memory copy. The prologue, main, and epilogue
instructions are expected to be run in succession and to appear consecutively in memory: CPYPRN, then CPYMRN, and
then CPYERN.
CPYPRN performs some preconditioning of the arguments suitable for using the CPYMRN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYMRN performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYERN performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPRN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
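The saturation and direction rules above can be sketched in Python as follows. This is a minimal illustration of the rules as stated in the text. The function names are hypothetical, and the sketch uses unbounded integer arithmetic, so it does not model any address-width handling that the full pseudocode might apply.

```python
SATURATED_MAX = 0x007FFFFFFFFFFFFF
MASK64 = (1 << 64) - 1


def saturate_size(xn):
    """Saturate the copy size when any of Xn<63:55> is set."""
    xn &= MASK64
    return SATURATED_MAX if (xn >> 55) != 0 else xn


def copy_direction(xs, xd, xn):
    """Return "forward", "backward", or "implementation defined"."""
    size = saturate_size(xn)
    if xs > xd and xd + size > xs:
        # The destination starts below the source and overlaps it.
        return "forward"
    if xs < xd and xs + size > xd:
        # The destination starts above the source and overlaps it.
        return "backward"
    # No overlap (or identical addresses): the choice is left to the implementation.
    return "implementation defined"
```

In both overlapping cases the chosen direction ensures that each source byte is read before the copy overwrites it.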
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPRN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
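As an illustration of the option A register state described above, the following Python sketch computes the values left in Xs, Xd, Xn, and the flags after the prologue. The function name is hypothetical, and the `copied` argument stands in for the IMPLEMENTATION DEFINED number of bytes that the prologue copies.

```python
MASK64 = (1 << 64) - 1
SATURATED_MAX = 0x007FFFFFFFFFFFFF


def prologue_state_option_a(xs, xd, xn, forward, copied):
    """Return the register and flag state after a prologue under option A."""
    size = SATURATED_MAX if (xn >> 55) != 0 else xn
    if not 0 <= copied <= size:
        raise ValueError("copied must lie between 0 and the saturated size")
    flags = {"N": 0, "Z": 0, "C": 0, "V": 0}   # C = 0 encodes option A
    if forward:
        return {
            "Xs": (xs + size) & MASK64,         # original Xs + saturated Xn
            "Xd": (xd + size) & MASK64,         # original Xd + saturated Xn
            "Xn": (copied - size) & MASK64,     # -1 * saturated Xn + bytes copied
            "flags": flags,
        }
    return {
        "Xs": xs,                               # unchanged
        "Xd": xd,                               # unchanged
        "Xn": size - copied,                    # saturated Xn - bytes copied
        "flags": flags,
    }
```

For a forward copy of 0x100 bytes where the prologue copies 0x40 bytes, Xn is left holding -0xC0 as a 64-bit two's complement value.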
After execution of CPYPRN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.

• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYERN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYERN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
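The option B write-back behavior described above can be modelled with a small Python sketch that copies within a `bytearray`. This is an illustration under stated assumptions: the function name is hypothetical, the `nbytes` argument stands in for the IMPLEMENTATION DEFINED amount copied by a stage, and the epilogue is modelled by passing the full remaining count so that Xn is written back with 0.

```python
def copy_stage_option_b(mem, xd, xs, xn, n_flag, nbytes):
    """Copy nbytes within mem and return the updated (xd, xs, xn)."""
    if not 0 <= nbytes <= xn:
        raise ValueError("nbytes must lie between 0 and the remaining count")
    if n_flag == 0:
        # Forward: Xs and Xd hold the lowest addresses not yet copied.
        mem[xd:xd + nbytes] = mem[xs:xs + nbytes]
        return xd + nbytes, xs + nbytes, xn - nbytes
    # Backward: Xs and Xd hold the highest addresses not yet copied, plus 1.
    xd, xs = xd - nbytes, xs - nbytes
    mem[xd:xd + nbytes] = mem[xs:xs + nbytes]
    return xd, xs, xn - nbytes


# Forward copy of 16 bytes from address 0 to address 16 in three stages.
mem_fwd = bytearray(range(16)) + bytearray(16)
state = (16, 0, 16)                                           # (Xd, Xs, Xn)
state = copy_stage_option_b(mem_fwd, *state, 0, 5)            # prologue share
state = copy_stage_option_b(mem_fwd, *state, 0, 7)            # main share
state = copy_stage_option_b(mem_fwd, *state, 0, state[2])     # epilogue: the rest

# Backward copy of 8 overlapping bytes from address 0 to address 4.
mem_bwd = bytearray(range(8)) + bytearray(4)
back = (12, 8, 8)                                             # (Xd, Xs, Xn)
back = copy_stage_option_b(mem_bwd, *back, 1, 3)
back = copy_stage_option_b(mem_bwd, *back, 1, back[2])
```

After the final stage in each run, Xn is 0 and Xs and Xd point just past (forward) or at the start of (backward) the copied ranges.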

Integer
(FEAT_MOPS)

Bits:   31:30 | 29:24  | 23:22 | 21 | 20:16 | 15:12      | 11:10 | 9:5 | 4:0
Field:  sz    | 011101 | op1   | 0  | Rs    | op2 = 1000 | 01    | Rn  | Rd

Epilogue (op1 == 10)

CPYERN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMRN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPRN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYPRT, CPYMRT, CPYERT

Memory Copy, reads unprivileged. These instructions perform a memory copy. The prologue, main, and epilogue
instructions are expected to be run in succession and to appear consecutively in memory: CPYPRT, then CPYMRT, and
then CPYERT.
CPYPRT performs some preconditioning of the arguments suitable for using the CPYMRT instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYMRT performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYERT performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPRT, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPRT, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPRT, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
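As an illustration of the option B register state described above, the following Python sketch computes the values left in Xs, Xd, Xn, and the flags after the prologue. The function name is hypothetical, and the `copied` argument stands in for the IMPLEMENTATION DEFINED number of bytes that the prologue copies.

```python
MASK64 = (1 << 64) - 1
SATURATED_MAX = 0x007FFFFFFFFFFFFF


def prologue_state_option_b(xs, xd, xn, forward, copied):
    """Return the register and flag state after a prologue under option B."""
    size = SATURATED_MAX if (xn >> 55) != 0 else xn
    if not 0 <= copied <= size:
        raise ValueError("copied must lie between 0 and the saturated size")
    if forward:
        return {
            "Xs": (xs + copied) & MASK64,            # original Xs + bytes copied
            "Xd": (xd + copied) & MASK64,            # original Xd + bytes copied
            "Xn": size - copied,                     # saturated Xn - bytes copied
            "flags": {"N": 0, "Z": 0, "C": 1, "V": 0},
        }
    return {
        "Xs": (xs + size - copied) & MASK64,         # original Xs + saturated Xn - bytes copied
        "Xd": (xd + size - copied) & MASK64,         # original Xd + saturated Xn - bytes copied
        "Xn": size - copied,                         # saturated Xn - bytes copied
        "flags": {"N": 1, "Z": 0, "C": 1, "V": 0},   # N = 1 records the backward direction
    }
```

Unlike option A, the direction is carried in PSTATE.N rather than in the sign of Xn, so Xn is always a non-negative remaining byte count.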
For CPYMRT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMRT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.

• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYERT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYERT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

Bits:   31:30 | 29:24  | 23:22 | 21 | 20:16 | 15:12      | 11:10 | 9:5 | 4:0
Field:  sz    | 011101 | op1   | 0  | Rs    | op2 = 0010 | 01    | Rn  | Rd

Epilogue (op1 == 10)

CPYERT [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMRT [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPRT [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;
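
The option A branch of the copy step above (the branch that tests SInt(cpysize) < 0) can be modelled in Python as follows. This is a sketch, not the architecture pseudocode. The function name is hypothetical, a `bytearray` stands in for memory, and the choice of B as the smaller of a fixed block size and the remaining stage size is an assumption made for illustration.

```python
def copy_stage_option_a(mem, toaddress, fromaddress, cpysize, stagecpysize, block):
    """Copy the stage's share of bytes within mem and return the updated cpysize.

    cpysize and stagecpysize are signed: negative for a forward copy and
    positive for a backward copy, as in the option A argument format.
    """
    if stagecpysize != 0 and (stagecpysize < 0) != (cpysize < 0):
        raise ValueError("stagecpysize must have the same sign as cpysize")
    if abs(stagecpysize) > abs(cpysize):
        raise ValueError("stagecpysize must not exceed cpysize in magnitude")
    if block <= 0:
        raise ValueError("block must be positive")
    while stagecpysize != 0:
        b = min(block, abs(stagecpysize))        # assumed choice of B
        if cpysize < 0:
            # Forward: the addresses hold the end of the range, so adding the
            # negative cpysize gives the lowest byte not yet copied.
            src, dst = fromaddress + cpysize, toaddress + cpysize
            mem[dst:dst + b] = mem[src:src + b]
            cpysize += b
            stagecpysize += b
        else:
            # Backward: shrink the count first, then copy the top-most block.
            cpysize -= b
            stagecpysize -= b
            src, dst = fromaddress + cpysize, toaddress + cpysize
            mem[dst:dst + b] = mem[src:src + b]
    return cpysize
```

In the forward case `toaddress` and `fromaddress` never change during the loop. Only `cpysize` moves towards zero, which matches the option A convention that Xs and Xd hold the end addresses of the copy.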

CPYPRTN, CPYMRTN, CPYERTN

Memory Copy, reads unprivileged, reads and writes non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYPRTN, then CPYMRTN, and then CPYERTN.
CPYPRTN performs some preconditioning of the arguments suitable for using the CPYMRTN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYMRTN performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYERTN performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPRTN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPRTN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPRTN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMRTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMRTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.

• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYERTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYERTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

Bits:   31:30 | 29:24  | 23:22 | 21 | 20:16 | 15:12      | 11:10 | 9:5 | 4:0
Field:  sz    | 011101 | op1   | 0  | Rs    | op2 = 1110 | 01    | Rn  | Rd

Epilogue (op1 == 10)

CPYERTN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMRTN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPRTN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

CPYPRTRN, CPYMRTRN, CPYERTRN

Memory Copy, reads unprivileged and non-temporal. These instructions perform a memory copy. The prologue, main,
and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYPRTRN,
then CPYMRTRN, and then CPYERTRN.
CPYPRTRN performs some preconditioning of the arguments suitable for using the CPYMRTRN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMRTRN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYERTRN performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPRTRN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
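The saturation and direction rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: addresses and sizes are treated as plain non-negative integers with no wraparound or address-tag handling, and the `impdef_choice` parameter is a hypothetical stand-in for the IMPLEMENTATION DEFINED choice made when the buffers do not overlap in either of the two listed ways.

```python
MAX_COPY_SIZE = 0x007FFFFFFFFFFFFF  # saturation value quoted in the text


def saturate_size(xn: int) -> int:
    """Saturate the copy size if any of bits <63:55> of Xn is set."""
    if (xn >> 55) & 0x1FF:
        return MAX_COPY_SIZE
    return xn


def choose_direction(xs: int, xd: int, xn: int,
                     impdef_choice: str = "forward") -> str:
    """Return 'forward' or 'backward' using the rule stated in the text.

    impdef_choice is hypothetical: it models the IMPLEMENTATION DEFINED
    choice taken when neither overlap condition holds.
    """
    size = saturate_size(xn)
    if xs > xd and xd + size > xs:
        return "forward"
    if xs < xd and xs + size > xd:
        return "backward"
    return impdef_choice
```

For example, a source that starts 16 bytes above the destination with a 32-byte size overlaps the destination's tail, so the sketch reports a forward copy.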
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPRTRN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPRTRN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
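The register and flag state listed above for the two options can be summarised in a small Python sketch. It assumes the size has already been saturated and the direction already chosen; `copied` is a hypothetical parameter standing for the IMPLEMENTATION DEFINED number of bytes the prologue copies, and the function and type names are illustrative only.

```python
from typing import NamedTuple, Tuple

MASK64 = (1 << 64) - 1


class PrologueState(NamedTuple):
    xs: int
    xd: int
    xn: int                          # 64-bit register value
    nzcv: Tuple[int, int, int, int]  # (N, Z, C, V)


def after_prologue(xs: int, xd: int, size: int, forward: bool,
                   copied: int, option_a: bool) -> PrologueState:
    """Register state after the prologue, following the lists in the text.

    size is the saturated Xn; copied is a hypothetical stand-in for the
    IMPLEMENTATION DEFINED number of bytes copied by the prologue.
    """
    assert 0 <= copied <= size
    if option_a:
        # Option A: PSTATE.C = 0 and N, Z, V are all 0.
        if forward:
            return PrologueState((xs + size) & MASK64, (xd + size) & MASK64,
                                 (-size + copied) & MASK64, (0, 0, 0, 0))
        return PrologueState(xs, xd, size - copied, (0, 0, 0, 0))
    # Option B: PSTATE.C = 1 and N records the direction.
    if forward:
        return PrologueState((xs + copied) & MASK64, (xd + copied) & MASK64,
                             size - copied, (0, 0, 1, 0))
    return PrologueState((xs + size - copied) & MASK64,
                         (xd + size - copied) & MASK64,
                         size - copied, (1, 0, 1, 0))
```

Under option A a forward copy leaves Xn negative (stored here as its 64-bit two's complement value), which is what later stages use to recognise the direction.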
For CPYMRTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMRTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.

• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYERTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYERTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
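As an illustration of the option A argument format across the three stages, the following Python sketch runs a whole copy over a `bytearray`. It is a model under assumptions, not the architectural pseudocode: `pre` and `main` are hypothetical stand-ins for the IMPLEMENTATION DEFINED amounts copied by the prologue and main stages, `block` is an arbitrary per-access size, Xn is kept as a signed Python integer rather than a 64-bit register value, and forward is picked whenever the rule leaves the direction open.

```python
def run_stage_a(mem: bytearray, xd: int, xs: int, xn: int,
                stage_bytes: int, block: int) -> int:
    """Copy stage_bytes bytes using the option A convention; return new Xn."""
    left = stage_bytes
    while left > 0:
        b = min(block, left)
        if xn < 0:
            # Forward: Xs/Xd point past the end, negative Xn indexes back.
            mem[xd + xn:xd + xn + b] = mem[xs + xn:xs + xn + b]
            xn += b
        else:
            # Backward: Xs/Xd are the base addresses, Xn counts down.
            xn -= b
            mem[xd + xn:xd + xn + b] = mem[xs + xn:xs + xn + b]
        left -= b
    return xn


def copy_option_a(mem: bytearray, xd: int, xs: int, size: int,
                  pre: int, main: int, block: int = 8):
    """Run prologue, main and epilogue; return the final (Xd, Xs, Xn)."""
    assert pre >= 0 and main >= 0 and pre + main <= size
    # Backward only when the destination overlaps the source from above;
    # choosing forward otherwise is an assumption of this sketch.
    forward = not (xs < xd and xs + size > xd)
    if forward:
        xs, xd, xn = xs + size, xd + size, -size
    else:
        xn = size
    xn = run_stage_a(mem, xd, xs, xn, pre, block)      # prologue
    xn = run_stage_a(mem, xd, xs, xn, main, block)     # main
    xn = run_stage_a(mem, xd, xs, xn, abs(xn), block)  # epilogue
    return xd, xs, xn
```

Whatever split is chosen between the stages, the epilogue always finishes with Xn equal to 0, and Xs and Xd are never modified after the prologue.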

Integer
(FEAT_MOPS)

Bits    31:30  29:24   23:22  21  20:16  15:12        11:10  9:5  4:0
Field   sz     011101  op1    0   Rs     1010 (op2)   01     Rn   Rd

Epilogue (op1 == 10)

CPYERTRN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMRTRN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPRTRN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYPRTWN, CPYMRTWN, CPYERTWN

Memory Copy, reads unprivileged, writes non-temporal. These instructions perform a memory copy. The prologue,
main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory:
CPYPRTWN, then CPYMRTWN, and then CPYERTWN.
CPYPRTWN performs some preconditioning of the arguments suitable for using the CPYMRTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMRTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYERTWN performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPRTWN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPRTWN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPRTWN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMRTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
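The option A format above packs the direction, the remaining byte count, and the next address to be accessed into Xs, Xd, and Xn. The Python sketch below unpacks them; the function names and the returned tuple layout are illustrative assumptions, and a zero Xn is reported as "none" because the text only defines the negative and positive cases.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed number."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def option_a_view(xs: int, xd: int, xn: int):
    """Return (direction, remaining, next_from, next_to) for option A.

    For a forward copy the next addresses are the lowest ones still to be
    copied; for a backward copy they are the highest ones.
    """
    n = to_signed64(xn)
    if n < 0:
        # Xs = lowest source address - Xn, so adding Xn recovers it.
        return ("forward", -n, (xs + n) & MASK64, (xd + n) & MASK64)
    if n > 0:
        # Xs = highest source address - Xn + 1.
        return ("backward", n, (xs + n - 1) & MASK64, (xd + n - 1) & MASK64)
    return ("none", 0, None, None)
```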
For CPYMRTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.

• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYERTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYERTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

Bits    31:30  29:24   23:22  21  20:16  15:12        11:10  9:5  4:0
Field   sz     011101  op1    0   Rs     0110 (op2)   01     Rn   Rd

Epilogue (op1 == 10)

CPYERTWN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMRTWN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPRTWN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
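The decode steps above can be mirrored in a short Python sketch that splits a 32-bit instruction word into its fields. The field positions are an assumption read off the encoding diagram (sz in bits 31:30, fixed pattern 011101 in 29:24, op1 in 23:22, a fixed 0 in bit 21, Rs in 20:16, op2 in 15:12, fixed 01 in 11:10, Rn in 9:5, Rd in 4:0). The `Undefined` exception and the returned dictionary are hypothetical, and the `HaveFeatMOPS()` check is not modelled.

```python
STAGES = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}


class Undefined(Exception):
    """Hypothetical marker for the UNDEFINED outcomes in the decode text."""


def decode_cpy(word: int) -> dict:
    """Decode a memory copy instruction word (layout assumed from the diagram)."""
    sz = (word >> 30) & 0x3
    op1 = (word >> 22) & 0x3
    rs = (word >> 16) & 0x1F
    op2 = (word >> 12) & 0xF
    rn = (word >> 5) & 0x1F
    rd = word & 0x1F

    fixed_ok = (((word >> 24) & 0x3F) == 0b011101
                and ((word >> 21) & 0x1) == 0
                and ((word >> 10) & 0x3) == 0b01)
    if not fixed_ok:
        raise ValueError("word is not in this encoding class")
    if sz != 0:
        raise Undefined("sz != '00'")
    if op1 not in STAGES:
        raise ValueError('SEE "Memory Copy and Memory Set"')
    if rd == rs or rs == rn or rd == rn:
        raise Undefined("register operands must be distinct")
    if 31 in (rd, rs, rn):
        raise Undefined("register 31 is not permitted")
    return {"stage": STAGES[op1], "d": rd, "s": rs, "n": rn, "options": op2}
```

With this layout, a prologue word using Rd = 0, Rs = 1, Rn = 2 and op2 = 0110 is 0x1D016440, and setting bit 22 selects the main stage.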

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYPT, CPYMT, CPYET

Memory Copy, reads and writes unprivileged. These instructions perform a memory copy. The prologue, main, and
epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYPT, then
CPYMT, and then CPYET.
CPYPT performs some preconditioning of the arguments suitable for using the CPYMT instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYMT performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYET performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.
For CPYPT, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPT, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPT, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.


• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYET, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYET, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
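The option B convention described above, in which Xs and Xd advance with the copy and PSTATE.N records the direction, can be modelled end to end in Python. This is a sketch under assumptions: `pre` and `main` are hypothetical stand-ins for the IMPLEMENTATION DEFINED amounts copied by the prologue and main stages, `block` is an arbitrary per-access size, memory is a `bytearray`, and forward is chosen whenever the direction rule leaves the choice open.

```python
def run_stage_b(mem: bytearray, xd: int, xs: int, xn: int,
                backward: bool, stage_bytes: int, block: int):
    """Copy stage_bytes bytes using the option B convention."""
    left = stage_bytes
    while left > 0:
        b = min(block, left)
        if backward:
            # Xs/Xd hold the highest uncopied address + 1, so step down first.
            xs, xd = xs - b, xd - b
            mem[xd:xd + b] = mem[xs:xs + b]
        else:
            # Xs/Xd hold the lowest uncopied address, so step up afterwards.
            mem[xd:xd + b] = mem[xs:xs + b]
            xs, xd = xs + b, xd + b
        xn -= b
        left -= b
    return xd, xs, xn


def copy_option_b(mem: bytearray, xd: int, xs: int, size: int,
                  pre: int, main: int, block: int = 8):
    """Run prologue, main and epilogue; return the final (Xd, Xs, Xn, N)."""
    assert pre >= 0 and main >= 0 and pre + main <= size
    backward = xs < xd and xs + size > xd  # models PSTATE.N
    if backward:
        xs, xd = xs + size, xd + size
    xn = size
    for stage_bytes in (pre, main):        # prologue, then main
        xd, xs, xn = run_stage_b(mem, xd, xs, xn, backward, stage_bytes, block)
    xd, xs, xn = run_stage_b(mem, xd, xs, xn, backward, xn, block)  # epilogue
    return xd, xs, xn, int(backward)
```

After the epilogue Xn is 0; a forward copy leaves Xs and Xd one past the copied ranges, while a backward copy leaves them at the original base addresses.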

Integer
(FEAT_MOPS)

Bits    31:30  29:24   23:22  21  20:16  15:12        11:10  9:5  4:0
Field   sz     011101  op1    0   Rs     0011 (op2)   01     Rn   Rd


Epilogue (op1 == 10)

CPYET [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMT [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPT [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols


Operation


CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.


bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

CPYPTN, CPYMTN, CPYETN

Memory Copy, reads and writes unprivileged and non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYPTN, then CPYMTN, and then CPYETN.
CPYPTN performs some preconditioning of the arguments suitable for using the CPYMTN instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory copy. CPYMTN performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYETN performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.

For CPYPTN, the following saturation logic is applied:
    If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
    If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
    Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
    Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
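
The saturation and direction rules above can be sketched in Python as follows. This is an illustrative model only, not the architectural pseudocode: the names `saturate_size` and `choose_direction` are hypothetical, addresses are treated as plain unsigned integers (no tag bits, no 64-bit wraparound), and the IMPLEMENTATION DEFINED case is resolved by a caller-supplied default.

```python
SATURATED_MAX = 0x007FFFFFFFFFFFFF  # value that Xn is saturated to


def saturate_size(xn: int) -> int:
    """Saturate the copy size if any of bits <63:55> of Xn are set."""
    if (xn >> 55) & 0x1FF != 0:
        return SATURATED_MAX
    return xn


def choose_direction(xs: int, xd: int, xn: int, impdef_default: str = "forward") -> str:
    """Return 'forward' or 'backward' following the rules quoted above."""
    size = saturate_size(xn)
    if xs > xd and xd + size > xs:
        return "forward"       # source starts inside the destination range
    if xs < xd and xs + size > xd:
        return "backward"      # destination starts inside the source range
    return impdef_default      # no such overlap: IMPLEMENTATION DEFINED choice
```

With overlapping buffers the direction is forced; otherwise the model simply returns whatever default the caller supplies, which stands in for the implementation's own choice.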
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.

After execution of CPYPTN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPTN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
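
As a hedged illustration of the two lists above, the following Python sketch computes the register and flag state left by the prologue. The names `PrologueState` and `after_prologue` are hypothetical, `size` is the already saturated Xn, `copied` stands for the IMPLEMENTATION DEFINED number of bytes copied, and 64-bit wraparound is ignored.

```python
from typing import NamedTuple


class PrologueState(NamedTuple):
    xs: int
    xd: int
    xn: int      # signed value; negative for option A forward copies
    nzcv: str    # PSTATE.{N,Z,C,V} as a 4-character bit string


def after_prologue(xs: int, xd: int, size: int, copied: int,
                   option: str, forward: bool) -> PrologueState:
    """Model the state after the prologue, per the option A / option B lists."""
    if option == "A":                      # PSTATE.C = 0, N = Z = V = 0
        if forward:
            return PrologueState(xs + size, xd + size, -size + copied, "0000")
        return PrologueState(xs, xd, size - copied, "0000")
    # Option B: PSTATE.C = 1 and PSTATE.N records the direction.
    if forward:
        return PrologueState(xs + copied, xd + copied, size - copied, "0010")
    return PrologueState(xs + size - copied, xd + size - copied,
                         size - copied, "1010")
```

Note how option A encodes the direction in the sign of Xn, while option B keeps Xn non-negative and records the direction in PSTATE.N.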
For CPYMTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYETN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYETN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

31 30 | 29 28 27 26 25 24 | 23 22 | 21 | 20 19 18 17 16 | 15 14 13 12 | 11 10 | 9 8 7 6 5 | 4 3 2 1 0
sz    |  0  1  1  1  0  1 | op1   |  0 | Rs             |  1  1  1  1 |  0  1 | Rn        | Rd
      |                   |       |    |                | op2         |       |           |
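
The field layout in the diagram above can be exercised with a small Python encoder. This is a sketch under stated assumptions: the function name `encode_cpy_tn` and the `OP1` table are hypothetical, the bit positions are read off the diagram, and the register checks mirror the decode rules for these instructions (distinct registers, none equal to 31).

```python
OP1 = {"prologue": 0b00, "main": 0b01, "epilogue": 0b10}


def encode_cpy_tn(stage: str, rd: int, rs: int, rn: int, op2: int = 0b1111) -> int:
    """Assemble the 32-bit word for CPYPTN/CPYMTN/CPYETN from its fields."""
    regs = (rd, rs, rn)
    if any(not 0 <= r <= 30 for r in regs) or len(set(regs)) != 3:
        raise ValueError("registers must be distinct and in the range X0-X30")
    word = 0b00 << 30            # sz
    word |= 0b011101 << 24       # fixed bits 29:24
    word |= OP1[stage] << 22     # op1 selects prologue, main, or epilogue
    word |= rs << 16             # Rs (bit 21 stays 0)
    word |= op2 << 12            # op2, 1111 for this variant
    word |= 0b01 << 10           # fixed bits 11:10
    word |= rn << 5              # Rn
    word |= rd                   # Rd
    return word
```

Under this model, `encode_cpy_tn("prologue", rd=0, rs=1, rn=2)` yields `0x1D01F440`; the main and epilogue words differ only in the op1 field.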

Epilogue (op1 == 10)

CPYETN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMTN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPTN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYPTRN, CPYMTRN, CPYETRN

Memory Copy, reads and writes unprivileged, reads non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYPTRN, then CPYMTRN, and then CPYETRN.
CPYPTRN performs some preconditioning of the arguments suitable for using the CPYMTRN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYMTRN performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYETRN performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.

For CPYPTRN, the following saturation logic is applied:
    If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
    If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
    Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
    Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.

After execution of CPYPTRN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPTRN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
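
The option A argument format can be hard to read at first, because Xs and Xd are offset by Xn. The following Python sketch decodes the three registers into the direction, the bytes remaining, and the next source and destination byte that would be copied. The names `to_signed64` and `option_a_view` are hypothetical, and the zero case (nothing left to copy) is reported with a direction of `"none"` purely as a convenience of this model.

```python
def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed number."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


def option_a_view(xs: int, xd: int, xn: int):
    """Return (direction, remaining, next_src, next_dst) for option A."""
    n = to_signed64(xn)
    if n == 0:
        return ("none", 0, None, None)
    if n < 0:
        # Forward: Xs and Xd point one past the end, so the lowest byte
        # still to be copied sits at Xs + Xn (Xn is negative).
        return ("forward", -n, xs + n, xd + n)
    # Backward: the highest byte still to be copied sits at Xs + Xn - 1.
    return ("backward", n, xs + n - 1, xd + n - 1)
```

For example, a forward copy with 48 bytes remaining and Xs = 0x2040 has its next source byte at 0x2010 in this model.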
For CPYMTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
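
The option B write-back rules can be illustrated with a small Python model that performs one step of the copy on a `bytearray` and returns the updated registers. The function name `option_b_step` is hypothetical, `copied` stands for the IMPLEMENTATION DEFINED number of bytes handled by the instruction, and memory is modelled as a flat byte array with no access-type or fault behavior.

```python
def option_b_step(mem: bytearray, xs: int, xd: int, xn: int,
                  pstate_n: int, copied: int):
    """Copy `copied` bytes and return the written-back (Xs, Xd, Xn)."""
    if not 0 <= copied <= xn:
        raise ValueError("copied must be between 0 and the remaining size")
    if pstate_n == 0:
        # Forward: Xs/Xd hold the lowest addresses still to be copied.
        mem[xd:xd + copied] = mem[xs:xs + copied]
        return xs + copied, xd + copied, xn - copied
    # Backward: Xs/Xd hold the highest addresses still to be copied, plus 1.
    mem[xd - copied:xd] = mem[xs - copied:xs]
    return xs - copied, xd - copied, xn - copied
```

In both directions Xn stays a plain non-negative byte count, which is the main difference from option A.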
For CPYETRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYETRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

31 30 | 29 28 27 26 25 24 | 23 22 | 21 | 20 19 18 17 16 | 15 14 13 12 | 11 10 | 9 8 7 6 5 | 4 3 2 1 0
sz    |  0  1  1  1  0  1 | op1   |  0 | Rs             |  1  0  1  1 |  0  1 | Rn        | Rd
      |                   |       |    |                | op2         |       |           |

Epilogue (op1 == 10)

CPYETRN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMTRN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPTRN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
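
The decode steps above can be mirrored in Python to show which words are accepted. This is a minimal sketch, not an emulator: the names `Undefined`, `STAGES` and `decode_cpy` are hypothetical, the fixed-bit check uses the positions from the encoding diagram, and the `have_mops` flag stands in for `HaveFeatMOPS()`.

```python
class Undefined(Exception):
    """Stands in for the UNDEFINED outcome of the decode pseudocode."""


STAGES = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}


def decode_cpy(word: int, have_mops: bool = True) -> dict:
    # Fixed bits taken from the encoding diagram: 29:24, 21 and 11:10.
    if ((word >> 24) & 0x3F != 0b011101 or (word >> 21) & 1
            or (word >> 10) & 0b11 != 0b01):
        raise ValueError("not in this encoding group")
    if not have_mops:
        raise Undefined("FEAT_MOPS not implemented")
    if (word >> 30) & 0b11 != 0b00:
        raise Undefined("sz != '00'")
    op1 = (word >> 22) & 0b11
    if op1 not in STAGES:
        raise LookupError('see "Memory Copy and Memory Set"')
    d, s, n = word & 0x1F, (word >> 16) & 0x1F, (word >> 5) & 0x1F
    if d == s or s == n or d == n:
        raise Undefined("Rd, Rs and Rn must be distinct")
    if 31 in (d, s, n):
        raise Undefined("register 31 is not allowed")
    return {"stage": STAGES[op1], "d": d, "s": s, "n": n,
            "options": (word >> 12) & 0xF}
```

The `options` value is passed through unchanged, as in the pseudocode, where it is later consumed by `MemCpyAccessTypes(options)`.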

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYPTWN, CPYMTWN, CPYETWN

Memory Copy, reads and writes unprivileged, writes non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYPTWN, then CPYMTWN, and then CPYETWN.
CPYPTWN performs some preconditioning of the arguments suitable for using the CPYMTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYETWN performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.

For CPYPTWN, the following saturation logic is applied:
    If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
    If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
    Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
    Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.

After execution of CPYPTWN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPTWN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYETWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
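
To make the option A epilogue concrete, the sketch below copies everything that remains and leaves Xn at 0, as the list above requires. It is a simplified model and not the architectural pseudocode: the function name `option_a_epilogue` is hypothetical, `xn` is passed as an already signed Python integer, `block` stands in for the per-iteration byte count B, and memory is a flat `bytearray`.

```python
def option_a_epilogue(mem: bytearray, xs: int, xd: int, xn: int, block: int = 4):
    """Finish an option A copy; returns the written-back (Xs, Xd, Xn)."""
    while xn != 0:
        b = min(block, abs(xn))
        if xn < 0:
            # Forward: copy from the lowest remaining address, then move
            # Xn up towards 0.
            src, dst = xs + xn, xd + xn
            mem[dst:dst + b] = mem[src:src + b]
            xn += b
        else:
            # Backward: move Xn down first, then copy the highest
            # remaining block.
            xn -= b
            src, dst = xs + xn, xd + xn
            mem[dst:dst + b] = mem[src:src + b]
    return xs, xd, xn
```

Xs and Xd are returned unchanged, which matches the fact that the option A lists only describe a write-back for Xn.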
For CPYETWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

31 30 | 29 28 27 26 25 24 | 23 22 | 21 | 20 19 18 17 16 | 15 14 13 12 | 11 10 | 9 8 7 6 5 | 4 3 2 1 0
sz    |  0  1  1  1  0  1 | op1   |  0 | Rs             |  0  1  1  1 |  0  1 | Rn        | Rd
      |                   |       |    |                | op2         |       |           |
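
Comparing the op2 values in the CPYPTN (1111), CPYPTRN (1011), and CPYPTWN (0111) diagrams with their descriptions suggests how the non-temporal hints are encoded. The Python sketch below records that comparison. It is an inference from those three encodings only, not a statement of how `MemCpyAccessTypes` is defined; the names `VARIANTS` and `describe_op2` are hypothetical. Bits 1:0 are set in all three variants, which all read and write unprivileged.

```python
# op2 values taken from the three encoding diagrams in this part of the
# document; the suffix is the part of the mnemonic after CPYP/CPYM/CPYE.
VARIANTS = {0b1111: "TN", 0b1011: "TRN", 0b0111: "TWN"}


def describe_op2(op2: int) -> dict:
    """Summarize an op2 value using the pattern seen in the three diagrams."""
    return {
        "suffix": VARIANTS.get(op2),                 # None if not listed here
        "read_non_temporal": bool(op2 & 0b1000),     # inferred: bit 3
        "write_non_temporal": bool(op2 & 0b0100),    # inferred: bit 2
    }
```

For op2 values outside these three variants the sketch returns a suffix of `None` rather than guessing a mnemonic.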

Epilogue (op1 == 10)

CPYETWN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMTWN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPTWN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();
(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then
    if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

    // IMP DEF selection of the amount covered by pre-processing.
    stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
    assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

    if SInt(cpysize) > 0 then
        assert SInt(stagecpysize) <= SInt(cpysize);
    else
        assert SInt(stagecpysize) >= SInt(cpysize);
else
    boolean zero_size_exceptions = MemCpyZeroSizeCheck();

    // Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYPWN, CPYMWN, CPYEWN

Memory Copy, writes non-temporal. These instructions perform a memory copy. The prologue, main, and epilogue
instructions are expected to be run in succession and to appear consecutively in memory: CPYPWN, then CPYMWN,
and then CPYEWN.
CPYPWN performs some preconditioning of the arguments suitable for using the CPYMWN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWN performs an IMPLEMENTATION DEFINED amount of
the memory copy. CPYEWN performs the last part of the memory copy.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory copy allows some optimization of the size that can be
performed.

For CPYPWN, the following saturation logic is applied:
    If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
    If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
    Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
    Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.

After execution of CPYPWN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPWN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
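The register and flag values listed above for the prologue can be tabulated with a small Python model. It is a sketch under stated assumptions: prologue_writeback is a hypothetical helper, size is the already-saturated Xn, copied stands in for the IMPLEMENTATION DEFINED number of bytes copied, and Xn is returned as a signed integer rather than a 64-bit pattern.

```python
def prologue_writeback(xs, xd, size, copied, direction, option):
    """Return the register and flag state after the prologue instruction."""
    assert 0 <= copied <= size
    if option == "A":
        flags = {"N": 0, "Z": 0, "C": 0, "V": 0}
        if direction == "forward":
            # Pointers move to the end of the buffers; Xn counts up towards zero.
            regs = {"Xs": xs + size, "Xd": xd + size, "Xn": -size + copied}
        else:
            # Pointers are unchanged; Xn counts down towards zero.
            regs = {"Xs": xs, "Xd": xd, "Xn": size - copied}
    else:  # option B
        if direction == "forward":
            flags = {"N": 0, "Z": 0, "C": 1, "V": 0}
            regs = {"Xs": xs + copied, "Xd": xd + copied, "Xn": size - copied}
        else:
            flags = {"N": 1, "Z": 0, "C": 1, "V": 0}
            regs = {"Xs": xs + size - copied, "Xd": xd + size - copied,
                    "Xn": size - copied}
    return {**regs, **flags}
```

With a 64-byte copy of which the prologue copies 16 bytes, option A forward leaves Xn holding -48, while option B forward leaves Xn holding 48 and advances both pointers by 16.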
For CPYMWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is copied to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
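Under option A, the remaining source and destination ranges are implied by Xs, Xd, and the sign of Xn. The following Python sketch recovers them from the argument format described above. The helper name option_a_remaining is hypothetical, Xn is passed as a signed integer, and treating Xn == 0 as "nothing remaining" is an assumption, since the text only covers negative and positive values.

```python
def option_a_remaining(xs: int, xd: int, xn: int):
    """Return (direction, src_lo, dst_lo, remaining) for option A arguments.

    src_lo and dst_lo are the lowest addresses still to be copied, so the
    remaining ranges are [src_lo, src_lo + remaining) and likewise for dst.
    """
    if xn < 0:
        # Forward: Xs and Xd point just past the end of the buffers,
        # because each holds the lowest remaining address minus Xn.
        remaining = -xn
        return ("forward", xs + xn, xd + xn, remaining)
    # Backward: Xs and Xd hold the highest remaining address - Xn + 1,
    # which is the lowest remaining address.
    return ("backward", xs, xd, xn)
```

For example, Xs = 0x1040 with Xn = -48 describes a forward copy whose next source byte is at 0x1010.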
For CPYMWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYEWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is copied to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYEWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

Bits      31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
Encoding  sz     011101  op1    0   Rs     0100 (op2)  01     Rn   Rd

Epilogue (op1 == 10)

CPYEWN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMWN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPWN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
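The decode steps above can be mirrored in a short Python sketch. It is illustrative only: decode_cpy and UndefinedInstruction are hypothetical names, the bit positions are read off the encoding diagram (sz at 31:30, op1 at 23:22, Rs at 20:16, op2 at 15:12, Rn at 9:5, Rd at 4:0), and the sketch does not check the fixed opcode bits or whether FEAT_MOPS is implemented.

```python
class UndefinedInstruction(Exception):
    """Stands in for the UNDEFINED outcome in the decode pseudocode."""


STAGES = {0b00: "prologue", 0b01: "main", 0b10: "epilogue"}


def decode_cpy(word: int) -> dict:
    """Extract the stage, options, and register numbers from a 32-bit word."""
    sz = (word >> 30) & 0b11
    op1 = (word >> 22) & 0b11
    s = (word >> 16) & 0x1F
    options = (word >> 12) & 0xF
    n = (word >> 5) & 0x1F
    d = word & 0x1F

    if sz != 0b00:
        raise UndefinedInstruction("sz != '00'")
    if op1 not in STAGES:
        # The pseudocode redirects to another section for op1 == '11'.
        raise LookupError('SEE "Memory Copy and Memory Set"')
    if d == s or s == n or d == n:
        raise UndefinedInstruction("register numbers must be distinct")
    if 31 in (d, s, n):
        raise UndefinedInstruction("register 31 is not permitted")
    return {"stage": STAGES[op1], "options": options, "d": d, "s": s, "n": n}
```

Under these assumptions, the word 0x1D014440 decodes as a prologue instruction with options 0b0100, d = 0, s = 1, and n = 2.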

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

CPYPWT, CPYMWT, CPYEWT

Memory Copy, writes unprivileged. These instructions perform a memory copy. The prologue, main, and epilogue
instructions are expected to be run in succession and to appear consecutively in memory: CPYPWT, then CPYMWT, and
then CPYEWT.
CPYPWT performs some preconditioning of the arguments suitable for using the CPYMWT instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWT performs an IMPLEMENTATION DEFINED amount of the
memory copy. CPYEWT performs the last part of the memory copy.

Note

Allowing each instruction to copy an IMPLEMENTATION DEFINED amount lets an implementation optimize how much of the
copy each stage performs.
For CPYPWT, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWT, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPWT, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMWT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is copied to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMWT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
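The option B argument format above can be modelled as a single step of the main instruction acting on a byte buffer. This is a hedged sketch, not the architectural behavior: option_b_main_step is a hypothetical helper, mem is a Python bytearray standing in for memory, and b stands in for the IMPLEMENTATION DEFINED number of bytes copied by this step.

```python
def option_b_main_step(mem: bytearray, xs: int, xd: int, xn: int,
                       n_flag: int, b: int):
    """Copy b bytes under option B and return the written-back (Xs, Xd, Xn)."""
    assert 0 <= b <= xn
    if n_flag == 0:
        # Forward: Xs and Xd are the lowest addresses not yet copied.
        mem[xd:xd + b] = mem[xs:xs + b]
        return (xs + b, xd + b, xn - b)
    # Backward: Xs and Xd are the highest addresses not yet copied, plus 1.
    mem[xd - b:xd] = mem[xs - b:xs]
    return (xs - b, xd - b, xn - b)
```

Each call leaves the registers in the same format it consumed, so the step can be repeated until Xn reaches the amount reserved for the epilogue.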
For CPYEWT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is copied to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYEWT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

Bits      31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
Encoding  sz     011101  op1    0   Rs     0001 (op2)  01     Rn   Rd
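As a worked example of the encoding diagram, the following Python sketch assembles the 32-bit word from its fields. It assumes the field positions read off the diagram (op2 at bits 15:12, followed by the fixed bits 01 at 11:10) and sz = 00 as required by the decode. The helper encode_cpy is hypothetical and its results have not been checked against an assembler.

```python
OP1 = {"P": 0b00, "M": 0b01, "E": 0b10}  # prologue, main, epilogue


def encode_cpy(stage: str, op2: int, rd: int, rs: int, rn: int) -> int:
    """Build the instruction word for one stage of a memory copy."""
    assert 0 <= op2 <= 0xF
    assert all(0 <= r <= 30 for r in (rd, rs, rn))  # register 31 is UNDEFINED
    assert len({rd, rs, rn}) == 3                   # registers must be distinct
    return ((0b00 << 30)          # sz
            | (0b011101 << 24)    # fixed bits 29:24
            | (OP1[stage] << 22)  # op1 selects the stage
            | (0 << 21)           # fixed bit 21
            | (rs << 16)
            | (op2 << 12)
            | (0b01 << 10)        # fixed bits 11:10
            | (rn << 5)
            | rd)
```

With op2 = 0b0001 (the value shown for the WT forms), Xd = X0, Xs = X1, and Xn = X2, the three stages differ only in bits 23:22.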


Epilogue (op1 == 10)

CPYEWT [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMWT [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPWT [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.


bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;


if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;


CPYPWTN, CPYMWTN, CPYEWTN

Memory Copy, writes unprivileged, reads and writes non-temporal. These instructions perform a memory copy. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: CPYPWTN, then CPYMWTN, and then CPYEWTN.
CPYPWTN performs some preconditioning of the arguments suitable for using the CPYMWTN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWTN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYEWTN performs the last part of the memory copy.

Note

Allowing each instruction to copy an IMPLEMENTATION DEFINED amount lets an implementation optimize how much of the
copy each stage performs.
For CPYPWTN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWTN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPWTN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
For CPYMWTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is copied to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMWTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYEWTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is copied to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
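To show how the option A register formats fit together across the prologue, main, and epilogue, the following Python sketch runs all three stages on a byte buffer. It is a simplified model under stated assumptions: copy_option_a is a hypothetical helper, the direction is supplied by the caller, the size is assumed to be already saturated, and the chunks tuple stands in for the IMPLEMENTATION DEFINED amounts copied by the prologue and main stages.

```python
def copy_option_a(mem: bytearray, xs: int, xd: int, size: int,
                  direction: str, chunks=(3, 5)):
    """Run prologue, main, and epilogue under option A; return (Xs, Xd, Xn)."""
    # Prologue preconditioning: forward copies move the pointers to the end
    # of the buffers and negate the size; backward copies leave them as-is.
    if direction == "forward":
        xs, xd, xn = xs + size, xd + size, -size
    else:
        xn = size

    def step(b: int) -> None:
        nonlocal xn
        if xn < 0:
            b = min(b, -xn)
            mem[xd + xn:xd + xn + b] = mem[xs + xn:xs + xn + b]
            xn += b  # counts up towards zero
        else:
            b = min(b, xn)
            xn -= b  # counts down towards zero
            mem[xd + xn:xd + xn + b] = mem[xs + xn:xs + xn + b]

    step(chunks[0])  # amount copied by the prologue
    step(chunks[1])  # amount copied by the main instruction
    step(abs(xn))    # the epilogue copies whatever remains
    return (xs, xd, xn)
```

In both directions Xs and Xd stay fixed after the prologue and only Xn changes, reaching 0 after the epilogue as the argument format above requires.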
For CPYEWTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

Bits      31:30  29:24   23:22  21  20:16  15:12       11:10  9:5  4:0
Encoding  sz     011101  op1    0   Rs     1101 (op2)  01     Rn   Rd
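Comparing the op2 values of the WN (0100), WT (0001), WTN (1101), and WTRN (1001) encodings with their descriptions suggests that individual op2 bits select the access attributes. The Python sketch below encodes that reading. It is an inference from these four encodings, not a statement of the architectural definition: the constant names and describe_op2 are hypothetical, and bit 1 is left uninterpreted because none of the four encodings sets it.

```python
WRITE_UNPRIVILEGED = 0b0001  # set in WT, WTN, WTRN
WRITE_NONTEMPORAL = 0b0100   # set in WN, WTN
READ_NONTEMPORAL = 0b1000    # set in WTN, WTRN


def describe_op2(op2: int) -> list:
    """List the access attributes implied by op2 under the inferred mapping."""
    attributes = []
    if op2 & WRITE_UNPRIVILEGED:
        attributes.append("writes unprivileged")
    if op2 & READ_NONTEMPORAL:
        attributes.append("reads non-temporal")
    if op2 & WRITE_NONTEMPORAL:
        attributes.append("writes non-temporal")
    return attributes
```

Under this mapping, op2 = 1101 yields writes unprivileged with non-temporal reads and writes, which matches the description of the WTN forms.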

Epilogue (op1 == 10)

CPYEWTN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMWTN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPWTN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

CPYPWTRN, CPYMWTRN, CPYEWTRN

Memory Copy, writes unprivileged, reads non-temporal. These instructions perform a memory copy. The prologue,
main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory:
CPYPWTRN, then CPYMWTRN, and then CPYEWTRN.
CPYPWTRN performs some preconditioning of the arguments suitable for using the CPYMWTRN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWTRN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYEWTRN performs the last part of the memory copy.

Note

Allowing each instruction to copy an IMPLEMENTATION DEFINED amount lets an implementation optimize how much of the
copy each stage performs.
For CPYPWTRN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWTRN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPWTRN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
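The flags written by the prologue tell the later stages which algorithm and direction are in use: PSTATE.C selects option A or option B, and the direction comes from the sign of Xn under option A or from PSTATE.N under option B. A minimal Python sketch of that interpretation follows. The helper names to_signed64 and copy_state are hypothetical, and Xn is passed as a raw 64-bit register value.

```python
def to_signed64(value: int) -> int:
    """Interpret a 64-bit register value as a signed integer."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


def copy_state(c_flag: int, n_flag: int, xn: int):
    """Return (option, direction) as the main or epilogue stage would read it."""
    if c_flag == 0:
        # Option A: a negative Xn means forward, a positive Xn means backward.
        return ("A", "forward" if to_signed64(xn) < 0 else "backward")
    # Option B: the direction is carried in PSTATE.N.
    return ("B", "forward" if n_flag == 0 else "backward")
```

For example, C = 0 with Xn = 0xFFFFFFFFFFFFFFD0 (that is, -48) describes an option A forward copy with 48 bytes remaining.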
For CPYMWTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is copied to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMWTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYEWTRN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
For CPYEWTRN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

Bits   31:30  29:24   23:22  21  20:16  15:12  11:10  9:5  4:0
Value  sz     011101  op1    0   Rs     1001   01     Rn   Rd
Field                                   op2

Epilogue (op1 == 10)

CPYEWTRN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMWTRN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPWTRN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;
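The decode checks above can be sketched in Python as follows. This is a minimal illustration, not a definitive decoder: the function name and the dictionary result are assumptions, the fixed opcode bits are not validated, and the HaveFeatMOPS() check is omitted.

```python
def decode_cpy(word: int):
    """Extract the variable fields of a CPYP/CPYM/CPYE word; raise on UNDEFINED."""
    sz = (word >> 30) & 0x3
    op1 = (word >> 22) & 0x3
    s = (word >> 16) & 0x1F      # Rs
    op2 = (word >> 12) & 0xF     # options
    n = (word >> 5) & 0x1F       # Rn
    d = word & 0x1F              # Rd
    if sz != 0:
        raise ValueError("UNDEFINED: sz != '00'")
    stages = {0b00: "Prologue", 0b01: "Main", 0b10: "Epilogue"}
    if op1 not in stages:
        raise ValueError("op1 == '11': see Memory Copy and Memory Set")
    if d == s or s == n or d == n:
        raise ValueError("UNDEFINED: Rd, Rs, and Rn must be distinct")
    if 31 in (d, s, n):
        raise ValueError("UNDEFINED: register 31 is not permitted")
    return {"stage": stages[op1], "d": d, "s": s, "n": n, "options": op2}
```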

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

CPYPWTWN, CPYMWTWN, CPYEWTWN

Memory Copy, writes unprivileged and non-temporal. These instructions perform a memory copy. The prologue, main,
and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYPWTWN,
then CPYMWTWN, and then CPYEWTWN.
CPYPWTWN performs some preconditioning of the arguments suitable for using the CPYMWTWN instruction, and
performs an IMPLEMENTATION DEFINED amount of the memory copy. CPYMWTWN performs an IMPLEMENTATION DEFINED
amount of the memory copy. CPYEWTWN performs the last part of the memory copy.

Note

Allowing each instruction to copy an IMPLEMENTATION DEFINED amount lets an implementation optimize how much of the
copy each stage performs.
For CPYPWTWN, the following saturation logic is applied:
If Xn<63:55> != 000000000, the copy size Xn is saturated to 0x007FFFFFFFFFFFFF.
After that saturation logic is applied, the direction of the memory copy is based on the following algorithm:
If (Xs > Xd) && (Xd + saturated Xn) > Xs, then direction = forward
Elsif (Xs < Xd) && (Xs + saturated Xn) > Xd, then direction = backward
Else direction = IMPLEMENTATION DEFINED choice between forward and backward.
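The saturation and direction rules above can be sketched in Python. This is a hedged illustration: the function names and the `impdef_choice` parameter (standing in for the IMPLEMENTATION DEFINED choice) are assumptions, and addresses are modeled as unbounded non-negative integers, so address tags and wraparound are ignored.

```python
SATURATED_SIZE = 0x007FFFFFFFFFFFFF

def saturate_copy_size(xn: int) -> int:
    """Saturate Xn when any of bits <63:55> is set."""
    if (xn >> 55) & 0x1FF != 0:
        return SATURATED_SIZE
    return xn

def copy_direction(xs: int, xd: int, xn: int, impdef_choice: str = "forward") -> str:
    """Pick the copy direction from the source, destination, and size."""
    size = saturate_copy_size(xn)
    if xs > xd and xd + size > xs:
        return "forward"     # destination overlaps the start of the source
    if xs < xd and xs + size > xd:
        return "backward"    # source overlaps the start of the destination
    return impdef_choice     # no overlap: either direction is safe
```

The two overlap cases are the ones where only one direction preserves source bytes before they are read.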
The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of CPYPWTWN, option A (which results in encoding PSTATE.C = 0):
• PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + saturated Xn.
◦ Xd holds the original Xd + saturated Xn.
◦ Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes copied.
• If the copy is in the backward direction, then:
◦ Xs and Xd are unchanged.
◦ Xn holds the saturated value of Xn - an IMPLEMENTATION DEFINED number of bytes copied.
After execution of CPYPWTWN, option B (which results in encoding PSTATE.C = 1):
• If the copy is in the forward direction, then:
◦ Xs holds the original Xs + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {0,0,0}.
• If the copy is in the backward direction, then:
◦ Xs holds the original Xs + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xd holds the original Xd + saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes copied.
◦ PSTATE.{N,Z,V} are set to {1,0,0}.
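The register state after the prologue can be tabulated with a small Python model. This is a sketch under assumptions: the function name and dictionary layout are hypothetical, `size` is the already saturated Xn, `copied` is the IMPLEMENTATION DEFINED number of bytes the prologue copied, and Xn is shown as a signed integer rather than a two's complement bitstring. PSTATE.Z and PSTATE.V are 0 in every case and are omitted.

```python
def cpyp_result(xs: int, xd: int, size: int, copied: int, option: str, forward: bool):
    """Model Xs, Xd, Xn, PSTATE.N, and PSTATE.C after the prologue."""
    if option == "A":
        if forward:
            return {"Xs": xs + size, "Xd": xd + size, "Xn": -size + copied, "N": 0, "C": 0}
        return {"Xs": xs, "Xd": xd, "Xn": size - copied, "N": 0, "C": 0}
    # Option B
    if forward:
        return {"Xs": xs + copied, "Xd": xd + copied, "Xn": size - copied, "N": 0, "C": 1}
    return {"Xs": xs + size - copied, "Xd": xd + size - copied,
            "Xn": size - copied, "N": 1, "C": 1}
```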
For CPYMWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining
to be copied in the memory copy in total.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with the number of bytes remaining to
be copied in the memory copy in total.
For CPYMWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.

• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with the number of bytes remaining to be copied in the
memory copy in total.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.
For CPYEWTWN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• If the copy is in the forward direction (Xn is a negative number), then:
◦ Xn holds -1* the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the lowest address that the copy is copied from -Xn.
◦ Xd holds the lowest address that the copy is made to -Xn.
◦ At the end of the instruction, the value of Xn is written back with 0.
• If the copy is in the backward direction (Xn is a positive number), then:
◦ Xn holds the number of bytes remaining to be copied in the memory copy in total.
◦ Xs holds the highest address that the copy is copied from -Xn+1.
◦ Xd holds the highest address that the copy is copied to -Xn+1.
◦ At the end of the instruction, the value of Xn is written back with 0.
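To make the option A format concrete, the following Python sketch copies the remaining bytes in blocks until Xn reaches 0. It is a simplified model, not the architectural behavior: the function name is an assumption, memory is a plain `bytearray`, `block` stands in for the IMPLEMENTATION DEFINED block size, and access types, faults, and tag checks are not modeled.

```python
def copy_option_a(mem: bytearray, xd: int, xs: int, xn: int, block: int = 16) -> int:
    """Copy the bytes described by option A registers; return the final Xn."""
    while xn != 0:
        b = min(block, abs(xn))
        if xn < 0:
            # Forward: copy the lowest uncopied bytes, then move Xn toward 0.
            src, dst = xs + xn, xd + xn
            mem[dst:dst + b] = mem[src:src + b]
            xn += b
        else:
            # Backward: shrink Xn first, then copy the highest uncopied bytes.
            xn -= b
            src, dst = xs + xn, xd + xn
            mem[dst:dst + b] = mem[src:src + b]
    return xn
```

Xs and Xd stay fixed throughout; only Xn changes, which matches the write-back of 0 described for the epilogue.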
For CPYEWTWN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes to be copied in the memory copy in total.
• If the copy is in the forward direction (PSTATE.N == 0), then:
◦ Xs holds the lowest address that the copy is copied from.
◦ Xd holds the lowest address that the copy is copied to.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the lowest address that has not been copied from.
▪ the value of Xd is written back with the lowest address that has not been copied to.
• If the copy is in the backward direction (PSTATE.N == 1), then:
◦ Xs holds the highest address that the copy is copied from +1.
◦ Xd holds the highest address that the copy is copied to +1.
◦ At the end of the instruction:
▪ the value of Xn is written back with 0.
▪ the value of Xs is written back with the highest address that has not been copied from +1.
▪ the value of Xd is written back with the highest address that has not been copied to +1.

Integer
(FEAT_MOPS)

Bits   31:30  29:24   23:22  21  20:16  15:12  11:10  9:5  4:0
Value  sz     011101  op1    0   Rs     0101   01     Rn   Rd
Field                                   op2

Epilogue (op1 == 10)

CPYEWTWN [<Xd>]!, [<Xs>]!, <Xn>!

Main (op1 == 01)

CPYMWTWN [<Xd>]!, [<Xs>]!, <Xn>!

Prologue (op1 == 00)

CPYPWTWN [<Xd>]!, [<Xs>]!, <Xn>!

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(4) options = op2;

MOPSStage stage;
case op1 of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise SEE "Memory Copy and Memory Set";

if d == s || s == n || d == n then UNDEFINED;
if d == 31 || s == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = X[d];
bits(64) fromaddress = X[s];
bits(64) cpysize = X[n];
bits(64) stagecpysize;
bits(8*N) readdata;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

(racctype, wacctype) = MemCpyAccessTypes(options);

if stage == MOPSStage_Prologue then

if cpysize<63:55> != '000000000' then cpysize = 0x007FFFFFFFFFFFFF<63:0>;

// IMP DEF selection of the amount covered by pre-processing.

stagecpysize = CPYPreSizeChoice(toaddress, fromaddress, cpysize);
assert stagecpysize<63> == cpysize<63> || stagecpysize == Zeros();

if SInt(cpysize) > 0 then

assert SInt(stagecpysize) <= SInt(cpysize);
else
assert SInt(stagecpysize) >= SInt(cpysize);
else
boolean zero_size_exceptions = MemCpyZeroSizeCheck();

// Check if this version is consistent with the state of the call.

bits(64) postsize = CPYPostSizeChoice(toaddress, fromaddress, cpysize);
assert postsize<63> == cpysize<63> || SInt(postsize) == 0;

if stage == MOPSStage_Main then

stagecpysize = cpysize - postsize;

// Check if the parameters to this instruction are valid.

// Check if the parameters to the epilogue are valid.

if SInt(cpysize) < 0 then

assert B <= -1 * SInt(stagecpysize);

readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];

Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;
cpysize = cpysize + B;
stagecpysize = stagecpysize + B;
else
assert B <= SInt(stagecpysize);

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;
readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, racctype];
Mem[toaddress+cpysize, B, wacctype] = readdata<B*8-1:0>;

if stage != MOPSStage_Prologue then

if PSTATE.N == '0' then

cpysize = cpysize - B;
stagecpysize = stagecpysize - B;

if stage != MOPSStage_Prologue then

X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;

if stage == MOPSStage_Prologue then
X[n] = cpysize;
X[d] = toaddress;
X[s] = fromaddress;


CRC32B, CRC32H, CRC32W, CRC32X

CRC32 checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register.
It takes an input CRC value in the first source operand, performs a CRC on the input value in the second source
operand, and returns the output CRC value. The second source operand can be 8, 16, 32, or 64 bits. To align with
common usage, the bit order of the values is reversed as part of the operation, and the polynomial 0x04C11DB7 is
used for the CRC calculation.
In an Armv8.0 implementation, this is an OPTIONAL instruction. From Armv8.1, it is mandatory for all implementations
to implement this instruction.

Note

ID_AA64ISAR0_EL1.CRC32 indicates whether this instruction is supported.

Bits   31  30:21       20:16  15:13  12  11:10  9:5  4:0
Value  sf  0011010110  Rm     010    0   sz     Rn   Rd
Field                                C

CRC32B (sf == 0 && sz == 00)

CRC32B <Wd>, <Wn>, <Wm>

CRC32H (sf == 0 && sz == 01)

CRC32H <Wd>, <Wn>, <Wm>

CRC32W (sf == 0 && sz == 10)

CRC32W <Wd>, <Wn>, <Wm>

CRC32X (sf == 1 && sz == 11)

CRC32X <Wd>, <Wn>, <Xm>

if !HaveCRCExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sf == '1' && sz != '11' then UNDEFINED;
if sf == '0' && sz == '11' then UNDEFINED;
integer size = 8 << UInt(sz);

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose accumulator output register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose accumulator input register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose data source register, encoded in the "Rm" field.
<Wm> Is the 32-bit name of the general-purpose data source register, encoded in the "Rm" field.

Operation

bits(32) acc = X[n]; // accumulator

bits(size) val = X[m]; // input value
bits(32) poly = 0x04C11DB7<31:0>;

bits(32+size) tempacc = BitReverse(acc):Zeros(size);

bits(size+32) tempval = BitReverse(val):Zeros(32);

// Poly32Mod2 on a bitstring does a polynomial Modulus over {0,1} operation

X[d] = BitReverse(Poly32Mod2(tempacc EOR tempval, poly));
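The operation above can be transcribed into Python almost line for line. This is an illustrative sketch: the helper names are assumptions modeled on the pseudocode functions BitReverse and Poly32Mod2, and bitstrings are represented as Python integers with an explicit width.

```python
def bit_reverse(value: int, width: int) -> int:
    """Reverse the order of the low `width` bits of value."""
    value &= (1 << width) - 1
    return int(format(value, f"0{width}b")[::-1], 2)

def poly32_mod2(data: int, nbits: int, poly: int) -> int:
    """Remainder of an nbits-wide polynomial over {0,1} modulo x^32 + poly."""
    for i in range(nbits - 1, 31, -1):
        if (data >> i) & 1:
            data ^= poly << (i - 32)   # cancel the term at bit i using lower bits
    return data & 0xFFFFFFFF

def crc32_step(acc: int, val: int, size: int, poly: int = 0x04C11DB7) -> int:
    """Model one CRC32B/H/W/X instruction on a `size`-bit input value."""
    tempacc = bit_reverse(acc, 32) << size
    tempval = bit_reverse(val, size) << 32
    return bit_reverse(poly32_mod2(tempacc ^ tempval, 32 + size, poly), 32)
```

The instruction applies no initial or final inversion. Software that wants the conventional CRC-32 value typically starts from 0xFFFFFFFF and inverts the final accumulator.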

Operational information

CRC32CB, CRC32CH, CRC32CW, CRC32CX

CRC32 checksum performs a cyclic redundancy check (CRC) calculation on a value held in a general-purpose register.
It takes an input CRC value in the first source operand, performs a CRC on the input value in the second source
operand, and returns the output CRC value. The second source operand can be 8, 16, 32, or 64 bits. To align with
common usage, the bit order of the values is reversed as part of the operation, and the polynomial 0x1EDC6F41 is
used for the CRC calculation.
In an Armv8.0 implementation, this is an OPTIONAL instruction. From Armv8.1, it is mandatory for all implementations
to implement this instruction.

Note

ID_AA64ISAR0_EL1.CRC32 indicates whether this instruction is supported.

Bits   31  30:21       20:16  15:13  12  11:10  9:5  4:0
Value  sf  0011010110  Rm     010    1   sz     Rn   Rd
Field                                C

CRC32CB (sf == 0 && sz == 00)

CRC32CB <Wd>, <Wn>, <Wm>

CRC32CH (sf == 0 && sz == 01)

CRC32CH <Wd>, <Wn>, <Wm>

CRC32CW (sf == 0 && sz == 10)

CRC32CW <Wd>, <Wn>, <Wm>

CRC32CX (sf == 1 && sz == 11)

CRC32CX <Wd>, <Wn>, <Xm>

if !HaveCRCExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sf == '1' && sz != '11' then UNDEFINED;
if sf == '0' && sz == '11' then UNDEFINED;
integer size = 8 << UInt(sz);

Assembler Symbols

Operation

bits(32) acc = X[n]; // accumulator

bits(size) val = X[m]; // input value
bits(32) poly = 0x1EDC6F41<31:0>;

bits(32+size) tempacc = BitReverse(acc):Zeros(size);

bits(size+32) tempval = BitReverse(val):Zeros(32);

// Poly32Mod2 on a bitstring does a polynomial Modulus over {0,1} operation

X[d] = BitReverse(Poly32Mod2(tempacc EOR tempval, poly));
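Because the operation bit-reverses both inputs and the result, it can also be computed without any explicit reversal by shifting right and using the bit-reversed polynomial. The Python sketch below shows that equivalent formulation; it is not the architectural pseudocode, and the function and constant names are assumptions.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78   # 0x1EDC6F41 with its 32 bits reversed

def crc32c_step(acc: int, val: int, size: int) -> int:
    """Model one CRC32CB/CH/CW/CX instruction on a `size`-bit input value."""
    state = (acc & 0xFFFFFFFF) ^ (val & ((1 << size) - 1))
    for _ in range(size):
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= CRC32C_POLY_REFLECTED
    return state
```

A 64-bit step consumes the bytes of the value starting from the least significant byte, so it matches eight byte-wide steps over little-endian data.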

Operational information

CSDB

Consumption of Speculative Data Barrier is a memory barrier that controls speculative execution and data value
prediction.
No instruction other than branch instructions appearing in program order after the CSDB can be speculatively
executed using the results of any:
• Data value predictions of any instructions.
• PSTATE.{N,Z,C,V} predictions of any instructions other than conditional branch instructions appearing in
program order before the CSDB that have not been architecturally resolved.
• Predictions of SVE predication state for any SVE instructions.

Note

For purposes of the definition of CSDB, PSTATE.{N,Z,C,V} is not considered a data value. This definition permits:
• Control flow speculation before and after the CSDB.
• Speculative execution of conditional data processing instructions after the CSDB, unless they use the
results of data value or PSTATE.{N,Z,C,V} predictions of instructions appearing in program order before
the CSDB that have not been architecturally resolved.
Bits   31:12                 11:8  7:5  4:0
Value  11010101000000110010  0010  100  11111
Field                        CRm   op2

CSDB

// Empty.

Operation

ConsumptionOfSpeculativeDataBarrier();


CSEL

If the condition is true, Conditional Select writes the value of the first source register to the destination register. If the
condition is false, it writes the value of the second source register to the destination register.
Bits   31  30  29  28:21     20:16  15:12  11  10  9:5  4:0
Value  sf  0   0   11010100  Rm     cond   0   0   Rn   Rd
Field      op                                  o2

32-bit (sf == 0)

CSEL <Wd>, <Wn>, <Wm>, <cond>

64-bit (sf == 1)

CSEL <Xd>, <Xn>, <Xm>, <cond>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.

Operation

bits(datasize) result;
if ConditionHolds(cond) then
    result = X[n];
else
    result = X[m];

X[d] = result;
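The following Python sketch models the selection, including the condition test. ConditionHolds is not defined on this page; the version here follows the standard A64 condition code table and should be read as an assumption, as should the function names and the tuple of flags.

```python
def condition_holds(cond: int, n: int, z: int, c: int, v: int) -> bool:
    """Evaluate a 4-bit A64 condition code against the NZCV flags."""
    result = [
        z == 1,              # EQ / NE
        c == 1,              # CS / CC
        n == 1,              # MI / PL
        v == 1,              # VS / VC
        c == 1 and z == 0,   # HI / LS
        n == v,              # GE / LT
        n == v and z == 0,   # GT / LE
        True,                # AL
    ][cond >> 1]
    # An odd condition code inverts the test, except for '1111'.
    if (cond & 1) and cond != 0b1111:
        result = not result
    return result

def csel(xn: int, xm: int, cond: int, flags, datasize: int = 64) -> int:
    """Return Xn if the condition holds, else Xm, truncated to datasize bits."""
    mask = (1 << datasize) - 1
    return (xn if condition_holds(cond, *flags) else xm) & mask
```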

Operational information


CSET

Conditional Set sets the destination register to 1 if the condition is TRUE, and otherwise sets it to 0.

This is an alias of CSINC. This means:

32-bit (sf == 0)

CSET <Wd>, <cond>

is equivalent to

CSINC <Wd>, WZR, WZR, invert(<cond>)

and is always the preferred disassembly.

64-bit (sf == 1)

CSET <Xd>, <cond>

is equivalent to

CSINC <Xd>, XZR, XZR, invert(<cond>)

and is always the preferred disassembly.

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<cond> Is one of the standard conditions, excluding AL and NV, encoded in the "cond" field with its least
significant bit inverted.
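The effect of the inverted condition on the encoding can be seen by assembling a CSET by hand. This Python sketch is illustrative: the function name and the `COND` table are assumptions, and the field layout is taken from the CSINC encoding (op == 0, o2 == 1, with Rn and Rm both set to 31 for the zero register).

```python
COND = {"EQ": 0, "NE": 1, "CS": 2, "CC": 3, "MI": 4, "PL": 5, "VS": 6,
        "VC": 7, "HI": 8, "LS": 9, "GE": 10, "LT": 11, "GT": 12, "LE": 13}

def encode_cset(rd: int, cond: str, sf: int = 0) -> int:
    """Encode CSET as CSINC <Rd>, ZR, ZR, invert(<cond>)."""
    inverted = COND[cond] ^ 1   # least significant bit inverted
    zr = 31                     # WZR or XZR
    return ((sf << 31) | (0b0011010100 << 21) | (zr << 16)
            | (inverted << 12) | (0b01 << 10) | (zr << 5) | rd)
```

For example, `CSET W0, EQ` stores the NE condition ('0001') in the "cond" field.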

Operation

The description of CSINC gives the operational pseudocode for this instruction.

Operational information


CSETM

Conditional Set Mask sets all bits of the destination register to 1 if the condition is TRUE, and otherwise sets all bits to
0.

This is an alias of CSINV. This means:

32-bit (sf == 0)

CSETM <Wd>, <cond>

is equivalent to

CSINV <Wd>, WZR, WZR, invert(<cond>)

and is always the preferred disassembly.

64-bit (sf == 1)

CSETM <Xd>, <cond>

is equivalent to

CSINV <Xd>, XZR, XZR, invert(<cond>)

and is always the preferred disassembly.

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<cond> Is one of the standard conditions, excluding AL and NV, encoded in the "cond" field with its least
significant bit inverted.

Operation

The description of CSINV gives the operational pseudocode for this instruction.
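A short Python model shows why the alias yields an all-ones mask: inverting the zero register gives all ones, and the inverted condition swaps the two outcomes. The function names are assumptions, and the condition is passed in as an already evaluated boolean.

```python
def csinv(xn: int, xm: int, holds: bool, datasize: int = 32) -> int:
    """Return Xn if the condition holds, else NOT(Xm), in datasize bits."""
    mask = (1 << datasize) - 1
    return (xn if holds else ~xm) & mask

def csetm(holds: bool, datasize: int = 32) -> int:
    """CSETM <Rd>, <cond> modeled as CSINV <Rd>, ZR, ZR, invert(<cond>)."""
    return csinv(0, 0, not holds, datasize)
```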

Operational information


CSINC

Conditional Select Increment returns, in the destination register, the value of the first source register if the condition
is TRUE, and otherwise returns the value of the second source register incremented by 1.
This instruction is used by the aliases CINC and CSET.
Bits   31  30  29  28:21     20:16  15:12  11  10  9:5  4:0
Value  sf  0   0   11010100  Rm     cond   0   1   Rn   Rd
Field      op                                  o2

32-bit (sf == 0)

CSINC <Wd>, <Wn>, <Wm>, <cond>

64-bit (sf == 1)

CSINC <Xd>, <Xn>, <Xm>, <cond>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.

Alias Conditions

Alias Is preferred when

CINC Rm != '11111' && cond != '111x' && Rn != '11111' && Rn == Rm
CSET Rm == '11111' && cond != '111x' && Rn == '11111'

Operation

bits(datasize) result;
if ConditionHolds(cond) then
    result = X[n];
else
    result = X[m];
    result = result + 1;

X[d] = result;
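The increment applies only on the path where the condition is false, and the result wraps at the register width. A minimal Python sketch, with hypothetical function names and the condition passed as an already evaluated boolean; the CINC mapping (both sources equal, condition inverted) is stated from the A64 alias definition and is not shown on this page.

```python
def csinc(xn: int, xm: int, holds: bool, datasize: int = 64) -> int:
    """Return Xn if the condition holds, else Xm + 1, in datasize bits."""
    mask = (1 << datasize) - 1
    return (xn if holds else xm + 1) & mask

def cinc(x: int, holds: bool, datasize: int = 64) -> int:
    """CINC <Rd>, <Rn>, <cond> modeled as CSINC <Rd>, <Rn>, <Rn>, invert(<cond>)."""
    return csinc(x, x, not holds, datasize)
```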

Operational information


◦ The values of the NZCV flags.


CSINV

Conditional Select Invert returns, in the destination register, the value of the first source register if the condition is
TRUE, and otherwise returns the bitwise inversion value of the second source register.
This instruction is used by the aliases CINV and CSETM.
Bits   31  30  29  28:21     20:16  15:12  11  10  9:5  4:0
Value  sf  1   0   11010100  Rm     cond   0   0   Rn   Rd
Field      op                                  o2

32-bit (sf == 0)

CSINV <Wd>, <Wn>, <Wm>, <cond>

64-bit (sf == 1)

CSINV <Xd>, <Xn>, <Xm>, <cond>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.

Alias Conditions

Alias Is preferred when

CINV Rm != '11111' && cond != '111x' && Rn != '11111' && Rn == Rm
CSETM Rm == '11111' && cond != '111x' && Rn == '11111'

Operation

bits(datasize) result;
if ConditionHolds(cond) then
    result = X[n];
else
    result = X[m];
    result = NOT(result);

X[d] = result;
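The alias conditions listed above can be expressed as a small selection function, as a disassembler might apply them. This Python sketch is illustrative, and the function name is an assumption.

```python
def csinv_preferred_alias(rm: int, rn: int, cond: int) -> str:
    """Pick the preferred disassembly for a CSINV encoding."""
    cond_ok = (cond >> 1) != 0b111   # cond != '111x'
    if rm != 31 and cond_ok and rn != 31 and rn == rm:
        return "CINV"
    if rm == 31 and cond_ok and rn == 31:
        return "CSETM"
    return "CSINV"
```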

Operational information


◦ The values of the NZCV flags.


CSNEG

Conditional Select Negation returns, in the destination register, the value of the first source register if the condition is
TRUE, and otherwise returns the negated value of the second source register.
This instruction is used by the alias CNEG.
Bits   31  30  29  28:21     20:16  15:12  11  10  9:5  4:0
Value  sf  1   0   11010100  Rm     cond   0   1   Rn   Rd
Field      op                                  o2

32-bit (sf == 0)

CSNEG <Wd>, <Wn>, <Wm>, <cond>

64-bit (sf == 1)

CSNEG <Xd>, <Xn>, <Xm>, <cond>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.

Alias Conditions

Alias Is preferred when

CNEG cond != '111x' && Rn == Rm

Operation

bits(datasize) result;
if ConditionHolds(cond) then
    result = X[n];
else
    result = X[m];
    result = NOT(result);
    result = result + 1;

X[d] = result;
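The NOT followed by an increment is two's complement negation at the register width. A minimal Python sketch, with a hypothetical function name and the condition passed as an already evaluated boolean:

```python
def csneg(xn: int, xm: int, holds: bool, datasize: int = 64) -> int:
    """Return Xn if the condition holds, else the negation of Xm."""
    mask = (1 << datasize) - 1
    if holds:
        return xn & mask
    return ((~xm & mask) + 1) & mask   # NOT(result), then result + 1
```

Negating the most negative value, for example 0x80000000 in 32 bits, wraps back to the same bit pattern.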

Operational information


◦ The values of the NZCV flags.


DC

Data Cache operation. For more information, see op0==0b01, cache maintenance, TLB maintenance, and address
translation instructions.

This is an alias of SYS. This means:

DC <dc_op>, <Xt>

is equivalent to

SYS #<op1>, C7, <Cm>, #<op2>, <Xt>

and is the preferred disassembly when SysOp(op1,'0111',CRm,op2) == Sys_DC.

Assembler Symbols

<dc_op> Is a DC instruction name, as listed for the DC system instruction group, encoded in “op1:CRm:op2”:

op1 CRm op2 <dc_op> Architectural Feature

000 0110 001 IVAC -
000 0110 010 ISW -
000 0110 011 IGVAC FEAT_MTE2
000 0110 100 IGSW FEAT_MTE2
000 0110 101 IGDVAC FEAT_MTE2
000 0110 110 IGDSW FEAT_MTE2
000 1010 010 CSW -
000 1010 100 CGSW FEAT_MTE2
000 1010 110 CGDSW FEAT_MTE2
000 1110 010 CISW -
000 1110 100 CIGSW FEAT_MTE2
000 1110 110 CIGDSW FEAT_MTE2
011 0100 001 ZVA -
011 0100 011 GVA FEAT_MTE
011 0100 100 GZVA FEAT_MTE
011 1010 001 CVAC -
011 1010 011 CGVAC FEAT_MTE
011 1010 101 CGDVAC FEAT_MTE
011 1011 001 CVAU -
011 1100 001 CVAP FEAT_DPB
011 1100 011 CGVAP FEAT_MTE
011 1100 101 CGDVAP FEAT_MTE
011 1101 001 CVADP FEAT_DPB2
011 1101 011 CGVADP FEAT_MTE
011 1101 101 CGDVADP FEAT_MTE
011 1110 001 CIVAC -
011 1110 011 CIGVAC FEAT_MTE
011 1110 101 CIGDVAC FEAT_MTE
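The alias relationship can be illustrated by expanding a DC operation into its equivalent SYS form. This Python sketch is hedged: the dictionary holds only a handful of rows copied from the table above, and the names `DC_OPS` and `dc_to_sys` are assumptions.

```python
DC_OPS = {   # <dc_op>: (op1, CRm, op2), a subset of the table above
    "IVAC": (0b000, 0b0110, 0b001),
    "ZVA": (0b011, 0b0100, 0b001),
    "CVAC": (0b011, 0b1010, 0b001),
    "CVAU": (0b011, 0b1011, 0b001),
    "CIVAC": (0b011, 0b1110, 0b001),
}

def dc_to_sys(dc_op: str, xt: str) -> str:
    """Render DC <dc_op>, <Xt> as SYS #<op1>, C7, <Cm>, #<op2>, <Xt>."""
    op1, crm, op2 = DC_OPS[dc_op.upper()]
    return f"SYS #{op1}, C7, C{crm}, #{op2}, {xt}"
```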

Operation

The description of SYS gives the operational pseudocode for this instruction.

DCPS1

Debug Change PE State to EL1, when executed in Debug state:

• If executed at EL0, changes the current Exception level and SP to EL1 using SP_EL1.
• Otherwise, if executed at ELx, selects SP_ELx.
The target Exception level of a DCPS1 instruction is:
• EL1 if the instruction is executed at EL0.
• Otherwise, the Exception level at which the instruction is executed.
When the target Exception level of a DCPS1 instruction is ELx, on executing this instruction:
• ELR_ELx becomes UNKNOWN.
• SPSR_ELx becomes UNKNOWN.
• ESR_ELx becomes UNKNOWN.
• DLR_EL0 and DSPSR_EL0 become UNKNOWN.
• The endianness is set according to SCTLR_ELx.EE.
This instruction is UNDEFINED at EL0 in Non-secure state if EL2 is implemented and HCR_EL2.TGE == 1.
This instruction is always UNDEFINED in Non-debug state.
For more information on the operation of the DCPS<n> instructions, see DCPS.
Bits   31:21        20:5   4:2  1:0
Value  11010100101  imm16  000  01
Field                           LL

DCPS1 {#<imm>}

if !Halted() then UNDEFINED;

Assembler Symbols

<imm> Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0 and encoded in the
"imm16" field.

Operation

DCPSInstruction(LL);


DCPS2

Debug Change PE State to EL2, when executed in Debug state:

• If executed at EL0 or EL1, changes the current Exception level and SP to EL2 using SP_EL2.
• Otherwise, if executed at ELx, selects SP_ELx.

The target Exception level of a DCPS2 instruction is:

• EL2 if the instruction is executed at an Exception level that is not EL3.
• EL3 if the instruction is executed at EL3.

When the target Exception level of a DCPS2 instruction is ELx, on executing this instruction:

• ELR_ELx becomes UNKNOWN.
• SPSR_ELx becomes UNKNOWN.
• ESR_ELx becomes UNKNOWN.
• DLR_EL0 and DSPSR_EL0 become UNKNOWN.
• The endianness is set according to SCTLR_ELx.EE.

This instruction is UNDEFINED at the following Exception levels:

• All Exception levels if EL2 is not implemented.
• At EL0 and EL1 if EL2 is disabled in the current Security state.

This instruction is always UNDEFINED in Non-debug state.

For more information on the operation of the DCPS<n> instructions, see DCPS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 1 0 1 imm16 0 0 0 1 0
LL

DCPS2 {#<imm>}

if !Halted() then UNDEFINED;

Assembler Symbols

<imm> Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0 and encoded in the
"imm16" field.

Operation

DCPSInstruction(LL);


DCPS3

Debug Change PE State to EL3, when executed in Debug state:

• If executed at EL3, selects SP_EL3.
• Otherwise, changes the current Exception level and SP to EL3 using SP_EL3.

The target Exception level of a DCPS3 instruction is EL3.

On executing a DCPS3 instruction:

• ELR_EL3 becomes UNKNOWN.
• SPSR_EL3 becomes UNKNOWN.
• ESR_EL3 becomes UNKNOWN.
• DLR_EL0 and DSPSR_EL0 become UNKNOWN.
• The endianness is set according to SCTLR_EL3.EE.

This instruction is UNDEFINED at all Exception levels if either:

• EDSCR.SDD == 1.
• EL3 is not implemented.

This instruction is always UNDEFINED in Non-debug state.

For more information on the operation of the DCPS<n> instructions, see DCPS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 1 0 1 imm16 0 0 0 1 1
LL

DCPS3 {#<imm>}

if !Halted() then UNDEFINED;

Assembler Symbols

<imm> Is an optional 16-bit unsigned immediate, in the range 0 to 65535, defaulting to 0 and encoded in the
"imm16" field.

Operation

DCPSInstruction(LL);
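
The target Exception level rules stated for DCPS1, DCPS2, and DCPS3 can be summarized in a small model. The following Python sketch is illustrative only: the function name `dcps_target_el` and the use of plain integers for Exception levels are assumptions, and the sketch does not model the UNDEFINED conditions or Debug state.

```python
def dcps_target_el(n, current_el):
    """Return the target Exception level of DCPS<n> executed at current_el.

    Hypothetical helper: it models only the target-selection rules quoted in
    the DCPS1, DCPS2, and DCPS3 descriptions, not the UNDEFINED cases.
    """
    if n not in (1, 2, 3) or current_el not in (0, 1, 2, 3):
        raise ValueError("n must be 1..3 and current_el must be 0..3")
    if n == 1:
        # EL1 if executed at EL0, otherwise the current Exception level.
        return 1 if current_el == 0 else current_el
    if n == 2:
        # EL3 if executed at EL3, otherwise EL2.
        return 3 if current_el == 3 else 2
    # DCPS3 always targets EL3.
    return 3
```

Under these rules the target is never lower than the Exception level at which the instruction is executed, so in this model the result equals `max(n, current_el)`.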


DGH

Data Gathering Hint is a hint instruction that indicates that it is not expected to be performance optimal to merge
memory accesses with Normal Non-cacheable or Device-GRE attributes appearing in program order before the hint
instruction with any memory accesses appearing after the hint instruction into a single memory transaction on an
interconnect.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 0 1 1 1 1 1
CRm op2

DGH

if !HaveDGHExt() then EndOfInstruction();

Operation

Hint_DGH();


DMB

Data Memory Barrier is a memory barrier that ensures the ordering of observations of memory accesses, see Data
Memory Barrier.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm 1 0 1 1 1 1 1 1
opc

DMB <option>|#<imm>

MBReqDomain domain;
MBReqTypes types;
case CRm<3:2> of
    when '00' domain = MBReqDomain_OuterShareable;
    when '01' domain = MBReqDomain_Nonshareable;
    when '10' domain = MBReqDomain_InnerShareable;
    when '11' domain = MBReqDomain_FullSystem;
case CRm<1:0> of
    when '00' types = MBReqTypes_All; domain = MBReqDomain_FullSystem;
    when '01' types = MBReqTypes_Reads;
    when '10' types = MBReqTypes_Writes;
    when '11' types = MBReqTypes_All;

Assembler Symbols

<option> Specifies the limitation on the barrier operation. Values are:

SY
Full system is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. This option is referred to as the full system barrier.
Encoded as CRm = 0b1111.

ST
Full system is the required shareability domain, writes are the required access type, both before
and after the barrier instruction. Encoded as CRm = 0b1110.

LD
Full system is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b1101.

ISH
Inner Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm = 0b1011.

ISHST
Inner Shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b1010.

ISHLD
Inner Shareable is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b1001.

NSH
Non-shareable is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. Encoded as CRm = 0b0111.

NSHST
Non-shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b0110.

NSHLD
Non-shareable is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b0101.

OSH
Outer Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm = 0b0011.

OSHST
Outer Shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b0010.

OSHLD
Outer Shareable is the required shareability domain, reads are the required access type before the
barrier instruction, and reads and writes are the required access types after the barrier
instruction. Encoded as CRm = 0b0001.

All other encodings of CRm that are not listed above are reserved, and can be encoded using the
#<imm> syntax. All unsupported and reserved options must execute as a full system barrier operation,
but software must not rely on this behavior. For more information on whether an access is before or
after a barrier instruction, see Data Memory Barrier (DMB) or see Data Synchronization Barrier (DSB).
<imm> Is a 4-bit unsigned immediate, in the range 0 to 15, encoded in the "CRm" field.

Operation

DataMemoryBarrier(domain, types);
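
The DMB decode pseudocode maps the 4-bit CRm field to a shareability domain and a set of access types. A minimal Python sketch of that mapping follows; the function name `decode_dmb` and the string labels are assumptions chosen for illustration, and the sketch mirrors only the decode step, not the barrier semantics.

```python
# Labels are illustrative stand-ins for the MBReqDomain_* / MBReqTypes_* values.
DOMAINS = {0b00: "OuterShareable", 0b01: "Nonshareable",
           0b10: "InnerShareable", 0b11: "FullSystem"}
TYPES = {0b01: "Reads", 0b10: "Writes", 0b11: "All"}


def decode_dmb(crm):
    """Return (domain, types) for a DMB CRm value, following the decode above."""
    if not 0 <= crm <= 15:
        raise ValueError("CRm is a 4-bit field")
    low = crm & 0b11
    if low == 0b00:
        # CRm<1:0> == '00' forces a full system barrier on all access types.
        return ("FullSystem", "All")
    return (DOMAINS[crm >> 2], TYPES[low])
```

For example, `decode_dmb(0b1011)` corresponds to the ISH option and `decode_dmb(0b1101)` to the LD option listed under Assembler Symbols.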


DRPS

Debug restore process state

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 1 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0

DRPS

if !Halted() || PSTATE.EL == EL0 then UNDEFINED;

Operation

DRPSInstruction();


DSB

Data Synchronization Barrier is a memory barrier that ensures the completion of memory accesses, see Data
Synchronization Barrier.
A DSB instruction with the nXS qualifier is complete when the subset of these memory accesses with the XS attribute
set to 0 are complete. It does not require that memory accesses with the XS attribute set to 1 are complete.
This instruction is used by the aliases PSSBB and SSBB.
It has encodings from two classes: Memory barrier and Memory nXS barrier.

Memory barrier

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm 1 0 0 1 1 1 1 1
opc

DSB <option>|#<imm>

boolean nXS = FALSE;

DSBAlias alias;
case CRm of
    when '0000' alias = DSBAlias_SSBB;
    when '0100' alias = DSBAlias_PSSBB;
    otherwise alias = DSBAlias_DSB;

MBReqDomain domain;
case CRm<3:2> of
    when '00' domain = MBReqDomain_OuterShareable;
    when '01' domain = MBReqDomain_Nonshareable;
    when '10' domain = MBReqDomain_InnerShareable;
    when '11' domain = MBReqDomain_FullSystem;

MBReqTypes types;
case CRm<1:0> of
    when '00' types = MBReqTypes_All; domain = MBReqDomain_FullSystem;
    when '01' types = MBReqTypes_Reads;
    when '10' types = MBReqTypes_Writes;
    when '11' types = MBReqTypes_All;

Memory nXS barrier

(FEAT_XS)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 imm2 1 0 0 0 1 1 1 1 1 1

DSB <option>nXS|#<imm>

if !HaveFeatXS() then UNDEFINED;

MBReqTypes types = MBReqTypes_All;
boolean nXS = TRUE;
DSBAlias alias = DSBAlias_DSB;
MBReqDomain domain;

case imm2 of
    when '00' domain = MBReqDomain_OuterShareable;
    when '01' domain = MBReqDomain_Nonshareable;
    when '10' domain = MBReqDomain_InnerShareable;
    when '11' domain = MBReqDomain_FullSystem;

Assembler Symbols

<option> For the memory barrier variant: specifies the limitation on the barrier operation. Values are:

ST
Full system is the required shareability domain, writes are the required access type, both before
and after the barrier instruction. Encoded as CRm = 0b1110.

ISH
Inner Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm = 0b1011.

ISHST
Inner Shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b1010.

NSH
Non-shareable is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. Encoded as CRm = 0b0111.

NSHST
Non-shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b0110.

OSH
Outer Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm = 0b0011.

OSHST
Outer Shareable is the required shareability domain, writes are the required access type, both
before and after the barrier instruction. Encoded as CRm = 0b0010.

All other encodings of CRm, other than the values 0b0000 and 0b0100, that are not listed above are
reserved, and can be encoded using the #<imm> syntax. All unsupported and reserved options must
execute as a full system barrier operation, but software must not rely on this behavior. For more
information on whether an access is before or after a barrier instruction, see Data Memory Barrier
(DMB) or see Data Synchronization Barrier (DSB).

Note

The value 0b0000 is used to encode SSBB and the value 0b0100 is used to encode PSSBB.

For the memory nXS barrier variant: specifies the limitation on the barrier operation. Values are:
SY
Full system is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. This option is referred to as the full system barrier.
Encoded as CRm<3:2> = 0b11.

ISH
Inner Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm<3:2> = 0b10.

NSH
Non-shareable is the required shareability domain, reads and writes are the required access types,
both before and after the barrier instruction. Encoded as CRm<3:2> = 0b01.

OSH
Outer Shareable is the required shareability domain, reads and writes are the required access
types, both before and after the barrier instruction. Encoded as CRm<3:2> = 0b00.

<imm> For the memory barrier variant: is a 4-bit unsigned immediate, in the range 0 to 15, encoded in the
"CRm" field.

For the memory nXS barrier variant: is a 5-bit unsigned immediate, encoded in “imm2”:

imm2 <imm>
00 16
01 20
10 24
11 28

Alias Conditions

Alias Is preferred when

PSSBB CRm == '0100'
SSBB CRm == '0000'

Operation

case alias of
    when DSBAlias_SSBB
        SpeculativeStoreBypassBarrierToVA();
    when DSBAlias_PSSBB
        SpeculativeStoreBypassBarrierToPA();
    when DSBAlias_DSB
        if !nXS && HaveFeatXS() && HaveFeatHCX() then
            nXS = PSTATE.EL IN {EL0, EL1} && IsHCRXEL2Enabled() && HCRX_EL2.FnXS == '1';
        DataSynchronizationBarrier(domain, types, nXS);
    otherwise
        Unreachable();
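
Three small rules in the DSB description lend themselves to a worked example: the alias selection on CRm, the `imm2` to `<imm>` mapping for the nXS class, and the way HCRX_EL2.FnXS can promote an ordinary DSB to nXS behavior. The Python sketch below is a hedged illustration; the function names and the boolean and integer parameters are assumptions that stand in for the pseudocode predicates.

```python
def dsb_alias(crm):
    """Preferred disassembly for the memory barrier class, keyed on CRm."""
    if not 0 <= crm <= 15:
        raise ValueError("CRm is a 4-bit field")
    if crm == 0b0000:
        return "SSBB"
    if crm == 0b0100:
        return "PSSBB"
    return "DSB"


def dsb_nxs_imm(imm2):
    """<imm> value for the nXS class: the table maps 00,01,10,11 to 16,20,24,28."""
    if not 0 <= imm2 <= 3:
        raise ValueError("imm2 is a 2-bit field")
    return 16 + 4 * imm2


def effective_nxs(nxs, have_feat_xs, have_feat_hcx, el, hcrx_el2_enabled, fnxs):
    """Model of the nXS adjustment made in the Operation pseudocode.

    The parameters are hypothetical stand-ins for HaveFeatXS(), HaveFeatHCX(),
    PSTATE.EL, IsHCRXEL2Enabled(), and HCRX_EL2.FnXS.
    """
    if not nxs and have_feat_xs and have_feat_hcx:
        nxs = el in (0, 1) and hcrx_el2_enabled and fnxs == 1
    return bool(nxs)
```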


DVP

Data Value Prediction Restriction by Context prevents data value predictions that predict execution addresses based
on information gathered from earlier execution within a particular execution context. Data value predictions
determined by the actions of code in the target execution context or contexts appearing in program order before the
instruction cannot be used to exploitatively control speculative execution occurring after the instruction is complete
and synchronized.
For more information, see DVP RCTX, Data Value Prediction Restriction by Context.

This is an alias of SYS. This means:

• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.

System
(FEAT_SPECRES)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 0 0 1 1 1 0 1 Rt
L op1 CRn CRm op2

DVP RCTX, <Xt>

is equivalent to

SYS #3, C7, C3, #5, <Xt>

and is always the preferred disassembly.

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rt" field.

Operation

The description of SYS gives the operational pseudocode for this instruction.
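
As an illustration of the alias relationship, the sketch below packs the SYS operands into a 32-bit instruction word using the field positions read from the encoding diagram in this section (op1 at bits 18:16, CRn at 15:12, CRm at 11:8, op2 at 7:5, Rt at 4:0, with the fixed upper bits as shown). The helper names `encode_sys` and `encode_dvp_rctx` are assumptions, not part of the architecture.

```python
# Fixed bits 31:19 of the diagram above (1101 0101 0000 1), with L == 0.
SYS_BASE = 0xD5080000


def encode_sys(op1, crn, crm, op2, rt):
    """Pack SYS #<op1>, C<n>, C<m>, #<op2>, X<t> into an instruction word."""
    for value, limit in ((op1, 7), (crn, 15), (crm, 15), (op2, 7), (rt, 31)):
        if not 0 <= value <= limit:
            raise ValueError("operand out of range for its field")
    return SYS_BASE | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5) | rt


def encode_dvp_rctx(rt):
    """DVP RCTX, <Xt> is equivalent to SYS #3, C7, C3, #5, <Xt>."""
    return encode_sys(3, 7, 3, 5, rt)
```

Calling `encode_dvp_rctx(0)` reproduces the fixed bit pattern of the diagram with Rt set to 0.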


EON (shifted register)

Bitwise Exclusive OR NOT (shifted register) performs a bitwise Exclusive OR NOT of a register value and an
optionally-shifted register value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 0 1 0 1 0 shift 1 Rm imm6 Rn Rd
opc N

32-bit (sf == 0)

EON <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

EON <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR

Operation

bits(datasize) operand1 = X[n];

bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(datasize) result;

operand2 = NOT(operand2);

result = operand1 EOR operand2;

X[d] = result;
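
The EON operation can be modeled on plain integers. The Python sketch below is an illustrative approximation, not the architectural pseudocode: `shift_reg` is a hypothetical stand-in for ShiftReg that takes a value instead of a register number, and register access is replaced by function arguments.

```python
def shift_reg(value, shift_type, amount, datasize):
    """Apply LSL, LSR, ASR, or ROR to a datasize-bit value."""
    mask = (1 << datasize) - 1
    value &= mask
    if shift_type == "LSL":
        return (value << amount) & mask
    if shift_type == "LSR":
        return value >> amount
    if shift_type == "ASR":
        result = value >> amount
        if value >> (datasize - 1):
            # Replicate the sign bit into the vacated upper bits.
            result |= mask & ~(mask >> amount)
        return result
    if shift_type == "ROR":
        amount %= datasize
        return ((value >> amount) | (value << (datasize - amount))) & mask
    raise ValueError("unknown shift type")


def eon(xn, xm, shift_type="LSL", amount=0, datasize=64):
    """result = operand1 EOR NOT(shifted operand2), truncated to datasize."""
    if datasize not in (32, 64) or not 0 <= amount < datasize:
        # Mirrors the decode check: a 32-bit form with imm6<5> set is UNDEFINED.
        raise ValueError("invalid datasize or shift amount")
    mask = (1 << datasize) - 1
    operand2 = ~shift_reg(xm, shift_type, amount, datasize) & mask
    return (xn & mask) ^ operand2
```

A consequence visible in the model is that EON of a register with itself, with no shift, yields all ones.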

Operational information


EOR (immediate)

Bitwise Exclusive OR (immediate) performs a bitwise Exclusive OR of a register value and an immediate value, and
writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 0 0 N immr imms Rn Rd
opc

32-bit (sf == 0 && N == 0)

EOR <Wd|WSP>, <Wn>, #<imm>

64-bit (sf == 1)

EOR <Xd|SP>, <Xn>, #<imm>

Assembler Symbols

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];

result = operand1 EOR imm;

if d == 31 then
    SP[] = result;
else
    X[d] = result;

Operational information


EOR (shifted register)

Bitwise Exclusive OR (shifted register) performs a bitwise Exclusive OR of a register value and an optionally-shifted
register value, and writes the result to the destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 0 1 0 1 0 shift 0 Rm imm6 Rn Rd
opc N

32-bit (sf == 0)

EOR <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

EOR <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR

Operation

bits(datasize) operand1 = X[n];

bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(datasize) result;

result = operand1 EOR operand2;

X[d] = result;

Operational information


ERET

Exception Return using the ELR and SPSR for the current Exception level. When executed, the PE restores PSTATE
from the SPSR, and branches to the address held in the ELR.
The PE checks the SPSR for the current Exception level for an illegal return event. See Illegal return events from
AArch64 state.
ERET is UNDEFINED at EL0.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0
A M Rn op4

ERET

if PSTATE.EL == EL0 then UNDEFINED;

Operation

AArch64.CheckForERetTrap(FALSE, TRUE);
bits(64) target = ELR[];

AArch64.ExceptionReturn(target, SPSR[]);


ERETAA, ERETAB

Exception Return, with pointer authentication. This instruction authenticates the address in ELR, using SP as the
modifier and the specified key, the PE restores PSTATE from the SPSR for the current Exception level, and branches to
the authenticated address.
Key A is used for ERETAA, and key B is used for ERETAB.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to ELR.
The PE checks the SPSR for the current Exception level for an illegal return event. See Illegal return events from
AArch64 state.
ERETAA and ERETAB are UNDEFINED at EL0.

Integer
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 1 0 0 1 1 1 1 1 0 0 0 0 1 M 1 1 1 1 1 1 1 1 1 1
A Rn op4

ERETAA (M == 0)

ERETAA

ERETAB (M == 1)

ERETAB

if PSTATE.EL == EL0 then UNDEFINED;

boolean use_key_a = (M == '0');

if !HavePACExt() then
    UNDEFINED;

Operation

AArch64.CheckForERetTrap(TRUE, use_key_a);
bits(64) target;

if use_key_a then
    target = AuthIA(ELR[], SP[], TRUE);
else
    target = AuthIB(ELR[], SP[], TRUE);

AArch64.ExceptionReturn(target, SPSR[]);


ESB

Error Synchronization Barrier is an error synchronization event that might also update DISR_EL1 and VDISR_EL2.
This instruction can be used at all Exception levels and in Debug state.
In Debug state, this instruction behaves as if SError interrupts are masked at all Exception levels. See Error
Synchronization Barrier in the Arm® Reliability, Availability, and Serviceability (RAS) Specification, Armv8, for
Armv8-A architecture profile.
If the RAS Extension is not implemented, this instruction executes as a NOP.

System
(FEAT_RAS)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 0 1 1 1 1 1
CRm op2

ESB

if !HaveRASExt() then EndOfInstruction();

Operation

SynchronizeErrors();
AArch64.ESBOperation();
if PSTATE.EL IN {EL0, EL1} && EL2Enabled() then AArch64.vESBOperation();
TakeUnmaskedSErrorInterrupts();


EXTR

Extract register extracts a register from a pair of registers.

This instruction is used by the alias ROR (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 1 N 0 Rm imms Rn Rd

32-bit (sf == 0 && N == 0 && imms == 0xxxxx)

EXTR <Wd>, <Wn>, <Wm>, #<lsb>

64-bit (sf == 1 && N == 1)

EXTR <Xd>, <Xn>, <Xm>, #<lsb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
integer lsb;

if N != sf then UNDEFINED;
if sf == '0' && imms<5> == '1' then UNDEFINED;
lsb = UInt(imms);

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Rm" field.
<lsb> For the 32-bit variant: is the least significant bit position from which to extract, in the range 0 to 31,
encoded in the "imms" field.
For the 64-bit variant: is the least significant bit position from which to extract, in the range 0 to 63,
encoded in the "imms" field.

Alias Conditions

Alias Is preferred when

ROR (immediate) Rn == Rm

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
bits(2*datasize) concat = operand1:operand2;

result = concat<lsb+datasize-1:lsb>;

X[d] = result;
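
The extraction performed by EXTR, and its use as ROR (immediate) when both source registers are the same, can be checked with a short model. The Python sketch below works on integers rather than registers; the function name `extr` and its argument order are assumptions made for illustration.

```python
def extr(xn, xm, lsb, datasize=64):
    """Return bits <lsb+datasize-1:lsb> of the concatenation xn:xm."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    if not 0 <= lsb < datasize:
        # Mirrors the <lsb> range: 0 to 31 (32-bit) or 0 to 63 (64-bit).
        raise ValueError("lsb out of range")
    mask = (1 << datasize) - 1
    concat = ((xn & mask) << datasize) | (xm & mask)
    return (concat >> lsb) & mask
```

With the same value in both source operands the result is a rotate right by `lsb`, which is why ROR (immediate) is the preferred alias when Rn == Rm. For example, `extr(0x12345678, 0x12345678, 8, 32)` rotates the 32-bit value right by 8 bits.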

Operational information

If PSTATE.DIT is 1:


GMI

Tag Mask Insert inserts the tag in the first source register into the excluded set specified in the second source
register, writing the new excluded set to the destination register.

Integer
(FEAT_MTE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Xm 0 0 0 1 0 1 Xn Xd

GMI <Xd>, <Xn|SP>, <Xm>

if !HaveMTEExt() then UNDEFINED;

integer d = UInt(Xd);
integer n = UInt(Xn);
integer m = UInt(Xm);

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Xd" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Xm" field.

Operation

bits(64) address = if n == 31 then SP[] else X[n];

bits(64) mask = X[m];
bits(4) tag = AArch64.AllocationTagFromAddress(address);

mask<UInt(tag)> = '1';
X[d] = mask;
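
A minimal integer model of GMI is sketched below. It assumes that AArch64.AllocationTagFromAddress returns bits 59:56 of the address, which is the architectural position of the Logical Address Tag but is not stated in this section; the function names are likewise illustrative.

```python
MASK64 = (1 << 64) - 1


def allocation_tag_from_address(address):
    """Assumed behavior: the tag is held in bits 59:56 of the address."""
    return (address >> 56) & 0xF


def gmi(xn, xm):
    """Set the bit of the exclusion mask xm that corresponds to the tag in xn."""
    tag = allocation_tag_from_address(xn)
    return (xm | (1 << tag)) & MASK64
```

For example, a pointer carrying tag 3 combined with an empty mask gives a mask with only bit 3 set, so a later IRG using that mask would not select tag 3.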


HINT

Hint instruction is for the instruction set space that is reserved for architectural hint instructions.
Some encodings described here are not allocated in this revision of the architecture, and behave as NOPs. These
encodings might be allocated to other hint functionality in future revisions of the architecture and therefore must not
be used by software.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 CRm op2 1 1 1 1 1

HINT #<imm>

SystemHintOp op;

case CRm:op2 of
    when '0000 000' op = SystemHintOp_NOP;
    when '0000 001' op = SystemHintOp_YIELD;
    when '0000 010' op = SystemHintOp_WFE;
    when '0000 011' op = SystemHintOp_WFI;
    when '0000 100' op = SystemHintOp_SEV;
    when '0000 101' op = SystemHintOp_SEVL;
    when '0000 110'
        if !HaveDGHExt() then EndOfInstruction(); // Instruction executes as NOP
        op = SystemHintOp_DGH;
    when '0000 111' SEE "XPACLRI";
    when '0001 xxx'
        case op2 of
            when '000' SEE "PACIA1716";
            when '010' SEE "PACIB1716";
            when '100' SEE "AUTIA1716";
            when '110' SEE "AUTIB1716";
            otherwise EndOfInstruction();
    when '0010 000'
        if !HaveRASExt() then EndOfInstruction(); // Instruction executes as NOP
        op = SystemHintOp_ESB;
    when '0010 001'
        if !HaveStatisticalProfiling() then EndOfInstruction(); // Instruction executes as NOP
        op = SystemHintOp_PSB;
    when '0010 010'
        if !HaveSelfHostedTrace() then EndOfInstruction(); // Instruction executes as NOP
        op = SystemHintOp_TSB;
    when '0010 100'
        op = SystemHintOp_CSDB;
    when '0011 xxx'
        case op2 of
            when '000' SEE "PACIAZ";
            when '001' SEE "PACIASP";
            when '010' SEE "PACIBZ";
            when '011' SEE "PACIBSP";
            when '100' SEE "AUTIAZ";
            when '101' SEE "AUTIASP";
            when '110' SEE "AUTIBZ";
            when '111' SEE "AUTIBSP";
    when '0100 xx0'
        op = SystemHintOp_BTI;
        // Check branch target compatibility between BTI instruction and PSTATE.BTYPE
        SetBTypeCompatible(BTypeCompatible_BTI(op2<2:1>));
    otherwise EndOfInstruction();

Assembler Symbols

<imm> Is a 7-bit unsigned immediate, in the range 0 to 127, encoded in the "CRm:op2" field.
The encodings that are allocated to architectural hint functionality are described in the "Hints" table in
the "Index by Encoding".

Note

For allocated encodings of "CRm:op2":

• A disassembler will disassemble the allocated instruction, rather than the HINT
instruction.
• An assembler may support assembly of allocated encodings using HINT with the
corresponding <imm> value, but it is not required to do so.

Operation

case op of
    when SystemHintOp_YIELD
        Hint_Yield();

    when SystemHintOp_DGH
        Hint_DGH();

    when SystemHintOp_WFE
        Hint_WFE(1, WFxType_WFE);

    when SystemHintOp_WFI
        Hint_WFI(1, WFxType_WFI);

    when SystemHintOp_SEV
        SendEvent();

    when SystemHintOp_SEVL
        SendEventLocal();

    when SystemHintOp_ESB
        SynchronizeErrors();
        AArch64.ESBOperation();
        if PSTATE.EL IN {EL0, EL1} && EL2Enabled() then AArch64.vESBOperation();
        TakeUnmaskedSErrorInterrupts();

    when SystemHintOp_PSB
        ProfilingSynchronizationBarrier();

    when SystemHintOp_TSB
        TraceSynchronizationBarrier();

    when SystemHintOp_CSDB
        ConsumptionOfSpeculativeDataBarrier();

    when SystemHintOp_BTI
        SetBTypeNext('00');

    otherwise // do nothing
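
To relate the `HINT #<imm>` syntax to the decode table, the Python sketch below encodes the 7-bit immediate into the CRm:op2 field and looks up the allocated name for a given immediate. The table is transcribed from the decode pseudocode of this section; the names `encode_hint` and `hint_name` are assumptions, and feature checks such as HaveDGHExt() are not modeled.

```python
# HINT with CRm:op2 == 0 (this is also the NOP encoding).
HINT_BASE = 0xD503201F

# Key is the 7-bit value CRm:op2, that is, (CRm << 3) | op2.
HINT_NAMES = {
    0: "NOP", 1: "YIELD", 2: "WFE", 3: "WFI", 4: "SEV", 5: "SEVL",
    6: "DGH", 7: "XPACLRI",
    8: "PACIA1716", 10: "PACIB1716", 12: "AUTIA1716", 14: "AUTIB1716",
    16: "ESB", 17: "PSB", 18: "TSB", 20: "CSDB",
    24: "PACIAZ", 25: "PACIASP", 26: "PACIBZ", 27: "PACIBSP",
    28: "AUTIAZ", 29: "AUTIASP", 30: "AUTIBZ", 31: "AUTIBSP",
    32: "BTI", 34: "BTI", 36: "BTI", 38: "BTI",
}


def encode_hint(imm):
    """Return the instruction word for HINT #imm (imm placed in bits 11:5)."""
    if not 0 <= imm <= 127:
        raise ValueError("imm is a 7-bit field")
    return HINT_BASE | (imm << 5)


def hint_name(imm):
    """Return the allocated instruction name, or None if the decode falls through."""
    if not 0 <= imm <= 127:
        raise ValueError("imm is a 7-bit field")
    return HINT_NAMES.get(imm)
```

Immediates for which `hint_name` returns None correspond to the encodings that reach EndOfInstruction() in the decode and so behave as NOPs in this revision.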


HLT

Halt instruction. An HLT instruction can generate a Halt Instruction debug event, which causes entry into Debug state.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 1 0 imm16 0 0 0 0 0

HLT #<imm>

if EDSCR.HDE == '0' || !HaltingAllowed() then UNDEFINED;

if HaveBTIExt() then
SetBTypeCompatible(TRUE);

Assembler Symbols

<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.

Operation

Halt(DebugHalt_HaltInstruction);


HVC

Hypervisor Call causes an exception to EL2. Software executing at EL1 can use this instruction to call the hypervisor
to request a service.
The HVC instruction is UNDEFINED:
• When EL3 is implemented and SCR_EL3.HCE is set to 0.
• When EL3 is not implemented and HCR_EL2.HCD is set to 1.
• When EL2 is not implemented.
• At EL1 if EL2 is not enabled in the current Security state.
• At EL0.
On executing an HVC instruction, the PE records the exception as a Hypervisor Call exception in ESR_ELx, using the
EC value 0x16, and the value of the immediate argument.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 0 0 imm16 0 0 0 1 0

HVC #<imm>

// Empty.

Assembler Symbols

<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.

Operation

if !HaveEL(EL2) || PSTATE.EL == EL0 || (PSTATE.EL == EL1 && (!IsSecureEL2Enabled() && IsSecure())) then
    UNDEFINED;

hvc_enable = if HaveEL(EL3) then SCR_EL3.HCE else NOT(HCR_EL2.HCD);

if hvc_enable == '0' then
    UNDEFINED;
else
    AArch64.CallHypervisor(imm16);
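
The conditions under which HVC is UNDEFINED can be expressed as a single predicate. The Python sketch below follows the Operation pseudocode; the function name and parameters are hypothetical stand-ins for HaveEL(), PSTATE.EL, IsSecure(), IsSecureEL2Enabled(), SCR_EL3.HCE, and HCR_EL2.HCD.

```python
def hvc_is_undefined(el, have_el2, have_el3, secure, secure_el2_enabled,
                     scr_el3_hce=0, hcr_el2_hcd=0):
    """Return True when the modeled HVC would be UNDEFINED."""
    if not have_el2 or el == 0 or (el == 1 and not secure_el2_enabled and secure):
        return True
    # With EL3 implemented SCR_EL3.HCE enables HVC; otherwise HCR_EL2.HCD disables it.
    hvc_enable = scr_el3_hce if have_el3 else 1 - hcr_el2_hcd
    return hvc_enable == 0
```

For example, a Non-secure EL1 caller on a system with EL3 is only able to reach the hypervisor when SCR_EL3.HCE is 1.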


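IC

Instruction Cache operation. For more information, see op0==0b01, cache maintenance, TLB maintenance, and
address translation instructions.

This is an alias of SYS. This means:

• The encodings in this description are named to match the encodings of SYS.
• The description of SYS gives the operational pseudocode for this instruction.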

IC <ic_op>{, <Xt>}

is equivalent to

SYS #<op1>, C7, <Cm>, #<op2>{, <Xt>}

and is the preferred disassembly when SysOp(op1,'0111',CRm,op2) == Sys_IC.

Assembler Symbols

<ic_op> Is an IC instruction name, as listed for the IC system instruction pages, encoded in “op1:CRm:op2”:

op1 CRm op2 <ic_op>

000 0001 000 IALLUIS
000 0101 000 IALLU
011 0101 001 IVAU

<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.
<Xt> Is the 64-bit name of the optional general-purpose source register, defaulting to '11111', encoded in the
"Rt" field.

Operation

The description of SYS gives the operational pseudocode for this instruction.
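
The equivalence between an IC operation and its SYS form follows directly from the `<ic_op>` table. The Python sketch below rewrites an IC instruction as the equivalent SYS text; the function name `ic_to_sys` and the string-based interface are assumptions chosen for illustration.

```python
# (op1, CRm, op2) for each <ic_op>, transcribed from the table above.
IC_OPS = {
    "IALLUIS": (0b000, 0b0001, 0b000),
    "IALLU": (0b000, 0b0101, 0b000),
    "IVAU": (0b011, 0b0101, 0b001),
}


def ic_to_sys(ic_op, xt=None):
    """Return the SYS #<op1>, C7, <Cm>, #<op2>{, <Xt>} form of IC <ic_op>{, <Xt>}."""
    op1, crm, op2 = IC_OPS[ic_op.upper()]
    text = f"SYS #{op1}, C7, C{crm}, #{op2}"
    if xt is not None:
        text += f", {xt}"
    return text
```

For example, `ic_to_sys("IVAU", "X0")` gives the SYS form of `IC IVAU, X0`.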

IRG

Insert Random Tag inserts a random Logical Address Tag into the address in the first source register, and writes the
result to the destination register. Any tags specified in the optional second source register or in GCR_EL1.Exclude are
excluded from the selection of the random Logical Address Tag.

Integer
(FEAT_MTE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Xm 0 0 0 1 0 0 Xn Xd

IRG <Xd|SP>, <Xn|SP>{, <Xm>}

if !HaveMTEExt() then UNDEFINED;

integer d = UInt(Xd);
integer n = UInt(Xn);
integer m = UInt(Xm);

Assembler Symbols

<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Xd"
field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm> Is the 64-bit name of the second general-purpose source register, encoded in the "Xm" field. Defaults to
XZR if absent.

Operation

bits(64) operand = if n == 31 then SP[] else X[n];

bits(64) exclude_reg = X[m];
bits(16) exclude = exclude_reg<15:0> OR GCR_EL1.Exclude;
bits(4) rtag;

if AArch64.AllocationTagAccessIsEnabled(AccType_NORMAL) then
    if GCR_EL1.RRND == '1' then
        RGSR_EL1 = bits(64) UNKNOWN;
        if IsOnes(exclude) then
            rtag = '0000';
        else
            rtag = ChooseRandomNonExcludedTag(exclude);
    else
        bits(4) start = RGSR_EL1.TAG;
        bits(4) offset = AArch64.RandomTag();

        rtag = AArch64.ChooseNonExcludedTag(start, offset, exclude);

        RGSR_EL1.TAG = rtag;
else
    rtag = '0000';

bits(64) result = AArch64.AddressWithAllocationTag(operand, AccType_NORMAL, rtag);

if d == 31 then
    SP[] = result;
else
    X[d] = result;
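
The tag selection and insertion described by the IRG pseudocode can be illustrated with a small Python model. This is a minimal sketch, not the architectural definition. The name `irg_model`, its parameters, and the uniform pick among non-excluded tags are assumptions made for illustration, because the helpers ChooseRandomNonExcludedTag and AArch64.ChooseNonExcludedTag are defined elsewhere. The sketch also assumes that Allocation Tag access is enabled and that the Logical Address Tag occupies bits <59:56> of the address.

```python
import random

TAG_SHIFT = 56                      # assumed position of the Logical Address Tag: bits <59:56>
TAG_MASK = 0xF << TAG_SHIFT
MASK64 = (1 << 64) - 1

def irg_model(xn, xm=0, gcr_exclude=0, rng=random):
    """Hypothetical model of IRG with Allocation Tag access enabled."""
    # exclude = Xm<15:0> OR GCR_EL1.Exclude
    exclude = (xm | gcr_exclude) & 0xFFFF
    if exclude == 0xFFFF:
        rtag = 0                    # every tag excluded: the result tag is '0000'
    else:
        allowed = [t for t in range(16) if not (exclude >> t) & 1]
        rtag = rng.choice(allowed)  # assumption: uniform choice among allowed tags
    # Replace the old tag bits of the source address with the chosen tag.
    return ((xn & ~TAG_MASK) | (rtag << TAG_SHIFT)) & MASK64

# Only tag 15 remains selectable when bits 0 to 14 of the exclude mask are set.
tagged = irg_model(0x0000_0000_1234_5670, xm=0x7FFF)
```

When exactly one tag is left unexcluded the result is deterministic, which makes the exclude-mask behaviour easy to check without depending on the random source.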


ISB

Instruction Synchronization Barrier flushes the pipeline in the PE and is a context synchronization event. For more
information, see Instruction Synchronization Barrier (ISB).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 CRm 1 1 0 1 1 1 1 1
opc

ISB {<option>|#<imm>}

// No additional decoding required

Assembler Symbols

<option> Specifies an optional limitation on the barrier operation. Values are:

SY
Full system barrier operation, encoded as CRm = 0b1111. Can be omitted.

All other encodings of CRm are reserved. The corresponding instructions execute as full system barrier
operations, but must not be relied upon by software.
<imm> Is an optional 4-bit unsigned immediate, in the range 0 to 15, defaulting to 15 and encoded in the
"CRm" field.

Operation

InstructionSynchronizationBarrier();
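
The encoding diagram and the description of the optional immediate can be turned into a short encoder. The following is a sketch under the assumption that the fixed bits are exactly those shown in the diagram. The helper names `encode_isb` and `isb_option` are hypothetical and not part of any toolchain.

```python
ISB_BASE = 0xD50330DF   # fixed bits from the encoding diagram, with CRm (bits <11:8>) zeroed

def encode_isb(imm=15):
    """Return the 32-bit instruction word for ISB #imm (hypothetical helper)."""
    if not 0 <= imm <= 15:
        raise ValueError("imm must be in the range 0 to 15")
    return ISB_BASE | (imm << 8)

def isb_option(word):
    """Return 'SY' when CRm == 0b1111, otherwise the immediate form."""
    crm = (word >> 8) & 0xF
    return "SY" if crm == 0b1111 else f"#{crm}"
```

Omitting the operand selects the default immediate of 15, so `encode_isb()` and an explicit `ISB SY` produce the same word.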


LD64B

Single-copy Atomic 64-byte Load derives an address from a base register value, loads eight 64-bit doublewords from a
memory location, and writes them to consecutive registers, Xt to X(t+7). The data that is loaded is atomic and is
required to be 64-byte aligned.

Integer
(FEAT_LS64)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 0 0 Rn Rt

LD64B <Xt>, [<Xn|SP> {,#0}]

if !HaveFeatLS64() then UNDEFINED;

if Rt<4:3> == '11' || Rt<0> == '1' then UNDEFINED;

integer n = UInt(Rn);
integer t = UInt(Rt);
boolean tag_checked = n != 31;

Assembler Symbols

<Xt> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Operation

CheckLDST64BEnabled();

bits(512) data;
bits(64) address;
bits(64) value;
acctype = AccType_ATOMICLS64;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemLoad64B(address, acctype);

for i = 0 to 7
    value = data<63+64*i:64*i>;
    if BigEndian(acctype) then value = BigEndianReverse(value);
    X[t+i] = value;
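
Two details of LD64B lend themselves to a short illustration: the decode-time restriction on Rt, and the way the 64 loaded bytes are distributed across eight registers. The sketch below is a model only. The function names are hypothetical, and it assumes a little-endian data access, so the BigEndianReverse step is not applied.

```python
def ld64b_rt_is_valid(rt):
    """Mirror of the decode check: Rt<4:3> == '11' or Rt<0> == '1' is UNDEFINED."""
    return ((rt >> 3) & 0b11) != 0b11 and (rt & 1) == 0

def split_64_bytes(data):
    """Split a 64-byte block into eight 64-bit values for X[t] .. X[t+7].

    Assumption: little-endian access, so bytes 8*i .. 8*i+7 form the value
    written to X[t+i].
    """
    if len(data) != 64:
        raise ValueError("LD64B transfers exactly 64 bytes")
    return [int.from_bytes(data[8 * i:8 * i + 8], "little") for i in range(8)]

valid_first_registers = [rt for rt in range(32) if ld64b_rt_is_valid(rt)]
values = split_64_bytes(bytes(range(64)))
```

The decode check leaves only even-numbered first registers below 24, which keeps the whole Xt to X(t+7) range within the general-purpose registers.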


LDADD, LDADDA, LDADDAL, LDADDL

Atomic add on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory, adds
the value held in a register to it, and stores the result back to memory. The value initially loaded from memory is
returned in the destination register.
• If the destination register is not one of WZR or XZR, LDADDA and LDADDAL load from memory with acquire
semantics.
• LDADDL and LDADDAL store to memory with release semantics.
• LDADD has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STADD, STADDL.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 A R 1 Rs 0 0 0 0 0 0 Rn Rt
size opc

32-bit LDADD (size == 10 && A == 0 && R == 0)

LDADD <Ws>, <Wt>, [<Xn|SP>]

32-bit LDADDA (size == 10 && A == 1 && R == 0)

LDADDA <Ws>, <Wt>, [<Xn|SP>]

32-bit LDADDAL (size == 10 && A == 1 && R == 1)

LDADDAL <Ws>, <Wt>, [<Xn|SP>]

32-bit LDADDL (size == 10 && A == 0 && R == 1)

LDADDL <Ws>, <Wt>, [<Xn|SP>]

64-bit LDADD (size == 11 && A == 0 && R == 0)

LDADD <Xs>, <Xt>, [<Xn|SP>]

64-bit LDADDA (size == 11 && A == 1 && R == 0)

LDADDA <Xs>, <Xt>, [<Xn|SP>]

64-bit LDADDAL (size == 11 && A == 1 && R == 1)

LDADDAL <Xs>, <Xt>, [<Xn|SP>]

64-bit LDADDL (size == 11 && A == 0 && R == 1)

LDADDL <Xs>, <Xt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

integer datasize = 8 << UInt(size);

integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Alias Conditions

Alias Is preferred when

STADD, STADDL A == '0' && Rt == '11111'
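
The alias condition combines with the A and R bits to determine the preferred mnemonic. As a sketch, assuming a disassembler that already has the decoded fields available, the selection could look like this. `ldadd_mnemonic` is a hypothetical helper, not an existing API.

```python
def ldadd_mnemonic(size, a, r, rt):
    """Pick the preferred mnemonic for an LDADD-class encoding (hypothetical helper).

    size is the 2-bit size field (0b10 or 0b11), a and r are the A and R bits,
    and rt is the 5-bit Rt field.
    """
    if size not in (0b10, 0b11):
        raise ValueError("only size == 10 and size == 11 are covered here")
    if a == 0 and rt == 0b11111:
        # Alias condition from the table: A == '0' && Rt == '11111'
        return "STADDL" if r else "STADD"
    return "LDADD" + ("A" if a else "") + ("L" if r else "")
```

Note that an encoding with A == 1 and Rt == 31 keeps its LDADDA or LDADDAL mnemonic, because the alias applies only when A is 0.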

Operation

bits(64) address;
bits(datasize) value;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_ADD, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, regsize);
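
The data effect of the operation is a fetch-and-add: the old memory value is returned and the sum is stored back. The sketch below models only that effect on a dictionary standing in for memory. It does not model atomicity or the acquire and release ordering, and `ldadd_model` is a hypothetical name.

```python
def ldadd_model(memory, address, value, datasize=32):
    """Model the data effect of LDADD on a dict-based memory (hypothetical helper).

    Returns the value initially held in memory; the stored sum wraps
    modulo 2**datasize.
    """
    mask = (1 << datasize) - 1
    old = memory.get(address, 0) & mask
    memory[address] = (old + value) & mask
    return old

mem = {0x1000: 0xFFFF_FFFF}
old = ldadd_model(mem, 0x1000, 2, datasize=32)
```

In this example the 32-bit addition wraps, so the stored result is 1 while the returned value is the original 0xFFFFFFFF.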

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDADDB, LDADDAB, LDADDALB, LDADDLB

Atomic add on byte in memory atomically loads an 8-bit byte from memory, adds the value held in a register to it, and
stores the result back to memory. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDADDAB and LDADDALB load from memory with acquire semantics.
• LDADDLB and LDADDALB store to memory with release semantics.
• LDADDB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STADDB, STADDLB.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 0 0 0 0 0 0 Rn Rt
size opc

LDADDAB (A == 1 && R == 0)

LDADDAB <Ws>, <Wt>, [<Xn|SP>]

LDADDALB (A == 1 && R == 1)

LDADDALB <Ws>, <Wt>, [<Xn|SP>]

LDADDB (A == 0 && R == 0)

LDADDB <Ws>, <Wt>, [<Xn|SP>]

LDADDLB (A == 0 && R == 1)

LDADDLB <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

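Assembler Symbols

<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.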

Alias Conditions

Alias Is preferred when

STADDB, STADDLB A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(8) value;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_ADD, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDADDH, LDADDAH, LDADDALH, LDADDLH

Atomic add on halfword in memory atomically loads a 16-bit halfword from memory, adds the value held in a register
to it, and stores the result back to memory. The value initially loaded from memory is returned in the destination
register.
• If the destination register is not WZR, LDADDAH and LDADDALH load from memory with acquire semantics.
• LDADDLH and LDADDALH store to memory with release semantics.
• LDADDH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STADDH, STADDLH.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 0 0 0 0 0 Rn Rt
size opc

LDADDAH (A == 1 && R == 0)

LDADDAH <Ws>, <Wt>, [<Xn|SP>]

LDADDALH (A == 1 && R == 1)

LDADDALH <Ws>, <Wt>, [<Xn|SP>]

LDADDH (A == 0 && R == 0)

LDADDH <Ws>, <Wt>, [<Xn|SP>]

LDADDLH (A == 0 && R == 1)

LDADDLH <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

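Assembler Symbols

<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.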

Alias Conditions

Alias Is preferred when

STADDH, STADDLH A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(16) value;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_ADD, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDAPR

Load-Acquire RCpc Register derives an address from a base register value, loads a 32-bit word or 64-bit doubleword
from the derived address in memory, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.

Integer
(FEAT_LRCPC)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 1 0 1 (1) (1) (1) (1) (1) 1 1 0 0 0 0 Rn Rt
size Rs

32-bit (size == 10)

LDAPR <Wt>, [<Xn|SP> {,#0}]

64-bit (size == 11)

LDAPR <Xt>, [<Xn|SP> {,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

integer elsize = 8 << UInt(size);

integer regsize = if elsize == 64 then 64 else 32;
boolean tag_checked = n != 31;
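
The encoding diagram and the decode of the size field can be checked with a small encoder. This is a sketch that assumes the fixed bits are exactly those shown in the diagram, with the should-be-one Rs field set to '11111'. The helper names are hypothetical.

```python
LDAPR_BASE = 0xB8BFC000  # fixed bits with size<0> = 0 and Rn = Rt = 0; Rs field is '11111'

def encode_ldapr(rt, rn, is_64bit):
    """Return the instruction word for LDAPR <Wt|Xt>, [<Xn|SP>] (hypothetical helper)."""
    if not (0 <= rt <= 31 and 0 <= rn <= 31):
        raise ValueError("register numbers must be in the range 0 to 31")
    # size<0> (bit 30) distinguishes the 32-bit (size == 10) and 64-bit (size == 11) forms.
    return LDAPR_BASE | (int(bool(is_64bit)) << 30) | (rn << 5) | rt

def elsize_from_size(size):
    """Mirror of the decode statement: elsize = 8 << UInt(size)."""
    return 8 << size
```

A base register number of 31 selects SP rather than a general-purpose register, as described for the <Xn|SP> operand.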

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Operation

bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = Mem[address, dbytes, AccType_ORDERED];

X[t] = ZeroExtend(data, regsize);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDAPRB

Load-Acquire RCpc Register Byte derives an address from a base register value, loads a byte from the derived address
in memory, zero-extends it and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.

Integer
(FEAT_LRCPC)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 0 1 (1) (1) (1) (1) (1) 1 1 0 0 0 0 Rn Rt
size Rs

LDAPRB <Wt>, [<Xn|SP> {,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = Mem[address, 1, AccType_ORDERED];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDAPRH

Load-Acquire RCpc Register Halfword derives an address from a base register value, loads a halfword from the
derived address in memory, zero-extends it and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.

Integer
(FEAT_LRCPC)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 0 1 (1) (1) (1) (1) (1) 1 1 0 0 0 0 Rn Rt
size Rs

LDAPRH <Wt>, [<Xn|SP> {,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

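Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.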

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = Mem[address, 2, AccType_ORDERED];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDAPUR

Load-Acquire RCpc Register (unscaled) calculates an address from a base register and an immediate offset, loads a
32-bit word or 64-bit doubleword from memory, zero-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.

Unscaled offset
(FEAT_LRCPC2)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 1 1 0 0 1 0 1 0 imm9 0 0 Rn Rt
size opc

32-bit (size == 10)

LDAPUR <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (size == 11)

LDAPUR <Xt>, [<Xn|SP>{, #<simm>}]

integer scale = UInt(size);

bits(64) offset = SignExtend(imm9, 64);

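Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.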

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer regsize;

regsize = if size == '11' then 64 else 32;

integer datasize = 8 << scale;
boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, datasize DIV 8, AccType_ORDERED];

X[t] = ZeroExtend(data, regsize);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDAPURB

Load-Acquire RCpc Register Byte (unscaled) calculates an address from a base register and an immediate offset, loads
a byte from memory, zero-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.

Unscaled offset
(FEAT_LRCPC2)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 0 0 1 0 1 0 imm9 0 0 Rn Rt
size opc

LDAPURB <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
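
The relationship between <simm>, the 9-bit "imm9" field, and the address calculation `address + offset` can be sketched as follows. The helper names are hypothetical. The model treats addresses as unsigned 64-bit integers and wraps the sum at 64 bits, matching arithmetic on bits(64).

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def encode_imm9(simm):
    """Encode a byte offset in the range -256 to 255 into the imm9 field."""
    if not -256 <= simm <= 255:
        raise ValueError("simm must be in the range -256 to 255")
    return simm & 0x1FF

def effective_address(base, imm9):
    """Mirror of: offset = SignExtend(imm9, 64); address = address + offset."""
    return (base + sign_extend(imm9, 9)) & ((1 << 64) - 1)
```

For example, an offset of -16 is encoded as 0x1F0 and moves a base address of 0x1000 down to 0xFF0.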

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 1, AccType_ORDERED];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDAPURH

Load-Acquire RCpc Register Halfword (unscaled) calculates an address from a base register and an immediate offset,
loads a halfword from memory, zero-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.

Unscaled offset
(FEAT_LRCPC2)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 0 0 1 0 1 0 imm9 0 0 Rn Rt
size opc

LDAPURH <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 2, AccType_ORDERED];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDAPURSB

Load-Acquire RCpc Register Signed Byte (unscaled) calculates an address from a base register and an immediate
offset, loads a signed byte from memory, sign-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.

Unscaled offset
(FEAT_LRCPC2)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 0 0 1 1 x 0 imm9 0 0 Rn Rt
size opc

32-bit (opc == 11)

LDAPURSB <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (opc == 10)

LDAPURSB <Xt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

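Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.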

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;

if opc<1> == '0' then
    // store or zero-extending load
    memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
    regsize = 32;
    signed = FALSE;
else
    // sign-extending load
    memop = MemOp_LOAD;
    regsize = if opc<0> == '1' then 32 else 64;
    signed = TRUE;

boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 1, AccType_ORDERED] = data;

    when MemOp_LOAD
        data = Mem[address, 1, AccType_ORDERED];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);
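
The difference between the 32-bit and 64-bit forms is the width to which the loaded byte is sign-extended. The following sketch models only the value that ends up in the 64-bit register. It assumes, as for any write to a W register, that the upper 32 bits of the X register are zero after the 32-bit form. `ldapursb_result` is a hypothetical name.

```python
def ldapursb_result(byte, opc):
    """Value held in X[t] after the sign-extending byte load (hypothetical helper).

    opc == 0b11 selects the 32-bit form (Wt); opc == 0b10 selects the 64-bit form (Xt).
    """
    if opc not in (0b10, 0b11):
        raise ValueError("LDAPURSB uses only opc == 10 and opc == 11")
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0 to 255")
    regsize = 32 if opc & 1 else 64              # regsize = if opc<0> == '1' then 32 else 64
    signed = byte - 0x100 if byte & 0x80 else byte
    return signed & ((1 << regsize) - 1)         # SignExtend(data, regsize)
```

A loaded byte of 0x80 therefore yields 0xFFFFFF80 in the 32-bit form and 0xFFFFFFFFFFFFFF80 in the 64-bit form, while a non-negative byte such as 0x7F is unchanged in both.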

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDAPURSH

Load-Acquire RCpc Register Signed Halfword (unscaled) calculates an address from a base register and an immediate
offset, loads a signed halfword from memory, sign-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.

Unscaled offset
(FEAT_LRCPC2)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 0 0 1 1 x 0 imm9 0 0 Rn Rt
size opc

32-bit (opc == 11)

LDAPURSH <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (opc == 10)

LDAPURSH <Xt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

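Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.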

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;

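if opc<1> == '0' then
    // store or zero-extending load
    memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
    regsize = 32;
    signed = FALSE;
else
    // sign-extending load
    memop = MemOp_LOAD;
    regsize = if opc<0> == '1' then 32 else 64;
    signed = TRUE;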

boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 2, AccType_ORDERED] = data;

    when MemOp_LOAD
        data = Mem[address, 2, AccType_ORDERED];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDAPURSW

Load-Acquire RCpc Register Signed Word (unscaled) calculates an address from a base register and an immediate
offset, loads a signed word from memory, sign-extends it, and writes it to a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release,
except that:
• There is no ordering requirement, separate from the requirements of a Load-AcquirePC or a Store-Release,
created by having a Store-Release followed by a Load-AcquirePC instruction.
• The reading of a value written by a Store-Release by a Load-AcquirePC instruction by the same observer does
not make the write of the Store-Release globally observed.
This difference in memory ordering is not described in the pseudocode.
For information about memory accesses, see Load/Store addressing modes.

Unscaled offset
(FEAT_LRCPC2)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 0 1 1 0 0 imm9 0 0 Rn Rt
size opc

LDAPURSW <Xt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(32) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 4, AccType_ORDERED];

X[t] = SignExtend(data, 64);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDAR

Load-Acquire Register derives an address from a base register value, loads a 32-bit word or 64-bit doubleword from
memory, and writes it to a register. The instruction also has memory ordering semantics as described in Load-Acquire,
Store-Release. For information about memory accesses, see Load/Store addressing modes.

Note

For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2

32-bit (size == 10)

LDAR <Wt>, [<Xn|SP>{,#0}]

64-bit (size == 11)

LDAR <Xt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

integer elsize = 8 << UInt(size);

integer regsize = if elsize == 64 then 64 else 32;
boolean tag_checked = n != 31;

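Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.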

Operation

bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = Mem[address, dbytes, AccType_ORDERED];

X[t] = ZeroExtend(data, regsize);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDARB

Load-Acquire Register Byte derives an address from a base register value, loads a byte from memory, zero-extends it,
and writes it to a register. The instruction also has memory ordering semantics as described in Load-Acquire,
Store-Release. For information about memory accesses, see Load/Store addressing modes.

Note

For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 1 1 0 (1) (1) (1) (1) (1) 1 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2

LDARB <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

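Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.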

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = Mem[address, 1, AccType_ORDERED];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDARH

Load-Acquire Register Halfword derives an address from a base register value, loads a halfword from memory, zero-
extends it, and writes it to a register. The instruction also has memory ordering semantics as described in Load-
Acquire, Store-Release. For information about memory accesses, see Load/Store addressing modes.

Note

For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.
31:30  29:24   23  22  21  20:16            15  14:10            9:5  4:0
0 1    001000  1   1   0   (1)(1)(1)(1)(1)  1   (1)(1)(1)(1)(1)  Rn   Rt
size               L       Rs               o0  Rt2

LDARH <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = Mem[address, 2, AccType_ORDERED];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDAXP

Load-Acquire Exclusive Pair of Registers derives an address from a base register value, loads two 32-bit words or two
64-bit doublewords from memory, and writes them to two registers. For information on single-copy atomicity and
alignment requirements, see Requirements for single-copy atomicity and Alignment of data accesses. The PE marks
the physical address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive
instructions. See Synchronization and semaphores. The instruction also has memory ordering semantics, as described
in Load-Acquire, Store-Release. For information about memory accesses, see Load/Store addressing modes.
31  30  29:24   23  22  21  20:16            15  14:10  9:5  4:0
1   sz  001000  0   1   1   (1)(1)(1)(1)(1)  1   Rt2    Rn   Rt
                    L       Rs               o0

32-bit (sz == 0)

LDAXP <Wt1>, <Wt2>, [<Xn|SP>{,#0}]

64-bit (sz == 1)

LDAXP <Xt1>, <Xt2>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);

integer elsize = 32 << UInt(sz);

integer datasize = elsize * 2;
boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;

if t == t2 then
    Constraint c = ConstrainUnpredictable(Unpredictable_LDPOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rt_unknown = TRUE;    // result is UNKNOWN
        when Constraint_UNDEF   UNDEFINED;
        when Constraint_NOP     EndOfInstruction();

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDAXP.

Assembler Symbols

Operation

bits(64) address;
bits(datasize) data;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

// Tell the Exclusives monitors to record a sequence of one or more atomic
// memory reads from virtual address range [address, address+dbytes-1].
// The Exclusives monitor will only be set if all the reads are from the
// same dbytes-aligned physical address, to allow for the possibility of
// an atomicity break if the translation is changed between reads.
AArch64.SetExclusiveMonitors(address, dbytes);

if rt_unknown then
    // ConstrainedUNPREDICTABLE case
    X[t] = bits(datasize) UNKNOWN;    // In this case t = t2
elsif elsize == 32 then
    // 32-bit load exclusive pair (atomic)
    data = Mem[address, dbytes, AccType_ORDEREDATOMIC];
    if BigEndian(AccType_ORDEREDATOMIC) then
        X[t] = data<datasize-1:elsize>;
        X[t2] = data<elsize-1:0>;
    else
        X[t] = data<elsize-1:0>;
        X[t2] = data<datasize-1:elsize>;
else // elsize == 64
    // 64-bit load exclusive pair (not atomic),
    // but must be 128-bit aligned
    if address != Align(address, dbytes) then
        AArch64.Abort(address, AlignmentFault(AccType_ORDEREDATOMIC, FALSE, FALSE));
    X[t] = Mem[address, 8, AccType_ORDEREDATOMIC];
    X[t2] = Mem[address+8, 8, AccType_ORDEREDATOMIC];
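The register assignment in the 32-bit path and the alignment check in the 64-bit path can be sketched in Python as follows. The function names `split_pair` and `needs_alignment_fault` are hypothetical, integers stand in for bitvectors, and neither the exclusive monitor nor the memory ordering is modeled.

```python
def split_pair(data, elsize, big_endian):
    """Split a 2*elsize-bit value into (Xt, Xt2) as the 32-bit LDAXP path does."""
    mask = (1 << elsize) - 1
    low = data & mask                   # data<elsize-1:0>
    high = (data >> elsize) & mask      # data<datasize-1:elsize>
    # Big-endian accesses put the upper half in the first register.
    return (high, low) if big_endian else (low, high)


def needs_alignment_fault(address, dbytes):
    """Model the check address != Align(address, dbytes) of the 64-bit path."""
    return address != (address // dbytes) * dbytes
```

For the 64-bit variant `dbytes` is 16, so an address such as 0x1008 would take the alignment fault path in this model while 0x1010 would not.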

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDAXR

Load-Acquire Exclusive Register derives an address from a base register value, loads a 32-bit word or 64-bit
doubleword from memory, and writes it to a register. The memory access is atomic. The PE marks the physical address
being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See
Synchronization and semaphores. The instruction also has memory ordering semantics as described in Load-Acquire,
Store-Release. For information about memory accesses see Load/Store addressing modes.
31:30  29:24   23  22  21  20:16            15  14:10            9:5  4:0
1 x    001000  0   1   0   (1)(1)(1)(1)(1)  1   (1)(1)(1)(1)(1)  Rn   Rt
size               L       Rs               o0  Rt2

32-bit (size == 10)

LDAXR <Wt>, [<Xn|SP>{,#0}]

64-bit (size == 11)

LDAXR <Xt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

integer elsize = 8 << UInt(size);

integer regsize = if elsize == 64 then 64 else 32;
boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;

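if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

// Tell the Exclusives monitors to record a sequence of one or more atomic
// memory reads from virtual address range [address, address+dbytes-1].
// The Exclusives monitor will only be set if all the reads are from the
// same dbytes-aligned physical address, to allow for the possibility of
// an atomicity break if the translation is changed between reads.
AArch64.SetExclusiveMonitors(address, dbytes);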

data = Mem[address, dbytes, AccType_ORDEREDATOMIC];

X[t] = ZeroExtend(data, regsize);
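The idea that a load-exclusive marks an address and a later Store Exclusive checks that mark can be pictured with a deliberately simplified Python model. Everything here is an assumption made for illustration: the class `ToyExclusiveMonitor`, the dictionary used as memory, and the single-PE view. Real monitors track physical addresses, have IMPLEMENTATION DEFINED granule sizes, and interact with other observers, none of which is modeled.

```python
class ToyExclusiveMonitor:
    """Single-PE toy monitor: remembers the last exclusively loaded location."""

    def __init__(self):
        self.marked = None                      # (address, size) or None

    def load_exclusive(self, memory, address, size):
        self.marked = (address, size)           # mark the location as exclusive
        return memory.get(address, 0)

    def store_exclusive(self, memory, address, size, value):
        """Return 0 if the store is performed and 1 if it is not."""
        ok = self.marked == (address, size)
        self.marked = None                      # the mark is consumed either way
        if ok:
            memory[address] = value
            return 0
        return 1
```

A typical retry loop would repeat the load-exclusive and store-exclusive pair until the status is 0; in this toy model a second store without a new load-exclusive always reports 1.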

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDAXRB

Load-Acquire Exclusive Register Byte derives an address from a base register value, loads a byte from memory, zero-
extends it and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed
as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See Synchronization and
semaphores. The instruction also has memory ordering semantics as described in Load-Acquire, Store-Release. For
information about memory accesses see Load/Store addressing modes.
31:30  29:24   23  22  21  20:16            15  14:10            9:5  4:0
0 0    001000  0   1   0   (1)(1)(1)(1)(1)  1   (1)(1)(1)(1)(1)  Rn   Rt
size               L       Rs               o0  Rt2

LDAXRB <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(8) data;

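if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

// Tell the Exclusives monitors to record a sequence of one or more atomic
// memory reads from virtual address range [address, address+dbytes-1].
// The Exclusives monitor will only be set if all the reads are from the
// same dbytes-aligned physical address, to allow for the possibility of
// an atomicity break if the translation is changed between reads.
AArch64.SetExclusiveMonitors(address, 1);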

data = Mem[address, 1, AccType_ORDEREDATOMIC];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDAXRH

Load-Acquire Exclusive Register Halfword derives an address from a base register value, loads a halfword from
memory, zero-extends it and writes it to a register. The memory access is atomic. The PE marks the physical address
being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See
Synchronization and semaphores. The instruction also has memory ordering semantics as described in Load-Acquire,
Store-Release. For information about memory accesses see Load/Store addressing modes.
31:30  29:24   23  22  21  20:16            15  14:10            9:5  4:0
0 1    001000  0   1   0   (1)(1)(1)(1)(1)  1   (1)(1)(1)(1)(1)  Rn   Rt
size               L       Rs               o0  Rt2

LDAXRH <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(16) data;

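if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

// Tell the Exclusives monitors to record a sequence of one or more atomic
// memory reads from virtual address range [address, address+dbytes-1].
// The Exclusives monitor will only be set if all the reads are from the
// same dbytes-aligned physical address, to allow for the possibility of
// an atomicity break if the translation is changed between reads.
AArch64.SetExclusiveMonitors(address, 2);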

data = Mem[address, 2, AccType_ORDEREDATOMIC];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDCLR, LDCLRA, LDCLRAL, LDCLRL

Atomic bit clear on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory,
performs a bitwise AND with the complement of the value held in a register on it, and stores the result back to
memory. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDCLRA and LDCLRAL load from memory with acquire
semantics.
• LDCLRL and LDCLRAL store to memory with release semantics.
• LDCLR has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STCLR, STCLRL.

Integer
(FEAT_LSE)

31:30  29:24   23  22  21  20:16  15  14:12  11:10  9:5  4:0
1 x    111000  A   R   1   Rs     0   001    00     Rn   Rt
size                                  opc

32-bit LDCLR (size == 10 && A == 0 && R == 0)

LDCLR <Ws>, <Wt>, [<Xn|SP>]

32-bit LDCLRA (size == 10 && A == 1 && R == 0)

LDCLRA <Ws>, <Wt>, [<Xn|SP>]

32-bit LDCLRAL (size == 10 && A == 1 && R == 1)

LDCLRAL <Ws>, <Wt>, [<Xn|SP>]

32-bit LDCLRL (size == 10 && A == 0 && R == 1)

LDCLRL <Ws>, <Wt>, [<Xn|SP>]

64-bit LDCLR (size == 11 && A == 0 && R == 0)

LDCLR <Xs>, <Xt>, [<Xn|SP>]

64-bit LDCLRA (size == 11 && A == 1 && R == 0)

LDCLRA <Xs>, <Xt>, [<Xn|SP>]

64-bit LDCLRAL (size == 11 && A == 1 && R == 1)

LDCLRAL <Xs>, <Xt>, [<Xn|SP>]

64-bit LDCLRL (size == 11 && A == 0 && R == 1)

LDCLRL <Xs>, <Xt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

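integer datasize = 8 << UInt(size);
integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;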

Assembler Symbols

Alias Conditions

Alias Is preferred when

STCLR, STCLRL A == '0' && Rt == '11111'
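The encoding diagram and the alias condition can be combined into a short Python sketch that assembles an instruction word and then picks the preferred mnemonic. The function names `encode_ldclr` and `preferred_mnemonic` are hypothetical, and the mapping of R to STCLR versus STCLRL assumes the alias simply keeps the release bit of the underlying instruction.

```python
def encode_ldclr(size, a, r, rs, rn, rt):
    """Assemble the 32-bit word from the fields in the encoding diagram."""
    assert size in (0b10, 0b11) and a in (0, 1) and r in (0, 1)
    assert all(0 <= reg <= 31 for reg in (rs, rn, rt))
    return ((size << 30) | (0b111000 << 24) | (a << 23) | (r << 22) | (1 << 21)
            | (rs << 16) | (0b001 << 12) | (rn << 5) | rt)


def preferred_mnemonic(word):
    """Return the base mnemonic, or the ST* alias when its condition holds."""
    a = (word >> 23) & 1
    r = (word >> 22) & 1
    rt = word & 0x1F
    if a == 0 and rt == 0b11111:        # A == '0' && Rt == '11111'
        return "STCLRL" if r else "STCLR"
    return "LDCLR" + ("A" if a else "") + ("L" if r else "")
```

With these helpers, a word built with A == 0 and Rt == 31 is reported under its store alias, while the same fields with A == 1 keep the load mnemonic.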

Operation

bits(64) address;
bits(datasize) value;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_BIC, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, regsize);
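The data flow of the bit-clear operation (load the old value, store old AND NOT value, return the old value) can be sketched in Python. The function `ld_clr` is a hypothetical single-threaded model over a little-endian `bytearray`; it shows the arithmetic only and does not model atomicity, ordering, or tag checking.

```python
def ld_clr(memory, address, value, datasize):
    """Model data = MemAtomic(address, MemAtomicOp_BIC, value, ...)."""
    nbytes = datasize // 8
    mask = (1 << datasize) - 1
    old = int.from_bytes(memory[address:address + nbytes], "little")
    new = old & ~value & mask               # AND with the complement of Rs
    memory[address:address + nbytes] = new.to_bytes(nbytes, "little")
    return old                              # value initially loaded, for Rt
```

Clearing the low four bits of a word, for example, returns the word as it was before the update and leaves the cleared word in memory.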

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB

Atomic bit clear on byte in memory atomically loads an 8-bit byte from memory, performs a bitwise AND with the
complement of the value held in a register on it, and stores the result back to memory. The value initially loaded from
memory is returned in the destination register.
• If the destination register is not WZR, LDCLRAB and LDCLRALB load from memory with acquire semantics.
• LDCLRLB and LDCLRALB store to memory with release semantics.
• LDCLRB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STCLRB, STCLRLB.

Integer
(FEAT_LSE)

31:30  29:24   23  22  21  20:16  15  14:12  11:10  9:5  4:0
0 0    111000  A   R   1   Rs     0   001    00     Rn   Rt
size                                  opc

LDCLRAB (A == 1 && R == 0)

LDCLRAB <Ws>, <Wt>, [<Xn|SP>]

LDCLRALB (A == 1 && R == 1)

LDCLRALB <Ws>, <Wt>, [<Xn|SP>]

LDCLRB (A == 0 && R == 0)

LDCLRB <Ws>, <Wt>, [<Xn|SP>]

LDCLRLB (A == 0 && R == 1)

LDCLRLB <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STCLRB, STCLRLB A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(8) value;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_BIC, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH

Atomic bit clear on halfword in memory atomically loads a 16-bit halfword from memory, performs a bitwise AND with
the complement of the value held in a register on it, and stores the result back to memory. The value initially loaded
from memory is returned in the destination register.
• If the destination register is not WZR, LDCLRAH and LDCLRALH load from memory with acquire semantics.
• LDCLRLH and LDCLRALH store to memory with release semantics.
• LDCLRH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STCLRH, STCLRLH.

Integer
(FEAT_LSE)

31:30  29:24   23  22  21  20:16  15  14:12  11:10  9:5  4:0
0 1    111000  A   R   1   Rs     0   001    00     Rn   Rt
size                                  opc

LDCLRAH (A == 1 && R == 0)

LDCLRAH <Ws>, <Wt>, [<Xn|SP>]

LDCLRALH (A == 1 && R == 1)

LDCLRALH <Ws>, <Wt>, [<Xn|SP>]

LDCLRH (A == 0 && R == 0)

LDCLRH <Ws>, <Wt>, [<Xn|SP>]

LDCLRLH (A == 0 && R == 1)

LDCLRLH <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STCLRH, STCLRLH A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(16) value;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_BIC, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDEOR, LDEORA, LDEORAL, LDEORL

Atomic exclusive OR on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from
memory, performs an exclusive OR with the value held in a register on it, and stores the result back to memory. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDEORA and LDEORAL load from memory with acquire
semantics.
• LDEORL and LDEORAL store to memory with release semantics.
• LDEOR has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STEOR, STEORL.
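The rules in the list above for which variants carry acquire and release semantics reduce to two small predicates, sketched here in Python. The function name `ordering` is hypothetical; the inputs are the A and R bits and the Rt register number from the encoding.

```python
def ordering(a, r, rt):
    """Return (load_has_acquire, store_has_release) for an LDEOR-class encoding."""
    acquire = a == 1 and rt != 31   # no acquire when the destination is WZR/XZR
    release = r == 1
    return acquire, release
```

For example, LDEORAL with an ordinary destination register gives `(True, True)`, while an encoding with A set and Rt equal to 31 loses the acquire semantics in this model.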

Integer
(FEAT_LSE)

31:30  29:24   23  22  21  20:16  15  14:12  11:10  9:5  4:0
1 x    111000  A   R   1   Rs     0   010    00     Rn   Rt
size                                  opc

32-bit LDEOR (size == 10 && A == 0 && R == 0)

LDEOR <Ws>, <Wt>, [<Xn|SP>]

32-bit LDEORA (size == 10 && A == 1 && R == 0)

LDEORA <Ws>, <Wt>, [<Xn|SP>]

32-bit LDEORAL (size == 10 && A == 1 && R == 1)

LDEORAL <Ws>, <Wt>, [<Xn|SP>]

32-bit LDEORL (size == 10 && A == 0 && R == 1)

LDEORL <Ws>, <Wt>, [<Xn|SP>]

64-bit LDEOR (size == 11 && A == 0 && R == 0)

LDEOR <Xs>, <Xt>, [<Xn|SP>]

64-bit LDEORA (size == 11 && A == 1 && R == 0)

LDEORA <Xs>, <Xt>, [<Xn|SP>]

64-bit LDEORAL (size == 11 && A == 1 && R == 1)

LDEORAL <Xs>, <Xt>, [<Xn|SP>]

64-bit LDEORL (size == 11 && A == 0 && R == 1)

LDEORL <Xs>, <Xt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

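integer datasize = 8 << UInt(size);
integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;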

Assembler Symbols

Alias Conditions

Alias Is preferred when

STEOR, STEORL A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(datasize) value;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_EOR, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, regsize);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDEORB, LDEORAB, LDEORALB, LDEORLB

Atomic exclusive OR on byte in memory atomically loads an 8-bit byte from memory, performs an exclusive OR with
the value held in a register on it, and stores the result back to memory. The value initially loaded from memory is
returned in the destination register.
• If the destination register is not WZR, LDEORAB and LDEORALB load from memory with acquire semantics.
• LDEORLB and LDEORALB store to memory with release semantics.
• LDEORB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STEORB, STEORLB.

Integer
(FEAT_LSE)

31:30  29:24   23  22  21  20:16  15  14:12  11:10  9:5  4:0
0 0    111000  A   R   1   Rs     0   010    00     Rn   Rt
size                                  opc

LDEORAB (A == 1 && R == 0)

LDEORAB <Ws>, <Wt>, [<Xn|SP>]

LDEORALB (A == 1 && R == 1)

LDEORALB <Ws>, <Wt>, [<Xn|SP>]

LDEORB (A == 0 && R == 0)

LDEORB <Ws>, <Wt>, [<Xn|SP>]

LDEORLB (A == 0 && R == 1)

LDEORLB <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STEORB, STEORLB A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(8) value;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_EOR, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);
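For the byte form, only the low 8 bits of the source register take part and the returned value is zero-extended into Wt. A minimal single-threaded Python sketch follows; `ldeorb` is a hypothetical helper and atomicity is not modeled.

```python
def ldeorb(memory, address, ws):
    """Model the LDEORB data flow on a bytearray; returns the old byte."""
    value = ws & 0xFF               # bits(8) value = X[s]
    old = memory[address]
    memory[address] = old ^ value   # exclusive OR written back to memory
    return old                      # ZeroExtend(data, 32): upper bits are 0
```

Applying the same operand twice restores the original byte, which is the usual way such an operation is used to toggle flag bits.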

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDEORH, LDEORAH, LDEORALH, LDEORLH

Atomic exclusive OR on halfword in memory atomically loads a 16-bit halfword from memory, performs an exclusive
OR with the value held in a register on it, and stores the result back to memory. The value initially loaded from
memory is returned in the destination register.
• If the destination register is not WZR, LDEORAH and LDEORALH load from memory with acquire semantics.
• LDEORLH and LDEORALH store to memory with release semantics.
• LDEORH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STEORH, STEORLH.

Integer
(FEAT_LSE)

31:30  29:24   23  22  21  20:16  15  14:12  11:10  9:5  4:0
0 1    111000  A   R   1   Rs     0   010    00     Rn   Rt
size                                  opc

LDEORAH (A == 1 && R == 0)

LDEORAH <Ws>, <Wt>, [<Xn|SP>]

LDEORALH (A == 1 && R == 1)

LDEORALH <Ws>, <Wt>, [<Xn|SP>]

LDEORH (A == 0 && R == 0)

LDEORH <Ws>, <Wt>, [<Xn|SP>]

LDEORLH (A == 0 && R == 1)

LDEORLH <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STEORH, STEORLH A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(16) value;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_EOR, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDG

Load Allocation Tag loads an Allocation Tag from a memory address, generates a Logical Address Tag from the
Allocation Tag and merges it into the destination register. The address used for the load is calculated from the base
register and an immediate signed offset scaled by the Tag granule.

Integer
(FEAT_MTE)

31:24     23:22  21  20:12  11:10  9:5  4:0
11011001  01     1   imm9   00     Xn   Xt

LDG <Xt>, [<Xn|SP>{, #<simm>}]

if !HaveMTEExt() then UNDEFINED;

integer t = UInt(Xt);
integer n = UInt(Xn);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
<simm> Is the optional signed immediate offset, a multiple of 16 in the range -4096 to 4080, defaulting to 0 and
encoded in the "imm9" field.

Operation

bits(64) address;
bits(4) tag;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

address = Align(address, TAG_GRANULE);

tag = AArch64.MemTag[address, AccType_NORMAL];

X[t] = AArch64.AddressWithAllocationTag(X[t], AccType_NORMAL, tag);
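The offset encoding, the address calculation, and the tag merge can be illustrated with a Python sketch. The helper names are hypothetical. `merge_tag` assumes the Logical Address Tag occupies bits 59:56 of the register and covers only the tag-insertion part of `AArch64.AddressWithAllocationTag`, ignoring any access-enable checks that function performs.

```python
LOG2_TAG_GRANULE = 4
TAG_GRANULE = 1 << LOG2_TAG_GRANULE     # 16 bytes


def encode_simm(simm):
    """Return the 9-bit imm9 field for a byte offset, per the <simm> rules."""
    if simm % TAG_GRANULE or not -4096 <= simm <= 4080:
        raise ValueError("offset must be a multiple of 16 in [-4096, 4080]")
    return (simm >> LOG2_TAG_GRANULE) & 0x1FF


def ldg_address(base, imm9):
    """Model LSL(SignExtend(imm9, 64), 4), the add, and Align(..., 16)."""
    signed = imm9 - 0x200 if imm9 & 0x100 else imm9
    address = (base + (signed << LOG2_TAG_GRANULE)) & ((1 << 64) - 1)
    return address & ~(TAG_GRANULE - 1)


def merge_tag(xt, tag):
    """Write a 4-bit tag into bits 59:56 of xt, keeping all other bits."""
    return (xt & ~(0xF << 56)) | ((tag & 0xF) << 56)
```

Note that the destination register is also a source: only the tag field changes, so LDG is normally issued on a register that already holds the pointer being tagged.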


LDGM

Load Tag Multiple reads a naturally aligned block of N Allocation Tags, where the size of N is identified in
GMID_EL1.BS, and writes the Allocation Tag read from address A to the destination register at
4*A<7:4>+3:4*A<7:4>. Bits of the destination register not written with an Allocation Tag are set to 0.
This instruction is UNDEFINED at EL0.
This instruction generates an Unchecked access.

Integer
(FEAT_MTE2)

31:24     23:22  21  20:12      11:10  9:5  4:0
11011001  11     1   000000000  00     Xn   Xt

LDGM <Xt>, [<Xn|SP>]

if !HaveMTE2Ext() then UNDEFINED;

integer t = UInt(Xt);
integer n = UInt(Xn);

Assembler Symbols

Operation

if PSTATE.EL == EL0 then
    UNDEFINED;

bits(64) data = Zeros(64);
bits(64) address;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

integer size = 4 * (2 ^ (UInt(GMID_EL1.BS)));
address = Align(address, size);
integer count = size >> LOG2_TAG_GRANULE;
integer index = UInt(address<LOG2_TAG_GRANULE+3:LOG2_TAG_GRANULE>);

for i = 0 to count-1
    bits(4) tag = AArch64.MemTag[address, AccType_NORMAL];
    data<(index*4)+3:index*4> = tag;
    address = address + TAG_GRANULE;
    index = index + 1;

X[t] = data;
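The gathering loop can be traced with a short Python model. Here `tags` is a hypothetical dictionary from granule-aligned addresses to 4-bit tags standing in for `AArch64.MemTag`, and `bs` stands in for the value of GMID_EL1.BS; the privilege check is not modeled.

```python
LOG2_TAG_GRANULE = 4
TAG_GRANULE = 1 << LOG2_TAG_GRANULE


def ldgm(tags, address, bs):
    """Gather the Allocation Tags of one block into a 64-bit value."""
    size = 4 * (2 ** bs)                        # block size in bytes
    address &= ~(size - 1)                      # Align(address, size)
    count = size >> LOG2_TAG_GRANULE            # tags per block
    index = (address >> LOG2_TAG_GRANULE) & 0xF # address<7:4>
    data = 0
    for _ in range(count):
        tag = tags.get(address, 0) & 0xF
        data |= tag << (index * 4)              # data<(index*4)+3:index*4> = tag
        address += TAG_GRANULE
        index += 1
    return data
```

With `bs` equal to 4 the block is 64 bytes, so four tags are read and placed in the nibbles selected by bits 7:4 of each granule address; the remaining nibbles stay zero.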


LDLAR

Load LOAcquire Register loads a 32-bit word or 64-bit doubleword from memory, and writes it to a register. The
instruction also has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information
about memory accesses, see Load/Store addressing modes.

Note

For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.

No offset
(FEAT_LOR)

31:30  29:24   23  22  21  20:16            15  14:10            9:5  4:0
1 x    001000  1   1   0   (1)(1)(1)(1)(1)  0   (1)(1)(1)(1)(1)  Rn   Rt
size               L       Rs               o0  Rt2

32-bit (size == 10)

LDLAR <Wt>, [<Xn|SP>{,#0}]

64-bit (size == 11)

LDLAR <Xt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

integer elsize = 8 << UInt(size);

integer regsize = if elsize == 64 then 64 else 32;
boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = Mem[address, dbytes, AccType_LIMITEDORDERED];

X[t] = ZeroExtend(data, regsize);
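In the encoding diagrams, the LDLAR family differs from the LDAR family only in the o0 bit (bit 15), with the size field selecting the byte, halfword, and register forms. A Python sketch of that classification follows; `classify_ordered_load` is a hypothetical helper that checks only the fixed bits 29:21 and ignores the should-be-one Rs and Rt2 fields.

```python
def classify_ordered_load(word):
    """Name an LDAR/LDLAR-class instruction word from its size and o0 fields."""
    fixed = (word >> 21) & 0x1FF            # bits 29:21
    if fixed != 0b001000110:
        raise ValueError("not an LDAR/LDLAR-class encoding")
    size = (word >> 30) & 0b11
    o0 = (word >> 15) & 1
    base = "LDAR" if o0 else "LDLAR"        # o0 == 0 selects the LOAcquire form
    return base + {0b00: "B", 0b01: "H", 0b10: "", 0b11: ""}[size]
```

The two forms share the same operands and data path; per the pseudocode, the difference shows up only in the access type passed to `Mem[]` (`AccType_ORDERED` versus `AccType_LIMITEDORDERED`).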

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDLARB

Load LOAcquire Register Byte loads a byte from memory, zero-extends it and writes it to a register. The instruction
also has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information about memory
accesses, see Load/Store addressing modes.

Note

For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.

No offset
(FEAT_LOR)

31:30  29:24   23  22  21  20:16            15  14:10            9:5  4:0
0 0    001000  1   1   0   (1)(1)(1)(1)(1)  0   (1)(1)(1)(1)(1)  Rn   Rt
size               L       Rs               o0  Rt2

LDLARB <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = Mem[address, 1, AccType_LIMITEDORDERED];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDLARH

Load LOAcquire Register Halfword loads a halfword from memory, zero-extends it, and writes it to a register. The
instruction also has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information
about memory accesses, see Load/Store addressing modes.

Note

For this instruction, if the destination is WZR/XZR, it is impossible for software to observe the presence of the
acquire semantic other than its effect on the arrival at endpoints.

No offset
(FEAT_LOR)

31:30  29:24   23  22  21  20:16            15  14:10            9:5  4:0
0 1    001000  1   1   0   (1)(1)(1)(1)(1)  0   (1)(1)(1)(1)(1)  Rn   Rt
size               L       Rs               o0  Rt2

LDLARH <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = Mem[address, 2, AccType_LIMITEDORDERED];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDNP

Load Pair of Registers, with non-temporal hint, calculates an address from a base register value and an immediate
offset, loads two 32-bit words or two 64-bit doublewords from memory, and writes them to two registers.
For information about memory accesses, see Load/Store addressing modes. For information about Non-temporal pair
instructions, see Load/Store Non-temporal pair.
31:30  29:23    22  21:15  14:10  9:5  4:0
x 0    1010000  1   imm7   Rt2    Rn   Rt
opc             L

32-bit (opc == 00)

LDNP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]

64-bit (opc == 10)

LDNP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

// Empty.

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDNP.

Assembler Symbols

<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the optional signed immediate byte offset, a multiple of 4 in the range -256 to
252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit variant: is the optional signed immediate byte offset, a multiple of 8 in the range -512 to
504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
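
The scaling rule for <imm> can be sketched in Python as follows. This is an illustrative model only: the helper names encode_imm7 and decode_imm7 are hypothetical and are not part of the architecture or of any assembler.

```python
def encode_imm7(byte_offset: int, is_64bit: bool) -> int:
    """Return the 7-bit imm7 field for a load-pair byte offset."""
    scale = 3 if is_64bit else 2      # mirrors scale = 2 + UInt(opc<1>)
    step = 1 << scale                 # 4 bytes for W registers, 8 for X registers
    if byte_offset % step != 0:
        raise ValueError(f"offset must be a multiple of {step}")
    scaled = byte_offset // step
    if not -64 <= scaled <= 63:
        raise ValueError("offset out of range for imm7")
    return scaled & 0x7F              # two's complement in 7 bits


def decode_imm7(imm7: int, is_64bit: bool) -> int:
    """Mirror LSL(SignExtend(imm7, 64), scale), returned as a signed integer."""
    scale = 3 if is_64bit else 2
    signed = imm7 - 0x80 if imm7 & 0x40 else imm7
    return signed * (1 << scale)
```

For example, a 64-bit offset of 504 maps to imm7 = 0x3F, and a 32-bit offset of -256 maps to imm7 = 0x40.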

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if opc<0> == '1' then UNDEFINED;
integer scale = 2 + UInt(opc<1>);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;

Operation

bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

if HaveLSE2Ext() then
    bits(2*datasize) full_data;
    full_data = Mem[address, 2*dbytes, AccType_STREAM, TRUE];
    if BigEndian(AccType_STREAM) then
        data2 = full_data<(datasize-1):0>;
        data1 = full_data<(2*datasize-1):datasize>;
    else
        data1 = full_data<(datasize-1):0>;
        data2 = full_data<(2*datasize-1):datasize>;
else
    data1 = Mem[address, dbytes, AccType_STREAM];
    data2 = Mem[address+dbytes, dbytes, AccType_STREAM];
if rt_unknown then
    data1 = bits(datasize) UNKNOWN;
    data2 = bits(datasize) UNKNOWN;
X[t] = data1;
X[t2] = data2;
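
The single-access path in the pseudocode above reads both elements at once and then splits them according to endianness, so that data1 always corresponds to the lower address. A minimal Python sketch of that split follows. The names split_pair and load_pair are hypothetical, and memory is modelled as a plain bytes object, which ignores alignment, faults, and memory attributes.

```python
from typing import Tuple


def split_pair(full_data: int, datasize: int, big_endian: bool) -> Tuple[int, int]:
    """Split a 2*datasize-bit value into (data1, data2)."""
    mask = (1 << datasize) - 1
    low = full_data & mask                   # full_data<(datasize-1):0>
    high = (full_data >> datasize) & mask    # full_data<(2*datasize-1):datasize>
    # Big-endian: the element at the lower address is in the high half.
    return (high, low) if big_endian else (low, high)


def load_pair(mem: bytes, datasize: int, big_endian: bool) -> Tuple[int, int]:
    """Model one combined access over 2*datasize bits starting at mem[0]."""
    nbytes = 2 * datasize // 8
    order = "big" if big_endian else "little"
    full_data = int.from_bytes(mem[:nbytes], order)
    return split_pair(full_data, datasize, big_endian)
```

In both byte orders the first element returned equals the value that a separate datasize-bit load from the lower address would produce.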

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDP

Load Pair of Registers calculates an address from a base register value and an immediate offset, loads two 32-bit
words or two 64-bit doublewords from memory, and writes them to two registers. For information about memory
accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 0 1 1 imm7 Rt2 Rn Rt
opc L

32-bit (opc == 00)

LDP <Wt1>, <Wt2>, [<Xn|SP>], #<imm>

64-bit (opc == 10)

LDP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>

boolean wback = TRUE;

boolean postindex = TRUE;

Pre-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 1 1 1 imm7 Rt2 Rn Rt
opc L

32-bit (opc == 00)

LDP <Wt1>, <Wt2>, [<Xn|SP>, #<imm>]!

64-bit (opc == 10)

LDP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!

boolean wback = TRUE;

boolean postindex = FALSE;

Signed offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
x 0 1 0 1 0 0 1 0 1 imm7 Rt2 Rn Rt
opc L

32-bit (opc == 00)

LDP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]

64-bit (opc == 10)

LDP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

boolean wback = FALSE;

boolean postindex = FALSE;

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDP.

Assembler Symbols

<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit post-index and 32-bit pre-index variant: is the signed immediate byte offset, a multiple of
4 in the range -256 to 252, encoded in the "imm7" field as <imm>/4.
For the 32-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the
range -256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit post-index and 64-bit pre-index variant: is the signed immediate byte offset, a multiple of
8 in the range -512 to 504, encoded in the "imm7" field as <imm>/8.
For the 64-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 8 in the
range -512 to 504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if L:opc<0> == '01' || opc == '11' then UNDEFINED;
boolean signed = (opc<0> != '0');
integer scale = 2 + UInt(opc<1>);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = wback || n != 31;

boolean rt_unknown = FALSE;

boolean wb_unknown = FALSE;

if wback && (t == n || t2 == n) && n != 31 then
    Constraint c = ConstrainUnpredictable(Unpredictable_WBOVERLAPLD);
    assert c IN {Constraint_WBSUPPRESS, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_WBSUPPRESS wback = FALSE; // writeback is suppressed
        when Constraint_UNKNOWN wb_unknown = TRUE; // writeback is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
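
The opc and L checks at the start of the Shared Decode can be modelled as below. This is a sketch, assuming the fields have already been extracted as small integers. The Undefined exception and the decode_pair_load function are hypothetical names introduced for illustration.

```python
from typing import Tuple


class Undefined(Exception):
    """Stands in for the UNDEFINED outcome of the decode pseudocode."""


def decode_pair_load(opc: int, l_bit: int) -> Tuple[int, bool]:
    """Return (datasize, signed) for a 2-bit opc and the 1-bit L field."""
    # if L:opc<0> == '01' || opc == '11' then UNDEFINED;
    if (l_bit == 0 and (opc & 1) == 1) or opc == 0b11:
        raise Undefined(f"opc={opc:02b} L={l_bit}")
    signed = (opc & 1) != 0          # opc<0> selects the sign-extending form
    scale = 2 + (opc >> 1)           # opc<1> selects 32-bit or 64-bit elements
    datasize = 8 << scale
    return datasize, signed
```

With L set to 1, opc values 00 and 10 give the 32-bit and 64-bit LDP variants, and opc 01 gives a 32-bit sign-extending load.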

Operation

bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

if HaveLSE2Ext() && !signed then
    bits(2*datasize) full_data;
    full_data = Mem[address, 2*dbytes, AccType_NORMAL, TRUE];
    if BigEndian(AccType_NORMAL) then
        data2 = full_data<(datasize-1):0>;
        data1 = full_data<(2*datasize-1):datasize>;
    else
        data1 = full_data<(datasize-1):0>;
        data2 = full_data<(2*datasize-1):datasize>;
else
    data1 = Mem[address, dbytes, AccType_NORMAL];
    data2 = Mem[address+dbytes, dbytes, AccType_NORMAL];
if rt_unknown then
    data1 = bits(datasize) UNKNOWN;
    data2 = bits(datasize) UNKNOWN;
if signed then
    X[t] = SignExtend(data1, 64);
    X[t2] = SignExtend(data2, 64);
else
    X[t] = data1;
    X[t2] = data2;

if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
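
The three encoding classes differ only in when the offset is applied and whether the base register is updated. The following Python sketch models that address arithmetic under simplifying assumptions: it ignores the CONSTRAINED UNPREDICTABLE cases, and the function name pair_addresses and the mode strings are hypothetical.

```python
from typing import Tuple

MASK64 = (1 << 64) - 1


def pair_addresses(base: int, offset: int, mode: str) -> Tuple[int, int]:
    """Return (access_address, new_base) for mode 'post', 'pre' or 'offset'."""
    if mode not in ("post", "pre", "offset"):
        raise ValueError(f"unknown mode {mode!r}")
    wback = mode in ("post", "pre")
    postindex = mode == "post"

    address = base
    if not postindex:
        address = (address + offset) & MASK64   # offset applied before the access
    access_address = address

    new_base = base
    if wback:
        if postindex:
            address = (address + offset) & MASK64   # offset applied after the access
        new_base = address
    return access_address, new_base
```

For a base of 0x1000 and an offset of 16, post-index accesses 0x1000 and leaves the base at 0x1010, pre-index accesses 0x1010 and leaves the base at 0x1010, and signed offset accesses 0x1010 while leaving the base unchanged.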

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDPSW

Load Pair of Registers Signed Word calculates an address from a base register value and an immediate offset, loads
two 32-bit words from memory, sign-extends them, and writes them to two registers. For information about memory
accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 0 1 1 imm7 Rt2 Rn Rt
opc L

LDPSW <Xt1>, <Xt2>, [<Xn|SP>], #<imm>

boolean wback = TRUE;

boolean postindex = TRUE;

Pre-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 1 1 1 imm7 Rt2 Rn Rt
opc L

LDPSW <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!

boolean wback = TRUE;

boolean postindex = FALSE;

Signed offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 0 0 1 0 1 imm7 Rt2 Rn Rt
opc L

LDPSW <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

boolean wback = FALSE;

boolean postindex = FALSE;

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDPSW.

Assembler Symbols

<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the post-index and pre-index variant: is the signed immediate byte offset, a multiple of 4 in the
range -256 to 252, encoded in the "imm7" field as <imm>/4.
For the signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the range
-256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
bits(64) offset = LSL(SignExtend(imm7, 64), 2);
boolean tag_checked = wback || n != 31;

boolean rt_unknown = FALSE;

boolean wb_unknown = FALSE;

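if wback && (t == n || t2 == n) && n != 31 then
    Constraint c = ConstrainUnpredictable(Unpredictable_WBOVERLAPLD);
    assert c IN {Constraint_WBSUPPRESS, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_WBSUPPRESS wback = FALSE; // writeback is suppressed
        when Constraint_UNKNOWN wb_unknown = TRUE; // writeback is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();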

Operation

bits(64) address;
bits(32) data1;
bits(32) data2;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

if HaveLSE2Ext() && FALSE then
    bits(64) full_data;
    full_data = Mem[address, 8, AccType_NORMAL, TRUE];
    if BigEndian(AccType_NORMAL) then
        data2 = full_data<31:0>;
        data1 = full_data<63:32>;
    else
        data1 = full_data<31:0>;
        data2 = full_data<63:32>;
else
    data1 = Mem[address, 4, AccType_NORMAL];
    data2 = Mem[address+4, 4, AccType_NORMAL];
if rt_unknown then
    data1 = bits(32) UNKNOWN;
    data2 = bits(32) UNKNOWN;
X[t] = SignExtend(data1, 64);
X[t2] = SignExtend(data2, 64);
if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
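
The effect of the two SignExtend calls can be illustrated with a small Python model. It assumes little-endian data and models memory as a bytes object; the function name ldpsw_values is hypothetical.

```python
import struct
from typing import Tuple

MASK64 = (1 << 64) - 1


def ldpsw_values(mem: bytes, address: int) -> Tuple[int, int]:
    """Return the 64-bit register values produced from two adjacent words."""
    # "<ii" reads two signed 32-bit little-endian words (endianness is an assumption).
    word1, word2 = struct.unpack_from("<ii", mem, address)
    # Masking a negative Python integer yields its 64-bit two's complement form.
    return word1 & MASK64, word2 & MASK64
```

A word holding 0xFFFFFFFE is therefore written to the destination register as 0xFFFFFFFFFFFFFFFE, whereas a word holding 5 is written as 5.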

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDR (immediate)

Load Register (immediate) loads a word or doubleword from memory and writes it to a register. The address that is
used for the load is calculated from a base register and an immediate offset. For information about memory accesses,
see Load/Store addressing modes. The Unsigned offset variant scales the immediate offset value by the size of the
value accessed before adding it to the base register value.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 0 imm9 0 1 Rn Rt
size opc

32-bit (size == 10)

LDR <Wt>, [<Xn|SP>], #<simm>

64-bit (size == 11)

LDR <Xt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
integer scale = UInt(size);
bits(64) offset = SignExtend(imm9, 64);

Pre-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 0 imm9 1 1 Rn Rt
size opc

32-bit (size == 10)

LDR <Wt>, [<Xn|SP>, #<simm>]!

64-bit (size == 11)

LDR <Xt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
integer scale = UInt(size);
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 1 0 1 imm12 Rn Rt
size opc

32-bit (size == 10)

LDR <Wt>, [<Xn|SP>{, #<pimm>}]

64-bit (size == 11)

LDR <Xt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
integer scale = UInt(size);
bits(64) offset = LSL(ZeroExtend(imm12, 64), scale);

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDR (immediate).

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> For the 32-bit variant: is the optional positive immediate byte offset, a multiple of 4 in the range 0 to
16380, defaulting to 0 and encoded in the "imm12" field as <pimm>/4.
For the 64-bit variant: is the optional positive immediate byte offset, a multiple of 8 in the range 0 to
32760, defaulting to 0 and encoded in the "imm12" field as <pimm>/8.
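
As an illustration of how <pimm> is scaled into "imm12", the following Python sketch assembles the Unsigned offset encoding from its fields, using the bit layout shown in the encoding diagram. The function name is hypothetical and the sketch is not a substitute for an assembler.

```python
def assemble_ldr_unsigned_offset(rt: int, rn: int, byte_offset: int, is_64bit: bool) -> int:
    """Return the 32-bit instruction word for LDR <Wt|Xt>, [<Xn|SP>, #<pimm>]."""
    if not (0 <= rt <= 31 and 0 <= rn <= 31):
        raise ValueError("register numbers must be in the range 0 to 31")
    size = 0b11 if is_64bit else 0b10
    step = 1 << size                       # 4-byte or 8-byte scaling
    if byte_offset < 0 or byte_offset % step != 0:
        raise ValueError(f"offset must be a non-negative multiple of {step}")
    imm12 = byte_offset // step            # encoded as <pimm>/4 or <pimm>/8
    if imm12 > 0xFFF:
        raise ValueError("offset out of range for imm12")
    return (
        (size << 30)          # bits 31:30
        | (0b111001 << 24)    # fixed bits 29:24
        | (0b01 << 22)        # opc, bits 23:22
        | (imm12 << 10)       # bits 21:10
        | (rn << 5)           # bits 9:5
        | rt                  # bits 4:0
    )
```

Under this model, LDR X0, [SP, #8] assembles to 0xF94007E0, because the byte offset 8 is stored as imm12 = 1.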

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer regsize;

regsize = if size == '11' then 64 else 32;

integer datasize = 8 << scale;
boolean tag_checked = wback || n != 31;

boolean wb_unknown = FALSE;

if wback && n == t && n != 31 then
    c = ConstrainUnpredictable(Unpredictable_WBOVERLAPLD);
    assert c IN {Constraint_WBSUPPRESS, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_WBSUPPRESS wback = FALSE; // writeback is suppressed
        when Constraint_UNKNOWN wb_unknown = TRUE; // writeback is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();

Operation

bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

data = Mem[address, datasize DIV 8, AccType_NORMAL];

X[t] = ZeroExtend(data, regsize);

if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDR (literal)

Load Register (literal) calculates an address from the PC value and an immediate offset, loads a word from memory,
and writes it to a register. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 x 0 1 1 0 0 0 imm19 Rt
opc

32-bit (opc == 00)

LDR <Wt>, <label>

64-bit (opc == 01)

LDR <Xt>, <label>

integer t = UInt(Rt);
MemOp memop = MemOp_LOAD;
boolean signed = FALSE;
integer size;
bits(64) offset;

case opc of
    when '00'
        size = 4;
    when '01'
        size = 8;
    when '10'
        size = 4;
        signed = TRUE;
    when '11'
        memop = MemOp_PREFETCH;

offset = SignExtend(imm19:'00', 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<label> Is the program label from which the data is to be loaded. Its offset from the address of this instruction,
in the range +/-1MB, is encoded as "imm19" times 4.
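
The relationship between <label>, the instruction address, and "imm19" can be sketched as follows. The helper names are hypothetical, and the bit positions are taken from the encoding diagram above.

```python
def encode_ldr_literal(pc: int, label: int, rt: int, is_64bit: bool) -> int:
    """Return the instruction word for LDR <Wt|Xt>, <label> placed at address pc."""
    if not 0 <= rt <= 31:
        raise ValueError("rt must be in the range 0 to 31")
    delta = label - pc
    if delta % 4 != 0:
        raise ValueError("label must be word-aligned relative to the instruction")
    imm19 = delta // 4
    if not -(1 << 18) <= imm19 < (1 << 18):
        raise ValueError("label is outside the +/-1MB range")
    opc = 0b01 if is_64bit else 0b00
    return (opc << 30) | (0b011000 << 24) | ((imm19 & 0x7FFFF) << 5) | rt


def literal_target(pc: int, word: int) -> int:
    """Mirror address = PC[] + SignExtend(imm19:'00', 64)."""
    imm19 = (word >> 5) & 0x7FFFF
    if imm19 & (1 << 18):
        imm19 -= 1 << 19
    return (pc + imm19 * 4) & ((1 << 64) - 1)
```

A literal located 8 bytes after the instruction gives imm19 = 2, and a literal located 4 bytes before it gives imm19 = 0x7FFFF.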

Operation

bits(64) address = PC[] + offset;

bits(size*8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);

case memop of
    when MemOp_LOAD
        data = Mem[address, size, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, 64);
        else
            X[t] = data;

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDR (register)

Load Register (register) calculates an address from a base register value and an offset register value, loads a word
from memory, and writes it to a register. The offset register value can optionally be shifted and extended. For
information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 1 Rm option S 1 0 Rn Rt
size opc

32-bit (size == 10)

LDR <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

64-bit (size == 11)

LDR <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

integer scale = UInt(size);

if option<1> == '0' then UNDEFINED; // sub-word index
ExtendType extend_type = DecodeRegExtend(option);
integer shift = if S == '1' then scale else 0;

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.

<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:

option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX

<amount> For the 32-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #2

For the 64-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #3

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
integer regsize;

regsize = if size == '11' then 64 else 32;

integer datasize = 8 << scale;

Operation

bits(64) offset = ExtendReg(m, extend_type, shift);

bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, datasize DIV 8, AccType_NORMAL];

X[t] = ZeroExtend(data, regsize);
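
ExtendReg is a shared pseudocode function that is not defined on this page. The Python sketch below is a simplified model of how the index register contributes to the address, assuming that the index is first extended according to <extend> and then shifted left by <amount>, with the result truncated to 64 bits. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def extend_index(value: int, extend: str, shift: int) -> int:
    """Model the extended and shifted index for UXTW, LSL, SXTW or SXTX."""
    if extend not in ("UXTW", "LSL", "SXTW", "SXTX"):
        raise ValueError(f"unsupported extend {extend!r}")
    width = 32 if extend in ("UXTW", "SXTW") else 64   # Wm or Xm index
    value &= (1 << width) - 1
    if extend in ("SXTW", "SXTX") and value >> (width - 1):
        value -= 1 << width                            # sign-extend the index
    return (value << shift) & MASK64


def register_offset_address(base: int, index: int, extend: str = "LSL", shift: int = 0) -> int:
    """Model address = base + offset for the register-offset form."""
    return (base + extend_index(index, extend, shift)) & MASK64
```

With a base of 0x1000, an index of 2 and LSL #3, the access address is 0x1010. With the same base, a Wm index of 0xFFFFFFFF and SXTW #3, the index is treated as -1 and the access address is 0xFF8.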

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDRAA, LDRAB

Load Register, with pointer authentication. This instruction authenticates an address from a base register using a
modifier of zero and the specified key, adds an immediate offset to the authenticated address, and loads a 64-bit
doubleword from memory at this resulting address into a register.
Key A is used for LDRAA, and key B is used for LDRAB.
If the authentication passes, the PE behaves the same as for an LDR instruction. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to the base register, unless the pre-indexed variant of the instruction is
used. In this case, the address that is written back to the base register does not include the pointer authentication
code.
For information about memory accesses, see Load/Store addressing modes.

Unscaled offset
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 M S 1 imm9 W 1 Rn Rt
size

Key A, offset (M == 0 && W == 0)

LDRAA <Xt>, [<Xn|SP>{, #<simm>}]

Key A, pre-indexed (M == 0 && W == 1)

LDRAA <Xt>, [<Xn|SP>{, #<simm>}]!

Key B, offset (M == 1 && W == 0)

LDRAB <Xt>, [<Xn|SP>{, #<simm>}]

Key B, pre-indexed (M == 1 && W == 1)

LDRAB <Xt>, [<Xn|SP>{, #<simm>}]!

if !HavePACExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
boolean wback = (W == '1');
boolean use_key_a = (M == '0');
bits(10) S10 = S:imm9;
bits(64) offset = LSL(SignExtend(S10, 64), 3);
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, a multiple of 8 in the range -4096 to 4088, defaulting to 0
and encoded in the "S:imm9" field as <simm>/8.
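
The 10-bit "S:imm9" offset can be decoded as in the following Python sketch, which mirrors LSL(SignExtend(S10, 64), 3) and returns the result as a signed integer. The function name ldra_offset is hypothetical.

```python
def ldra_offset(s_bit: int, imm9: int) -> int:
    """Return the signed byte offset encoded by the S and imm9 fields."""
    s10 = ((s_bit & 1) << 9) | (imm9 & 0x1FF)   # S10 = S:imm9
    if s10 & 0x200:                             # S is the sign bit
        s10 -= 0x400
    return s10 * 8                              # scaled by 8 for doubleword loads
```

The extreme encodings S = 1, imm9 = 0 and S = 0, imm9 = 0x1FF give -4096 and 4088 respectively, matching the documented range of <simm>.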

Operation

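bits(64) address;
bits(64) data;
boolean wb_unknown = FALSE;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if wback && n == t && n != 31 then
    c = ConstrainUnpredictable(Unpredictable_WBOVERLAPLD);
    assert c IN {Constraint_WBSUPPRESS, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_WBSUPPRESS wback = FALSE; // writeback is suppressed
        when Constraint_UNKNOWN wb_unknown = TRUE; // writeback is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();

if n == 31 then
    address = SP[];
else
    address = X[n];

if use_key_a then
    address = AuthDA(address, X[31], TRUE);
else
    address = AuthDB(address, X[31], TRUE);

if n == 31 then
    CheckSPAlignment();

address = address + offset;

data = Mem[address, 8, AccType_NORMAL];
X[t] = data;

if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;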

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDRB (immediate)

Load Register Byte (immediate) loads a byte from memory, zero-extends it, and writes the result to a register. The
address that is used for the load is calculated from a base register and an immediate offset. For information about
memory accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 0 imm9 0 1 Rn Rt
size opc

LDRB <Wt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
bits(64) offset = SignExtend(imm9, 64);

Pre-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 0 imm9 1 1 Rn Rt
size opc

LDRB <Wt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 1 0 1 imm12 Rn Rt
size opc

LDRB <Wt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
bits(64) offset = LSL(ZeroExtend(imm12, 64), 0);

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDRB (immediate).

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0 and encoded in the
"imm12" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = wback || n != 31;

boolean wb_unknown = FALSE;

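if wback && n == t && n != 31 then
    c = ConstrainUnpredictable(Unpredictable_WBOVERLAPLD);
    assert c IN {Constraint_WBSUPPRESS, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_WBSUPPRESS wback = FALSE; // writeback is suppressed
        when Constraint_UNKNOWN wb_unknown = TRUE; // writeback is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();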

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

data = Mem[address, 1, AccType_NORMAL];

X[t] = ZeroExtend(data, 32);

if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDRB (register)

Load Register Byte (register) calculates an address from a base register value and an offset register value, loads a
byte from memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/Store
addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 1 Rm option S 1 0 Rn Rt
size opc

Extended register (option != 011)

LDRB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]

Shifted register (option == 011)

LDRB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]

if option<1> == '0' then UNDEFINED; // sub-word index

ExtendType extend_type = DecodeRegExtend(option);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.

<extend> Is the index extend specifier, encoded in “option”:

option <extend>
010 UXTW
110 SXTW
111 SXTX

<amount> Is the index shift amount, which must be #0. It is encoded in "S" as 0 if omitted, or as 1 if present.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);

Operation

bits(64) offset = ExtendReg(m, extend_type, 0);

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 1, AccType_NORMAL];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDRH (immediate)

Load Register Halfword (immediate) loads a halfword from memory, zero-extends it, and writes the result to a register.
The address that is used for the load is calculated from a base register and an immediate offset. For information about
memory accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 0 imm9 0 1 Rn Rt
size opc

LDRH <Wt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
bits(64) offset = SignExtend(imm9, 64);

Pre-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 0 imm9 1 1 Rn Rt
size opc

LDRH <Wt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 1 0 1 imm12 Rn Rt
size opc

LDRH <Wt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
bits(64) offset = LSL(ZeroExtend(imm12, 64), 1);

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDRH (immediate).

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0 and
encoded in the "imm12" field as <pimm>/2.
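
Going in the other direction, the Unsigned offset encoding can be decoded back into its operands. The Python sketch below uses the fixed bits 31:22 from the encoding diagram for this class; the function name is hypothetical.

```python
from typing import Tuple


def decode_ldrh_unsigned_offset(word: int) -> Tuple[int, int, int]:
    """Return (rt, rn, byte_offset) for an LDRH unsigned-offset instruction word."""
    if (word >> 22) & 0x3FF != 0b0111100101:   # size=01, 111001, opc=01
        raise ValueError("not an LDRH (immediate) unsigned offset encoding")
    imm12 = (word >> 10) & 0xFFF
    rn = (word >> 5) & 0x1F
    rt = word & 0x1F
    return rt, rn, imm12 << 1                  # <pimm> = imm12 * 2
```

For example, the word 0x79400420 decodes to Rt = 0, Rn = 1 and a byte offset of 2, that is LDRH W0, [X1, #2].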

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = wback || n != 31;

boolean wb_unknown = FALSE;

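if wback && n == t && n != 31 then
    c = ConstrainUnpredictable(Unpredictable_WBOVERLAPLD);
    assert c IN {Constraint_WBSUPPRESS, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_WBSUPPRESS wback = FALSE; // writeback is suppressed
        when Constraint_UNKNOWN wb_unknown = TRUE; // writeback is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();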

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

data = Mem[address, 2, AccType_NORMAL];

X[t] = ZeroExtend(data, 32);

if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDRH (register)

Load Register Halfword (register) calculates an address from a base register value and an offset register value, loads a
halfword from memory, zero-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 1 Rm option S 1 0 Rn Rt
size opc

LDRH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

if option<1> == '0' then UNDEFINED; // sub-word index

ExtendType extend_type = DecodeRegExtend(option);
integer shift = if S == '1' then 1 else 0;

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.

<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:

option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX

<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #1

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);

Operation

bits(64) offset = ExtendReg(m, extend_type, shift);

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 2, AccType_NORMAL];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDRSB (immediate)

Load Register Signed Byte (immediate) loads a byte from memory, sign-extends it to either 32 bits or 64 bits, and
writes the result to a register. The address that is used for the load is calculated from a base register and an
immediate offset. For information about memory accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset
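
The difference between the 32-bit and 64-bit forms can be illustrated with a short Python model of the value that ends up in the 64-bit register. It relies on the architectural rule that a write to a W register clears the upper 32 bits of the corresponding X register. The function name ldrsb_result is hypothetical.

```python
def ldrsb_result(byte: int, regsize: int) -> int:
    """Return the 64-bit register contents after loading one signed byte."""
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    byte &= 0xFF
    signed = byte - 0x100 if byte & 0x80 else byte   # interpret bit 7 as the sign
    # Truncate to regsize bits; for a W destination the upper 32 bits are zero.
    return signed & ((1 << regsize) - 1)
```

Loading the byte 0x80 gives 0xFFFFFF80 for a W destination and 0xFFFFFFFFFFFFFF80 for an X destination, whereas LDRB would give 0x80 in both cases.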

Post-index

Bits    31:30    29:27  26  25:24  23:22   21  20:12  11:10  9:5  4:0
Value   size=00  111    0   00     opc=1x  0   imm9   01     Rn   Rt

32-bit (opc == 11)

LDRSB <Wt>, [<Xn|SP>], #<simm>

64-bit (opc == 10)

LDRSB <Xt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
bits(64) offset = SignExtend(imm9, 64);

Pre-index

Bits    31:30    29:27  26  25:24  23:22   21  20:12  11:10  9:5  4:0
Value   size=00  111    0   00     opc=1x  0   imm9   11     Rn   Rt

32-bit (opc == 11)

LDRSB <Wt>, [<Xn|SP>, #<simm>]!

64-bit (opc == 10)

LDRSB <Xt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

Bits    31:30    29:27  26  25:24  23:22   21:10  9:5  4:0
Value   size=00  111    0   01     opc=1x  imm12  Rn   Rt

32-bit (opc == 11)

LDRSB <Wt>, [<Xn|SP>{, #<pimm>}]

64-bit (opc == 10)

LDRSB <Xt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
bits(64) offset = LSL(ZeroExtend(imm12, 64), 0);

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDRSB (immediate).

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0 and encoded in the
"imm12" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;

if opc<1> == '0' then

boolean tag_checked = memop != MemOp_PREFETCH && (wback || n != 31);

boolean wb_unknown = FALSE;

boolean rt_unknown = FALSE;

if memop == MemOp_LOAD && wback && n == t && n != 31 then

if memop == MemOp_STORE && wback && n == t && n != 31 then
    c = ConstrainUnpredictable(Unpredictable_WBOVERLAPST);
    assert c IN {Constraint_NONE, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_NONE    rt_unknown = FALSE;   // value stored is original value
        when Constraint_UNKNOWN rt_unknown = TRUE;    // value stored is UNKNOWN
        when Constraint_UNDEF   UNDEFINED;
        when Constraint_NOP     EndOfInstruction();

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

case memop of
    when MemOp_STORE
        if rt_unknown then
            data = bits(8) UNKNOWN;
        else
            data = X[t];
        Mem[address, 1, AccType_NORMAL] = data;

    when MemOp_LOAD
        data = Mem[address, 1, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);

if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
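
As a rough illustration of how the three encoding classes differ, the following Python sketch models only the load path of the pseudocode above. The function names are hypothetical, memory is assumed to be a `bytes` object, and the CONSTRAINED UNPREDICTABLE cases, SP alignment checks, and tag checking are deliberately left out.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend a from_bits-wide value and return it as an unsigned to_bits-wide integer."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):          # top bit set: the value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)

def ldrsb_immediate(mem: bytes, base: int, offset: int, regsize: int = 64,
                    wback: bool = False, postindex: bool = False):
    """Return (loaded value, new base register value) for one LDRSB."""
    # Post-index uses the unmodified base; the other forms add the offset first.
    address = base if postindex else base + offset
    result = sign_extend(mem[address], 8, regsize)
    # With writeback, the base always ends up as base + offset.
    new_base = base + offset if wback else base
    return result, new_base
```

The unsigned-offset form corresponds to the defaults, pre-index to `wback=True`, and post-index to `wback=True, postindex=True`.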

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDRSB (register)

Load Register Signed Byte (register) calculates an address from a base register value and an offset register value,
loads a byte from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
Bits    31:30    29:27  26  25:24  23:22   21  20:16  15:13   12  11:10  9:5  4:0
Value   size=00  111    0   00     opc=1x  1   Rm     option  S   10     Rn   Rt

32-bit with extended register offset (opc == 11 && option != 011)

LDRSB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]

32-bit with shifted register offset (opc == 11 && option == 011)

LDRSB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]

64-bit with extended register offset (opc == 10 && option != 011)

LDRSB <Xt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]

64-bit with shifted register offset (opc == 10 && option == 011)

LDRSB <Xt>, [<Xn|SP>, <Xm>{, LSL <amount>}]

if option<1> == '0' then UNDEFINED; // sub-word index

ExtendType extend_type = DecodeRegExtend(option);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.

<extend> Is the index extend specifier, encoded in “option”:

option <extend>
010 UXTW
110 SXTW
111 SXTX

<amount> Is the index shift amount, which must be #0. It is encoded in "S" as 0 if omitted, or as 1 if present.
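
The mapping from the "option" field to the assembler syntax can be sketched as follows. This is a hypothetical helper, not part of the architecture pseudocode; it covers both the extended-register values in the table above and the `011` value used by the shifted-register (LSL) encodings, and it raises an exception where the decode pseudocode would be UNDEFINED.

```python
def decode_option(option: int):
    """Return (extend name, index register width in bits) for a 3-bit option field."""
    if not option & 0b010:
        # Mirrors "if option<1> == '0' then UNDEFINED; // sub-word index".
        raise ValueError("UNDEFINED: option<1> == 0")
    names = {0b010: "UXTW", 0b011: "LSL", 0b110: "SXTW", 0b111: "SXTX"}
    # option<0> selects between the <Wm> (0) and <Xm> (1) index register names.
    index_width = 64 if option & 1 else 32
    return names[option], index_width
```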

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
MemOp memop;
boolean signed;
integer regsize;

if opc<1> == '0' then

boolean tag_checked = memop != MemOp_PREFETCH;

Operation

bits(64) offset = ExtendReg(m, extend_type, 0);

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 1, AccType_NORMAL] = data;

    when MemOp_LOAD
        data = Mem[address, 1, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDRSH (immediate)

Load Register Signed Halfword (immediate) loads a halfword from memory, sign-extends it to 32 bits or 64 bits, and
writes the result to a register. The address that is used for the load is calculated from a base register and an
immediate offset. For information about memory accesses, see Load/Store addressing modes.
It has encodings from three classes: Post-index, Pre-index, and Unsigned offset.

Post-index

Bits    31:30    29:27  26  25:24  23:22   21  20:12  11:10  9:5  4:0
Value   size=01  111    0   00     opc=1x  0   imm9   01     Rn   Rt

32-bit (opc == 11)

LDRSH <Wt>, [<Xn|SP>], #<simm>

64-bit (opc == 10)

LDRSH <Xt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
bits(64) offset = SignExtend(imm9, 64);

Pre-index

Bits    31:30    29:27  26  25:24  23:22   21  20:12  11:10  9:5  4:0
Value   size=01  111    0   00     opc=1x  0   imm9   11     Rn   Rt

32-bit (opc == 11)

LDRSH <Wt>, [<Xn|SP>, #<simm>]!

64-bit (opc == 10)

LDRSH <Xt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

Bits    31:30    29:27  26  25:24  23:22   21:10  9:5  4:0
Value   size=01  111    0   01     opc=1x  imm12  Rn   Rt

32-bit (opc == 11)

LDRSH <Wt>, [<Xn|SP>{, #<pimm>}]

64-bit (opc == 10)

LDRSH <Xt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
bits(64) offset = LSL(ZeroExtend(imm12, 64), 1);

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDRSH (immediate).

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0 and
encoded in the "imm12" field as <pimm>/2.
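
The two immediate forms are encoded differently: <simm> is stored unscaled as a 9-bit two's complement value, while <pimm> is divided by the access size before being stored. A minimal sketch of both encodings, using hypothetical helper names and raising `ValueError` for offsets that cannot be represented:

```python
def encode_simm9(simm: int) -> int:
    """Encode a signed byte offset into the 9-bit imm9 field."""
    if not -256 <= simm <= 255:
        raise ValueError("simm out of range -256..255")
    return simm & 0x1FF                      # two's complement, 9 bits

def encode_pimm12(pimm: int, access_bytes: int) -> int:
    """Encode a positive byte offset into imm12, scaled by the access size."""
    if pimm % access_bytes != 0:
        raise ValueError("pimm must be a multiple of the access size")
    scaled = pimm // access_bytes
    if not 0 <= scaled <= 4095:
        raise ValueError("pimm out of range")
    return scaled
```

For LDRSH the access size is 2 bytes, so the largest encodable <pimm> is 4095 * 2 = 8190, which matches the range stated above.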

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;

if opc<1> == '0' then

boolean tag_checked = memop != MemOp_PREFETCH && (wback || n != 31);

boolean wb_unknown = FALSE;

boolean rt_unknown = FALSE;

if memop == MemOp_LOAD && wback && n == t && n != 31 then

if memop == MemOp_STORE && wback && n == t && n != 31 then

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

case memop of
    when MemOp_STORE
        if rt_unknown then
            data = bits(16) UNKNOWN;
        else
            data = X[t];
        Mem[address, 2, AccType_NORMAL] = data;

    when MemOp_LOAD
        data = Mem[address, 2, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);

if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDRSH (register)

Load Register Signed Halfword (register) calculates an address from a base register value and an offset register value,
loads a halfword from memory, sign-extends it, and writes it to a register. For information about memory accesses see
Load/Store addressing modes.
Bits    31:30    29:27  26  25:24  23:22   21  20:16  15:13   12  11:10  9:5  4:0
Value   size=01  111    0   00     opc=1x  1   Rm     option  S   10     Rn   Rt

32-bit (opc == 11)

LDRSH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

64-bit (opc == 10)

LDRSH <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

if option<1> == '0' then UNDEFINED; // sub-word index

ExtendType extend_type = DecodeRegExtend(option);
integer shift = if S == '1' then 1 else 0;

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.

<extend> Is the index extend/shift specifier, defaulting to LSL. It must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:

option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX

<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #1

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
MemOp memop;
boolean signed;
integer regsize;

if opc<1> == '0' then

boolean tag_checked = memop != MemOp_PREFETCH;

Operation

bits(64) offset = ExtendReg(m, extend_type, shift);

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 2, AccType_NORMAL] = data;

    when MemOp_LOAD
        data = Mem[address, 2, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDRSW (immediate)

Load Register Signed Word (immediate) loads a word from memory, sign-extends it to 64 bits, and writes the result to
a register. The address that is used for the load is calculated from a base register and an immediate offset. For
information about memory accesses, see Load/Store addressing modes.
It has encodings from three classes: Post-index, Pre-index, and Unsigned offset.

Post-index

Bits    31:30    29:27  26  25:24  23:22   21  20:12  11:10  9:5  4:0
Value   size=10  111    0   00     opc=10  0   imm9   01     Rn   Rt

LDRSW <Xt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
bits(64) offset = SignExtend(imm9, 64);

Pre-index

Bits    31:30    29:27  26  25:24  23:22   21  20:12  11:10  9:5  4:0
Value   size=10  111    0   00     opc=10  0   imm9   11     Rn   Rt

LDRSW <Xt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

Bits    31:30    29:27  26  25:24  23:22   21:10  9:5  4:0
Value   size=10  111    0   01     opc=10  imm12  Rn   Rt

LDRSW <Xt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
bits(64) offset = LSL(ZeroExtend(imm12, 64), 2);

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDRSW (immediate).

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, a multiple of 4 in the range 0 to 16380, defaulting to 0
and encoded in the "imm12" field as <pimm>/4.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = wback || n != 31;

boolean wb_unknown = FALSE;

if wback && n == t && n != 31 then

Operation

bits(64) address;
bits(32) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

data = Mem[address, 4, AccType_NORMAL];

X[t] = SignExtend(data, 64);

if wback then
    if wb_unknown then
        address = bits(64) UNKNOWN;
    elsif postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDRSW (literal)

Load Register Signed Word (literal) calculates an address from the PC value and an immediate offset, loads a word
from memory, sign-extends it to 64 bits, and writes the result to a register. For information about memory accesses,
see Load/Store addressing modes.
Bits    31:30   29:27  26  25:24  23:5   4:0
Value   opc=10  011    0   00     imm19  Rt

LDRSW <Xt>, <label>

integer t = UInt(Rt);
bits(64) offset;

offset = SignExtend(imm19:'00', 64);

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<label> Is the program label from which the data is to be loaded. Its offset from the address of this instruction,
in the range +/-1MB, is encoded as "imm19" times 4.
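
To make the <label> encoding concrete, the sketch below converts between a byte offset and the "imm19" field. The helper names are hypothetical; the range check follows from imm19 being a signed 19-bit word count, which gives byte offsets from -1048576 to +1048572 in steps of 4.

```python
def encode_imm19(instr_addr: int, label_addr: int) -> int:
    """Encode the PC-relative offset of a literal as a 19-bit field."""
    offset = label_addr - instr_addr
    if offset % 4 != 0:
        raise ValueError("label must be word-aligned relative to the instruction")
    if not -(1 << 20) <= offset < (1 << 20):
        raise ValueError("label out of the +/-1MB range")
    return (offset >> 2) & 0x7FFFF           # drop the two zero bits, keep 19 bits

def decode_imm19(imm19: int) -> int:
    """Recover the signed byte offset, as SignExtend(imm19:'00', 64) does."""
    offset = (imm19 & 0x7FFFF) << 2          # 21-bit value with two low zero bits
    if offset & (1 << 20):                   # sign bit of the 21-bit value
        offset -= 1 << 21
    return offset
```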

Operation

bits(64) address = PC[] + offset;

bits(32) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);

data = Mem[address, 4, AccType_NORMAL];

X[t] = SignExtend(data, 64);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDRSW (register)

Load Register Signed Word (register) calculates an address from a base register value and an offset register value,
loads a word from memory, sign-extends it to form a 64-bit value, and writes it to a register. The offset register value
can be shifted left by 0 or 2 bits. For information about memory accesses, see Load/Store addressing modes.
Bits    31:30    29:27  26  25:24  23:22   21  20:16  15:13   12  11:10  9:5  4:0
Value   size=10  111    0   00     opc=10  1   Rm     option  S   10     Rn   Rt

LDRSW <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

if option<1> == '0' then UNDEFINED; // sub-word index

ExtendType extend_type = DecodeRegExtend(option);
integer shift = if S == '1' then 2 else 0;

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.

<extend> Is the index extend/shift specifier, defaulting to LSL. It must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:

option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX

<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #2

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);

Operation

bits(64) offset = ExtendReg(m, extend_type, shift);

bits(64) address;
bits(32) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 4, AccType_NORMAL];

X[t] = SignExtend(data, 64);
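
A common use of the shift-by-2 form is indexing an array of 32-bit signed integers. The following Python sketch models `LDRSW <Xt>, [<Xn>, <Xm>, LSL #2]` under simplifying assumptions: `ldrsw_scaled` is a hypothetical name, memory is a little-endian `bytes` object, and only the LSL option is modeled.

```python
import struct

MASK64 = (1 << 64) - 1

def ldrsw_scaled(mem: bytes, base: int, index: int) -> int:
    """Load a signed word at base + (index << 2) and return it as a 64-bit register value."""
    address = base + (index << 2)            # S == 1 scales the index by the word size
    word = int.from_bytes(mem[address:address + 4], "little", signed=True)
    return word & MASK64                     # two's complement view of the sign-extended word

# Example: an array of three 32-bit integers.
array = struct.pack("<3i", 5, -2, 7)
second = ldrsw_scaled(array, 0, 1)           # the register holds -2 as a 64-bit value
```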

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDSET, LDSETA, LDSETAL, LDSETL

Atomic bit set on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from memory,
performs a bitwise OR of that value with the value held in a register, and stores the result back to memory. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDSETA and LDSETAL load from memory with acquire
semantics.
• LDSETL and LDSETAL store to memory with release semantics.
• LDSET has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSET, STSETL.

Integer
(FEAT_LSE)

Bits    31:30    29:27  26  25:24  23  22  21  20:16  15  14:12    11:10  9:5  4:0
Value   size=1x  111    0   00     A   R   1   Rs     0   opc=011  00     Rn   Rt

32-bit LDSET (size == 10 && A == 0 && R == 0)

LDSET <Ws>, <Wt>, [<Xn|SP>]

32-bit LDSETA (size == 10 && A == 1 && R == 0)

LDSETA <Ws>, <Wt>, [<Xn|SP>]

32-bit LDSETAL (size == 10 && A == 1 && R == 1)

LDSETAL <Ws>, <Wt>, [<Xn|SP>]

32-bit LDSETL (size == 10 && A == 0 && R == 1)

LDSETL <Ws>, <Wt>, [<Xn|SP>]

64-bit LDSET (size == 11 && A == 0 && R == 0)

LDSET <Xs>, <Xt>, [<Xn|SP>]

64-bit LDSETA (size == 11 && A == 1 && R == 0)

LDSETA <Xs>, <Xt>, [<Xn|SP>]

64-bit LDSETAL (size == 11 && A == 1 && R == 1)

LDSETAL <Xs>, <Xt>, [<Xn|SP>]

64-bit LDSETL (size == 11 && A == 0 && R == 1)

LDSETL <Xs>, <Xt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

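integer datasize = 8 << UInt(size);
integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;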

Assembler Symbols

Alias Conditions

Alias Is preferred when

STSET, STSETL A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(datasize) value;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_ORR, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, regsize);
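
The data flow of the operation above (fetch the old value, OR in the register value, return the old value) can be sketched in Python. This is a single-threaded illustration only: the name `ldset` and the dictionary-backed memory are assumptions, and neither atomicity nor acquire/release ordering is modeled.

```python
def ldset(memory: dict, address: int, value: int, datasize: int = 32) -> int:
    """OR value into memory[address] and return the value that was there before."""
    mask = (1 << datasize) - 1
    old = memory[address] & mask
    memory[address] = (old | value) & mask   # set every bit that is set in value
    return old                               # the old value goes to the destination register

# Example: set bits 0 and 1 of a flags word and observe which bits were already set.
flags = {0x100: 0b0101}
previous = ldset(flags, 0x100, 0b0011)
```

Because the original value is returned, a caller can tell whether a particular bit was already set before the operation, which is what makes this instruction useful for test-and-set style flags.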

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDSETB, LDSETAB, LDSETALB, LDSETLB

Atomic bit set on byte in memory atomically loads an 8-bit byte from memory, performs a bitwise OR of that value
with the value held in a register, and stores the result back to memory. The value initially loaded from memory is
returned in the destination register.
• If the destination register is not WZR, LDSETAB and LDSETALB load from memory with acquire semantics.
• LDSETLB and LDSETALB store to memory with release semantics.
• LDSETB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSETB, STSETLB.

Integer
(FEAT_LSE)

Bits    31:30    29:27  26  25:24  23  22  21  20:16  15  14:12    11:10  9:5  4:0
Value   size=00  111    0   00     A   R   1   Rs     0   opc=011  00     Rn   Rt

LDSETAB (A == 1 && R == 0)

LDSETAB <Ws>, <Wt>, [<Xn|SP>]

LDSETALB (A == 1 && R == 1)

LDSETALB <Ws>, <Wt>, [<Xn|SP>]

LDSETB (A == 0 && R == 0)

LDSETB <Ws>, <Wt>, [<Xn|SP>]

LDSETLB (A == 0 && R == 1)

LDSETLB <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STSETB, STSETLB A == '0' && Rt == '11111'
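
As an illustration of the alias condition, a disassembler could choose the preferred mnemonic from the A, R, and Rt fields roughly as follows. The function name is hypothetical, and the sketch assumes the usual correspondence in which the R bit selects the release (L) variant of both the load form and the store alias.

```python
def ldsetb_mnemonic(a: int, r: int, rt: int) -> str:
    """Return the preferred mnemonic for an LDSETB-family encoding."""
    if a == 0 and rt == 0b11111:
        # Alias condition from the table: A == '0' && Rt == '11111'.
        return "STSETLB" if r else "STSETB"
    # Otherwise build LDSET{A}{L}B from the acquire and release bits.
    return "LDSET" + ("A" if a else "") + ("L" if r else "") + "B"
```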

Operation

bits(64) address;
bits(8) value;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_ORR, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDSETH, LDSETAH, LDSETALH, LDSETLH

Atomic bit set on halfword in memory atomically loads a 16-bit halfword from memory, performs a bitwise OR of that
value with the value held in a register, and stores the result back to memory. The value initially loaded from memory
is returned in the destination register.
• If the destination register is not WZR, LDSETAH and LDSETALH load from memory with acquire semantics.
• LDSETLH and LDSETALH store to memory with release semantics.
• LDSETH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSETH, STSETLH.

Integer
(FEAT_LSE)

Bits    31:30    29:27  26  25:24  23  22  21  20:16  15  14:12    11:10  9:5  4:0
Value   size=01  111    0   00     A   R   1   Rs     0   opc=011  00     Rn   Rt

LDSETAH (A == 1 && R == 0)

LDSETAH <Ws>, <Wt>, [<Xn|SP>]

LDSETALH (A == 1 && R == 1)

LDSETALH <Ws>, <Wt>, [<Xn|SP>]

LDSETH (A == 0 && R == 0)

LDSETH <Ws>, <Wt>, [<Xn|SP>]

LDSETLH (A == 0 && R == 1)

LDSETLH <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STSETH, STSETLH A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(16) value;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_ORR, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL

Atomic signed maximum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from
memory, compares it against the value held in a register, and stores the larger value back to memory, treating the
values as signed numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDSMAXA and LDSMAXAL load from memory with acquire
semantics.
• LDSMAXL and LDSMAXAL store to memory with release semantics.
• LDSMAX has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMAX, STSMAXL.

Integer
(FEAT_LSE)

Bits    31:30    29:27  26  25:24  23  22  21  20:16  15  14:12    11:10  9:5  4:0
Value   size=1x  111    0   00     A   R   1   Rs     0   opc=100  00     Rn   Rt

32-bit LDSMAX (size == 10 && A == 0 && R == 0)

LDSMAX <Ws>, <Wt>, [<Xn|SP>]

32-bit LDSMAXA (size == 10 && A == 1 && R == 0)

LDSMAXA <Ws>, <Wt>, [<Xn|SP>]

32-bit LDSMAXAL (size == 10 && A == 1 && R == 1)

LDSMAXAL <Ws>, <Wt>, [<Xn|SP>]

32-bit LDSMAXL (size == 10 && A == 0 && R == 1)

LDSMAXL <Ws>, <Wt>, [<Xn|SP>]

64-bit LDSMAX (size == 11 && A == 0 && R == 0)

LDSMAX <Xs>, <Xt>, [<Xn|SP>]

64-bit LDSMAXA (size == 11 && A == 1 && R == 0)

LDSMAXA <Xs>, <Xt>, [<Xn|SP>]

64-bit LDSMAXAL (size == 11 && A == 1 && R == 1)

LDSMAXAL <Xs>, <Xt>, [<Xn|SP>]

64-bit LDSMAXL (size == 11 && A == 0 && R == 1)

LDSMAXL <Xs>, <Xt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

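integer datasize = 8 << UInt(size);
integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;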

Assembler Symbols

Alias Conditions

Alias Is preferred when

STSMAX, STSMAXL A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(datasize) value;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_SMAX, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, regsize);
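
The signed comparison is the part of this operation that is easiest to get wrong when reasoning about raw bit patterns. A minimal single-threaded Python sketch, with hypothetical names and a dictionary standing in for memory (atomicity and memory ordering are not modeled):

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret an unsigned bit pattern as a two's complement signed number."""
    return value - (1 << bits) if value >> (bits - 1) else value

def ldsmax(memory: dict, address: int, value: int, datasize: int = 32) -> int:
    """Store the signed maximum of memory[address] and value; return the old memory value."""
    mask = (1 << datasize) - 1
    old = memory[address] & mask
    new = value & mask
    if to_signed(old, datasize) >= to_signed(new, datasize):
        new = old                            # memory already holds the larger signed value
    memory[address] = new
    return old                               # returned as a bit pattern, zero-extended by the caller
```

For example, if memory holds 0xFFFFFFFF (that is, -1 as a signed word) and the register holds 1, the stored result is 1, whereas an unsigned maximum would have left 0xFFFFFFFF in place.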

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB

Atomic signed maximum on byte in memory atomically loads an 8-bit byte from memory, compares it against the value
held in a register, and stores the larger value back to memory, treating the values as signed numbers. The value
initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDSMAXAB and LDSMAXALB load from memory with acquire semantics.
• LDSMAXLB and LDSMAXALB store to memory with release semantics.
• LDSMAXB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMAXB, STSMAXLB.

Integer
(FEAT_LSE)

Bits    31:30    29:27  26  25:24  23  22  21  20:16  15  14:12    11:10  9:5  4:0
Value   size=00  111    0   00     A   R   1   Rs     0   opc=100  00     Rn   Rt

LDSMAXAB (A == 1 && R == 0)

LDSMAXAB <Ws>, <Wt>, [<Xn|SP>]

LDSMAXALB (A == 1 && R == 1)

LDSMAXALB <Ws>, <Wt>, [<Xn|SP>]

LDSMAXB (A == 0 && R == 0)

LDSMAXB <Ws>, <Wt>, [<Xn|SP>]

LDSMAXLB (A == 0 && R == 1)

LDSMAXLB <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STSMAXB, STSMAXLB A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(8) value;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_SMAX, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH

Atomic signed maximum on halfword in memory atomically loads a 16-bit halfword from memory, compares it against
the value held in a register, and stores the larger value back to memory, treating the values as signed numbers. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDSMAXAH and LDSMAXALH load from memory with acquire semantics.
• LDSMAXLH and LDSMAXALH store to memory with release semantics.
• LDSMAXH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMAXH, STSMAXLH.

Integer
(FEAT_LSE)

Bits   31-30  29-24   23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value  01     111000  A   R   1   Rs     0   100    00     Rn   Rt
Field  size                                  opc

LDSMAXAH (A == 1 && R == 0)

LDSMAXAH <Ws>, <Wt>, [<Xn|SP>]

LDSMAXALH (A == 1 && R == 1)

LDSMAXALH <Ws>, <Wt>, [<Xn|SP>]

LDSMAXH (A == 0 && R == 0)

LDSMAXH <Ws>, <Wt>, [<Xn|SP>]

LDSMAXLH (A == 0 && R == 1)

LDSMAXLH <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STSMAXH, STSMAXLH A == '0' && Rt == '11111'
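
A disassembler could apply the alias condition above roughly as follows. This is a sketch: the function name `preferred_mnemonic` is hypothetical, and the mapping from the A and R bits to mnemonics is taken from the encoding variants listed earlier in this section.

```python
def preferred_mnemonic(a: int, r: int, rt: int) -> str:
    """Choose the preferred disassembly for a halfword signed-maximum encoding."""
    if a == 0 and rt == 0b11111:
        # Alias condition: A == '0' && Rt == '11111'
        return "STSMAXLH" if r == 1 else "STSMAXH"
    return {
        (0, 0): "LDSMAXH",
        (1, 0): "LDSMAXAH",
        (1, 1): "LDSMAXALH",
        (0, 1): "LDSMAXLH",
    }[(a, r)]
```

Note that an encoding with A == 1 and Rt == 31 does not satisfy the alias condition, so it still disassembles as an LD form.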

Operation

bits(64) address;
bits(16) value;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_SMAX, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDSMIN, LDSMINA, LDSMINAL, LDSMINL

Atomic signed minimum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from
memory, compares it against the value held in a register, and stores the smaller value back to memory, treating the
values as signed numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDSMINA and LDSMINAL load from memory with acquire
semantics.
• LDSMINL and LDSMINAL store to memory with release semantics.
• LDSMIN has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMIN, STSMINL.

Integer
(FEAT_LSE)

Bits   31-30  29-24   23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value  1x     111000  A   R   1   Rs     0   101    00     Rn   Rt
Field  size                                  opc

32-bit LDSMIN (size == 10 && A == 0 && R == 0)

LDSMIN <Ws>, <Wt>, [<Xn|SP>]

32-bit LDSMINA (size == 10 && A == 1 && R == 0)

LDSMINA <Ws>, <Wt>, [<Xn|SP>]

32-bit LDSMINAL (size == 10 && A == 1 && R == 1)

LDSMINAL <Ws>, <Wt>, [<Xn|SP>]

32-bit LDSMINL (size == 10 && A == 0 && R == 1)

LDSMINL <Ws>, <Wt>, [<Xn|SP>]

64-bit LDSMIN (size == 11 && A == 0 && R == 0)

LDSMIN <Xs>, <Xt>, [<Xn|SP>]

64-bit LDSMINA (size == 11 && A == 1 && R == 0)

LDSMINA <Xs>, <Xt>, [<Xn|SP>]

64-bit LDSMINAL (size == 11 && A == 1 && R == 1)

LDSMINAL <Xs>, <Xt>, [<Xn|SP>]

64-bit LDSMINL (size == 11 && A == 0 && R == 1)

LDSMINL <Xs>, <Xt>, [<Xn|SP>]
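
The eight encoding variants above differ only in the size, A, and R fields. The following Python sketch extracts those fields from an instruction word. `decode_ldsmin` is a hypothetical helper, the example word 0xB8205041 was assembled by hand from the encoding diagram rather than taken from this document, and the sketch does not apply the STSMIN alias condition.

```python
def decode_ldsmin(word: int):
    """Decode a 32-bit word as a word/doubleword LDSMIN variant.

    Returns (mnemonic, datasize, rs, rt, rn), or None if the fixed bits
    of the encoding do not match.
    """
    size = (word >> 30) & 0b11
    fixed = (word >> 24) & 0b111111       # bits 29-24 must be 111000
    a = (word >> 23) & 1
    r = (word >> 22) & 1
    bit21 = (word >> 21) & 1              # must be 1
    rs = (word >> 16) & 0b11111
    low_fixed = (word >> 10) & 0b111111   # bits 15-10: 0, opc = 101, 00
    rn = (word >> 5) & 0b11111
    rt = word & 0b11111

    if (size not in (0b10, 0b11) or fixed != 0b111000
            or bit21 != 1 or low_fixed != 0b010100):
        return None

    suffix = {(0, 0): "", (1, 0): "A", (1, 1): "AL", (0, 1): "L"}[(a, r)]
    datasize = 8 << size                  # 32 for size == 10, 64 for size == 11
    return ("LDSMIN" + suffix, datasize, rs, rt, rn)


decoded = decode_ldsmin(0xB8205041)       # hand-assembled LDSMIN W0, W1, [X2]
```

Words with size == 00 or 01 are rejected here because they belong to the byte and halfword forms, which are described separately.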

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

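integer datasize = 8 << UInt(size);
integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;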

Assembler Symbols

Alias Conditions

Alias Is preferred when

STSMIN, STSMINL A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(datasize) value;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_SMIN, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, regsize);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB

Atomic signed minimum on byte in memory atomically loads an 8-bit byte from memory, compares it against the value
held in a register, and stores the smaller value back to memory, treating the values as signed numbers. The value
initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDSMINAB and LDSMINALB load from memory with acquire semantics.
• LDSMINLB and LDSMINALB store to memory with release semantics.
• LDSMINB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMINB, STSMINLB.

Integer
(FEAT_LSE)

Bits   31-30  29-24   23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value  00     111000  A   R   1   Rs     0   101    00     Rn   Rt
Field  size                                  opc

LDSMINAB (A == 1 && R == 0)

LDSMINAB <Ws>, <Wt>, [<Xn|SP>]

LDSMINALB (A == 1 && R == 1)

LDSMINALB <Ws>, <Wt>, [<Xn|SP>]

LDSMINB (A == 0 && R == 0)

LDSMINB <Ws>, <Wt>, [<Xn|SP>]

LDSMINLB (A == 0 && R == 1)

LDSMINLB <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;
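
The two AccType assignments above encode the acquire and release rules from the description. A minimal Python sketch of the same selection is shown here; the function name `access_types` and the use of plain strings for the AccType values are assumptions made for illustration.

```python
def access_types(a: int, r: int, rt: int):
    """Return (ldacctype, stacctype) as chosen by the decode pseudocode."""
    # Acquire semantics apply only when A is set and the destination is not WZR.
    ld = "ORDEREDATOMICRW" if a == 1 and rt != 0b11111 else "ATOMICRW"
    # Release semantics depend on R alone.
    st = "ORDEREDATOMICRW" if r == 1 else "ATOMICRW"
    return ld, st
```

For example, LDSMINALB with Rt == 31 keeps its release semantics on the store but loses acquire semantics on the load.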

Assembler Symbols

Alias Conditions

Alias Is preferred when

STSMINB, STSMINLB A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(8) value;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_SMIN, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH

Atomic signed minimum on halfword in memory atomically loads a 16-bit halfword from memory, compares it against
the value held in a register, and stores the smaller value back to memory, treating the values as signed numbers. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDSMINAH and LDSMINALH load from memory with acquire semantics.
• LDSMINLH and LDSMINALH store to memory with release semantics.
• LDSMINH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STSMINH, STSMINLH.

Integer
(FEAT_LSE)

Bits   31-30  29-24   23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value  01     111000  A   R   1   Rs     0   101    00     Rn   Rt
Field  size                                  opc

LDSMINAH (A == 1 && R == 0)

LDSMINAH <Ws>, <Wt>, [<Xn|SP>]

LDSMINALH (A == 1 && R == 1)

LDSMINALH <Ws>, <Wt>, [<Xn|SP>]

LDSMINH (A == 0 && R == 0)

LDSMINH <Ws>, <Wt>, [<Xn|SP>]

LDSMINLH (A == 0 && R == 1)

LDSMINLH <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STSMINH, STSMINLH A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(16) value;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_SMIN, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDTR

Load Register (unprivileged) loads a word or doubleword from memory, and writes it to a register. The address that is
used for the load is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
Bits   31-30  29-24   23-22  21  20-12  11-10  9-5  4-0
Value  1x     111000  01     0   imm9   10     Rn   Rt
Field  size           opc

32-bit (size == 10)

LDTR <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (size == 11)

LDTR <Xt>, [<Xn|SP>{, #<simm>}]

integer scale = UInt(size);

bits(64) offset = SignExtend(imm9, 64);
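
To illustrate how the fields of the encoding fit together, the following Python sketch assembles an LDTR instruction word and recovers the signed offset from imm9. The helper names `encode_ldtr` and `sign_extend_imm9` are hypothetical, and the offset range of -256 to 255 is assumed from imm9 being a 9-bit signed field.

```python
def encode_ldtr(rt: int, rn: int, simm: int = 0, is_64bit: bool = True) -> int:
    """Assemble LDTR <Wt|Xt>, [<Xn|SP>{, #<simm>}] into a 32-bit word."""
    if not (0 <= rt <= 31 and 0 <= rn <= 31):
        raise ValueError("register numbers must be in the range 0 to 31")
    if not -256 <= simm <= 255:
        raise ValueError("simm must fit in the signed 9-bit imm9 field")
    size = 0b11 if is_64bit else 0b10
    imm9 = simm & 0x1FF                   # two's-complement, 9 bits
    return ((size << 30) | (0b111000 << 24) | (0b01 << 22)
            | (imm9 << 12) | (0b10 << 10) | (rn << 5) | rt)


def sign_extend_imm9(imm9: int) -> int:
    """Python counterpart of SignExtend(imm9, 64), returned as a signed int."""
    return imm9 - 0x200 if imm9 & 0x100 else imm9


word = encode_ldtr(rt=0, rn=1, simm=-8, is_64bit=False)   # LDTR W0, [X1, #-8]
offset = sign_extend_imm9((word >> 12) & 0x1FF)
```

Because the offset is a byte offset and is not scaled by the access size, the same imm9 value means the same displacement for the 32-bit and 64-bit forms.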

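Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.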

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';

user_access_override = HaveUAOExt() && PSTATE.UAO == '1';

if !user_access_override && (unpriv_at_el1 || unpriv_at_el2) then
    acctype = AccType_UNPRIV;
else
    acctype = AccType_NORMAL;

integer regsize;

regsize = if size == '11' then 64 else 32;

integer datasize = 8 << scale;
boolean tag_checked = n != 31;
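
The access-type selection in the Shared Decode above can be modelled in a few lines of Python. This is a sketch under stated assumptions: the function `ldtr_access_type` and its boolean parameters are hypothetical stand-ins for the pseudocode predicates (EL2Enabled(), HaveNVExt(), HaveVirtHostExt(), HaveUAOExt()) and for the HCR_EL2 and PSTATE fields.

```python
def ldtr_access_type(el: int, uao: bool = False, have_uao: bool = True,
                     el2_enabled: bool = False, have_nv: bool = False,
                     nv_nv1: bool = False, have_vhe: bool = False,
                     e2h_tge: bool = False) -> str:
    """Return "UNPRIV" or "NORMAL" following the Shared Decode pseudocode.

    nv_nv1 stands for HCR_EL2.<NV,NV1> == '11' and e2h_tge for
    HCR_EL2.<E2H,TGE> == '11'.
    """
    unpriv_at_el1 = el == 1 and not (el2_enabled and have_nv and nv_nv1)
    unpriv_at_el2 = el == 2 and have_vhe and e2h_tge
    user_access_override = have_uao and uao
    if not user_access_override and (unpriv_at_el1 or unpriv_at_el2):
        return "UNPRIV"
    return "NORMAL"
```

With the defaults, `ldtr_access_type(1)` yields "UNPRIV", while setting `uao=True` or executing at EL2 without E2H and TGE both set yields "NORMAL".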

Operation

bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, datasize DIV 8, acctype];

X[t] = ZeroExtend(data, regsize);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDTRB

Load Register Byte (unprivileged) loads a byte from memory, zero-extends it, and writes the result to a register. The
address that is used for the load is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
Bits   31-30  29-24   23-22  21  20-12  11-10  9-5  4-0
Value  00     111000  01     0   imm9   10     Rn   Rt
Field  size           opc

LDTRB <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

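integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';

user_access_override = HaveUAOExt() && PSTATE.UAO == '1';

if !user_access_override && (unpriv_at_el1 || unpriv_at_el2) then
    acctype = AccType_UNPRIV;
else
    acctype = AccType_NORMAL;

boolean tag_checked = n != 31;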

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 1, acctype];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDTRH

Load Register Halfword (unprivileged) loads a halfword from memory, zero-extends it, and writes the result to a
register. The address that is used for the load is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
Bits   31-30  29-24   23-22  21  20-12  11-10  9-5  4-0
Value  01     111000  01     0   imm9   10     Rn   Rt
Field  size           opc

LDTRH <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

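integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';

user_access_override = HaveUAOExt() && PSTATE.UAO == '1';

if !user_access_override && (unpriv_at_el1 || unpriv_at_el2) then
    acctype = AccType_UNPRIV;
else
    acctype = AccType_NORMAL;

boolean tag_checked = n != 31;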

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 2, acctype];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDTRSB

Load Register Signed Byte (unprivileged) loads a byte from memory, sign-extends it to 32 bits or 64 bits, and writes
the result to a register. The address that is used for the load is calculated from a base register and an immediate
offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
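
The difference between the 32-bit and 64-bit forms lies in how far the loaded byte is sign-extended. The Python sketch below shows the resulting 64-bit register contents. The function name `ldtrsb_result` is hypothetical, and the sketch relies on the A64 rule that a write to a W register clears bits 63:32 of the corresponding X register.

```python
def ldtrsb_result(byte: int, regsize: int) -> int:
    """Return the 64-bit register contents after LDTRSB loads `byte`.

    regsize is 32 for the Wt form and 64 for the Xt form.
    """
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    byte &= 0xFF
    signed = byte - 0x100 if byte & 0x80 else byte   # two's-complement value
    return signed & ((1 << regsize) - 1)             # upper bits stay zero for Wt


w_form = ldtrsb_result(0x80, 32)
x_form = ldtrsb_result(0x80, 64)
```

For the byte 0x80 the Wt form produces 0xFFFFFF80 and the Xt form produces 0xFFFFFFFFFFFFFF80; a byte with bit 7 clear gives the same result in both forms.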
Bits   31-30  29-24   23-22  21  20-12  11-10  9-5  4-0
Value  00     111000  1x     0   imm9   10     Rn   Rt
Field  size           opc

32-bit (opc == 11)

LDTRSB <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (opc == 10)

LDTRSB <Xt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

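Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.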

Shared Decode

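integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';

user_access_override = HaveUAOExt() && PSTATE.UAO == '1';

if !user_access_override && (unpriv_at_el1 || unpriv_at_el2) then
    acctype = AccType_UNPRIV;
else
    acctype = AccType_NORMAL;

MemOp memop;
boolean signed;
integer regsize;

if opc<1> == '0' then
    memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
    regsize = 32;
    signed = FALSE;
else
    memop = MemOp_LOAD;
    regsize = if opc<0> == '1' then 32 else 64;
    signed = TRUE;

boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);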

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 1, acctype] = data;

    when MemOp_LOAD
        data = Mem[address, 1, acctype];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDTRSH

Load Register Signed Halfword (unprivileged) loads a halfword from memory, sign-extends it to 32 bits or 64 bits, and
writes the result to a register. The address that is used for the load is calculated from a base register and an
immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
Bits   31-30  29-24   23-22  21  20-12  11-10  9-5  4-0
Value  01     111000  1x     0   imm9   10     Rn   Rt
Field  size           opc

32-bit (opc == 11)

LDTRSH <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (opc == 10)

LDTRSH <Xt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

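Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.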

Shared Decode

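integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';

user_access_override = HaveUAOExt() && PSTATE.UAO == '1';

if !user_access_override && (unpriv_at_el1 || unpriv_at_el2) then
    acctype = AccType_UNPRIV;
else
    acctype = AccType_NORMAL;

MemOp memop;
boolean signed;
integer regsize;

if opc<1> == '0' then
    memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
    regsize = 32;
    signed = FALSE;
else
    memop = MemOp_LOAD;
    regsize = if opc<0> == '1' then 32 else 64;
    signed = TRUE;

boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);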

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 2, acctype] = data;

    when MemOp_LOAD
        data = Mem[address, 2, acctype];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDTRSW

Load Register Signed Word (unprivileged) loads a word from memory, sign-extends it to 64 bits, and writes the result
to a register. The address that is used for the load is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.
Bits   31-30  29-24   23-22  21  20-12  11-10  9-5  4-0
Value  10     111000  10     0   imm9   10     Rn   Rt
Field  size           opc

LDTRSW <Xt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

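Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.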

Shared Decode

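integer n = UInt(Rn);
integer t = UInt(Rt);
AccType acctype;
unpriv_at_el1 = PSTATE.EL == EL1 && !(EL2Enabled() && HaveNVExt() && HCR_EL2.<NV,NV1> == '11');
unpriv_at_el2 = PSTATE.EL == EL2 && HaveVirtHostExt() && HCR_EL2.<E2H,TGE> == '11';

user_access_override = HaveUAOExt() && PSTATE.UAO == '1';

if !user_access_override && (unpriv_at_el1 || unpriv_at_el2) then
    acctype = AccType_UNPRIV;
else
    acctype = AccType_NORMAL;

boolean tag_checked = n != 31;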

Operation

bits(64) address;
bits(32) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 4, acctype];

X[t] = SignExtend(data, 64);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL

Atomic unsigned maximum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword
from memory, compares it against the value held in a register, and stores the larger value back to memory, treating the
values as unsigned numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDUMAXA and LDUMAXAL load from memory with acquire
semantics.
• LDUMAXL and LDUMAXAL store to memory with release semantics.
• LDUMAX has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMAX, STUMAXL.
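
The only difference between the unsigned maximum described here and the signed maximum of LDSMAX is how the two bit patterns are compared. The Python sketch below makes that concrete; `atomic_max` is a hypothetical helper that computes only the value that would be stored, and it does not model atomicity or memory ordering.

```python
def atomic_max(old: int, operand: int, datasize: int, signed: bool) -> int:
    """Return the value a signed or unsigned atomic maximum would store."""
    mask = (1 << datasize) - 1
    old &= mask
    operand &= mask
    if not signed:
        return max(old, operand)
    # Flipping the sign bit maps two's-complement order onto unsigned order.
    sign_bit = 1 << (datasize - 1)
    return max(old, operand, key=lambda v: v ^ sign_bit)


unsigned_result = atomic_max(0xFFFFFFFF, 1, 32, signed=False)
signed_result = atomic_max(0xFFFFFFFF, 1, 32, signed=True)
```

With a 32-bit memory value of 0xFFFFFFFF and an operand of 1, the unsigned comparison keeps 0xFFFFFFFF, whereas a signed comparison would treat the memory value as -1 and store 1.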

Integer
(FEAT_LSE)

Bits   31-30  29-24   23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value  1x     111000  A   R   1   Rs     0   110    00     Rn   Rt
Field  size                                  opc

32-bit LDUMAX (size == 10 && A == 0 && R == 0)

LDUMAX <Ws>, <Wt>, [<Xn|SP>]

32-bit LDUMAXA (size == 10 && A == 1 && R == 0)

LDUMAXA <Ws>, <Wt>, [<Xn|SP>]

32-bit LDUMAXAL (size == 10 && A == 1 && R == 1)

LDUMAXAL <Ws>, <Wt>, [<Xn|SP>]

32-bit LDUMAXL (size == 10 && A == 0 && R == 1)

LDUMAXL <Ws>, <Wt>, [<Xn|SP>]

64-bit LDUMAX (size == 11 && A == 0 && R == 0)

LDUMAX <Xs>, <Xt>, [<Xn|SP>]

64-bit LDUMAXA (size == 11 && A == 1 && R == 0)

LDUMAXA <Xs>, <Xt>, [<Xn|SP>]

64-bit LDUMAXAL (size == 11 && A == 1 && R == 1)

LDUMAXAL <Xs>, <Xt>, [<Xn|SP>]

64-bit LDUMAXL (size == 11 && A == 0 && R == 1)

LDUMAXL <Xs>, <Xt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

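integer datasize = 8 << UInt(size);
integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;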

Assembler Symbols

Alias Conditions

Alias Is preferred when

STUMAX, STUMAXL A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(datasize) value;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_UMAX, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, regsize);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB

Atomic unsigned maximum on byte in memory atomically loads an 8-bit byte from memory, compares it against the
value held in a register, and stores the larger value back to memory, treating the values as unsigned numbers. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDUMAXAB and LDUMAXALB load from memory with acquire semantics.
• LDUMAXLB and LDUMAXALB store to memory with release semantics.
• LDUMAXB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMAXB, STUMAXLB.

Integer
(FEAT_LSE)

Bits   31-30  29-24   23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value  00     111000  A   R   1   Rs     0   110    00     Rn   Rt
Field  size                                  opc

LDUMAXAB (A == 1 && R == 0)

LDUMAXAB <Ws>, <Wt>, [<Xn|SP>]

LDUMAXALB (A == 1 && R == 1)

LDUMAXALB <Ws>, <Wt>, [<Xn|SP>]

LDUMAXB (A == 0 && R == 0)

LDUMAXB <Ws>, <Wt>, [<Xn|SP>]

LDUMAXLB (A == 0 && R == 1)

LDUMAXLB <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STUMAXB, STUMAXLB A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(8) value;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_UMAX, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH

Atomic unsigned maximum on halfword in memory atomically loads a 16-bit halfword from memory, compares it
against the value held in a register, and stores the larger value back to memory, treating the values as unsigned
numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDUMAXAH and LDUMAXALH load from memory with acquire semantics.
• LDUMAXLH and LDUMAXALH store to memory with release semantics.
• LDUMAXH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMAXH, STUMAXLH.

Integer
(FEAT_LSE)

Bits   31-30  29-24   23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value  01     111000  A   R   1   Rs     0   110    00     Rn   Rt
Field  size                                  opc

LDUMAXAH (A == 1 && R == 0)

LDUMAXAH <Ws>, <Wt>, [<Xn|SP>]

LDUMAXALH (A == 1 && R == 1)

LDUMAXALH <Ws>, <Wt>, [<Xn|SP>]

LDUMAXH (A == 0 && R == 0)

LDUMAXH <Ws>, <Wt>, [<Xn|SP>]

LDUMAXLH (A == 0 && R == 1)

LDUMAXLH <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias Is preferred when

STUMAXH, STUMAXLH A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(16) value;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_UMAX, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDUMIN, LDUMINA, LDUMINAL, LDUMINL

Atomic unsigned minimum on word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword
from memory, compares it against the value held in a register, and stores the smaller value back to memory, treating
the values as unsigned numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not one of WZR or XZR, LDUMINA and LDUMINAL load from memory with acquire
semantics.
• LDUMINL and LDUMINAL store to memory with release semantics.
• LDUMIN has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMIN, STUMINL.

Integer
(FEAT_LSE)

Bits   31-30  29-24   23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value  1x     111000  A   R   1   Rs     0   111    00     Rn   Rt
Field  size                                  opc

32-bit LDUMIN (size == 10 && A == 0 && R == 0)

LDUMIN <Ws>, <Wt>, [<Xn|SP>]

32-bit LDUMINA (size == 10 && A == 1 && R == 0)

LDUMINA <Ws>, <Wt>, [<Xn|SP>]

32-bit LDUMINAL (size == 10 && A == 1 && R == 1)

LDUMINAL <Ws>, <Wt>, [<Xn|SP>]

32-bit LDUMINL (size == 10 && A == 0 && R == 1)

LDUMINL <Ws>, <Wt>, [<Xn|SP>]

64-bit LDUMIN (size == 11 && A == 0 && R == 0)

LDUMIN <Xs>, <Xt>, [<Xn|SP>]

64-bit LDUMINA (size == 11 && A == 1 && R == 0)

LDUMINA <Xs>, <Xt>, [<Xn|SP>]

64-bit LDUMINAL (size == 11 && A == 1 && R == 1)

LDUMINAL <Xs>, <Xt>, [<Xn|SP>]

64-bit LDUMINL (size == 11 && A == 0 && R == 1)

LDUMINL <Xs>, <Xt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

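integer datasize = 8 << UInt(size);
integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;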

Assembler Symbols

Alias Conditions

Alias Is preferred when

STUMIN, STUMINL A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(datasize) value;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_UMIN, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, regsize);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB

Atomic unsigned minimum on byte in memory atomically loads an 8-bit byte from memory, compares it against the
value held in a register, and stores the smaller value back to memory, treating the values as unsigned numbers. The
value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDUMINAB and LDUMINALB load from memory with acquire semantics.
• LDUMINLB and LDUMINALB store to memory with release semantics.
• LDUMINB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the alias STUMINB, STUMINLB.

Integer
(FEAT_LSE)

Bits   31-30  29-24   23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value  00     111000  A   R   1   Rs     0   111    00     Rn   Rt
Field  size                                  opc

LDUMINAB (A == 1 && R == 0)

LDUMINAB <Ws>, <Wt>, [<Xn|SP>]

LDUMINALB (A == 1 && R == 1)

LDUMINALB <Ws>, <Wt>, [<Xn|SP>]

LDUMINB (A == 0 && R == 0)

LDUMINB <Ws>, <Wt>, [<Xn|SP>]

LDUMINLB (A == 0 && R == 1)

LDUMINLB <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias                 Is preferred when
STUMINB, STUMINLB     A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(8) value;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_UMIN, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
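
The data flow of the byte form (load the old byte, store the unsigned minimum, return the old byte) can be sketched in Python as follows. This is a single-threaded model under stated assumptions: lduminb, mem and regs are hypothetical names, regs is a list of 31 general-purpose register values, register number 31 is treated as the zero register, and neither the atomicity nor the acquire and release ordering is modelled.

```python
def lduminb(mem, regs, s, t, address):
    """Model the register and memory effects of LDUMINB Ws, Wt, [address]."""
    # X[s] truncated to 8 bits; register 31 reads as zero in this sketch.
    value = 0 if s == 31 else regs[s] & 0xFF
    data = mem[address]                 # byte initially held in memory
    mem[address] = min(data, value)     # unsigned comparison on 0..255
    if t != 31:
        regs[t] = data                  # ZeroExtend(data, 32)
    return data
```

With mem = bytearray([200]) and regs[1] = 0x150, calling lduminb(mem, regs, 1, 2, 0) leaves 0x50 in memory and 200 in regs[2], because only the low byte of the source register takes part in the comparison.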

LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH

Atomic unsigned minimum on halfword in memory atomically loads a 16-bit halfword from memory, compares it
against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned
numbers. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, LDUMINAH and LDUMINALH load from memory with acquire semantics.
• LDUMINLH and LDUMINALH store to memory with release semantics.
• LDUMINH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
This instruction is used by the aliases STUMINH and STUMINLH.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 0 1 1 1 0 0 Rn Rt
size opc

LDUMINAH (A == 1 && R == 0)

LDUMINAH <Ws>, <Wt>, [<Xn|SP>]

LDUMINALH (A == 1 && R == 1)

LDUMINALH <Ws>, <Wt>, [<Xn|SP>]

LDUMINH (A == 0 && R == 0)

LDUMINH <Ws>, <Wt>, [<Xn|SP>]

LDUMINLH (A == 0 && R == 1)

LDUMINLH <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Alias Conditions

Alias                 Is preferred when
STUMINH, STUMINLH     A == '0' && Rt == '11111'

Operation

bits(64) address;
bits(16) value;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

value = X[s];
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = MemAtomic(address, MemAtomicOp_UMIN, value, ldacctype, stacctype);

if t != 31 then
    X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
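
To show how the halfword encoding diagram turns into a 32-bit instruction word, the following Python sketch packs the fields in the order given in the diagram (size = 01, fixed bits 111000, A, R, 1, Rs, 0, opc = 111, 00, Rn, Rt). The function name encode_lduminh is hypothetical and the sketch is not a substitute for an assembler.

```python
def encode_lduminh(a, r, rs, rn, rt):
    """Pack the LDUMINH-family fields into a 32-bit instruction word."""
    for name, value, limit in (("A", a, 2), ("R", r, 2),
                               ("Rs", rs, 32), ("Rn", rn, 32), ("Rt", rt, 32)):
        if not 0 <= value < limit:
            raise ValueError(name + " out of range")
    return ((0b01 << 30)        # size: halfword
            | (0b111000 << 24)  # fixed bits 29..24
            | (a << 23)         # acquire
            | (r << 22)         # release
            | (1 << 21)         # fixed 1
            | (rs << 16)        # source register
            | (0b111 << 12)     # opc: unsigned minimum (bit 15 stays 0)
            | (rn << 5)         # base register (bits 11..10 stay 00)
            | rt)               # destination register
```

Under this layout, LDUMINH W1, W2, [X3] corresponds to encode_lduminh(0, 0, 1, 3, 2), and setting both a and r gives the LDUMINALH form of the same operands.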

LDUR

Load Register (unscaled) calculates an address from a base register and an immediate offset, loads a 32-bit word or
64-bit doubleword from memory, zero-extends it, and writes it to a register. For information about memory accesses,
see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 1 0 imm9 0 0 Rn Rt
size opc

32-bit (size == 10)

LDUR <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (size == 11)

LDUR <Xt>, [<Xn|SP>{, #<simm>}]

integer scale = UInt(size);

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer regsize;

regsize = if size == '11' then 64 else 32;

integer datasize = 8 << scale;
boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, datasize DIV 8, AccType_NORMAL];

X[t] = ZeroExtend(data, regsize);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
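
The address calculation (base plus a sign-extended, unscaled 9-bit offset) and the zero-extending load can be sketched in Python as follows. The names sign_extend and ldur are hypothetical, memory is modelled as a bytes object, and little-endian data accesses are assumed. Faults and tag checking are not modelled.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def ldur(mem, base, imm9, size):
    """Model LDUR: size is the 2-bit size field (0b10 or 0b11)."""
    if size not in (0b10, 0b11):
        raise ValueError("LDUR uses size == 10 or size == 11")
    datasize = 8 << size                      # 32 or 64 bits
    # The offset is a byte offset and is not scaled by the access size.
    address = (base + sign_extend(imm9, 9)) & (2**64 - 1)
    nbytes = datasize // 8
    # An unsigned little-endian read is already zero-extended.
    return int.from_bytes(mem[address:address + nbytes], "little")
```

Because the offset is unscaled, ldur(mem, 8, 0x1FF, 0b10) reads the word that starts at byte address 7, which a scaled-offset load could not express.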


LDURB

Load Register Byte (unscaled) calculates an address from a base register and an immediate offset, loads a byte from
memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/Store
addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 1 0 imm9 0 0 Rn Rt
size opc

LDURB <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 1, AccType_NORMAL];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDURH

Load Register Halfword (unscaled) calculates an address from a base register and an immediate offset, loads a
halfword from memory, zero-extends it, and writes it to a register. For information about memory accesses, see Load/
Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 1 0 imm9 0 0 Rn Rt
size opc

LDURH <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 2, AccType_NORMAL];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDURSB

Load Register Signed Byte (unscaled) calculates an address from a base register and an immediate offset, loads a
signed byte from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 1 x 0 imm9 0 0 Rn Rt
size opc

32-bit (opc == 11)

LDURSB <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (opc == 10)

LDURSB <Xt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;

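if opc<1> == '0' then
    // store or zero-extending load
    memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
    regsize = 32;
    signed = FALSE;
else
    // sign-extending load
    memop = MemOp_LOAD;
    regsize = if opc<0> == '1' then 32 else 64;
    signed = TRUE;

boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);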


Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 1, AccType_NORMAL] = data;

    when MemOp_LOAD
        data = Mem[address, 1, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
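
The effect of the two variants (opc == 11 writes a 32-bit result, opc == 10 writes a 64-bit result) can be sketched in Python as follows. The function name ldursb is hypothetical and memory is modelled as a bytes object. Only the sign-extending load path is covered, which is the path the two encodings shown for this instruction select.

```python
def ldursb(mem, base, imm9, opc):
    """Model LDURSB: return the register value written for the chosen variant."""
    if opc not in (0b10, 0b11):
        raise ValueError("LDURSB uses opc == 10 (64-bit) or opc == 11 (32-bit)")
    regsize = 32 if opc == 0b11 else 64
    # SignExtend(imm9, 64): bit 8 of the immediate is the sign bit.
    offset = imm9 - 512 if imm9 & 0x100 else imm9
    byte = mem[base + offset]
    # SignExtend(data, regsize): bit 7 of the byte is the sign bit.
    signed_value = byte - 256 if byte & 0x80 else byte
    return signed_value & ((1 << regsize) - 1)
```

A byte of 0x80 therefore becomes 0xFFFFFF80 in a W register and 0xFFFFFFFFFFFFFF80 in an X register, while 0x7F is unchanged in both.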


LDURSH

Load Register Signed Halfword (unscaled) calculates an address from a base register and an immediate offset, loads a
signed halfword from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 1 x 0 imm9 0 0 Rn Rt
size opc

32-bit (opc == 11)

LDURSH <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (opc == 10)

LDURSH <Xt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop;
boolean signed;
integer regsize;

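if opc<1> == '0' then
    // store or zero-extending load
    memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
    regsize = 32;
    signed = FALSE;
else
    // sign-extending load
    memop = MemOp_LOAD;
    regsize = if opc<0> == '1' then 32 else 64;
    signed = TRUE;

boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);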


Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    if memop != MemOp_PREFETCH then CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = X[t];
        Mem[address, 2, AccType_NORMAL] = data;

    when MemOp_LOAD
        data = Mem[address, 2, AccType_NORMAL];
        if signed then
            X[t] = SignExtend(data, regsize);
        else
            X[t] = ZeroExtend(data, regsize);

    when MemOp_PREFETCH
        Prefetch(address, t<4:0>);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDURSW

Load Register Signed Word (unscaled) calculates an address from a base register and an immediate offset, loads a
signed word from memory, sign-extends it, and writes it to a register. For information about memory accesses, see
Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 0 0 1 0 0 imm9 0 0 Rn Rt
size opc

LDURSW <Xt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(32) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = Mem[address, 4, AccType_NORMAL];

X[t] = SignExtend(data, 64);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDXP

Load Exclusive Pair of Registers derives an address from a base register value, loads two 32-bit words or two 64-bit
doublewords from memory, and writes them to two registers. For information on single-copy atomicity and alignment
requirements, see Requirements for single-copy atomicity and Alignment of data accesses. The PE marks the physical
address being accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions.
See Synchronization and semaphores. For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 sz 0 0 1 0 0 0 0 1 1 (1) (1) (1) (1) (1) 0 Rt2 Rn Rt
L Rs o0

32-bit (sz == 0)

LDXP <Wt1>, <Wt2>, [<Xn|SP>{,#0}]

64-bit (sz == 1)

LDXP <Xt1>, <Xt2>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);

integer elsize = 32 << UInt(sz);

integer datasize = elsize * 2;
boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDXP.

Assembler Symbols


Operation

bits(64) address;
bits(datasize) data;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

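// Tell the Exclusives monitors to record a sequence of one or more atomic
// memory reads from virtual address range [address, address+dbytes-1].
AArch64.SetExclusiveMonitors(address, dbytes);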

if rt_unknown then
    // ConstrainedUNPREDICTABLE case
    X[t] = bits(datasize) UNKNOWN;        // In this case t = t2
elsif elsize == 32 then
    // 32-bit load exclusive pair (atomic)
    data = Mem[address, dbytes, AccType_ATOMIC];
    if BigEndian(AccType_ATOMIC) then
        X[t] = data<datasize-1:elsize>;
        X[t2] = data<elsize-1:0>;
    else
        X[t] = data<elsize-1:0>;
        X[t2] = data<datasize-1:elsize>;
else // elsize == 64
    // 64-bit load exclusive pair (not atomic),
    // but must be 128-bit aligned
    if address != Align(address, dbytes) then
        AArch64.Abort(address, AlignmentFault(AccType_ATOMIC, FALSE, FALSE));
    X[t] = Mem[address, 8, AccType_ATOMIC];
    X[t2] = Mem[address+8, 8, AccType_ATOMIC];

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
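
The way the Operation pseudocode distributes the loaded data across the two registers, depending on endianness, can be sketched in Python as follows. The function name ldxp and the big_endian flag are hypothetical, memory is modelled as a bytes object, and the exclusive monitor and the CONSTRAINED UNPREDICTABLE case are not modelled. The alignment check is applied only where the pseudocode spells it out, that is, for the 64-bit form.

```python
def ldxp(mem, address, sz, big_endian=False):
    """Model the register results of LDXP: returns (Xt, Xt2)."""
    elsize = 32 << sz                    # 32 or 64 bits per register
    dbytes = (elsize * 2) // 8           # total bytes accessed
    order = "big" if big_endian else "little"
    if elsize == 32:
        # One 64-bit read, then split according to endianness.
        data = int.from_bytes(mem[address:address + dbytes], order)
        low, high = data & 0xFFFFFFFF, data >> 32
        return (high, low) if big_endian else (low, high)
    # Two 64-bit reads; the pair must be 128-bit aligned.
    if address % dbytes != 0:
        raise ValueError("alignment fault: address is not 16-byte aligned")
    first = int.from_bytes(mem[address:address + 8], order)
    second = int.from_bytes(mem[address + 8:address + 16], order)
    return first, second
```

In both endiannesses the first register receives the element at the lower address. The split of the combined value differs only because the 64-bit read itself is assembled in a different byte order.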


LDXR

Load Exclusive Register derives an address from a base register value, loads a 32-bit word or a 64-bit doubleword
from memory, and writes it to a register. The memory access is atomic. The PE marks the physical address being
accessed as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See
Synchronization and semaphores. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2

32-bit (size == 10)

LDXR <Wt>, [<Xn|SP>{,#0}]

64-bit (size == 11)

LDXR <Xt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

integer elsize = 8 << UInt(size);

integer regsize = if elsize == 64 then 64 else 32;
boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

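// Tell the Exclusives monitors to record a sequence of one or more atomic
// memory reads from virtual address range [address, address+dbytes-1].
AArch64.SetExclusiveMonitors(address, dbytes);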

data = Mem[address, dbytes, AccType_ATOMIC];

X[t] = ZeroExtend(data, regsize);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
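
The relationship between the exclusive access mark set by this instruction and a later Store Exclusive can be sketched with a toy Python model. Everything here is an assumption made for illustration: the class ExclusiveMonitor and its methods are hypothetical, a single PE with one mark is modelled, and the store-exclusive status convention (0 for success, 1 for failure) is taken from the Store Exclusive instructions rather than from this description. Real monitors track physical addresses and may clear the mark for other reasons.

```python
class ExclusiveMonitor:
    """Toy single-PE model of the mark set by a Load Exclusive."""

    def __init__(self):
        self.marked = None                      # (address, nbytes) or None

    def ldxr(self, mem, address, nbytes):
        # Load Exclusive: mark the location, then return the loaded value.
        self.marked = (address, nbytes)
        return int.from_bytes(mem[address:address + nbytes], "little")

    def clear(self):
        # Stands in for any event that removes the mark.
        self.marked = None

    def store_exclusive(self, mem, address, nbytes, value):
        # The store happens only if the mark still covers this access.
        ok = self.marked == (address, nbytes)
        self.marked = None                      # the mark is consumed either way
        if ok:
            mem[address:address + nbytes] = value.to_bytes(nbytes, "little")
        return 0 if ok else 1
```

A typical use is a retry loop: load with ldxr, compute the new value, attempt store_exclusive, and repeat while the returned status is 1.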


LDXRB

Load Exclusive Register Byte derives an address from a base register value, loads a byte from memory, zero-extends it
and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed as an
exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See Synchronization and
semaphores. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2

LDXRB <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

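// Tell the Exclusives monitors to record a sequence of one or more atomic
// memory reads from virtual address range [address, address].
AArch64.SetExclusiveMonitors(address, 1);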

data = Mem[address, 1, AccType_ATOMIC];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LDXRH

Load Exclusive Register Halfword derives an address from a base register value, loads a halfword from memory,
zero-extends it and writes it to a register. The memory access is atomic. The PE marks the physical address being accessed
as an exclusive access. This exclusive access mark is checked by Store Exclusive instructions. See Synchronization and
semaphores. For information about memory accesses see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 0 0 0 0 1 0 (1) (1) (1) (1) (1) 0 (1) (1) (1) (1) (1) Rn Rt
size L Rs o0 Rt2

LDXRH <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

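// Tell the Exclusives monitors to record a sequence of one or more atomic
// memory reads from virtual address range [address, address+1].
AArch64.SetExclusiveMonitors(address, 2);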

data = Mem[address, 2, AccType_ATOMIC];

X[t] = ZeroExtend(data, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


LSL (immediate)

Logical Shift Left (immediate) shifts a register value left by an immediate number of bits, shifting in zeros, and writes
the result to the destination register.

This is an alias of UBFM. This means:

• The encodings in this description are named to match the encodings of UBFM.
• The description of UBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 1 0 N immr != x11111 Rn Rd
opc imms

32-bit (sf == 0 && N == 0 && imms != 011111)

LSL <Wd>, <Wn>, #<shift>

is equivalent to

UBFM <Wd>, <Wn>, #(-<shift> MOD 32), #(31-<shift>)

and is the preferred disassembly when imms + 1 == immr.

64-bit (sf == 1 && N == 1 && imms != 111111)

LSL <Xd>, <Xn>, #<shift>

is equivalent to

UBFM <Xd>, <Xn>, #(-<shift> MOD 64), #(63-<shift>)

and is the preferred disassembly when imms + 1 == immr.

Assembler Symbols

Operation

The description of UBFM gives the operational pseudocode for this instruction.
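
The alias mapping above can be checked numerically. The Python sketch below converts an LSL shift amount into the UBFM immr and imms operands, evaluates a simplified UBFM, and tests the preferred-disassembly condition. The function names are hypothetical, and the ubfm helper is a condensed reading of the UBFM behavior (which is described under UBFM, not here), so treat it as an assumption.

```python
def lsl_to_ubfm(shift, datasize):
    """Return (immr, imms) for LSL #shift on a 32-bit or 64-bit register."""
    if datasize not in (32, 64) or not 0 <= shift < datasize:
        raise ValueError("shift out of range for this register width")
    immr = (-shift) % datasize          # -<shift> MOD datasize
    imms = datasize - 1 - shift         # (datasize-1) - <shift>
    return immr, imms


def ubfm(value, immr, imms, datasize):
    """Simplified unsigned bitfield move on a datasize-bit value."""
    mask = (1 << datasize) - 1
    value &= mask
    if imms >= immr:
        # Extract bits imms..immr down to bit 0.
        width = imms - immr + 1
        return (value >> immr) & ((1 << width) - 1)
    # Move the low imms+1 bits up to bit position datasize-immr.
    width = imms + 1
    return ((value & ((1 << width) - 1)) << (datasize - immr)) & mask


def lsl_is_preferred(immr, imms, datasize):
    """Alias condition: imms is not all ones and imms + 1 == immr."""
    return imms != datasize - 1 and imms + 1 == immr
```

For a shift of 4 on a W register the operands are immr = 28 and imms = 27. A shift of 0 yields imms = 31, which falls outside the LSL alias condition for the 32-bit form.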

Operational information


LSL (register)

Logical Shift Left (register) shifts a register value left by a variable number of bits, shifting in zeros, and writes the
result to the destination register. The remainder obtained by dividing the second source register by the data size
defines the number of bits by which the first source register is left-shifted.

This is an alias of LSLV. This means:

• The encodings in this description are named to match the encodings of LSLV.
• The description of LSLV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 0 0 Rn Rd
op2

32-bit (sf == 0)

LSL <Wd>, <Wn>, <Wm>

is equivalent to

LSLV <Wd>, <Wn>, <Wm>

and is always the preferred disassembly.

64-bit (sf == 1)

LSL <Xd>, <Xn>, <Xm>

is equivalent to

LSLV <Xd>, <Xn>, <Xm>

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LSLV gives the operational pseudocode for this instruction.

Operational information


LSLV

Logical Shift Left Variable shifts a register value left by a variable number of bits, shifting in zeros, and writes the
result to the destination register. The remainder obtained by dividing the second source register by the data size
defines the number of bits by which the first source register is left-shifted.
This instruction is used by the alias LSL (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 0 0 Rn Rd
op2

32-bit (sf == 0)

LSLV <Wd>, <Wn>, <Wm>

64-bit (sf == 1)

LSLV <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ShiftType shift_type = DecodeShift(op2);

Assembler Symbols

Operation

bits(datasize) result;
bits(datasize) operand2 = X[m];

result = ShiftReg(n, shift_type, UInt(operand2) MOD datasize);

X[d] = result;
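
The key point of the Operation pseudocode is that the shift amount is the second operand taken modulo the data size, so amounts of datasize or more wrap around instead of producing zero. A minimal Python sketch follows; the function name lslv is hypothetical and register values are modelled as unsigned integers.

```python
def lslv(operand1, operand2, datasize):
    """Model LSLV: shift left by UInt(operand2) MOD datasize."""
    mask = (1 << datasize) - 1
    amount = (operand2 & mask) % datasize
    return (operand1 << amount) & mask   # bits shifted past the top are lost
```

For example, shifting by 33 in the 32-bit form behaves like shifting by 1, and shifting by 32 leaves the value unchanged.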

Operational information


LSR (immediate)

Logical Shift Right (immediate) shifts a register value right by an immediate number of bits, shifting in zeros, and
writes the result to the destination register.

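This is an alias of UBFM. This means:

• The encodings in this description are named to match the encodings of UBFM.
• The description of UBFM gives the operational pseudocode for this instruction.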

32-bit (sf == 0 && N == 0 && imms == 011111)

LSR <Wd>, <Wn>, #<shift>

is equivalent to

UBFM <Wd>, <Wn>, #<shift>, #31

and is always the preferred disassembly.

64-bit (sf == 1 && N == 1 && imms == 111111)

LSR <Xd>, <Xn>, #<shift>

is equivalent to

UBFM <Xd>, <Xn>, #<shift>, #63

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of UBFM gives the operational pseudocode for this instruction.

Operational information


LSR (register)

Logical Shift Right (register) shifts a register value right by a variable number of bits, shifting in zeros, and writes the
result to the destination register. The remainder obtained by dividing the second source register by the data size
defines the number of bits by which the first source register is right-shifted.

This is an alias of LSRV. This means:

• The encodings in this description are named to match the encodings of LSRV.
• The description of LSRV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 0 1 Rn Rd
op2

32-bit (sf == 0)

LSR <Wd>, <Wn>, <Wm>

is equivalent to

LSRV <Wd>, <Wn>, <Wm>

and is always the preferred disassembly.

64-bit (sf == 1)

LSR <Xd>, <Xn>, <Xm>

is equivalent to

LSRV <Xd>, <Xn>, <Xm>

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LSRV gives the operational pseudocode for this instruction.

Operational information


LSRV

Logical Shift Right Variable shifts a register value right by a variable number of bits, shifting in zeros, and writes the
result to the destination register. The remainder obtained by dividing the second source register by the data size
defines the number of bits by which the first source register is right-shifted.
This instruction is used by the alias LSR (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 0 1 Rn Rd
op2

32-bit (sf == 0)

LSRV <Wd>, <Wn>, <Wm>

64-bit (sf == 1)

LSRV <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ShiftType shift_type = DecodeShift(op2);

Assembler Symbols

Operation

bits(datasize) result;
bits(datasize) operand2 = X[m];

result = ShiftReg(n, shift_type, UInt(operand2) MOD datasize);

X[d] = result;

Operational information


MADD

Multiply-Add multiplies two register values, adds a third register value, and writes the result to the destination
register.
This instruction is used by the alias MUL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 1 0 0 0 Rm 0 Ra Rn Rd
o0

32-bit (sf == 0)

MADD <Wd>, <Wn>, <Wm>, <Wa>

64-bit (sf == 1)

MADD <Xd>, <Xn>, <Xm>, <Xa>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);
integer destsize = if sf == '1' then 64 else 32;

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Wa> Is the 32-bit name of the third general-purpose source register holding the addend, encoded in the "Ra"
field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the addend, encoded in the "Ra"
field.

Alias Conditions

Alias                 Is preferred when
MUL                   Ra == '11111'

Operation

bits(destsize) operand1 = X[n];

bits(destsize) operand2 = X[m];
bits(destsize) operand3 = X[a];

integer result;

result = UInt(operand3) + (UInt(operand1) * UInt(operand2));

X[d] = result<destsize-1:0>;
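
The arithmetic in the Operation pseudocode is an unbounded integer multiply-add followed by truncation to destsize bits. The Python sketch below mirrors that, with register values modelled as unsigned integers. The function names madd and mul are hypothetical, and mul reflects the alias condition Ra == '11111' under the assumption that register 31 reads as zero in this position.

```python
def madd(operand1, operand2, operand3, destsize):
    """Model MADD: (operand3 + operand1 * operand2) truncated to destsize bits."""
    mask = (1 << destsize) - 1
    result = (operand3 & mask) + (operand1 & mask) * (operand2 & mask)
    return result & mask                 # result<destsize-1:0>


def mul(operand1, operand2, destsize):
    """MUL alias: MADD with the zero register as the addend."""
    return madd(operand1, operand2, 0, destsize)
```

Truncation means overflow wraps silently: in the 32-bit form, 0xFFFFFFFF * 2 + 1 yields 0xFFFFFFFF.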


Operational information


MNEG

Multiply-Negate multiplies two register values, negates the product, and writes the result to the destination register.

This is an alias of MSUB. This means:

• The encodings in this description are named to match the encodings of MSUB.
• The description of MSUB gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 1 0 0 0 Rm 1 1 1 1 1 1 Rn Rd
o0 Ra

32-bit (sf == 0)

MNEG <Wd>, <Wn>, <Wm>

is equivalent to

MSUB <Wd>, <Wn>, <Wm>, WZR

and is always the preferred disassembly.

64-bit (sf == 1)

MNEG <Xd>, <Xn>, <Xm>

is equivalent to

MSUB <Xd>, <Xn>, <Xm>, XZR

and is always the preferred disassembly.

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.

Operation

The description of MSUB gives the operational pseudocode for this instruction.
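
Since MNEG is MSUB with the zero register as the minuend, its result is the two's complement negation of the truncated product. A minimal Python sketch, assuming that MSUB computes the third operand minus the product (that pseudocode is given under MSUB, not here) and using the hypothetical name mneg:

```python
def mneg(operand1, operand2, destsize):
    """Model MNEG: (0 - operand1 * operand2) truncated to destsize bits."""
    mask = (1 << destsize) - 1
    product = (operand1 & mask) * (operand2 & mask)
    return (0 - product) & mask          # two's complement wrap-around
```

For example, mneg(3, 4, 32) gives 0xFFFFFFF4, the 32-bit representation of -12.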

Operational information


MOV (bitmask immediate)

Move (bitmask immediate) writes a bitmask immediate value to a register.

This is an alias of ORR (immediate). This means:

• The encodings in this description are named to match the encodings of ORR (immediate).
• The description of ORR (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 0 0 N immr imms 1 1 1 1 1 Rd
opc Rn

32-bit (sf == 0 && N == 0)

MOV <Wd|WSP>, #<imm>

is equivalent to

ORR <Wd|WSP>, WZR, #<imm>

and is the preferred disassembly when ! MoveWidePreferred(sf, N, imms, immr).

64-bit (sf == 1)

MOV <Xd|SP>, #<imm>

is equivalent to

ORR <Xd|SP>, XZR, #<imm>

and is the preferred disassembly when ! MoveWidePreferred(sf, N, imms, immr).

Assembler Symbols

<Wd|WSP> Is the 32-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr", but excluding values which
could be encoded by MOVZ or MOVN.
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr", but excluding values which
could be encoded by MOVZ or MOVN.

Operation

The description of ORR (immediate) gives the operational pseudocode for this instruction.

Operational information


MOV (inverted wide immediate)

Move (inverted wide immediate) moves an inverted 16-bit immediate value to a register.

This is an alias of MOVN. This means:

• The encodings in this description are named to match the encodings of MOVN.
• The description of MOVN gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 0 1 hw imm16 Rd
opc

32-bit (sf == 0 && hw == 0x)

MOV <Wd>, #<imm>

is equivalent to

MOVN <Wd>, #<imm16>, LSL #<shift>

and is the preferred disassembly when ! (IsZero(imm16) && hw != '00') && ! IsOnes(imm16).

64-bit (sf == 1)

MOV <Xd>, #<imm>

is equivalent to

MOVN <Xd>, #<imm16>, LSL #<shift>

and is the preferred disassembly when ! (IsZero(imm16) && hw != '00').

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<imm> For the 32-bit variant: is a 32-bit immediate, the bitwise inverse of which can be encoded in
"imm16:hw", but excluding 0xffff0000 and 0x0000ffff.
For the 64-bit variant: is a 64-bit immediate, the bitwise inverse of which can be encoded in
"imm16:hw".
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.

Operation

The description of MOVN gives the operational pseudocode for this instruction.
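
The preferred-disassembly conditions above can be illustrated with a short Python sketch. The function name movn_prefers_mov and the use of plain integers for sf, hw and imm16 are assumptions made for this example and are not part of the architecture pseudocode.

```python
def movn_prefers_mov(sf: int, hw: int, imm16: int) -> bool:
    """Return True when a MOVN encoding is preferably disassembled as MOV.

    Hypothetical helper: sf is 0 or 1, hw is 0..3, imm16 is 0..0xFFFF.
    """
    if sf == 0 and (hw & 0b10):
        # The 32-bit variant only exists for hw == 0x.
        raise ValueError("UNDEFINED: 32-bit variant requires hw<1> == 0")
    is_zero = imm16 == 0          # IsZero(imm16)
    is_ones = imm16 == 0xFFFF     # IsOnes(imm16)
    preferred = not (is_zero and hw != 0)
    if sf == 0:
        # Extra 32-bit condition: ! IsOnes(imm16)
        preferred = preferred and not is_ones
    return preferred
```

In this sketch the 32-bit encodings with imm16 == 0xffff are rejected, which corresponds to the excluded <imm> values 0xffff0000 (hw == 0) and 0x0000ffff (hw == 1).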

Operational information

Copyright © 2010-2021 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.

MOV (register)

Move (register) copies the value in a source register to the destination register.

This is an alias of ORR (shifted register). This means:

• The encodings in this description are named to match the encodings of ORR (shifted register).
• The description of ORR (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 1 1 1 1 1 Rd
opc shift N imm6 Rn

32-bit (sf == 0)

MOV <Wd>, <Wm>

is equivalent to

ORR <Wd>, WZR, <Wm>

and is always the preferred disassembly.

64-bit (sf == 1)

MOV <Xd>, <Xm>

is equivalent to

ORR <Xd>, XZR, <Xm>

and is always the preferred disassembly.

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wm> Is the 32-bit name of the general-purpose source register, encoded in the "Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xm> Is the 64-bit name of the general-purpose source register, encoded in the "Rm" field.

Operation

The description of ORR (shifted register) gives the operational pseudocode for this instruction.

Operational information


MOV (to/from SP)

Move between register and stack pointer: Rd = Rn.

This is an alias of ADD (immediate). This means:

• The encodings in this description are named to match the encodings of ADD (immediate).
• The description of ADD (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Rn Rd
op S sh imm12

32-bit (sf == 0)

MOV <Wd|WSP>, <Wn|WSP>

is equivalent to

ADD <Wd|WSP>, <Wn|WSP>, #0

and is the preferred disassembly when (Rd == '11111' || Rn == '11111').

64-bit (sf == 1)

MOV <Xd|SP>, <Xn|SP>

is equivalent to

ADD <Xd|SP>, <Xn|SP>, #0

and is the preferred disassembly when (Rd == '11111' || Rn == '11111').

Assembler Symbols

Operation

The description of ADD (immediate) gives the operational pseudocode for this instruction.


MOV (wide immediate)

Move (wide immediate) moves a 16-bit immediate value to a register.

This is an alias of MOVZ. This means:

• The encodings in this description are named to match the encodings of MOVZ.
• The description of MOVZ gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 0 1 hw imm16 Rd
opc

32-bit (sf == 0 && hw == 0x)

MOV <Wd>, #<imm>

is equivalent to

MOVZ <Wd>, #<imm16>, LSL #<shift>

and is the preferred disassembly when ! (IsZero(imm16) && hw != '00').

64-bit (sf == 1)

MOV <Xd>, #<imm>

is equivalent to

MOVZ <Xd>, #<imm16>, LSL #<shift>

and is the preferred disassembly when ! (IsZero(imm16) && hw != '00').

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<imm> For the 32-bit variant: is a 32-bit immediate which can be encoded in "imm16:hw".
For the 64-bit variant: is a 64-bit immediate which can be encoded in "imm16:hw".
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.

Operation

The description of MOVZ gives the operational pseudocode for this instruction.

Operational information


MOVK

Move wide with keep moves an optionally-shifted 16-bit immediate value into a register, keeping other bits unchanged.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 1 0 1 hw imm16 Rd
opc

32-bit (sf == 0 && hw == 0x)

MOVK <Wd>, #<imm>{, LSL #<shift>}

64-bit (sf == 1)

MOVK <Xd>, #<imm>{, LSL #<shift>}

integer d = UInt(Rd);
integer datasize = if sf == '1' then 64 else 32;
integer pos;

if sf == '0' && hw<1> == '1' then UNDEFINED;

pos = UInt(hw:'0000');

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<imm> Is the 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.

Operation

bits(datasize) result;

result = X[d];
result<pos+15:pos> = imm16;
X[d] = result;
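
As an illustration of the operation above, the following Python sketch models MOVK on plain integers. The function name movk and its argument convention are assumptions made for this example; only the field insertion itself follows the pseudocode.

```python
def movk(old: int, imm16: int, hw: int, sf: int = 1) -> int:
    """Model MOVK: replace one 16-bit field of `old`, keeping the other bits.

    Hypothetical helper: `old` is the current register value, hw is 0..3.
    """
    datasize = 64 if sf == 1 else 32
    if sf == 0 and (hw & 0b10):
        raise ValueError("UNDEFINED: 32-bit variant requires hw<1> == 0")
    pos = hw * 16                      # pos = UInt(hw:'0000')
    mask = (1 << datasize) - 1
    field = 0xFFFF << pos              # bits <pos+15:pos>
    result = old & mask                # result = X[d]
    result = (result & ~field) | ((imm16 & 0xFFFF) << pos)
    return result & mask
```

For example, movk(0x1111222233334444, 0xABCD, 2) replaces bits <47:32> only and leaves the remaining 48 bits unchanged.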

Operational information


MOVN

Move wide with NOT moves the inverse of an optionally-shifted 16-bit immediate value to a register.
This instruction is used by the alias MOV (inverted wide immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 0 1 hw imm16 Rd
opc

32-bit (sf == 0 && hw == 0x)

MOVN <Wd>, #<imm>{, LSL #<shift>}

64-bit (sf == 1)

MOVN <Xd>, #<imm>{, LSL #<shift>}

integer d = UInt(Rd);
integer datasize = if sf == '1' then 64 else 32;
integer pos;

if sf == '0' && hw<1> == '1' then UNDEFINED;

pos = UInt(hw:'0000');

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<imm> Is the 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.

Alias Conditions

Alias Of variant Is preferred when

MOV (inverted wide immediate) 64-bit ! (IsZero(imm16) && hw != '00')
MOV (inverted wide immediate) 32-bit ! (IsZero(imm16) && hw != '00') && ! IsOnes(imm16)

Operation

bits(datasize) result;

result = Zeros();

result<pos+15:pos> = imm16;
result = NOT(result);
X[d] = result;
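
The operation above can be sketched in Python as follows. The function name movn and the integer arguments are assumptions made for this example; the computation mirrors the pseudocode (place imm16 at pos, then invert within datasize bits).

```python
def movn(imm16: int, hw: int, sf: int = 1) -> int:
    """Model MOVN: return NOT(imm16 << pos) truncated to datasize bits."""
    datasize = 64 if sf == 1 else 32
    if sf == 0 and (hw & 0b10):
        raise ValueError("UNDEFINED: 32-bit variant requires hw<1> == 0")
    pos = hw * 16                          # pos = UInt(hw:'0000')
    mask = (1 << datasize) - 1
    result = (imm16 & 0xFFFF) << pos       # result<pos+15:pos> = imm16
    return ~result & mask                  # result = NOT(result)
```

With imm16 == 0 and hw == 0 the result is all ones, which is the encoding that the MOV (inverted wide immediate) alias uses for the value -1.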

Operational information

If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.


MOVZ

Move wide with zero moves an optionally-shifted 16-bit immediate value to a register.
This instruction is used by the alias MOV (wide immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 0 1 hw imm16 Rd
opc

32-bit (sf == 0 && hw == 0x)

MOVZ <Wd>, #<imm>{, LSL #<shift>}

64-bit (sf == 1)

MOVZ <Xd>, #<imm>{, LSL #<shift>}

integer d = UInt(Rd);
integer datasize = if sf == '1' then 64 else 32;
integer pos;

if sf == '0' && hw<1> == '1' then UNDEFINED;

pos = UInt(hw:'0000');

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<imm> Is the 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.
<shift> For the 32-bit variant: is the amount by which to shift the immediate left, either 0 (the default) or 16,
encoded in the "hw" field as <shift>/16.
For the 64-bit variant: is the amount by which to shift the immediate left, either 0 (the default), 16, 32
or 48, encoded in the "hw" field as <shift>/16.

Alias Conditions

Alias Is preferred when

MOV (wide immediate) ! (IsZero(imm16) && hw != '00')

Operation

bits(datasize) result;

result = Zeros();

result<pos+15:pos> = imm16;
X[d] = result;
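
A minimal Python sketch of the operation above, together with the alias condition for MOV (wide immediate), is given below. The names movz and movz_prefers_mov are assumptions made for this example.

```python
def movz(imm16: int, hw: int, sf: int = 1) -> int:
    """Model MOVZ: return imm16 placed at bit position 16*hw, other bits zero."""
    datasize = 64 if sf == 1 else 32
    if sf == 0 and (hw & 0b10):
        raise ValueError("UNDEFINED: 32-bit variant requires hw<1> == 0")
    pos = hw * 16                          # pos = UInt(hw:'0000')
    mask = (1 << datasize) - 1
    return ((imm16 & 0xFFFF) << pos) & mask


def movz_prefers_mov(hw: int, imm16: int) -> bool:
    """Alias condition: ! (IsZero(imm16) && hw != '00')."""
    return not (imm16 == 0 and hw != 0)
```

The alias condition means that a zero immediate with a non-zero shift is shown as MOVZ rather than MOV, because MOV #0 already corresponds to the hw == '00' encoding.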

Operational information


MRS

Move System Register allows the PE to read an AArch64 System register into a general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 1 1 o0 op1 CRn CRm op2 Rt
L

MRS <Xt>, (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>)

integer t = UInt(Rt);

integer sys_op0 = 2 + UInt(o0);

integer sys_op1 = UInt(op1);
integer sys_op2 = UInt(op2);
integer sys_crn = UInt(CRn);
integer sys_crm = UInt(CRm);

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose destination register, encoded in the "Rt" field.
<systemreg> Is a System register name, encoded in "o0:op1:CRn:CRm:op2".
The System register names are defined in 'AArch64 System Registers' in the System Register XML.

<op0> Is an unsigned immediate, encoded in “o0”:

o0 <op0>
0 2
1 3

<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cn> Is a name 'Cn', with 'n' in the range 0 to 15, encoded in the "CRn" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.

Operation

AArch64.SysRegRead(sys_op0, sys_op1, sys_crn, sys_crm, sys_op2, t);
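
To illustrate the decode step, the following Python sketch extracts the fields of an MRS instruction word and forms the generic S<op0>_<op1>_<Cn>_<Cm>_<op2> name. The function names decode_mrs and generic_name are assumptions made for this example; the bit positions are read from the encoding diagram above.

```python
def decode_mrs(insn: int):
    """Return (sys_op0, sys_op1, sys_crn, sys_crm, sys_op2, t) for an MRS word."""
    # Bits <31:20> of MRS are fixed at 1101 0101 0011 (L == 1).
    if (insn >> 20) & 0xFFF != 0xD53:
        raise ValueError("not an MRS encoding")
    o0 = (insn >> 19) & 0x1
    op1 = (insn >> 16) & 0x7
    crn = (insn >> 12) & 0xF
    crm = (insn >> 8) & 0xF
    op2 = (insn >> 5) & 0x7
    rt = insn & 0x1F
    sys_op0 = 2 + o0                       # sys_op0 = 2 + UInt(o0)
    return sys_op0, op1, crn, crm, op2, rt


def generic_name(insn: int) -> str:
    """Format the generic system register name used in the assembler syntax."""
    op0, op1, crn, crm, op2, _ = decode_mrs(insn)
    return f"S{op0}_{op1}_C{crn}_C{crm}_{op2}"
```

Because o0 is a single bit, only op0 values 2 and 3 can be expressed by this instruction, as the <op0> table shows.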


MSR (immediate)

Move immediate value to Special Register moves an immediate value to selected bits of the PSTATE. For more
information, see Process state, PSTATE.
The bits that can be written by this instruction are:
• PSTATE.D, PSTATE.A, PSTATE.I, PSTATE.F, and PSTATE.SP.
• If FEAT_SSBS is implemented, PSTATE.SSBS.
• If FEAT_PAN is implemented, PSTATE.PAN.
• If FEAT_UAO is implemented, PSTATE.UAO.
• If FEAT_DIT is implemented, PSTATE.DIT.
• If FEAT_MTE is implemented, PSTATE.TCO.
• If FEAT_NMI is implemented, PSTATE.ALLINT.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 op1 0 1 0 0 CRm op2 1 1 1 1 1

MSR <pstatefield>, #<imm>

if op1 == '000' && op2 == '000' then SEE "CFINV";

if op1 == '000' && op2 == '001' then SEE "XAFLAG";
if op1 == '000' && op2 == '010' then SEE "AXFLAG";

bits(2) min_EL;
boolean need_secure = FALSE;

case op1 of
    when '00x'
        min_EL = EL1;
    when '010'
        min_EL = EL1;
    when '011'
        min_EL = EL0;
    when '100'
        min_EL = EL2;
    when '101'
        if !HaveVirtHostExt() then
            UNDEFINED;
        min_EL = EL2;
    when '110'
        min_EL = EL3;
    when '111'
        min_EL = EL1;
        need_secure = TRUE;

if UInt(PSTATE.EL) < UInt(min_EL) || (need_secure && !IsSecure()) then
    UNDEFINED;

PSTATEField field;
case op1:op2 of
    when '000 011'
        if !HaveUAOExt() then UNDEFINED;
        field = PSTATEField_UAO;
    when '000 100'
        if !HavePANExt() then UNDEFINED;
        field = PSTATEField_PAN;
    when '000 101' field = PSTATEField_SP;
    when '001 000'
        if !HaveFeatNMI() then UNDEFINED;
        if CRm<3:1> != '000' then UNDEFINED;
        field = PSTATEField_ALLINT;
    when '011 010'
        if !HaveDITExt() then UNDEFINED;
        field = PSTATEField_DIT;
    when '011 100'
        if !HaveMTEExt() then UNDEFINED;
        field = PSTATEField_TCO;
    when '011 110' field = PSTATEField_DAIFSet;
    when '011 111' field = PSTATEField_DAIFClr;
    when '011 001'
        if !HaveSSBSExt() then UNDEFINED;
        field = PSTATEField_SSBS;
    otherwise UNDEFINED;

// Check that an AArch64 MSR/MRS access to the DAIF flags is permitted

if PSTATE.EL == EL0 && field IN {PSTATEField_DAIFSet, PSTATEField_DAIFClr} then
    if !ELUsingAArch32(EL1) && ((EL2Enabled() && HCR_EL2.<E2H,TGE> == '11') || SCTLR_EL1.UMA == '0') then
        if EL2Enabled() && !ELUsingAArch32(EL2) && HCR_EL2.TGE == '1' then
            AArch64.SystemAccessTrap(EL2, 0x18);
        else
            AArch64.SystemAccessTrap(EL1, 0x18);

Assembler Symbols

<pstatefield> Is a PSTATE field name, encoded in “op1:op2:CRm”:

op1 op2 CRm <pstatefield> Architectural Feature

000 00x xxxx SEE PSTATE -
000 010 xxxx SEE PSTATE -
000 011 xxxx UAO FEAT_UAO
000 100 xxxx PAN FEAT_PAN
000 101 xxxx SPSel -
000 11x xxxx RESERVED -
001 000 000x ALLINT FEAT_NMI
001 000 001x RESERVED -
001 000 01xx RESERVED -
001 000 1xxx RESERVED -
001 001 xxxx RESERVED -
001 01x xxxx RESERVED -
001 1xx xxxx RESERVED -
010 xxx xxxx RESERVED -
011 000 xxxx RESERVED -
011 001 xxxx SSBS FEAT_SSBS
011 010 xxxx DIT FEAT_DIT
011 011 xxxx RESERVED -
011 100 xxxx TCO FEAT_MTE
011 101 xxxx RESERVED -
011 110 xxxx DAIFSet -
011 111 xxxx DAIFClr -
1xx xxx xxxx RESERVED -

<imm> Is a 4-bit unsigned immediate, in the range 0 to 15, encoded in the "CRm" field. Restricted to the range
0 to 1, encoded in "CRm<0>", when <pstatefield> is ALLINT.

Operation

case field of
    when PSTATEField_SSBS
        PSTATE.SSBS = CRm<0>;
    when PSTATEField_SP
        PSTATE.SP = CRm<0>;
    when PSTATEField_DAIFSet
        PSTATE.D = PSTATE.D OR CRm<3>;
        PSTATE.A = PSTATE.A OR CRm<2>;
        PSTATE.I = PSTATE.I OR CRm<1>;
        PSTATE.F = PSTATE.F OR CRm<0>;
    when PSTATEField_DAIFClr
        PSTATE.D = PSTATE.D AND NOT(CRm<3>);
        PSTATE.A = PSTATE.A AND NOT(CRm<2>);
        PSTATE.I = PSTATE.I AND NOT(CRm<1>);
        PSTATE.F = PSTATE.F AND NOT(CRm<0>);
    when PSTATEField_PAN
        PSTATE.PAN = CRm<0>;
    when PSTATEField_UAO
        PSTATE.UAO = CRm<0>;
    when PSTATEField_DIT
        PSTATE.DIT = CRm<0>;
    when PSTATEField_TCO
        PSTATE.TCO = CRm<0>;
    when PSTATEField_ALLINT
        if (PSTATE.EL == EL1 && IsHCRXEL2Enabled() && HCRX_EL2.TALLINT == '1' && CRm<0> == '1') then
            AArch64.SystemAccessTrap(EL2, 0x18);
        PSTATE.ALLINT = CRm<0>;
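
The DAIFSet and DAIFClr cases above differ from the single-bit fields in that each CRm bit selects one mask bit to set or clear. A Python sketch of that behavior follows. The function name msr_daif and the packing of D, A, I and F into a 4-bit integer (D in bit 3 down to F in bit 0, matching CRm<3:0>) are assumptions made for this example.

```python
def msr_daif(daif: int, crm: int, clear: bool) -> int:
    """Model MSR DAIFSet/DAIFClr on a packed 4-bit D:A:I:F value.

    clear=False models DAIFSet (OR with CRm),
    clear=True models DAIFClr (AND with NOT(CRm)).
    """
    daif &= 0xF
    crm &= 0xF
    if clear:
        return daif & ~crm & 0xF
    return daif | crm
```

For example, an immediate of 2 selects only PSTATE.I, so DAIFSet with #2 masks IRQs and DAIFClr with #2 unmasks them, leaving D, A and F unchanged.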


MSR (register)

Move general-purpose register to System Register allows the PE to write an AArch64 System register from a
general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 1 o0 op1 CRn CRm op2 Rt
L

MSR (<systemreg>|S<op0>_<op1>_<Cn>_<Cm>_<op2>), <Xt>

integer t = UInt(Rt);

integer sys_op0 = 2 + UInt(o0);

integer sys_op1 = UInt(op1);
integer sys_op2 = UInt(op2);
integer sys_crn = UInt(CRn);
integer sys_crm = UInt(CRm);

Assembler Symbols

<systemreg> Is a System register name, encoded in "o0:op1:CRn:CRm:op2".

The System register names are defined in 'AArch64 System Registers' in the System Register XML.

<op0> Is an unsigned immediate, encoded in “o0”:

o0 <op0>
0 2
1 3

Operation

AArch64.SysRegWrite(sys_op0, sys_op1, sys_crn, sys_crm, sys_op2, t);


MSUB

Multiply-Subtract multiplies two register values, subtracts the product from a third register value, and writes the
result to the destination register.
This instruction is used by the alias MNEG.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 1 0 0 0 Rm 1 Ra Rn Rd
o0

32-bit (sf == 0)

MSUB <Wd>, <Wn>, <Wm>, <Wa>

64-bit (sf == 1)

MSUB <Xd>, <Xn>, <Xm>, <Xa>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);
integer destsize = if sf == '1' then 64 else 32;

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Wa> Is the 32-bit name of the third general-purpose source register holding the minuend, encoded in the
"Ra" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the minuend, encoded in the
"Ra" field.

Alias Conditions

Alias Is preferred when

MNEG Ra == '11111'

Operation

bits(destsize) operand1 = X[n];

bits(destsize) operand2 = X[m];
bits(destsize) operand3 = X[a];

integer result;

result = UInt(operand3) - (UInt(operand1) * UInt(operand2));

X[d] = result<destsize-1:0>;
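
The operation above can be sketched in Python, which has unbounded integers, so the truncation to destsize bits is made explicit. The function name msub and its argument order are assumptions made for this example.

```python
def msub(rn: int, rm: int, ra: int, sf: int = 1) -> int:
    """Model MSUB: (Ra - Rn * Rm) truncated to destsize bits."""
    destsize = 64 if sf == 1 else 32
    mask = (1 << destsize) - 1
    # UInt(operand3) - (UInt(operand1) * UInt(operand2))
    result = (ra & mask) - (rn & mask) * (rm & mask)
    # X[d] = result<destsize-1:0>; masking a negative Python int
    # yields its two's complement representation.
    return result & mask
```

Passing ra == 0 models the MNEG alias, where Ra is the zero register and the result is the negated product.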

Operational information


MUL

Multiply: Rd = Rn * Rm.

This is an alias of MADD. This means:

• The encodings in this description are named to match the encodings of MADD.
• The description of MADD gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 1 0 0 0 Rm 0 1 1 1 1 1 Rn Rd
o0 Ra

32-bit (sf == 0)

MUL <Wd>, <Wn>, <Wm>

is equivalent to

MADD <Wd>, <Wn>, <Wm>, WZR

and is always the preferred disassembly.

64-bit (sf == 1)

MUL <Xd>, <Xn>, <Xm>

is equivalent to

MADD <Xd>, <Xn>, <Xm>, XZR

and is always the preferred disassembly.

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.

Operation

The description of MADD gives the operational pseudocode for this instruction.


MVN

Bitwise NOT writes the bitwise inverse of a register value to the destination register.

This is an alias of ORN (shifted register). This means:

• The encodings in this description are named to match the encodings of ORN (shifted register).
• The description of ORN (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 0 shift 1 Rm imm6 1 1 1 1 1 Rd
opc N Rn

32-bit (sf == 0)

MVN <Wd>, <Wm>{, <shift> #<amount>}

is equivalent to

ORN <Wd>, WZR, <Wm>{, <shift> #<amount>}

and is always the preferred disassembly.

64-bit (sf == 1)

MVN <Xd>, <Xm>{, <shift> #<amount>}

is equivalent to

ORN <Xd>, XZR, <Xm>{, <shift> #<amount>}

and is always the preferred disassembly.

Assembler Symbols

<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR

Operation

The description of ORN (shifted register) gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.


NEG (shifted register)

Negate (shifted register) negates an optionally-shifted register value, and writes the result to the destination register.

This is an alias of SUB (shifted register). This means:

• The encodings in this description are named to match the encodings of SUB (shifted register).
• The description of SUB (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 0 1 0 1 1 shift 0 Rm imm6 1 1 1 1 1 Rd
op S Rn

32-bit (sf == 0)

NEG <Wd>, <Wm>{, <shift> #<amount>}

is equivalent to

SUB <Wd>, WZR, <Wm> {, <shift> #<amount>}

and is always the preferred disassembly.

64-bit (sf == 1)

NEG <Xd>, <Xm>{, <shift> #<amount>}

is equivalent to

SUB <Xd>, XZR, <Xm> {, <shift> #<amount>}

and is always the preferred disassembly.

Assembler Symbols

<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED

Operation

The description of SUB (shifted register) gives the operational pseudocode for this instruction.
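
As an illustration of the alias, the following Python sketch computes the NEG result as a subtraction from zero of the optionally-shifted operand. The names shift_reg and neg, and the use of strings for the shift type, are assumptions made for this example; the set of permitted shifts follows the <shift> table above.

```python
def shift_reg(value: int, shift_type: str, amount: int, datasize: int) -> int:
    """Apply LSL, LSR or ASR to a datasize-bit value."""
    mask = (1 << datasize) - 1
    value &= mask
    if not 0 <= amount < datasize:
        raise ValueError("shift amount out of range for this variant")
    if shift_type == "LSL":
        return (value << amount) & mask
    if shift_type == "LSR":
        return value >> amount
    if shift_type == "ASR":
        # Reinterpret as signed so that >> replicates the sign bit.
        signed = value - (1 << datasize) if value >> (datasize - 1) else value
        return (signed >> amount) & mask
    raise ValueError("RESERVED shift type")


def neg(rm: int, shift_type: str = "LSL", amount: int = 0, sf: int = 1) -> int:
    """Model NEG as SUB <Rd>, ZR, <Rm>{, <shift> #<amount>}."""
    datasize = 64 if sf == 1 else 32
    mask = (1 << datasize) - 1
    return (0 - shift_reg(rm, shift_type, amount, datasize)) & mask
```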

Operational information

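If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.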


NEGS

Negate, setting flags, negates an optionally-shifted register value, and writes the result to the destination register. It
updates the condition flags based on the result.

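This is an alias of SUBS (shifted register). This means:

• The encodings in this description are named to match the encodings of SUBS (shifted register).
• The description of SUBS (shifted register) gives the operational pseudocode for this instruction.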

32-bit (sf == 0)

NEGS <Wd>, <Wm>{, <shift> #<amount>}

is equivalent to

SUBS <Wd>, WZR, <Wm> {, <shift> #<amount>}

and is always the preferred disassembly.

64-bit (sf == 1)

NEGS <Xd>, <Xm>{, <shift> #<amount>}

is equivalent to

SUBS <Xd>, XZR, <Xm> {, <shift> #<amount>}

and is always the preferred disassembly.

Assembler Symbols

<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED

Operation

The description of SUBS (shifted register) gives the operational pseudocode for this instruction.

Operational information


NGC

Negate with Carry negates the sum of a register value and the value of NOT (Carry flag), and writes the result to the
destination register.

This is an alias of SBC. This means:

• The encodings in this description are named to match the encodings of SBC.
• The description of SBC gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 1 1 1 1 1 Rd
op S Rn

32-bit (sf == 0)

NGC <Wd>, <Wm>

is equivalent to

SBC <Wd>, WZR, <Wm>

and is always the preferred disassembly.

64-bit (sf == 1)

NGC <Xd>, <Xm>

is equivalent to

SBC <Xd>, XZR, <Xm>

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of SBC gives the operational pseudocode for this instruction.
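
The summary "negates the sum of a register value and the value of NOT (Carry flag)" can be checked with a short Python sketch. It assumes that SBC computes Rn + NOT(Rm) + C, as given in the SBC description (not reproduced in this section); the function name ngc is also an assumption made for this example.

```python
def ngc(rm: int, carry: int, sf: int = 1) -> int:
    """Model NGC as SBC <Rd>, ZR, <Rm>: 0 + NOT(Rm) + C, truncated."""
    datasize = 64 if sf == 1 else 32
    mask = (1 << datasize) - 1
    return ((~rm & mask) + (carry & 1)) & mask


def negated_sum(rm: int, carry: int, sf: int = 1) -> int:
    """The prose form: -(Rm + NOT(C)), truncated to datasize bits."""
    datasize = 64 if sf == 1 else 32
    mask = (1 << datasize) - 1
    return -(rm + (1 - (carry & 1))) & mask
```

Both functions return the same value for every input, which shows that the two formulations agree under the stated assumption.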

Operational information


NGCS

Negate with Carry, setting flags, negates the sum of a register value and the value of NOT (Carry flag), and writes the
result to the destination register. It updates the condition flags based on the result.

This is an alias of SBCS. This means:

• The encodings in this description are named to match the encodings of SBCS.
• The description of SBCS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 1 1 1 1 1 Rd
op S Rn

32-bit (sf == 0)

NGCS <Wd>, <Wm>

is equivalent to

SBCS <Wd>, WZR, <Wm>

and is always the preferred disassembly.

64-bit (sf == 1)

NGCS <Xd>, <Xm>

is equivalent to

SBCS <Xd>, XZR, <Xm>

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of SBCS gives the operational pseudocode for this instruction.

Operational information


NOP

No Operation does nothing, other than advance the value of the program counter by 4. This instruction can be used for
instruction alignment purposes.

Note

The timing effects of including a NOP instruction in a program are not guaranteed. It can increase execution time,
leave it unchanged, or even reduce it. Therefore, NOP instructions are not suitable for timing loops.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 1
CRm op2

NOP

// Empty.

Operation

// do nothing

Operational information


ORN (shifted register)

Bitwise OR NOT (shifted register) performs a bitwise (inclusive) OR of a register value and the complement of an
optionally-shifted register value, and writes the result to the destination register.
This instruction is used by the alias MVN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 0 shift 1 Rm imm6 Rn Rd
opc N

32-bit (sf == 0)

ORN <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

ORN <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR

Alias Conditions

Alias Is preferred when

MVN Rn == '11111'

Operation

bits(datasize) operand1 = X[n];

bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(datasize) result;

operand2 = NOT(operand2);

result = operand1 OR operand2;

X[d] = result;
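
A Python sketch of the operation above is given below. The names shift_reg and orn, and the use of strings for the shift type, are assumptions made for this example; the behavior of each shift follows the usual definitions of LSL, LSR, ASR and ROR on a datasize-bit value.

```python
def shift_reg(value: int, shift_type: str, amount: int, datasize: int) -> int:
    """Apply LSL, LSR, ASR or ROR to a datasize-bit value."""
    mask = (1 << datasize) - 1
    value &= mask
    if not 0 <= amount < datasize:
        raise ValueError("shift amount out of range for this variant")
    if shift_type == "LSL":
        return (value << amount) & mask
    if shift_type == "LSR":
        return value >> amount
    if shift_type == "ASR":
        signed = value - (1 << datasize) if value >> (datasize - 1) else value
        return (signed >> amount) & mask
    if shift_type == "ROR":
        return ((value >> amount) | (value << (datasize - amount))) & mask
    raise ValueError("unknown shift type")


def orn(rn: int, rm: int, shift_type: str = "LSL", amount: int = 0, sf: int = 1) -> int:
    """Model ORN: Rn OR NOT(shifted Rm), truncated to datasize bits."""
    datasize = 64 if sf == 1 else 32
    mask = (1 << datasize) - 1
    operand2 = ~shift_reg(rm, shift_type, amount, datasize) & mask
    return (rn & mask) | operand2
```

Calling orn with rn == 0 models the MVN alias, where Rn is the zero register.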

Operational information


ORR (immediate)

Bitwise OR (immediate) performs a bitwise (inclusive) OR of a register value and an immediate value, and
writes the result to the destination register.
This instruction is used by the alias MOV (bitmask immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 1 0 0 1 0 0 N immr imms Rn Rd
opc

32-bit (sf == 0 && N == 0)

ORR <Wd|WSP>, <Wn>, #<imm>

64-bit (sf == 1)

ORR <Xd|SP>, <Xn>, #<imm>

Assembler Symbols

Alias Conditions

Alias Is preferred when

MOV (bitmask immediate) Rn == '11111' && ! MoveWidePreferred(sf, N, imms, immr)

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];

result = operand1 OR imm;

if d == 31 then
    SP[] = result;
else
    X[d] = result;

Operational information

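If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.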


ORR (shifted register)

Bitwise OR (shifted register) performs a bitwise (inclusive) OR of a register value and an optionally-shifted register
value, and writes the result to the destination register.
This instruction is used by the alias MOV (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 1 0 1 0 1 0 shift 0 Rm imm6 Rn Rd
opc N

32-bit (sf == 0)

ORR <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

ORR <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR

Alias Conditions

Alias Is preferred when

MOV (register) shift == '00' && imm6 == '000000' && Rn == '11111'

Operation

bits(datasize) operand1 = X[n];

bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(datasize) result;

result = operand1 OR operand2;

X[d] = result;

Operational information


PACDA, PACDZA

Pointer Authentication Code for Data address, using key A. This instruction computes and inserts a pointer
authentication code for a data address, using a modifier and key A.
The address is in the general-purpose register that is specified by <Xd>.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACDA.
• The value zero, for PACDZA.

Integer
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 1 0 Rn Rd

PACDA (Z == 0)

PACDA <Xd>, <Xn|SP>

PACDZA (Z == 1 && Rn == 11111)

PACDZA <Xd>

boolean source_is_sp = FALSE;

integer d = UInt(Rd);
integer n = UInt(Rn);

if !HavePACExt() then
    UNDEFINED;

if Z == '0' then // PACDA
    if n == 31 then source_is_sp = TRUE;
else // PACDZA
    if n != 31 then UNDEFINED;

Assembler Symbols

Operation

if source_is_sp then
    X[d] = AddPACDA(X[d], SP[]);
else
    X[d] = AddPACDA(X[d], X[n]);


PACDB, PACDZB

Pointer Authentication Code for Data address, using key B. This instruction computes and inserts a pointer
authentication code for a data address, using a modifier and key B.
The address is in the general-purpose register that is specified by <Xd>.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACDB.
• The value zero, for PACDZB.

Integer
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 1 1 Rn Rd

PACDB (Z == 0)

PACDB <Xd>, <Xn|SP>

PACDZB (Z == 1 && Rn == 11111)

PACDZB <Xd>

boolean source_is_sp = FALSE;

integer d = UInt(Rd);
integer n = UInt(Rn);

if !HavePACExt() then
    UNDEFINED;

if Z == '0' then // PACDB
    if n == 31 then source_is_sp = TRUE;
else // PACDZB
    if n != 31 then UNDEFINED;

Assembler Symbols

Operation

if source_is_sp then
    X[d] = AddPACDB(X[d], SP[]);
else
    X[d] = AddPACDB(X[d], X[n]);


PACGA

Pointer Authentication Code, using Generic key. This instruction computes the pointer authentication code for an
address in the first source register, using a modifier in the second source register, and the Generic key. The computed
pointer authentication code is returned in the upper 32 bits of the destination register.

Integer
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 1 0 0 Rn Rd

PACGA <Xd>, <Xn>, <Xm|SP>

boolean source_is_sp = FALSE;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if !HavePACExt() then
    UNDEFINED;

if m == 31 then source_is_sp = TRUE;

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register, encoded in the "Rn" field.
<Xm|SP> Is the 64-bit name of the second general-purpose source register or stack pointer, encoded in the "Rm"
field.

Operation

if source_is_sp then
    X[d] = AddPACGA(X[n], SP[]);
else
    X[d] = AddPACGA(X[n], X[m]);


PACIA, PACIA1716, PACIASP, PACIAZ, PACIZA

Pointer Authentication Code for Instruction address, using key A. This instruction computes and inserts a pointer
authentication code for an instruction address, using a modifier and key A.
The address is:
• In the general-purpose register that is specified by <Xd> for PACIA and PACIZA.
• In X17, for PACIA1716.
• In X30, for PACIASP and PACIAZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACIA.
• The value zero, for PACIZA and PACIAZ.
• In X16, for PACIA1716.
• In SP, for PACIASP.
It has encodings from 2 classes: Integer and System

Integer
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 0 0 Rn Rd

PACIA (Z == 0)

PACIA <Xd>, <Xn|SP>

PACIZA (Z == 1 && Rn == 11111)

PACIZA <Xd>

boolean source_is_sp = FALSE;

integer d = UInt(Rd);
integer n = UInt(Rn);

if !HavePACExt() then
    UNDEFINED;

if Z == '0' then // PACIA
    if n == 31 then source_is_sp = TRUE;
else // PACIZA
    if n != 31 then UNDEFINED;

System
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 0 0 x 1 1 1 1 1
CRm op2

PACIA1716 (CRm == 0001 && op2 == 000)

PACIA1716

PACIASP (CRm == 0011 && op2 == 001)

PACIASP

PACIAZ (CRm == 0011 && op2 == 000)

PACIAZ

integer d;
integer n;
boolean source_is_sp = FALSE;

case CRm:op2 of
    when '0011 000' // PACIAZ
        d = 30;
        n = 31;
    when '0011 001' // PACIASP
        d = 30;
        source_is_sp = TRUE;
        if HaveBTIExt() then
            // Check for branch target compatibility between PSTATE.BTYPE
            // and implicit branch target of PACIASP instruction.
            SetBTypeCompatible(BTypeCompatible_PACIXSP());
    when '0001 000' // PACIA1716
        d = 17;
        n = 16;
    when '0001 010' SEE "PACIB";
    when '0001 100' SEE "AUTIA";
    when '0001 110' SEE "AUTIB";
    when '0011 01x' SEE "PACIB";
    when '0011 10x' SEE "AUTIA";
    when '0011 11x' SEE "AUTIB";
    when '0000 111' SEE "XPACLRI";
    otherwise SEE "HINT";
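
The System-class decode above selects an instruction from the CRm:op2 fields of a HINT-space encoding. The following Python sketch illustrates that selection for the PACIA and PACIB System forms only. The names hint_word and classify_pac_hint are hypothetical, and the constant 0xD503201F is an assumption derived from the fixed bits shown in the encoding diagram (bits 31:12 and 4:0, with CRm and op2 cleared).

```python
HINT_BASE = 0xD503201F      # fixed bits of the encoding, CRm and op2 zero
FIXED_MASK = 0xFFFFF01F     # every bit except CRm (11:8) and op2 (7:5)

# (CRm, op2) pairs taken from the PACIA and PACIB System encodings.
PAC_HINTS = {
    (0b0001, 0b000): "PACIA1716",
    (0b0011, 0b000): "PACIAZ",
    (0b0011, 0b001): "PACIASP",
    (0b0001, 0b010): "PACIB1716",
    (0b0011, 0b010): "PACIBZ",
    (0b0011, 0b011): "PACIBSP",
}


def hint_word(crm, op2):
    """Build the 32-bit instruction word for a given CRm:op2 pair."""
    return HINT_BASE | ((crm & 0xF) << 8) | ((op2 & 0x7) << 5)


def classify_pac_hint(word):
    """Return the mnemonic for a PACIA/PACIB System form, else None."""
    if word & FIXED_MASK != HINT_BASE:
        return None  # not in this encoding space
    crm = (word >> 8) & 0xF
    op2 = (word >> 5) & 0x7
    return PAC_HINTS.get((crm, op2))
```

For example, classify_pac_hint(hint_word(0b0011, 0b001)) returns "PACIASP". Encodings outside the table fall through to None, standing in for the SEE clauses of the pseudocode.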

Assembler Symbols

Operation

if HavePACExt() then
    if source_is_sp then
        X[d] = AddPACIA(X[d], SP[]);
    else
        X[d] = AddPACIA(X[d], X[n]);

PACIB, PACIB1716, PACIBSP, PACIBZ, PACIZB

Pointer Authentication Code for Instruction address, using key B. This instruction computes and inserts a pointer
authentication code for an instruction address, using a modifier and key B.
The address is:
• In the general-purpose register that is specified by <Xd> for PACIB and PACIZB.
• In X17, for PACIB1716.
• In X30, for PACIBSP and PACIBZ.
The modifier is:
• In the general-purpose register or stack pointer that is specified by <Xn|SP> for PACIB.
• The value zero, for PACIZB and PACIBZ.
• In X16, for PACIB1716.
• In SP, for PACIBSP.
It has encodings from 2 classes: Integer and System

Integer
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 Z 0 0 1 Rn Rd

PACIB (Z == 0)

PACIB <Xd>, <Xn|SP>

PACIZB (Z == 1 && Rn == 11111)

PACIZB <Xd>

boolean source_is_sp = FALSE;

integer d = UInt(Rd);
integer n = UInt(Rn);

if !HavePACExt() then
    UNDEFINED;

if Z == '0' then // PACIB
    if n == 31 then source_is_sp = TRUE;
else // PACIZB
    if n != 31 then UNDEFINED;

System
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 x 1 0 1 x 1 1 1 1 1
CRm op2

PACIB1716 (CRm == 0001 && op2 == 010)

PACIB1716

PACIBSP (CRm == 0011 && op2 == 011)

PACIBSP

PACIBZ (CRm == 0011 && op2 == 010)

PACIBZ

integer d;
integer n;
boolean source_is_sp = FALSE;

case CRm:op2 of
    when '0011 010' // PACIBZ
        d = 30;
        n = 31;
    when '0011 011' // PACIBSP
        d = 30;
        source_is_sp = TRUE;
        if HaveBTIExt() then
            // Check for branch target compatibility between PSTATE.BTYPE
            // and implicit branch target of PACIBSP instruction.
            SetBTypeCompatible(BTypeCompatible_PACIXSP());
    when '0001 010' // PACIB1716
        d = 17;
        n = 16;
    when '0001 000' SEE "PACIA";
    when '0001 100' SEE "AUTIA";
    when '0001 110' SEE "AUTIB";
    when '0011 00x' SEE "PACIA";
    when '0011 10x' SEE "AUTIA";
    when '0011 11x' SEE "AUTIB";
    when '0000 111' SEE "XPACLRI";
    otherwise SEE "HINT";

Assembler Symbols

Operation

if HavePACExt() then
    if source_is_sp then
        X[d] = AddPACIB(X[d], SP[]);
    else
        X[d] = AddPACIB(X[d], X[n]);

PRFM (immediate)

Prefetch Memory (immediate) signals the memory system that data memory accesses from a specified address are
likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up the
memory accesses when they do occur, such as preloading the cache line containing the specified address into one or
more caches.
The effect of a PRFM instruction is IMPLEMENTATION DEFINED. For more information, see Prefetch memory.
For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 1 1 0 imm12 Rn Rt
size opc

PRFM (<prfop>|#<imm5>), [<Xn|SP>{, #<pimm>}]

bits(64) offset = LSL(ZeroExtend(imm12, 64), 3);

Assembler Symbols

<prfop> Is the prefetch operation, defined as <type><target><policy>.

<type> is one of:
PLD
Prefetch for load, encoded in the "Rt<4:3>" field as 0b00.

PLI
Preload instructions, encoded in the "Rt<4:3>" field as 0b01.

PST
Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.

<target> is one of:

L1
Level 1 cache, encoded in the "Rt<2:1>" field as 0b00.

L2
Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.

L3
Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.

<policy> is one of:

KEEP
Retained or temporal prefetch, allocated in the cache normally. Encoded in the "Rt<0>" field as 0.

STRM
Streaming or non-temporal prefetch, for data that is used only once. Encoded in the "Rt<0>" field
as 1.

For more information on these prefetch operations, see Prefetch memory.

For other encodings of the "Rt" field, use <imm5>.
<imm5> Is the prefetch operation encoding as an immediate, in the range 0 to 31, encoded in the "Rt" field.
This syntax is only for encodings that are not accessible using <prfop>.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<pimm> Is the optional positive immediate byte offset, a multiple of 8 in the range 0 to 32760, defaulting to 0
and encoded in the "imm12" field as <pimm>/8.
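
The <prfop> and <pimm> encodings described above can be sketched in Python. This is an illustration under assumptions, not an assembler: encode_prfop and encode_pimm are hypothetical names, and only the <type><target><policy> names listed above are accepted.

```python
PRF_TYPES = {"PLD": 0b00, "PLI": 0b01, "PST": 0b10}     # Rt<4:3>
PRF_TARGETS = {"L1": 0b00, "L2": 0b01, "L3": 0b10}      # Rt<2:1>
PRF_POLICIES = {"KEEP": 0, "STRM": 1}                   # Rt<0>


def encode_prfop(prfop):
    """Map a name such as 'PLDL1KEEP' to the 5-bit value of the Rt field."""
    type_name, target, policy = prfop[:3], prfop[3:5], prfop[5:]
    try:
        return (
            (PRF_TYPES[type_name] << 3)
            | (PRF_TARGETS[target] << 1)
            | PRF_POLICIES[policy]
        )
    except KeyError:
        raise ValueError(f"not a named prefetch operation: {prfop}") from None


def encode_pimm(pimm=0):
    """Map the byte offset <pimm> to the imm12 field (<pimm>/8)."""
    if pimm % 8 != 0 or not 0 <= pimm <= 32760:
        raise ValueError("pimm must be a multiple of 8 in the range 0 to 32760")
    return pimm // 8
```

Rt values that no name produces (for example 0b11000) correspond to the encodings that are written with the #<imm5> syntax instead.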

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);


Operation

bits(64) address;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);

if n == 31 then
    address = SP[];
else
    address = X[n];

address = address + offset;

Prefetch(address, t<4:0>);


PRFM (literal)

Prefetch Memory (literal) signals the memory system that data memory accesses from a specified address are likely to
occur in the near future. The memory system can respond by taking actions that are expected to speed up the memory
accesses when they do occur, such as preloading the cache line containing the specified address into one or more
caches.
The effect of a PRFM instruction is IMPLEMENTATION DEFINED. For more information, see Prefetch memory.
For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 0 imm19 Rt
opc

PRFM (<prfop>|#<imm5>), <label>

integer t = UInt(Rt);
bits(64) offset;

offset = SignExtend(imm19:'00', 64);

Assembler Symbols

<prfop> Is the prefetch operation, defined as <type><target><policy>.

<type> is one of:
PLD
Prefetch for load, encoded in the "Rt<4:3>" field as 0b00.

PLI
Preload instructions, encoded in the "Rt<4:3>" field as 0b01.

PST
Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.

<target> is one of:

L1
Level 1 cache, encoded in the "Rt<2:1>" field as 0b00.

L2
Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.

L3
Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.

<policy> is one of:

KEEP
Retained or temporal prefetch, allocated in the cache normally. Encoded in the "Rt<0>" field as 0.

STRM
Streaming or non-temporal prefetch, for data that is used only once. Encoded in the "Rt<0>" field
as 1.

For more information on these prefetch operations, see Prefetch memory.

For other encodings of the "Rt" field, use <imm5>.
<imm5> Is the prefetch operation encoding as an immediate, in the range 0 to 31, encoded in the "Rt" field.
This syntax is only for encodings that are not accessible using <prfop>.
<label> Is the program label from which the data is to be loaded. Its offset from the address of this instruction,
in the range +/-1MB, is encoded as "imm19" times 4.


Operation

bits(64) address = PC[] + offset;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);

Prefetch(address, t<4:0>);
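
The address calculation for the literal form, offset = SignExtend(imm19:'00', 64) added to the PC, can be checked with a short Python sketch. The function names are hypothetical, and the PC is passed in as a plain integer.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def prfm_literal_address(pc, imm19):
    """address = PC + SignExtend(imm19:'00', 64), wrapped to 64 bits."""
    offset = sign_extend((imm19 & 0x7FFFF) << 2, 21)
    return (pc + offset) & MASK64
```

Because imm19 is scaled by 4 and sign-extended from 21 bits, the reachable offsets run from -1048576 to +1048572 bytes, which is the +/-1MB range given for <label>.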


PRFM (register)

Prefetch Memory (register) signals the memory system that data memory accesses from a specified address are likely
to occur in the near future. The memory system can respond by taking actions that are expected to speed up the
memory accesses when they do occur, such as preloading the cache line containing the specified address into one or
more caches.
The effect of a PRFM instruction is IMPLEMENTATION DEFINED. For more information, see Prefetch memory.
For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 1 0 1 Rm option S 1 0 Rn Rt
size opc

PRFM (<prfop>|#<imm5>), [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

if option<1> == '0' then UNDEFINED; // sub-word index

ExtendType extend_type = DecodeRegExtend(option);
integer shift = if S == '1' then 3 else 0;

Assembler Symbols

<prfop> Is the prefetch operation, defined as <type><target><policy>.

<type> is one of:
PLD
Prefetch for load, encoded in the "Rt<4:3>" field as 0b00.

PLI
Preload instructions, encoded in the "Rt<4:3>" field as 0b01.

PST
Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.

<target> is one of:

L1
Level 1 cache, encoded in the "Rt<2:1>" field as 0b00.

L2
Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.

L3
Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.

<policy> is one of:

KEEP
Retained or temporal prefetch, allocated in the cache normally. Encoded in the "Rt<0>" field as 0.

STRM
Streaming or non-temporal prefetch, for data that is used only once. Encoded in the "Rt<0>" field
as 1.

For more information on these prefetch operations, see Prefetch memory.

For other encodings of the "Rt" field, use <imm5>.
<imm5> Is the prefetch operation encoding as an immediate, in the range 0 to 31, encoded in the "Rt" field.
This syntax is only for encodings that are not accessible using <prfop>.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.

<extend> Is the index extend/shift specifier, defaulting to LSL, which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:


option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX

<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #3

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);

Operation

bits(64) offset = ExtendReg(m, extend_type, shift);

bits(64) address;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);

if n == 31 then
    address = SP[];
else
    address = X[n];

address = address + offset;

Prefetch(address, t<4:0>);
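
The register-offset address calculation can be modelled as follows. This Python sketch is illustrative only: extend_reg is a hypothetical stand-in for ExtendReg, restricted to the four option values listed in the <extend> table, and register contents are passed in as plain integers.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def extend_reg(value, option, shift):
    """Extend the index register as selected by `option`, then shift left."""
    if option == 0b010:      # UXTW: zero-extend the low 32 bits
        extended = value & 0xFFFFFFFF
    elif option == 0b011:    # LSL: use all 64 bits
        extended = value & MASK64
    elif option == 0b110:    # SXTW: sign-extend the low 32 bits
        extended = sign_extend(value, 32)
    elif option == 0b111:    # SXTX: use all 64 bits, signed
        extended = sign_extend(value, 64)
    else:
        raise ValueError("option<1> == '0' is UNDEFINED for this instruction")
    return (extended << shift) & MASK64


def prfm_register_address(base, index, option, s):
    """address = base + ExtendReg(m, extend_type, shift), with shift 3 or 0."""
    shift = 3 if s else 0
    return (base + extend_reg(index, option, shift)) & MASK64
```

For instance, an index of 0xFFFFFFFF moves the address back by one byte under SXTW but forward by 4294967295 bytes under UXTW.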


PRFUM

Prefetch Memory (unscaled offset) signals the memory system that data memory accesses from a specified address are
likely to occur in the near future. The memory system can respond by taking actions that are expected to speed up the
memory accesses when they do occur, such as preloading the cache line containing the specified address into one or
more caches.
The effect of a PRFUM instruction is IMPLEMENTATION DEFINED. For more information, see Prefetch memory.
For information about memory accesses, see Load/Store addressing modes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 1 0 0 imm9 0 0 Rn Rt
size opc

PRFUM (<prfop>|#<imm5>), [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<prfop> Is the prefetch operation, defined as <type><target><policy>.

<type> is one of:
PLD
Prefetch for load, encoded in the "Rt<4:3>" field as 0b00.

PLI
Preload instructions, encoded in the "Rt<4:3>" field as 0b01.

PST
Prefetch for store, encoded in the "Rt<4:3>" field as 0b10.

<target> is one of:

L1
Level 1 cache, encoded in the "Rt<2:1>" field as 0b00.

L2
Level 2 cache, encoded in the "Rt<2:1>" field as 0b01.

L3
Level 3 cache, encoded in the "Rt<2:1>" field as 0b10.

<policy> is one of:

KEEP
Retained or temporal prefetch, allocated in the cache normally. Encoded in the "Rt<0>" field as 0.

STRM
Streaming or non-temporal prefetch, for data that is used only once. Encoded in the "Rt<0>" field
as 1.

For more information on these prefetch operations, see Prefetch memory.

For other encodings of the "Rt" field, use <imm5>.
<imm5> Is the prefetch operation encoding as an immediate, in the range 0 to 31, encoded in the "Rt" field.
This syntax is only for encodings that are not accessible using <prfop>.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
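
As a rough illustration of how the fields above combine, the following Python sketch assembles a PRFUM instruction word. The name encode_prfum is hypothetical, and the base constant 0xF8800000 is an assumption read off the fixed bits of the encoding diagram (bits 31:21 and 11:10).

```python
PRFUM_BASE = 0xF8800000  # fixed bits 31:21 = 11111000100, bits 11:10 = 00


def encode_prfum(rt, rn, simm=0):
    """Pack Rt (prefetch operation), Rn (base) and the signed byte offset."""
    if not 0 <= rt <= 31 or not 0 <= rn <= 31:
        raise ValueError("Rt and Rn are 5-bit fields")
    if not -256 <= simm <= 255:
        raise ValueError("simm must be in the range -256 to 255")
    imm9 = simm & 0x1FF  # two's complement in 9 bits
    return PRFUM_BASE | (imm9 << 12) | (rn << 5) | rt
```

Unlike <pimm> in PRFM (immediate), the offset is not scaled, so any byte offset in the range -256 to 255 is representable.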

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);


Operation

bits(64) address;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);

if n == 31 then
    address = SP[];
else
    address = X[n];

address = address + offset;

Prefetch(address, t<4:0>);


PSB CSYNC

Profiling Synchronization Barrier. This instruction is a barrier that ensures that all existing profiling data for the
current PE has been formatted, and profiling buffer addresses have been translated such that all writes to the profiling
buffer have been initiated. A following DSB instruction completes when the writes to the profiling buffer have
completed.
If the Statistical Profiling Extension is not implemented, this instruction executes as a NOP.

System
(FEAT_SPE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 0 1 1 1 1 1 1
CRm op2

PSB CSYNC

if !HaveStatisticalProfiling() then EndOfInstruction();

Operation

ProfilingSynchronizationBarrier();


PSSBB

Physical Speculative Store Bypass Barrier is a memory barrier which prevents speculative loads from bypassing
earlier stores to the same physical address.
The semantics of the Physical Speculative Store Bypass Barrier are:
• When a load to a location appears in program order after the PSSBB, then the load does not speculatively
read an entry earlier in the coherence order for that location than the entry generated by the latest store
satisfying all of the following conditions:
◦ The store is to the same location as the load.
◦ The store appears in program order before the PSSBB.
• When a load to a location appears in program order before the PSSBB, then the load does not speculatively
read data from any store satisfying all of the following conditions:
◦ The store is to the same location as the load.
◦ The store appears in program order after the PSSBB.

This is an alias of DSB. This means:

• The encodings in this description are named to match the encodings of DSB.
• The description of DSB gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 0 1 0 0 1 0 0 1 1 1 1 1
CRm opc

PSSBB

is equivalent to

DSB #4

and is always the preferred disassembly.

Operation

The description of DSB gives the operational pseudocode for this instruction.


RBIT

Reverse Bits reverses the bit order in a register.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 Rn Rd

32-bit (sf == 0)

RBIT <Wd>, <Wn>

64-bit (sf == 1)

RBIT <Xd>, <Xn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

Operation

bits(datasize) operand = X[n];

bits(datasize) result;

for i = 0 to datasize-1
    result<(datasize-1)-i> = operand<i>;

X[d] = result;
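
The bit-reversal loop above translates directly into Python. This is a minimal sketch with a hypothetical function name, taking the register value as a plain integer.

```python
def rbit(operand, datasize):
    """Reverse the bit order of a datasize-bit value (32 or 64)."""
    result = 0
    for i in range(datasize):
        if (operand >> i) & 1:
            # Bit i of the operand becomes bit (datasize-1)-i of the result.
            result |= 1 << ((datasize - 1) - i)
    return result
```

Applying rbit twice with the same datasize returns the original value.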

Operational information


RET

Return from subroutine branches unconditionally to an address in a register, with a hint that this is a subroutine
return.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 0 0 0 0 0 0 Rn 0 0 0 0 0
Z op A M Rm

RET {<Xn>}

integer n = UInt(Rn);

Assembler Symbols

<Xn> Is the 64-bit name of the general-purpose register holding the address to be branched to, encoded in
the "Rn" field. Defaults to X30 if absent.

Operation

bits(64) target = X[n];

// Value in BTypeNext will be used to set PSTATE.BTYPE

BTypeNext = '00';

BranchTo(target, BranchType_RET, FALSE);


RETAA, RETAB

Return from subroutine, with pointer authentication. This instruction authenticates the address that is held in LR,
using SP as the modifier and the specified key, branches to the authenticated address, with a hint that this instruction
is a subroutine return.
Key A is used for RETAA, and key B is used for RETAB.
If the authentication passes, the PE continues execution at the target of the branch. If the authentication fails, a
Translation fault is generated.
The authenticated address is not written back to LR.

Integer
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 0 0 0 0 1 M 1 1 1 1 1 1 1 1 1 1
Z op A Rn Rm

RETAA (M == 0)

RETAA

RETAB (M == 1)

RETAB

boolean use_key_a = (M == '0');

if !HavePACExt() then
    UNDEFINED;

Operation

bits(64) target = X[30];

bits(64) modifier = SP[];

if use_key_a then
    target = AuthIA(target, modifier, TRUE);
else
    target = AuthIB(target, modifier, TRUE);

// Value in BTypeNext will be used to set PSTATE.BTYPE

BTypeNext = '00';

BranchTo(target, BranchType_RET, FALSE);


REV

Reverse Bytes reverses the byte order in a register.

This instruction is used by the pseudo-instruction REV64.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 x Rn Rd
opc

32-bit (sf == 0 && opc == 10)

REV <Wd>, <Wn>

64-bit (sf == 1 && opc == 11)

REV <Xd>, <Xn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer datasize = if sf == '1' then 64 else 32;

integer container_size;
case opc of
    when '00'
        Unreachable();
    when '01'
        container_size = 16;
    when '10'
        container_size = 32;
    when '11'
        if sf == '0' then UNDEFINED;
        container_size = 64;

Assembler Symbols

Operation

bits(datasize) operand = X[n];

bits(datasize) result;

integer containers = datasize DIV container_size;

integer elements_per_container = container_size DIV 8;
integer index = 0;
integer rev_index;
for c = 0 to containers-1
    rev_index = index + ((elements_per_container - 1) * 8);
    for e = 0 to elements_per_container-1
        result<rev_index+7:rev_index> = operand<index+7:index>;
        index = index + 8;
        rev_index = rev_index - 8;

X[d] = result;
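
The container-based byte reversal above can be modelled in Python as follows. This is a sketch with a hypothetical function name. The container_size argument plays the role of the value selected by opc (16, 32 or 64), so the same function also illustrates the per-halfword and per-word reversals described for REV16 and REV32.

```python
def rev(operand, datasize, container_size):
    """Reverse the byte order within each container of a datasize-bit value."""
    containers = datasize // container_size
    elements_per_container = container_size // 8
    result = 0
    index = 0
    for _ in range(containers):
        # Destination of the first (lowest) byte of this container.
        rev_index = index + (elements_per_container - 1) * 8
        for _ in range(elements_per_container):
            byte = (operand >> index) & 0xFF
            result |= byte << rev_index
            index += 8
            rev_index -= 8
    return result
```

For example, rev(0x11223344, 32, 32) gives 0x44332211, while rev(0x11223344, 32, 16) gives 0x22114433.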


Operational information


REV16

Reverse bytes in 16-bit halfwords reverses the byte order in each 16-bit halfword of a register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 1 Rn Rd
opc

32-bit (sf == 0)

REV16 <Wd>, <Wn>

64-bit (sf == 1)

REV16 <Xd>, <Xn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer datasize = if sf == '1' then 64 else 32;

integer container_size;
case opc of
    when '00'
        Unreachable();
    when '01'
        container_size = 16;
    when '10'
        container_size = 32;
    when '11'
        if sf == '0' then UNDEFINED;
        container_size = 64;

Assembler Symbols

Operation

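bits(datasize) operand = X[n];
bits(datasize) result;

integer containers = datasize DIV container_size;
integer elements_per_container = container_size DIV 8;
integer index = 0;
integer rev_index;
for c = 0 to containers-1
    rev_index = index + ((elements_per_container - 1) * 8);
    for e = 0 to elements_per_container-1
        result<rev_index+7:rev_index> = operand<index+7:index>;
        index = index + 8;
        rev_index = rev_index - 8;

X[d] = result;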

Operational information

If PSTATE.DIT is 1:


REV32

Reverse bytes in 32-bit words reverses the byte order in each 32-bit word of a register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 Rn Rd
sf opc

REV32 <Xd>, <Xn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer datasize = if sf == '1' then 64 else 32;

integer container_size;
case opc of
    when '00'
        Unreachable();
    when '01'
        container_size = 16;
    when '10'
        container_size = 32;
    when '11'
        if sf == '0' then UNDEFINED;
        container_size = 64;

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.

Operation

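bits(datasize) operand = X[n];
bits(datasize) result;

integer containers = datasize DIV container_size;
integer elements_per_container = container_size DIV 8;
integer index = 0;
integer rev_index;
for c = 0 to containers-1
    rev_index = index + ((elements_per_container - 1) * 8);
    for e = 0 to elements_per_container-1
        result<rev_index+7:rev_index> = operand<index+7:index>;
        index = index + 8;
        rev_index = rev_index - 8;

X[d] = result;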

Operational information


REV64

Reverse Bytes reverses the byte order in a 64-bit general-purpose register.

When assembling for Armv8.2, an assembler must support this pseudo-instruction. It is OPTIONAL whether an
assembler supports this pseudo-instruction when assembling for an architecture earlier than Armv8.2.

This is a pseudo-instruction of REV. This means:

• The encodings in this description are named to match the encodings of REV.
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of REV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 Rn Rd
sf opc

64-bit

REV64 <Xd>, <Xn>

is equivalent to

REV <Xd>, <Xn>

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.

Operation

The description of REV gives the operational pseudocode for this instruction.

Operational information


RMIF

Performs a rotation right of a value held in a general purpose register by an immediate value, and then inserts a
selection of the bottom four bits of the result of the rotation into the PSTATE flags, under the control of a second
immediate mask.

Integer
(FEAT_FlagM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 1 1 0 1 0 0 0 0 imm6 0 0 0 0 1 Rn 0 mask
sf

RMIF <Xn>, #<shift>, #<mask>

if !HaveFlagManipulateExt() then UNDEFINED;

integer lsb = UInt(imm6);
integer n = UInt(Rn);

Assembler Symbols

<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<shift> Is the shift amount, in the range 0 to 63, defaulting to 0 and encoded in the "imm6" field.
<mask> Is the flag bit mask, an immediate in the range 0 to 15, which selects the bits that are inserted into the
NZCV condition flags, encoded in the "mask" field.

Operation

bits(4) tmp;
bits(64) tmpreg = X[n];
tmp = (tmpreg:tmpreg)<lsb+3:lsb>;
if mask<3> == '1' then PSTATE.N = tmp<3>;
if mask<2> == '1' then PSTATE.Z = tmp<2>;
if mask<1> == '1' then PSTATE.C = tmp<1>;
if mask<0> == '1' then PSTATE.V = tmp<0>;
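
The rotate-and-insert behaviour above can be sketched in Python. The function name rmif is hypothetical, and packing the flags into a 4-bit integer with N in bit 3, Z in bit 2, C in bit 1 and V in bit 0 is an assumption made for illustration; it mirrors the bit positions of mask and tmp in the pseudocode.

```python
def rmif(xn, lsb, mask, nzcv):
    """Return the new NZCV value after rotating xn right by lsb bits."""
    xn &= (1 << 64) - 1
    # (tmpreg:tmpreg)<lsb+3:lsb> is the low four bits of xn rotated right by lsb.
    doubled = (xn << 64) | xn
    tmp = (doubled >> lsb) & 0xF
    # Flags selected by mask come from tmp; the others keep their old value.
    return (nzcv & ~mask & 0xF) | (tmp & mask)
```

With mask set to 0 the flags are left unchanged, and with mask set to 15 all four flags are replaced.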

Operational information


ROR (immediate)

Rotate right (immediate) provides the value of the contents of a register rotated right by the number of bits specified
in an immediate. The bits that are rotated off the right end are inserted into the vacated bit positions on the left.

This is an alias of EXTR. This means:

• The encodings in this description are named to match the encodings of EXTR.
• The description of EXTR gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 1 N 0 Rm imms Rn Rd

32-bit (sf == 0 && N == 0 && imms == 0xxxxx)

ROR <Wd>, <Ws>, #<shift>

is equivalent to

EXTR <Wd>, <Ws>, <Ws>, #<shift>

and is the preferred disassembly when Rn == Rm.

64-bit (sf == 1 && N == 1)

ROR <Xd>, <Xs>, #<shift>

is equivalent to

EXTR <Xd>, <Xs>, <Xs>, #<shift>

and is the preferred disassembly when Rn == Rm.

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Ws> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xs> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" and "Rm" fields.
<shift> For the 32-bit variant: is the amount by which to rotate, in the range 0 to 31, encoded in the "imms"
field.
For the 64-bit variant: is the amount by which to rotate, in the range 0 to 63, encoded in the "imms"
field.

Operation

The description of EXTR gives the operational pseudocode for this instruction.
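
The equivalence between ROR (immediate) and EXTR with both sources equal can be illustrated in Python. This sketch assumes that EXTR extracts datasize bits, starting at bit <shift>, from the concatenation of the first and second source registers, as set out in the description of EXTR. The function names are hypothetical.

```python
def extr(rn, rm, lsb, datasize):
    """Extract datasize bits starting at lsb from the pair rn:rm."""
    mask = (1 << datasize) - 1
    concat = ((rn & mask) << datasize) | (rm & mask)
    return (concat >> lsb) & mask


def ror_immediate(rs, shift, datasize):
    """ROR <d>, <s>, #<shift> expressed as EXTR <d>, <s>, <s>, #<shift>."""
    if not 0 <= shift < datasize:
        raise ValueError("shift must be in the range 0 to datasize-1")
    return extr(rs, rs, shift, datasize)
```

Because both halves of the concatenation hold the same value, the bits shifted out at the bottom reappear at the top, which is a rotation.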

Operational information


ROR (register)

Rotate Right (register) provides the value of the contents of a register rotated by a variable number of bits. The bits
that are rotated off the right end are inserted into the vacated bit positions on the left. The remainder obtained by
dividing the second source register by the data size defines the number of bits by which the first source register is
right-shifted.

This is an alias of RORV. This means:

• The encodings in this description are named to match the encodings of RORV.
• The description of RORV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
op2

32-bit (sf == 0)

ROR <Wd>, <Wn>, <Wm>

is equivalent to

RORV <Wd>, <Wn>, <Wm>

and is always the preferred disassembly.

64-bit (sf == 1)

ROR <Xd>, <Xn>, <Xm>

is equivalent to

RORV <Xd>, <Xn>, <Xm>

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of RORV gives the operational pseudocode for this instruction.

Operational information


RORV

Rotate Right Variable provides the value of the contents of a register rotated by a variable number of bits. The bits
that are rotated off the right end are inserted into the vacated bit positions on the left. The remainder obtained by
dividing the value of the second source register by the data size (32 or 64) defines the number of bits by which the
first source register is rotated right.
This instruction is used by the alias ROR (register).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
op2

32-bit (sf == 0)

RORV <Wd>, <Wn>, <Wm>

64-bit (sf == 1)

RORV <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;
ShiftType shift_type = DecodeShift(op2);

Assembler Symbols

Operation

bits(datasize) result;
bits(datasize) operand2 = X[m];

result = ShiftReg(n, shift_type, UInt(operand2) MOD datasize);

X[d] = result;
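
As an informal illustration of the operation above (not part of the architectural pseudocode), the following Python sketch models RORV on plain integers. The function name `rorv` and its `datasize` parameter are assumptions made for this example only.

```python
def rorv(xn: int, xm: int, datasize: int = 64) -> int:
    """Rotate the low `datasize` bits of xn right by (xm MOD datasize)."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    mask = (1 << datasize) - 1
    value = xn & mask
    # Only the remainder of the second operand modulo the data size is used.
    amount = (xm & mask) % datasize
    if amount == 0:
        return value
    # Bits rotated off the right end re-enter at the left end.
    return ((value >> amount) | (value << (datasize - amount))) & mask
```

Under this model a rotate amount of 36 on the 32-bit variant behaves the same as a rotate amount of 4, because 36 MOD 32 is 4.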

Operational information


SB

Speculation Barrier is a barrier that controls speculation.

The semantics of the Speculation Barrier are that the execution, until the barrier completes, of any instruction that
appears later in the program order than the barrier:
• Cannot be performed speculatively to the extent that such speculation can be observed through side-channels
as a result of control flow speculation or data value speculation.
• Can be speculatively executed as a result of predicting that a potentially exception generating instruction has
not generated an exception.
In particular, any instruction that appears later in the program order than the barrier cannot cause a speculative
allocation into any caching structure where the allocation of that entry could be indicative of any data value present in
memory or in the registers.
The SB instruction:
• Cannot be speculatively executed as a result of control flow speculation or data value speculation.
• Can be speculatively executed as a result of predicting that a potentially exception generating instruction has
not generated an exception. The potentially exception generating instruction can complete once it is known
not to be speculative, and all data values generated by instructions appearing in program order before the SB
instruction have their predicted values confirmed.
When the prediction of the instruction stream is not informed by data taken from the register outputs of the
speculative execution of instructions appearing in program order after an uncompleted SB instruction, the SB
instruction has no effect on the use of prediction resources to predict the instruction stream that is being fetched.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 1 (0) (0) (0) (0) 1 1 1 1 1 1 1 1
CRm opc

if !HaveSBExt() then UNDEFINED;

Operation

SpeculationBarrier();

SBC

Subtract with Carry subtracts a register value and the value of NOT (Carry flag) from a register value, and writes the
result to the destination register.
This instruction is used by the alias NGC.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
op S

32-bit (sf == 0)

SBC <Wd>, <Wn>, <Wm>

64-bit (sf == 1)

SBC <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

Alias Conditions

Alias Is preferred when

NGC Rn == '11111'

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];

operand2 = NOT(operand2);

(result, -) = AddWithCarry(operand1, operand2, PSTATE.C);

X[d] = result;
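
The following Python sketch is an informal model of the operation above, assuming that AddWithCarry returns the truncated sum of its three inputs together with the carry out. The names `add_with_carry`, `sbc`, and `sub128` are hypothetical. The `sub128` helper also assumes that a flag-setting subtract can be modelled as AddWithCarry with a carry-in of 1, which is not stated in this description.

```python
MASK64 = (1 << 64) - 1


def add_with_carry(x: int, y: int, carry_in: int, datasize: int = 64):
    """Return (result, carry_out) for x + y + carry_in at the given width."""
    mask = (1 << datasize) - 1
    total = (x & mask) + (y & mask) + carry_in
    return total & mask, total >> datasize


def sbc(xn: int, xm: int, carry: int, datasize: int = 64) -> int:
    """Model of SBC: xn - xm - NOT(carry), computed as xn + NOT(xm) + carry."""
    mask = (1 << datasize) - 1
    result, _ = add_with_carry(xn, ~xm & mask, carry, datasize)
    return result


def sub128(a: int, b: int) -> int:
    """128-bit subtraction built from a 64-bit subtract followed by SBC."""
    a_lo, a_hi = a & MASK64, (a >> 64) & MASK64
    b_lo, b_hi = b & MASK64, (b >> 64) & MASK64
    # Low half: subtract with carry-in 1; carry_out == 0 signals a borrow.
    lo, carry = add_with_carry(a_lo, ~b_lo & MASK64, 1)
    # High half: SBC consumes the carry, so a borrow subtracts one more.
    hi = sbc(a_hi, b_hi, carry)
    return (hi << 64) | lo
```

With the Carry flag set, `sbc` behaves as a plain subtraction. With the Carry flag clear it subtracts one more, which is what lets the borrow propagate in a multi-word subtraction.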

Operational information


SBCS

Subtract with Carry, setting flags, subtracts a register value and the value of NOT (Carry flag) from a register value,
and writes the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias NGCS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 1 0 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd
op S

32-bit (sf == 0)

SBCS <Wd>, <Wn>, <Wm>

64-bit (sf == 1)

SBCS <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

Alias Conditions

Alias Is preferred when

NGCS Rn == '11111'

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
bits(4) nzcv;

operand2 = NOT(operand2);

(result, nzcv) = AddWithCarry(operand1, operand2, PSTATE.C);

PSTATE.<N,Z,C,V> = nzcv;

X[d] = result;
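
As an informal sketch of how the flags follow from the operation above, the Python below models AddWithCarry together with its NZCV output. It assumes the conventional definition: N is the sign bit of the result, Z is set for a zero result, C is the unsigned carry out, and V is the signed overflow. The function names are hypothetical.

```python
def add_with_carry(x: int, y: int, carry_in: int, datasize: int):
    """Return (result, (n, z, c, v)) for x + y + carry_in at the given width."""
    mask = (1 << datasize) - 1
    sign = 1 << (datasize - 1)

    def to_signed(value: int) -> int:
        return value - (1 << datasize) if value & sign else value

    x &= mask
    y &= mask
    unsigned_sum = x + y + carry_in
    signed_sum = to_signed(x) + to_signed(y) + carry_in
    result = unsigned_sum & mask
    n = 1 if result & sign else 0
    z = 1 if result == 0 else 0
    c = 1 if unsigned_sum != result else 0          # unsigned carry out
    v = 1 if to_signed(result) != signed_sum else 0  # signed overflow
    return result, (n, z, c, v)


def sbcs(xn: int, xm: int, carry: int, datasize: int = 64):
    """Model of SBCS: xn + NOT(xm) + carry, returning the result and NZCV."""
    mask = (1 << datasize) - 1
    return add_with_carry(xn, ~xm & mask, carry, datasize)
```

In this model `sbcs(5, 5, 1, 32)` yields a zero result with Z and C set, which matches the usual convention that C set after a subtraction means no borrow occurred.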

Operational information


◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.


SBFIZ

Signed Bitfield Insert in Zeros copies a bitfield of <width> bits from the least significant bits of the source register to
bit position <lsb> of the destination register, setting the destination bits below the bitfield to zero, and the bits above
the bitfield to a copy of the most significant bit of the bitfield.

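This is an alias of SBFM. This means:

• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.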

32-bit (sf == 0 && N == 0)

SBFIZ <Wd>, <Wn>, #<lsb>, #<width>

is equivalent to

SBFM <Wd>, <Wn>, #(-<lsb> MOD 32), #(<width>-1)

and is the preferred disassembly when UInt(imms) < UInt(immr).

64-bit (sf == 1 && N == 1)

SBFIZ <Xd>, <Xn>, #<lsb>, #<width>

is equivalent to

SBFM <Xd>, <Xn>, #(-<lsb> MOD 64), #(<width>-1)

and is the preferred disassembly when UInt(imms) < UInt(immr).
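
The alias mapping above can be sketched informally in Python. The helper name `sbfiz_to_sbfm` is hypothetical, and the operand ranges it checks (<lsb> from 0 to regsize-1, <width> from 1 to regsize-<lsb>) are an assumption mirrored from the SBFX description rather than taken from this page.

```python
def sbfiz_to_sbfm(lsb: int, width: int, regsize: int = 64):
    """Return the (immr, imms) pair of the equivalent SBFM instruction."""
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    if not 0 <= lsb < regsize:
        raise ValueError("lsb out of range")
    if not 1 <= width <= regsize - lsb:
        raise ValueError("width out of range")
    immr = (-lsb) % regsize   # Python's % gives a non-negative result here
    imms = width - 1
    return immr, imms
```

For example, a 32-bit SBFIZ with <lsb> 8 and <width> 4 maps to SBFM operands of 24 and 3. When <lsb> is 0 the mapping gives immr equal to 0, so the condition UInt(imms) < UInt(immr) does not hold and SBFIZ is not the preferred disassembly.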

Assembler Symbols

Operation

The description of SBFM gives the operational pseudocode for this instruction.

Operational information


SBFM

Signed Bitfield Move is usually accessed via one of its aliases, which are always preferred for disassembly.
If <imms> is greater than or equal to <immr>, this copies a bitfield of (<imms>-<immr>+1) bits starting from bit
position <immr> in the source register to the least significant bits of the destination register.
If <imms> is less than <immr>, this copies a bitfield of (<imms>+1) bits from the least significant bits of the source
register to bit position (regsize-<immr>) of the destination register, where regsize is the destination register size of 32
or 64 bits.
In both cases the destination bits below the bitfield are set to zero, and the bits above the bitfield are set to a copy of
the most significant bit of the bitfield.
This instruction is used by the aliases ASR (immediate), SBFIZ, SBFX, SXTB, SXTH, and SXTW.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 0 0 1 1 0 N immr imms Rn Rd
opc

32-bit (sf == 0 && N == 0)

SBFM <Wd>, <Wn>, #<immr>, #<imms>

64-bit (sf == 1 && N == 1)

SBFM <Xd>, <Xn>, #<immr>, #<imms>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;

integer R;
integer S;
bits(datasize) wmask;
bits(datasize) tmask;

if sf == '1' && N != '1' then UNDEFINED;

if sf == '0' && (N != '0' || immr<5> != '0' || imms<5> != '0') then UNDEFINED;

R = UInt(immr);
S = UInt(imms);
(wmask, tmask) = DecodeBitMasks(N, imms, immr, FALSE);

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<immr> For the 32-bit variant: is the right rotate amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the right rotate amount, in the range 0 to 63, encoded in the "immr" field.
<imms> For the 32-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 31,
encoded in the "imms" field.
For the 64-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 63,
encoded in the "imms" field.

Alias Conditions

Alias            Of variant  Is preferred when

ASR (immediate)  32-bit      imms == '011111'
ASR (immediate)  64-bit      imms == '111111'
SBFIZ                        UInt(imms) < UInt(immr)
SBFX                         BFXPreferred(sf, opc<1>, imms, immr)
SXTB                         immr == '000000' && imms == '000111'
SXTH                         immr == '000000' && imms == '001111'
SXTW                         immr == '000000' && imms == '011111'

Operation

bits(datasize) src = X[n];

// perform bitfield move on low bits

bits(datasize) bot = ROR(src, R) AND wmask;

// determine extension bits (sign, zero or dest register)

bits(datasize) top = Replicate(src<S>);

// combine extension bits and result bits

X[d] = (top AND NOT(tmask)) OR (bot AND tmask);
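
The Python sketch below is an informal model of SBFM that follows the prose description at the start of this section rather than the wmask and tmask formulation in the pseudocode. The function name `sbfm` and its integer-based interface are assumptions made for illustration.

```python
def sbfm(src: int, immr: int, imms: int, regsize: int = 64) -> int:
    """Model SBFM on integers; returns the value written to the destination."""
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    if not (0 <= immr < regsize and 0 <= imms < regsize):
        raise ValueError("immr/imms out of range")
    mask = (1 << regsize) - 1
    src &= mask
    if imms >= immr:
        # Extract (imms-immr+1) bits starting at bit immr, place at bit 0.
        width = imms - immr + 1
        field = (src >> immr) & ((1 << width) - 1)
        position = 0
    else:
        # Take the low (imms+1) bits and place them at bit (regsize-immr).
        width = imms + 1
        field = src & ((1 << width) - 1)
        position = regsize - immr
    # Sign-extend from the most significant bit of the bitfield.
    if field & (1 << (width - 1)):
        field -= 1 << width
    # Bits below the field become zero; bits above copy the field's sign.
    return (field << position) & mask
```

With immr equal to 0 and imms equal to 7 the model behaves as a sign extension of the low byte, which corresponds to the SXTB alias condition listed above.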

Operational information


SBFX

Signed Bitfield Extract copies a bitfield of <width> bits starting from bit position <lsb> in the source register to the
least significant bits of the destination register, and sets destination bits above the bitfield to a copy of the most
significant bit of the bitfield.

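This is an alias of SBFM. This means:

• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.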

32-bit (sf == 0 && N == 0)

SBFX <Wd>, <Wn>, #<lsb>, #<width>

is equivalent to

SBFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)

and is the preferred disassembly when BFXPreferred(sf, opc<1>, imms, immr).

64-bit (sf == 1 && N == 1)

SBFX <Xd>, <Xn>, #<lsb>, #<width>

is equivalent to

SBFM <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)

and is the preferred disassembly when BFXPreferred(sf, opc<1>, imms, immr).

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.
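
As an informal illustration of the alias and of the operand ranges listed above, the Python sketch below converts SBFX operands to SBFM operands and models the extraction directly. The names `sbfx_to_sbfm` and `sbfx` are hypothetical.

```python
def sbfx_to_sbfm(lsb: int, width: int, regsize: int = 64):
    """Return the (immr, imms) pair of the equivalent SBFM instruction."""
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    if not 0 <= lsb < regsize:
        raise ValueError("lsb out of range")
    if not 1 <= width <= regsize - lsb:
        raise ValueError("width out of range")
    return lsb, lsb + width - 1


def sbfx(src: int, lsb: int, width: int, regsize: int = 64) -> int:
    """Extract `width` bits starting at `lsb` and sign-extend the result."""
    sbfx_to_sbfm(lsb, width, regsize)  # reuse the range checks
    mask = (1 << regsize) - 1
    field = ((src & mask) >> lsb) & ((1 << width) - 1)
    if field & (1 << (width - 1)):
        field -= 1 << width
    return field & mask
```

For example, extracting the 4-bit field at bit 12 of 0x0000F000 in a 32-bit register gives 0xFFFFFFFF, because the top bit of the field is set and is copied into all higher bits.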

Operation

The description of SBFM gives the operational pseudocode for this instruction.

Operational information


SDIV

Signed Divide divides a signed integer register value by another signed integer register value, and writes the result to
the destination register. The condition flags are not affected.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 0 0 1 1 Rn Rd
o1

32-bit (sf == 0)

SDIV <Wd>, <Wn>, <Wm>

64-bit (sf == 1)

SDIV <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

Operation

bits(datasize) operand1 = X[n];
bits(datasize) operand2 = X[m];
integer result;

if IsZero(operand2) then
    result = 0;
else
    result = RoundTowardsZero(Real(Int(operand1, FALSE)) / Real(Int(operand2, FALSE)));

X[d] = result<datasize-1:0>;
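
The Python sketch below is an informal model of the operation above. It shows three properties that follow from the pseudocode: division by zero yields 0, the quotient is rounded towards zero, and the result is truncated to the data size. The function name `sdiv` is an assumption made for this example.

```python
def sdiv(xn: int, xm: int, datasize: int = 64) -> int:
    """Model of SDIV on register bit patterns of the given width."""
    if datasize not in (32, 64):
        raise ValueError("datasize must be 32 or 64")
    mask = (1 << datasize) - 1
    sign = 1 << (datasize - 1)

    def to_signed(value: int) -> int:
        value &= mask
        return value - (1 << datasize) if value & sign else value

    a, b = to_signed(xn), to_signed(xm)
    if b == 0:
        return 0                      # no exception is raised; result is 0
    quotient = abs(a) // abs(b)       # magnitude, rounded towards zero
    if (a < 0) != (b < 0):
        quotient = -quotient
    return quotient & mask            # keep bits <datasize-1:0>
```

One consequence of the final truncation is that dividing the most negative value by -1 returns the most negative value again, since the true quotient does not fit in the data size.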


SETF8, SETF16

Set the PSTATE.NZV flags based on the value in the specified general-purpose register. SETF8 treats the value as an
8-bit value, and SETF16 treats the value as a 16-bit value.
The PSTATE.C flag is not affected by these instructions.

Integer
(FEAT_FlagM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 1 0 0 0 0 0 0 0 0 0 0 sz 0 0 1 0 Rn 0 1 1 0 1
sf

SETF8 (sz == 0)

SETF8 <Wn>

SETF16 (sz == 1)

SETF16 <Wn>

if !HaveFlagManipulateExt() then UNDEFINED;

integer msb = if sz == '1' then 15 else 7;
integer n = UInt(Rn);

Assembler Symbols

<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.

Operation

bits(32) tmpreg = X[n];

PSTATE.N = tmpreg<msb>;
PSTATE.Z = if (tmpreg<msb:0> == Zeros(msb + 1)) then '1' else '0';
PSTATE.V = tmpreg<msb+1> EOR tmpreg<msb>;
//PSTATE.C unchanged;
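
As an informal illustration of the operation above, the Python sketch below computes the N, Z, and V values that SETF8 and SETF16 would produce for a given register value. The function names and the tuple return value are assumptions made for this example. PSTATE.C is not modelled because it is unchanged.

```python
def setf(wn: int, msb: int):
    """Return (N, Z, V) computed from bits <msb+1:0> of a 32-bit value."""
    wn &= 0xFFFFFFFF
    n = (wn >> msb) & 1
    z = 1 if (wn & ((1 << (msb + 1)) - 1)) == 0 else 0
    # V is set when bit msb+1 differs from bit msb.
    v = ((wn >> (msb + 1)) & 1) ^ n
    return n, z, v


def setf8(wn: int):
    return setf(wn, 7)


def setf16(wn: int):
    return setf(wn, 15)
```

In this model V is set whenever the register value, viewed through bit msb+1, is not a correct sign extension of the 8-bit or 16-bit quantity. For example, 0x80 gives V set under SETF8, while 0x180 does not.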

Operational information


SETGP, SETGM, SETGE

Memory Set with tag setting. These instructions perform a memory set using the value in the bottom byte of the
source register and store an Allocation Tag to memory for each Tag Granule written. The Allocation Tag is calculated
from the Logical Address Tag in the register which holds the first address that the set is made to. The prologue, main,
and epilogue instructions are expected to be run in succession and to appear consecutively in memory: SETGP, then
SETGM, and then SETGE.
SETGP performs some preconditioning of the arguments suitable for using the SETGM instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETGM performs an IMPLEMENTATION DEFINED amount of the
memory set. SETGE performs the last part of the memory set.

Note

The inclusion of IMPLEMENTATION DEFINED amounts of memory set allows some optimization of the size that can be
performed.

The architecture supports two algorithms for the memory set: option A and option B. Which algorithm is used is
IMPLEMENTATION DEFINED.

Note

Portable software should not assume that the choice of algorithm is constant.

After execution of SETGP, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETGP, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETGM, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* number of bytes remaining to be set in
the memory set in total.
For SETGM, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETGE, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETGE, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
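
The two register encodings described above can be compared with an informal Python sketch. It models only the values left in Xd, Xn, and PSTATE.C by the prologue and how a later stage would interpret them. It does not model memory, tags, or exceptions. The function names and the `bytes_set` parameter, which stands in for the IMPLEMENTATION DEFINED number of bytes set by the prologue, are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1
SATURATED_SIZE = 0x7FFFFFFFFFFFFFF0


def setgp_prologue(xd: int, xn: int, option_a: bool, bytes_set: int = 0):
    """Return (Xd, Xn, C) as left by the prologue under option A or B."""
    size = SATURATED_SIZE if (xn >> 63) & 1 else xn
    if bytes_set % 16 != 0 or not 0 <= bytes_set <= size:
        raise ValueError("bytes_set must be a multiple of 16 within the size")
    if option_a:
        # Xd = original Xd + saturated Xn; Xn = -saturated Xn + bytes set.
        return (xd + size) & MASK64, (bytes_set - size) & MASK64, 0
    # Xd = original Xd + bytes set; Xn = saturated Xn - bytes set.
    return (xd + bytes_set) & MASK64, size - bytes_set, 1


def decode_progress(xd: int, xn: int, c: int):
    """Return (bytes remaining, next address to set) for either option."""
    if c == 0:
        signed_xn = xn - (1 << 64) if (xn >> 63) & 1 else xn
        return -signed_xn, (xd + signed_xn) & MASK64
    return xn, xd
```

Under these assumptions both options describe the same progress. For a 0x40-byte set at address 0x1000 with 0x10 bytes completed by the prologue, decoding either encoding gives 0x30 bytes remaining and 0x1010 as the next address.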

Integer
(FEAT_MOPS)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sz 0 1 1 1 0 1 1 1 0 Rs x x 0 0 0 1 Rn Rd
op2

Epilogue (op2 == 1000)

SETGE [<Xd>]!, <Xn>!, <Xs>

Main (op2 == 0100)

SETGM [<Xd>]!, <Xn>!, <Xs>

Prologue (op2 == 0000)

SETGP [<Xd>]!, <Xn>!, <Xs>

if !HaveFeatMOPS() then UNDEFINED;

if !HaveMTEExt() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;

MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;

if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;

Assembler Symbols

<Xd> For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
encoding of the destination address (an integer multiple of 16) and for option B is updated by the
instruction, encoded in the "Rd" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the destination address (an integer multiple of 16) and is updated by the instruction, encoded in the
"Rd" field.
<Xn> For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of
the number of bytes to be set (an integer multiple of 16) and is set to zero at the end of the instruction,
encoded in the "Rn" field.
For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
number of bytes to be set (an integer multiple of 16) and is updated by the instruction, encoded in the
"Rn" field.
For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
bytes to be set (an integer multiple of 16) and is updated by the instruction, encoded in the "Rn" field.
<Xs> For the epilogue variant: is the 64-bit name of the general-purpose register that holds the source data,
encoded in the "Rs" field.
For the main and prologue variant: is the 64-bit name of the general-purpose register that holds the
source data in bits<7:0>, encoded in the "Rs" field.

Operation

CheckMOPSEnabled();

bits(64) toaddress = X[d];
bits(64) setsize = X[n];
bits(8) data = X[s];
bits(64) stagesetsize;
boolean is_setg = TRUE;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);

boolean supports_option_a = MemCpyOptionA();

acctype = MemSetAccessType(options);

if stage == MOPSStage_Prologue then
    if setsize<63> == '1' then setsize = 0x7FFFFFFFFFFFFFF0<63:0>;

    if setsize != Zeros(64) && toaddress != Align(toaddress, TAG_GRANULE) then
        boolean iswrite = TRUE;
        boolean secondstage = FALSE;
        AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

    if setsize != Align(setsize, TAG_GRANULE) then
        boolean iswrite = TRUE;
        boolean secondstage = FALSE;
        AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

    if supports_option_a then
        PSTATE.C = '0';
        toaddress = toaddress + setsize;
        setsize = Zeros(64) - setsize;
    else
        PSTATE.C = '1';
    PSTATE.N = '0';
    PSTATE.V = '0';
    PSTATE.Z = '0';

    stagesetsize = SETPreSizeChoice(toaddress, setsize, is_setg);

    assert stagesetsize<63> == setsize<63> || stagesetsize == Zeros();
    assert stagesetsize<3:0> == '0000';

    if SInt(setsize) > 0 then
        assert SInt(stagesetsize) <= SInt(setsize);
    else
        assert SInt(stagesetsize) >= SInt(setsize);
else
    bits(64) postsize = SETPostSizeChoice(toaddress, setsize, is_setg);
    assert postsize<63> == setsize<63> || postsize == Zeros();
    assert postsize<3:0> == '0000';

    boolean zero_size_exceptions = MemSetZeroSizeCheck();

    // Check if this version is consistent with the state of the call.
    if zero_size_exceptions || SInt(setsize) != 0 then
        if supports_option_a then
            if PSTATE.C == '1' then
                boolean wrong_option = TRUE;
                boolean from_epilogue = stage == MOPSStage_Epilogue;
                MismatchedMemSetException(supports_option_a, d, s, n, wrong_option, from_epilogue, option
        else
            if PSTATE.C == '0' then
                boolean wrong_option = TRUE;
                boolean from_epilogue = stage == MOPSStage_Epilogue;
                MismatchedMemSetException(supports_option_a, d, s, n, wrong_option, from_epilogue, option

    if stage == MOPSStage_Main then
        stagesetsize = setsize - postsize;
        if MemSetParametersIllformedM(toaddress, setsize, is_setg) then
            boolean wrong_option = FALSE;
            boolean from_epilogue = FALSE;
            MismatchedMemSetException(supports_option_a, d, s, n, wrong_option, from_epilogue, options, i
    else
        stagesetsize = postsize;
        if (setsize != postsize || MemSetParametersIllformedE(toaddress, setsize, is_setg)) then
            boolean wrong_option = FALSE;
            boolean from_epilogue = TRUE;
            MismatchedMemSetException(supports_option_a, d, s, n, wrong_option, from_epilogue, options, i

    if setsize != Zeros(64) && toaddress != Align(toaddress, TAG_GRANULE) then
        boolean iswrite = TRUE;
        boolean secondstage = FALSE;
        AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

    if setsize != Align(setsize, TAG_GRANULE) then
        boolean iswrite = TRUE;
        boolean secondstage = FALSE;
        AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

integer tagstep;
bits(4) tag;
bits(64) tagaddr;

if supports_option_a then
    while SInt(stagesetsize) < 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = SETSizeChoice(toaddress, setsize, 16);
        assert B <= -1 * SInt(stagesetsize);
        assert B<3:0> == '0000';

        Mem[toaddress+setsize, B, acctype] = Replicate(data, B);

        tagstep = B DIV 16;

        tag = AArch64.AllocationTagFromAddress(toaddress + setsize);
        while tagstep > 0 do
            tagaddr = toaddress + setsize + (tagstep - 1) * 16;
            AArch64.MemTag[tagaddr, acctype] = tag;
            tagstep = tagstep - 1;

        setsize = setsize + B;
        stagesetsize = stagesetsize + B;
        if stage != MOPSStage_Prologue then
            X[n] = setsize;
else
    while UInt(stagesetsize) > 0 do
        // IMP DEF selection of the block size that is worked on. While many
        // implementations might make this constant, that is not assumed.
        B = SETSizeChoice(toaddress, setsize, 16);
        assert B <= UInt(stagesetsize);
        assert B<3:0> == '0000';

        Mem[toaddress, B, acctype] = Replicate(data, B);

        tagstep = B DIV 16;

        tag = AArch64.AllocationTagFromAddress(toaddress);
        while tagstep > 0 do
            tagaddr = toaddress + (tagstep - 1) * 16;
            AArch64.MemTag[tagaddr, acctype] = tag;
            tagstep = tagstep - 1;

        toaddress = toaddress + B;
        setsize = setsize - B;
        stagesetsize = stagesetsize - B;
        if stage != MOPSStage_Prologue then
            X[n] = setsize;
            X[d] = toaddress;

if stage == MOPSStage_Prologue then
    X[n] = setsize;
    X[d] = toaddress;


SETGPN, SETGMN, SETGEN

Memory Set with tag setting, non-temporal. These instructions perform a memory set using the value in the bottom
byte of the source register and store an Allocation Tag to memory for each Tag Granule written. The Allocation Tag is
calculated from the Logical Address Tag in the register which holds the first address that the set is made to. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: SETGPN, then SETGMN, and then SETGEN.
SETGPN performs some preconditioning of the arguments suitable for using the SETGMN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory set. SETGMN performs an IMPLEMENTATION DEFINED amount of the
memory set. SETGEN performs the last part of the memory set.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of SETGPN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETGPN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETGMN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with -1* number of bytes remaining to be set in
the memory set in total.
For SETGMN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETGEN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to -Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETGEN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.

Integer
(FEAT_MOPS)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sz 0 1 1 1 0 1 1 1 0 Rs x x 1 0 0 1 Rn Rd
op2

Epilogue (op2 == 1010)

SETGEN [<Xd>]!, <Xn>!, <Xs>

Main (op2 == 0110)

SETGMN [<Xd>]!, <Xn>!, <Xs>

Prologue (op2 == 0010)

SETGPN [<Xd>]!, <Xn>!, <Xs>

if !HaveFeatMOPS() then UNDEFINED;

if !HaveMTEExt() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;

MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;

if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

bits(64) toaddress = X[d];

bits(64) setsize = X[n];
bits(8) data = X[s];
bits(64) stagesetsize;
boolean is_setg = TRUE;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(FALSE);

boolean supports_option_a = MemCpyOptionA();

acctype = MemSetAccessType(options);

if stage == MOPSStage_Prologue then

if setsize<63> == '1' then setsize = 0x7FFFFFFFFFFFFFF0<63:0>;

if setsize != Zeros(64) && toaddress != Align(toaddress, TAG_GRANULE) then

boolean iswrite = TRUE;
boolean secondstage = FALSE;
AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

if setsize != Align(setsize, TAG_GRANULE) then

boolean iswrite = TRUE;
boolean secondstage = FALSE;
AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

if supports_option_a then
PSTATE.C = '0';
toaddress = toaddress + setsize;
setsize = Zeros(64) - setsize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';

stagesetsize = SETPreSizeChoice(toaddress, setsize, is_setg);

assert stagesetsize<63> == setsize<63> || stagesetsize == Zeros();
assert stagesetsize<3:0> == '0000';

if SInt(setsize) > 0 then

boolean zero_size_exceptions = MemSetZeroSizeCheck();

// Check if this version is consistent with the state of the call.

if stage == MOPSStage_Main then

stagesetsize = setsize - postsize;
if MemSetParametersIllformedM(toaddress, setsize, is_setg) then
boolean wrong_option = FALSE;

if setsize != Zeros(64) && toaddress != Align(toaddress, TAG_GRANULE) then

boolean iswrite = TRUE;
boolean secondstage = FALSE;
AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

if setsize != Align(setsize, TAG_GRANULE) then

boolean iswrite = TRUE;
boolean secondstage = FALSE;
AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

integer tagstep;
bits(4) tag;
bits(64) tagaddr;

Mem[toaddress+setsize, B, acctype] = Replicate(data, B);

tagstep = B DIV 16;

tag = AArch64.AllocationTagFromAddress(toaddress + setsize);
while tagstep > 0 do
tagaddr = toaddress + setsize + (tagstep - 1) * 16;
AArch64.MemTag[tagaddr, acctype] = tag;
tagstep = tagstep - 1;

Mem[toaddress, B, acctype] = Replicate(data, B);

tagstep = B DIV 16;

tag = AArch64.AllocationTagFromAddress(toaddress);
while tagstep > 0 do
tagaddr = toaddress + (tagstep - 1) * 16;
AArch64.MemTag[tagaddr, acctype] = tag;
tagstep = tagstep - 1;

toaddress = toaddress + B;
setsize = setsize - B;
stagesetsize = stagesetsize - B;
if stage != MOPSStage_Prologue then
X[n] = setsize;
X[d] = toaddress;

if stage == MOPSStage_Prologue then

X[n] = setsize;
X[d] = toaddress;


SETGPT, SETGMT, SETGET

Memory Set with tag setting, unprivileged. These instructions perform a memory set using the value in the bottom
byte of the source register and store an Allocation Tag to memory for each Tag Granule written. The Allocation Tag is
calculated from the Logical Address Tag in the register which holds the first address that the set is made to. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: SETGPT, then SETGMT, and then SETGET.
SETGPT performs some preconditioning of the arguments suitable for using the SETGMT instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETGMT performs an IMPLEMENTATION DEFINED amount of the
memory set. SETGET performs the last part of the memory set.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of SETGPT, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETGPT, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETGMT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1* number of bytes remaining to be set in
the memory set in total.
For SETGMT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETGET, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETGET, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
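The register state left by the prologue can be illustrated with a short Python sketch. It is not part of the architecture pseudocode: the function name and the bytes_set parameter (a stand-in for the IMPLEMENTATION DEFINED number of bytes set) are assumptions made for illustration, and only the arithmetic stated in the bullets above is modelled.

```python
MASK64 = (1 << 64) - 1
SETG_SATURATED_SIZE = 0x7FFFFFFFFFFFFFF0  # saturation value quoted in the bullets


def setg_prologue_state(xd, xn, option_a, bytes_set):
    """Return (Xd, Xn, PSTATE.C) after the prologue, following the bullets.

    bytes_set is a hypothetical stand-in for the IMPLEMENTATION DEFINED
    number of bytes that the prologue has already set.
    """
    size = SETG_SATURATED_SIZE if (xn >> 63) & 1 else xn
    assert 0 <= bytes_set <= size
    if option_a:
        new_xd = (xd + size) & MASK64          # original Xd + saturated Xn
        new_xn = (-size + bytes_set) & MASK64  # -1 * saturated Xn + bytes set
        carry = 0
    else:
        new_xd = (xd + bytes_set) & MASK64     # original Xd + bytes set
        new_xn = (size - bytes_set) & MASK64   # saturated Xn - bytes set
        carry = 1
    return new_xd, new_xn, carry
```

In both options the pair (Xd, Xn) describes the same remaining region; only the encoding differs, which is why later stages consult PSTATE.C.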

Integer
(FEAT_MOPS)

31:30  29:21      20:16  15:12  11:10  9:5  4:0
sz     011101110  Rs     op2    01     Rn   Rd

op2 = xx01

Epilogue (op2 == 1001)

SETGET [<Xd>]!, <Xn>!, <Xs>

Main (op2 == 0101)

SETGMT [<Xd>]!, <Xn>!, <Xs>

Prologue (op2 == 0001)

SETGPT [<Xd>]!, <Xn>!, <Xs>

if !HaveFeatMOPS() then UNDEFINED;

if !HaveMTEExt() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;

MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;

if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
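As a rough illustration of the decode step, the following Python sketch extracts the same fields from a 32-bit instruction word and applies the register and op2 checks shown above. The function name and the returned dictionary are assumptions; the sketch does not check the fixed opcode bits or the HaveFeatMOPS() and HaveMTEExt() feature tests.

```python
def decode_setg(word):
    """Split a 32-bit SETG* encoding into the fields used by the decode step."""
    sz = (word >> 30) & 0x3
    rs = (word >> 16) & 0x1F
    op2 = (word >> 12) & 0xF
    rn = (word >> 5) & 0x1F
    rd = word & 0x1F

    stages = {0b00: "Prologue", 0b01: "Main", 0b10: "Epilogue"}
    stage = stages.get(op2 >> 2)  # op2<3:2>; '11' has no stage
    options = op2 & 0b11          # op2<1:0>

    undefined = (
        sz != 0
        or stage is None
        or rs == rn or rs == rd or rn == rd
        or rd == 31 or rn == 31
    )
    return {"stage": stage, "options": options,
            "d": rd, "s": rs, "n": rn, "undefined": undefined}
```

For example, the word 0x1DC21420 has Rs = 2, Rn = 1, Rd = 0 and op2 = 0001, which the sketch reports as the prologue stage with options '01'.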

Assembler Symbols

Operation

CheckMOPSEnabled();

bits(64) toaddress = X[d];

bits(64) setsize = X[n];
bits(8) data = X[s];
bits(64) stagesetsize;
boolean is_setg = TRUE;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);

boolean supports_option_a = MemCpyOptionA();

acctype = MemSetAccessType(options);

if stage == MOPSStage_Prologue then
    if setsize<63> == '1' then setsize = 0x7FFFFFFFFFFFFFF0<63:0>;

    if setsize != Zeros(64) && toaddress != Align(toaddress, TAG_GRANULE) then
        boolean iswrite = TRUE;
        boolean secondstage = FALSE;
        AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

    if setsize != Align(setsize, TAG_GRANULE) then
        boolean iswrite = TRUE;
        boolean secondstage = FALSE;
        AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

    if supports_option_a then
        PSTATE.C = '0';
        toaddress = toaddress + setsize;
        setsize = Zeros(64) - setsize;
    else
        PSTATE.C = '1';
    PSTATE.N = '0';
    PSTATE.V = '0';
    PSTATE.Z = '0';

    stagesetsize = SETPreSizeChoice(toaddress, setsize, is_setg);

    assert stagesetsize<63> == setsize<63> || stagesetsize == Zeros();
    assert stagesetsize<3:0> == '0000';

if SInt(setsize) > 0 then

boolean zero_size_exceptions = MemSetZeroSizeCheck();

// Check if this version is consistent with the state of the call.

if stage == MOPSStage_Main then

stagesetsize = setsize - postsize;
if MemSetParametersIllformedM(toaddress, setsize, is_setg) then
boolean wrong_option = FALSE;

if setsize != Zeros(64) && toaddress != Align(toaddress, TAG_GRANULE) then

boolean iswrite = TRUE;
boolean secondstage = FALSE;
AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

if setsize != Align(setsize, TAG_GRANULE) then

boolean iswrite = TRUE;
boolean secondstage = FALSE;
AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

integer tagstep;
bits(4) tag;
bits(64) tagaddr;

Mem[toaddress+setsize, B, acctype] = Replicate(data, B);

tagstep = B DIV 16;

tag = AArch64.AllocationTagFromAddress(toaddress + setsize);
while tagstep > 0 do
tagaddr = toaddress + setsize + (tagstep - 1) * 16;
AArch64.MemTag[tagaddr, acctype] = tag;
tagstep = tagstep - 1;

Mem[toaddress, B, acctype] = Replicate(data, B);

tagstep = B DIV 16;

tag = AArch64.AllocationTagFromAddress(toaddress);
while tagstep > 0 do
tagaddr = toaddress + (tagstep - 1) * 16;
AArch64.MemTag[tagaddr, acctype] = tag;
tagstep = tagstep - 1;

toaddress = toaddress + B;
setsize = setsize - B;
stagesetsize = stagesetsize - B;
if stage != MOPSStage_Prologue then
X[n] = setsize;
X[d] = toaddress;

if stage == MOPSStage_Prologue then

X[n] = setsize;
X[d] = toaddress;
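The option B data and tag writes in the Operation pseudocode can be approximated by the Python model below. It is a sketch under stated assumptions: memory and tags are plain dictionaries keyed by the full (tagged) address, block is a hypothetical stand-in for the IMPLEMENTATION DEFINED SETSizeChoice result, and the Allocation Tag is taken from address bits <59:56>, which is how the Logical Address Tag is defined for MTE.

```python
TAG_GRANULE = 16  # bytes per Tag Granule, matching the "* 16" steps above


def allocation_tag_from_address(address):
    """Logical Address Tag: bits <59:56> of the address."""
    return (address >> 56) & 0xF


def setg_option_b(memory, tags, toaddress, setsize, data, block=32):
    """Model the option B loop: fill a block, then tag each granule in it.

    memory maps byte address -> value; tags maps granule address -> tag.
    block is a hypothetical stand-in for the SETSizeChoice result.
    """
    assert toaddress % TAG_GRANULE == 0 and setsize % TAG_GRANULE == 0
    assert block > 0 and block % TAG_GRANULE == 0
    while setsize > 0:
        b = min(block, setsize)
        for offset in range(b):
            memory[toaddress + offset] = data & 0xFF  # bottom byte of Xs
        tag = allocation_tag_from_address(toaddress)
        # Same descending granule walk as the tagstep loop above.
        for step in range(b // TAG_GRANULE, 0, -1):
            tags[toaddress + (step - 1) * TAG_GRANULE] = tag
        toaddress += b
        setsize -= b
    return toaddress, setsize
```

The returned pair corresponds to the values written back to Xd and Xn once the whole set has completed.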

SETGPTN, SETGMTN, SETGETN

Memory Set with tag setting, unprivileged and non-temporal. These instructions perform a memory set using the value
in the bottom byte of the source register and store an Allocation Tag to memory for each Tag Granule written. The
Allocation Tag is calculated from the Logical Address Tag in the register which holds the first address that the set is
made to. The prologue, main, and epilogue instructions are expected to be run in succession and to appear
consecutively in memory: SETGPTN, then SETGMTN, and then SETGETN.
SETGPTN performs some preconditioning of the arguments suitable for using the SETGMTN instruction, and performs
an IMPLEMENTATION DEFINED amount of the memory set. SETGMTN performs an IMPLEMENTATION DEFINED amount of the
memory set. SETGETN performs the last part of the memory set.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of SETGPTN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETGPTN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFF0.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETGMTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1* number of bytes remaining to be set in
the memory set in total.
For SETGMTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETGETN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETGETN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
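Because the two options encode the remaining region differently, code that inspects the registers between stages (for example a debugger or an exception handler) has to consult PSTATE.C first. The Python sketch below shows one way to recover the region from the argument formats listed above; the function names are illustrative assumptions.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def remaining_region(xd, xn, carry):
    """Return (lowest address not yet set, bytes remaining) for either option."""
    if carry == 0:
        # Option A: Xn = -remaining and Xd = lowest address - Xn.
        remaining = -to_signed64(xn)
        lowest = (xd + xn) & MASK64
    else:
        # Option B: Xn = remaining and Xd = lowest address.
        remaining = xn
        lowest = xd
    return lowest, remaining
```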

Integer
(FEAT_MOPS)

31:30  29:21      20:16  15:12  11:10  9:5  4:0
sz     011101110  Rs     op2    01     Rn   Rd

op2 = xx11

Epilogue (op2 == 1011)

SETGETN [<Xd>]!, <Xn>!, <Xs>

Main (op2 == 0111)

SETGMTN [<Xd>]!, <Xn>!, <Xs>

Prologue (op2 == 0011)

SETGPTN [<Xd>]!, <Xn>!, <Xs>

if !HaveFeatMOPS() then UNDEFINED;

if !HaveMTEExt() then UNDEFINED;
if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;

MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;

if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

bits(64) toaddress = X[d];

bits(64) setsize = X[n];
bits(8) data = X[s];
bits(64) stagesetsize;
boolean is_setg = TRUE;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);

boolean supports_option_a = MemCpyOptionA();

acctype = MemSetAccessType(options);

if stage == MOPSStage_Prologue then
    if setsize<63> == '1' then setsize = 0x7FFFFFFFFFFFFFF0<63:0>;

    if setsize != Zeros(64) && toaddress != Align(toaddress, TAG_GRANULE) then
        boolean iswrite = TRUE;
        boolean secondstage = FALSE;
        AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

    if setsize != Align(setsize, TAG_GRANULE) then
        boolean iswrite = TRUE;
        boolean secondstage = FALSE;
        AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

    if supports_option_a then
        PSTATE.C = '0';
        toaddress = toaddress + setsize;
        setsize = Zeros(64) - setsize;
    else
        PSTATE.C = '1';
    PSTATE.N = '0';
    PSTATE.V = '0';
    PSTATE.Z = '0';

    stagesetsize = SETPreSizeChoice(toaddress, setsize, is_setg);

    assert stagesetsize<63> == setsize<63> || stagesetsize == Zeros();
    assert stagesetsize<3:0> == '0000';

if SInt(setsize) > 0 then

boolean zero_size_exceptions = MemSetZeroSizeCheck();

// Check if this version is consistent with the state of the call.

if stage == MOPSStage_Main then

stagesetsize = setsize - postsize;
if MemSetParametersIllformedM(toaddress, setsize, is_setg) then
boolean wrong_option = FALSE;

boolean from_epilogue = FALSE;
MismatchedMemSetException(supports_option_a, d, s, n, wrong_option, from_epilogue, options, i
else
stagesetsize = postsize;
if (setsize != postsize || MemSetParametersIllformedE(toaddress, setsize, is_setg)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemSetException(supports_option_a, d, s, n, wrong_option, from_epilogue, options, i

if setsize != Zeros(64) && toaddress != Align(toaddress, TAG_GRANULE) then

boolean iswrite = TRUE;
boolean secondstage = FALSE;
AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

if setsize != Align(setsize, TAG_GRANULE) then

boolean iswrite = TRUE;
boolean secondstage = FALSE;
AArch64.Abort(toaddress, AlignmentFault(acctype, iswrite, secondstage));

integer tagstep;
bits(4) tag;
bits(64) tagaddr;

Mem[toaddress+setsize, B, acctype] = Replicate(data, B);

tagstep = B DIV 16;

tag = AArch64.AllocationTagFromAddress(toaddress + setsize);
while tagstep > 0 do
tagaddr = toaddress + setsize + (tagstep - 1) * 16;
AArch64.MemTag[tagaddr, acctype] = tag;
tagstep = tagstep - 1;

Mem[toaddress, B, acctype] = Replicate(data, B);

tagstep = B DIV 16;

tag = AArch64.AllocationTagFromAddress(toaddress);
while tagstep > 0 do
tagaddr = toaddress + (tagstep - 1) * 16;
AArch64.MemTag[tagaddr, acctype] = tag;
tagstep = tagstep - 1;

toaddress = toaddress + B;
setsize = setsize - B;
stagesetsize = stagesetsize - B;
if stage != MOPSStage_Prologue then
X[n] = setsize;
X[d] = toaddress;

if stage == MOPSStage_Prologue then

X[n] = setsize;
X[d] = toaddress;
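The two alignment checks that the tag-setting forms make in the prologue can be sketched in Python as follows. This is an illustration only: the exception class is a stand-in for the AArch64.Abort call, and TAG_GRANULE is assumed to be 16 bytes, consistent with the 16-byte tag steps in the pseudocode.

```python
TAG_GRANULE = 16  # assumed Tag Granule size in bytes


class AlignmentFault(Exception):
    """Stand-in for AArch64.Abort(toaddress, AlignmentFault(...))."""


def check_setg_alignment(toaddress, setsize):
    """Mirror the prologue checks made before any memory is written."""
    # A misaligned address only faults when there is something to set.
    if setsize != 0 and toaddress % TAG_GRANULE != 0:
        raise AlignmentFault(f"address {toaddress:#x} is not 16-byte aligned")
    # The size must always be a whole number of Tag Granules.
    if setsize % TAG_GRANULE != 0:
        raise AlignmentFault(f"size {setsize:#x} is not a multiple of 16")
```

Note that a zero-length set with a misaligned address passes both checks, which follows from the setsize != Zeros(64) guard on the first test.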

SETP, SETM, SETE

Memory Set. These instructions perform a memory set using the value in the bottom byte of the source register. The
prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in
memory: SETP, then SETM, and then SETE.
SETP performs some preconditioning of the arguments suitable for using the SETM instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETM performs an IMPLEMENTATION DEFINED amount of the
memory set. SETE performs the last part of the memory set.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of SETP, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETP, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETM, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be set
in the memory set in total.
For SETM, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETE, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETE, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
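Taken together, the bullets above imply that the net effect of running SETP, SETM, and SETE in succession is the same for both options: the region is filled with the bottom byte of Xs, Xn ends at 0, and Xd ends at the original address plus the (saturated) size. The Python sketch below models only that end state; the function name and the use of a bytearray indexed by Xd are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1
SET_SATURATED_SIZE = 0x7FFFFFFFFFFFFFFF  # saturation value quoted in the bullets


def set_sequence_effect(memory, xd, xn, xs):
    """Net effect of SETP, SETM, SETE run back to back (either option).

    memory is a bytearray and xd an index into it; returns final (Xd, Xn).
    """
    size = SET_SATURATED_SIZE if (xn >> 63) & 1 else xn
    memory[xd:xd + size] = bytes([xs & 0xFF]) * size  # bottom byte of Xs
    return (xd + size) & MASK64, 0
```

How the work is split between the three instructions is IMPLEMENTATION DEFINED and is deliberately not modelled here.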

Integer
(FEAT_MOPS)

31:30  29:21      20:16  15:12  11:10  9:5  4:0
sz     011001110  Rs     op2    01     Rn   Rd

op2 = xx00

Epilogue (op2 == 1000)

SETE [<Xd>]!, <Xn>!, <Xs>

Main (op2 == 0100)

SETM [<Xd>]!, <Xn>!, <Xs>

Prologue (op2 == 0000)

SETP [<Xd>]!, <Xn>!, <Xs>

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;

MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;

if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
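The encoding diagram and decode rules can be combined into a small assembler helper. The Python sketch below is an illustration, not a reference implementation: the function name and keyword arguments are assumptions, and the mapping of op2<0> to the unprivileged (T) forms and op2<1> to the non-temporal (N) forms is read off the op2 values listed for the SETP, SETPT, SETPN, and SETPTN pages.

```python
STAGE_BITS = {"P": 0b00, "M": 0b01, "E": 0b10}  # op2<3:2>


def encode_set(stage, rd, rn, rs, unprivileged=False, non_temporal=False):
    """Assemble a SETP/SETM/SETE-family word from the bit layout above."""
    if not all(0 <= reg <= 31 for reg in (rd, rn, rs)):
        raise ValueError("register numbers must be in the range 0..31")
    if len({rd, rn, rs}) != 3 or rd == 31 or rn == 31:
        raise ValueError("register combination is UNDEFINED")
    options = (int(non_temporal) << 1) | int(unprivileged)  # op2<1:0>
    op2 = (STAGE_BITS[stage] << 2) | options
    word = 0b011001110 << 21  # fixed bits 29:21, with sz = '00'
    word |= rs << 16
    word |= op2 << 12
    word |= 0b01 << 10        # fixed bits 11:10
    word |= rn << 5
    word |= rd
    return word
```

For instance, encode_set("P", 0, 1, 2) corresponds to SETP [X0]!, X1!, X2.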

Assembler Symbols

<Xd>    For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an
        encoding of the destination address and for option B is updated by the instruction, encoded in the "Rd"
        field.
        For the prologue variant: is the 64-bit name of the general-purpose register that holds the destination
        address and is updated by the instruction, encoded in the "Rd" field.

<Xn>    For the epilogue variant: is the 64-bit name of the general-purpose register that holds the number of
        bytes to be set and is set to zero at the end of the instruction, encoded in the "Rn" field.
        For the main variant: is the 64-bit name of the general-purpose register that holds an encoding of the
        number of bytes to be set and is updated by the instruction, encoded in the "Rn" field.
        For the prologue variant: is the 64-bit name of the general-purpose register that holds the number of
        bytes to be set and is updated by the instruction, encoded in the "Rn" field.

<Xs>    Is the 64-bit name of the general-purpose register that holds the source data, encoded in the "Rs" field.

Operation

CheckMOPSEnabled();

bits(64) toaddress = X[d];

bits(64) setsize = X[n];
bits(8) data = X[s];
bits(64) stagesetsize;
boolean is_setg = FALSE;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

acctype = MemSetAccessType(options);

if stage == MOPSStage_Prologue then
    if setsize<63> == '1' then setsize = 0x007FFFFFFFFFFFF0<63:0>;

    if supports_option_a then
        PSTATE.C = '0';
        toaddress = toaddress + setsize;
        setsize = Zeros(64) - setsize;
    else
        PSTATE.C = '1';
    PSTATE.N = '0';
    PSTATE.V = '0';
    PSTATE.Z = '0';

    stagesetsize = SETPreSizeChoice(toaddress, setsize, is_setg);

    assert stagesetsize<63> == setsize<63> || stagesetsize == Zeros();

if SInt(setsize) > 0 then

boolean zero_size_exceptions = MemSetZeroSizeCheck();

// Check if this version is consistent with the state of the call.

if stage == MOPSStage_Main then

stagesetsize = setsize - postsize;
if MemSetParametersIllformedM(toaddress, setsize, is_setg) then
boolean wrong_option = FALSE;
boolean from_epilogue = FALSE;
MismatchedMemSetException(supports_option_a, d, s, n, wrong_option, from_epilogue, options, i
else
stagesetsize = postsize;
if (setsize != postsize || MemSetParametersIllformedE(toaddress, setsize, is_setg)) then
boolean wrong_option = FALSE;
boolean from_epilogue = TRUE;
MismatchedMemSetException(supports_option_a, d, s, n, wrong_option, from_epilogue, options, i

if supports_option_a then
while SInt(stagesetsize) < 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = SETSizeChoice(toaddress, setsize, 1);
assert B <= -1 * SInt(stagesetsize);

Mem[toaddress+setsize, B, acctype] = Replicate(data, B);

Mem[toaddress, B, acctype] = Replicate(data, B);

toaddress = toaddress + B;
setsize = setsize - B;
stagesetsize = stagesetsize - B;
if stage != MOPSStage_Prologue then
X[n] = setsize;
X[d] = toaddress;

if stage == MOPSStage_Prologue then

X[n] = setsize;
X[d] = toaddress;
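The option A loop can be modelled in Python as shown below. This is a sketch under stated assumptions: memory is a bytearray indexed directly by address, block is a hypothetical stand-in for the IMPLEMENTATION DEFINED SETSizeChoice result, and the update setsize + b is inferred from the option A argument format (Xn holds -1 * the number of bytes remaining) rather than copied from the listing above.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value


def set_option_a(memory, toaddress, setsize, data, block=8):
    """Model the option A loop: setsize counts up from -remaining to 0.

    toaddress stays fixed at the end of the region; each write lands at
    toaddress + setsize, as in Mem[toaddress+setsize, B, acctype].
    """
    while to_signed64(setsize) < 0:
        b = min(block, -to_signed64(setsize))
        start = (toaddress + setsize) & MASK64
        memory[start:start + b] = bytes([data & 0xFF]) * b
        setsize = (setsize + b) & MASK64  # inferred update, see lead-in
    return toaddress, setsize
```

Keeping the end address fixed and counting a negative size up to zero means only Xn needs to be written back after each block.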

SETPN, SETMN, SETEN

Memory Set, non-temporal. These instructions perform a memory set using the value in the bottom byte of the source
register. The prologue, main, and epilogue instructions are expected to be run in succession and to appear
consecutively in memory: SETPN, then SETMN, and then SETEN.
SETPN performs some preconditioning of the arguments suitable for using the SETMN instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETMN performs an IMPLEMENTATION DEFINED amount of the
memory set. SETEN performs the last part of the memory set.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of SETPN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETPN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETMN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be set
in the memory set in total.
For SETMN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETEN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETEN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.

Integer
(FEAT_MOPS)

31:30  29:21      20:16  15:12  11:10  9:5  4:0
sz     011001110  Rs     op2    01     Rn   Rd

op2 = xx10

Epilogue (op2 == 1010)

SETEN [<Xd>]!, <Xn>!, <Xs>

Main (op2 == 0110)

SETMN [<Xd>]!, <Xn>!, <Xs>

Prologue (op2 == 0010)

SETPN [<Xd>]!, <Xn>!, <Xs>

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;

MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;

if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;
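The relationship between the op2 field and the mnemonics in this family can be summarised with a small lookup. The Python sketch below is illustrative: the function name and the tag_setting flag (which selects the SETG* forms) are assumptions, and the T and N suffix bits are inferred from the op2 values listed on the individual instruction pages.

```python
def set_mnemonic(op2, tag_setting=False):
    """Map the 4-bit op2 field to a SET* mnemonic, or None if UNDEFINED."""
    stage_letter = {0b00: "P", 0b01: "M", 0b10: "E"}.get((op2 >> 2) & 0b11)
    if stage_letter is None:
        return None  # op2<3:2> == '11' is UNDEFINED
    # op2<0> selects the unprivileged form, op2<1> the non-temporal form.
    suffix = ("T" if op2 & 0b01 else "") + ("N" if op2 & 0b10 else "")
    return ("SETG" if tag_setting else "SET") + stage_letter + suffix
```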

Assembler Symbols

Operation

CheckMOPSEnabled();

bits(64) toaddress = X[d];

bits(64) setsize = X[n];
bits(8) data = X[s];
bits(64) stagesetsize;
boolean is_setg = FALSE;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

acctype = MemSetAccessType(options);

if stage == MOPSStage_Prologue then
    if setsize<63> == '1' then setsize = 0x007FFFFFFFFFFFF0<63:0>;

    if supports_option_a then
        PSTATE.C = '0';
        toaddress = toaddress + setsize;
        setsize = Zeros(64) - setsize;
    else
        PSTATE.C = '1';
    PSTATE.N = '0';
    PSTATE.V = '0';
    PSTATE.Z = '0';

    stagesetsize = SETPreSizeChoice(toaddress, setsize, is_setg);

    assert stagesetsize<63> == setsize<63> || stagesetsize == Zeros();

if SInt(setsize) > 0 then

boolean zero_size_exceptions = MemSetZeroSizeCheck();

// Check if this version is consistent with the state of the call.

if stage == MOPSStage_Main then

if supports_option_a then
while SInt(stagesetsize) < 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = SETSizeChoice(toaddress, setsize, 1);
assert B <= -1 * SInt(stagesetsize);

Mem[toaddress+setsize, B, acctype] = Replicate(data, B);

Mem[toaddress, B, acctype] = Replicate(data, B);

toaddress = toaddress + B;
setsize = setsize - B;
stagesetsize = stagesetsize - B;
if stage != MOPSStage_Prologue then
X[n] = setsize;
X[d] = toaddress;

if stage == MOPSStage_Prologue then

X[n] = setsize;
X[d] = toaddress;

SETPT, SETMT, SETET

Memory Set, unprivileged. These instructions perform a memory set using the value in the bottom byte of the source
register. The prologue, main, and epilogue instructions are expected to be run in succession and to appear
consecutively in memory: SETPT, then SETMT, and then SETET.
SETPT performs some preconditioning of the arguments suitable for using the SETMT instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETMT performs an IMPLEMENTATION DEFINED amount of the
memory set. SETET performs the last part of the memory set.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of SETPT, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETPT, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETMT, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be set
in the memory set in total.
For SETMT, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETET, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETET, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.

Integer
(FEAT_MOPS)

31:30  29:21      20:16  15:12  11:10  9:5  4:0
sz     011001110  Rs     op2    01     Rn   Rd

op2 = xx01

Epilogue (op2 == 1001)

SETET [<Xd>]!, <Xn>!, <Xs>

Main (op2 == 0101)

SETMT [<Xd>]!, <Xn>!, <Xs>

Prologue (op2 == 0001)

SETPT [<Xd>]!, <Xn>!, <Xs>

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;

MOPSStage stage;
case op2<3:2> of
    when '00' stage = MOPSStage_Prologue;
    when '01' stage = MOPSStage_Main;
    when '10' stage = MOPSStage_Epilogue;
    otherwise UNDEFINED;

if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;

Assembler Symbols

Operation

CheckMOPSEnabled();

bits(64) toaddress = X[d];

bits(64) setsize = X[n];
bits(8) data = X[s];
bits(64) stagesetsize;
boolean is_setg = FALSE;
integer B;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

boolean supports_option_a = MemCpyOptionA();

acctype = MemSetAccessType(options);

if stage == MOPSStage_Prologue then
    if setsize<63> == '1' then setsize = 0x007FFFFFFFFFFFF0<63:0>;

    if supports_option_a then
        PSTATE.C = '0';
        toaddress = toaddress + setsize;
        setsize = Zeros(64) - setsize;
    else
        PSTATE.C = '1';
    PSTATE.N = '0';
    PSTATE.V = '0';
    PSTATE.Z = '0';

    stagesetsize = SETPreSizeChoice(toaddress, setsize, is_setg);

    assert stagesetsize<63> == setsize<63> || stagesetsize == Zeros();

if SInt(setsize) > 0 then

boolean zero_size_exceptions = MemSetZeroSizeCheck();

// Check if this version is consistent with the state of the call.

if stage == MOPSStage_Main then

if supports_option_a then
while SInt(stagesetsize) < 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = SETSizeChoice(toaddress, setsize, 1);
assert B <= -1 * SInt(stagesetsize);

Mem[toaddress+setsize, B, acctype] = Replicate(data, B);

Mem[toaddress, B, acctype] = Replicate(data, B);

toaddress = toaddress + B;
setsize = setsize - B;
stagesetsize = stagesetsize - B;
if stage != MOPSStage_Prologue then
X[n] = setsize;
X[d] = toaddress;

if stage == MOPSStage_Prologue then

X[n] = setsize;
X[d] = toaddress;

SETPTN, SETMTN, SETETN

Memory Set, unprivileged and non-temporal. These instructions perform a memory set using the value in the bottom
byte of the source register. The prologue, main, and epilogue instructions are expected to be run in succession and to
appear consecutively in memory: SETPTN, then SETMTN, and then SETETN.
SETPTN performs some preconditioning of the arguments suitable for using the SETMTN instruction, and performs an
IMPLEMENTATION DEFINED amount of the memory set. SETMTN performs an IMPLEMENTATION DEFINED amount of the
memory set. SETETN performs the last part of the memory set.

Note

Portable software should not assume that the choice of algorithm is constant.
After execution of SETPTN, option A (which results in encoding PSTATE.C = 0):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + saturated Xn.
• Xn holds -1* saturated Xn + an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
After execution of SETPTN, option B (which results in encoding PSTATE.C = 1):
• If Xn<63> == 1, the set size is saturated to 0x7FFFFFFFFFFFFFFF.
• Xd holds the original Xd + an IMPLEMENTATION DEFINED number of bytes set.
• Xn holds the saturated Xn - an IMPLEMENTATION DEFINED number of bytes set.
• PSTATE.{N,Z,V} are set to {0,0,0}.
For SETMTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with -1* the number of bytes remaining to be set
in the memory set in total.
For SETMTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with the number of bytes remaining to be set in the memory set in
total.
◦ the value of Xd is written back with the lowest address that has not been set.
For SETETN, option A (encoded by PSTATE.C = 0), the format of the arguments is:
• Xn is treated as a signed 64-bit number.
• Xn holds -1* the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to, minus Xn.
• At the end of the instruction, the value of Xn is written back with 0.
For SETETN, option B (encoded by PSTATE.C = 1), the format of the arguments is:
• Xn holds the number of bytes remaining to be set in the memory set in total.
• Xd holds the lowest address that the set is made to.
• At the end of the instruction:
◦ the value of Xn is written back with 0.
◦ the value of Xd is written back with the lowest address that has not been set.
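The register formats described above for the prologue can be illustrated with a small model. The following Python sketch covers only the preconditioning of Xd, Xn and PSTATE.C; the function name `set_prologue` and the `bytes_done` parameter (a stand-in for the IMPLEMENTATION DEFINED number of bytes set) are assumptions made for illustration, not part of the architecture.

```python
MASK64 = (1 << 64) - 1
SAT_MAX = 0x7FFFFFFFFFFFFFFF

def set_prologue(xd, xn, option_a, bytes_done=0):
    """Return (Xd, Xn, C) after a modelled prologue instruction.

    bytes_done is a hypothetical stand-in for the IMPLEMENTATION DEFINED
    number of bytes set by the prologue.
    """
    size = xn & MASK64
    if size >> 63:                     # Xn<63> == 1: saturate the set size
        size = SAT_MAX
    assert 0 <= bytes_done <= size
    if option_a:
        # Xd = original Xd + saturated Xn
        # Xn = -1 * saturated Xn + bytes already set
        new_xd = (xd + size) & MASK64
        new_xn = (-size + bytes_done) & MASK64
        return new_xd, new_xn, 0
    # Option B: Xd advances and Xn shrinks by the bytes already set.
    new_xd = (xd + bytes_done) & MASK64
    new_xn = (size - bytes_done) & MASK64
    return new_xd, new_xn, 1
```

In this model, with option A the sum of Xd and the signed value of Xn equals the value that option B leaves in Xd, that is, the lowest address not yet set.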

Integer
(FEAT_MOPS)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sz 0 1 1 0 0 1 1 1 0 Rs x x 1 1 0 1 Rn Rd
op2


Epilogue (op2 == 1011)

SETETN [<Xd>]!, <Xn>!, <Xs>

Main (op2 == 0111)

SETMTN [<Xd>]!, <Xn>!, <Xs>

Prologue (op2 == 0011)

SETPTN [<Xd>]!, <Xn>!, <Xs>

if !HaveFeatMOPS() then UNDEFINED;

if sz != '00' then UNDEFINED;

integer d = UInt(Rd);
integer s = UInt(Rs);
integer n = UInt(Rn);
bits(2) options = op2<1:0>;

MOPSStage stage;
case op2<3:2> of
when '00' stage = MOPSStage_Prologue;
when '01' stage = MOPSStage_Main;
when '10' stage = MOPSStage_Epilogue;
otherwise UNDEFINED;

if s == n || s == d || n == d then UNDEFINED;
if d == 31 || n == 31 then UNDEFINED;

Assembler Symbols


Operation


CheckMOPSEnabled();

bits(64) toaddress = X[d];

bits(64) setsize = X[n];
bits(8) data = X[s];
bits(64) stagesetsize;
boolean is_setg = FALSE;
integer B;

if HaveMTE2Ext() then
SetTagCheckedInstruction(TRUE);

boolean supports_option_a = SETOptionA();

acctype = MemSetAccessType(options);

if stage == MOPSStage_Prologue then

if setsize<63> == '1' then setsize = 0x7FFFFFFFFFFFFFFF<63:0>;

if supports_option_a then
PSTATE.C = '0';
toaddress = toaddress + setsize;
setsize = Zeros(64) - setsize;
else
PSTATE.C = '1';
PSTATE.N = '0';
PSTATE.V = '0';
PSTATE.Z = '0';

stagesetsize = SETPreSizeChoice(toaddress, setsize, is_setg);

assert stagesetsize<63> == setsize<63> || stagesetsize == Zeros();

if SInt(setsize) > 0 then

boolean zero_size_exceptions = MemSetZeroSizeCheck();

// Check if this version is consistent with the state of the call.

if stage == MOPSStage_Main then

if supports_option_a then
while SInt(stagesetsize) < 0 do
// IMP DEF selection of the block size that is worked on. While many
// implementations might make this constant, that is not assumed.
B = SETSizeChoice(toaddress, setsize, 1);
assert B <= -1 * SInt(stagesetsize);

Mem[toaddress+setsize, B, acctype] = Replicate(data, B);

Mem[toaddress, B, acctype] = Replicate(data, B);

toaddress = toaddress + B;
setsize = setsize - B;
stagesetsize = stagesetsize - B;
if stage != MOPSStage_Prologue then
X[n] = setsize;
X[d] = toaddress;

if stage == MOPSStage_Prologue then

X[n] = setsize;
X[d] = toaddress;
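The argument formats described for the main and epilogue stages imply two different update patterns. The Python sketch below is an illustrative model, not the architectural pseudocode: the function names, the `bytearray` used as memory and the fixed `block` size (standing in for the IMPLEMENTATION DEFINED block size choice) are all assumptions.

```python
def set_main_option_a(memory, xd, xn, data, block=16):
    """Option A: xn is the signed, negative remaining count; xd stays fixed."""
    while xn < 0:
        b = min(block, -xn)
        start = xd + xn                  # lowest address still to be set
        memory[start:start + b] = bytes([data & 0xFF]) * b
        xn += b                          # count moves towards zero
    return xd, xn

def set_main_option_b(memory, xd, xn, data, block=16):
    """Option B: xn is the remaining count; xd is the lowest unset address."""
    while xn > 0:
        b = min(block, xn)
        memory[xd:xd + b] = bytes([data & 0xFF]) * b
        xd += b
        xn -= b
    return xd, xn
```

Starting from equivalent states (for example Xd = 48, Xn = -40 under option A and Xd = 8, Xn = 40 under option B), both functions write the same 40 bytes and finish with Xn equal to 0.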


SEV

Send Event is a hint instruction. It causes an event to be signaled to all PEs in the multiprocessor system. For more
information, see Wait for Event mechanism and Send event.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 0 1 1 1 1 1
CRm op2

SEV

// Empty.

Operation

SendEvent();


SEVL

Send Event Local is a hint instruction that causes an event to be signaled locally without requiring the event to be
signaled to other PEs in the multiprocessor system. It can prime a wait-loop which starts with a WFE instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 0 1 1 1 1 1 1
CRm op2

SEVL

// Empty.

Operation

SendEventLocal();


SMADDL

Signed Multiply-Add Long multiplies two 32-bit register values, adds a 64-bit register value, and writes the result to
the 64-bit destination register.
This instruction is used by the alias SMULL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 0 Ra Rn Rd
U o0

SMADDL <Xd>, <Wn>, <Wm>, <Xa>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Wm> Is the 32-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.
<Xa> Is the 64-bit name of the third general-purpose source register holding the addend, encoded in the "Ra"
field.

Alias Conditions

Alias Is preferred when

SMULL Ra == '11111'

Operation

bits(32) operand1 = X[n];

bits(32) operand2 = X[m];
bits(64) operand3 = X[a];

integer result;

result = Int(operand3, FALSE) + (Int(operand1, FALSE) * Int(operand2, FALSE));

X[d] = result<63:0>;
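As a worked illustration of the operation above, the following Python sketch models SMADDL and its SMULL alias on plain integers. The helper names `sint`, `smaddl` and `smull` are hypothetical; register access and the treatment of XZR are reduced to passing 0 as the addend.

```python
MASK64 = (1 << 64) - 1

def sint(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def smaddl(wn, wm, xa):
    """Xd = Xa + SInt(Wn) * SInt(Wm), truncated to 64 bits."""
    result = sint(xa, 64) + sint(wn, 32) * sint(wm, 32)
    return result & MASK64

def smull(wn, wm):
    """SMULL is SMADDL with XZR (zero) as the addend."""
    return smaddl(wn, wm, 0)
```

For example, `smaddl(0xFFFFFFFF, 2, 10)` treats the first operand as -1 and yields 8.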

Operational information


SMC

Secure Monitor Call causes an exception to EL3.

SMC is available only for software executing at EL1 or higher. It is UNDEFINED in EL0.
If the values of HCR_EL2.TSC and SCR_EL3.SMD are both 0, execution of an SMC instruction at EL1 or higher
generates a Secure Monitor Call exception, recording it in ESR_ELx, using the EC value 0x17, that is taken to EL3.
If the value of HCR_EL2.TSC is 1 and EL2 is enabled in the current Security state, execution of an SMC instruction at
EL1 generates an exception that is taken to EL2, regardless of the value of SCR_EL3.SMD.
If the value of HCR_EL2.TSC is 0 and the value of SCR_EL3.SMD is 1, the SMC instruction is UNDEFINED.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 0 0 0 0 imm16 0 0 0 1 1

SMC #<imm>

// Empty.

Assembler Symbols

<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.

Operation

AArch64.CheckForSMCUndefOrTrap(imm16);
AArch64.CallSecureMonitor(imm16);
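The routing rules stated in the description can be condensed into a small decision function. The following Python sketch is illustrative only: the function name `smc_outcome` and its string results are hypothetical, and combinations that the description does not address are reported as such rather than guessed.

```python
def smc_outcome(el, tsc, smd, el2_enabled):
    """Classify an SMC executed at exception level `el`.

    tsc and smd model HCR_EL2.TSC and SCR_EL3.SMD; el2_enabled models
    whether EL2 is enabled in the current Security state.
    """
    if el == 0:
        return "UNDEFINED"                       # SMC is UNDEFINED in EL0
    if tsc == 1 and el2_enabled and el == 1:
        return "exception taken to EL2"          # regardless of SMD
    if tsc == 0 and smd == 1:
        return "UNDEFINED"
    if tsc == 0 and smd == 0:
        return "Secure Monitor Call exception taken to EL3 (EC 0x17)"
    return "not covered by the description above"
```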


SMNEGL

Signed Multiply-Negate Long multiplies two 32-bit register values, negates the product, and writes the result to the
64-bit destination register.

This is an alias of SMSUBL. This means:

• The encodings in this description are named to match the encodings of SMSUBL.
• The description of SMSUBL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 1 1 1 1 1 1 Rn Rd
U o0 Ra

SMNEGL <Xd>, <Wn>, <Wm>

is equivalent to

SMSUBL <Xd>, <Wn>, <Wm>, XZR

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of SMSUBL gives the operational pseudocode for this instruction.

Operational information


SMSUBL

Signed Multiply-Subtract Long multiplies two 32-bit register values, subtracts the product from a 64-bit register value,
and writes the result to the 64-bit destination register.
This instruction is used by the alias SMNEGL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 1 Ra Rn Rd
U o0

SMSUBL <Xd>, <Wn>, <Wm>, <Xa>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);

Assembler Symbols

Alias Conditions

Alias Is preferred when

SMNEGL Ra == '11111'

Operation

bits(32) operand1 = X[n];

bits(32) operand2 = X[m];
bits(64) operand3 = X[a];

integer result;

result = Int(operand3, FALSE) - (Int(operand1, FALSE) * Int(operand2, FALSE));

X[d] = result<63:0>;
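As a worked illustration of the operation above, the following Python sketch models SMSUBL and its SMNEGL alias on plain integers. The helper names `sint`, `smsubl` and `smnegl` are hypothetical, and XZR is modelled as the constant 0.

```python
MASK64 = (1 << 64) - 1

def sint(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def smsubl(wn, wm, xa):
    """Xd = Xa - SInt(Wn) * SInt(Wm), truncated to 64 bits."""
    result = sint(xa, 64) - sint(wn, 32) * sint(wm, 32)
    return result & MASK64

def smnegl(wn, wm):
    """SMNEGL is SMSUBL with XZR (zero) as the minuend."""
    return smsubl(wn, wm, 0)
```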

Operational information


SMULH

Signed Multiply High multiplies two 64-bit register values, and writes bits[127:64] of the 128-bit result to the 64-bit
destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 1 0 Rm 0 (1) (1) (1) (1) (1) Rn Rd
U Ra

SMULH <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the first general-purpose source register holding the multiplicand, encoded in the
"Rn" field.
<Xm> Is the 64-bit name of the second general-purpose source register holding the multiplier, encoded in the
"Rm" field.

Operation

bits(64) operand1 = X[n];

bits(64) operand2 = X[m];

integer result;

result = Int(operand1, FALSE) * Int(operand2, FALSE);

X[d] = result<127:64>;
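The selection of bits[127:64] can be checked with arbitrary-precision integers. The following Python sketch is a minimal model of the operation above; the names `sint` and `smulh` are hypothetical. It relies on Python's right shift of a negative integer being an arithmetic shift.

```python
MASK64 = (1 << 64) - 1

def sint(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def smulh(xn, xm):
    """Return bits[127:64] of the signed 128-bit product of Xn and Xm."""
    product = sint(xn, 64) * sint(xm, 64)
    return (product >> 64) & MASK64
```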

Operational information


SMULL

Signed Multiply Long multiplies two 32-bit register values, and writes the result to the 64-bit destination register.

This is an alias of SMADDL. This means:

• The encodings in this description are named to match the encodings of SMADDL.
• The description of SMADDL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 0 0 1 Rm 0 1 1 1 1 1 Rn Rd
U o0 Ra

SMULL <Xd>, <Wn>, <Wm>

is equivalent to

SMADDL <Xd>, <Wn>, <Wm>, XZR

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of SMADDL gives the operational pseudocode for this instruction.

Operational information


SSBB

Speculative Store Bypass Barrier is a memory barrier which prevents speculative loads from bypassing earlier stores
to the same virtual address under certain conditions.
The semantics of the Speculative Store Bypass Barrier are:
• When a load to a location appears in program order after the SSBB, then the load does not speculatively read
an entry earlier in the coherence order for that location than the entry generated by the latest store satisfying
all of the following conditions:
◦ The store is to the same location as the load.
◦ The store uses the same virtual address as the load.
◦ The store appears in program order before the SSBB.
• When a load to a location appears in program order before the SSBB, then the load does not speculatively
read data from any store satisfying all of the following conditions:
◦ The store is to the same location as the load.
◦ The store uses the same virtual address as the load.
◦ The store appears in program order after the SSBB.

This is an alias of DSB. This means:

SSBB

is equivalent to

DSB #0

and is always the preferred disassembly.

Operation

The description of DSB gives the operational pseudocode for this instruction.


ST2G

Store Allocation Tags stores an Allocation Tag to two Tag granules of memory. The address used for the store is
calculated from the base register and an immediate signed offset scaled by the Tag granule. The Allocation Tag is
calculated from the Logical Address Tag in the source register.
This instruction generates an Unchecked access.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset

Post-index
(FEAT_MTE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 imm9 0 1 Xn Xt

ST2G <Xt|SP>, [<Xn|SP>], #<simm>

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = TRUE;
boolean postindex = TRUE;

Pre-index
(FEAT_MTE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 imm9 1 1 Xn Xt

ST2G <Xt|SP>, [<Xn|SP>, #<simm>]!

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = TRUE;
boolean postindex = FALSE;

Signed offset
(FEAT_MTE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 1 0 0 1 1 0 1 imm9 1 0 Xn Xt

ST2G <Xt|SP>, [<Xn|SP>{, #<simm>}]

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = FALSE;
boolean postindex = FALSE;

Assembler Symbols

<Xt|SP> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Xt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
<simm> Is the optional signed immediate offset, a multiple of 16 in the range -4096 to 4080, defaulting to 0 and
encoded in the "imm9" field.


Operation

bits(64) address;
bits(64) data = if t == 31 then SP[] else X[t];
bits(4) tag = AArch64.AllocationTagFromAddress(data);

SetTagCheckedInstruction(FALSE);

if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

if !postindex then
address = address + offset;

AArch64.MemTag[address, AccType_NORMAL] = tag;

AArch64.MemTag[address+TAG_GRANULE, AccType_NORMAL] = tag;

if writeback then
if postindex then
address = address + offset;

if n == 31 then
SP[] = address;
else
X[n] = address;
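The address calculation and writeback for the three encoding classes can be sketched as follows. This Python model is illustrative only: the function name `st2g`, the dictionary used as tag storage and the extraction of the tag from bits [59:56] of the source value are assumptions, and alignment checks, faults and SP handling are omitted.

```python
LOG2_TAG_GRANULE = 4
TAG_GRANULE = 1 << LOG2_TAG_GRANULE
MASK64 = (1 << 64) - 1

def sign_extend(value, bits):
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def st2g(tags, xt, base, imm9, writeback, postindex):
    """Store one tag to two granules; return the new base register value."""
    offset = sign_extend(imm9, 9) << LOG2_TAG_GRANULE
    tag = (xt >> 56) & 0xF               # assumed Logical Address Tag position
    address = base
    if not postindex:
        address = (address + offset) & MASK64
    tags[address] = tag
    tags[(address + TAG_GRANULE) & MASK64] = tag
    if writeback:
        if postindex:
            address = (address + offset) & MASK64
        return address
    return base                          # signed offset form: base unchanged
```

With `imm9` equal to 1 the offset is 16 bytes, so a post-index store at base 0x8000 tags the granules at 0x8000 and 0x8010 and writes back 0x8010.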


ST64B

Single-copy Atomic 64-byte Store without Return stores eight 64-bit doublewords from consecutive registers, Xt to
X(t+7), to a memory location. The data that is stored is atomic and is required to be 64-byte-aligned.

Integer
(FEAT_LS64)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 1 0 0 1 0 0 Rn Rt

ST64B <Xt>, [<Xn|SP> {,#0}]

if !HaveFeatLS64() then UNDEFINED;

if Rt<4:3> == '11' || Rt<0> == '1' then UNDEFINED;

integer n = UInt(Rn);
integer t = UInt(Rt);
boolean tag_checked = n != 31;

Assembler Symbols

Operation

CheckLDST64BEnabled();

bits(512) data;
bits(64) address;
bits(64) value;
acctype = AccType_ATOMICLS64;

if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);

for i = 0 to 7
value = X[t+i];
if BigEndian(acctype) then value = BigEndianReverse(value);
data<63+64*i:64*i> = value;

if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

MemStore64B(address, data, acctype);
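The register constraint in the decode and the packing loop in the operation can be modelled as below. This Python sketch assumes a little-endian access (so no byte reversal) and uses the hypothetical names `rt_is_valid` and `pack_st64b`; the 64-byte store itself is not modelled.

```python
MASK64 = (1 << 64) - 1

def rt_is_valid(rt):
    """Mirror the decode check: Rt<4:3> == '11' or Rt<0> == '1' is UNDEFINED."""
    return not (((rt >> 3) & 0b11) == 0b11 or (rt & 1) == 1)

def pack_st64b(regs, t):
    """Pack X[t]..X[t+7] into a 512-bit integer, X[t] in bits 63:0."""
    data = 0
    for i in range(8):
        data |= (regs[t + i] & MASK64) << (64 * i)
    return data
```

In this model the permitted values of Rt are the even numbers from 0 to 22, which keeps X(t+7) within the general-purpose register file.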


ST64BV

Single-copy Atomic 64-byte Store with Return stores eight 64-bit doublewords from consecutive registers, Xt to
X(t+7), to a memory location, and writes the status result of the store to a register. The data that is stored is atomic
and is required to be 64-byte aligned.

Integer
(FEAT_LS64_V)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 0 0 1 Rs 1 0 1 1 0 0 Rn Rt

ST64BV <Xs>, <Xt>, [<Xn|SP>]

if !HaveFeatLS64_V() then UNDEFINED;

if Rt<4:3> == '11' || Rt<0> == '1' then UNDEFINED;

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);
boolean tag_checked = n != 31;

Assembler Symbols

<Xs> Is the 64-bit name of the general-purpose register into which the status result of this instruction is
written, encoded in the "Rs" field.
The value returned is:
0xFFFFFFFF_FFFFFFFF
If the memory location accessed does not support this instruction. In this case, the value at the
memory location is UNKNOWN.

!= 0xFFFFFFFF_FFFFFFFF
If the memory location accessed does support this instruction. In this case, the peripheral that
provides the response defines the returned value and provides information on the state of the
memory update at the memory location.

If XZR is used, then the return value is ignored.
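Software that consumes the status result typically needs only the distinction described above. A minimal Python sketch, with the hypothetical names `ST64BV_NOT_SUPPORTED` and `st64bv_location_supported`:

```python
ST64BV_NOT_SUPPORTED = 0xFFFFFFFFFFFFFFFF

def st64bv_location_supported(status):
    """True if the returned status indicates the location supports the store.

    Any value other than all-ones is defined by the responding peripheral.
    """
    return (status & 0xFFFFFFFFFFFFFFFF) != ST64BV_NOT_SUPPORTED
```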


Operation

CheckST64BVEnabled();

bits(512) data;
bits(64) address;
bits(64) value;
bits(64) status;
acctype = AccType_ATOMICLS64;

if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);

for i = 0 to 7
value = X[t+i];
if BigEndian(acctype) then value = BigEndianReverse(value);
data<63+64*i:64*i> = value;

if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

status = MemStore64BWithRet(address, data, acctype);

if s != 31 then X[s] = status;


ST64BV0

Single-copy Atomic 64-byte EL0 Store with Return stores eight 64-bit doublewords from consecutive registers, Xt to
X(t+7), to a memory location, with the bottom 32 bits taken from ACCDATA_EL1, and writes the status result of the
store to a register. The data that is stored is atomic and is required to be 64-byte aligned.

Integer
(FEAT_LS64_ACCDATA)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 1 1 0 0 0 0 0 1 Rs 1 0 1 0 0 0 Rn Rt

ST64BV0 <Xs>, <Xt>, [<Xn|SP>]

if !HaveFeatLS64_ACCDATA() then UNDEFINED;

if Rt<4:3> == '11' || Rt<0> == '1' then UNDEFINED;

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs);
boolean tag_checked = n != 31;

Assembler Symbols

If XZR is used, then the return value is ignored.


Operation

CheckST64BV0Enabled();

bits(512) data;
bits(64) address;
bits(64) value;
bits(64) status;
acctype = AccType_ATOMICLS64;

if HaveMTE2Ext() then
SetTagCheckedInstruction(tag_checked);

bits(64) Xt = X[t];
value<31:0> = ACCDATA_EL1<31:0>;
value<63:32> = Xt<63:32>;
if BigEndian(acctype) then value = BigEndianReverse(value);
data<63:0> = value;
for i = 1 to 7
value = X[t+i];
if BigEndian(acctype) then value = BigEndianReverse(value);
data<63+64*i:64*i> = value;

if n == 31 then
CheckSPAlignment();
address = SP[];
else
address = X[n];

status = MemStore64BWithRet(address, data, acctype);

if s != 31 then X[s] = status;
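The only difference from ST64BV in the data that is stored is the first doubleword, whose bottom 32 bits come from ACCDATA_EL1. The following Python sketch shows that merge on plain integers, assuming a little-endian access; the function name `st64bv0_first_doubleword` is hypothetical.

```python
def st64bv0_first_doubleword(xt, accdata_el1):
    """Combine Xt<63:32> with ACCDATA_EL1<31:0> as in the operation above."""
    high = xt & 0xFFFFFFFF00000000
    low = accdata_el1 & 0x00000000FFFFFFFF
    return high | low
```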


STADD, STADDL

Atomic add on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword
from memory, adds the value held in a register to it, and stores the result back to memory.
• STADD does not have release semantics.
• STADDL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDADD, LDADDA, LDADDAL, LDADDL. This means:

• The encodings in this description are named to match the encodings of LDADD, LDADDA, LDADDAL,
LDADDL.
• The description of LDADD, LDADDA, LDADDAL, LDADDL gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt

32-bit LDADD alias (size == 10 && R == 0)

STADD <Ws>, [<Xn|SP>]

is equivalent to

LDADD <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

32-bit LDADDL alias (size == 10 && R == 1)

STADDL <Ws>, [<Xn|SP>]

is equivalent to

LDADDL <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDADD alias (size == 11 && R == 0)

STADD <Xs>, [<Xn|SP>]

is equivalent to

LDADD <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDADDL alias (size == 11 && R == 1)

STADDL <Xs>, [<Xn|SP>]

is equivalent to

LDADDL <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.


Assembler Symbols

<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xs> Is the 64-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Operation

The description of LDADD, LDADDA, LDADDAL, LDADDL gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
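The relationship between STADD and LDADD can be illustrated with a small data-flow model. The Python sketch below uses a dictionary as memory and the hypothetical names `ldadd` and `stadd`; it does not model atomicity or memory ordering.

```python
def ldadd(memory, address, operand, size_bits):
    """Add operand to the value at address, wrapping at size_bits.

    Returns the value that was loaded, as LDADD would write to Rt.
    """
    mask = (1 << size_bits) - 1
    old = memory[address] & mask
    memory[address] = (old + operand) & mask
    return old

def stadd(memory, address, operand, size_bits):
    """STADD is LDADD with WZR or XZR as Rt: the loaded value is discarded."""
    ldadd(memory, address, operand, size_bits)
```

For example, applying `stadd` with operand 1 to a 32-bit location holding 0xFFFFFFFF leaves 0 in memory.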


STADDB, STADDLB

Atomic add on byte in memory, without return, atomically loads an 8-bit byte from memory, adds the value held in a
register to it, and stores the result back to memory.
• STADDB does not have release semantics.
• STADDLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDADDB, LDADDAB, LDADDALB, LDADDLB. This means:

• The encodings in this description are named to match the encodings of LDADDB, LDADDAB, LDADDALB,
LDADDLB.
• The description of LDADDB, LDADDAB, LDADDALB, LDADDLB gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 0 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt

No memory ordering (R == 0)

STADDB <Ws>, [<Xn|SP>]

is equivalent to

LDADDB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STADDLB <Ws>, [<Xn|SP>]

is equivalent to

LDADDLB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

<Ws> Is the 32-bit name of the general-purpose register holding the data value to be operated on with the
contents of the memory location, encoded in the "Rs" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Operation

The description of LDADDB, LDADDAB, LDADDALB, LDADDLB gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
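The fields in the encoding diagram above can be assembled into a 32-bit instruction word. The following Python sketch is a minimal encoder for illustration; the function name `encode_staddb` is hypothetical, and register number 31 in `rn` denotes SP.

```python
def encode_staddb(rs, rn, release=False):
    """Encode STADDB (release=False) or STADDLB (release=True)."""
    assert 0 <= rs <= 31 and 0 <= rn <= 31
    word = 0b00 << 30              # size = 00 (byte)
    word |= 0b111000 << 24         # fixed bits 29:24
    word |= 0 << 23                # A = 0
    word |= int(release) << 22     # R selects release semantics
    word |= 1 << 21                # fixed bit 21
    word |= rs << 16               # Rs: register holding the addend
    word |= 0b000 << 12            # opc = 000; bit 15 and bits 11:10 are 0
    word |= rn << 5                # Rn: base register or SP
    word |= 0b11111                # Rt = 11111 (WZR), which makes this the alias
    return word
```

Under these assumptions, `encode_staddb(1, 0)` corresponds to `STADDB W1, [X0]` and evaluates to 0x3821001F.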


STADDH, STADDLH

Atomic add on halfword in memory, without return, atomically loads a 16-bit halfword from memory, adds the value
held in a register to it, and stores the result back to memory.
• STADDH does not have release semantics.
• STADDLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDADDH, LDADDAH, LDADDALH, LDADDLH. This means:

• The encodings in this description are named to match the encodings of LDADDH, LDADDAH, LDADDALH,
LDADDLH.
• The description of LDADDH, LDADDAH, LDADDALH, LDADDLH gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 0 0 0 0 0 Rn 1 1 1 1 1
size A opc Rt

No memory ordering (R == 0)

STADDH <Ws>, [<Xn|SP>]

is equivalent to

LDADDH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STADDLH <Ws>, [<Xn|SP>]

is equivalent to

LDADDLH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDADDH, LDADDAH, LDADDALH, LDADDLH gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


STCLR, STCLRL

Atomic bit clear on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, performs a bitwise AND with the complement of the value held in a register on it, and
stores the result back to memory.
• STCLR does not have release semantics.
• STCLRL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDCLR, LDCLRA, LDCLRAL, LDCLRL. This means:

• The encodings in this description are named to match the encodings of LDCLR, LDCLRA, LDCLRAL, LDCLRL.
• The description of LDCLR, LDCLRA, LDCLRAL, LDCLRL gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt

32-bit LDCLR alias (size == 10 && R == 0)

STCLR <Ws>, [<Xn|SP>]

is equivalent to

LDCLR <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

32-bit LDCLRL alias (size == 10 && R == 1)

STCLRL <Ws>, [<Xn|SP>]

is equivalent to

LDCLRL <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDCLR alias (size == 11 && R == 0)

STCLR <Xs>, [<Xn|SP>]

is equivalent to

LDCLR <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDCLRL alias (size == 11 && R == 1)

STCLRL <Xs>, [<Xn|SP>]

is equivalent to

LDCLRL <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.


Assembler Symbols

Operation

The description of LDCLR, LDCLRA, LDCLRAL, LDCLRL gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.
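The bit clear described above is a bitwise AND with the complement of the register value. The following Python sketch models only that data flow, using a dictionary as memory and the hypothetical name `stclr`; atomicity and memory ordering are not modelled.

```python
def stclr(memory, address, operand, size_bits):
    """Clear, in the value at address, every bit that is set in operand."""
    mask = (1 << size_bits) - 1
    old = memory[address] & mask
    memory[address] = old & ~operand & mask
```

For example, clearing with operand 0x0F0F on a location holding 0xFF0F leaves 0xF000.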


STCLRB, STCLRLB

Atomic bit clear on byte in memory, without return, atomically loads an 8-bit byte from memory, performs a bitwise
AND with the complement of the value held in a register on it, and stores the result back to memory.
• STCLRB does not have release semantics.
• STCLRLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB. This means:

• The encodings in this description are named to match the encodings of LDCLRB, LDCLRAB, LDCLRALB,
LDCLRLB.
• The description of LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 0 R 1 Rs 0 0 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt

No memory ordering (R == 0)

STCLRB <Ws>, [<Xn|SP>]

is equivalent to

LDCLRB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STCLRLB <Ws>, [<Xn|SP>]

is equivalent to

LDCLRLB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDCLRB, LDCLRAB, LDCLRALB, LDCLRLB gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


STCLRH, STCLRLH

Atomic bit clear on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs a
bitwise AND with the complement of the value held in a register on it, and stores the result back to memory.
• STCLRH does not have release semantics.
• STCLRLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH. This means:

• The encodings in this description are named to match the encodings of LDCLRH, LDCLRAH, LDCLRALH,
LDCLRLH.
• The description of LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 0 R 1 Rs 0 0 0 1 0 0 Rn 1 1 1 1 1
size A opc Rt

No memory ordering (R == 0)

STCLRH <Ws>, [<Xn|SP>]

is equivalent to

LDCLRH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STCLRLH <Ws>, [<Xn|SP>]

is equivalent to

LDCLRLH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDCLRH, LDCLRAH, LDCLRALH, LDCLRLH gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.


STEOR, STEORL

Atomic exclusive OR on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, performs an exclusive OR with the value held in a register on it, and stores the result back
to memory.
• STEOR does not have release semantics.
• STEORL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDEOR, LDEORA, LDEORAL, LDEORL. This means:

• The encodings in this description are named to match the encodings of LDEOR, LDEORA, LDEORAL,
LDEORL.
• The description of LDEOR, LDEORA, LDEORAL, LDEORL gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 x 1 1 1 0 0 0 0 R 1 Rs 0 0 1 0 0 0 Rn 1 1 1 1 1
size A opc Rt

32-bit LDEOR alias (size == 10 && R == 0)

STEOR <Ws>, [<Xn|SP>]

is equivalent to

LDEOR <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

32-bit LDEORL alias (size == 10 && R == 1)

STEORL <Ws>, [<Xn|SP>]

is equivalent to

LDEORL <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDEOR alias (size == 11 && R == 0)

STEOR <Xs>, [<Xn|SP>]

is equivalent to

LDEOR <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDEORL alias (size == 11 && R == 1)

STEORL <Xs>, [<Xn|SP>]

is equivalent to

LDEORL <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.
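The alias relationship can be illustrated with a small encoder. This is a minimal sketch, not Arm's reference implementation: the function names `encode_ldeor` and `encode_steor` are hypothetical, and the field positions are taken from the encoding diagram above. It shows that STEOR is the LDEOR encoding with A == 0 and Rt == 31 (WZR or XZR).

```python
def encode_ldeor(size, a, r, rs, rn, rt):
    """Pack the LDEOR-family fields into a 32-bit instruction word."""
    assert size in (0b10, 0b11)          # word and doubleword forms only
    assert a in (0, 1) and r in (0, 1)
    assert all(0 <= reg <= 31 for reg in (rs, rn, rt))
    word = size << 30
    word |= 0b111 << 27                  # fixed bits 29-27
    word |= a << 23                      # A: acquire
    word |= r << 22                      # R: release
    word |= 1 << 21                      # fixed bit 21
    word |= rs << 16
    word |= 0b010 << 12                  # opc value for exclusive OR
    word |= rn << 5
    word |= rt
    return word


def encode_steor(size, r, rs, rn):
    """STEOR/STEORL: the LDEOR/LDEORL encoding with A == 0 and Rt == 31."""
    return encode_ldeor(size, a=0, r=r, rs=rs, rn=rn, rt=31)


# STEOR W0, [X1] and LDEOR W0, WZR, [X1] produce the same word.
print(hex(encode_steor(0b10, r=0, rs=0, rn=1)))
```

Because the two mnemonics share one encoding, a disassembler only needs to test Rt == 31 with A == 0 to choose the STEOR spelling.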

Assembler Symbols

Operation

The description of LDEOR, LDEORA, LDEORAL, LDEORL gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STEORB, STEORLB

Atomic exclusive OR on byte in memory, without return, atomically loads an 8-bit byte from memory, performs an
exclusive OR with the value held in a register on it, and stores the result back to memory.
• STEORB does not have release semantics.
• STEORLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDEORB, LDEORAB, LDEORALB, LDEORLB. This means:

• The encodings in this description are named to match the encodings of LDEORB, LDEORAB, LDEORALB,
LDEORLB.
• The description of LDEORB, LDEORAB, LDEORALB, LDEORLB gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

Bits  | 31-30 | 29-27 | 26 | 25-24 | 23 | 22 | 21 | 20-16 | 15 | 14-12 | 11-10 | 9-5 | 4-0
Value | 00    | 111   | 0  | 00    | 0  | R  | 1  | Rs    | 0  | 010   | 00    | Rn  | 11111
Field | size  |       |    |       | A  |    |    |       |    | opc   |       |     | Rt

No memory ordering (R == 0)

STEORB <Ws>, [<Xn|SP>]

is equivalent to

LDEORB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STEORLB <Ws>, [<Xn|SP>]

is equivalent to

LDEORLB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDEORB, LDEORAB, LDEORALB, LDEORLB gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STEORH, STEORLH

Atomic exclusive OR on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs
an exclusive OR with the value held in a register on it, and stores the result back to memory.
• STEORH does not have release semantics.
• STEORLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDEORH, LDEORAH, LDEORALH, LDEORLH. This means:

• The encodings in this description are named to match the encodings of LDEORH, LDEORAH, LDEORALH,
LDEORLH.
• The description of LDEORH, LDEORAH, LDEORALH, LDEORLH gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

Bits  | 31-30 | 29-27 | 26 | 25-24 | 23 | 22 | 21 | 20-16 | 15 | 14-12 | 11-10 | 9-5 | 4-0
Value | 01    | 111   | 0  | 00    | 0  | R  | 1  | Rs    | 0  | 010   | 00    | Rn  | 11111
Field | size  |       |    |       | A  |    |    |       |    | opc   |       |     | Rt

No memory ordering (R == 0)

STEORH <Ws>, [<Xn|SP>]

is equivalent to

LDEORH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STEORLH <Ws>, [<Xn|SP>]

is equivalent to

LDEORLH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDEORH, LDEORAH, LDEORALH, LDEORLH gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STG

Store Allocation Tag stores an Allocation Tag to memory. The address used for the store is calculated from the base
register and an immediate signed offset scaled by the Tag granule. The Allocation Tag is calculated from the Logical
Address Tag in the source register.
This instruction generates an Unchecked access.
It has encodings from 3 classes: Post-index, Pre-index, and Signed offset.

Post-index
(FEAT_MTE)

Bits  | 31-24    | 23-22 | 21 | 20-12 | 11-10 | 9-5 | 4-0
Value | 11011001 | 00    | 1  | imm9  | 01    | Xn  | Xt

STG <Xt|SP>, [<Xn|SP>], #<simm>

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = TRUE;
boolean postindex = TRUE;

Pre-index
(FEAT_MTE)

Bits  | 31-24    | 23-22 | 21 | 20-12 | 11-10 | 9-5 | 4-0
Value | 11011001 | 00    | 1  | imm9  | 11    | Xn  | Xt

STG <Xt|SP>, [<Xn|SP>, #<simm>]!

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = TRUE;
boolean postindex = FALSE;

Signed offset
(FEAT_MTE)

Bits  | 31-24    | 23-22 | 21 | 20-12 | 11-10 | 9-5 | 4-0
Value | 11011001 | 00    | 1  | imm9  | 10    | Xn  | Xt

STG <Xt|SP>, [<Xn|SP>{, #<simm>}]

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = FALSE;
boolean postindex = FALSE;

Assembler Symbols

Operation

bits(64) address;

SetTagCheckedInstruction(FALSE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

bits(64) data = if t == 31 then SP[] else X[t];

bits(4) tag = AArch64.AllocationTagFromAddress(data);
AArch64.MemTag[address, AccType_NORMAL] = tag;

if writeback then
    if postindex then
        address = address + offset;

    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
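The three addressing forms differ only in when the offset is applied and whether the base register is updated. The following is a minimal sketch of that flow, not the architectural definition: `stg`, `sign_extend`, and the `tag_memory` dictionary are hypothetical names, the model assumes the target address is already aligned to the 16-byte Tag granule, and it takes the Logical Address Tag to be bits 59-56 of the source value.

```python
LOG2_TAG_GRANULE = 4
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def stg(base, source, imm9, writeback, postindex, tag_memory):
    """Store one Allocation Tag; return the new value of the base register."""
    offset = sign_extend(imm9, 9) << LOG2_TAG_GRANULE
    address = base
    if not postindex:
        address = (address + offset) & MASK64
    # Assumption of this toy model: the address is granule aligned.
    assert address % (1 << LOG2_TAG_GRANULE) == 0
    tag_memory[address] = (source >> 56) & 0xF   # Logical Address Tag
    if writeback:
        if postindex:
            address = (address + offset) & MASK64
        return address
    return base
```

With imm9 == 0x1FF the scaled offset is -16, so the pre-index form writes the tag one granule below the base and leaves the base pointing there.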

STGM

Store Tag Multiple writes a naturally aligned block of N Allocation Tags, where N is determined by
GMID_EL1.BS. The Allocation Tag written to address A is taken from bits
<4*A<7:4>+3 : 4*A<7:4>> of the source register.
This instruction is UNDEFINED at EL0.
This instruction generates an Unchecked access.

Integer
(FEAT_MTE2)

Bits  | 31-24    | 23-22 | 21 | 20-12     | 11-10 | 9-5 | 4-0
Value | 11011001 | 10    | 1  | 000000000 | 00    | Xn  | Xt

STGM <Xt>, [<Xn|SP>]

if !HaveMTE2Ext() then UNDEFINED;

integer t = UInt(Xt);
integer n = UInt(Xn);

Assembler Symbols

Operation

if PSTATE.EL == EL0 then
    UNDEFINED;

bits(64) data = X[t];

bits(64) address;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

integer size = 4 * (2 ^ (UInt(GMID_EL1.BS)));

address = Align(address, size);
integer count = size >> LOG2_TAG_GRANULE;
integer index = UInt(address<LOG2_TAG_GRANULE+3:LOG2_TAG_GRANULE>);

for i = 0 to count-1
    bits(4) tag = data<(index*4)+3:index*4>;
    AArch64.MemTag[address, AccType_NORMAL] = tag;
    address = address + TAG_GRANULE;
    index = index + 1;
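The block arithmetic above can be traced with a short simulation. This is a sketch under stated assumptions: `stgm` and the `tag_memory` dictionary are hypothetical, `bs` stands in for the value read from GMID_EL1.BS, and the EL0 check and address translation are omitted.

```python
LOG2_TAG_GRANULE = 4
TAG_GRANULE = 1 << LOG2_TAG_GRANULE


def stgm(address, data, bs, tag_memory):
    """Write one block of Allocation Tags taken from the 64-bit value data."""
    size = 4 * (2 ** bs)                       # block size in bytes
    address &= ~(size - 1)                     # Align(address, size)
    count = size >> LOG2_TAG_GRANULE           # number of tags in the block
    index = (address >> LOG2_TAG_GRANULE) & 0xF
    for _ in range(count):
        tag_memory[address] = (data >> (index * 4)) & 0xF
        address += TAG_GRANULE
        index += 1
    return tag_memory


# With bs == 4 the block is 64 bytes, so four tags are written.
print(stgm(0x1058, 0x76540000, 4, {}))
```

Note that the tag for each granule comes from the nibble selected by address bits 7-4, so a block that does not start at a 256-byte boundary reads from the middle of the source register rather than from bit 0.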

STGP

Store Allocation Tag and Pair of registers stores an Allocation Tag and two 64-bit doublewords to memory, from two
registers. The address used for the store is calculated from the base register and an immediate signed offset scaled by
the Tag granule. The Allocation Tag is calculated from the Logical Address Tag in the base register.
This instruction generates an Unchecked access.
It has encodings from 3 classes: Post-index, Pre-index, and Signed offset.

Post-index
(FEAT_MTE)

Bits  | 31-30 | 29-27 | 26 | 25-23 | 22 | 21-15 | 14-10 | 9-5 | 4-0
Value | 01    | 101   | 0  | 001   | 0  | simm7 | Xt2   | Xn  | Xt

STGP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
integer t2 = UInt(Xt2);
bits(64) offset = LSL(SignExtend(simm7, 64), LOG2_TAG_GRANULE);
boolean writeback = TRUE;
boolean postindex = TRUE;

Pre-index
(FEAT_MTE)

Bits  | 31-30 | 29-27 | 26 | 25-23 | 22 | 21-15 | 14-10 | 9-5 | 4-0
Value | 01    | 101   | 0  | 011   | 0  | simm7 | Xt2   | Xn  | Xt

STGP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
integer t2 = UInt(Xt2);
bits(64) offset = LSL(SignExtend(simm7, 64), LOG2_TAG_GRANULE);
boolean writeback = TRUE;
boolean postindex = FALSE;

Signed offset
(FEAT_MTE)

Bits  | 31-30 | 29-27 | 26 | 25-23 | 22 | 21-15 | 14-10 | 9-5 | 4-0
Value | 01    | 101   | 0  | 010   | 0  | simm7 | Xt2   | Xn  | Xt

STGP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
integer t2 = UInt(Xt2);
bits(64) offset = LSL(SignExtend(simm7, 64), LOG2_TAG_GRANULE);
boolean writeback = FALSE;
boolean postindex = FALSE;

Assembler Symbols

<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Xt" field.

<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Xt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Xn" field.
<imm> For the post-index and pre-index variant: is the signed immediate offset, a multiple of 16 in the range
-1024 to 1008, encoded in the "simm7" field.
For the signed offset variant: is the optional signed immediate offset, a multiple of 16 in the range -1024
to 1008, defaulting to 0 and encoded in the "simm7" field.
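The stated range follows from a 7-bit signed field scaled by the 16-byte Tag granule: -64 * 16 = -1024 and 63 * 16 = 1008. The helpers below are a hypothetical sketch of that mapping (the names `encode_simm7` and `decode_simm7` are not from the architecture), useful for checking an immediate before assembling.

```python
def encode_simm7(imm):
    """Convert a byte offset into the 7-bit simm7 field value."""
    if imm % 16 != 0 or not -1024 <= imm <= 1008:
        raise ValueError("offset must be a multiple of 16 in -1024..1008")
    return (imm // 16) & 0x7F


def decode_simm7(field):
    """Recover the byte offset: sign-extend 7 bits, then scale by 16."""
    signed = field - 0x80 if field & 0x40 else field
    return signed * 16
```

For example, an offset of -16 is stored as the field value 0x7F, and decoding 0x7F gives -16 again.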

Operation

bits(64) address;
bits(64) data1;
bits(64) data2;

SetTagCheckedInstruction(FALSE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data1 = X[t];
data2 = X[t2];

if !postindex then
    address = address + offset;

if address != Align(address, TAG_GRANULE) then
    AArch64.Abort(address, AlignmentFault(AccType_NORMAL, TRUE, FALSE));

Mem[address, 8, AccType_NORMAL] = data1;
Mem[address+8, 8, AccType_NORMAL] = data2;

AArch64.MemTag[address, AccType_NORMAL] = AArch64.AllocationTagFromAddress(address);

if writeback then
    if postindex then
        address = address + offset;

    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

STLLR

Store LORelease Register stores a 32-bit word or a 64-bit doubleword to a memory location, from a register. The
instruction also has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information
about memory accesses, see Load/Store addressing modes.

No offset
(FEAT_LOR)

Bits  | 31-30 | 29-24  | 23 | 22 | 21 | 20-16           | 15 | 14-10           | 9-5 | 4-0
Value | 1x    | 001000 | 1  | 0  | 0  | (1)(1)(1)(1)(1) | 0  | (1)(1)(1)(1)(1) | Rn  | Rt
Field | size  |        |    | L  |    | Rs              | o0 | Rt2             |     |

32-bit (size == 10)

STLLR <Wt>, [<Xn|SP>{,#0}]

64-bit (size == 11)

STLLR <Xt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

integer elsize = 8 << UInt(size);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = X[t];
Mem[address, dbytes, AccType_LIMITEDORDERED] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLLRB

Store LORelease Register Byte stores a byte from a 32-bit register to a memory location. The instruction also has
memory ordering semantics as described in Load LOAcquire, Store LORelease. For information about memory
accesses, see Load/Store addressing modes.

No offset
(FEAT_LOR)

Bits  | 31-30 | 29-24  | 23 | 22 | 21 | 20-16           | 15 | 14-10           | 9-5 | 4-0
Value | 00    | 001000 | 1  | 0  | 0  | (1)(1)(1)(1)(1) | 0  | (1)(1)(1)(1)(1) | Rn  | Rt
Field | size  |        |    | L  |    | Rs              | o0 | Rt2             |     |

STLLRB <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = X[t];
Mem[address, 1, AccType_LIMITEDORDERED] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLLRH

Store LORelease Register Halfword stores a halfword from a 32-bit register to a memory location. The instruction also
has memory ordering semantics as described in Load LOAcquire, Store LORelease. For information about memory
accesses, see Load/Store addressing modes.

No offset
(FEAT_LOR)

Bits  | 31-30 | 29-24  | 23 | 22 | 21 | 20-16           | 15 | 14-10           | 9-5 | 4-0
Value | 01    | 001000 | 1  | 0  | 0  | (1)(1)(1)(1)(1) | 0  | (1)(1)(1)(1)(1) | Rn  | Rt
Field | size  |        |    | L  |    | Rs              | o0 | Rt2             |     |

STLLRH <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = X[t];
Mem[address, 2, AccType_LIMITEDORDERED] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLR

Store-Release Register stores a 32-bit word or a 64-bit doubleword to a memory location, from a register. The
instruction also has memory ordering semantics as described in Load-Acquire, Store-Release. For information about
memory accesses, see Load/Store addressing modes.
Bits  | 31-30 | 29-24  | 23 | 22 | 21 | 20-16           | 15 | 14-10           | 9-5 | 4-0
Value | 1x    | 001000 | 1  | 0  | 0  | (1)(1)(1)(1)(1) | 1  | (1)(1)(1)(1)(1) | Rn  | Rt
Field | size  |        |    | L  |    | Rs              | o0 | Rt2             |     |

32-bit (size == 10)

STLR <Wt>, [<Xn|SP>{,#0}]

64-bit (size == 11)

STLR <Xt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

integer elsize = 8 << UInt(size);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = X[t];
Mem[address, dbytes, AccType_ORDERED] = data;
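In the encoding diagrams of this manual, STLR and STLLR differ only in the o0 bit (bit 15): 1 for Store-Release and 0 for Store LORelease. The encoder below is a hedged sketch of that observation; `encode_store_release` and its `limited_ordering` parameter are hypothetical names, and the (1) "should be one" fields Rs and Rt2 are written as all ones.

```python
def encode_store_release(size, rn, rt, limited_ordering=False):
    """Encode STLR-family (o0 == 1) or STLLR-family (o0 == 0) stores."""
    assert 0 <= size <= 3                 # 00 byte, 01 halfword, 10 word, 11 doubleword
    assert 0 <= rn <= 31 and 0 <= rt <= 31
    word = size << 30
    word |= 0b001000 << 24                # fixed bits 29-24
    word |= 1 << 23                       # fixed bit 23; L (bit 22) stays 0
    word |= 0b11111 << 16                 # Rs, shown as (1)(1)(1)(1)(1)
    word |= (0 if limited_ordering else 1) << 15   # o0
    word |= 0b11111 << 10                 # Rt2, shown as (1)(1)(1)(1)(1)
    word |= rn << 5
    word |= rt
    return word


print(hex(encode_store_release(0b10, rn=1, rt=0)))                         # STLR W0, [X1]
print(hex(encode_store_release(0b10, rn=1, rt=0, limited_ordering=True)))  # STLLR W0, [X1]
```

The byte and halfword variants reuse the same layout with size set to 00 or 01.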

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLRB

Store-Release Register Byte stores a byte from a 32-bit register to a memory location. The instruction also has memory
ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses, see Load/
Store addressing modes.
Bits  | 31-30 | 29-24  | 23 | 22 | 21 | 20-16           | 15 | 14-10           | 9-5 | 4-0
Value | 00    | 001000 | 1  | 0  | 0  | (1)(1)(1)(1)(1) | 1  | (1)(1)(1)(1)(1) | Rn  | Rt
Field | size  |        |    | L  |    | Rs              | o0 | Rt2             |     |

STLRB <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = X[t];
Mem[address, 1, AccType_ORDERED] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLRH

Store-Release Register Halfword stores a halfword from a 32-bit register to a memory location. The instruction also
has memory ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses,
see Load/Store addressing modes.
Bits  | 31-30 | 29-24  | 23 | 22 | 21 | 20-16           | 15 | 14-10           | 9-5 | 4-0
Value | 01    | 001000 | 1  | 0  | 0  | (1)(1)(1)(1)(1) | 1  | (1)(1)(1)(1)(1) | Rn  | Rt
Field | size  |        |    | L  |    | Rs              | o0 | Rt2             |     |

STLRH <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

data = X[t];
Mem[address, 2, AccType_ORDERED] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLUR

Store-Release Register (unscaled) calculates an address from a base register value and an immediate offset, and
stores a 32-bit word or a 64-bit doubleword to the calculated address, from a register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release.
For information about memory accesses, see Load/Store addressing modes.

Unscaled offset
(FEAT_LRCPC2)

Bits  | 31-30 | 29-24  | 23-22 | 21 | 20-12 | 11-10 | 9-5 | 4-0
Value | 1x    | 011001 | 00    | 0  | imm9  | 00    | Rn  | Rt
Field | size  |        | opc   |    |       |       |     |

32-bit (size == 10)

STLUR <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (size == 11)

STLUR <Xt>, [<Xn|SP>{, #<simm>}]

integer scale = UInt(size);

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

integer datasize = 8 << scale;

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, datasize DIV 8, AccType_ORDERED] = data;
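The decode steps above (scale from size, datasize = 8 << scale, and a sign-extended, unscaled imm9) can be sketched as a field extractor. `decode_stlur` is a hypothetical helper: it assumes the word is already known to be an STLUR encoding and does not check the fixed bits.

```python
def decode_stlur(word):
    """Extract the STLUR decode variables from a 32-bit instruction word."""
    scale = (word >> 30) & 0b11
    imm9 = (word >> 12) & 0x1FF
    offset = imm9 - 0x200 if imm9 & 0x100 else imm9   # SignExtend(imm9, 64)
    return {
        "datasize": 8 << scale,
        "offset": offset,            # byte offset, not scaled by the access size
        "n": (word >> 5) & 0x1F,
        "t": word & 0x1F,
    }


# size == 10, imm9 == 0x1F8, Rn == 1, Rt == 0, that is STLUR W0, [X1, #-8]
print(decode_stlur(0x991F8020))
```

Because the offset is unscaled, the immediate range is -256 to 255 bytes regardless of whether a word or doubleword is stored.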

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLURB

Store-Release Register Byte (unscaled) calculates an address from a base register value and an immediate offset, and
stores a byte to the calculated address, from a 32-bit register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release.
For information about memory accesses, see Load/Store addressing modes.

Unscaled offset
(FEAT_LRCPC2)

Bits  | 31-30 | 29-24  | 23-22 | 21 | 20-12 | 11-10 | 9-5 | 4-0
Value | 00    | 011001 | 00    | 0  | imm9  | 00    | Rn  | Rt
Field | size  |        | opc   |    |       |       |     |

STLURB <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, 1, AccType_ORDERED] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLURH

Store-Release Register Halfword (unscaled) calculates an address from a base register value and an immediate offset,
and stores a halfword to the calculated address, from a 32-bit register.
The instruction has memory ordering semantics as described in Load-Acquire, Load-AcquirePC, and Store-Release.
For information about memory accesses, see Load/Store addressing modes.

Unscaled offset
(FEAT_LRCPC2)

Bits  | 31-30 | 29-24  | 23-22 | 21 | 20-12 | 11-10 | 9-5 | 4-0
Value | 01    | 011001 | 00    | 0  | imm9  | 00    | Rn  | Rt
Field | size  |        | opc   |    |       |       |     |

STLURH <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, 2, AccType_ORDERED] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLXP

Store-Release Exclusive Pair of registers stores two 32-bit words or two 64-bit doublewords to a memory location if the
PE has exclusive access to the memory address, from two registers, and returns a status value of 0 if the store was
successful, or of 1 if no store was performed. See Synchronization and semaphores. For information on single-copy
atomicity and alignment requirements, see Requirements for single-copy atomicity and Alignment of data accesses. If
a 64-bit pair Store-Exclusive succeeds, it causes a single-copy atomic update of the 128-bit memory location being
updated. The instruction also has memory ordering semantics, as described in Load-Acquire, Store-Release. For
information about memory accesses, see Load/Store addressing modes.
Bits  | 31 | 30 | 29-24  | 23 | 22 | 21 | 20-16 | 15 | 14-10 | 9-5 | 4-0
Value | 1  | sz | 001000 | 0  | 0  | 1  | Rs    | 1  | Rt2   | Rn  | Rt
Field |    |    |        |    | L  |    |       | o0 |       |     |

32-bit (sz == 0)

STLXP <Ws>, <Wt1>, <Wt2>, [<Xn|SP>{,#0}]

64-bit (sz == 1)

STLXP <Ws>, <Xt1>, <Xt2>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2); // ignored by load/store single register
integer s = UInt(Rs); // ignored by all loads and store-release

integer elsize = 32 << UInt(sz);

integer datasize = elsize * 2;
boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;

boolean rn_unknown = FALSE;
if s == t || (s == t2) then
    Constraint c = ConstrainUnpredictable(Unpredictable_DATAOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rt_unknown = TRUE; // store UNKNOWN value
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
if s == n && n != 31 then
    Constraint c = ConstrainUnpredictable(Unpredictable_BASEOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rn_unknown = TRUE; // address is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STLXP.
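An assembler or code generator may want to reject register choices that trigger these cases. The checker below is a minimal sketch of the two conditions in the decode pseudocode; `stlxp_overlaps` is a hypothetical helper, and the returned strings simply echo the Unpredictable_ names used above.

```python
def stlxp_overlaps(s, t, t2, n):
    """List the CONSTRAINED UNPREDICTABLE conditions hit by these register numbers."""
    found = []
    if s == t or s == t2:
        found.append("DATAOVERLAP")    # status register is also a data register
    if s == n and n != 31:
        found.append("BASEOVERLAP")    # status register is also the base register
    return found


print(stlxp_overlaps(s=2, t=0, t2=1, n=3))   # distinct registers: no overlap
```

Register number 31 as the base means SP, which is why the base check excludes n == 31.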

Assembler Symbols

<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
    0  If the operation updates memory.
    1  If the operation fails to update memory.
<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Aborts and alignment
If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
• <Ws> is not updated.
Accessing an address that is not aligned to the size of the data being accessed causes an Alignment fault Data Abort
exception to be generated, subject to the following rules:
• If AArch64.ExclusiveMonitorsPass() returns TRUE, the exception is generated.
• Otherwise, it is IMPLEMENTATION DEFINED whether the exception is generated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.

Operation

bits(64) address;
bits(datasize) data;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];

if rt_unknown then
    data = bits(datasize) UNKNOWN;
else
    bits(datasize DIV 2) el1 = X[t];
    bits(datasize DIV 2) el2 = X[t2];
    data = if BigEndian(AccType_ORDEREDATOMIC) then el1:el2 else el2:el1;
bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, dbytes) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, dbytes, AccType_ORDEREDATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);
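The endianness-dependent concatenation above has a simple effect: the first register always lands at the lower address. The sketch below illustrates this with hypothetical helpers `pack_pair` and `memory_image`; it assumes the store writes the combined value in the byte order of the current endianness.

```python
def pack_pair(el1, el2, elsize, big_endian):
    """Combine two elements as the pseudocode does: el1:el2 or el2:el1."""
    mask = (1 << elsize) - 1
    el1 &= mask
    el2 &= mask
    if big_endian:
        return (el1 << elsize) | el2
    return (el2 << elsize) | el1


def memory_image(el1, el2, elsize, big_endian):
    """Bytes as they would appear in memory, lowest address first."""
    data = pack_pair(el1, el2, elsize, big_endian)
    return data.to_bytes(2 * elsize // 8, "big" if big_endian else "little")


little = memory_image(0x01020304, 0x0A0B0C0D, 32, big_endian=False)
big = memory_image(0x01020304, 0x0A0B0C0D, 32, big_endian=True)
print(little.hex(), big.hex())
```

In both images the first four bytes come from the first register; only the byte order within each element changes.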

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLXR

Store-Release Exclusive Register stores a 32-bit word or a 64-bit doubleword from a register to memory if the PE has
exclusive access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no
store was performed. See Synchronization and semaphores. The memory access is atomic. The instruction also has
memory ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses see
Load/Store addressing modes.
Bits  | 31-30 | 29-24  | 23 | 22 | 21 | 20-16 | 15 | 14-10           | 9-5 | 4-0
Value | 1x    | 001000 | 0  | 0  | 0  | Rs    | 1  | (1)(1)(1)(1)(1) | Rn  | Rt
Field | size  |        |    | L  |    |       | o0 | Rt2             |     |

32-bit (size == 10)

STLXR <Ws>, <Wt>, [<Xn|SP>{,#0}]

64-bit (size == 11)

STLXR <Ws>, <Xt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release

integer elsize = 8 << UInt(size);

boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;

boolean rn_unknown = FALSE;
if s == t then
    Constraint c = ConstrainUnpredictable(Unpredictable_DATAOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rt_unknown = TRUE; // store UNKNOWN value
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
if s == n && n != 31 then
    Constraint c = ConstrainUnpredictable(Unpredictable_BASEOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rn_unknown = TRUE; // address is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STLXR.

Assembler Symbols

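<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
    0  If the operation updates memory.
    1  If the operation fails to update memory.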

<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Aborts and alignment
If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.

• <Ws> is not updated.
Accessing an address that is not aligned to the size of the data being accessed causes an Alignment fault Data Abort
exception to be generated, subject to the following rules:
• If AArch64.ExclusiveMonitorsPass() returns TRUE, the exception is generated.
• Otherwise, it is IMPLEMENTATION DEFINED whether the exception is generated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.

Operation

bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];

if rt_unknown then
    data = bits(elsize) UNKNOWN;
else
    data = X[t];

bit status = '1';

// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, dbytes) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, dbytes, AccType_ORDEREDATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);
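The status convention (0 on success, 1 when no store is performed) is what drives the usual retry loop around an exclusive load and store. The class below is a deliberately simplified, single-PE toy model and not the architected monitor behavior: `ToyExclusiveMemory`, its methods, and `atomic_add` are all hypothetical names, and ordering semantics are not modeled.

```python
class ToyExclusiveMemory:
    """Toy model: one monitor that records the last exclusive load."""

    def __init__(self):
        self.mem = {}
        self.monitor = None                  # (address, size) or None

    def load_exclusive(self, address, size):
        self.monitor = (address, size)
        return self.mem.get(address, 0)

    def clear_monitor(self):
        self.monitor = None                  # stands in for any event that clears it

    def store_release_exclusive(self, address, size, value):
        status = 1                           # assume failure, as the pseudocode does
        if self.monitor == (address, size):
            self.mem[address] = value & ((1 << (8 * size)) - 1)
            status = 0
        self.monitor = None                  # the monitor is consumed either way here
        return status


def atomic_add(memory, address, increment):
    """Retry until the exclusive store reports status 0; return the old value."""
    while True:
        old = memory.load_exclusive(address, 4)
        if memory.store_release_exclusive(address, 4, old + increment) == 0:
            return old
```

A caller that sees status 1 must not assume memory changed; it reloads and tries again, which is exactly what the loop in `atomic_add` does.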

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLXRB

Store-Release Exclusive Register Byte stores a byte from a 32-bit register to memory if the PE has exclusive access to
the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See
Synchronization and semaphores. The memory access is atomic. The instruction also has memory ordering semantics
as described in Load-Acquire, Store-Release. For information about memory accesses see Load/Store addressing
modes.
Bits  | 31-30 | 29-24  | 23 | 22 | 21 | 20-16 | 15 | 14-10           | 9-5 | 4-0
Value | 00    | 001000 | 0  | 0  | 0  | Rs    | 1  | (1)(1)(1)(1)(1) | Rn  | Rt
Field | size  |        |    | L  |    |       | o0 | Rt2             |     |

STLXRB <Ws>, <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release

boolean tag_checked = n != 31;

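boolean rt_unknown = FALSE;

boolean rn_unknown = FALSE;
if s == t then
    Constraint c = ConstrainUnpredictable(Unpredictable_DATAOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rt_unknown = TRUE; // store UNKNOWN value
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();
if s == n && n != 31 then
    Constraint c = ConstrainUnpredictable(Unpredictable_BASEOVERLAP);
    assert c IN {Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_UNKNOWN rn_unknown = TRUE; // address is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();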

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STLXRB.

Assembler Symbols

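<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
    0  If the operation updates memory.
    1  If the operation fails to update memory.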

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];

if rt_unknown then
    data = bits(8) UNKNOWN;
else
    data = X[t];

bit status = '1';

// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, 1) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, 1, AccType_ORDEREDATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);
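
The status handling above can be illustrated with a toy, single-PE model. This is only a sketch under stated assumptions: the class `ToyExclusiveMemory`, its method names, and the rule that every store-exclusive attempt clears the monitor are hypothetical simplifications, and address translation, memory ordering, and global monitors are not modelled.

```python
class ToyExclusiveMemory:
    """Toy model of a local exclusive monitor over byte-addressed memory."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.monitor = None  # (address, nbytes) marked by the last exclusive load

    def load_exclusive(self, address, nbytes=1):
        # Models an exclusive load: read the value and mark the location.
        self.monitor = (address, nbytes)
        return int.from_bytes(self.mem[address:address + nbytes], "little")

    def clear_exclusive(self):
        # Models any event that clears the monitor.
        self.monitor = None

    def store_exclusive(self, address, value, nbytes=1):
        # Mirrors the pseudocode: status starts at 1 and becomes 0 only
        # when the monitor check passes and the write is performed.
        status = 1
        if self.monitor == (address, nbytes):
            mask = (1 << (8 * nbytes)) - 1
            self.mem[address:address + nbytes] = (value & mask).to_bytes(nbytes, "little")
            status = 0
        # Simplifying assumption: every attempt leaves the monitor clear.
        self.monitor = None
        return status
```

In this model a caller would typically loop: load exclusively, compute the new value, attempt the exclusive store, and retry while the returned status is 1.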

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STLXRH

Store-Release Exclusive Register Halfword stores a halfword from a 32-bit register to memory if the PE has exclusive
access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was
performed. See Synchronization and semaphores. The memory access is atomic. The instruction also has memory
ordering semantics as described in Load-Acquire, Store-Release. For information about memory accesses, see
Load/Store addressing modes.
Bits:   31-30  29-24   23  22  21  20-16  15  14-10    9-5  4-0
Value:  01     001000  0   0   0   Rs     1   (11111)  Rn   Rt
Field:  size               L              o0  Rt2

STLXRH <Ws>, <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release

boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STLXRH.

Assembler Symbols

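<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is
written, encoded in the "Rs" field. The value returned is:
0 If the operation updates memory.
1 If the operation fails to update memory.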

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Aborts and alignment

If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
• <Ws> is not updated.
A non halfword-aligned memory address causes an Alignment fault Data Abort exception to be generated, subject to
the following rules:
• If AArch64.ExclusiveMonitorsPass() returns TRUE, the exception is generated.
• Otherwise, it is IMPLEMENTATION DEFINED whether the exception is generated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];

if rt_unknown then
    data = bits(16) UNKNOWN;
else
    data = X[t];

bit status = '1';

// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, 2) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, 2, AccType_ORDEREDATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STNP

Store Pair of Registers, with non-temporal hint, calculates an address from a base register value and an immediate
offset, and stores two 32-bit words or two 64-bit doublewords to the calculated address, from two registers. For
information about memory accesses, see Load/Store addressing modes. For information about Non-temporal pair
instructions, see Load/Store Non-temporal pair.
Bits:   31-30  29-27  26  25-23  22  21-15  14-10  9-5  4-0
Value:  x0     101    0   000    0   imm7   Rt2    Rn   Rt
Field:  opc                      L

32-bit (opc == 00)

STNP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]

64-bit (opc == 10)

STNP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

// Empty.

Assembler Symbols

Shared Decode

Operation

bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data1 = X[t];
data2 = X[t2];
Mem[address, dbytes, AccType_STREAM] = data1;
Mem[address+dbytes, dbytes, AccType_STREAM] = data2;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STP

Store Pair of Registers calculates an address from a base register value and an immediate offset, and stores two 32-bit
words or two 64-bit doublewords to the calculated address, from two registers. For information about memory
accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset.

Post-index

Bits:   31-30  29-27  26  25-23  22  21-15  14-10  9-5  4-0
Value:  x0     101    0   001    0   imm7   Rt2    Rn   Rt
Field:  opc                      L

32-bit (opc == 00)

STP <Wt1>, <Wt2>, [<Xn|SP>], #<imm>

64-bit (opc == 10)

STP <Xt1>, <Xt2>, [<Xn|SP>], #<imm>

boolean wback = TRUE;

boolean postindex = TRUE;

Pre-index

Bits:   31-30  29-27  26  25-23  22  21-15  14-10  9-5  4-0
Value:  x0     101    0   011    0   imm7   Rt2    Rn   Rt
Field:  opc                      L

32-bit (opc == 00)

STP <Wt1>, <Wt2>, [<Xn|SP>, #<imm>]!

64-bit (opc == 10)

STP <Xt1>, <Xt2>, [<Xn|SP>, #<imm>]!

boolean wback = TRUE;

boolean postindex = FALSE;

Signed offset

Bits:   31-30  29-27  26  25-23  22  21-15  14-10  9-5  4-0
Value:  x0     101    0   010    0   imm7   Rt2    Rn   Rt
Field:  opc                      L

32-bit (opc == 00)

STP <Wt1>, <Wt2>, [<Xn|SP>{, #<imm>}]

64-bit (opc == 10)

STP <Xt1>, <Xt2>, [<Xn|SP>{, #<imm>}]

boolean wback = FALSE;

boolean postindex = FALSE;

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STP.

Assembler Symbols

<Wt1> Is the 32-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xt1> Is the 64-bit name of the first general-purpose register to be transferred, encoded in the "Rt" field.
<Xt2> Is the 64-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit post-index and 32-bit pre-index variant: is the signed immediate byte offset, a multiple of
4 in the range -256 to 252, encoded in the "imm7" field as <imm>/4.
For the 32-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the
range -256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit post-index and 64-bit pre-index variant: is the signed immediate byte offset, a multiple of
8 in the range -512 to 504, encoded in the "imm7" field as <imm>/8.
For the 64-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 8 in the
range -512 to 504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
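
The scaling rules for <imm> can be sketched in a few lines of Python. The helper names `encode_stp_imm7` and `decode_stp_offset` are hypothetical, introduced only for illustration; the arithmetic follows the ranges above and the Shared Decode expression `LSL(SignExtend(imm7, 64), scale)`.

```python
def encode_stp_imm7(imm, is_64bit):
    """Return the 7-bit imm7 field for an STP byte offset."""
    scale = 3 if is_64bit else 2          # scale = 2 + UInt(opc<1>)
    step = 1 << scale                     # 4 for 32-bit, 8 for 64-bit
    if imm % step != 0:
        raise ValueError(f"offset must be a multiple of {step}")
    scaled = imm // step
    if not -64 <= scaled <= 63:
        raise ValueError("offset out of range for imm7")
    return scaled & 0x7F                  # two's complement in 7 bits


def decode_stp_offset(imm7, is_64bit):
    """Return the signed byte offset represented by an imm7 field."""
    scale = 3 if is_64bit else 2
    signed = imm7 - 0x80 if imm7 & 0x40 else imm7   # SignExtend(imm7)
    return signed << scale                           # LSL by scale
```

For example, a 64-bit offset of -16 is encoded as imm7 = 0x7E, and decoding 0x7E for the 64-bit variant gives -16 again.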

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if L:opc<0> == '01' || opc == '11' then UNDEFINED;
integer scale = 2 + UInt(opc<1>);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = wback || n != 31;

boolean rt_unknown = FALSE;

if wback && (t == n || t2 == n) && n != 31 then
    Constraint c = ConstrainUnpredictable(Unpredictable_WBOVERLAPST);
    assert c IN {Constraint_NONE, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_NONE rt_unknown = FALSE; // value stored is pre-writeback
        when Constraint_UNKNOWN rt_unknown = TRUE; // value stored is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();

Operation

bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

if rt_unknown && t == n then
    data1 = bits(datasize) UNKNOWN;
else
    data1 = X[t];
if rt_unknown && t2 == n then
    data2 = bits(datasize) UNKNOWN;
else
    data2 = X[t2];
if HaveLSE2Ext() then
    bits(2*datasize) full_data;
    if BigEndian(AccType_NORMAL) then
        full_data = data1:data2;
    else
        full_data = data2:data1;
    Mem[address, 2*dbytes, AccType_NORMAL, TRUE] = full_data;
else
    Mem[address, dbytes, AccType_NORMAL] = data1;
    Mem[address+dbytes, dbytes, AccType_NORMAL] = data2;

if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
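
The difference between the three addressing forms is easiest to see in a small executable model. The following is a sketch, not the architectural definition: the function `store_pair`, the `regs` dictionary, and the `bytearray` memory are hypothetical, and the model assumes little-endian data and ignores SP handling, tag checking, alignment, and the UNKNOWN cases.

```python
MASK64 = (1 << 64) - 1


def store_pair(regs, mem, n, t, t2, offset, datasize, wback, postindex):
    """Toy little-endian model of the STP Operation pseudocode.

    regs maps a register number to a 64-bit value; mem is a bytearray.
    """
    dbytes = datasize // 8
    mask = (1 << datasize) - 1
    address = regs[n]
    if not postindex:
        # Pre-index and signed offset: apply the offset before the access.
        address = (address + offset) & MASK64
    data1 = regs[t] & mask
    data2 = regs[t2] & mask
    mem[address:address + dbytes] = data1.to_bytes(dbytes, "little")
    mem[address + dbytes:address + 2 * dbytes] = data2.to_bytes(dbytes, "little")
    if wback:
        if postindex:
            # Post-index: the access used the old base, now advance it.
            address = (address + offset) & MASK64
        regs[n] = address
```

With `wback=False` and `postindex=False` the base register is left unchanged, which corresponds to the signed offset form.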

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STR (immediate)

Store Register (immediate) stores a word or a doubleword from a register to memory. The address that is used for the
store is calculated from a base register and an immediate offset. For information about memory accesses, see
Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset.

Post-index

Bits:   31-30  29-27  26  25-24  23-22  21  20-12  11-10  9-5  4-0
Value:  1x     111    0   00     00     0   imm9   01     Rn   Rt
Field:  size                     opc

32-bit (size == 10)

STR <Wt>, [<Xn|SP>], #<simm>

64-bit (size == 11)

STR <Xt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
integer scale = UInt(size);
bits(64) offset = SignExtend(imm9, 64);

Pre-index

Bits:   31-30  29-27  26  25-24  23-22  21  20-12  11-10  9-5  4-0
Value:  1x     111    0   00     00     0   imm9   11     Rn   Rt
Field:  size                     opc

32-bit (size == 10)

STR <Wt>, [<Xn|SP>, #<simm>]!

64-bit (size == 11)

STR <Xt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
integer scale = UInt(size);
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

Bits:   31-30  29-27  26  25-24  23-22  21-10  9-5  4-0
Value:  1x     111    0   01     00     imm12  Rn   Rt
Field:  size                     opc

32-bit (size == 10)

STR <Wt>, [<Xn|SP>{, #<pimm>}]

64-bit (size == 11)

STR <Xt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
integer scale = UInt(size);
bits(64) offset = LSL(ZeroExtend(imm12, 64), scale);
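
As an illustration of how the unsigned offset fields fit together, the sketch below assembles the 32-bit instruction word from the encoding diagram and the decode expression `LSL(ZeroExtend(imm12, 64), scale)`. The function name `encode_str_unsigned_offset` and its parameters are hypothetical; register numbers are passed as plain integers, with 31 standing for SP in the base position.

```python
def encode_str_unsigned_offset(rt, rn, pimm=0, is_64bit=True):
    """Assemble STR <Wt|Xt>, [<Xn|SP>{, #<pimm>}] into a 32-bit word."""
    size = 0b11 if is_64bit else 0b10
    scale = size                              # scale = UInt(size)
    if pimm % (1 << scale) != 0:
        raise ValueError("pimm must be a multiple of the access size")
    imm12 = pimm >> scale                     # inverse of the LSL in decode
    if not 0 <= imm12 <= 0xFFF:
        raise ValueError("pimm out of range for imm12")
    if not (0 <= rt <= 31 and 0 <= rn <= 31):
        raise ValueError("register number out of range")
    # size in bits 31-30, fixed bits 29-24 = 111001, opc (23-22) = 00.
    return (size << 30) | (0b111001 << 24) | (imm12 << 10) | (rn << 5) | rt
```

Because the offset is stored scaled, the 64-bit form reaches byte offsets up to 8 * 4095 and the 32-bit form up to 4 * 4095.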

Assembler Symbols

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

integer datasize = 8 << scale;

boolean tag_checked = wback || n != 31;

boolean rt_unknown = FALSE;

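if wback && n == t && n != 31 then
    Constraint c = ConstrainUnpredictable(Unpredictable_WBOVERLAPST);
    assert c IN {Constraint_NONE, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_NONE rt_unknown = FALSE; // value stored is pre-writeback
        when Constraint_UNKNOWN rt_unknown = TRUE; // value stored is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();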

Operation

bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

if rt_unknown then
    data = bits(datasize) UNKNOWN;
else
    data = X[t];
Mem[address, datasize DIV 8, AccType_NORMAL] = data;

if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STR (register)

Store Register (register) calculates an address from a base register value and an offset register value, and stores a
32-bit word or a 64-bit doubleword to the calculated address, from a register. For information about memory accesses,
see Load/Store addressing modes.
The instruction uses an offset addressing mode that calculates the address used for the memory access from a base
register value and an offset register value. The offset can be optionally shifted and extended.
Bits:   31-30  29-27  26  25-24  23-22  21  20-16  15-13  12  11-10  9-5  4-0
Value:  1x     111    0   00     00     1   Rm     option S   10     Rn   Rt
Field:  size                     opc

32-bit (size == 10)

STR <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

64-bit (size == 11)

STR <Xt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

integer scale = UInt(size);

if option<1> == '0' then UNDEFINED; // sub-word index
ExtendType extend_type = DecodeRegExtend(option);
integer shift = if S == '1' then scale else 0;

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.

<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:

option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX

<amount> For the 32-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #2

For the 64-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #3

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);

integer datasize = 8 << scale;

Operation

bits(64) offset = ExtendReg(m, extend_type, shift);

bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, datasize DIV 8, AccType_NORMAL] = data;
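
The offset computed by `ExtendReg(m, extend_type, shift)` can be approximated with the following sketch. The function `extend_reg_offset` is a hypothetical stand-in that takes the index register value directly and covers only the four option values listed in the <extend> table; it is an illustration of the extend-then-shift behavior, not the shared pseudocode itself.

```python
MASK64 = (1 << 64) - 1


def extend_reg_offset(rm_value, option, shift):
    """Toy model of the index offset for the four legal option values."""
    if option in (0b010, 0b110):               # UXTW / SXTW use the low 32 bits
        value = rm_value & 0xFFFFFFFF
        if option == 0b110 and value & 0x80000000:
            value -= 1 << 32                   # sign-extend from bit 31
    elif option in (0b011, 0b111):             # LSL / SXTX use all 64 bits
        value = rm_value & MASK64
    else:
        raise ValueError("option<1> == 0 is UNDEFINED for this instruction")
    # Apply the optional shift and wrap to 64 bits.
    return (value << shift) & MASK64
```

The store address is then the base register value plus this offset, modulo 2^64.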

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STRB (immediate)

Store Register Byte (immediate) stores the least significant byte of a 32-bit register to memory. The address that is
used for the store is calculated from a base register and an immediate offset. For information about memory accesses,
see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset.

Post-index

Bits:   31-30  29-27  26  25-24  23-22  21  20-12  11-10  9-5  4-0
Value:  00     111    0   00     00     0   imm9   01     Rn   Rt
Field:  size                     opc

STRB <Wt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
bits(64) offset = SignExtend(imm9, 64);

Pre-index

Bits:   31-30  29-27  26  25-24  23-22  21  20-12  11-10  9-5  4-0
Value:  00     111    0   00     00     0   imm9   11     Rn   Rt
Field:  size                     opc

STRB <Wt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

Bits:   31-30  29-27  26  25-24  23-22  21-10  9-5  4-0
Value:  00     111    0   01     00     imm12  Rn   Rt
Field:  size                     opc

STRB <Wt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
bits(64) offset = LSL(ZeroExtend(imm12, 64), 0);

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STRB (immediate).

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to 0 and encoded in the
"imm12" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = wback || n != 31;

boolean rt_unknown = FALSE;

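if wback && n == t && n != 31 then
    Constraint c = ConstrainUnpredictable(Unpredictable_WBOVERLAPST);
    assert c IN {Constraint_NONE, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_NONE rt_unknown = FALSE; // value stored is pre-writeback
        when Constraint_UNKNOWN rt_unknown = TRUE; // value stored is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();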

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

if rt_unknown then
    data = bits(8) UNKNOWN;
else
    data = X[t];
Mem[address, 1, AccType_NORMAL] = data;

if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STRB (register)

Store Register Byte (register) calculates an address from a base register value and an offset register value, and stores
a byte from a 32-bit register to the calculated address. For information about memory accesses, see Load/Store
addressing modes.
The instruction uses an offset addressing mode that calculates the address used for the memory access from a base
register value and an offset register value. The offset can be optionally shifted and extended.
Bits:   31-30  29-27  26  25-24  23-22  21  20-16  15-13  12  11-10  9-5  4-0
Value:  00     111    0   00     00     1   Rm     option S   10     Rn   Rt
Field:  size                     opc

Extended register (option != 011)

STRB <Wt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]

Shifted register (option == 011)

STRB <Wt>, [<Xn|SP>, <Xm>{, LSL <amount>}]

if option<1> == '0' then UNDEFINED; // sub-word index

ExtendType extend_type = DecodeRegExtend(option);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.

<extend> Is the index extend specifier, encoded in “option”:

option <extend>
010 UXTW
110 SXTW
111 SXTX

<amount> Is the index shift amount, which must be #0. It is encoded in "S" as 0 if omitted, or as 1 if present.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);

Operation

bits(64) offset = ExtendReg(m, extend_type, 0);

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, 1, AccType_NORMAL] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STRH (immediate)

Store Register Halfword (immediate) stores the least significant halfword of a 32-bit register to memory. The address
that is used for the store is calculated from a base register and an immediate offset. For information about memory
accesses, see Load/Store addressing modes.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset.

Post-index

Bits:   31-30  29-27  26  25-24  23-22  21  20-12  11-10  9-5  4-0
Value:  01     111    0   00     00     0   imm9   01     Rn   Rt
Field:  size                     opc

STRH <Wt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
bits(64) offset = SignExtend(imm9, 64);

Pre-index

Bits:   31-30  29-27  26  25-24  23-22  21  20-12  11-10  9-5  4-0
Value:  01     111    0   00     00     0   imm9   11     Rn   Rt
Field:  size                     opc

STRH <Wt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

Bits:   31-30  29-27  26  25-24  23-22  21-10  9-5  4-0
Value:  01     111    0   01     00     imm12  Rn   Rt
Field:  size                     opc

STRH <Wt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
bits(64) offset = LSL(ZeroExtend(imm12, 64), 1);

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STRH (immediate).

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> Is the optional positive immediate byte offset, a multiple of 2 in the range 0 to 8190, defaulting to 0 and
encoded in the "imm12" field as <pimm>/2.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = wback || n != 31;

boolean rt_unknown = FALSE;

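if wback && n == t && n != 31 then
    Constraint c = ConstrainUnpredictable(Unpredictable_WBOVERLAPST);
    assert c IN {Constraint_NONE, Constraint_UNKNOWN, Constraint_UNDEF, Constraint_NOP};
    case c of
        when Constraint_NONE rt_unknown = FALSE; // value stored is pre-writeback
        when Constraint_UNKNOWN rt_unknown = TRUE; // value stored is UNKNOWN
        when Constraint_UNDEF UNDEFINED;
        when Constraint_NOP EndOfInstruction();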

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

if rt_unknown then
    data = bits(16) UNKNOWN;
else
    data = X[t];
Mem[address, 2, AccType_NORMAL] = data;

if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STRH (register)

Store Register Halfword (register) calculates an address from a base register value and an offset register value, and
stores a halfword from a 32-bit register to the calculated address. For information about memory accesses, see
Load/Store addressing modes.
The instruction uses an offset addressing mode that calculates the address used for the memory access from a base
register value and an offset register value. The offset can be optionally shifted and extended.
Bits:   31-30  29-27  26  25-24  23-22  21  20-16  15-13  12  11-10  9-5  4-0
Value:  01     111    0   00     00     1   Rm     option S   10     Rn   Rt
Field:  size                     opc

STRH <Wt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

if option<1> == '0' then UNDEFINED; // sub-word index

ExtendType extend_type = DecodeRegExtend(option);
integer shift = if S == '1' then 1 else 0;

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Wm> When option<0> is set to 0, is the 32-bit name of the general-purpose index register, encoded in the
"Rm" field.
<Xm> When option<0> is set to 1, is the 64-bit name of the general-purpose index register, encoded in the
"Rm" field.

<extend> Is the index extend/shift specifier, defaulting to LSL, and which must be omitted for the LSL option
when <amount> is omitted. It is encoded in “option”:

option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX

<amount> Is the index shift amount, optional only when <extend> is not LSL. Where it is permitted to be optional,
it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #1

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);

Operation

bits(64) offset = ExtendReg(m, extend_type, shift);

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(TRUE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, 2, AccType_NORMAL] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STSET, STSETL

Atomic bit set on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit doubleword
from memory, performs a bitwise OR of the loaded value with the value held in a register, and stores the result back
to memory.
• STSET does not have release semantics.
• STSETL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDSET, LDSETA, LDSETAL, LDSETL. This means:

• The encodings in this description are named to match the encodings of LDSET, LDSETA, LDSETAL, LDSETL.
• The description of LDSET, LDSETA, LDSETAL, LDSETL gives the operational pseudocode for this instruction.

Integer
(FEAT_LSE)

Bits:   31-30  29-27  26  25-24  23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value:  1x     111    0   00     0   R   1   Rs     0   011    00     Rn   11111
Field:  size                     A                      opc                Rt

32-bit LDSET alias (size == 10 && R == 0)

STSET <Ws>, [<Xn|SP>]

is equivalent to

LDSET <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

32-bit LDSETL alias (size == 10 && R == 1)

STSETL <Ws>, [<Xn|SP>]

is equivalent to

LDSETL <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDSET alias (size == 11 && R == 0)

STSET <Xs>, [<Xn|SP>]

is equivalent to

LDSET <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDSETL alias (size == 11 && R == 1)

STSETL <Xs>, [<Xn|SP>]

is equivalent to

LDSETL <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDSET, LDSETA, LDSETAL, LDSETL gives the operational pseudocode for this instruction.
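
The alias relationship can be made concrete with a short sketch. The functions `encode_ldset`, `encode_stset`, and `atomic_set` are hypothetical names introduced for illustration; the bit positions are read from the encoding diagram above with A fixed at 0, and the memory model is a plain little-endian `bytearray` that does not model atomicity or ordering.

```python
def encode_ldset(rs, rt, rn, is_64bit=False, release=False):
    """Assemble the LDSET/LDSETL word (A == 0) from its fields."""
    size = 0b11 if is_64bit else 0b10
    return ((size << 30) | (0b111000 << 24) | (int(release) << 22)
            | (1 << 21) | (rs << 16) | (0b011 << 12) | (rn << 5) | rt)


def encode_stset(rs, rn, is_64bit=False, release=False):
    """STSET/STSETL is LDSET/LDSETL with Rt set to the zero register (31)."""
    return encode_ldset(rs, 31, rn, is_64bit, release)


def atomic_set(mem, address, value, nbytes):
    """Model the memory effect: OR the register value into memory."""
    mask = (1 << (8 * nbytes)) - 1
    old = int.from_bytes(mem[address:address + nbytes], "little")
    new = (old | value) & mask
    mem[address:address + nbytes] = new.to_bytes(nbytes, "little")
```

Because the destination is the zero register, the loaded value is discarded and only the memory update is visible, which is why the alias is described as "without return".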

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STSETB, STSETLB

Atomic bit set on byte in memory, without return, atomically loads an 8-bit byte from memory, performs a bitwise OR
of the loaded value with the value held in a register, and stores the result back to memory.
• STSETB does not have release semantics.
• STSETLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDSETB, LDSETAB, LDSETALB, LDSETLB. This means:

• The encodings in this description are named to match the encodings of LDSETB, LDSETAB, LDSETALB,
LDSETLB.
• The description of LDSETB, LDSETAB, LDSETALB, LDSETLB gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

Bits:   31-30  29-27  26  25-24  23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value:  00     111    0   00     0   R   1   Rs     0   011    00     Rn   11111
Field:  size                     A                      opc                Rt

No memory ordering (R == 0)

STSETB <Ws>, [<Xn|SP>]

is equivalent to

LDSETB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STSETLB <Ws>, [<Xn|SP>]

is equivalent to

LDSETLB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDSETB, LDSETAB, LDSETALB, LDSETLB gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STSETH, STSETLH

Atomic bit set on halfword in memory, without return, atomically loads a 16-bit halfword from memory, performs a
bitwise OR of the loaded value with the value held in a register, and stores the result back to memory.
• STSETH does not have release semantics.
• STSETLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDSETH, LDSETAH, LDSETALH, LDSETLH. This means:

• The encodings in this description are named to match the encodings of LDSETH, LDSETAH, LDSETALH,
LDSETLH.
• The description of LDSETH, LDSETAH, LDSETALH, LDSETLH gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

Bits:   31-30  29-27  26  25-24  23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value:  01     111    0   00     0   R   1   Rs     0   011    00     Rn   11111
Field:  size                     A                      opc                Rt

No memory ordering (R == 0)

STSETH <Ws>, [<Xn|SP>]

is equivalent to

LDSETH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STSETLH <Ws>, [<Xn|SP>]

is equivalent to

LDSETLH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDSETH, LDSETAH, LDSETALH, LDSETLH gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STSMAX, STSMAXL

Atomic signed maximum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the larger value back to memory,
treating the values as signed numbers.
• STSMAX does not have release semantics.
• STSMAXL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL. This means:

• The encodings in this description are named to match the encodings of LDSMAX, LDSMAXA, LDSMAXAL,
LDSMAXL.
• The description of LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

Bits:   31-30  29-27  26  25-24  23  22  21  20-16  15  14-12  11-10  9-5  4-0
Value:  1x     111    0   00     0   R   1   Rs     0   100    00     Rn   11111
Field:  size                     A                      opc                Rt

32-bit LDSMAX alias (size == 10 && R == 0)

STSMAX <Ws>, [<Xn|SP>]

is equivalent to

LDSMAX <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

32-bit LDSMAXL alias (size == 10 && R == 1)

STSMAXL <Ws>, [<Xn|SP>]

is equivalent to

LDSMAXL <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDSMAX alias (size == 11 && R == 0)

STSMAX <Xs>, [<Xn|SP>]

is equivalent to

LDSMAX <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDSMAXL alias (size == 11 && R == 1)

STSMAXL <Xs>, [<Xn|SP>]

is equivalent to

LDSMAXL <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.
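
The alias relationship can be checked by assembling the instruction word from the fields in the encoding diagram. The sketch below is an illustration only: `encode_ldsmax` and `encode_stsmax` are hypothetical helper names, and the bit positions are the ones shown in the diagram above (size at 31:30, A at 23, R at 22, Rs at 20:16, opc at 14:12, Rn at 9:5, Rt at 4:0).

```python
OPC_SMAX = 0b100   # opc field value shown in the diagram for this instruction
ZERO_REG = 31      # register number 31 names WZR/XZR in the Rt field


def encode_ldsmax(rs, rt, rn, size=0b10, a=0, r=0):
    """Assemble an LDSMAX-family word from its fields."""
    return (
        (size << 30) | (0b111 << 27) | (a << 23) | (r << 22) | (1 << 21)
        | (rs << 16) | (OPC_SMAX << 12) | (rn << 5) | rt
    )


def encode_stsmax(rs, rn, size=0b10, r=0):
    """STSMAX/STSMAXL is the LDSMAX encoding with A == 0 and Rt == 11111."""
    return encode_ldsmax(rs, ZERO_REG, rn, size=size, a=0, r=r)


# STSMAX W1, [X2] and LDSMAX W1, WZR, [X2] produce the same word.
word = encode_stsmax(rs=1, rn=2)
same = word == encode_ldsmax(rs=1, rt=31, rn=2)
```

Setting `r=1` gives the STSMAXL form, and `size=0b11` gives the 64-bit form.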

Assembler Symbols

Operation

The description of LDSMAX, LDSMAXA, LDSMAXAL, LDSMAXL gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STSMAXB, STSMAXLB

Atomic signed maximum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it
against the value held in a register, and stores the larger value back to memory, treating the values as signed
numbers.
• STSMAXB does not have release semantics.
• STSMAXLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB. This means:

• The encodings in this description are named to match the encodings of LDSMAXB, LDSMAXAB, LDSMAXALB,
LDSMAXLB.
• The description of LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB gives the operational pseudocode for
this instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  00     111    0   00     0   R   1   Rs     0   100    00     Rn   11111
Field  size                     A                      opc                Rt

No memory ordering (R == 0)

STSMAXB <Ws>, [<Xn|SP>]

is equivalent to

LDSMAXB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STSMAXLB <Ws>, [<Xn|SP>]

is equivalent to

LDSMAXLB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDSMAXB, LDSMAXAB, LDSMAXALB, LDSMAXLB gives the operational pseudocode for this
instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STSMAXH, STSMAXLH

Atomic signed maximum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the larger value back to memory, treating the values as
signed numbers.
• STSMAXH does not have release semantics.
• STSMAXLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH. This means:

• The encodings in this description are named to match the encodings of LDSMAXH, LDSMAXAH,
LDSMAXALH, LDSMAXLH.
• The description of LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH gives the operational pseudocode for
this instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  01     111    0   00     0   R   1   Rs     0   100    00     Rn   11111
Field  size                     A                      opc                Rt

No memory ordering (R == 0)

STSMAXH <Ws>, [<Xn|SP>]

is equivalent to

LDSMAXH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STSMAXLH <Ws>, [<Xn|SP>]

is equivalent to

LDSMAXLH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDSMAXH, LDSMAXAH, LDSMAXALH, LDSMAXLH gives the operational pseudocode for this
instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STSMIN, STSMINL

Atomic signed minimum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the smaller value back to
memory, treating the values as signed numbers.
• STSMIN does not have release semantics.
• STSMINL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDSMIN, LDSMINA, LDSMINAL, LDSMINL. This means:

• The encodings in this description are named to match the encodings of LDSMIN, LDSMINA, LDSMINAL,
LDSMINL.
• The description of LDSMIN, LDSMINA, LDSMINAL, LDSMINL gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  1x     111    0   00     0   R   1   Rs     0   101    00     Rn   11111
Field  size                     A                      opc                Rt

32-bit LDSMIN alias (size == 10 && R == 0)

STSMIN <Ws>, [<Xn|SP>]

is equivalent to

LDSMIN <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

32-bit LDSMINL alias (size == 10 && R == 1)

STSMINL <Ws>, [<Xn|SP>]

is equivalent to

LDSMINL <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDSMIN alias (size == 11 && R == 0)

STSMIN <Xs>, [<Xn|SP>]

is equivalent to

LDSMIN <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDSMINL alias (size == 11 && R == 1)

STSMINL <Xs>, [<Xn|SP>]

is equivalent to

LDSMINL <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDSMIN, LDSMINA, LDSMINAL, LDSMINL gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STSMINB, STSMINLB

Atomic signed minimum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it
against the value held in a register, and stores the smaller value back to memory, treating the values as signed
numbers.
• STSMINB does not have release semantics.
• STSMINLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB. This means:

• The encodings in this description are named to match the encodings of LDSMINB, LDSMINAB, LDSMINALB,
LDSMINLB.
• The description of LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB gives the operational pseudocode for
this instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  00     111    0   00     0   R   1   Rs     0   101    00     Rn   11111
Field  size                     A                      opc                Rt

No memory ordering (R == 0)

STSMINB <Ws>, [<Xn|SP>]

is equivalent to

LDSMINB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STSMINLB <Ws>, [<Xn|SP>]

is equivalent to

LDSMINLB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDSMINB, LDSMINAB, LDSMINALB, LDSMINLB gives the operational pseudocode for this
instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STSMINH, STSMINLH

Atomic signed minimum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the smaller value back to memory, treating the values as
signed numbers.
• STSMINH does not have release semantics.
• STSMINLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH. This means:

• The encodings in this description are named to match the encodings of LDSMINH, LDSMINAH, LDSMINALH,
LDSMINLH.
• The description of LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH gives the operational pseudocode for
this instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  01     111    0   00     0   R   1   Rs     0   101    00     Rn   11111
Field  size                     A                      opc                Rt

No memory ordering (R == 0)

STSMINH <Ws>, [<Xn|SP>]

is equivalent to

LDSMINH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STSMINLH <Ws>, [<Xn|SP>]

is equivalent to

LDSMINLH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDSMINH, LDSMINAH, LDSMINALH, LDSMINLH gives the operational pseudocode for this
instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STTR

Store Register (unprivileged) stores a word or doubleword from a register to memory. The address that is used for the
store is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.

Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  1x     111    0   00     00     0   imm9   10     Rn   Rt
Field  size                     opc

32-bit (size == 10)

STTR <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (size == 11)

STTR <Xt>, [<Xn|SP>{, #<simm>}]

integer scale = UInt(size);

bits(64) offset = SignExtend(imm9, 64);

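Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.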

Shared Decode

user_access_override = HaveUAOExt() && PSTATE.UAO == '1';

if !user_access_override && (unpriv_at_el1 || unpriv_at_el2) then
    acctype = AccType_UNPRIV;
else
    acctype = AccType_NORMAL;

integer datasize = 8 << scale;

boolean tag_checked = n != 31;
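
The access-type selection can be paraphrased in Python as follows. This is a simplified sketch based only on the prose description of STTR: the function name `sttr_acctype` and its parameters are invented here, the arguments are assumed to be the Effective values of the named controls, and any further conditions applied by the full pseudocode are not modelled.

```python
def sttr_acctype(el, uao, e2h=0, tge=0):
    """Return the access type that an STTR access would use in this model.

    el  -- Exception level at which the instruction executes (0 to 3)
    uao -- Effective value of PSTATE.UAO
    e2h, tge -- Effective values of HCR_EL2.E2H and HCR_EL2.TGE
    """
    unpriv_at_el1 = el == 1
    unpriv_at_el2 = el == 2 and e2h == 1 and tge == 1
    user_access_override = uao == 1
    if not user_access_override and (unpriv_at_el1 or unpriv_at_el2):
        return "AccType_UNPRIV"    # behaves as if executed at EL0
    return "AccType_NORMAL"        # restrictions of the current Exception level


# At EL1 with UAO clear the store is treated as an EL0 access.
el1_result = sttr_acctype(el=1, uao=0)
# Setting UAO restores the normal privileged behavior.
uao_result = sttr_acctype(el=1, uao=1)
```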

Operation

bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, datasize DIV 8, acctype] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STTRB

Store Register Byte (unprivileged) stores a byte from a 32-bit register to memory. The address that is used for the
store is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.

Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  00     111    0   00     00     0   imm9   10     Rn   Rt
Field  size                     opc

STTRB <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.
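
To make the <simm> range concrete, the sketch below shows how a signed byte offset might be packed into the 9-bit imm9 field and recovered with the equivalent of SignExtend(imm9, 64). The helper names `encode_imm9` and `sign_extend` are assumptions made for this illustration.

```python
def encode_imm9(simm):
    """Pack a signed offset in the range -256 to 255 into a 9-bit field."""
    if not -256 <= simm <= 255:
        raise ValueError("offset does not fit in imm9")
    return simm & 0x1FF


def sign_extend(value, from_bits, to_bits=64):
    """Return value sign-extended from from_bits to to_bits, as an unsigned int."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):          # top bit set: the value is negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


# An offset of -1 occupies all nine bits of imm9 and extends to all ones.
field = encode_imm9(-1)
offset = sign_extend(field, 9)
```

An offset such as 256 is rejected because it has no 9-bit two's-complement encoding.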

Shared Decode

user_access_override = HaveUAOExt() && PSTATE.UAO == '1';

if !user_access_override && (unpriv_at_el1 || unpriv_at_el2) then
    acctype = AccType_UNPRIV;
else
    acctype = AccType_NORMAL;

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, 1, acctype] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STTRH

Store Register Halfword (unprivileged) stores a halfword from a 32-bit register to memory. The address that is used
for the store is calculated from a base register and an immediate offset.
Memory accesses made by the instruction behave as if the instruction was executed at EL0 if the Effective value of
PSTATE.UAO is 0 and either:
• The instruction is executed at EL1.
• The instruction is executed at EL2 when the Effective value of HCR_EL2.{E2H, TGE} is {1, 1}.
Otherwise, the memory access operates with the restrictions determined by the Exception level at which the
instruction is executed. For information about memory accesses, see Load/Store addressing modes.

Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  01     111    0   00     00     0   imm9   10     Rn   Rt
Field  size                     opc

STTRH <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

user_access_override = HaveUAOExt() && PSTATE.UAO == '1';

if !user_access_override && (unpriv_at_el1 || unpriv_at_el2) then
    acctype = AccType_UNPRIV;
else
    acctype = AccType_NORMAL;

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, 2, acctype] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STUMAX, STUMAXL

Atomic unsigned maximum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the larger value back to memory,
treating the values as unsigned numbers.
• STUMAX does not have release semantics.
• STUMAXL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL. This means:

• The encodings in this description are named to match the encodings of LDUMAX, LDUMAXA, LDUMAXAL,
LDUMAXL.
• The description of LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  1x     111    0   00     0   R   1   Rs     0   110    00     Rn   11111
Field  size                     A                      opc                Rt

32-bit LDUMAX alias (size == 10 && R == 0)

STUMAX <Ws>, [<Xn|SP>]

is equivalent to

LDUMAX <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

32-bit LDUMAXL alias (size == 10 && R == 1)

STUMAXL <Ws>, [<Xn|SP>]

is equivalent to

LDUMAXL <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDUMAX alias (size == 11 && R == 0)

STUMAX <Xs>, [<Xn|SP>]

is equivalent to

LDUMAX <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDUMAXL alias (size == 11 && R == 1)

STUMAXL <Xs>, [<Xn|SP>]

is equivalent to

LDUMAXL <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDUMAX, LDUMAXA, LDUMAXAL, LDUMAXL gives the operational pseudocode for this
instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STUMAXB, STUMAXLB

Atomic unsigned maximum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares
it against the value held in a register, and stores the larger value back to memory, treating the values as unsigned
numbers.
• STUMAXB does not have release semantics.
• STUMAXLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB. This means:

• The encodings in this description are named to match the encodings of LDUMAXB, LDUMAXAB,
LDUMAXALB, LDUMAXLB.
• The description of LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB gives the operational pseudocode for
this instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  00     111    0   00     0   R   1   Rs     0   110    00     Rn   11111
Field  size                     A                      opc                Rt

No memory ordering (R == 0)

STUMAXB <Ws>, [<Xn|SP>]

is equivalent to

LDUMAXB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STUMAXLB <Ws>, [<Xn|SP>]

is equivalent to

LDUMAXLB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDUMAXB, LDUMAXAB, LDUMAXALB, LDUMAXLB gives the operational pseudocode for this
instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STUMAXH, STUMAXLH

Atomic unsigned maximum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the larger value back to memory, treating the values as
unsigned numbers.
• STUMAXH does not have release semantics.
• STUMAXLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH. This means:

• The encodings in this description are named to match the encodings of LDUMAXH, LDUMAXAH,
LDUMAXALH, LDUMAXLH.
• The description of LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH gives the operational pseudocode for
this instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  01     111    0   00     0   R   1   Rs     0   110    00     Rn   11111
Field  size                     A                      opc                Rt

No memory ordering (R == 0)

STUMAXH <Ws>, [<Xn|SP>]

is equivalent to

LDUMAXH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STUMAXLH <Ws>, [<Xn|SP>]

is equivalent to

LDUMAXLH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDUMAXH, LDUMAXAH, LDUMAXALH, LDUMAXLH gives the operational pseudocode for this
instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STUMIN, STUMINL

Atomic unsigned minimum on word or doubleword in memory, without return, atomically loads a 32-bit word or 64-bit
doubleword from memory, compares it against the value held in a register, and stores the smaller value back to
memory, treating the values as unsigned numbers.
• STUMIN does not have release semantics.
• STUMINL stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.
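
The only difference between the unsigned and signed minimum forms is how the two bit patterns are ordered. The sketch below illustrates this with one hypothetical helper, `atomic_min`, which returns the value that would be stored back. It models the comparison only, not the atomic memory access.

```python
def atomic_min(old, new, bits, signed):
    """Return the smaller of two bit patterns, compared as signed or unsigned."""
    mask = (1 << bits) - 1
    old &= mask
    new &= mask

    def order(value):
        # For a signed compare, map the pattern to its two's-complement value.
        if signed and value >> (bits - 1):
            return value - (1 << bits)
        return value

    return min(old, new, key=order)


# 0xFFFFFFFF is the largest unsigned word but is -1 when treated as signed.
unsigned_result = atomic_min(0xFFFFFFFF, 1, 32, signed=False)   # STUMIN-like
signed_result = atomic_min(0xFFFFFFFF, 1, 32, signed=True)      # STSMIN-like
```

With these inputs the unsigned compare selects 1, while the signed compare keeps 0xFFFFFFFF.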

This is an alias of LDUMIN, LDUMINA, LDUMINAL, LDUMINL. This means:

• The encodings in this description are named to match the encodings of LDUMIN, LDUMINA, LDUMINAL,
LDUMINL.
• The description of LDUMIN, LDUMINA, LDUMINAL, LDUMINL gives the operational pseudocode for this
instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  1x     111    0   00     0   R   1   Rs     0   111    00     Rn   11111
Field  size                     A                      opc                Rt

32-bit LDUMIN alias (size == 10 && R == 0)

STUMIN <Ws>, [<Xn|SP>]

is equivalent to

LDUMIN <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

32-bit LDUMINL alias (size == 10 && R == 1)

STUMINL <Ws>, [<Xn|SP>]

is equivalent to

LDUMINL <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDUMIN alias (size == 11 && R == 0)

STUMIN <Xs>, [<Xn|SP>]

is equivalent to

LDUMIN <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

64-bit LDUMINL alias (size == 11 && R == 1)

STUMINL <Xs>, [<Xn|SP>]

is equivalent to

LDUMINL <Xs>, XZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDUMIN, LDUMINA, LDUMINAL, LDUMINL gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STUMINB, STUMINLB

Atomic unsigned minimum on byte in memory, without return, atomically loads an 8-bit byte from memory, compares it
against the value held in a register, and stores the smaller value back to memory, treating the values as unsigned
numbers.
• STUMINB does not have release semantics.
• STUMINLB stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB. This means:

• The encodings in this description are named to match the encodings of LDUMINB, LDUMINAB, LDUMINALB,
LDUMINLB.
• The description of LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB gives the operational pseudocode for
this instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  00     111    0   00     0   R   1   Rs     0   111    00     Rn   11111
Field  size                     A                      opc                Rt

No memory ordering (R == 0)

STUMINB <Ws>, [<Xn|SP>]

is equivalent to

LDUMINB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STUMINLB <Ws>, [<Xn|SP>]

is equivalent to

LDUMINLB <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of LDUMINB, LDUMINAB, LDUMINALB, LDUMINLB gives the operational pseudocode for this
instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STUMINH, STUMINLH

Atomic unsigned minimum on halfword in memory, without return, atomically loads a 16-bit halfword from memory,
compares it against the value held in a register, and stores the smaller value back to memory, treating the values as
unsigned numbers.
• STUMINH does not have release semantics.
• STUMINLH stores to memory with release semantics, as described in Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

This is an alias of LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH. This means:

• The encodings in this description are named to match the encodings of LDUMINH, LDUMINAH,
LDUMINALH, LDUMINLH.
• The description of LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH gives the operational pseudocode for
this instruction.

Integer
(FEAT_LSE)

Bits   31:30  29:27  26  25:24  23  22  21  20:16  15  14:12  11:10  9:5  4:0
Value  01     111    0   00     0   R   1   Rs     0   111    00     Rn   11111
Field  size                     A                      opc                Rt

No memory ordering (R == 0)

STUMINH <Ws>, [<Xn|SP>]

is equivalent to

LDUMINH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.

Release (R == 1)

STUMINLH <Ws>, [<Xn|SP>]

is equivalent to

LDUMINLH <Ws>, WZR, [<Xn|SP>]

and is always the preferred disassembly.
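
Because the store forms are always the preferred disassembly, a disassembler can pick the ST mnemonic whenever A is 0 and Rt is 11111. The following sketch shows one possible decoder for the SET, SMAX, SMIN, UMAX and UMIN store aliases, using the opc values from their encoding diagrams. The function name `disassemble_store_atomic` and the output formatting are assumptions of this sketch, and other opc values are simply reported as unrecognized.

```python
OPS = {0b011: "SET", 0b100: "SMAX", 0b101: "SMIN", 0b110: "UMAX", 0b111: "UMIN"}
SIZE_SUFFIX = {0b00: "B", 0b01: "H", 0b10: "", 0b11: ""}


def disassemble_store_atomic(word):
    """Return the ST<op> alias text for an instruction word, or None."""
    size = (word >> 30) & 0b11
    a = (word >> 23) & 1
    r = (word >> 22) & 1
    rs = (word >> 16) & 0x1F
    opc = (word >> 12) & 0b111
    rn = (word >> 5) & 0x1F
    rt = word & 0x1F
    fixed_ok = (
        ((word >> 24) & 0x3F) == 0b111000    # bits 29:24
        and ((word >> 21) & 1) == 1
        and ((word >> 15) & 1) == 0
        and ((word >> 10) & 0b11) == 0
    )
    # The alias applies only when A == 0 and Rt names the zero register.
    if not fixed_ok or a != 0 or rt != 31 or opc not in OPS:
        return None
    mnemonic = "ST" + OPS[opc] + ("L" if r else "") + SIZE_SUFFIX[size]
    prefix = "X" if size == 0b11 else "W"
    source = prefix + ("ZR" if rs == 31 else str(rs))
    base = "SP" if rn == 31 else "X" + str(rn)
    return f"{mnemonic} {source}, [{base}]"


text = disassemble_store_atomic(0xB821405F)
```

Here the word 0xB821405F has size == 10, R == 0, Rs == 1, opc == 100 and Rn == 2, so it decodes as the 32-bit STSMAX form.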

Assembler Symbols

Operation

The description of LDUMINH, LDUMINAH, LDUMINALH, LDUMINLH gives the operational pseudocode for this
instruction.

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STUR

Store Register (unscaled) calculates an address from a base register value and an immediate offset, and stores a 32-bit
word or a 64-bit doubleword to the calculated address, from a register. For information about memory accesses, see
Load/Store addressing modes.

Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  1x     111    0   00     00     0   imm9   00     Rn   Rt
Field  size                     opc

32-bit (size == 10)

STUR <Wt>, [<Xn|SP>{, #<simm>}]

64-bit (size == 11)

STUR <Xt>, [<Xn|SP>{, #<simm>}]

integer scale = UInt(size);

bits(64) offset = SignExtend(imm9, 64);

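Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xt> Is the 64-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.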

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

integer datasize = 8 << scale;

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, datasize DIV 8, AccType_NORMAL] = data;
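
The decode and operation steps above can be followed in a small Python model. This is a sketch under stated assumptions: `stur` is a hypothetical function, memory is a dictionary from byte address to byte value, data accesses are assumed to be little-endian, and tag checking, SP alignment checking and faults are not modelled.

```python
MASK64 = (1 << 64) - 1


def stur(memory, base, imm9, reg_value, size):
    """Model STUR. `size` is the 2-bit size field (0b10 or 0b11)."""
    datasize = 8 << size                                  # 32 or 64 bits
    offset = imm9 - 0x200 if imm9 & 0x100 else imm9       # SignExtend(imm9, 64)
    address = (base + offset) & MASK64                    # 64-bit wraparound
    data = reg_value & ((1 << datasize) - 1)              # low datasize bits of X[t]
    for i, byte in enumerate(data.to_bytes(datasize // 8, "little")):
        memory[(address + i) & MASK64] = byte
    return address


# 32-bit store of the low word of the register at base - 4.
memory = {}
target = stur(memory, base=0x1000, imm9=0x1FC, reg_value=0x1122334455667788, size=0b10)
```

In this call imm9 is 0x1FC, which sign-extends to -4, so the four bytes 0x88, 0x77, 0x66, 0x55 are written starting at address 0xFFC.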

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STURB

Store Register Byte (unscaled) calculates an address from a base register value and an immediate offset, and stores a
byte to the calculated address, from a 32-bit register. For information about memory accesses, see Load/Store
addressing modes.

Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  00     111    0   00     00     0   imm9   00     Rn   Rt
Field  size                     opc

STURB <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, 1, AccType_NORMAL] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STURH

Store Register Halfword (unscaled) calculates an address from a base register value and an immediate offset, and
stores a halfword to the calculated address, from a 32-bit register. For information about memory accesses, see
Load/Store addressing modes.

Bits   31:30  29:27  26  25:24  23:22  21  20:12  11:10  9:5  4:0
Value  01     111    0   00     00     0   imm9   00     Rn   Rt
Field  size                     opc

STURH <Wt>, [<Xn|SP>{, #<simm>}]

bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the optional signed immediate byte offset, in the range -256 to 255, defaulting to 0 and encoded in
the "imm9" field.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);

boolean tag_checked = n != 31;

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data = X[t];
Mem[address, 2, AccType_NORMAL] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STXP

Store Exclusive Pair of registers stores two 32-bit words or two 64-bit doublewords from two registers to a memory
location if the PE has exclusive access to the memory address, and returns a status value of 0 if the store was
successful, or of 1 if no store was performed. See Synchronization and semaphores. For information on single-copy
atomicity and alignment requirements, see Requirements for single-copy atomicity and Alignment of data accesses. If
a 64-bit pair Store-Exclusive succeeds, it causes a single-copy atomic update of the 128-bit memory location being
updated. For information about memory accesses, see Load/Store addressing modes.

Bits   31  30  29:24   23  22  21  20:16  15  14:10  9:5  4:0
Value  1   sz  001000  0   0   1   Rs     0   Rt2    Rn   Rt
Field                      L              o0

32-bit (sz == 0)

STXP <Ws>, <Wt1>, <Wt2>, [<Xn|SP>{,#0}]

64-bit (sz == 1)

STXP <Ws>, <Xt1>, <Xt2>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2); // ignored by load/store single register
integer s = UInt(Rs); // ignored by all loads and store-release

integer elsize = 32 << UInt(sz);

integer datasize = elsize * 2;
boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STXP.

Assembler Symbols

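<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written, encoded in the "Rs" field. The value returned is:
    0 If the operation updates memory.
    1 If the operation fails to update memory.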

<Wt2> Is the 32-bit name of the second general-purpose register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Aborts and alignment
If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
• <Ws> is not updated.
Accessing an address that is not aligned to the size of the data being accessed causes an Alignment fault Data Abort
exception to be generated, subject to the following rules:
• If AArch64.ExclusiveMonitorsPass() returns TRUE, the exception is generated.
• Otherwise, it is IMPLEMENTATION DEFINED whether the exception is generated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.

Operation

bits(64) address;
bits(datasize) data;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];

if rt_unknown then
    data = bits(datasize) UNKNOWN;
else
    bits(datasize DIV 2) el1 = X[t];
    bits(datasize DIV 2) el2 = X[t2];
    data = if BigEndian(AccType_ATOMIC) then el1:el2 else el2:el1;

bit status = '1';
// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, dbytes) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, dbytes, AccType_ATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);
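
The element ordering chosen by the BigEndian() test above can be illustrated with a small Python sketch. The function names `pack_pair` and `to_memory_bytes` are hypothetical and not part of the architecture pseudocode. The sketch assumes that `a:b` concatenates bitstrings with `a` in the most significant bits.

```python
def pack_pair(el1: int, el2: int, elsize: int, big_endian: bool) -> int:
    """Model `data = if BigEndian(...) then el1:el2 else el2:el1`.

    Assumption: in `a:b`, `a` occupies the most significant bits.
    """
    mask = (1 << elsize) - 1
    el1 &= mask
    el2 &= mask
    if big_endian:
        return (el1 << elsize) | el2
    return (el2 << elsize) | el1


def to_memory_bytes(data: int, elsize: int, big_endian: bool) -> bytes:
    """Lay out the combined 2*elsize-bit value in memory byte order."""
    nbytes = (2 * elsize) // 8
    return data.to_bytes(nbytes, "big" if big_endian else "little")
```

Under these assumptions the element taken from the first register lands at the lower address for either endianness, which is the effect of swapping the concatenation order.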

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STXR

Store Exclusive Register stores a 32-bit word or a 64-bit doubleword from a register to memory if the PE has exclusive
access to the memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was
performed. See Synchronization and semaphores. For information about memory accesses see Load/Store addressing
modes.
Encoding: [31:30] size = 1x | [29:24] 001000 | [23] 0 | [22] L = 0 | [21] 0 | [20:16] Rs | [15] o0 = 0 | [14:10] Rt2 = (1)(1)(1)(1)(1) | [9:5] Rn | [4:0] Rt

32-bit (size == 10)

STXR <Ws>, <Wt>, [<Xn|SP>{,#0}]

64-bit (size == 11)

STXR <Ws>, <Xt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release

integer elsize = 8 << UInt(size);

boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;
boolean rn_unknown = FALSE;

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STXR.

Assembler Symbols

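<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written, encoded in the "Rs" field. The value returned is:
    0 If the operation updates memory.
    1 If the operation fails to update memory.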

Accessing an address that is not aligned to the size of the data being accessed causes an Alignment fault Data Abort
exception to be generated, subject to the following rules:
• If AArch64.ExclusiveMonitorsPass() returns TRUE, the exception is generated.
• Otherwise, it is IMPLEMENTATION DEFINED whether the exception is generated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.

Operation

bits(64) address;
bits(elsize) data;
constant integer dbytes = elsize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];

if rt_unknown then
    data = bits(elsize) UNKNOWN;
else
    data = X[t];

bit status = '1';

// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, dbytes) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, dbytes, AccType_ATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);
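
The status protocol above (0 on success, 1 when no store is performed) is usually consumed by a retry loop. The following Python sketch is a toy model only: `ExclusiveMonitorModel` and `atomic_increment` are hypothetical names, the monitor tracks a single exact (address, size) pair, and real exclusives monitors have IMPLEMENTATION DEFINED behavior that this does not capture.

```python
class ExclusiveMonitorModel:
    """Toy single-PE monitor: one marked (address, size) pair at most."""

    def __init__(self):
        self.memory = {}
        self.marked = None

    def load_exclusive(self, address, size):
        # Models a load-exclusive: mark the location and return its value.
        self.marked = (address, size)
        return self.memory.get(address, 0)

    def clear_exclusive(self):
        # Models anything that clears the monitor between load and store.
        self.marked = None

    def store_exclusive(self, address, size, value):
        # Mirrors the pseudocode: status starts at 1 and becomes 0 only
        # when the monitor check passes and the store is performed.
        status = 1
        if self.marked == (address, size):
            self.memory[address] = value & ((1 << (8 * size)) - 1)
            status = 0
        self.marked = None
        return status


def atomic_increment(model, address, size=4, interfere=None):
    """Retry until the store-exclusive reports status 0."""
    attempts = 0
    while True:
        attempts += 1
        old = model.load_exclusive(address, size)
        if interfere is not None:
            interfere(attempts)  # hypothetical hook to simulate interference
        if model.store_exclusive(address, size, old + 1) == 0:
            return old + 1, attempts
```

In this model a cleared monitor on the first attempt forces exactly one retry, which is the pattern software is expected to handle whenever the status register reads 1.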

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STXRB

Store Exclusive Register Byte stores a byte from a register to memory if the PE has exclusive access to the memory
address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See
Synchronization and semaphores. The memory access is atomic.
For information about memory accesses see Load/Store addressing modes.
Encoding: [31:30] size = 00 | [29:24] 001000 | [23] 0 | [22] L = 0 | [21] 0 | [20:16] Rs | [15] o0 = 0 | [14:10] Rt2 = (1)(1)(1)(1)(1) | [9:5] Rn | [4:0] Rt

STXRB <Ws>, <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release

boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;
boolean rn_unknown = FALSE;

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly STXRB.

Assembler Symbols

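<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written, encoded in the "Rs" field. The value returned is:
    0 If the operation updates memory.
    1 If the operation fails to update memory.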

Operation

bits(64) address;
bits(8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];

if rt_unknown then
    data = bits(8) UNKNOWN;
else
    data = X[t];

bit status = '1';

// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, 1) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, 1, AccType_ATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STXRH

Store Exclusive Register Halfword stores a halfword from a register to memory if the PE has exclusive access to the
memory address, and returns a status value of 0 if the store was successful, or of 1 if no store was performed. See
Synchronization and semaphores. The memory access is atomic.
For information about memory accesses see Load/Store addressing modes.
Encoding: [31:30] size = 01 | [29:24] 001000 | [23] 0 | [22] L = 0 | [21] 0 | [20:16] Rs | [15] o0 = 0 | [14:10] Rt2 = (1)(1)(1)(1)(1) | [9:5] Rn | [4:0] Rt

STXRH <Ws>, <Wt>, [<Xn|SP>{,#0}]

integer n = UInt(Rn);
integer t = UInt(Rt);
integer s = UInt(Rs); // ignored by all loads and store-release

boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;
boolean rn_unknown = FALSE;

Assembler Symbols

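<Ws> Is the 32-bit name of the general-purpose register into which the status result of the store exclusive is written, encoded in the "Rs" field. The value returned is:
    0 If the operation updates memory.
    1 If the operation fails to update memory.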

<Wt> Is the 32-bit name of the general-purpose register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
Aborts and alignment
If a synchronous Data Abort exception is generated by the execution of this instruction:
• Memory is not updated.
• <Ws> is not updated.
A non halfword-aligned memory address causes an Alignment fault Data Abort exception to be generated, subject to
the following rules:
• If AArch64.ExclusiveMonitorsPass() returns TRUE, the exception is generated.
• Otherwise, it is IMPLEMENTATION DEFINED whether the exception is generated.
If AArch64.ExclusiveMonitorsPass() returns FALSE and the memory address, if accessed, would generate a
synchronous Data Abort exception, it is IMPLEMENTATION DEFINED whether the exception is generated.

Operation

bits(64) address;
bits(16) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
elsif rn_unknown then
    address = bits(64) UNKNOWN;
else
    address = X[n];

if rt_unknown then
    data = bits(16) UNKNOWN;
else
    data = X[t];

bit status = '1';

// Check whether the Exclusives monitors are set to include the
// physical memory locations corresponding to virtual address
// range [address, address+dbytes-1].
if AArch64.ExclusiveMonitorsPass(address, 2) then
    // This atomic write will be rejected if it does not refer
    // to the same physical locations after address translation.
    Mem[address, 2, AccType_ATOMIC] = data;
    status = ExclusiveMonitorsStatus();
X[s] = ZeroExtend(status, 32);

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STZ2G

Store Allocation Tags, Zeroing stores an Allocation Tag to two Tag granules of memory, zeroing the associated data
locations. The address used for the store is calculated from the base register and an immediate signed offset scaled by
the Tag granule. The Allocation Tag is calculated from the Logical Address Tag in the source register.
This instruction generates an Unchecked access.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset.

Post-index
(FEAT_MTE)

Encoding: [31:24] 11011001 | [23:22] 11 | [21] 1 | [20:12] imm9 | [11:10] 01 | [9:5] Xn | [4:0] Xt

STZ2G <Xt|SP>, [<Xn|SP>], #<simm>

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = TRUE;
boolean postindex = TRUE;

Pre-index
(FEAT_MTE)

Encoding: [31:24] 11011001 | [23:22] 11 | [21] 1 | [20:12] imm9 | [11:10] 11 | [9:5] Xn | [4:0] Xt

STZ2G <Xt|SP>, [<Xn|SP>, #<simm>]!

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = TRUE;
boolean postindex = FALSE;

Signed offset
(FEAT_MTE)

Encoding: [31:24] 11011001 | [23:22] 11 | [21] 1 | [20:12] imm9 | [11:10] 10 | [9:5] Xn | [4:0] Xt

STZ2G <Xt|SP>, [<Xn|SP>{, #<simm>}]

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = FALSE;
boolean postindex = FALSE;

Assembler Symbols

Operation

bits(64) address;
bits(64) data = if t == 31 then SP[] else X[t];
bits(4) tag = AArch64.AllocationTagFromAddress(data);

SetTagCheckedInstruction(FALSE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

if address != Align(address, TAG_GRANULE) then
    AArch64.Abort(address, AlignmentFault(AccType_NORMAL, TRUE, FALSE));

Mem[address, TAG_GRANULE, AccType_NORMAL] = Zeros(TAG_GRANULE * 8);
Mem[address+TAG_GRANULE, TAG_GRANULE, AccType_NORMAL] = Zeros(TAG_GRANULE * 8);

AArch64.MemTag[address, AccType_NORMAL] = tag;
AArch64.MemTag[address+TAG_GRANULE, AccType_NORMAL] = tag;

if writeback then
    if postindex then
        address = address + offset;

    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
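
The three encoding classes differ only in when the scaled offset is applied and whether the base register is written back. A minimal Python sketch of that address arithmetic follows. The function names are hypothetical, and the sketch assumes a 16-byte Tag granule (LOG2_TAG_GRANULE = 4); it models neither the tag store nor the zeroing of data.

```python
LOG2_TAG_GRANULE = 4            # assumption: 16-byte Tag granule
TAG_GRANULE = 1 << LOG2_TAG_GRANULE
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def stz2g_addresses(base: int, imm9: int, writeback: bool, postindex: bool):
    """Return ([granule0, granule1], new_base) following the Operation steps."""
    offset = sign_extend(imm9, 9) << LOG2_TAG_GRANULE
    address = base
    if not postindex:
        address = (address + offset) & MASK64
    if address % TAG_GRANULE != 0:
        # Stands in for the Alignment fault raised by the pseudocode.
        raise ValueError("address is not aligned to the Tag granule")
    granules = [address, (address + TAG_GRANULE) & MASK64]
    new_base = base
    if writeback:
        new_base = (address + offset) & MASK64 if postindex else address
    return granules, new_base
```

For example, with a base of 0x1000 and imm9 = 2, the post-index form tags the granules at 0x1000 and 0x1010 and then writes back 0x1020, while the pre-index form tags 0x1020 and 0x1030.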

STZG

Store Allocation Tag, Zeroing stores an Allocation Tag to memory, zeroing the associated data location. The address
used for the store is calculated from the base register and an immediate signed offset scaled by the Tag granule. The
Allocation Tag is calculated from the Logical Address Tag in the source register.
This instruction generates an Unchecked access.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset.

Post-index
(FEAT_MTE)

Encoding: [31:24] 11011001 | [23:22] 01 | [21] 1 | [20:12] imm9 | [11:10] 01 | [9:5] Xn | [4:0] Xt

STZG <Xt|SP>, [<Xn|SP>], #<simm>

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = TRUE;
boolean postindex = TRUE;

Pre-index
(FEAT_MTE)

Encoding: [31:24] 11011001 | [23:22] 01 | [21] 1 | [20:12] imm9 | [11:10] 11 | [9:5] Xn | [4:0] Xt

STZG <Xt|SP>, [<Xn|SP>, #<simm>]!

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = TRUE;
boolean postindex = FALSE;

Signed offset
(FEAT_MTE)

Encoding: [31:24] 11011001 | [23:22] 01 | [21] 1 | [20:12] imm9 | [11:10] 10 | [9:5] Xn | [4:0] Xt

STZG <Xt|SP>, [<Xn|SP>{, #<simm>}]

if !HaveMTEExt() then UNDEFINED;

integer n = UInt(Xn);
integer t = UInt(Xt);
bits(64) offset = LSL(SignExtend(imm9, 64), LOG2_TAG_GRANULE);
boolean writeback = FALSE;
boolean postindex = FALSE;

Assembler Symbols

Operation

bits(64) address;

SetTagCheckedInstruction(FALSE);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

if address != Align(address, TAG_GRANULE) then
    AArch64.Abort(address, AlignmentFault(AccType_NORMAL, TRUE, FALSE));

Mem[address, TAG_GRANULE, AccType_NORMAL] = Zeros(TAG_GRANULE * 8);

bits(64) data = if t == 31 then SP[] else X[t];
bits(4) tag = AArch64.AllocationTagFromAddress(data);
AArch64.MemTag[address, AccType_NORMAL] = tag;

if writeback then
    if postindex then
        address = address + offset;

    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

STZGM

Store Tag and Zero Multiple writes a naturally aligned block of N Allocation Tags and stores zero to the associated
data locations, where the size of N is identified in DCZID_EL0.BS, and the Allocation Tag written to address A is taken
from the source register bits<3:0>.
This instruction is UNDEFINED at EL0.
This instruction generates an Unchecked access.

Integer
(FEAT_MTE2)

Encoding: [31:24] 11011001 | [23:22] 00 | [21] 1 | [20:12] 000000000 | [11:10] 00 | [9:5] Xn | [4:0] Xt

STZGM <Xt>, [<Xn|SP>]

if !HaveMTE2Ext() then UNDEFINED;

integer t = UInt(Xt);
integer n = UInt(Xn);

Assembler Symbols

Operation

if PSTATE.EL == EL0 then
    UNDEFINED;

bits(64) data = X[t];
bits(4) tag = data<3:0>;
bits(64) address;
if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

integer size = 4 * (2 ^ (UInt(DCZID_EL0.BS)));
address = Align(address, size);
integer count = size >> LOG2_TAG_GRANULE;

for i = 0 to count-1
    AArch64.MemTag[address, AccType_NORMAL] = tag;
    Mem[address, TAG_GRANULE, AccType_NORMAL] = Zeros(8 * TAG_GRANULE);
    address = address + TAG_GRANULE;
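
The block arithmetic in the Operation above can be checked with a short Python sketch. `stzgm_block` is a hypothetical helper, and the sketch assumes a 16-byte Tag granule (LOG2_TAG_GRANULE = 4); it only lists the granule addresses that the loop would visit.

```python
LOG2_TAG_GRANULE = 4            # assumption: 16-byte Tag granule
TAG_GRANULE = 1 << LOG2_TAG_GRANULE


def stzgm_block(address: int, bs: int):
    """Return (aligned_base, size_in_bytes, granule_addresses).

    `bs` stands for the value read from DCZID_EL0.BS.
    """
    size = 4 * (2 ** bs)                  # block size in bytes
    base = address - (address % size)     # Align(address, size)
    count = size >> LOG2_TAG_GRANULE      # number of Tag granules in the block
    granules = [base + i * TAG_GRANULE for i in range(count)]
    return base, size, granules
```

With bs = 4, for instance, the block is 64 bytes, so an input address of 0x1234 is aligned down to 0x1200 and four granules are tagged and zeroed.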

SUB (extended register)

Subtract (extended register) subtracts a sign or zero-extended register value, followed by an optional left shift
amount, from a register value, and writes the result to the destination register. The argument that is extended from
the <Rm> register can be a byte, halfword, word, or doubleword.
Encoding: [31] sf | [30] op = 1 | [29] S = 0 | [28:24] 01011 | [23:22] 00 | [21] 1 | [20:16] Rm | [15:13] option | [12:10] imm3 | [9:5] Rn | [4:0] Rd

32-bit (sf == 0)

SUB <Wd|WSP>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}

64-bit (sf == 1)

SUB <Xd|SP>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}

Assembler Symbols

<R> Is a width specifier, encoded in “option”:

option <R>
00x W
010 W
x11 X
10x W
110 W

<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.

<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX

If "Rd" or "Rn" is '11111' (WSP) and "option" is '010' then LSL is preferred, but may be omitted when
"imm3" is '000'. In all other cases <extend> is required and must be UXTW when "option" is '010'.

For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

Operation

bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);

operand2 = NOT(operand2);
(result, -) = AddWithCarry(operand1, operand2, '1');

if d == 31 then
    SP[] = result;
else
    X[d] = result;
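
The extend-then-shift behavior described for this instruction can be sketched in Python. The helper names `extend_reg` and `sub_extended` are hypothetical. The sketch models ExtendReg as "extend the low byte, halfword, word, or doubleword, then shift left", which is an assumption based on the instruction description rather than a copy of the shared pseudocode.

```python
EXTEND_WIDTH = {
    "UXTB": 8, "UXTH": 16, "UXTW": 32, "UXTX": 64,
    "SXTB": 8, "SXTH": 16, "SXTW": 32, "SXTX": 64,
}


def extend_reg(value: int, extend: str, shift: int, datasize: int) -> int:
    """Extend the low field of value, shift it left, truncate to datasize."""
    width = min(EXTEND_WIDTH[extend], datasize)
    field = value & ((1 << width) - 1)
    if extend.startswith("S") and field >> (width - 1):
        field -= 1 << width               # sign-extend a negative field
    return (field << shift) & ((1 << datasize) - 1)


def sub_extended(operand1: int, rm_value: int, extend: str,
                 shift: int = 0, datasize: int = 64) -> int:
    """operand1 + NOT(operand2) + 1, as in the Operation pseudocode."""
    mask = (1 << datasize) - 1
    operand2 = extend_reg(rm_value, extend, shift, datasize)
    return (operand1 + (~operand2 & mask) + 1) & mask
```

Subtracting 0xFF with SXTB therefore subtracts -1, whereas UXTB with a shift of 2 subtracts 1020.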

Operational information

SUB (immediate)

Subtract (immediate) subtracts an optionally-shifted immediate value from a register value, and writes the result to
the destination register.
Encoding: [31] sf | [30] op = 1 | [29] S = 0 | [28:23] 100010 | [22] sh | [21:10] imm12 | [9:5] Rn | [4:0] Rd

32-bit (sf == 0)

SUB <Wd|WSP>, <Wn|WSP>, #<imm>{, <shift>}

64-bit (sf == 1)

SUB <Xd|SP>, <Xn|SP>, #<imm>{, <shift>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;

case sh of
    when '0' imm = ZeroExtend(imm12, datasize);
    when '1' imm = ZeroExtend(imm12:Zeros(12), datasize);

Assembler Symbols

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #12

Operation

bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2;

operand2 = NOT(imm);
(result, -) = AddWithCarry(operand1, operand2, '1');

if d == 31 then
    SP[] = result;
else
    X[d] = result;
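
As an illustration of how the sh and imm12 fields combine, the following Python sketch packs and unpacks the instruction word from the encoding shown for this instruction. `encode_sub_imm` and `decode_sub_imm` are hypothetical helper names, and register numbers are passed as plain integers with 31 standing for SP or WSP.

```python
def encode_sub_imm(rd: int, rn: int, imm: int, sf: int = 1) -> int:
    """Build a SUB (immediate) word, choosing LSL #0 or LSL #12 for imm."""
    if 0 <= imm < (1 << 12):
        sh, imm12 = 0, imm
    elif 0 < imm < (1 << 24) and imm & 0xFFF == 0:
        sh, imm12 = 1, imm >> 12
    else:
        raise ValueError("immediate is not encodable as imm12 with LSL #0 or #12")
    return ((sf << 31) | (1 << 30) | (0 << 29) | (0b100010 << 23)
            | (sh << 22) | (imm12 << 10) | (rn << 5) | rd)


def decode_sub_imm(word: int) -> int:
    """Recover the immediate exactly as the `case sh of` decode does."""
    sh = (word >> 22) & 1
    imm12 = (word >> 10) & 0xFFF
    return imm12 << 12 if sh else imm12
```

For instance, `encode_sub_imm(31, 31, 16)` yields 0xD10043FF, the familiar stack adjustment `SUB SP, SP, #16`.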

Operational information

If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:

SUB (shifted register)

Subtract (shifted register) subtracts an optionally-shifted register value from a register value, and writes the result to
the destination register.
This instruction is used by the alias NEG (shifted register).
Encoding: [31] sf | [30] op = 1 | [29] S = 0 | [28:24] 01011 | [23:22] shift | [21] 0 | [20:16] Rm | [15:10] imm6 | [9:5] Rn | [4:0] Rd

32-bit (sf == 0)

SUB <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

SUB <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

if shift == '11' then UNDEFINED;

if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED

Alias Conditions

Alias Is preferred when

NEG (shifted register) Rn == '11111'

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);

operand2 = NOT(operand2);
(result, -) = AddWithCarry(operand1, operand2, '1');

X[d] = result;
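
A small Python sketch of the shifted-operand subtraction follows. `shift_reg` and `sub_shifted` are hypothetical names; the sketch assumes ShiftReg applies LSL, LSR, or ASR to the register value at the operand width, and it treats the RESERVED shift type as an error.

```python
def shift_reg(value: int, shift_type: str, amount: int, datasize: int) -> int:
    """Apply LSL, LSR, or ASR to a datasize-bit register value."""
    mask = (1 << datasize) - 1
    value &= mask
    if shift_type == "LSL":
        return (value << amount) & mask
    if shift_type == "LSR":
        return value >> amount
    if shift_type == "ASR":
        if value >> (datasize - 1):
            value -= 1 << datasize        # reinterpret as a negative number
        return (value >> amount) & mask
    raise ValueError("shift type is RESERVED for this instruction")


def sub_shifted(operand1: int, rm_value: int, shift_type: str = "LSL",
                amount: int = 0, datasize: int = 64) -> int:
    """operand1 + NOT(operand2) + 1, as in the Operation pseudocode."""
    mask = (1 << datasize) - 1
    operand2 = shift_reg(rm_value, shift_type, amount, datasize)
    return (operand1 + (~operand2 & mask) + 1) & mask
```

Passing 0 as `operand1` corresponds to the NEG alias, where Rn is the zero register: `sub_shifted(0, 5)` returns the two's complement of 5.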

Operational information

SUBG

Subtract with Tag subtracts an immediate value scaled by the Tag granule from the address in the source register,
modifies the Logical Address Tag of the address using an immediate value, and writes the result to the destination
register. Tags specified in GCR_EL1.Exclude are excluded from the possible outputs when modifying the Logical
Address Tag.

Integer
(FEAT_MTE)

Encoding: [31:22] 1101000110 | [21:16] uimm6 | [15:14] op3 = (0)(0) | [13:10] uimm4 | [9:5] Xn | [4:0] Xd

SUBG <Xd|SP>, <Xn|SP>, #<uimm6>, #<uimm4>

if !HaveMTEExt() then UNDEFINED;

integer d = UInt(Xd);
integer n = UInt(Xn);
bits(64) offset = LSL(ZeroExtend(uimm6, 64), LOG2_TAG_GRANULE);

Assembler Symbols

Operation

bits(64) operand1 = if n == 31 then SP[] else X[n];

bits(4) start_tag = AArch64.AllocationTagFromAddress(operand1);
bits(16) exclude = GCR_EL1.Exclude;
bits(64) result;
bits(4) rtag;

if AArch64.AllocationTagAccessIsEnabled(AccType_NORMAL) then
    rtag = AArch64.ChooseNonExcludedTag(start_tag, uimm4, exclude);
else
    rtag = '0000';

(result, -) = AddWithCarry(operand1, NOT(offset), '1');

result = AArch64.AddressWithAllocationTag(result, AccType_NORMAL, rtag);

if d == 31 then
    SP[] = result;
else
    X[d] = result;
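
The combined address and tag update can be sketched in Python under some loudly stated assumptions: the Logical Address Tag is taken to occupy bits [59:56], the Tag granule is taken to be 16 bytes, and `choose_non_excluded_tag` is a hypothetical stand-in for AArch64.ChooseNonExcludedTag that steps forward through the tags skipping excluded ones. The helper definitions are not part of this section, so treat the selection loop as illustrative only.

```python
MASK64 = (1 << 64) - 1
TAG_SHIFT = 56                  # assumption: Logical Address Tag in bits [59:56]
LOG2_TAG_GRANULE = 4            # assumption: 16-byte Tag granule


def choose_non_excluded_tag(start_tag: int, offset: int, exclude: int) -> int:
    """Illustrative tag selection: advance `offset` times, skipping excluded tags."""
    if exclude & 0xFFFF == 0xFFFF:
        return 0
    tag = start_tag & 0xF
    if offset == 0:
        while (exclude >> tag) & 1:
            tag = (tag + 1) & 0xF
    while offset != 0:
        offset -= 1
        tag = (tag + 1) & 0xF
        while (exclude >> tag) & 1:
            tag = (tag + 1) & 0xF
    return tag


def subg(xn: int, uimm6: int, uimm4: int, exclude: int = 0,
         tags_enabled: bool = True) -> int:
    """Subtract the scaled offset, then insert the chosen tag."""
    offset = (uimm6 & 0x3F) << LOG2_TAG_GRANULE
    start_tag = (xn >> TAG_SHIFT) & 0xF
    rtag = choose_non_excluded_tag(start_tag, uimm4 & 0xF, exclude) if tags_enabled else 0
    result = (xn - offset) & MASK64
    return (result & ~(0xF << TAG_SHIFT)) | (rtag << TAG_SHIFT)
```

Under these assumptions, an address tagged 3 at 0x1040 with uimm6 = 2 and uimm4 = 1 becomes an address tagged 4 at 0x1020, or tagged 5 if tag 4 is excluded.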

SUBP

Subtract Pointer subtracts the 56-bit address held in the second source register from the 56-bit address held in the
first source register, sign-extends the result to 64 bits, and writes the result to the destination register.

Integer
(FEAT_MTE)

Encoding: [31:21] 10011010110 | [20:16] Xm | [15:10] 000000 | [9:5] Xn | [4:0] Xd

SUBP <Xd>, <Xn|SP>, <Xm|SP>

if !HaveMTEExt() then UNDEFINED;

integer d = UInt(Xd);
integer n = UInt(Xn);
integer m = UInt(Xm);

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Xd" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm|SP> Is the 64-bit name of the second general-purpose source register or stack pointer, encoded in the "Xm"
field.

Operation

bits(64) operand1 = if n == 31 then SP[] else X[n];

bits(64) operand2 = if m == 31 then SP[] else X[m];
operand1 = SignExtend(operand1<55:0>, 64);
operand2 = SignExtend(operand2<55:0>, 64);

bits(64) result;

operand2 = NOT(operand2);
(result, -) = AddWithCarry(operand1, operand2, '1');

X[d] = result;
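
The effect of discarding the upper byte before subtracting can be seen in a short Python sketch. `sign_extend_56` and `subp` are hypothetical names that mirror the SignExtend and AddWithCarry steps of the Operation above.

```python
MASK64 = (1 << 64) - 1


def sign_extend_56(value: int) -> int:
    """Keep bits [55:0] and sign-extend from bit 55."""
    value &= (1 << 56) - 1
    return value - (1 << 56) if value >> 55 else value


def subp(xn: int, xm: int) -> int:
    """64-bit result of subtracting two 56-bit addresses."""
    return (sign_extend_56(xn) - sign_extend_56(xm)) & MASK64
```

Two pointers that differ in bits [63:56] but point 0x1000 bytes apart therefore produce 0x1000, regardless of what those upper bits contain.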

SUBPS

Subtract Pointer, setting Flags subtracts the 56-bit address held in the second source register from the 56-bit address
held in the first source register, sign-extends the result to 64 bits, and writes the result to the destination register. It
updates the condition flags based on the result of the subtraction.
This instruction is used by the alias CMPP.

Integer
(FEAT_MTE)

Encoding: [31:21] 10111010110 | [20:16] Xm | [15:10] 000000 | [9:5] Xn | [4:0] Xd

SUBPS <Xd>, <Xn|SP>, <Xm|SP>

if !HaveMTEExt() then UNDEFINED;

integer d = UInt(Xd);
integer n = UInt(Xn);
integer m = UInt(Xm);

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Xd" field.
<Xn|SP> Is the 64-bit name of the first source general-purpose register or stack pointer, encoded in the "Xn"
field.
<Xm|SP> Is the 64-bit name of the second general-purpose source register or stack pointer, encoded in the "Xm"
field.

Alias Conditions

Alias Is preferred when

CMPP S == '1' && Xd == '11111'

Operation

bits(64) operand1 = if n == 31 then SP[] else X[n];

bits(64) operand2 = if m == 31 then SP[] else X[m];
operand1 = SignExtend(operand1<55:0>, 64);
operand2 = SignExtend(operand2<55:0>, 64);

bits(64) result;
bits(4) nzcv;

operand2 = NOT(operand2);
(result, nzcv) = AddWithCarry(operand1, operand2, '1');

PSTATE.<N,Z,C,V> = nzcv;
X[d] = result;

SUBS (extended register)

Subtract (extended register), setting flags, subtracts a sign or zero-extended register value, followed by an optional
left shift amount, from a register value, and writes the result to the destination register. The argument that is
extended from the <Rm> register can be a byte, halfword, word, or doubleword. It updates the condition flags based
on the result.
This instruction is used by the alias CMP (extended register).
Encoding: [31] sf | [30] op = 1 | [29] S = 1 | [28:24] 01011 | [23:22] 00 | [21] 1 | [20:16] Rm | [15:13] option | [12:10] imm3 | [9:5] Rn | [4:0] Rd

32-bit (sf == 0)

SUBS <Wd>, <Wn|WSP>, <Wm>{, <extend> {#<amount>}}

64-bit (sf == 1)

SUBS <Xd>, <Xn|SP>, <R><m>{, <extend> {#<amount>}}

Assembler Symbols

<R> Is a width specifier, encoded in “option”:

option <R>
00x W
010 W
x11 X
10x W
110 W

<m> Is the number [0-30] of the second general-purpose source register or the name ZR (31), encoded in the
"Rm" field.

<extend> For the 32-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

option <extend>
000 UXTB
001 UXTH
010 LSL|UXTW
011 UXTX
100 SXTB
101 SXTH
110 SXTW
111 SXTX

If "Rn" is '11111' (WSP) and "option" is '010' then LSL is preferred, but may be omitted when "imm3" is
'000'. In all other cases <extend> is required and must be UXTW when "option" is '010'.

For the 64-bit variant: is the extension to be applied to the second source operand, encoded in “option”:

Alias Conditions

Alias Is preferred when

CMP (extended register) Rd == '11111'

Operation

bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2 = ExtendReg(m, extend_type, shift);
bits(4) nzcv;

operand2 = NOT(operand2);
(result, nzcv) = AddWithCarry(operand1, operand2, '1');

PSTATE.<N,Z,C,V> = nzcv;

X[d] = result;

Operational information

SUBS (immediate)

Subtract (immediate), setting flags, subtracts an optionally-shifted immediate value from a register value, and writes
the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the alias CMP (immediate).
Encoding: [31] sf | [30] op = 1 | [29] S = 1 | [28:23] 100010 | [22] sh | [21:10] imm12 | [9:5] Rn | [4:0] Rd

32-bit (sf == 0)

SUBS <Wd>, <Wn|WSP>, #<imm>{, <shift>}

64-bit (sf == 1)

SUBS <Xd>, <Xn|SP>, #<imm>{, <shift>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;
bits(datasize) imm;

case sh of
    when '0' imm = ZeroExtend(imm12, datasize);
    when '1' imm = ZeroExtend(imm12:Zeros(12), datasize);

Assembler Symbols

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #12

Alias Conditions

Alias Is preferred when

CMP (immediate) Rd == '11111'

Operation

bits(datasize) result;
bits(datasize) operand1 = if n == 31 then SP[] else X[n];
bits(datasize) operand2;
bits(4) nzcv;

operand2 = NOT(imm);
(result, nzcv) = AddWithCarry(operand1, operand2, '1');

PSTATE.<N,Z,C,V> = nzcv;

X[d] = result;
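
The flag computation can be made concrete with a Python sketch of the subtract-as-add identity used above. `add_with_carry` is written here as an assumption about how the shared AddWithCarry helper behaves (its definition is not in this section), and `subs` is a hypothetical wrapper.

```python
def add_with_carry(x: int, y: int, carry_in: int, n_bits: int):
    """Return (result, (N, Z, C, V)) for an n_bits-wide addition."""
    mask = (1 << n_bits) - 1

    def sint(v: int) -> int:
        return v - (1 << n_bits) if v >> (n_bits - 1) else v

    unsigned_sum = x + y + carry_in
    signed_sum = sint(x) + sint(y) + carry_in
    result = unsigned_sum & mask
    n = result >> (n_bits - 1)
    z = int(result == 0)
    c = int(result != unsigned_sum)        # carry out of the top bit
    v = int(sint(result) != signed_sum)    # signed overflow
    return result, (n, z, c, v)


def subs(operand1: int, imm: int, n_bits: int = 64):
    """operand1 - imm computed as operand1 + NOT(imm) + 1."""
    mask = (1 << n_bits) - 1
    return add_with_carry(operand1 & mask, ~imm & mask, 1, n_bits)
```

Note that C is set when no borrow occurs, so comparing equal values gives Z = 1 and C = 1, which is what the CMP alias relies on.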

Operational information

SUBS (shifted register)

Subtract (shifted register), setting flags, subtracts an optionally-shifted register value from a register value, and writes
the result to the destination register. It updates the condition flags based on the result.
This instruction is used by the aliases CMP (shifted register), and NEGS.
Encoding: [31] sf | [30] op = 1 | [29] S = 1 | [28:24] 01011 | [23:22] shift | [21] 0 | [20:16] Rm | [15:10] imm6 | [9:5] Rn | [4:0] Rd

32-bit (sf == 0)

SUBS <Wd>, <Wn>, <Wm>{, <shift> #<amount>}

64-bit (sf == 1)

SUBS <Xd>, <Xn>, <Xm>{, <shift> #<amount>}

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

if shift == '11' then UNDEFINED;

if sf == '0' && imm6<5> == '1' then UNDEFINED;

ShiftType shift_type = DecodeShift(shift);

integer shift_amount = UInt(imm6);

Assembler Symbols

<shift> Is the optional shift type to be applied to the second source operand, defaulting to LSL and encoded in
“shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 RESERVED

Alias Conditions

Alias Is preferred when

CMP (shifted register) Rd == '11111'
NEGS Rn == '11111' && Rd != '11111'

Operation

bits(datasize) result;
bits(datasize) operand1 = X[n];
bits(datasize) operand2 = ShiftReg(m, shift_type, shift_amount);
bits(4) nzcv;

operand2 = NOT(operand2);
(result, nzcv) = AddWithCarry(operand1, operand2, '1');

PSTATE.<N,Z,C,V> = nzcv;

X[d] = result;

Operational information

SVC

Supervisor Call causes an exception to be taken to EL1.

On executing an SVC instruction, the PE records the exception as a Supervisor Call exception in ESR_ELx, using the
EC value 0x15, and the value of the immediate argument.
Encoding: [31:21] 11010100000 | [20:5] imm16 | [4:0] 00001

SVC #<imm>

// Empty.

Assembler Symbols

<imm> Is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field.

Operation

AArch64.CheckForSVCTrap(imm16);
AArch64.CallSupervisor(imm16);
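
Because the encoding consists only of fixed bits and imm16, the instruction word is easy to build and recognize. The Python sketch below uses hypothetical helper names `encode_svc` and `decode_svc`, and follows the bit layout shown for this instruction.

```python
SVC_BASE = 0xD4000001           # fixed bits [31:21] and [4:0] of the encoding
SVC_MASK = 0xFFE0001F           # selects those fixed bits


def encode_svc(imm: int) -> int:
    """Return the 32-bit instruction word for SVC #imm."""
    if not 0 <= imm <= 0xFFFF:
        raise ValueError("imm must be in the range 0 to 65535")
    return SVC_BASE | (imm << 5)


def decode_svc(word: int):
    """Return imm16 if word is an SVC instruction, otherwise None."""
    if word & SVC_MASK != SVC_BASE:
        return None
    return (word >> 5) & 0xFFFF
```

For example, `encode_svc(0)` gives 0xD4000001, and decoding 0xD4001001 returns 0x80.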

SWP, SWPA, SWPAL, SWPL

Swap word or doubleword in memory atomically loads a 32-bit word or 64-bit doubleword from a memory location,
and stores the value held in a register back to the same memory location. The value initially loaded from memory is
returned in the destination register.
• If the destination register is not one of WZR or XZR, SWPA and SWPAL load from memory with acquire
semantics.
• SWPL and SWPAL store to memory with release semantics.
• SWP has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

Integer
(FEAT_LSE)

Encoding: [31:30] size = 1x | [29:24] 111000 | [23] A | [22] R | [21] 1 | [20:16] Rs | [15:10] 100000 | [9:5] Rn | [4:0] Rt

32-bit SWP (size == 10 && A == 0 && R == 0)

SWP <Ws>, <Wt>, [<Xn|SP>]

32-bit SWPA (size == 10 && A == 1 && R == 0)

SWPA <Ws>, <Wt>, [<Xn|SP>]

32-bit SWPAL (size == 10 && A == 1 && R == 1)

SWPAL <Ws>, <Wt>, [<Xn|SP>]

32-bit SWPL (size == 10 && A == 0 && R == 1)

SWPL <Ws>, <Wt>, [<Xn|SP>]

64-bit SWP (size == 11 && A == 0 && R == 0)

SWP <Xs>, <Xt>, [<Xn|SP>]

64-bit SWPA (size == 11 && A == 1 && R == 0)

SWPA <Xs>, <Xt>, [<Xn|SP>]

64-bit SWPAL (size == 11 && A == 1 && R == 1)

SWPAL <Xs>, <Xt>, [<Xn|SP>]

64-bit SWPL (size == 11 && A == 0 && R == 1)

SWPL <Xs>, <Xt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

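integer datasize = 8 << UInt(size);
integer regsize = if datasize == 64 then 64 else 32;
AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;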

Assembler Symbols

<Ws> Is the 32-bit name of the general-purpose register to be stored, encoded in the "Rs" field.
<Wt> Is the 32-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xs> Is the 64-bit name of the general-purpose register to be stored, encoded in the "Rs" field.
<Xt> Is the 64-bit name of the general-purpose register to be loaded, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

Operation

bits(64) address;
bits(datasize) data;
bits(datasize) store_value;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

store_value = X[s];
data = MemAtomic(address, MemAtomicOp_SWP, store_value, ldacctype, stacctype);
X[t] = ZeroExtend(data, regsize);
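
The data movement in the pseudocode above can be sketched in Python as follows. This is a single-threaded model only: it does not model atomicity, acquire or release ordering, alignment checks, or tag checking. The function name `swp`, the `bytearray` memory model, and the little-endian byte order are assumptions made for illustration.

```python
def swp(memory: bytearray, address: int, store_value: int, datasize: int) -> int:
    """Swap datasize bits at address with store_value and return the old value.

    Hypothetical single-threaded model; little-endian memory is assumed.
    """
    nbytes = datasize // 8
    # Load the value initially held in memory.
    old = int.from_bytes(memory[address:address + nbytes], "little")
    # Store the low datasize bits of the source register.
    new = store_value & ((1 << datasize) - 1)
    memory[address:address + nbytes] = new.to_bytes(nbytes, "little")
    # The returned value is already non-negative, which models ZeroExtend.
    return old
```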

SWPB, SWPAB, SWPALB, SWPLB

Swap byte in memory atomically loads an 8-bit byte from a memory location, and stores the value held in a register
back to the same memory location. The value initially loaded from memory is returned in the destination register.
• If the destination register is not WZR, SWPAB and SWPALB load from memory with acquire semantics.
• SWPLB and SWPALB store to memory with release semantics.
• SWPB has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 1 1 0 0 0 A R 1 Rs 1 0 0 0 0 0 Rn Rt
size

SWPAB (A == 1 && R == 0)

SWPAB <Ws>, <Wt>, [<Xn|SP>]

SWPALB (A == 1 && R == 1)

SWPALB <Ws>, <Wt>, [<Xn|SP>]

SWPB (A == 0 && R == 0)

SWPB <Ws>, <Wt>, [<Xn|SP>]

SWPLB (A == 0 && R == 1)

SWPLB <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;
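
The selection of ordering semantics in the decode pseudocode above can be sketched in Python as follows. The function name `swp_ordering` and the use of booleans in place of the AccType values are assumptions made for illustration.

```python
from typing import Tuple


def swp_ordering(a: int, r: int, rt: int) -> Tuple[bool, bool]:
    """Return (load_is_acquire, store_is_release) for the A, R, and Rt fields.

    Hypothetical helper: True stands for AccType_ORDEREDATOMICRW and
    False stands for AccType_ATOMICRW.
    """
    # Acquire semantics are dropped when the destination is the zero register.
    load_is_acquire = (a == 1) and (rt != 0b11111)
    store_is_release = (r == 1)
    return load_is_acquire, store_is_release
```

With A == 1 and Rt == 31, the load is performed without acquire semantics, matching the first bullet in the description.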

Assembler Symbols

Operation

bits(64) address;
bits(8) data;
bits(8) store_value;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

store_value = X[s];
data = MemAtomic(address, MemAtomicOp_SWP, store_value, ldacctype, stacctype);
X[t] = ZeroExtend(data, 32);

SWPH, SWPAH, SWPALH, SWPLH

Swap halfword in memory atomically loads a 16-bit halfword from a memory location, and stores the value held in a
register back to the same memory location. The value initially loaded from memory is returned in the destination
register.
• If the destination register is not WZR, SWPAH and SWPALH load from memory with acquire semantics.
• SWPLH and SWPALH store to memory with release semantics.
• SWPH has neither acquire nor release semantics.
For more information about memory ordering semantics see Load-Acquire, Store-Release.
For information about memory accesses see Load/Store addressing modes.

Integer
(FEAT_LSE)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 0 0 0 A R 1 Rs 1 0 0 0 0 0 Rn Rt
size

SWPAH (A == 1 && R == 0)

SWPAH <Ws>, <Wt>, [<Xn|SP>]

SWPALH (A == 1 && R == 1)

SWPALH <Ws>, <Wt>, [<Xn|SP>]

SWPH (A == 0 && R == 0)

SWPH <Ws>, <Wt>, [<Xn|SP>]

SWPLH (A == 0 && R == 1)

SWPLH <Ws>, <Wt>, [<Xn|SP>]

if !HaveAtomicExt() then UNDEFINED;

integer t = UInt(Rt);
integer n = UInt(Rn);
integer s = UInt(Rs);

AccType ldacctype = if A == '1' && Rt != '11111' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;

AccType stacctype = if R == '1' then AccType_ORDEREDATOMICRW else AccType_ATOMICRW;
boolean tag_checked = n != 31;

Assembler Symbols

Operation

bits(64) address;
bits(16) data;
bits(16) store_value;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

store_value = X[s];
data = MemAtomic(address, MemAtomicOp_SWP, store_value, ldacctype, stacctype);
X[t] = ZeroExtend(data, 32);

SXTB

Signed Extend Byte extracts an 8-bit value from a register, sign-extends it to the size of the register, and writes the
result to the destination register.

This is an alias of SBFM. This means:

32-bit (sf == 0 && N == 0)

SXTB <Wd>, <Wn>

is equivalent to

SBFM <Wd>, <Wn>, #0, #7

and is always the preferred disassembly.

64-bit (sf == 1 && N == 1)

SXTB <Xd>, <Wn>

is equivalent to

SBFM <Xd>, <Xn>, #0, #7

and is always the preferred disassembly.
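
The sign extension described above can be sketched in Python as follows. The function name `sign_extend` is hypothetical; `from_bits` is 8 for SXTB, and `to_bits` is 32 or 64 depending on the variant.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low from_bits of value to to_bits (hypothetical helper)."""
    value &= (1 << from_bits) - 1
    # If the top bit of the extracted field is set, the value is negative.
    if value & (1 << (from_bits - 1)):
        value -= 1 << from_bits
    # Return the result as an unsigned to_bits-wide register value.
    return value & ((1 << to_bits) - 1)
```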

Assembler Symbols

Operation

The description of SBFM gives the operational pseudocode for this instruction.

Operational information

SXTH

Sign Extend Halfword extracts a 16-bit value, sign-extends it to the size of the register, and writes the result to the
destination register.

This is an alias of SBFM. This means:

32-bit (sf == 0 && N == 0)

SXTH <Wd>, <Wn>

is equivalent to

SBFM <Wd>, <Wn>, #0, #15

and is always the preferred disassembly.

64-bit (sf == 1 && N == 1)

SXTH <Xd>, <Wn>

is equivalent to

SBFM <Xd>, <Xn>, #0, #15

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of SBFM gives the operational pseudocode for this instruction.

Operational information

SXTW

Sign Extend Word sign-extends a word to the size of the register, and writes the result to the destination register.

This is an alias of SBFM. This means:

• The encodings in this description are named to match the encodings of SBFM.
• The description of SBFM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 0 0 1 1 0 1 0 0 0 0 0 0 0 1 1 1 1 1 Rn Rd
sf opc N immr imms

64-bit

SXTW <Xd>, <Wn>

is equivalent to

SBFM <Xd>, <Xn>, #0, #31

and is always the preferred disassembly.

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.

Operation

The description of SBFM gives the operational pseudocode for this instruction.

Operational information

SYS

System instruction. For more information, see Op0 equals 0b01, cache maintenance, TLB maintenance, and address
translation instructions for the encodings of System instructions.
This instruction is used by the aliases AT, CFP, CPP, DC, DVP, IC, and TLBI.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 1 op1 CRn CRm op2 Rt
L

SYS #<op1>, <Cn>, <Cm>, #<op2>{, <Xt>}

integer t = UInt(Rt);

integer sys_op1 = UInt(op1);

integer sys_op2 = UInt(op2);
integer sys_crn = UInt(CRn);
integer sys_crm = UInt(CRm);

Assembler Symbols

Alias Conditions

Alias Is preferred when

AT CRn == '0111' && CRm == '100x' && SysOp(op1,'0111',CRm,op2) == Sys_AT
CFP op1 == '011' && CRn == '0111' && CRm == '0011' && op2 == '100'
CPP op1 == '011' && CRn == '0111' && CRm == '0011' && op2 == '111'
DC CRn == '0111' && SysOp(op1,'0111',CRm,op2) == Sys_DC
DVP op1 == '011' && CRn == '0111' && CRm == '0011' && op2 == '101'
IC CRn == '0111' && SysOp(op1,'0111',CRm,op2) == Sys_IC
TLBI CRn == '1000' && SysOp(op1,'1000',CRm,op2) == Sys_TLBI

Operation

AArch64.SysInstr(1, sys_op1, sys_crn, sys_crm, sys_op2, t);
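
As an illustration of the field layout and the alias conditions above, the following Python sketch extracts the SYS fields from a 32-bit instruction word and reports the CFP, DVP, or CPP alias where the listed condition matches. The function name `decode_sys` and the returned dictionary are hypothetical. The AT, DC, IC, and TLBI conditions depend on SysOp(), which is not modelled here.

```python
def decode_sys(word: int):
    """Decode a SYS instruction word into its fields (hypothetical helper).

    Returns None if the fixed bits [31:19] do not match the SYS encoding.
    """
    if (word & 0xFFF80000) != 0xD5080000:
        return None
    fields = {
        "op1": (word >> 16) & 0x7,
        "CRn": (word >> 12) & 0xF,
        "CRm": (word >> 8) & 0xF,
        "op2": (word >> 5) & 0x7,
        "Rt": word & 0x1F,
    }
    alias = None
    # Only the aliases whose conditions are fully spelled out are checked.
    if fields["op1"] == 0b011 and fields["CRn"] == 0b0111 and fields["CRm"] == 0b0011:
        alias = {0b100: "CFP", 0b101: "DVP", 0b111: "CPP"}.get(fields["op2"])
    fields["alias"] = alias
    return fields
```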

SYSL

System instruction with result. For more information, see Op0 equals 0b01, cache maintenance, TLB maintenance, and
address translation instructions for the encodings of System instructions.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 1 0 1 op1 CRn CRm op2 Rt
L

SYSL <Xt>, #<op1>, <Cn>, <Cm>, #<op2>

integer t = UInt(Rt);

integer sys_op1 = UInt(op1);

integer sys_op2 = UInt(op2);
integer sys_crn = UInt(CRn);
integer sys_crm = UInt(CRm);

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose destination register, encoded in the "Rt" field.
<op1> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op1" field.
<Cn> Is a name 'Cn', with 'n' in the range 0 to 15, encoded in the "CRn" field.
<Cm> Is a name 'Cm', with 'm' in the range 0 to 15, encoded in the "CRm" field.
<op2> Is a 3-bit unsigned immediate, in the range 0 to 7, encoded in the "op2" field.

Operation

// No architecturally defined instructions here.

AArch64.SysInstrWithResult(1, sys_op1, sys_crn, sys_crm, sys_op2, t);

TBNZ

Test bit and Branch if Nonzero compares the value of a bit in a general-purpose register with zero, and conditionally
branches to a label at a PC-relative offset if the comparison is not equal. It provides a hint that this is not a subroutine
call or return. This instruction does not affect condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
b5 0 1 1 0 1 1 1 b40 imm14 Rt
op

TBNZ <R><t>, #<imm>, <label>

integer t = UInt(Rt);

integer datasize = if b5 == '1' then 64 else 32;

integer bit_pos = UInt(b5:b40);
bits(64) offset = SignExtend(imm14:'00', 64);

Assembler Symbols

<R> Is a width specifier, encoded in “b5”:

b5 <R>
0 W
1 X
In assembler source code an 'X' specifier is always permitted, but a 'W' specifier is only permitted when
the bit number is less than 32.
<t> Is the number [0-30] of the general-purpose register to be tested or the name ZR (31), encoded in the
"Rt" field.
<imm> Is the bit number to be tested, in the range 0 to 63, encoded in "b5:b40".
<label> Is the program label to be conditionally branched to. Its offset from the address of this instruction, in
the range +/-32KB, is encoded as "imm14" times 4.

Operation

bits(datasize) operand = X[t];

if operand<bit_pos> == op then
    BranchTo(PC[] + offset, BranchType_DIR, TRUE);

TBZ

Test bit and Branch if Zero compares the value of a test bit with zero, and conditionally branches to a label at a PC-
relative offset if the comparison is equal. It provides a hint that this is not a subroutine call or return. This instruction
does not affect condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
b5 0 1 1 0 1 1 0 b40 imm14 Rt
op

TBZ <R><t>, #<imm>, <label>

integer t = UInt(Rt);

integer datasize = if b5 == '1' then 64 else 32;

integer bit_pos = UInt(b5:b40);
bits(64) offset = SignExtend(imm14:'00', 64);

Assembler Symbols

<R> Is a width specifier, encoded in “b5”:

Operation

bits(datasize) operand = X[t];

if operand<bit_pos> == op then
    BranchTo(PC[] + offset, BranchType_DIR, TRUE);

TLBI

TLB Invalidate operation. For more information, see op0==0b01, cache maintenance, TLB maintenance, and address
translation instructions.

This is an alias of SYS. This means:

TLBI <tlbi_op>{, <Xt>}

is equivalent to

SYS #<op1>, C8, <Cm>, #<op2>{, <Xt>}

and is the preferred disassembly when SysOp(op1,'1000',CRm,op2) == Sys_TLBI.
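
As an illustration of the alias relationship, the following Python sketch builds the SYS encoding for a few TLBI operations. The (op1, CRm, op2) values are copied from the <tlbi_op> table in this description, and the fixed bits come from the SYS encoding. The names `TLBI_OPS` and `encode_tlbi` are hypothetical, and only a small subset of operations is listed.

```python
# Subset of the <tlbi_op> table: name -> (op1, CRm, op2).
TLBI_OPS = {
    "VMALLE1IS": (0b000, 0b0011, 0b000),
    "VAE1IS": (0b000, 0b0011, 0b001),
    "VMALLE1": (0b000, 0b0111, 0b000),
    "ALLE2": (0b100, 0b0111, 0b000),
    "ALLE3": (0b110, 0b0111, 0b000),
}


def encode_tlbi(name: str, rt: int = 31) -> int:
    """Return the SYS encoding of TLBI <name>{, Xt} (hypothetical helper).

    rt defaults to 31, matching the default of '11111' for an omitted <Xt>.
    """
    op1, crm, op2 = TLBI_OPS[name]
    crn = 0b1000  # TLBI always uses C8
    return 0xD5080000 | (op1 << 16) | (crn << 12) | (crm << 8) | (op2 << 5) | rt
```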

Assembler Symbols

<tlbi_op> Is a TLBI instruction name, as listed for the TLBI system instruction group, encoded in “op1:CRm:op2”:

op1 CRm op2 <tlbi_op> Architectural Feature
000 0001 000 VMALLE1OS FEAT_TLBIOS
000 0001 001 VAE1OS FEAT_TLBIOS
000 0001 010 ASIDE1OS FEAT_TLBIOS
000 0001 011 VAAE1OS FEAT_TLBIOS
000 0001 101 VALE1OS FEAT_TLBIOS
000 0001 111 VAALE1OS FEAT_TLBIOS
000 0010 001 RVAE1IS FEAT_TLBIRANGE
000 0010 011 RVAAE1IS FEAT_TLBIRANGE
000 0010 101 RVALE1IS FEAT_TLBIRANGE
000 0010 111 RVAALE1IS FEAT_TLBIRANGE
000 0011 000 VMALLE1IS -
000 0011 001 VAE1IS -
000 0011 010 ASIDE1IS -
000 0011 011 VAAE1IS -
000 0011 101 VALE1IS -
000 0011 111 VAALE1IS -
000 0101 001 RVAE1OS FEAT_TLBIRANGE
000 0101 011 RVAAE1OS FEAT_TLBIRANGE
000 0101 101 RVALE1OS FEAT_TLBIRANGE
000 0101 111 RVAALE1OS FEAT_TLBIRANGE
000 0110 001 RVAE1 FEAT_TLBIRANGE
000 0110 011 RVAAE1 FEAT_TLBIRANGE
000 0110 101 RVALE1 FEAT_TLBIRANGE
000 0110 111 RVAALE1 FEAT_TLBIRANGE
000 0111 000 VMALLE1 -
000 0111 001 VAE1 -
000 0111 010 ASIDE1 -
000 0111 011 VAAE1 -
000 0111 101 VALE1 -
000 0111 111 VAALE1 -
100 0000 001 IPAS2E1IS -
100 0000 010 RIPAS2E1IS FEAT_TLBIRANGE
100 0000 101 IPAS2LE1IS -
100 0000 110 RIPAS2LE1IS FEAT_TLBIRANGE
100 0001 000 ALLE2OS FEAT_TLBIOS
100 0001 001 VAE2OS FEAT_TLBIOS
100 0001 100 ALLE1OS FEAT_TLBIOS
100 0001 101 VALE2OS FEAT_TLBIOS
100 0001 110 VMALLS12E1OS FEAT_TLBIOS
100 0010 001 RVAE2IS FEAT_TLBIRANGE
100 0010 101 RVALE2IS FEAT_TLBIRANGE
100 0011 000 ALLE2IS -
100 0011 001 VAE2IS -
100 0011 100 ALLE1IS -
100 0011 101 VALE2IS -
100 0011 110 VMALLS12E1IS -
100 0100 000 IPAS2E1OS FEAT_TLBIOS
100 0100 001 IPAS2E1 -
100 0100 010 RIPAS2E1 FEAT_TLBIRANGE
100 0100 011 RIPAS2E1OS FEAT_TLBIRANGE
100 0100 100 IPAS2LE1OS FEAT_TLBIOS
100 0100 101 IPAS2LE1 -
100 0100 110 RIPAS2LE1 FEAT_TLBIRANGE
100 0100 111 RIPAS2LE1OS FEAT_TLBIRANGE
100 0101 001 RVAE2OS FEAT_TLBIRANGE
100 0101 101 RVALE2OS FEAT_TLBIRANGE
100 0110 001 RVAE2 FEAT_TLBIRANGE
100 0110 101 RVALE2 FEAT_TLBIRANGE
100 0111 000 ALLE2 -
100 0111 001 VAE2 -
100 0111 100 ALLE1 -
100 0111 101 VALE2 -
100 0111 110 VMALLS12E1 -
110 0001 000 ALLE3OS FEAT_TLBIOS
110 0001 001 VAE3OS FEAT_TLBIOS
110 0001 101 VALE3OS FEAT_TLBIOS
110 0010 001 RVAE3IS FEAT_TLBIRANGE
110 0010 101 RVALE3IS FEAT_TLBIRANGE
110 0011 000 ALLE3IS -
110 0011 001 VAE3IS -
110 0011 101 VALE3IS -
110 0101 001 RVAE3OS FEAT_TLBIRANGE
110 0101 101 RVALE3OS FEAT_TLBIRANGE
110 0110 001 RVAE3 FEAT_TLBIRANGE
110 0110 101 RVALE3 FEAT_TLBIRANGE
110 0111 000 ALLE3 -
110 0111 001 VAE3 -
110 0111 101 VALE3 -

<Xt> Is the 64-bit name of the optional general-purpose source register, defaulting to '11111', encoded in the
"Rt" field.

Operation

The description of SYS gives the operational pseudocode for this instruction.

TSB CSYNC

Trace Synchronization Barrier. This instruction is a barrier that synchronizes the trace operations of instructions.
If FEAT_TRF is not implemented, this instruction executes as a NOP.

System
(FEAT_TRF)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 1 1 1 1
CRm op2

TSB CSYNC

if !HaveSelfHostedTrace() then EndOfInstruction();

Operation

TraceSynchronizationBarrier();

TST (immediate)

Test bits (immediate), setting the condition flags and discarding the result: Rn AND imm.

This is an alias of ANDS (immediate). This means:

• The encodings in this description are named to match the encodings of ANDS (immediate).
• The description of ANDS (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 1 0 0 1 0 0 N immr imms Rn 1 1 1 1 1
opc Rd

32-bit (sf == 0 && N == 0)

TST <Wn>, #<imm>

is equivalent to

ANDS WZR, <Wn>, #<imm>

and is always the preferred disassembly.

64-bit (sf == 1)

TST <Xn>, #<imm>

is equivalent to

ANDS XZR, <Xn>, #<imm>

and is always the preferred disassembly.
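
The effect of TST on the condition flags can be sketched in Python as follows. The sketch assumes the flag-setting behaviour of ANDS, where N and Z are derived from the result and C and V are cleared. The function name `tst_flags` is hypothetical, and `imm` is taken to be the already-decoded bitmask immediate.

```python
def tst_flags(rn: int, imm: int, datasize: int = 64) -> dict:
    """Return the NZCV flags produced by TST Rn, #imm (hypothetical helper)."""
    mask = (1 << datasize) - 1
    result = rn & imm & mask           # the result itself is discarded
    return {
        "N": (result >> (datasize - 1)) & 1,
        "Z": int(result == 0),
        "C": 0,
        "V": 0,
    }
```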

Assembler Symbols

<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the bitmask immediate, encoded in "imms:immr".
For the 64-bit variant: is the bitmask immediate, encoded in "N:imms:immr".

Operation

The description of ANDS (immediate) gives the operational pseudocode for this instruction.

TST (shifted register)

Test (shifted register) performs a bitwise AND operation on a register value and an optionally-shifted register value. It
updates the condition flags based on the result, and discards the result.

This is an alias of ANDS (shifted register). This means:

• The encodings in this description are named to match the encodings of ANDS (shifted register).
• The description of ANDS (shifted register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 1 0 1 0 1 0 shift 0 Rm imm6 Rn 1 1 1 1 1
opc N Rd

32-bit (sf == 0)

TST <Wn>, <Wm>{, <shift> #<amount>}

is equivalent to

ANDS WZR, <Wn>, <Wm>{, <shift> #<amount>}

and is always the preferred disassembly.

64-bit (sf == 1)

TST <Xn>, <Xm>{, <shift> #<amount>}

is equivalent to

ANDS XZR, <Xn>, <Xm>{, <shift> #<amount>}

and is always the preferred disassembly.

Assembler Symbols

<shift> Is the optional shift to be applied to the final source, defaulting to LSL and encoded in “shift”:

shift <shift>
00 LSL
01 LSR
10 ASR
11 ROR

Operation

The description of ANDS (shifted register) gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1:

UBFIZ

Unsigned Bitfield Insert in Zeros copies a bitfield of <width> bits from the least significant bits of the source register
to bit position <lsb> of the destination register, setting the destination bits above and below the bitfield to zero.

This is an alias of UBFM. This means:

32-bit (sf == 0 && N == 0)

UBFIZ <Wd>, <Wn>, #<lsb>, #<width>

is equivalent to

UBFM <Wd>, <Wn>, #(-<lsb> MOD 32), #(<width>-1)

and is the preferred disassembly when UInt(imms) < UInt(immr).

64-bit (sf == 1 && N == 1)

UBFIZ <Xd>, <Xn>, #<lsb>, #<width>

is equivalent to

UBFM <Xd>, <Xn>, #(-<lsb> MOD 64), #(<width>-1)

and is the preferred disassembly when UInt(imms) < UInt(immr).
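
The operand mapping from UBFIZ to UBFM can be sketched in Python as follows. The function name `ubfiz_to_ubfm` is hypothetical, and the range checks on `lsb` and `width` are assumptions based on the usual bitfield constraints.

```python
def ubfiz_to_ubfm(lsb: int, width: int, regsize: int):
    """Return (immr, imms) for UBFIZ #lsb, #width (hypothetical helper)."""
    if regsize not in (32, 64):
        raise ValueError("regsize must be 32 or 64")
    if not 0 <= lsb < regsize or not 1 <= width <= regsize - lsb:
        raise ValueError("lsb/width out of range")
    immr = (-lsb) % regsize   # Python's % yields a non-negative result here
    imms = width - 1
    return immr, imms
```

For example, UBFIZ with lsb 8 and width 4 on a 64-bit register maps to immr 56 and imms 3, which satisfies UInt(imms) < UInt(immr).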

Assembler Symbols

Operation

The description of UBFM gives the operational pseudocode for this instruction.

Operational information

UBFM

Unsigned Bitfield Move is usually accessed via one of its aliases, which are always preferred for disassembly.
If <imms> is greater than or equal to <immr>, this copies a bitfield of (<imms>-<immr>+1) bits starting from bit
position <immr> in the source register to the least significant bits of the destination register.
If <imms> is less than <immr>, this copies a bitfield of (<imms>+1) bits from the least significant bits of the source
register to bit position (regsize-<immr>) of the destination register, where regsize is the destination register size of 32
or 64 bits.
In both cases the destination bits below and above the bitfield are set to zero.
This instruction is used by the aliases LSL (immediate), LSR (immediate), UBFIZ, UBFX, UXTB, and UXTH.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 1 0 1 0 0 1 1 0 N immr imms Rn Rd
opc

32-bit (sf == 0 && N == 0)

UBFM <Wd>, <Wn>, #<immr>, #<imms>

64-bit (sf == 1 && N == 1)

UBFM <Xd>, <Xn>, #<immr>, #<imms>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer datasize = if sf == '1' then 64 else 32;

integer R;
bits(datasize) wmask;
bits(datasize) tmask;

if sf == '1' && N != '1' then UNDEFINED;

if sf == '0' && (N != '0' || immr<5> != '0' || imms<5> != '0') then UNDEFINED;

R = UInt(immr);
(wmask, tmask) = DecodeBitMasks(N, imms, immr, FALSE);

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<immr> For the 32-bit variant: is the right rotate amount, in the range 0 to 31, encoded in the "immr" field.
For the 64-bit variant: is the right rotate amount, in the range 0 to 63, encoded in the "immr" field.
<imms> For the 32-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 31,
encoded in the "imms" field.
For the 64-bit variant: is the leftmost bit number to be moved from the source, in the range 0 to 63,
encoded in the "imms" field.

Alias Conditions

Alias Of variant Is preferred when

LSL (immediate) 32-bit imms != '011111' && imms + 1 == immr
LSL (immediate) 64-bit imms != '111111' && imms + 1 == immr
LSR (immediate) 32-bit imms == '011111'
LSR (immediate) 64-bit imms == '111111'
UBFIZ UInt(imms) < UInt(immr)
UBFX BFXPreferred(sf, opc<1>, imms, immr)
UXTB immr == '000000' && imms == '000111'
UXTH immr == '000000' && imms == '001111'

Operation

bits(datasize) src = X[n];

// perform bitfield move on low bits

bits(datasize) bot = ROR(src, R) AND wmask;

// combine extension bits and result bits

X[d] = bot AND tmask;
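
The behaviour of UBFM can be sketched in Python as follows. This sketch follows the two cases given in the prose description rather than the DecodeBitMasks() formulation used by the pseudocode, and the function name `ubfm` is hypothetical.

```python
def ubfm(src: int, immr: int, imms: int, regsize: int = 64) -> int:
    """Model UBFM on a regsize-bit register value (hypothetical helper)."""
    mask = (1 << regsize) - 1
    src &= mask
    if imms >= immr:
        # Extract (imms - immr + 1) bits starting at bit immr.
        width = imms - immr + 1
        return (src >> immr) & ((1 << width) - 1)
    # Insert the low (imms + 1) bits at bit position (regsize - immr).
    width = imms + 1
    return ((src & ((1 << width) - 1)) << (regsize - immr)) & mask
```

The aliases fall out of this model. For the 64-bit variant, `ubfm(x, 4, 63)` behaves as LSR by 4, `ubfm(x, 60, 59)` behaves as LSL by 4, and `ubfm(x, 0, 7)` behaves as UXTB.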

Operational information

UBFX

Unsigned Bitfield Extract copies a bitfield of <width> bits starting from bit position <lsb> in the source register to the
least significant bits of the destination register, and sets destination bits above the bitfield to zero.

This is an alias of UBFM. This means:

32-bit (sf == 0 && N == 0)

UBFX <Wd>, <Wn>, #<lsb>, #<width>

is equivalent to

UBFM <Wd>, <Wn>, #<lsb>, #(<lsb>+<width>-1)

and is the preferred disassembly when BFXPreferred(sf, opc<1>, imms, immr).

64-bit (sf == 1 && N == 1)

UBFX <Xd>, <Xn>, #<lsb>, #<width>

is equivalent to

UBFM <Xd>, <Xn>, #<lsb>, #(<lsb>+<width>-1)

and is the preferred disassembly when BFXPreferred(sf, opc<1>, imms, immr).
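
The operand mapping from UBFX to UBFM, and the extraction it performs, can be sketched in Python as follows. The function names `ubfx_to_ubfm` and `ubfx` are hypothetical.

```python
def ubfx_to_ubfm(lsb: int, width: int):
    """Return (immr, imms) for UBFX #lsb, #width (hypothetical helper)."""
    return lsb, lsb + width - 1


def ubfx(src: int, lsb: int, width: int) -> int:
    """Extract width bits of src starting at bit lsb, zero-extended."""
    return (src >> lsb) & ((1 << width) - 1)
```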

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<lsb> For the 32-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 31.
For the 64-bit variant: is the bit number of the lsb of the source bitfield, in the range 0 to 63.
<width> For the 32-bit variant: is the width of the bitfield, in the range 1 to 32-<lsb>.
For the 64-bit variant: is the width of the bitfield, in the range 1 to 64-<lsb>.

Operation

The description of UBFM gives the operational pseudocode for this instruction.

Operational information

UDF

Permanently Undefined generates an Undefined Instruction exception (ESR_ELx.EC = 0b000000). The encodings for
UDF used in this section are defined as permanently UNDEFINED.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 imm16

UDF #<imm>

// The imm16 field is ignored by hardware.

UNDEFINED;

Assembler Symbols

<imm> is a 16-bit unsigned immediate, in the range 0 to 65535, encoded in the "imm16" field. The PE ignores
the value of this constant.

Operation

// No operation.

UDIV

Unsigned Divide divides an unsigned integer register value by another unsigned integer register value, and writes the
result to the destination register. The condition flags are not affected.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 0 1 0 1 1 0 Rm 0 0 0 0 1 0 Rn Rd
o1

32-bit (sf == 0)

UDIV <Wd>, <Wn>, <Wm>

64-bit (sf == 1)

UDIV <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if sf == '1' then 64 else 32;

Assembler Symbols

Operation

bits(datasize) operand1 = X[n];

bits(datasize) operand2 = X[m];
integer result;

if IsZero(operand2) then
    result = 0;
else
    result = RoundTowardsZero(Real(Int(operand1, TRUE)) / Real(Int(operand2, TRUE)));

X[d] = result<datasize-1:0>;
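
The operation above can be sketched in Python as follows. The function name `udiv` is hypothetical. Note that division by zero produces a result of zero rather than raising an exception.

```python
def udiv(operand1: int, operand2: int, datasize: int = 64) -> int:
    """Model UDIV on datasize-bit unsigned values (hypothetical helper)."""
    mask = (1 << datasize) - 1
    operand1 &= mask
    operand2 &= mask
    if operand2 == 0:
        return 0
    # For non-negative integers, floor division rounds towards zero.
    return operand1 // operand2
```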

UMADDL

Unsigned Multiply-Add Long multiplies two 32-bit register values, adds a 64-bit register value, and writes the result to
the 64-bit destination register.
This instruction is used by the alias UMULL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 0 Ra Rn Rd
U o0

UMADDL <Xd>, <Wn>, <Wm>, <Xa>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);

Assembler Symbols

Alias Conditions

Alias Is preferred when

UMULL Ra == '11111'

Operation

bits(32) operand1 = X[n];

bits(32) operand2 = X[m];
bits(64) operand3 = X[a];

integer result;

result = Int(operand3, TRUE) + (Int(operand1, TRUE) * Int(operand2, TRUE));

X[d] = result<63:0>;
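
The operation above can be sketched in Python as follows. The function name `umaddl` is hypothetical. The result wraps modulo 2^64, which models taking bits <63:0> of the unbounded integer result.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def umaddl(wn: int, wm: int, xa: int) -> int:
    """Model UMADDL: Xd = Xa + (Wn * Wm), unsigned (hypothetical helper)."""
    product = (wn & MASK32) * (wm & MASK32)
    return ((xa & MASK64) + product) & MASK64
```

Passing zero for `xa` corresponds to the UMULL alias, where Ra is XZR.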

Operational information

UMNEGL

Unsigned Multiply-Negate Long multiplies two 32-bit register values, negates the product, and writes the result to the
64-bit destination register.

This is an alias of UMSUBL. This means:

• The encodings in this description are named to match the encodings of UMSUBL.
• The description of UMSUBL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 1 1 1 1 1 1 Rn Rd
U o0 Ra

UMNEGL <Xd>, <Wn>, <Wm>

is equivalent to

UMSUBL <Xd>, <Wn>, <Wm>, XZR

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of UMSUBL gives the operational pseudocode for this instruction.

Operational information

UMSUBL

Unsigned Multiply-Subtract Long multiplies two 32-bit register values, subtracts the product from a 64-bit register
value, and writes the result to the 64-bit destination register.
This instruction is used by the alias UMNEGL.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 1 Ra Rn Rd
U o0

UMSUBL <Xd>, <Wn>, <Wm>, <Xa>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);

Assembler Symbols

Alias Conditions

Alias Is preferred when

UMNEGL Ra == '11111'

Operation

bits(32) operand1 = X[n];

bits(32) operand2 = X[m];
bits(64) operand3 = X[a];

integer result;

result = Int(operand3, TRUE) - (Int(operand1, TRUE) * Int(operand2, TRUE));

X[d] = result<63:0>;
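
The operation above can be sketched in Python as follows. The function name `umsubl` is hypothetical. A negative intermediate result wraps modulo 2^64, which models taking bits <63:0> of the result.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def umsubl(wn: int, wm: int, xa: int) -> int:
    """Model UMSUBL: Xd = Xa - (Wn * Wm), unsigned (hypothetical helper)."""
    product = (wn & MASK32) * (wm & MASK32)
    return ((xa & MASK64) - product) & MASK64
```

Passing zero for `xa` corresponds to the UMNEGL alias, where Ra is XZR.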

Operational information

UMULH

Unsigned Multiply High multiplies two 64-bit register values, and writes bits[127:64] of the 128-bit result to the 64-bit
destination register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 1 0 Rm 0 (1) (1) (1) (1) (1) Rn Rd
U Ra

UMULH <Xd>, <Xn>, <Xm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

Assembler Symbols

Operation

bits(64) operand1 = X[n];

bits(64) operand2 = X[m];

integer result;

result = Int(operand1, TRUE) * Int(operand2, TRUE);

X[d] = result<127:64>;
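
The operation above can be sketched in Python as follows. The function name `umulh` is hypothetical. Python integers are unbounded, so the full 128-bit product is available directly.

```python
MASK64 = (1 << 64) - 1


def umulh(xn: int, xm: int) -> int:
    """Model UMULH: return bits [127:64] of the unsigned product (hypothetical helper)."""
    product = (xn & MASK64) * (xm & MASK64)
    return product >> 64
```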

Operational information

UMULL

Unsigned Multiply Long multiplies two 32-bit register values, and writes the result to the 64-bit destination register.

This is an alias of UMADDL. This means:

• The encodings in this description are named to match the encodings of UMADDL.
• The description of UMADDL gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 1 1 0 1 1 1 0 1 Rm 0 1 1 1 1 1 Rn Rd
U o0 Ra

UMULL <Xd>, <Wn>, <Wm>

is equivalent to

UMADDL <Xd>, <Wn>, <Wm>, XZR

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of UMADDL gives the operational pseudocode for this instruction.

Operational information

UXTB

Unsigned Extend Byte extracts an 8-bit value from a register, zero-extends it to the size of the register, and writes the
result to the destination register.

This is an alias of UBFM. This means:

32-bit

UXTB <Wd>, <Wn>

is equivalent to

UBFM <Wd>, <Wn>, #0, #7

and is always the preferred disassembly.

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.

Operation

The description of UBFM gives the operational pseudocode for this instruction.

Operational information

UXTH

Unsigned Extend Halfword extracts a 16-bit value from a register, zero-extends it to the size of the register, and writes
the result to the destination register.

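This is an alias of UBFM. This means:

• The encodings in this description are named to match the encodings of UBFM.
• The description of UBFM gives the operational pseudocode for this instruction.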

32-bit

UXTH <Wd>, <Wn>

is equivalent to

UBFM <Wd>, <Wn>, #0, #15

and is always the preferred disassembly.
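The equivalence can also be viewed at the level of the instruction word. The encoding diagram for UXTH is not reproduced on this page, so the field layout used below is an assumption taken from the 32-bit UBFM encoding (sf = 0, N = 0, immr in bits 21-16, imms in bits 15-10, Rn in bits 9-5, Rd in bits 4-0). The function names are illustrative.

```python
UBFM32_BASE = 0x53000000  # assumed fixed bits of 32-bit UBFM (sf=0, opc=10, N=0)


def encode_ubfm32(rd: int, rn: int, immr: int, imms: int) -> int:
    """Assemble a 32-bit UBFM instruction word from its fields."""
    for name, value in (("rd", rd), ("rn", rn), ("immr", immr), ("imms", imms)):
        if not 0 <= value <= 31:
            raise ValueError(f"{name} out of range: {value}")
    return UBFM32_BASE | (immr << 16) | (imms << 10) | (rn << 5) | rd


def encode_uxth(rd: int, rn: int) -> int:
    """UXTH <Wd>, <Wn> is UBFM <Wd>, <Wn>, #0, #15."""
    return encode_ubfm32(rd, rn, immr=0, imms=15)


print(hex(encode_uxth(0, 1)))  # 0x53003c20
```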

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.

Operation

The description of UBFM gives the operational pseudocode for this instruction.

Operational information

WFE

Wait For Event is a hint instruction that indicates that the PE can enter a low-power state and remain there until a
wakeup event occurs. Wakeup events include the event signaled as a result of executing the SEV instruction on any PE
in the multiprocessor system. For more information, see Wait For Event mechanism and Send event.
As described in Wait For Event mechanism and Send event, the execution of a WFE instruction that would otherwise
cause entry to a low-power state can be trapped to a higher Exception level.
Bits    31-12                     11-8  7-5  4-0
Value   1101 0101 0000 0011 0010  0000  010  11111
Field                             CRm   op2

WFE

// Empty.

Operation

Hint_WFE(1, WFxType_WFE);

WFET

Wait For Event with Timeout is a hint instruction that indicates that the PE can enter a low-power state and remain
there until either a local timeout event or a wakeup event occurs. Wakeup events include the event signaled as a result
of executing the SEV instruction on any PE in the multiprocessor system. For more information, see Wait For Event
mechanism and Send event.
As described in Wait For Event mechanism and Send event, the execution of a WFET instruction that would otherwise
cause entry to a low-power state can be trapped to a higher Exception level.

System
(FEAT_WFxT)

Bits    31-5                               4-0
Value   1101 0101 0000 0011 0001 0000 000  Rd

WFET <Xt>

if !HaveFeatWFxT() then UNDEFINED;

integer d = UInt(Rd);

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rd" field.

Operation

integer localtimeout = UInt(X[d, 64]);

if Halted() && ConstrainUnpredictableBool(Unpredictable_WFxTDEBUG) then
    EndOfInstruction();

Hint_WFE(localtimeout, WFxType_WFET);
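The two ways of leaving the low-power state can be illustrated with a small simulation. This is a sketch under stated assumptions: it assumes that the local timeout event occurs once a system counter reaches or exceeds the unsigned value read from <Xt>, and it represents the PE's view of the world as a hypothetical sequence of (counter, event_pending) samples. Neither the function name nor the sample format comes from the architecture.

```python
from typing import Iterable, Tuple

MASK64 = (1 << 64) - 1


def wfet(samples: Iterable[Tuple[int, bool]], xt: int) -> str:
    """Report why a modelled WFET stops waiting, given observed samples."""
    local_timeout = xt & MASK64  # UInt(X[d, 64]): unsigned 64-bit threshold
    for counter, event_pending in samples:
        if event_pending:
            return "wakeup event"    # for example, an SEV executed on any PE
        if counter >= local_timeout:
            return "local timeout"   # assumed: counter reached the threshold
    return "still waiting"


print(wfet([(10, False), (20, False), (30, False)], 25))  # local timeout
```

WFIT follows the same shape, with the wakeup events of WFI in place of those of WFE.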

WFI

Wait For Interrupt is a hint instruction that indicates that the PE can enter a low-power state and remain there until a
wakeup event occurs. For more information, see Wait For Interrupt.
As described in Wait For Interrupt, the execution of a WFI instruction that would otherwise cause entry to a low-power
state can be trapped to a higher Exception level.
Bits    31-12                     11-8  7-5  4-0
Value   1101 0101 0000 0011 0010  0000  011  11111
Field                             CRm   op2

WFI

// Empty.

Operation

Hint_WFI(1, WFxType_WFI);

WFIT

Wait For Interrupt with Timeout is a hint instruction that indicates that the PE can enter a low-power state and remain
there until either a local timeout event or a wakeup event occurs. For more information, see Wait For Interrupt.
As described in Wait For Interrupt, the execution of a WFIT instruction that would otherwise cause entry to a low-
power state can be trapped to a higher Exception level.

System
(FEAT_WFxT)

Bits    31-5                               4-0
Value   1101 0101 0000 0011 0001 0000 001  Rd

WFIT <Xt>

if !HaveFeatWFxT() then UNDEFINED;

integer d = UInt(Rd);

Assembler Symbols

<Xt> Is the 64-bit name of the general-purpose source register, encoded in the "Rd" field.

Operation

integer localtimeout = UInt(X[d, 64]);

if Halted() && ConstrainUnpredictableBool(Unpredictable_WFxTDEBUG) then
    EndOfInstruction();

Hint_WFI(localtimeout, WFxType_WFIT);

XAFLAG

Convert floating-point condition flags from external format to Arm format. This instruction converts the state of the
PSTATE.{N,Z,C,V} flags from an alternative representation required by some software to a form representing the
result of an Arm floating-point scalar compare instruction.

System
(FEAT_FlagM2)

Bits    31-12                     11-8             7-5  4-0
Value   1101 0101 0000 0000 0100  (0) (0) (0) (0)  001  11111
Field                             CRm

XAFLAG

if !HaveFlagFormatExt() then UNDEFINED;

Operation

bit N = NOT(PSTATE.C) AND NOT(PSTATE.Z);

bit Z = PSTATE.Z AND PSTATE.C;
bit C = PSTATE.C OR PSTATE.Z;
bit V = NOT(PSTATE.C) AND PSTATE.Z;

PSTATE.N = N;
PSTATE.Z = Z;
PSTATE.C = C;
PSTATE.V = V;
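The flag conversion depends only on the incoming C and Z flags, so it can be tabulated exhaustively. The Python sketch below mirrors the pseudocode above. The function name `xaflag` is an assumption, and the labels in the comments follow the usual Arm floating-point compare results; the meaning of each input combination in the external format is not stated on this page and is left unlabelled.

```python
from typing import Tuple


def xaflag(z: int, c: int) -> Tuple[int, int, int, int]:
    """Return the new (N, Z, C, V) computed from the incoming Z and C flags."""
    new_n = (1 - c) & (1 - z)   # NOT(C) AND NOT(Z)
    new_z = z & c               # Z AND C
    new_c = c | z               # C OR Z
    new_v = (1 - c) & z         # NOT(C) AND Z
    return new_n, new_z, new_c, new_v


for c in (0, 1):
    for z in (0, 1):
        print(f"C={c} Z={z} -> NZCV={xaflag(z, c)}")
# C=0 Z=0 -> (1, 0, 0, 0)  less than
# C=0 Z=1 -> (0, 0, 1, 1)  unordered
# C=1 Z=0 -> (0, 0, 1, 0)  greater than
# C=1 Z=1 -> (0, 1, 1, 0)  equal
```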

XPACD, XPACI, XPACLRI

Strip Pointer Authentication Code. This instruction removes the pointer authentication code from an address. The
address is in the specified general-purpose register for XPACI and XPACD, and is in LR for XPACLRI.
The XPACD instruction is used for data addresses, and XPACI and XPACLRI are used for instruction addresses.
It has encodings from two classes: Integer and System.

Integer
(FEAT_PAuth)

Bits    31-11                       10  9-5    4-0
Value   1101 1010 1100 0001 0100 0  D   11111  Rd
Field                                   Rn

XPACD (D == 1)

XPACD <Xd>

XPACI (D == 0)

XPACI <Xd>

boolean data = (D == '1');
integer d = UInt(Rd);

if !HavePACExt() then
    UNDEFINED;

System
(FEAT_PAuth)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1

XPACLRI

integer d = 30;
boolean data = FALSE;

Assembler Symbols

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.

Operation

if HavePACExt() then
    X[d] = Strip(X[d], data);
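The effect of `Strip` can be sketched as replacing the pointer authentication code field with copies of address bit 55. The Python model below is a simplification under stated assumptions: `bottom_pac_bit` (the lowest bit of the PAC field) and `tbi` (whether the top byte is ignored and therefore preserved) are hypothetical parameters. In an implementation these depend on the translation controls for the address and on whether it is a data or instruction address.

```python
MASK64 = (1 << 64) - 1


def strip(a: int, bottom_pac_bit: int = 48, tbi: bool = False) -> int:
    """Replace the assumed PAC field of a 64-bit pointer with copies of bit 55."""
    a &= MASK64
    bit55 = (a >> 55) & 1
    low = a & ((1 << bottom_pac_bit) - 1)    # address bits below the PAC field
    top = 56 if tbi else 64                  # PAC field ends below the tag byte
    ext_width = top - bottom_pac_bit
    ext = ((1 << ext_width) - 1) if bit55 else 0
    result = (ext << bottom_pac_bit) | low
    if tbi:
        result |= a & (0xFF << 56)           # keep the ignored top byte as is
    return result


print(hex(strip(0x007F0000DEADBEE0)))  # 0xdeadbee0
```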

YIELD

YIELD is a hint instruction. Software with a multithreading capability can use a YIELD instruction to indicate to the PE
that it is performing a task, for example a spin-lock, that could be swapped out to improve overall system performance.
The PE can use this hint to suspend and resume multiple software threads if it supports the capability.
For more information about the recommended use of this instruction, see The YIELD instruction.
Bits    31-12                     11-8  7-5  4-0
Value   1101 0101 0000 0011 0010  0000  001  11111
Field                             CRm   op2

YIELD

// Empty.

Operation

Hint_Yield();

A64 -- SIMD and Floating-point Instructions (alphabetic order)

ABS: Absolute value (vector).

ADD (vector): Add (vector).

ADDHN, ADDHN2: Add returning High Narrow.

ADDP (scalar): Add Pair of elements (scalar).

ADDP (vector): Add Pairwise (vector).

ADDV: Add across Vector.

AESD: AES single round decryption.

AESE: AES single round encryption.

AESIMC: AES inverse mix columns.

AESMC: AES mix columns.

AND (vector): Bitwise AND (vector).

BCAX: Bit Clear and XOR.

BFCVT: Floating-point convert from single-precision to BFloat16 format (scalar).

BFCVTN, BFCVTN2: Floating-point convert from single-precision to BFloat16 format (vector).

BFDOT (by element): BFloat16 floating-point dot product (vector, by element).

BFDOT (vector): BFloat16 floating-point dot product (vector).

BFMLALB, BFMLALT (by element): BFloat16 floating-point widening multiply-add long (by element).

BFMLALB, BFMLALT (vector): BFloat16 floating-point widening multiply-add long (vector).

BFMMLA: BFloat16 floating-point matrix multiply-accumulate into 2x2 matrix.

BIC (vector, immediate): Bitwise bit Clear (vector, immediate).

BIC (vector, register): Bitwise bit Clear (vector, register).

BIF: Bitwise Insert if False.

BIT: Bitwise Insert if True.

BSL: Bitwise Select.

CLS (vector): Count Leading Sign bits (vector).

CLZ (vector): Count Leading Zero bits (vector).

CMEQ (register): Compare bitwise Equal (vector).

CMEQ (zero): Compare bitwise Equal to zero (vector).

CMGE (register): Compare signed Greater than or Equal (vector).

CMGE (zero): Compare signed Greater than or Equal to zero (vector).

CMGT (register): Compare signed Greater than (vector).

CMGT (zero): Compare signed Greater than zero (vector).

CMHI (register): Compare unsigned Higher (vector).

CMHS (register): Compare unsigned Higher or Same (vector).

CMLE (zero): Compare signed Less than or Equal to zero (vector).

CMLT (zero): Compare signed Less than zero (vector).

CMTST: Compare bitwise Test bits nonzero (vector).

CNT: Population Count per byte.

DUP (element): Duplicate vector element to vector or scalar.

DUP (general): Duplicate general-purpose register to vector.

EOR (vector): Bitwise Exclusive OR (vector).

EOR3: Three-way Exclusive OR.

EXT: Extract vector from pair of vectors.

FABD: Floating-point Absolute Difference (vector).

FABS (scalar): Floating-point Absolute value (scalar).

FABS (vector): Floating-point Absolute value (vector).

FACGE: Floating-point Absolute Compare Greater than or Equal (vector).

FACGT: Floating-point Absolute Compare Greater than (vector).

FADD (scalar): Floating-point Add (scalar).

FADD (vector): Floating-point Add (vector).

FADDP (scalar): Floating-point Add Pair of elements (scalar).

FADDP (vector): Floating-point Add Pairwise (vector).

FCADD: Floating-point Complex Add.

FCCMP: Floating-point Conditional quiet Compare (scalar).

FCCMPE: Floating-point Conditional signaling Compare (scalar).

FCMEQ (register): Floating-point Compare Equal (vector).

FCMEQ (zero): Floating-point Compare Equal to zero (vector).

FCMGE (register): Floating-point Compare Greater than or Equal (vector).

FCMGE (zero): Floating-point Compare Greater than or Equal to zero (vector).

FCMGT (register): Floating-point Compare Greater than (vector).

FCMGT (zero): Floating-point Compare Greater than zero (vector).

FCMLA: Floating-point Complex Multiply Accumulate.

FCMLA (by element): Floating-point Complex Multiply Accumulate (by element).

FCMLE (zero): Floating-point Compare Less than or Equal to zero (vector).

FCMLT (zero): Floating-point Compare Less than zero (vector).

FCMP: Floating-point quiet Compare (scalar).

FCMPE: Floating-point signaling Compare (scalar).

FCSEL: Floating-point Conditional Select (scalar).

FCVT: Floating-point Convert precision (scalar).

FCVTAS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar).

FCVTAS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector).

FCVTAU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar).

FCVTAU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector).

FCVTL, FCVTL2: Floating-point Convert to higher precision Long (vector).

FCVTMS (scalar): Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar).

FCVTMS (vector): Floating-point Convert to Signed integer, rounding toward Minus infinity (vector).

FCVTMU (scalar): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar).

FCVTMU (vector): Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector).

FCVTN, FCVTN2: Floating-point Convert to lower precision Narrow (vector).

FCVTNS (scalar): Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar).

FCVTNS (vector): Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector).

FCVTNU (scalar): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar).

FCVTNU (vector): Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector).

FCVTPS (scalar): Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar).

FCVTPS (vector): Floating-point Convert to Signed integer, rounding toward Plus infinity (vector).

FCVTPU (scalar): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar).

FCVTPU (vector): Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector).

FCVTXN, FCVTXN2: Floating-point Convert to lower precision Narrow, rounding to odd (vector).

FCVTZS (scalar, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar).

FCVTZS (scalar, integer): Floating-point Convert to Signed integer, rounding toward Zero (scalar).

FCVTZS (vector, fixed-point): Floating-point Convert to Signed fixed-point, rounding toward Zero (vector).

FCVTZS (vector, integer): Floating-point Convert to Signed integer, rounding toward Zero (vector).

FCVTZU (scalar, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar).

FCVTZU (scalar, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (scalar).

FCVTZU (vector, fixed-point): Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector).

FCVTZU (vector, integer): Floating-point Convert to Unsigned integer, rounding toward Zero (vector).

FDIV (scalar): Floating-point Divide (scalar).

FDIV (vector): Floating-point Divide (vector).

FJCVTZS: Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero.

FMADD: Floating-point fused Multiply-Add (scalar).

FMAX (scalar): Floating-point Maximum (scalar).

FMAX (vector): Floating-point Maximum (vector).

FMAXNM (scalar): Floating-point Maximum Number (scalar).

FMAXNM (vector): Floating-point Maximum Number (vector).

FMAXNMP (scalar): Floating-point Maximum Number of Pair of elements (scalar).

FMAXNMP (vector): Floating-point Maximum Number Pairwise (vector).

FMAXNMV: Floating-point Maximum Number across Vector.

FMAXP (scalar): Floating-point Maximum of Pair of elements (scalar).

FMAXP (vector): Floating-point Maximum Pairwise (vector).

FMAXV: Floating-point Maximum across Vector.

FMIN (scalar): Floating-point Minimum (scalar).

FMIN (vector): Floating-point minimum (vector).

FMINNM (scalar): Floating-point Minimum Number (scalar).

FMINNM (vector): Floating-point Minimum Number (vector).

FMINNMP (scalar): Floating-point Minimum Number of Pair of elements (scalar).

FMINNMP (vector): Floating-point Minimum Number Pairwise (vector).

FMINNMV: Floating-point Minimum Number across Vector.

FMINP (scalar): Floating-point Minimum of Pair of elements (scalar).

FMINP (vector): Floating-point Minimum Pairwise (vector).

FMINV: Floating-point Minimum across Vector.

FMLA (by element): Floating-point fused Multiply-Add to accumulator (by element).

FMLA (vector): Floating-point fused Multiply-Add to accumulator (vector).

FMLAL, FMLAL2 (by element): Floating-point fused Multiply-Add Long to accumulator (by element).

FMLAL, FMLAL2 (vector): Floating-point fused Multiply-Add Long to accumulator (vector).

FMLS (by element): Floating-point fused Multiply-Subtract from accumulator (by element).

FMLS (vector): Floating-point fused Multiply-Subtract from accumulator (vector).

FMLSL, FMLSL2 (by element): Floating-point fused Multiply-Subtract Long from accumulator (by element).

FMLSL, FMLSL2 (vector): Floating-point fused Multiply-Subtract Long from accumulator (vector).

FMOV (general): Floating-point Move to or from general-purpose register without conversion.

FMOV (register): Floating-point Move register without conversion.

FMOV (scalar, immediate): Floating-point move immediate (scalar).

FMOV (vector, immediate): Floating-point move immediate (vector).

FMSUB: Floating-point Fused Multiply-Subtract (scalar).

FMUL (by element): Floating-point Multiply (by element).

FMUL (scalar): Floating-point Multiply (scalar).

FMUL (vector): Floating-point Multiply (vector).

FMULX: Floating-point Multiply extended.

FMULX (by element): Floating-point Multiply extended (by element).

FNEG (scalar): Floating-point Negate (scalar).

FNEG (vector): Floating-point Negate (vector).

FNMADD: Floating-point Negated fused Multiply-Add (scalar).

FNMSUB: Floating-point Negated fused Multiply-Subtract (scalar).

FNMUL (scalar): Floating-point Multiply-Negate (scalar).

FRECPE: Floating-point Reciprocal Estimate.

FRECPS: Floating-point Reciprocal Step.

FRECPX: Floating-point Reciprocal exponent (scalar).

FRINT32X (scalar): Floating-point Round to 32-bit Integer, using current rounding mode (scalar).

FRINT32X (vector): Floating-point Round to 32-bit Integer, using current rounding mode (vector).

FRINT32Z (scalar): Floating-point Round to 32-bit Integer toward Zero (scalar).

FRINT32Z (vector): Floating-point Round to 32-bit Integer toward Zero (vector).

FRINT64X (scalar): Floating-point Round to 64-bit Integer, using current rounding mode (scalar).

FRINT64X (vector): Floating-point Round to 64-bit Integer, using current rounding mode (vector).

FRINT64Z (scalar): Floating-point Round to 64-bit Integer toward Zero (scalar).

FRINT64Z (vector): Floating-point Round to 64-bit Integer toward Zero (vector).

FRINTA (scalar): Floating-point Round to Integral, to nearest with ties to Away (scalar).

FRINTA (vector): Floating-point Round to Integral, to nearest with ties to Away (vector).

FRINTI (scalar): Floating-point Round to Integral, using current rounding mode (scalar).

FRINTI (vector): Floating-point Round to Integral, using current rounding mode (vector).

FRINTM (scalar): Floating-point Round to Integral, toward Minus infinity (scalar).

FRINTM (vector): Floating-point Round to Integral, toward Minus infinity (vector).

FRINTN (scalar): Floating-point Round to Integral, to nearest with ties to even (scalar).

FRINTN (vector): Floating-point Round to Integral, to nearest with ties to even (vector).

FRINTP (scalar): Floating-point Round to Integral, toward Plus infinity (scalar).

FRINTP (vector): Floating-point Round to Integral, toward Plus infinity (vector).

FRINTX (scalar): Floating-point Round to Integral exact, using current rounding mode (scalar).

FRINTX (vector): Floating-point Round to Integral exact, using current rounding mode (vector).

FRINTZ (scalar): Floating-point Round to Integral, toward Zero (scalar).

FRINTZ (vector): Floating-point Round to Integral, toward Zero (vector).

FRSQRTE: Floating-point Reciprocal Square Root Estimate.

FRSQRTS: Floating-point Reciprocal Square Root Step.

FSQRT (scalar): Floating-point Square Root (scalar).

FSQRT (vector): Floating-point Square Root (vector).

FSUB (scalar): Floating-point Subtract (scalar).

FSUB (vector): Floating-point Subtract (vector).

INS (element): Insert vector element from another vector element.

INS (general): Insert vector element from general-purpose register.

LD1 (multiple structures): Load multiple single-element structures to one, two, three, or four registers.

LD1 (single structure): Load one single-element structure to one lane of one register.

LD1R: Load one single-element structure and Replicate to all lanes (of one register).

LD2 (multiple structures): Load multiple 2-element structures to two registers.

LD2 (single structure): Load single 2-element structure to one lane of two registers.

LD2R: Load single 2-element structure and Replicate to all lanes of two registers.

LD3 (multiple structures): Load multiple 3-element structures to three registers.

LD3 (single structure): Load single 3-element structure to one lane of three registers.

LD3R: Load single 3-element structure and Replicate to all lanes of three registers.

LD4 (multiple structures): Load multiple 4-element structures to four registers.

LD4 (single structure): Load single 4-element structure to one lane of four registers.

LD4R: Load single 4-element structure and Replicate to all lanes of four registers.

LDNP (SIMD&FP): Load Pair of SIMD&FP registers, with Non-temporal hint.

LDP (SIMD&FP): Load Pair of SIMD&FP registers.

LDR (immediate, SIMD&FP): Load SIMD&FP Register (immediate offset).

LDR (literal, SIMD&FP): Load SIMD&FP Register (PC-relative literal).

LDR (register, SIMD&FP): Load SIMD&FP Register (register offset).

LDUR (SIMD&FP): Load SIMD&FP Register (unscaled offset).

MLA (by element): Multiply-Add to accumulator (vector, by element).

MLA (vector): Multiply-Add to accumulator (vector).

MLS (by element): Multiply-Subtract from accumulator (vector, by element).

MLS (vector): Multiply-Subtract from accumulator (vector).

MOV (element): Move vector element to another vector element: an alias of INS (element).

MOV (from general): Move general-purpose register to a vector element: an alias of INS (general).

MOV (scalar): Move vector element to scalar: an alias of DUP (element).

MOV (to general): Move vector element to general-purpose register: an alias of UMOV.

MOV (vector): Move vector: an alias of ORR (vector, register).

MOVI: Move Immediate (vector).

MUL (by element): Multiply (vector, by element).

MUL (vector): Multiply (vector).

MVN: Bitwise NOT (vector): an alias of NOT.

MVNI: Move inverted Immediate (vector).

NEG (vector): Negate (vector).

NOT: Bitwise NOT (vector).

ORN (vector): Bitwise inclusive OR NOT (vector).

ORR (vector, immediate): Bitwise inclusive OR (vector, immediate).

ORR (vector, register): Bitwise inclusive OR (vector, register).

PMUL: Polynomial Multiply.

PMULL, PMULL2: Polynomial Multiply Long.

RADDHN, RADDHN2: Rounding Add returning High Narrow.

RAX1: Rotate and Exclusive OR.

RBIT (vector): Reverse Bit order (vector).

REV16 (vector): Reverse elements in 16-bit halfwords (vector).

REV32 (vector): Reverse elements in 32-bit words (vector).

REV64: Reverse elements in 64-bit doublewords (vector).

RSHRN, RSHRN2: Rounding Shift Right Narrow (immediate).

RSUBHN, RSUBHN2: Rounding Subtract returning High Narrow.

SABA: Signed Absolute difference and Accumulate.

SABAL, SABAL2: Signed Absolute difference and Accumulate Long.

SABD: Signed Absolute Difference.

SABDL, SABDL2: Signed Absolute Difference Long.

SADALP: Signed Add and Accumulate Long Pairwise.

SADDL, SADDL2: Signed Add Long (vector).

SADDLP: Signed Add Long Pairwise.

SADDLV: Signed Add Long across Vector.

SADDW, SADDW2: Signed Add Wide.

SCVTF (scalar, fixed-point): Signed fixed-point Convert to Floating-point (scalar).

SCVTF (scalar, integer): Signed integer Convert to Floating-point (scalar).

SCVTF (vector, fixed-point): Signed fixed-point Convert to Floating-point (vector).

SCVTF (vector, integer): Signed integer Convert to Floating-point (vector).

SDOT (by element): Dot Product signed arithmetic (vector, by element).

SDOT (vector): Dot Product signed arithmetic (vector).

SHA1C: SHA1 hash update (choose).

SHA1H: SHA1 fixed rotate.

SHA1M: SHA1 hash update (majority).

SHA1P: SHA1 hash update (parity).

SHA1SU0: SHA1 schedule update 0.

SHA1SU1: SHA1 schedule update 1.

SHA256H: SHA256 hash update (part 1).

SHA256H2: SHA256 hash update (part 2).

SHA256SU0: SHA256 schedule update 0.

SHA256SU1: SHA256 schedule update 1.

SHA512H: SHA512 Hash update part 1.

SHA512H2: SHA512 Hash update part 2.

SHA512SU0: SHA512 Schedule Update 0.

SHA512SU1: SHA512 Schedule Update 1.

SHADD: Signed Halving Add.

SHL: Shift Left (immediate).

SHLL, SHLL2: Shift Left Long (by element size).

SHRN, SHRN2: Shift Right Narrow (immediate).

SHSUB: Signed Halving Subtract.

SLI: Shift Left and Insert (immediate).

SM3PARTW1: SM3PARTW1.

SM3PARTW2: SM3PARTW2.

SM3SS1: SM3SS1.

SM3TT1A: SM3TT1A.

SM3TT1B: SM3TT1B.

SM3TT2A: SM3TT2A.

SM3TT2B: SM3TT2B.

SM4E: SM4 Encode.

SM4EKEY: SM4 Key.

SMAX: Signed Maximum (vector).

SMAXP: Signed Maximum Pairwise.

SMAXV: Signed Maximum across Vector.

SMIN: Signed Minimum (vector).

SMINP: Signed Minimum Pairwise.

SMINV: Signed Minimum across Vector.

SMLAL, SMLAL2 (by element): Signed Multiply-Add Long (vector, by element).

SMLAL, SMLAL2 (vector): Signed Multiply-Add Long (vector).

SMLSL, SMLSL2 (by element): Signed Multiply-Subtract Long (vector, by element).

SMLSL, SMLSL2 (vector): Signed Multiply-Subtract Long (vector).

SMMLA (vector): Signed 8-bit integer matrix multiply-accumulate (vector).

SMOV: Signed Move vector element to general-purpose register.

SMULL, SMULL2 (by element): Signed Multiply Long (vector, by element).

SMULL, SMULL2 (vector): Signed Multiply Long (vector).

SQABS: Signed saturating Absolute value.

SQADD: Signed saturating Add.

SQDMLAL, SQDMLAL2 (by element): Signed saturating Doubling Multiply-Add Long (by element).

SQDMLAL, SQDMLAL2 (vector): Signed saturating Doubling Multiply-Add Long.

SQDMLSL, SQDMLSL2 (by element): Signed saturating Doubling Multiply-Subtract Long (by element).

SQDMLSL, SQDMLSL2 (vector): Signed saturating Doubling Multiply-Subtract Long.

SQDMULH (by element): Signed saturating Doubling Multiply returning High half (by element).

SQDMULH (vector): Signed saturating Doubling Multiply returning High half.

SQDMULL, SQDMULL2 (by element): Signed saturating Doubling Multiply Long (by element).

SQDMULL, SQDMULL2 (vector): Signed saturating Doubling Multiply Long.

SQNEG: Signed saturating Negate.

SQRDMLAH (by element): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element).

SQRDMLAH (vector): Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector).

SQRDMLSH (by element): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element).

SQRDMLSH (vector): Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector).

SQRDMULH (by element): Signed saturating Rounding Doubling Multiply returning High half (by element).

SQRDMULH (vector): Signed saturating Rounding Doubling Multiply returning High half.

SQRSHL: Signed saturating Rounding Shift Left (register).

SQRSHRN, SQRSHRN2: Signed saturating Rounded Shift Right Narrow (immediate).

SQRSHRUN, SQRSHRUN2: Signed saturating Rounded Shift Right Unsigned Narrow (immediate).

SQSHL (immediate): Signed saturating Shift Left (immediate).

SQSHL (register): Signed saturating Shift Left (register).

SQSHLU: Signed saturating Shift Left Unsigned (immediate).

SQSHRN, SQSHRN2: Signed saturating Shift Right Narrow (immediate).

SQSHRUN, SQSHRUN2: Signed saturating Shift Right Unsigned Narrow (immediate).

SQSUB: Signed saturating Subtract.

SQXTN, SQXTN2: Signed saturating extract Narrow.

SQXTUN, SQXTUN2: Signed saturating extract Unsigned Narrow.

SRHADD: Signed Rounding Halving Add.

SRI: Shift Right and Insert (immediate).

SRSHL: Signed Rounding Shift Left (register).

SRSHR: Signed Rounding Shift Right (immediate).

SRSRA: Signed Rounding Shift Right and Accumulate (immediate).

SSHL: Signed Shift Left (register).

SSHLL, SSHLL2: Signed Shift Left Long (immediate).

SSHR: Signed Shift Right (immediate).

SSRA: Signed Shift Right and Accumulate (immediate).

SSUBL, SSUBL2: Signed Subtract Long.

SSUBW, SSUBW2: Signed Subtract Wide.

ST1 (multiple structures): Store multiple single-element structures from one, two, three, or four registers.

ST1 (single structure): Store a single-element structure from one lane of one register.

ST2 (multiple structures): Store multiple 2-element structures from two registers.

ST2 (single structure): Store single 2-element structure from one lane of two registers.

ST3 (multiple structures): Store multiple 3-element structures from three registers.

ST3 (single structure): Store single 3-element structure from one lane of three registers.

ST4 (multiple structures): Store multiple 4-element structures from four registers.

ST4 (single structure): Store single 4-element structure from one lane of four registers.

STNP (SIMD&FP): Store Pair of SIMD&FP registers, with Non-temporal hint.

STP (SIMD&FP): Store Pair of SIMD&FP registers.

STR (immediate, SIMD&FP): Store SIMD&FP register (immediate offset).

STR (register, SIMD&FP): Store SIMD&FP register (register offset).

STUR (SIMD&FP): Store SIMD&FP register (unscaled offset).

SUB (vector): Subtract (vector).

SUBHN, SUBHN2: Subtract returning High Narrow.

SUDOT (by element): Dot product with signed and unsigned integers (vector, by element).

SUQADD: Signed saturating Accumulate of Unsigned value.

SXTL, SXTL2: Signed extend Long: an alias of SSHLL, SSHLL2.

TBL: Table vector Lookup.

TBX: Table vector lookup extension.

TRN1: Transpose vectors (primary).

TRN2: Transpose vectors (secondary).

UABA: Unsigned Absolute difference and Accumulate.

UABAL, UABAL2: Unsigned Absolute difference and Accumulate Long.

UABD: Unsigned Absolute Difference (vector).

UABDL, UABDL2: Unsigned Absolute Difference Long.

UADALP: Unsigned Add and Accumulate Long Pairwise.

UADDL, UADDL2: Unsigned Add Long (vector).

UADDLP: Unsigned Add Long Pairwise.

UADDLV: Unsigned sum Long across Vector.

UADDW, UADDW2: Unsigned Add Wide.

UCVTF (scalar, fixed-point): Unsigned fixed-point Convert to Floating-point (scalar).

UCVTF (scalar, integer): Unsigned integer Convert to Floating-point (scalar).

UCVTF (vector, fixed-point): Unsigned fixed-point Convert to Floating-point (vector).

UCVTF (vector, integer): Unsigned integer Convert to Floating-point (vector).

UDOT (by element): Dot Product unsigned arithmetic (vector, by element).

UDOT (vector): Dot Product unsigned arithmetic (vector).

UHADD: Unsigned Halving Add.

UHSUB: Unsigned Halving Subtract.

UMAX: Unsigned Maximum (vector).

UMAXP: Unsigned Maximum Pairwise.

UMAXV: Unsigned Maximum across Vector.

UMIN: Unsigned Minimum (vector).

UMINP: Unsigned Minimum Pairwise.

UMINV: Unsigned Minimum across Vector.

UMLAL, UMLAL2 (by element): Unsigned Multiply-Add Long (vector, by element).

UMLAL, UMLAL2 (vector): Unsigned Multiply-Add Long (vector).

UMLSL, UMLSL2 (by element): Unsigned Multiply-Subtract Long (vector, by element).

UMLSL, UMLSL2 (vector): Unsigned Multiply-Subtract Long (vector).

UMMLA (vector): Unsigned 8-bit integer matrix multiply-accumulate (vector).

UMOV: Unsigned Move vector element to general-purpose register.

UMULL, UMULL2 (by element): Unsigned Multiply Long (vector, by element).

UMULL, UMULL2 (vector): Unsigned Multiply long (vector).

UQADD: Unsigned saturating Add.

UQRSHL: Unsigned saturating Rounding Shift Left (register).

UQRSHRN, UQRSHRN2: Unsigned saturating Rounded Shift Right Narrow (immediate).

UQSHL (immediate): Unsigned saturating Shift Left (immediate).

UQSHL (register): Unsigned saturating Shift Left (register).

UQSHRN, UQSHRN2: Unsigned saturating Shift Right Narrow (immediate).

UQSUB: Unsigned saturating Subtract.

UQXTN, UQXTN2: Unsigned saturating extract Narrow.

URECPE: Unsigned Reciprocal Estimate.

URHADD: Unsigned Rounding Halving Add.

URSHL: Unsigned Rounding Shift Left (register).

URSHR: Unsigned Rounding Shift Right (immediate).

URSQRTE: Unsigned Reciprocal Square Root Estimate.

URSRA: Unsigned Rounding Shift Right and Accumulate (immediate).

USDOT (by element): Dot Product with unsigned and signed integers (vector, by element).

USDOT (vector): Dot Product with unsigned and signed integers (vector).

USHL: Unsigned Shift Left (register).

USHLL, USHLL2: Unsigned Shift Left Long (immediate).

USHR: Unsigned Shift Right (immediate).

USMMLA (vector): Unsigned and signed 8-bit integer matrix multiply-accumulate (vector).

USQADD: Unsigned saturating Accumulate of Signed value.

USRA: Unsigned Shift Right and Accumulate (immediate).

USUBL, USUBL2: Unsigned Subtract Long.

USUBW, USUBW2: Unsigned Subtract Wide.

UXTL, UXTL2: Unsigned extend Long: an alias of USHLL, USHLL2.

UZP1: Unzip vectors (primary).

UZP2: Unzip vectors (secondary).

XAR: Exclusive OR and Rotate.

XTN, XTN2: Extract Narrow.

ZIP1: Zip vectors (primary).

ZIP2: Zip vectors (secondary).

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

ABS

Absolute value (vector). This instruction calculates the absolute value of each vector element in the source SIMD&FP
register, puts the result into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U

ABS <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size != '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean neg = (U == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U

ABS <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean neg = (U == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
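
The size:Q arrangement table above follows directly from the decode arithmetic (esize = 8 << UInt(size), datasize selected by Q). The following is an illustrative Python sketch, not Arm pseudocode; the function name decode_arrangement is hypothetical and chosen only for this example.

```python
def decode_arrangement(size: int, q: int) -> str:
    """Map the size and Q fields to an arrangement specifier such as '4S'."""
    if not (0 <= size <= 3 and q in (0, 1)):
        raise ValueError("size must fit in 2 bits and Q in 1 bit")
    if size == 0b11 and q == 0:
        # size:Q == '110' is the RESERVED row of the table.
        raise ValueError("size:Q == '110' is RESERVED")
    esize = 8 << size                  # element size in bits: 8, 16, 32, 64
    datasize = 128 if q == 1 else 64   # vector width in bits
    elements = datasize // esize       # number of lanes
    return f"{elements}{'BHSD'[size]}"
```

For example, decode_arrangement(0b10, 1) yields "4S", matching the table row for size = 10, Q = 1.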

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;

for e = 0 to elements-1
element = SInt(Elem[operand, e, esize]);
if neg then
element = -element;
else
element = Abs(element);
Elem[result, e, esize] = element<esize-1:0>;

V[d] = result;
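
The element loop above can be modelled on plain integers. This is a minimal Python sketch under the assumption that a SIMD&FP register value is held as an unsigned integer with element 0 in the least significant bits; the name abs_vector is hypothetical. It shows that the result is truncated to esize bits, so the most negative element value is returned unchanged.

```python
def abs_vector(operand: int, esize: int, datasize: int, neg: bool = False) -> int:
    """Model of the ABS element loop; neg=True models the U == '1' variant."""
    mask = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        raw = (operand >> (e * esize)) & mask
        # SInt(): reinterpret the esize-bit field as a signed integer.
        element = raw - (1 << esize) if raw >> (esize - 1) else raw
        element = -element if neg else abs(element)
        # element<esize-1:0>: keep only the low esize bits.
        result |= (element & mask) << (e * esize)
    return result
```

With 8-bit elements, a lane holding 0x80 (-128) produces 0x80 again, because +128 does not fit in a signed 8-bit element.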

Operational information

ADD (vector)

Add (vector). This instruction adds corresponding elements in the two source SIMD&FP registers, places the results
into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 0 0 0 1 Rn Rd
U

ADD <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean sub_op = (U == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 0 0 1 Rn Rd
U

ADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean sub_op = (U == '1');
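
The field layout of the Vector encoding can be checked by assembling the 32-bit word by hand. The sketch below is illustrative Python rather than Arm pseudocode; the function name encode_add_vector is hypothetical, and the constant 0x0E208400 is simply the fixed bits of the diagram above with Q, size, Rm, Rn and Rd all zero.

```python
def encode_add_vector(rd: int, rn: int, rm: int, size: int, q: int) -> int:
    """Assemble the 32-bit word for ADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>."""
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 31:
            raise ValueError("register numbers are 5-bit fields")
    if not (0 <= size <= 3 and q in (0, 1)):
        raise ValueError("size is a 2-bit field and Q is a 1-bit field")
    if size == 0b11 and q == 0:
        raise ValueError("size:Q == '110' is UNDEFINED")
    fixed = 0x0E208400  # bits 28:24 = 01110, bit 21 = 1, bits 15:10 = 100001, U = 0
    return fixed | (q << 30) | (size << 22) | (rm << 16) | (rn << 5) | rd
```

Under this layout, ADD V0.4S, V1.4S, V2.4S (size = 10, Q = 1) comes out as 0x4EA28420.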

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if sub_op then
Elem[result, e, esize] = element1 - element2;
else
Elem[result, e, esize] = element1 + element2;

V[d] = result;
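
Because each element is a bits(esize) value, the addition wraps within its lane and never carries into the neighbouring element. A minimal Python sketch of the loop above, assuming registers are held as unsigned integers with element 0 in the least significant bits (the name add_vector is hypothetical):

```python
def add_vector(operand1: int, operand2: int, esize: int, datasize: int,
               sub_op: bool = False) -> int:
    """Lane-wise modular add (or subtract when sub_op is set)."""
    mask = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        element1 = (operand1 >> (e * esize)) & mask
        element2 = (operand2 >> (e * esize)) & mask
        value = element1 - element2 if sub_op else element1 + element2
        # Truncate to esize bits so nothing carries into lane e+1.
        result |= (value & mask) << (e * esize)
    return result
```

For 8-bit lanes, adding 0x0101 to 0x01FF gives 0x0200: lane 0 wraps from 0xFF to 0x00 and lane 1 is computed independently.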

Operational information

ADDHN, ADDHN2

Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register to the
corresponding vector element in the second source SIMD&FP register, places the most significant half of the result
into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
The results are truncated. For rounded results, see RADDHN.
The ADDHN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
ADDHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 0 0 0 Rn Rd
U o1

ADDHN{2} <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean round = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(2*datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = if round then 1 << (esize - 1) else 0;
bits(2*esize) element1;
bits(2*esize) element2;
bits(2*esize) sum;

for e = 0 to elements-1
element1 = Elem[operand1, e, 2*esize];
element2 = Elem[operand2, e, 2*esize];
if sub_op then
sum = element1 - element2;
else
sum = element1 + element2;
sum = sum + round_const;
Elem[result, e, esize] = sum<2*esize-1:esize>;

Vpart[d, part] = result;
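
The narrowing step and the Vpart write can be sketched in Python as follows. This is an illustration under the assumption that registers are unsigned integers with element 0 in the least significant bits; the names addhn and write_vpart are hypothetical, and write_vpart models only the behaviour described above (part 0 clears the upper half, part 1 preserves the lower half).

```python
def addhn(operand1: int, operand2: int, esize: int,
          round_: bool = False, sub_op: bool = False) -> int:
    """Return the 64-bit narrowed result for 128-bit source operands."""
    wide = 2 * esize
    wide_mask = (1 << wide) - 1
    round_const = (1 << (esize - 1)) if round_ else 0
    result = 0
    for e in range(64 // esize):
        element1 = (operand1 >> (e * wide)) & wide_mask
        element2 = (operand2 >> (e * wide)) & wide_mask
        total = element1 - element2 if sub_op else element1 + element2
        total = (total + round_const) & wide_mask
        # sum<2*esize-1:esize>: keep the most significant half.
        result |= (total >> esize) << (e * esize)
    return result


def write_vpart(vd_old: int, part: int, value: int) -> int:
    """part 0: write low half, clear high half. part 1: write high half only."""
    low_mask = (1 << 64) - 1
    if part == 0:
        return value & low_mask
    return ((value & low_mask) << 64) | (vd_old & low_mask)
```

With esize = 8, the 16-bit element 0x1280 narrows to 0x12 when truncated and to 0x13 when round_ is set, which is the difference between ADDHN and RADDHN.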

Operational information

ADDP (scalar)

Add Pair of elements (scalar). This instruction adds two vector elements in the source SIMD&FP register and writes
the scalar result into the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 1 0 0 0 1 1 0 1 1 1 0 Rn Rd

ADDP <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size != '11' then UNDEFINED;

integer esize = 8 << UInt(size);

integer datasize = esize * 2;

Assembler Symbols

<V> Is the destination width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> Is the source arrangement specifier, encoded in “size”:

size <T>
0x RESERVED
10 RESERVED
11 2D

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_ADD, operand, esize);

Operational information

ADDP (vector)

Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements from the concatenated vector, adds each pair of values together, places the result into a vector, and
writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 1 1 1 Rn Rd

ADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
element1 = Elem[concat, 2*e, esize];
element2 = Elem[concat, (2*e)+1, esize];
Elem[result, e, esize] = element1 + element2;

V[d] = result;
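
The concatenation order matters: operand2:operand1 places the first source in the low half, so the low result elements come from <Vn> and the high result elements from <Vm>. A minimal Python sketch, assuming registers are unsigned integers with element 0 in the least significant bits (the name addp_vector is hypothetical):

```python
def addp_vector(operand1: int, operand2: int, esize: int, datasize: int) -> int:
    """Pairwise add over the concatenation operand2:operand1."""
    mask = (1 << esize) - 1
    concat = (operand2 << datasize) | operand1   # operand1 is the low half
    result = 0
    for e in range(datasize // esize):
        element1 = (concat >> ((2 * e) * esize)) & mask
        element2 = (concat >> ((2 * e + 1) * esize)) & mask
        result |= ((element1 + element2) & mask) << (e * esize)
    return result
```

For the 2S arrangement with <Vn> = {1, 2} and <Vm> = {10, 20}, the result is {3, 30}.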

Operational information

If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.

ADDV

Add across Vector. This instruction adds every vector element in the source SIMD&FP register together, and writes the
scalar result to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 1 1 0 1 1 1 0 Rn Rd

ADDV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '100' then UNDEFINED;

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);

integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<V> Is the destination width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_ADD, operand, esize);
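
For integer addition the reduction can be modelled as a running sum truncated to esize bits. The order of additions does not affect a modular integer sum, so the sketch below uses a simple sequential loop; how Reduce() orders the additions internally is not shown in this section and is not modelled here. The name addv is hypothetical.

```python
def addv(operand: int, esize: int, datasize: int) -> int:
    """Sum all esize-bit lanes of operand, keeping the low esize bits."""
    mask = (1 << esize) - 1
    total = 0
    for e in range(datasize // esize):
        total = (total + ((operand >> (e * esize)) & mask)) & mask
    return total
```

For the 8B arrangement with every lane equal to 0xFF, the true sum 2040 (0x7F8) is truncated to 0xF8 in the B destination.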

Operational information

AESD

AES single round decryption.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 0 1 1 0 Rn Rd
D

AESD <Vd>.16B, <Vn>.16B

integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveAESExt() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) operand1 = V[d];

bits(128) operand2 = V[n];
bits(128) result;
result = operand1 EOR operand2;
result = AESInvSubBytes(AESInvShiftRows(result));
V[d] = result;

Operational information

AESE

AES single round encryption.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 0 0 1 0 Rn Rd
D

AESE <Vd>.16B, <Vn>.16B

integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveAESExt() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) operand1 = V[d];

bits(128) operand2 = V[n];
bits(128) result;
result = operand1 EOR operand2;
result = AESSubBytes(AESShiftRows(result));

V[d] = result;
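
The three steps above (EOR with the round key, ShiftRows, SubBytes) can be sketched in Python. This is an illustration only: the S-box is generated from the standard AES definition (multiplicative inverse in GF(2^8) followed by the affine transform), the state is assumed to be 16 bytes with byte 0 as the least significant byte of the register, and the helper names are hypothetical stand-ins for AESShiftRows and AESSubBytes.

```python
def _gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    product = 0
    for _ in range(8):
        if b & 1:
            product ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return product


def _sbox_entry(x: int) -> int:
    """AES S-box: field inverse (0 maps to 0), then the affine transform."""
    inv = next(y for y in range(1, 256) if _gf_mul(x, y) == 1) if x else 0
    s = inv
    for shift in range(1, 5):
        s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
    return s ^ 0x63


SBOX = [_sbox_entry(x) for x in range(256)]


def aes_shift_rows(state: bytes) -> bytes:
    """Rotate row r of the column-major 4x4 state left by r columns."""
    return bytes(state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4))


def aese(vd: bytes, vn: bytes) -> bytes:
    """Model of AESE: SubBytes(ShiftRows(Vd EOR Vn))."""
    mixed = bytes(a ^ b for a, b in zip(vd, vn))
    return bytes(SBOX[b] for b in aes_shift_rows(mixed))
```

Note that AESE does not include MixColumns; a full encryption round in software typically pairs it with AESMC.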

Operational information

AESIMC

AES inverse mix columns.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 1 1 1 0 Rn Rd
D

AESIMC <Vd>.16B, <Vn>.16B

integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveAESExt() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) operand = V[n];

bits(128) result;
result = AESInvMixColumns(operand);
V[d] = result;

Operational information

AESMC

AES mix columns.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 0 0 1 1 0 1 0 Rn Rd
D

AESMC <Vd>.16B, <Vn>.16B

integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveAESExt() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) operand = V[n];

bits(128) result;
result = AESMixColumns(operand);
V[d] = result;

Operational information

AND (vector)

Bitwise AND (vector). This instruction performs a bitwise AND between the two source SIMD&FP registers, and
writes the result to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 1 Rm 0 0 0 1 1 1 Rn Rd
size

AND <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

result = operand1 AND operand2;

V[d] = result;

Operational information

BCAX

Bit Clear and Exclusive OR. This instruction performs a bitwise AND of the 128-bit vector in the second source
SIMD&FP register (<Vm>) and the complement of the vector in the third source SIMD&FP register (<Va>), then performs a
bitwise exclusive OR of the resulting vector and the vector in the first source SIMD&FP register (<Vn>), and writes
the result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA3 is implemented.

Advanced SIMD
(FEAT_SHA3)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 0 1 Rm 0 Ra Rn Rd

BCAX <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B

if !HaveSHA3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
<Va> Is the name of the third SIMD&FP source register, encoded in the "Ra" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Va = V[a];
V[d] = Vn EOR (Vm AND NOT(Va));
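
The operand roles in the expression above are easy to mix up, so a small model helps. This is a minimal Python sketch, assuming 128-bit register values are held as unsigned integers; the name bcax is hypothetical.

```python
MASK128 = (1 << 128) - 1


def bcax(vn: int, vm: int, va: int) -> int:
    """Model of BCAX: Vn EOR (Vm AND NOT(Va)), limited to 128 bits."""
    return (vn ^ (vm & ~va)) & MASK128
```

For the 4-bit patterns Vn = 1100, Vm = 1010, Va = 0110, Vm AND NOT(Va) is 1000 and the result is 0100.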

Operational information

BFCVT

Floating-point convert from single-precision to BFloat16 format (scalar). This instruction converts the single-precision
floating-point value in the 32-bit SIMD&FP source register to BFloat16 format and writes the result to the 16-bit
SIMD&FP destination register.
ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.

Single-precision to BFloat16
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 1 1 0 0 0 1 1 0 1 0 0 0 0 Rn Rd

BFCVT <Hd>, <Sn>

if !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Rn);
integer d = UInt(Rd);

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();

bits(32) operand = V[n];

FPCRType fpcr = FPCR[];
boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

Elem[result, 0, 16] = FPConvertBF(operand, fpcr);

V[d] = result;
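
FPConvertBF() is not defined in this section. As an illustration of what a single-precision to BFloat16 narrowing involves, the Python sketch below keeps the top 16 bits of the single-precision encoding and rounds to nearest, ties to even. This is an assumption made for the example: the rounding mode and other FPCR controls passed in fpcr, and the exact NaN handling, are not modelled. Both function names are hypothetical.

```python
import struct


def f32_bits(value: float) -> int:
    """Return the IEEE 754 single-precision encoding of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def fp32_bits_to_bf16_rne(bits: int) -> int:
    """Narrow a 32-bit encoding to 16 bits using round-to-nearest-even."""
    bits &= 0xFFFFFFFF
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0
    if is_nan:
        # Assumption: return a quiet NaN by setting the top fraction bit.
        return (bits >> 16) | 0x0040
    lsb = (bits >> 16) & 1           # lowest bit that survives narrowing
    return ((bits + 0x7FFF + lsb) >> 16) & 0xFFFF
```

The encoding 0x3F808000 lies exactly halfway between two BFloat16 values and rounds to the even result 0x3F80, whereas 0x3F818000 rounds up to 0x3F82.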

BFCVTN, BFCVTN2

Floating-point convert from single-precision to BFloat16 format (vector). This instruction reads each single-precision
element in the SIMD&FP source vector, converts each value to BFloat16 format, and writes the results to the lower or
upper half of the SIMD&FP destination vector. The result elements are half the width of the source elements.
The BFCVTN instruction writes the half-width results to the lower half of the destination vector and clears the upper
half to zero, while the BFCVTN2 instruction writes the results to the upper half of the destination vector without
affecting the other bits in the register.

Vector single-precision to BFloat16

(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 0 0 0 0 1 0 1 1 0 1 0 Rn Rd

BFCVTN{2} <Vd>.<Ta>, <Vn>.4S

if !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Rn);
integer d = UInt(Rd);
integer part = UInt(Q);
integer elements = 64 DIV 16;

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 4H
1 8H

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(128) operand = V[n];
bits(64) result;

for e = 0 to elements-1
Elem[result, e, 16] = FPConvertBF(Elem[operand, e, 32], FPCR[]);

Vpart[d, part] = result;

BFDOT (by element)

BFloat16 floating-point dot product (vector, by element). This instruction delimits the source vectors into pairs of
BFloat16 elements.
Irrespective of the control bits in the FPCR, this instruction:
• Performs an unfused sum-of-products of each pair of adjacent BFloat16 elements in the first source vector
with the specified pair of elements in the second source vector. The intermediate single-precision products
are rounded before they are summed, and the intermediate sum is rounded before accumulation into the
single-precision destination element that overlaps with the corresponding pair of BFloat16 elements in the
first source vector.
• Uses the non-IEEE 754 Round-to-Odd rounding mode, which forces bit 0 of an inexact result to 1, and rounds
an overflow to an appropriately signed Infinity.
• Does not modify the cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC, and IOC).
• Disables trapped floating-point exceptions, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE, and
IOE) are all zero.
• Flushes denormalized inputs and results to zero, as if FPCR.{FZ, FIZ} is {1, 1}.
• Generates only the default NaN, as if FPCR.DN is 1.
The BFloat16 pair within the second source vector is specified using an immediate index. The index range is from 0 to
3 inclusive. ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.

Vector
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 1 L M Rm 1 1 1 1 H 0 Rn Rd

BFDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.2H[<index>]

if !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Rn);
integer m = UInt(M:Rm);
integer d = UInt(Rd);
integer i = UInt(H:L);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV 32;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 4H
1 8H

<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index> Is the immediate index of a pair of 16-bit elements in the range 0 to 3, encoded in the "H:L" fields.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(128) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;

for e = 0 to elements-1
bits(16) elt1_a = Elem[operand1, 2*e+0, 16];
bits(16) elt1_b = Elem[operand1, 2*e+1, 16];
bits(16) elt2_a = Elem[operand2, 2*i+0, 16];
bits(16) elt2_b = Elem[operand2, 2*i+1, 16];

bits(32) sum = Elem[operand3, e, 32];

sum = BFDotAdd(sum, elt1_a, elt1_b, elt2_a, elt2_b, FPCR[]);
Elem[result, e, 32] = sum;

V[d] = result;
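
The element selection in the loop above (one fixed pair from the second source, a moving pair from the first) can be sketched in Python. This is a simplified model that illustrates indexing only: BFDotAdd() is not defined in this section, and the Round-to-Odd rounding, flush-to-zero and default-NaN behaviour described earlier are not modelled. Ordinary Python floating-point arithmetic is used instead, which gives the same answer only when every intermediate value is exactly representable. Lanes are passed as Python lists, and all function names are hypothetical.

```python
import struct


def bf16_to_float(h: int) -> float:
    """Widen a BFloat16 bit pattern by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]


def bfdot_by_element(acc, vn, vm, index):
    """acc: list of S-lane floats; vn, vm: lists of BFloat16 bit patterns."""
    if not 0 <= index <= 3:
        raise ValueError("index must be in the range 0 to 3")
    # The same pair from the second source is used for every result lane.
    b0 = bf16_to_float(vm[2 * index])
    b1 = bf16_to_float(vm[2 * index + 1])
    result = []
    for e, addend in enumerate(acc):
        a0 = bf16_to_float(vn[2 * e])
        a1 = bf16_to_float(vn[2 * e + 1])
        result.append(addend + (a0 * b0 + a1 * b1))
    return result
```

With <Vn> holding {1.0, 2.0, 3.0, 4.0}, the indexed pair {0.5, 2.0} and accumulators {10.0, 20.0}, the 2S result is {14.5, 29.5}.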

BFDOT (vector)

BFloat16 floating-point dot product (vector). This instruction delimits the source vectors into pairs of BFloat16
elements.
Irrespective of the control bits in the FPCR, this instruction:
• Performs an unfused sum-of-products of each pair of adjacent BFloat16 elements in the source vectors. The
intermediate single-precision products are rounded before they are summed, and the intermediate sum is
rounded before accumulation into the single-precision destination element that overlaps with the
corresponding pair of BFloat16 elements in the source vectors.
• Uses the non-IEEE 754 Round-to-Odd rounding mode, which forces bit 0 of an inexact result to 1, and rounds
an overflow to an appropriately signed Infinity.
• Does not modify the cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC, and IOC).
• Disables trapped floating-point exceptions, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE, and
IOE) are all zero.
• Flushes denormalized inputs and results to zero, as if FPCR.{FZ, FIZ} is {1, 1}.
• Generates only the default NaN, as if FPCR.DN is 1.

Vector
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 1 1 1 1 1 1 Rn Rd

BFDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

if !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV 32;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 4H
1 8H

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;

for e = 0 to elements-1
bits(16) elt1_a = Elem[operand1, 2*e+0, 16];
bits(16) elt1_b = Elem[operand1, 2*e+1, 16];
bits(16) elt2_a = Elem[operand2, 2*e+0, 16];
bits(16) elt2_b = Elem[operand2, 2*e+1, 16];

bits(32) sum = Elem[operand3, e, 32];

sum = BFDotAdd(sum, elt1_a, elt1_b, elt2_a, elt2_b, FPCR[]);
Elem[result, e, 32] = sum;

V[d] = result;

BFMLALB, BFMLALT (by element)

BFloat16 floating-point widening multiply-add long (by element). This instruction widens the even-numbered (bottom) or
odd-numbered (top) 16-bit elements in the first source vector, and the indexed element in the second source vector,
from BFloat16 to single-precision format. It then multiplies these values and adds the products, without intermediate
rounding, to the single-precision elements of the destination vector that overlap with the corresponding BFloat16
elements in the first source vector.
ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.

Vector
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 1 L M Rm 1 1 1 1 H 0 Rn Rd

BFMLAL<bt> <Vd>.4S, <Vn>.8H, <Vm>.H[<index>]

if !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Rn);
integer m = UInt('0':Rm);
integer d = UInt(Rd);
integer index = UInt(H:L:M);

integer elements = 128 DIV 32;

integer sel = UInt(Q);

Assembler Symbols

<bt> Is the bottom or top element specifier, encoded in “Q”:

Q <bt>
0 B
1 T

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, in the range V0 to V15, encoded in the "Rm" field.
<index> Is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.

Operation

CheckFPAdvSIMDEnabled64();
bits(128) result;
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) operand3 = V[d];
bits(32) element2 = Elem[operand2, index, 16]:Zeros(16);

for e = 0 to elements-1
bits(32) element1 = Elem[operand1, 2*e+sel, 16]:Zeros(16);
bits(32) addend = Elem[operand3, e, 32];
Elem[result, e, 32] = BFMulAdd(addend, element1, element2, FPCR[]);

V[d] = result;

BFMLALB, BFMLALT (vector)

BFloat16 floating-point widening multiply-add long (vector). This instruction widens the even-numbered (bottom) or
odd-numbered (top) 16-bit elements in the first and second source vectors from BFloat16 to single-precision format. It
then multiplies these values and adds the products, without intermediate rounding, to the single-precision elements of
the destination vector that overlap with the corresponding BFloat16 elements in the source vectors.
ID_AA64ISAR1_EL1.BF16 indicates whether this instruction is supported.

Vector
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 1 1 1 1 1 1 Rn Rd

BFMLAL<bt> <Vd>.4S, <Vn>.8H, <Vm>.8H

if !HaveBF16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer elements = 128 DIV 32;

integer sel = UInt(Q);

Assembler Symbols

<bt> Is the bottom or top element specifier, encoded in “Q”:

Q <bt>
0 B
1 T

Operation

CheckFPAdvSIMDEnabled64();
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) operand3 = V[d];
bits(128) result;

for e = 0 to elements-1
bits(32) element1 = Elem[operand1, 2*e+sel, 16]:Zeros(16);
bits(32) element2 = Elem[operand2, 2*e+sel, 16]:Zeros(16);
bits(32) addend = Elem[operand3, e, 32];
Elem[result, e, 32] = BFMulAdd(addend, element1, element2, FPCR[]);

V[d] = result;
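
The element selection and widening in the pseudocode above can be sketched in Python. This is an illustrative model only: the helper names (bf16_to_f32, round_to_f32, bfmlal) are hypothetical, lanes are passed as Python lists rather than packed 128-bit values, and the arithmetic is done in double precision and then rounded to single precision, which approximates BFMulAdd but does not reproduce its exact fused rounding or its FPCR handling.

```python
import struct

def bf16_to_f32(bits16):
    """Widen a BFloat16 bit pattern by appending 16 zero bits, as in Elem[...]:Zeros(16)."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]

def round_to_f32(x):
    """Round a Python float (double) to the nearest single-precision value.
    struct raises OverflowError for finite values outside the single-precision range."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def bfmlal(acc, vn, vm, sel):
    """acc: 4 single-precision values; vn, vm: 8 BFloat16 bit patterns each.
    sel = 0 models BFMLALB (even lanes), sel = 1 models BFMLALT (odd lanes)."""
    result = []
    for e in range(4):                      # elements = 128 DIV 32
        a = bf16_to_f32(vn[2 * e + sel])
        b = bf16_to_f32(vm[2 * e + sel])
        # Assumption: the product of two widened BFloat16 values is exact in a
        # double, but the sum is rounded twice (to double, then to single).
        result.append(round_to_f32(acc[e] + a * b))
    return result
```

For example, with even lanes of 1.0 and 3.0 and odd lanes of 2.0 and 0.5, the bottom form adds 3.0 to each accumulator lane and the top form adds 1.0.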

BFMMLA

BFloat16 floating-point matrix multiply-accumulate into 2x2 matrix.

Irrespective of the control bits in the FPCR, this instruction:
• Performs two unfused sums-of-products within each two pairs of adjacent BFloat16 elements while
multiplying the 2x4 matrix of BFloat16 values in the first source vector with the 4x2 matrix of BFloat16 values
in the second source vector. The intermediate single-precision products are rounded before they are summed
and the intermediate sum is rounded before accumulation into the 2x2 single-precision matrix in the
destination vector. This is equivalent to accumulating two 2-way unfused dot products per destination
element.
• Uses the non-IEEE 754 Round-to-Odd rounding mode, which forces bit 0 of an inexact result to 1, and rounds
an overflow to an appropriately signed Infinity.
• Does not modify the cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC, and IOC).
• Disables trapped floating-point exceptions, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE, and
IOE) are all zero.
• Flushes denormalized inputs and results to zero, as if FPCR.{FZ, FIZ} is {1, 1}.
• Generates only the default NaN, as if FPCR.DN is 1.

Note

Arm expects that the BFMMLA instruction will deliver a peak BFloat16 multiply throughput that is at least as high
as can be achieved using two BFDOT instructions, with a goal that it should have significantly higher throughput.

Vector
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 0 1 0 Rm 1 1 1 0 1 1 Rn Rd

BFMMLA <Vd>.4S, <Vn>.8H, <Vm>.8H

if !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);

Assembler Symbols

<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(128) op1 = V[n];
bits(128) op2 = V[m];
bits(128) acc = V[d];

V[d] = BFMatMulAdd(acc, op1, op2);

BIC (vector, immediate)

Bitwise bit Clear (vector, immediate). This instruction reads each vector element from the destination SIMD&FP
register, performs a bitwise AND between each result and the complement of an immediate constant, places the result
into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 0 0 0 0 a b c x x x 1 0 1 d e f g h Rd
op cmode

16-bit (cmode == 10x1)

BIC <Vd>.<T>, #<imm8>{, LSL #<amount>}

32-bit (cmode == 0xx1)

BIC <Vd>.<T>, #<imm8>{, LSL #<amount>}

integer rd = UInt(Rd);

integer datasize = if Q == '1' then 128 else 64;

bits(datasize) imm;
bits(64) imm64;

ImmediateOp operation;
case cmode:op of
    when '0xx01' operation = ImmediateOp_MVNI;
    when '0xx11' operation = ImmediateOp_BIC;
    when '10x01' operation = ImmediateOp_MVNI;
    when '10x11' operation = ImmediateOp_BIC;
    when '110x1' operation = ImmediateOp_MVNI;
    when '1110x' operation = ImmediateOp_MOVI;
    when '11111'
        // FMOV Dn,#imm is in main FP instruction set
        if Q == '0' then UNDEFINED;
        operation = ImmediateOp_MOVI;

imm64 = AdvSIMDExpandImm(op, cmode, a:b:c:d:e:f:g:h);

imm = Replicate(imm64, datasize DIV 64);

Assembler Symbols

<Vd> Is the name of the SIMD&FP register, encoded in the "Rd" field.

<T> For the 16-bit variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the 32-bit variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 2S
1 4S

<imm8> Is an 8-bit immediate encoded in "a:b:c:d:e:f:g:h".

<amount> For the 16-bit variant: is the shift amount encoded in “cmode<1>”:

cmode<1> <amount>
0 0
1 8
defaulting to 0 if LSL is omitted.

For the 32-bit variant: is the shift amount encoded in “cmode<2:1>”:

cmode<2:1> <amount>
00 0
01 8
10 16
11 24
defaulting to 0 if LSL is omitted.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand;
bits(datasize) result;

case operation of
    when ImmediateOp_MOVI
        result = imm;
    when ImmediateOp_MVNI
        result = NOT(imm);
    when ImmediateOp_ORR
        operand = V[rd];
        result = operand OR imm;
    when ImmediateOp_BIC
        operand = V[rd];
        result = operand AND NOT(imm);

V[rd] = result;
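
The ImmediateOp_BIC case above can be modelled in a few lines of Python. This is a sketch under stated assumptions: bic_immediate is a hypothetical helper, the register is represented as a plain integer, and the expanded immediate is taken to be imm8 shifted left by the amount and replicated into every 16-bit or 32-bit element, which is inferred from the assembler symbol tables rather than from the AdvSIMDExpandImm pseudocode (not shown in this section).

```python
def bic_immediate(value, datasize, esize, imm8, amount=0):
    """Clear, in every esize-bit element of value, the bits set in (imm8 << amount).

    value:    destination register contents as an integer of datasize bits
    esize:    16 or 32 (the 16-bit or 32-bit variant)
    amount:   0 or 8 for 16-bit elements; 0, 8, 16 or 24 for 32-bit elements
    """
    assert datasize in (64, 128) and esize in (16, 32)
    assert 0 <= imm8 <= 0xFF and amount % 8 == 0 and 0 <= amount < esize
    elem_imm = imm8 << amount
    imm = 0
    for e in range(datasize // esize):      # replicate the immediate into every element
        imm |= elem_imm << (e * esize)
    mask = (1 << datasize) - 1
    return value & ~imm & mask              # operand AND NOT(imm)
```

Under these assumptions, bic_immediate(v, 64, 16, 0x0F, 8) corresponds to BIC <Vd>.4H, #0x0F, LSL #8.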

Operational information

BIC (vector, register)

Bitwise bit Clear (vector, register). This instruction performs a bitwise AND between the first source SIMD&FP
register and the complement of the second source SIMD&FP register, and writes the result to the destination
SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 Rm 0 0 0 1 1 1 Rn Rd
size

BIC <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

operand2 = NOT(operand2);

result = operand1 AND operand2;

V[d] = result;

Operational information

BIF

Bitwise Insert if False. This instruction inserts each bit from the first source SIMD&FP register into the destination
SIMD&FP register if the corresponding bit of the second source SIMD&FP register is 0, otherwise leaves the bit in the
destination register unchanged.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 Rm 0 0 0 1 1 1 Rn Rd
opc2

BIF <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1;
bits(datasize) operand3;
bits(datasize) operand4 = V[n];

operand1 = V[d];
operand3 = NOT(V[m]);

V[d] = operand1 EOR ((operand1 EOR operand4) AND operand3);

Operational information

BIT

Bitwise Insert if True. This instruction inserts each bit from the first source SIMD&FP register into the SIMD&FP
destination register if the corresponding bit of the second source SIMD&FP register is 1, otherwise leaves the bit in
the destination register unchanged.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 0 1 Rm 0 0 0 1 1 1 Rn Rd
opc2

BIT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1;
bits(datasize) operand3;
bits(datasize) operand4 = V[n];

operand1 = V[d];
operand3 = V[m];
V[d] = operand1 EOR ((operand1 EOR operand4) AND operand3);

Operational information

BSL

Bitwise Select. This instruction sets each bit in the destination SIMD&FP register to the corresponding bit from the
first source SIMD&FP register when the original destination bit was 1, otherwise from the second source SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 Rm 0 0 0 1 1 1 Rn Rd
opc2

BSL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1;
bits(datasize) operand3;
bits(datasize) operand4 = V[n];

operand1 = V[m];
operand3 = V[d];
V[d] = operand1 EOR ((operand1 EOR operand4) AND operand3);
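
BIF, BIT and BSL all use the same expression, operand1 EOR ((operand1 EOR operand4) AND operand3), and differ only in which register supplies each operand. The following Python sketch makes that explicit. The function names are hypothetical and registers are modelled as plain integers.

```python
def _insert(operand1, operand4, operand3, datasize):
    """Shared formula: where operand3 has a 1 the result bit comes from operand4,
    otherwise it comes from operand1."""
    mask = (1 << datasize) - 1
    return (operand1 ^ ((operand1 ^ operand4) & operand3)) & mask

def bif(vd, vn, vm, datasize=64):
    """Bitwise Insert if False: take Vn where Vm is 0, keep Vd elsewhere."""
    return _insert(vd, vn, ~vm & ((1 << datasize) - 1), datasize)

def bit(vd, vn, vm, datasize=64):
    """Bitwise Insert if True: take Vn where Vm is 1, keep Vd elsewhere."""
    return _insert(vd, vn, vm, datasize)

def bsl(vd, vn, vm, datasize=64):
    """Bitwise Select: take Vn where the original Vd is 1, Vm elsewhere."""
    return _insert(vm, vn, vd, datasize)
```

The identity behind the formula is that a EOR ((a EOR b) AND m) equals b in bit positions where m is 1 and equals a where m is 0.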

Operational information

CLS (vector)

Count Leading Sign bits (vector). This instruction counts the number of consecutive bits following the most significant
bit that are the same as the most significant bit in each vector element in the source SIMD&FP register, places the
result into a vector, and writes the vector to the destination SIMD&FP register. The count does not include the most
significant bit itself.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 0 0 1 0 Rn Rd
U

CLS <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CountOp countop = if U == '1' then CountOp_CLZ else CountOp_CLS;
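
The decode above derives the element size and element count from the size and Q fields. A minimal Python sketch of that derivation follows. decode_arrangement is a hypothetical helper, and the arrangement name it builds is expected to match the <T> table under Assembler Symbols for this encoding.

```python
def decode_arrangement(size, q):
    """size: 2-bit field value, q: 1-bit field value.
    Returns (esize, elements, arrangement name)."""
    if size == 0b11:
        # Mirrors "if size == '11' then UNDEFINED;" in the decode pseudocode.
        raise ValueError("size == '11' is UNDEFINED for this encoding")
    esize = 8 << size                       # 8, 16 or 32 bits
    datasize = 128 if q == 1 else 64
    elements = datasize // esize
    return esize, elements, f"{elements}{'BHS'[size]}"
```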

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;

integer count;
for e = 0 to elements-1
    if countop == CountOp_CLS then
        count = CountLeadingSignBits(Elem[operand, e, esize]);
    else
        count = CountLeadingZeroBits(Elem[operand, e, esize]);
    Elem[result, e, esize] = count<esize-1:0>;
V[d] = result;
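
The two counting operations used in the loop above can be sketched in Python as follows. The function names are hypothetical stand-ins for CountLeadingSignBits and CountLeadingZeroBits, and the sign-bit count is computed by XORing each bit with its upper neighbour, which is one common way to express it and is assumed here rather than taken from this section.

```python
def count_leading_zero_bits(x, esize):
    """Number of consecutive zero bits starting at the most significant bit."""
    return esize - x.bit_length()

def count_leading_sign_bits(x, esize):
    """Number of bits after the sign bit that equal it (the sign bit itself is excluded)."""
    # A 1 in diff marks a position whose bit differs from the bit directly above it.
    diff = (x ^ (x >> 1)) & ((1 << (esize - 1)) - 1)
    return (esize - 1) - diff.bit_length()

def count_vector(value, datasize, esize, count):
    """Apply count to every esize-bit element of a datasize-bit integer."""
    result = 0
    for e in range(datasize // esize):
        elem = (value >> (e * esize)) & ((1 << esize) - 1)
        result |= count(elem, esize) << (e * esize)
    return result
```

For an 8-bit element, 0xF0 has three leading sign bits (the three 1s after the sign bit) and zero leading zero bits.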

Operational information

◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.

CLZ (vector)

Count Leading Zero bits (vector). This instruction counts the number of consecutive zeros, starting from the most
significant bit, in each vector element in the source SIMD&FP register, places the result into a vector, and writes the
vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 1 0 0 1 0 Rn Rd
U

CLZ <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CountOp countop = if U == '1' then CountOp_CLZ else CountOp_CLS;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
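bits(datasize) result;

integer count;
for e = 0 to elements-1
    if countop == CountOp_CLS then
        count = CountLeadingSignBits(Elem[operand, e, esize]);
    else
        count = CountLeadingZeroBits(Elem[operand, e, esize]);
    Elem[result, e, esize] = count<esize-1:0>;
V[d] = result;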

Operational information

◦ The values of the NZCV flags.

CMEQ (register)

Compare bitwise Equal (vector). This instruction compares each vector element from the first source SIMD&FP
register with the corresponding vector element from the second source SIMD&FP register, and if the comparison is
equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one, otherwise sets
every bit of the corresponding vector element in the destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U

CMEQ <V><d>, <V><n>, <V><m>

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U

CMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
boolean test_passed;

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if and_test then
        test_passed = !IsZero(element1 AND element2);
    else
        test_passed = (element1 == element2);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();

V[d] = result;
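
The per-element mask generation above can be sketched in Python. compare_elements is a hypothetical helper that models registers as integers. The and_test flag is set by the decode pseudocode, which is not shown in this section, so it is simply a parameter here.

```python
def compare_elements(op1, op2, datasize, esize, and_test=False):
    """Return a mask with all ones in each element where the test passes, zeros elsewhere."""
    ones = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        e1 = (op1 >> (e * esize)) & ones
        e2 = (op2 >> (e * esize)) & ones
        # and_test selects the "any common bit set" test, otherwise bitwise equality.
        passed = (e1 & e2) != 0 if and_test else e1 == e2
        if passed:
            result |= ones << (e * esize)
    return result
```

The all-ones or all-zeros result per element is what makes the output directly usable as a select mask for instructions such as BSL, BIT and BIF.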

Operational information

CMEQ (zero)

Compare bitwise Equal to zero (vector). This instruction reads each vector element in the source SIMD&FP register
and if the value is equal to zero sets every bit of the corresponding vector element in the destination SIMD&FP
register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to
zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op

CMEQ <V><d>, <V><n>, #0

integer d = UInt(Rd);
integer n = UInt(Rn);

if size != '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op

CMEQ <Vd>.<T>, <Vn>.<T>, #0

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean test_passed;

for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    case comparison of
        when CompareOp_GT test_passed = element > 0;
        when CompareOp_GE test_passed = element >= 0;
        when CompareOp_EQ test_passed = element == 0;
        when CompareOp_LE test_passed = element <= 0;
        when CompareOp_LT test_passed = element < 0;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();

V[d] = result;
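
The compare-against-zero loop above treats every element as a signed integer. A small Python model follows. The names sint, COMPARE_OPS and compare_zero are hypothetical, and the string keys stand in for the CompareOp enumeration values.

```python
import operator

COMPARE_OPS = {"GT": operator.gt, "GE": operator.ge, "EQ": operator.eq,
               "LE": operator.le, "LT": operator.lt}

def sint(x, esize):
    """Interpret an esize-bit pattern as a two's-complement signed integer, like SInt()."""
    return x - (1 << esize) if x >> (esize - 1) else x

def compare_zero(operand, datasize, esize, comparison):
    """Return all ones in each element whose signed value satisfies the comparison with 0."""
    ones = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        element = sint((operand >> (e * esize)) & ones, esize)
        if COMPARE_OPS[comparison](element, 0):
            result |= ones << (e * esize)
    return result
```

With 16-bit elements, the pattern 0xFFFF is interpreted as -1 and 0x8000 as -32768, so both pass the LE and LT tests and fail GE and GT.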

Operational information

CMGE (register)

Compare signed Greater than or Equal (vector). This instruction compares each vector element in the first source
SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first
signed integer value is greater than or equal to the second signed integer value sets every bit of the corresponding
vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector
element in the destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq

CMGE <V><d>, <V><n>, <V><m>

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq

CMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
boolean test_passed;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    test_passed = if cmp_eq then element1 >= element2 else element1 > element2;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();

V[d] = result;
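
The loop above covers both signed and unsigned comparisons through the unsigned and cmp_eq flags. The Python sketch below models it. to_int and compare_ge_gt are hypothetical names, and the flags are assumed to follow the U and eq encoding fields (decode pseudocode not shown in this section), giving CMGE as (unsigned false, cmp_eq true), CMGT as (false, false), CMHI as (true, false) and CMHS as (true, true).

```python
def to_int(x, esize, unsigned):
    """Model Int(x, unsigned): unsigned value, or two's-complement signed value."""
    if unsigned or not (x >> (esize - 1)):
        return x
    return x - (1 << esize)

def compare_ge_gt(op1, op2, datasize, esize, unsigned, cmp_eq):
    """Return all ones in each element where op1 >= op2 (cmp_eq) or op1 > op2."""
    ones = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        e1 = to_int((op1 >> (e * esize)) & ones, esize, unsigned)
        e2 = to_int((op2 >> (e * esize)) & ones, esize, unsigned)
        if (e1 >= e2) if cmp_eq else (e1 > e2):
            result |= ones << (e * esize)
    return result
```

The byte 0xFF shows the difference: as a signed value it is -1 and is not greater than 1, while as an unsigned value it is 255 and is.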

Operational information

CMGE (zero)

Compare signed Greater than or Equal to zero (vector). This instruction reads each vector element in the source
SIMD&FP register and if the signed integer value is greater than or equal to zero sets every bit of the corresponding
vector element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector
element in the destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op

CMGE <V><d>, <V><n>, #0

integer d = UInt(Rd);
integer n = UInt(Rn);

if size != '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op

CMGE <Vd>.<T>, <Vn>.<T>, #0

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
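boolean test_passed;

for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    case comparison of
        when CompareOp_GT test_passed = element > 0;
        when CompareOp_GE test_passed = element >= 0;
        when CompareOp_EQ test_passed = element == 0;
        when CompareOp_LE test_passed = element <= 0;
        when CompareOp_LT test_passed = element < 0;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();

V[d] = result;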

Operational information

CMGT (register)

Compare signed Greater than (vector). This instruction compares each vector element in the first source SIMD&FP
register with the corresponding vector element in the second source SIMD&FP register and if the first signed integer
value is greater than the second signed integer value sets every bit of the corresponding vector element in the
destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination
SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq

CMGT <V><d>, <V><n>, <V><m>

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq

CMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
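boolean test_passed;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    test_passed = if cmp_eq then element1 >= element2 else element1 > element2;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();

V[d] = result;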

Operational information

CMGT (zero)

Compare signed Greater than zero (vector). This instruction reads each vector element in the source SIMD&FP
register and if the signed integer value is greater than zero sets every bit of the corresponding vector element in the
destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination
SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op

CMGT <V><d>, <V><n>, #0

integer d = UInt(Rd);
integer n = UInt(Rn);

if size != '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 0 1 0 Rn Rd
U op

CMGT <Vd>.<T>, <Vn>.<T>, #0

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
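boolean test_passed;

for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    case comparison of
        when CompareOp_GT test_passed = element > 0;
        when CompareOp_GE test_passed = element >= 0;
        when CompareOp_EQ test_passed = element == 0;
        when CompareOp_LE test_passed = element <= 0;
        when CompareOp_LT test_passed = element < 0;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();

V[d] = result;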

Operational information

CMHI (register)

Compare unsigned Higher (vector). This instruction compares each vector element in the first source SIMD&FP
register with the corresponding vector element in the second source SIMD&FP register and if the first unsigned
integer value is greater than the second unsigned integer value sets every bit of the corresponding vector element in
the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the
destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq

CMHI <V><d>, <V><n>, <V><m>

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 1 0 1 Rn Rd
U eq

CMHI <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
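boolean test_passed;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    test_passed = if cmp_eq then element1 >= element2 else element1 > element2;
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();

V[d] = result;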

Operational information

CMHS (register)

Compare unsigned Higher or Same (vector). This instruction compares each vector element in the first source
SIMD&FP register with the corresponding vector element in the second source SIMD&FP register and if the first
unsigned integer value is greater than or equal to the second unsigned integer value sets every bit of the
corresponding vector element in the destination SIMD&FP register to one, otherwise sets every bit of the
corresponding vector element in the destination SIMD&FP register to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq

CMHS <V><d>, <V><n>, <V><m>

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 1 1 1 Rn Rd
U eq

CMHS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
boolean test_passed;

V[d] = result;
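
The Operation listing above keeps only the declarations; the per-element loop is not present in this copy. As a rough behavioural model of what the description states (not the architectural pseudocode), the following Python sketch treats a register as an integer and compares lanes as unsigned values. The names `cmhs`, `vn`, and `vm` are illustrative assumptions.

```python
def cmhs(vn: int, vm: int, esize: int, datasize: int) -> int:
    """Model of CMHS: lane-wise unsigned vn >= vm, giving all-ones or all-zeros lanes."""
    mask = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        a = (vn >> (e * esize)) & mask  # lane e of the first source, unsigned
        b = (vm >> (e * esize)) & mask  # lane e of the second source, unsigned
        if a >= b:
            result |= mask << (e * esize)  # set every bit of the lane
    return result
```

For the scalar encoding, `esize` and `datasize` would both be 64, so the loop runs once.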

Operational information


CMLE (zero)

Compare signed Less than or Equal to zero (vector). This instruction reads each vector element in the source
SIMD&FP register. If the signed integer value is less than or equal to zero, the instruction sets every bit of the
corresponding vector element in the destination SIMD&FP register to one; otherwise, it sets every bit of that
element to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector.

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op

CMLE <V><d>, <V><n>, #0

integer d = UInt(Rd);
integer n = UInt(Rn);

if size != '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 1 0 0 1 1 0 Rn Rd
U op

CMLE <Vd>.<T>, <Vn>.<T>, #0

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean test_passed;

V[d] = result;
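
The decode tables above map the concatenation op:U to a comparison, with '11' selecting the less-than-or-equal case used by CMLE. A minimal Python sketch of that mapping applied lane by lane follows; it assumes the comparison is a signed compare against zero, as the description says, and the names `COMPARE_OPS`, `to_signed`, and `compare_zero` are hypothetical.

```python
# Keys are the two-bit string op:U, mirroring the "case op:U of" table.
COMPARE_OPS = {
    "00": lambda x: x > 0,   # CompareOp_GT
    "01": lambda x: x >= 0,  # CompareOp_GE
    "10": lambda x: x == 0,  # CompareOp_EQ
    "11": lambda x: x <= 0,  # CompareOp_LE (the CMLE case)
}

def to_signed(bits: int, esize: int) -> int:
    """Interpret an esize-bit pattern as a two's complement integer."""
    return bits - (1 << esize) if bits >> (esize - 1) else bits

def compare_zero(vn: int, op_u: str, esize: int, datasize: int) -> int:
    """Lane-wise signed compare against zero; passing lanes become all ones."""
    test = COMPARE_OPS[op_u]
    mask = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        lane = to_signed((vn >> (e * esize)) & mask, esize)
        if test(lane):
            result |= mask << (e * esize)
    return result
```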

Operational information


CMLT (zero)

Compare signed Less than zero (vector). This instruction reads each vector element in the source SIMD&FP register.
If the signed integer value is less than zero, the instruction sets every bit of the corresponding vector element in
the destination SIMD&FP register to one; otherwise, it sets every bit of that element to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector.

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 1 0 1 0 1 0 Rn Rd

CMLT <V><d>, <V><n>, #0

integer d = UInt(Rd);
integer n = UInt(Rn);

if size != '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;

CompareOp comparison = CompareOp_LT;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 1 0 1 0 1 0 Rn Rd

CMLT <Vd>.<T>, <Vn>.<T>, #0

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison = CompareOp_LT;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean test_passed;

V[d] = result;
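
Because a two's complement value is below zero exactly when its top bit is set, the CompareOp_LT test can be modelled by inspecting the sign bit of each lane. The sketch below is one way to express that in Python under this assumption; `cmlt_zero` is a hypothetical name and the per-element loop itself is not shown in the listing above.

```python
def cmlt_zero(vn: int, esize: int, datasize: int) -> int:
    """Model of CMLT #0: a lane becomes all ones when its sign bit is set."""
    mask = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        lane = (vn >> (e * esize)) & mask
        if lane >> (esize - 1):  # sign bit set, so the signed value is negative
            result |= mask << (e * esize)
    return result
```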

Operational information


CMTST

Compare bitwise Test bits nonzero (vector). This instruction reads each vector element in the first source SIMD&FP
register and performs an AND with the corresponding vector element in the second source SIMD&FP register. If the
result is not zero, the instruction sets every bit of the corresponding vector element in the destination SIMD&FP
register to one; otherwise, it sets every bit of that element to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector.

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U

CMTST <V><d>, <V><n>, <V><m>

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 0 1 1 Rn Rd
U

CMTST <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
boolean test_passed;

V[d] = result;
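
As an illustration of the test the description gives (AND the lanes, then check for a nonzero result), here is a small Python model. It is a sketch only; the function name `cmtst` and the integer representation of registers are assumptions.

```python
def cmtst(vn: int, vm: int, esize: int, datasize: int) -> int:
    """Model of CMTST: a lane becomes all ones when (vn AND vm) is nonzero in that lane."""
    mask = (1 << esize) - 1
    result = 0
    for e in range(datasize // esize):
        a = (vn >> (e * esize)) & mask
        b = (vm >> (e * esize)) & mask
        if a & b:  # at least one common bit set within the lane
            result |= mask << (e * esize)
    return result
```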

Operational information


CNT

Population Count per byte. This instruction counts the number of bits that have a value of one in each vector element
in the source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd

CNT <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size != '00' then UNDEFINED;

integer esize = 8;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV 8;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 x RESERVED
1x x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;

integer count;
for e = 0 to elements-1
count = BitCount(Elem[operand, e, esize]);
Elem[result, e, esize] = count<esize-1:0>;
V[d] = result;
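
The loop above can be mirrored in a few lines of Python. The sketch below models `BitCount` with a string count and, as an assumption about typical use rather than anything stated here, sums the byte counts to obtain a population count for a whole 64-bit value (in practice an across-lanes add such as ADDV would play that role). The names `cnt` and `popcount64` are hypothetical.

```python
def cnt(vn: int, datasize: int) -> int:
    """Model of CNT: each byte of the result holds the number of set bits in that byte."""
    result = 0
    for e in range(datasize // 8):
        byte = (vn >> (8 * e)) & 0xFF
        result |= bin(byte).count("1") << (8 * e)
    return result

def popcount64(x: int) -> int:
    """Sum the per-byte counts to get the population count of a 64-bit value."""
    counts = cnt(x, 64)
    return sum((counts >> (8 * e)) & 0xFF for e in range(8))
```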

Operational information


DUP (element)

Duplicate vector element to vector or scalar. This instruction duplicates the vector element at the specified element
index in the source SIMD&FP register into a scalar or each element in a vector, and writes the result to the
destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MOV (scalar).
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 imm5 0 0 0 0 0 1 Rn Rd

DUP <V><d>, <Vn>.<T>[<index>]

integer d = UInt(Rd);
integer n = UInt(Rn);

integer size = LowestSetBit(imm5);

if size > 3 then UNDEFINED;

integer index = UInt(imm5<4:size+1>);

integer idxdsize = if imm5<4> == '1' then 128 else 64;

integer esize = 8 << size;

integer datasize = esize;
integer elements = 1;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 0 0 0 1 Rn Rd

DUP <Vd>.<T>, <Vn>.<Ts>[<index>]

integer d = UInt(Rd);
integer n = UInt(Rn);

integer size = LowestSetBit(imm5);

if size > 3 then UNDEFINED;

integer index = UInt(imm5<4:size+1>);

integer idxdsize = if imm5<4> == '1' then 128 else 64;

if size == 3 && Q == '0' then UNDEFINED;

integer esize = 8 << size;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<T> For the scalar variant: is the element width specifier, encoded in “imm5”:

imm5 <T>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

For the vector variant: is an arrangement specifier, encoded in “imm5:Q”:

imm5 Q <T>
x0000 x RESERVED
xxxx1 0 8B
xxxx1 1 16B
xxx10 0 4H
xxx10 1 8H
xx100 0 2S
xx100 1 4S
x1000 0 RESERVED
x1000 1 2D

<Ts> Is an element size specifier, encoded in “imm5”:

imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

<V> Is the destination width specifier, encoded in “imm5”:

imm5 <V>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<index> Is the element index encoded in “imm5”:

imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(idxdsize) operand = V[n];
bits(datasize) result;
bits(esize) element;

element = Elem[operand, index, esize];

for e = 0 to elements-1
Elem[result, e, esize] = element;
V[d] = result;
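
The imm5 field carries both the element size (position of the lowest set bit) and the element index (the bits above it). The following Python sketch walks through that decode and the duplication loop. It omits the vector-only check on size == 3 with Q == '0', and the helper names are assumptions made for illustration.

```python
def lowest_set_bit(x: int, width: int = 5) -> int:
    """Index of the lowest set bit, or width when no bit is set."""
    for i in range(width):
        if (x >> i) & 1:
            return i
    return width

def decode_dup_element(imm5: int):
    """Return (esize, index) from imm5, following the decode listing."""
    size = lowest_set_bit(imm5)
    if size > 3:
        raise ValueError("UNDEFINED: imm5<3:0> is zero")
    index = imm5 >> (size + 1)  # imm5<4:size+1>
    return 8 << size, index

def dup_element(vn: int, imm5: int, datasize: int) -> int:
    """Copy the selected element of vn into every lane of a datasize-bit result."""
    esize, index = decode_dup_element(imm5)
    element = (vn >> (index * esize)) & ((1 << esize) - 1)
    result = 0
    for e in range(datasize // esize):
        result |= element << (e * esize)
    return result
```

For example, imm5 = 0b01010 selects halfword elements (lowest set bit at position 1) and index 2.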

Operational information

◦ The values of the NZCV flags.


DUP (general)

Duplicate general-purpose register to vector. This instruction duplicates the contents of the source general-purpose
register into a scalar or each element in a vector, and writes the result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 0 0 1 1 Rn Rd

DUP <Vd>.<T>, <R><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer size = LowestSetBit(imm5);

if size > 3 then UNDEFINED;

// imm5<4:size+1> is IGNORED

if size == 3 && Q == '0' then UNDEFINED;

integer esize = 8 << size;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “imm5:Q”:

imm5 Q <T>
x0000 x RESERVED
xxxx1 0 8B
xxxx1 1 16B
xxx10 0 4H
xxx10 1 8H
xx100 0 2S
xx100 1 4S
x1000 0 RESERVED
x1000 1 2D

<R> Is the width specifier for the general-purpose source register, encoded in “imm5”:

imm5 <R>
x0000 RESERVED
xxxx1 W
xxx10 W
xx100 W
x1000 X
Unspecified bits in "imm5" are ignored but should be set to zero by an assembler.
<n> Is the number [0-30] of the general-purpose source register or ZR (31), encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) element = X[n];
bits(datasize) result;

for e = 0 to elements-1
Elem[result, e, esize] = element;
V[d] = result;
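
Since `bits(esize) element = X[n]` keeps only the low esize bits of the general-purpose register, the broadcast can be modelled as a truncation followed by replication. The Python sketch below replicates by multiplying with a constant that has a one at the base of every lane; this is an implementation choice for the example, and `dup_general` is a hypothetical name.

```python
def dup_general(xn: int, esize: int, datasize: int) -> int:
    """Broadcast the low esize bits of xn to every lane of a datasize-bit vector."""
    element = xn & ((1 << esize) - 1)  # truncate to the element size
    lanes = datasize // esize
    ones_per_lane = sum(1 << (e * esize) for e in range(lanes))
    return element * ones_per_lane  # no carries: element fits within one lane
```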

Operational information


EOR (vector)

Bitwise Exclusive OR (vector). This instruction performs a bitwise Exclusive OR operation between the two source
SIMD&FP registers, and places the result in the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 Rm 0 0 0 1 1 1 Rn Rd
opc2

EOR <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1;
bits(datasize) operand2;
bits(datasize) operand3;
bits(datasize) operand4 = V[n];

operand1 = V[m];
operand2 = Zeros();
operand3 = Ones();
V[d] = operand1 EOR ((operand2 EOR operand4) AND operand3);
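
With operand2 set to zeros and operand3 set to ones, the general expression above collapses to a plain exclusive OR of the two sources. A short Python check of that reduction, using integers as stand-ins for the registers (the function name `eor_vector` is an assumption):

```python
def eor_vector(vn: int, vm: int, datasize: int) -> int:
    """Evaluate the listed expression with the EOR operand assignments."""
    ones = (1 << datasize) - 1
    operand1, operand2, operand3, operand4 = vm, 0, ones, vn
    return operand1 ^ ((operand2 ^ operand4) & operand3)
```

For any inputs that fit in datasize bits, the value returned equals `vn ^ vm`.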

Operational information


EOR3

Three-way Exclusive OR. This instruction performs a three-way exclusive OR of the values in the three source
SIMD&FP registers, and writes the result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA3 is implemented.

Advanced SIMD
(FEAT_SHA3)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 0 0 Rm 0 Ra Rn Rd

EOR3 <Vd>.16B, <Vn>.16B, <Vm>.16B, <Va>.16B

if !HaveSHA3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Va = V[a];
V[d] = Vn EOR Vm EOR Va;

Operational information


EXT

Extract vector from pair of vectors. This instruction extracts the lowest vector elements from the second source
SIMD&FP register and the highest vector elements from the first source SIMD&FP register, concatenates the results
into a vector, and writes the vector to the destination SIMD&FP register vector. The index value specifies the lowest
vector element to extract from the first source register, and consecutive elements are extracted from the first, then
second, source registers until the destination vector is filled.
The following figure shows an example of the EXT doubleword operation for Q = 0 and imm4<2:0> = 3.
[Figure: byte elements 7 to 0 of Vm, followed by byte elements 7 to 0 of Vn.]

Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 0 Rm 0 imm4 0 Rn Rd

EXT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<index>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if Q == '0' && imm4<3> == '1' then UNDEFINED;

integer datasize = if Q == '1' then 128 else 64;

integer position = UInt(imm4) << 3;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<index> Is the lowest numbered byte element to be extracted, encoded in “Q:imm4”:

Q imm4<3> <index>
0 0 imm4<2:0>
0 1 RESERVED
1 x imm4

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) hi = V[m];
bits(datasize) lo = V[n];
bits(datasize*2) concat = hi:lo;

V[d] = concat<position+datasize-1:position>;
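
The extraction amounts to concatenating the two sources (second source in the upper half) and taking a datasize-bit window starting at byte `index`. A minimal Python sketch of the listing, with registers modelled as integers and `ext` as a hypothetical function name:

```python
def ext(vn: int, vm: int, index: int, datasize: int) -> int:
    """Model of EXT: take datasize bits from vm:vn starting at byte `index`."""
    if datasize == 64 and index > 7:
        raise ValueError("UNDEFINED: Q == '0' && imm4<3> == '1'")
    position = index << 3                # byte index to bit position
    concat = (vm << datasize) | vn       # hi:lo with hi = V[m], lo = V[n]
    return (concat >> position) & ((1 << datasize) - 1)
```

With Q = 0 and index 3, the result holds bytes 3 to 7 of the first source in its low five bytes and bytes 0 to 2 of the second source above them.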

Operational information


FABD

Floating-point Absolute Difference (vector). This instruction subtracts the floating-point values in the elements of the
second source SIMD&FP register, from the corresponding floating-point values in the elements of the first source
SIMD&FP register, places the absolute value of each result in a vector, and writes the vector to the destination
SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 0 Rm 0 0 0 1 0 1 Rn Rd

FABD <Hd>, <Hn>, <Hm>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
boolean abs = TRUE;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 Rm 1 1 0 1 0 1 Rn Rd

FABD <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
boolean abs = TRUE;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 0 1 0 1 Rn Rd
U

FABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean abs = (U == '1');

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U

FABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz:Q == '10' then UNDEFINED;
integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean abs = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];

bits(esize) element1;
bits(esize) element2;
bits(esize) diff;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
diff = FPSub(element1, element2, fpcr);
Elem[result, e, esize] = if abs then FPAbs(diff) else diff;

V[d] = result;
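
To make the per-element step concrete, the sketch below models the 4S arrangement in Python using `struct` to pack and unpack little-endian single-precision lanes. It is an approximation under stated assumptions: arithmetic is done in Python doubles and rounded back to single precision on packing, and FPCR-controlled behaviour (rounding mode, exception trapping, merging) is not modelled. The name `fabd_4s` is hypothetical.

```python
import struct

def fabd_4s(vn: bytes, vm: bytes) -> bytes:
    """Lane-wise |vn - vm| over four single-precision lanes (16 bytes each)."""
    a = struct.unpack("<4f", vn)
    b = struct.unpack("<4f", vm)
    # abs(x - y) stands in for FPAbs(FPSub(element1, element2, fpcr))
    return struct.pack("<4f", *(abs(x - y) for x, y in zip(a, b)))
```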


FABS (scalar)

Floating-point Absolute value (scalar). This instruction calculates the absolute value in the SIMD&FP source register
and writes the result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 0 1 1 0 0 0 0 Rn Rd
opc

Half-precision (ftype == 11)

(FEAT_FP16)

FABS <Hd>, <Hn>

Single-precision (ftype == 00)

FABS <Sd>, <Sn>

Double-precision (ftype == 01)

FABS <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

bits(esize) operand = V[n];

Elem[result, 0, esize] = FPAbs(operand);

V[d] = result;
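
At the bit level, taking the absolute value of an IEEE 754 encoding clears the sign bit. The Python sketch below models that and the way the listing builds the 128-bit result: the low esize bits receive the absolute value, and the remaining bits come from the old destination or from zeros. The `merging` argument stands in for the result of `IsMerging(fpcr)`, and any alternate NaN handling inside `FPAbs` is not modelled; both function names are assumptions.

```python
def fp_abs(bits: int, esize: int) -> int:
    """Clear the sign bit of an esize-bit floating-point encoding."""
    return bits & ((1 << (esize - 1)) - 1)

def fabs_scalar(vd: int, vn: int, esize: int, merging: bool) -> int:
    """Write |vn| into element 0 of a 128-bit result, merging or zeroing the rest."""
    low = (1 << esize) - 1
    base = vd if merging else 0
    upper = base & ((1 << 128) - 1) & ~low  # bits above element 0
    return upper | fp_abs(vn & low, esize)
```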


FABS (vector)

Floating-point Absolute value (vector). This instruction calculates the absolute value of each vector element in the
source SIMD&FP register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 Rn Rd
U

FABS <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean neg = (U == '1');

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 1 1 0 Rn Rd
U

FABS <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean neg = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
element = Elem[operand, e, esize];
if neg then
element = FPNeg(element);
else
element = FPAbs(element);
Elem[result, e, esize] = element;

V[d] = result;


FACGE

Floating-point Absolute Compare Greater than or Equal (vector). This instruction compares the absolute value of each
floating-point value in the first source SIMD&FP register with the absolute value of the corresponding floating-point
value in the second source SIMD&FP register. If the first value is greater than or equal to the second value, the
instruction sets every bit of the corresponding vector element in the destination SIMD&FP register to one; otherwise,
it sets every bit of that element to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac

FACGE <Hd>, <Hn>, <Hm>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;

case E:U:ac of
when '000' cmp = CompareOp_EQ; abs = FALSE;
when '010' cmp = CompareOp_GE; abs = FALSE;
when '011' cmp = CompareOp_GE; abs = TRUE;
when '110' cmp = CompareOp_GT; abs = FALSE;
when '111' cmp = CompareOp_GT; abs = TRUE;
otherwise UNDEFINED;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac

FACGE <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac

FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
CompareOp cmp;
boolean abs;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac

FACGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];

bits(esize) element1;
bits(esize) element2;
boolean test_passed;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[m] else Zeros();

for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
if abs then
element1 = FPAbs(element1);
element2 = FPAbs(element2);
case cmp of
when CompareOp_EQ test_passed = FPCompareEQ(element1, element2, fpcr);
when CompareOp_GE test_passed = FPCompareGE(element1, element2, fpcr);
when CompareOp_GT test_passed = FPCompareGT(element1, element2, fpcr);
Elem[result, e, esize] = if test_passed then Ones() else Zeros();

V[d] = result;
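
The E:U:ac decode table and the comparison loop can be sketched for a single lane in Python as follows. Python floats stand in for the element values, so half-precision and single-precision encodings, FPCR effects, and exception signalling are not modelled; a NaN operand simply makes every comparison false. The names `DECODE` and `fp_compare_lane` are hypothetical.

```python
# Keys are the three-bit string E:U:ac from the decode listing.
DECODE = {
    "000": ("EQ", False),
    "010": ("GE", False),
    "011": ("GE", True),   # the FACGE case: absolute compare, greater or equal
    "110": ("GT", False),
    "111": ("GT", True),   # absolute compare, greater than
}

def fp_compare_lane(e_u_ac: str, a: float, b: float) -> bool:
    """Return test_passed for one lane; the caller maps True to all ones."""
    if e_u_ac not in DECODE:
        raise ValueError("UNDEFINED")
    cmp, use_abs = DECODE[e_u_ac]
    if use_abs:
        a, b = abs(a), abs(b)
    if cmp == "EQ":
        return a == b
    if cmp == "GE":
        return a >= b
    return a > b
```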


FACGT

Floating-point Absolute Compare Greater than (vector). This instruction compares the absolute value of each vector
element in the first source SIMD&FP register with the absolute value of the corresponding vector element in the
second source SIMD&FP register. If the first value is greater than the second value, the instruction sets every bit
of the corresponding vector element in the destination SIMD&FP register to one; otherwise, it sets every bit of that
element to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac

FACGT <Hd>, <Hn>, <Hm>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac

FACGT <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 1 0 1 1 Rn Rd
U E ac

FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 1 0 1 1 Rn Rd
U E ac

FACGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];

bits(esize) element1;
bits(esize) element2;
boolean test_passed;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[m] else Zeros();

V[d] = result;

FADD (scalar)

Floating-point Add (scalar). This instruction adds the floating-point values of the two source SIMD&FP registers, and
writes the result to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 0 1 0 Rn Rd
op

Half-precision (ftype == 11)

(FEAT_FP16)

FADD <Hd>, <Hn>, <Hm>

Single-precision (ftype == 00)

FADD <Sd>, <Sn>, <Sm>

Double-precision (ftype == 01)

FADD <Dd>, <Dn>, <Dm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;
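
The ftype decode above can be sketched in Python as follows. The function name decode_ftype and the have_fp16 parameter (standing in for HaveFP16Ext()) are hypothetical, and None is used here to represent an UNDEFINED encoding.

```python
def decode_ftype(ftype, have_fp16=True):
    """Map the 2-bit ftype field to an element size in bits.

    Returns None where the pseudocode says UNDEFINED.
    """
    if ftype == 0b00:
        return 32  # single-precision
    if ftype == 0b01:
        return 64  # double-precision
    if ftype == 0b11 and have_fp16:
        return 16  # half-precision, only with FEAT_FP16
    return None  # ftype == '10', or '11' without FEAT_FP16

print(decode_ftype(0b11), decode_ftype(0b11, have_fp16=False))
```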

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

Elem[result, 0, esize] = FPAdd(operand1, operand2, fpcr);

V[d] = result;
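
The effect of the merge flag on the upper bits of the destination can be sketched in Python for the double-precision case. This is a minimal model under stated assumptions: 128-bit registers are held as Python integers, the function name fadd_scalar_d is hypothetical, the merging argument stands in for IsMerging(fpcr), and FPCR rounding modes and exception flags are not modelled.

```python
import struct

MASK128 = (1 << 128) - 1
LANE_MASK = (1 << 64) - 1

def fadd_scalar_d(vn, vm, merging):
    """Model of the 64-bit scalar add on 128-bit register values held as ints."""
    a = struct.unpack("<d", (vn & LANE_MASK).to_bytes(8, "little"))[0]
    b = struct.unpack("<d", (vm & LANE_MASK).to_bytes(8, "little"))[0]
    total = int.from_bytes(struct.pack("<d", a + b), "little")
    # Upper bits come from V[n] when merging, otherwise they are zero.
    upper = (vn & ~LANE_MASK & MASK128) if merging else 0
    return upper | total

vn = (0xABCD << 64) | 0x3FF8000000000000  # low lane holds 1.5
vm = 0x4002000000000000                   # low lane holds 2.25
print(hex(fadd_scalar_d(vn, vm, merging=True)))
```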

FADD (vector)

Floating-point Add (vector). This instruction adds corresponding vector elements in the two source SIMD&FP
registers, writes the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in
this instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 1 0 1 Rn Rd
U

FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean pair = (U == '1');

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U

FADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean pair = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
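
The sz:Q arrangement table for the single-precision and double-precision variant can be expressed as a small Python helper. The function name arrangement is hypothetical. The esize and datasize formulas follow the decode pseudocode used by the vector encodings in this group (esize = 32 << UInt(sz), datasize = 128 if Q == '1' else 64), and None stands for the RESERVED combination.

```python
def arrangement(sz, q):
    """Return (<T>, esize, elements) for the sz:Q fields, or None if RESERVED."""
    if sz == 1 and q == 0:
        return None  # sz:Q == '10' is RESERVED
    esize = 32 << sz
    datasize = 128 if q == 1 else 64
    elements = datasize // esize
    letter = "S" if sz == 0 else "D"
    return (f"{elements}{letter}", esize, elements)

for sz in (0, 1):
    for q in (0, 1):
        print(sz, q, arrangement(sz, q))
```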

Operation

for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPAdd(element1, element2, FPCR[]);

V[d] = result;

FADDP (scalar)

Floating-point Add Pair of elements (scalar). This instruction adds two floating-point vector elements in the source
SIMD&FP register and writes the scalar result into the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 0 1 1 0 Rn Rd

FADDP <V><d>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer esize = 16;
if sz == '1' then UNDEFINED;
integer datasize = 32;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 1 0 0 0 0 1 1 0 1 1 0 Rn Rd

FADDP <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize * 2;

Assembler Symbols

<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 H
1 RESERVED

For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:

sz <T>
0 2H
1 RESERVED

For the single-precision and double-precision variant: is the source arrangement specifier, encoded in
“sz”:

sz <T>
0 2S
1 2D

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FADD, operand, esize);

FADDP (vector)

Floating-point Add Pairwise (vector). This instruction creates a vector by concatenating the vector elements of the first
source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of
adjacent vector elements from the concatenated vector, adds each pair of values together, places the result into a
vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point
values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.
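
The pairwise behavior described above can be sketched in Python. This sketch assumes that the elements of the first source register occupy the low-numbered positions of the concatenated vector, which is how the description is read here. The function name faddp_vector is hypothetical, elements are modelled as Python floats (so results match the 2D arrangement only), and exception flags are not modelled.

```python
def faddp_vector(op1, op2):
    """Pairwise add: op1 lanes occupy the low positions of the concatenation."""
    concat = list(op1) + list(op2)
    # Result lane e is the sum of concatenated lanes 2e and 2e+1.
    return [concat[2 * e] + concat[2 * e + 1] for e in range(len(op1))]

print(faddp_vector([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]))
```

The low half of the result therefore holds the pair sums from the first source register and the high half holds the pair sums from the second.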

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 0 1 0 1 Rn Rd
U

FADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean pair = (U == '1');

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U

FADDP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean pair = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

V[d] = result;

FCADD

Floating-point Complex Add.

Vector
(FEAT_FCMA)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 1 1 rot 0 1 Rn Rd

FCADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<rotate>

if !HaveFCADDExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '00' then UNDEFINED;
if Q == '0' && size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
if !HaveFP16Ext() && esize == 16 then UNDEFINED;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<rotate> Is the rotation, encoded in “rot”:

rot <rotate>
0 90
1 270

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element3;

for e = 0 to (elements DIV 2)-1
    case rot of
        when '0'
            element1 = FPNeg(Elem[operand2, e*2+1, esize]);
            element3 = Elem[operand2, e*2, esize];
        when '1'
            element1 = Elem[operand2, e*2+1, esize];
            element3 = FPNeg(Elem[operand2, e*2, esize]);
    Elem[result, e*2, esize] = FPAdd(Elem[operand1, e*2, esize], element1, FPCR[]);
    Elem[result, e*2+1, esize] = FPAdd(Elem[operand1, e*2+1, esize], element3, FPCR[]);

V[d] = result;
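
The loop above can be mirrored in Python to show what the two rotations do. This sketch reads each even/odd element pair as the (real, imaginary) parts of one complex number, which the pseudocode itself does not state. The function name fcadd is hypothetical, elements are modelled as Python floats, and floating-point exceptions are not modelled.

```python
def fcadd(op1, op2, rotate):
    """op1/op2 are flat lane lists [re0, im0, re1, im1, ...]; rotate is 90 or 270."""
    if rotate not in (90, 270):
        raise ValueError("rotate must be 90 or 270")
    result = []
    for e in range(len(op1) // 2):
        re2, im2 = op2[2 * e], op2[2 * e + 1]
        if rotate == 90:
            add_re, add_im = -im2, re2   # rot == '0': second operand times i
        else:
            add_re, add_im = im2, -re2   # rot == '1': second operand times -i
        result.append(op1[2 * e] + add_re)
        result.append(op1[2 * e + 1] + add_im)
    return result

print(fcadd([1.0, 2.0], [3.0, 4.0], 90))
print(fcadd([1.0, 2.0], [3.0, 4.0], 270))
```

Under that reading, the 90 case matches (1+2j) + (3+4j)*1j and the 270 case matches (1+2j) + (3+4j)*(-1j) in ordinary complex arithmetic.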

FCCMP

Floating-point Conditional quiet Compare (scalar). This instruction compares the two SIMD&FP source register values
and writes the result to the PSTATE.{N, Z, C, V} flags. If the condition does not pass then the PSTATE.{N, Z, C, V}
flags are set to the flag bit specifier.
This instruction raises an Invalid Operation floating-point exception if either or both of the operands is a signaling
NaN.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm cond 0 1 Rn 0 nzcv
op

Half-precision (ftype == 11)

(FEAT_FP16)

FCCMP <Hn>, <Hm>, #<nzcv>, <cond>

Single-precision (ftype == 00)

FCCMP <Sn>, <Sm>, #<nzcv>, <cond>

Double-precision (ftype == 01)

FCCMP <Dn>, <Dm>, #<nzcv>, <cond>

integer n = UInt(Rn);
integer m = UInt(Rm);

integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;

bits(4) flags = nzcv;

Assembler Symbols

<Dn> Is the 64-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register, encoded in the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<nzcv> Is the flag bit specifier, an immediate in the range 0 to 15, giving the alternative state for the 4-bit
NZCV condition flags, encoded in the "nzcv" field.
<cond> Is one of the standard conditions, encoded in the "cond" field in the standard way.

Operation

CheckFPAdvSIMDEnabled64();

bits(datasize) operand1 = V[n];

bits(datasize) operand2;

operand2 = V[m];

if ConditionHolds(cond) then
    flags = FPCompare(operand1, operand2, FALSE, FPCR[]);
PSTATE.<N,Z,C,V> = flags;

Operational information

The IEEE 754 standard specifies that the result of a comparison is precisely one of <, ==, > or unordered. If either or
both of the operands is a NaN, they are unordered, and all three of (Operand1 < Operand2), (Operand1 == Operand2)
and (Operand1 > Operand2) are false. An unordered comparison sets the PSTATE condition flags to N=0, Z=0, C=1,
and V=1.
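
The flag selection can be sketched in Python as follows. The function name fccmp and the condition_holds argument (standing in for ConditionHolds(cond)) are hypothetical. The unordered result is the one stated above. The NZCV values for the less-than, equal, and greater-than cases are those of the architecture's FPCompare() helper, which is not reproduced in this section. The Invalid Operation exception for signaling NaNs is not modelled.

```python
import math

def fccmp(a, b, nzcv, condition_holds):
    """Return the 4-bit NZCV value produced by a conditional quiet compare."""
    if not condition_holds:
        return nzcv & 0xF      # flags come from the #<nzcv> immediate
    if math.isnan(a) or math.isnan(b):
        return 0b0011          # unordered: N=0, Z=0, C=1, V=1
    if a == b:
        return 0b0110          # equal: Z=1, C=1
    if a < b:
        return 0b1000          # less than: N=1
    return 0b0010              # greater than: C=1

print(format(fccmp(1.0, float("nan"), 0b0000, True), "04b"))
```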

FCCMPE

Floating-point Conditional signaling Compare (scalar). This instruction compares the two SIMD&FP source register
values and writes the result to the PSTATE.{N, Z, C, V} flags. If the condition does not pass then the PSTATE.{N, Z, C,
V} flags are set to the flag bit specifier.
This instruction raises an Invalid Operation floating-point exception if either or both of the operands is any type of
NaN.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm cond 0 1 Rn 1 nzcv
op

Half-precision (ftype == 11)

(FEAT_FP16)

FCCMPE <Hn>, <Hm>, #<nzcv>, <cond>

Single-precision (ftype == 00)

FCCMPE <Sn>, <Sm>, #<nzcv>, <cond>

Double-precision (ftype == 01)

FCCMPE <Dn>, <Dm>, #<nzcv>, <cond>

integer n = UInt(Rn);
integer m = UInt(Rm);

integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;

bits(4) flags = nzcv;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

bits(datasize) operand1 = V[n];

bits(datasize) operand2;

operand2 = V[m];

if ConditionHolds(cond) then
    flags = FPCompare(operand1, operand2, TRUE, FPCR[]);
PSTATE.<N,Z,C,V> = flags;

Operational information

FCMEQ (register)

Floating-point Compare Equal (vector). This instruction compares each floating-point value from the first source
SIMD&FP register, with the corresponding floating-point value from the second source SIMD&FP register, and if the
comparison is equal sets every bit of the corresponding vector element in the destination SIMD&FP register to one,
otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac

FCMEQ <Hd>, <Hn>, <Hm>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac

FCMEQ <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac

FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac

FCMEQ <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];

bits(esize) element1;
bits(esize) element2;
boolean test_passed;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[m] else Zeros();

V[d] = result;

FCMEQ (zero)

Floating-point Compare Equal to zero (vector). This instruction reads each floating-point value in the source SIMD&FP
register and if the value is equal to zero sets every bit of the corresponding vector element in the destination
SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP
register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op

FCMEQ <Hd>, <Hn>, #0.0

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op

FCMEQ <V><d>, <V><n>, #0.0

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Vector half precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op

FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op

FCMEQ <Vd>.<T>, <Vn>.<T>, #0.0

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) zero = FPZero('0');
bits(esize) element;
boolean test_passed;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    case comparison of
        when CompareOp_GT test_passed = FPCompareGT(element, zero, FPCR[]);
        when CompareOp_GE test_passed = FPCompareGE(element, zero, FPCR[]);
        when CompareOp_EQ test_passed = FPCompareEQ(element, zero, FPCR[]);
        when CompareOp_LE test_passed = FPCompareGE(zero, element, FPCR[]);
        when CompareOp_LT test_passed = FPCompareGT(zero, element, FPCR[]);
    Elem[result, e, esize] = if test_passed then Ones() else Zeros();

V[d] = result;
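
The compare-against-zero loop, together with the op:U decode from the encodings above, can be sketched in Python. The names COMPARE_OPS and compare_zero are hypothetical, elements are modelled as Python floats, and floating-point exceptions are not modelled. A NaN element fails every test, and negative zero compares equal to zero.

```python
# op:U decode table taken from the encodings above.
COMPARE_OPS = {
    0b00: lambda x: x > 0.0,   # CompareOp_GT
    0b01: lambda x: x >= 0.0,  # CompareOp_GE
    0b10: lambda x: x == 0.0,  # CompareOp_EQ
    0b11: lambda x: x <= 0.0,  # CompareOp_LE
}

def compare_zero(lanes, op, u, esize=32):
    """Return an all-ones or all-zeros integer mask for each lane."""
    test = COMPARE_OPS[(op << 1) | u]
    ones = (1 << esize) - 1
    return [ones if test(x) else 0 for x in lanes]

# FCMEQ (zero) uses op = 1, U = 0.
print([hex(m) for m in compare_zero([0.0, -0.0, 1.0, float("nan")], op=1, u=0)])
```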

FCMGE (register)

Floating-point Compare Greater than or Equal (vector). This instruction reads each floating-point value in the first
source SIMD&FP register and if the value is greater than or equal to the corresponding floating-point value in the
second source SIMD&FP register sets every bit of the corresponding vector element in the destination SIMD&FP
register to one, otherwise sets every bit of the corresponding vector element in the destination SIMD&FP register to
zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac

FCMGE <Hd>, <Hn>, <Hm>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac

FCMGE <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac

FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac

FCMGE <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];

bits(esize) element1;
bits(esize) element2;
boolean test_passed;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[m] else Zeros();

V[d] = result;

FCMGE (zero)

Floating-point Compare Greater than or Equal to zero (vector). This instruction reads each floating-point value in the
source SIMD&FP register and if the value is greater than or equal to zero sets every bit of the corresponding vector
element in the destination SIMD&FP register to one, otherwise sets every bit of the corresponding vector element in
the destination SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op

FCMGE <Hd>, <Hn>, #0.0

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op

FCMGE <V><d>, <V><n>, #0.0

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Vector half precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op

FCMGE <Vd>.<T>, <Vn>.<T>, #0.0

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
    when '00' comparison = CompareOp_GT;
    when '01' comparison = CompareOp_GE;
    when '10' comparison = CompareOp_EQ;
    when '11' comparison = CompareOp_LE;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op

FCMGE <Vd>.<T>, <Vn>.<T>, #0.0

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) zero = FPZero('0');
bits(esize) element;
boolean test_passed;

V[d] = result;

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

FCMGT (register)

Floating-point Compare Greater than (vector). This instruction reads each floating-point value in the first source
SIMD&FP register and, if the value is greater than the corresponding floating-point value in the second source
SIMD&FP register, sets every bit of the corresponding vector element in the destination SIMD&FP register to one;
otherwise, it sets every bit of the corresponding vector element in the destination SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
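The lane-wise behaviour described above can be sketched in Python. This is an illustrative model only, not the architectural pseudocode: the function name `fcmgt_lanes` and the use of Python lists of floats as "lanes" are assumptions made for the example, and floating-point exception flags are not modelled.

```python
def fcmgt_lanes(first, second, esize=32):
    """Model FCMGT (register): one all-ones or all-zeros mask per lane.

    `first` and `second` are equal-length lists of Python floats standing in
    for the elements of the two source registers (a hypothetical layout).
    """
    if len(first) != len(second):
        raise ValueError("both sources must have the same number of lanes")
    all_ones = (1 << esize) - 1
    # Python's '>' is False when either side is NaN, so NaN lanes give zero.
    return [all_ones if a > b else 0 for a, b in zip(first, second)]


masks = fcmgt_lanes([1.0, -2.0, float("nan"), 3.0], [0.5, -2.0, 1.0, 4.0])
print([hex(m) for m in masks])
```

Only the first lane satisfies the comparison here, so only the first mask is all ones. Equal operands and NaN operands both produce an all-zeros mask.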

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac

FCMGT <Hd>, <Hn>, <Hm>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac

FCMGT <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;
CompareOp cmp;
boolean abs;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 1 0 0 1 Rn Rd
U E ac

FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 1 0 0 1 Rn Rd
U E ac

FCMGT <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];

bits(esize) element1;
bits(esize) element2;
boolean test_passed;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[m] else Zeros();

V[d] = result;

FCMGT (zero)

Floating-point Compare Greater than zero (vector). This instruction reads each floating-point value in the source
SIMD&FP register and, if the value is greater than zero, sets every bit of the corresponding vector element in the
destination SIMD&FP register to one; otherwise, it sets every bit of the corresponding vector element in the
destination SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
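As a rough illustration of the compare-with-zero family (FCMGT, FCMGE, FCMEQ, FCMLE and FCMLT with a #0.0 operand), the following Python sketch applies one comparison to every lane. The function name `compare_with_zero`, the string comparison names, and the list-of-floats lane model are assumptions for this example; exception flags are not modelled.

```python
import operator

# Comparison names mirror the CompareOp_* values used in the decode pseudocode.
_COMPARE = {
    "GT": operator.gt,
    "GE": operator.ge,
    "EQ": operator.eq,
    "LE": operator.le,
    "LT": operator.lt,
}


def compare_with_zero(lanes, comparison, esize=32):
    """Return an all-ones mask for each lane where `lane <comparison> 0.0` holds."""
    test = _COMPARE[comparison]
    all_ones = (1 << esize) - 1
    # Comparisons involving NaN are False, and -0.0 compares equal to 0.0.
    return [all_ones if test(x, 0.0) else 0 for x in lanes]


lanes = [1.5, 0.0, -0.0, -3.0, float("nan")]
print(compare_with_zero(lanes, "GT"))
print(compare_with_zero(lanes, "GE"))
```

Note that negative zero is treated as equal to zero, so it passes GE, EQ and LE but fails GT and LT.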

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op

FCMGT <Hd>, <Hn>, #0.0

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op

FCMGT <V><d>, <V><n>, #0.0

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;

Vector half precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 0 1 0 Rn Rd
U op

FCMGT <Vd>.<T>, <Vn>.<T>, #0.0

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 0 1 0 Rn Rd
U op

FCMGT <Vd>.<T>, <Vn>.<T>, #0.0

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;
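The decode steps for the vector single-precision and double-precision class can be mirrored in a short Python sketch. The function name `decode_compare_zero_vector` and the returned tuple layout are hypothetical; the field handling follows the pseudocode above, with UNDEFINED modelled as a `ValueError`.

```python
# op:U selects the comparison, exactly as in the 'case op:U of' block.
COMPARISON_BY_OP_U = {0b00: "GT", 0b01: "GE", 0b10: "EQ", 0b11: "LE"}


def decode_compare_zero_vector(sz, q, op, u):
    """Decode sz, Q, op and U (each 0 or 1) for the vector S/D encoding class."""
    if (sz, q) == (1, 0):
        # sz:Q == '10' would be a 1D arrangement, which is RESERVED.
        raise ValueError("UNDEFINED: sz:Q == '10'")
    esize = 32 << sz
    datasize = 128 if q == 1 else 64
    elements = datasize // esize
    comparison = COMPARISON_BY_OP_U[(op << 1) | u]
    arrangement = f"{elements}{'S' if sz == 0 else 'D'}"
    return esize, datasize, elements, comparison, arrangement


print(decode_compare_zero_vector(sz=0, q=1, op=0, u=0))
```

With U = 0 and op = 0, as in the FCMGT (zero) encoding, the selected comparison is GT; the arrangement string matches the sz:Q table in the Assembler Symbols.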

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) zero = FPZero('0');
bits(esize) element;
boolean test_passed;

V[d] = result;

FCMLA

Floating-point Complex Multiply Accumulate.

This instruction operates on complex numbers that are represented in SIMD&FP registers as pairs of elements, with
the more significant element holding the imaginary part of the number and the less significant element holding the
real part of the number. Each element holds a floating-point value. It performs the following computation on the
corresponding complex number element pairs from the two source registers and the destination register:
• Considering the complex number from the second source register on an Argand diagram, the number is
rotated counterclockwise by 0, 90, 180, or 270 degrees.
• The two elements of the transformed complex number are multiplied by:
◦ The real element of the complex number from the first source register, if the transformation was a
rotation by 0 or 180 degrees.
◦ The imaginary element of the complex number from the first source register, if the transformation
was a rotation by 90 or 270 degrees.
• The complex number resulting from that multiplication is added to the complex number from the destination
register.
The multiplication and addition operations are performed as a fused multiply-add, without any intermediate rounding.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
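The rotation scheme above is easier to follow with a worked model. The Python sketch below handles a single complex pair and follows the element selection in the Operation pseudocode for this instruction. The names `fcmla_pair` and `complex_multiply_accumulate` are illustrative, complex numbers are modelled as (real, imaginary) tuples, and ordinary Python arithmetic is used, so the multiply and add are not fused as they are in the instruction.

```python
def fcmla_pair(acc, a, b, rot):
    """One FCMLA step on a single complex pair.

    acc, a, b are (real, imag) tuples for the destination, first source and
    second source; rot is 0, 90, 180 or 270.
    """
    a_re, a_im = a
    b_re, b_im = b
    if rot == 0:
        e1, e2, e3, e4 = b_re, a_re, b_im, a_re
    elif rot == 90:
        e1, e2, e3, e4 = -b_im, a_im, b_re, a_im
    elif rot == 180:
        e1, e2, e3, e4 = -b_re, a_re, -b_im, a_re
    elif rot == 270:
        e1, e2, e3, e4 = b_im, a_im, -b_re, a_im
    else:
        raise ValueError("rot must be 0, 90, 180 or 270")
    # Not fused here: Python rounds the product before the addition.
    return (acc[0] + e2 * e1, acc[1] + e4 * e3)


def complex_multiply_accumulate(acc, a, b):
    """acc + a*b, built from a rotate-by-0 step followed by a rotate-by-90 step."""
    return fcmla_pair(fcmla_pair(acc, a, b, 0), a, b, 90)


print(complex_multiply_accumulate((0.0, 0.0), (1.0, 2.0), (3.0, 4.0)))  # (-5.0, 10.0)
```

Each step contributes half of a complex product, so a full multiply-accumulate takes two steps. Pairing rotations 0 and 270 instead gives acc + conj(a)*b under the same model.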

Vector
(FEAT_FCMA)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 1 0 rot 1 Rn Rd

FCMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>, #<rotate>

if !HaveFCADDExt() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<rotate> Is the rotation, encoded in “rot”:

rot <rotate>
00 0
01 90
10 180
11 270

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
bits(esize) element3;
bits(esize) element4;
FPCRType fpcr = FPCR[];

for e = 0 to (elements DIV 2)-1
    case rot of
        when '00'
            element1 = Elem[operand2, e*2, esize];
            element2 = Elem[operand1, e*2, esize];
            element3 = Elem[operand2, e*2+1, esize];
            element4 = Elem[operand1, e*2, esize];
        when '01'
            element1 = FPNeg(Elem[operand2, e*2+1, esize]);
            element2 = Elem[operand1, e*2+1, esize];
            element3 = Elem[operand2, e*2, esize];
            element4 = Elem[operand1, e*2+1, esize];
        when '10'
            element1 = FPNeg(Elem[operand2, e*2, esize]);
            element2 = Elem[operand1, e*2, esize];
            element3 = FPNeg(Elem[operand2, e*2+1, esize]);
            element4 = Elem[operand1, e*2, esize];
        when '11'
            element1 = Elem[operand2, e*2+1, esize];
            element2 = Elem[operand1, e*2+1, esize];
            element3 = FPNeg(Elem[operand2, e*2, esize]);
            element4 = Elem[operand1, e*2+1, esize];

    Elem[result, e*2, esize] = FPMulAdd(Elem[operand3, e*2, esize], element2, element1, fpcr);
    Elem[result, e*2+1, esize] = FPMulAdd(Elem[operand3, e*2+1, esize], element4, element3, fpcr);

V[d] = result;

FCMLA (by element)

Floating-point Complex Multiply Accumulate (by element).

This instruction operates on complex numbers that are represented in SIMD&FP registers as pairs of elements, with
the more significant element holding the imaginary part of the number and the less significant element holding the
real part of the number. Each element holds a floating-point value. It performs the following computation on complex
numbers from the first source register and the destination register with the specified complex number from the
second source register:
• Considering the complex number from the second source register on an Argand diagram, the number is
rotated counterclockwise by 0, 90, 180, or 270 degrees.
• The two elements of the transformed complex number are multiplied by:
◦ The real element of the complex number from the first source register, if the transformation was a
rotation by 0 or 180 degrees.
◦ The imaginary element of the complex number from the first source register, if the transformation
was a rotation by 90 or 270 degrees.
• The complex number resulting from that multiplication is added to the complex number from the destination
register.
The multiplication and addition operations are performed as a fused multiply-add, without any intermediate rounding.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Vector
(FEAT_FCMA)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 rot 1 H 0 Rn Rd

(size == 01)

FCMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>], #<rotate>

(size == 10)

FCMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>], #<rotate>

if !HaveFCADDExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(M:Rm);
integer index;
if size == '00' || size == '11' then UNDEFINED;
if size == '01' then index = UInt(H:L);
if size == '10' then index = UInt(H);
integer esize = 8 << UInt(size);
if !HaveFP16Ext() && esize == 16 then UNDEFINED;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
if size == '10' && (L == '1' || Q == '0') then UNDEFINED;
if size == '01' && H == '1' && Q == '0' then UNDEFINED;
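The index and UNDEFINED rules in this decode can be checked with a small Python model. The function name `decode_fcmla_by_element` is hypothetical, UNDEFINED is modelled as a `ValueError`, and the feature checks (HaveFCADDExt, HaveFP16Ext) are left out, so this covers only the field arithmetic.

```python
def decode_fcmla_by_element(size, q, h, l):
    """Return (esize, elements, index) for FCMLA (by element), or raise if UNDEFINED.

    size is the 2-bit field value; q, h and l are single bits (0 or 1).
    """
    if size in (0b00, 0b11):
        raise ValueError("UNDEFINED: size is RESERVED")
    # Half precision uses H:L as the index, single precision uses H alone.
    index = ((h << 1) | l) if size == 0b01 else h
    esize = 8 << size
    datasize = 128 if q == 1 else 64
    elements = datasize // esize
    if size == 0b10 and (l == 1 or q == 0):
        raise ValueError("UNDEFINED: single precision needs L == 0 and Q == 1")
    if size == 0b01 and h == 1 and q == 0:
        raise ValueError("UNDEFINED: 4H arrangement with H == 1")
    return esize, elements, index


print(decode_fcmla_by_element(size=0b01, q=1, h=1, l=0))  # (16, 8, 2)
```

One way to read the last two checks, as an interpretation rather than a statement from the specification: in every combination this model accepts, the selected pair satisfies index*2+1 < elements, which the rejected half-precision case (Q == 0 with H == 1) would not.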

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:H:L”:

size <index>
00 RESERVED
01 H:L
10 H
11 RESERVED

<rotate> Is the rotation, encoded in “rot”:

rot <rotate>
00 0
01 90
10 180
11 270

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
FPCRType fpcr = FPCR[];

for e = 0 to (elements DIV 2)-1
    bits(esize) element1;
    bits(esize) element2;
    bits(esize) element3;
    bits(esize) element4;
    case rot of
        when '00'
            element1 = Elem[operand2, index*2, esize];
            element2 = Elem[operand1, e*2, esize];
            element3 = Elem[operand2, index*2+1, esize];
            element4 = Elem[operand1, e*2, esize];
        when '01'
            element1 = FPNeg(Elem[operand2, index*2+1, esize]);
            element2 = Elem[operand1, e*2+1, esize];
            element3 = Elem[operand2, index*2, esize];
            element4 = Elem[operand1, e*2+1, esize];
        when '10'
            element1 = FPNeg(Elem[operand2, index*2, esize]);
            element2 = Elem[operand1, e*2, esize];
            element3 = FPNeg(Elem[operand2, index*2+1, esize]);
            element4 = Elem[operand1, e*2, esize];
        when '11'
            element1 = Elem[operand2, index*2+1, esize];
            element2 = Elem[operand1, e*2+1, esize];
            element3 = FPNeg(Elem[operand2, index*2, esize]);
            element4 = Elem[operand1, e*2+1, esize];

    Elem[result, e*2, esize] = FPMulAdd(Elem[operand3, e*2, esize], element2, element1, fpcr);
    Elem[result, e*2+1, esize] = FPMulAdd(Elem[operand3, e*2+1, esize], element4, element3, fpcr);

V[d] = result;

FCMLE (zero)

Floating-point Compare Less than or Equal to zero (vector). This instruction reads each floating-point value in the
source SIMD&FP register and, if the value is less than or equal to zero, sets every bit of the corresponding vector
element in the destination SIMD&FP register to one; otherwise, it sets every bit of the corresponding vector element in
the destination SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op

FCMLE <Hd>, <Hn>, #0.0

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op

FCMLE <V><d>, <V><n>, #0.0

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;

Vector half precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 0 1 1 0 Rn Rd
U op

FCMLE <Vd>.<T>, <Vn>.<T>, #0.0

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 0 1 1 0 Rn Rd
U op

FCMLE <Vd>.<T>, <Vn>.<T>, #0.0

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison;
case op:U of
when '00' comparison = CompareOp_GT;
when '01' comparison = CompareOp_GE;
when '10' comparison = CompareOp_EQ;
when '11' comparison = CompareOp_LE;

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) zero = FPZero('0');
bits(esize) element;
boolean test_passed;

V[d] = result;

FCMLT (zero)

Floating-point Compare Less than zero (vector). This instruction reads each floating-point value in the source
SIMD&FP register and, if the value is less than zero, sets every bit of the corresponding vector element in the
destination SIMD&FP register to one; otherwise, it sets every bit of the corresponding vector element in the destination
SIMD&FP register to zero.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 0 1 0 Rn Rd

FCMLT <Hd>, <Hn>, #0.0

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

CompareOp comparison = CompareOp_LT;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 0 1 0 Rn Rd

FCMLT <V><d>, <V><n>, #0.0

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

CompareOp comparison = CompareOp_LT;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 0 1 0 Rn Rd

FCMLT <Vd>.<T>, <Vn>.<T>, #0.0

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison = CompareOp_LT;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 0 1 0 Rn Rd

FCMLT <Vd>.<T>, <Vn>.<T>, #0.0

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

CompareOp comparison = CompareOp_LT;

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) zero = FPZero('0');
bits(esize) element;
boolean test_passed;

V[d] = result;

FCMP

Floating-point quiet Compare (scalar). This instruction compares the two SIMD&FP source register values, or the first
SIMD&FP source register value and zero. It writes the result to the PSTATE.{N, Z, C, V} flags.
This instruction raises an Invalid Operation floating-point exception if either or both of the operands is a signaling
NaN.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 0 0 0 Rn 0 x 0 0 0
opc

Half-precision (ftype == 11 && opc == 00)

(FEAT_FP16)

FCMP <Hn>, <Hm>

Half-precision, zero (ftype == 11 && Rm == (00000) && opc == 01)

(FEAT_FP16)

FCMP <Hn>, #0.0

Single-precision (ftype == 00 && opc == 00)

FCMP <Sn>, <Sm>

Single-precision, zero (ftype == 00 && Rm == (00000) && opc == 01)

FCMP <Sn>, #0.0

Double-precision (ftype == 01 && opc == 00)

FCMP <Dn>, <Dm>

Double-precision, zero (ftype == 01 && Rm == (00000) && opc == 01)

FCMP <Dn>, #0.0

integer n = UInt(Rn);
integer m = UInt(Rm); // ignored when opc<0> == '1'

integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;

boolean signal_all_nans = (opc<1> == '1');

boolean cmp_with_zero = (opc<0> == '1');

Assembler Symbols

<Dn> For the double-precision variant: is the 64-bit name of the first SIMD&FP source register, encoded in
the "Rn" field.
For the double-precision, zero variant: is the 64-bit name of the SIMD&FP source register, encoded in
the "Rn" field.
<Dm> Is the 64-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Hn> For the half-precision variant: is the 16-bit name of the first SIMD&FP source register, encoded in the
"Rn" field.
For the half-precision, zero variant: is the 16-bit name of the SIMD&FP source register, encoded in the
"Rn" field.
<Hm> Is the 16-bit name of the second SIMD&FP source register, encoded in the "Rm" field.
<Sn> For the single-precision variant: is the 32-bit name of the first SIMD&FP source register, encoded in the
"Rn" field.
For the single-precision, zero variant: is the 32-bit name of the SIMD&FP source register, encoded in
the "Rn" field.
<Sm> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();

bits(datasize) operand1 = V[n];

bits(datasize) operand2;

operand2 = if cmp_with_zero then FPZero('0') else V[m];

PSTATE.<N,Z,C,V> = FPCompare(operand1, operand2, signal_all_nans, FPCR[]);
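The following Python sketch models the flag result of the comparison for single-precision operands given as raw 32-bit patterns. The function names are illustrative. The flag values used (0011 for unordered, 0110 for equal, 1000 for less than, 0010 for greater than) are recalled from the architecture's shared FPCompare() pseudocode, which this section does not reproduce, so treat them as an assumption to check against that definition. The second return value models only whether an Invalid Operation exception would be raised, not how FPCR routes it.

```python
import struct


def f32_from_bits(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def is_nan_bits(bits):
    return (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF) != 0


def is_signaling_nan_bits(bits):
    # A signaling NaN has the most significant fraction bit clear.
    return is_nan_bits(bits) and (bits & 0x00400000) == 0


def fp_compare(bits1, bits2, signal_all_nans):
    """Return (nzcv, invalid_op) with nzcv packed as the 4-bit value N:Z:C:V."""
    if is_nan_bits(bits1) or is_nan_bits(bits2):
        snan = is_signaling_nan_bits(bits1) or is_signaling_nan_bits(bits2)
        return 0b0011, (signal_all_nans or snan)
    a, b = f32_from_bits(bits1), f32_from_bits(bits2)
    if a == b:
        return 0b0110, False
    if a < b:
        return 0b1000, False
    return 0b0010, False


ONE, TWO, QNAN = 0x3F800000, 0x40000000, 0x7FC00000
print(fp_compare(ONE, TWO, signal_all_nans=False))
print(fp_compare(QNAN, ONE, signal_all_nans=False))
```

In this model `signal_all_nans=False` corresponds to opc<1> == '0', so a quiet NaN gives the unordered result without raising Invalid Operation, while a signaling NaN raises it in either mode.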

Operational information

FCMPE

Floating-point signaling Compare (scalar). This instruction compares the two SIMD&FP source register values, or the
first SIMD&FP source register value and zero. It writes the result to the PSTATE.{N, Z, C, V} flags.
This instruction raises an Invalid Operation floating-point exception if either or both of the operands is any type of
NaN.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 0 0 0 Rn 1 x 0 0 0
opc

Half-precision (ftype == 11 && opc == 10)

(FEAT_FP16)

FCMPE <Hn>, <Hm>

Half-precision, zero (ftype == 11 && Rm == (00000) && opc == 11)

(FEAT_FP16)

FCMPE <Hn>, #0.0

Single-precision (ftype == 00 && opc == 10)

FCMPE <Sn>, <Sm>

Single-precision, zero (ftype == 00 && Rm == (00000) && opc == 11)

FCMPE <Sn>, #0.0

Double-precision (ftype == 01 && opc == 10)

FCMPE <Dn>, <Dm>

Double-precision, zero (ftype == 01 && Rm == (00000) && opc == 11)

FCMPE <Dn>, #0.0

integer n = UInt(Rn);
integer m = UInt(Rm); // ignored when opc<0> == '1'

integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;

boolean signal_all_nans = (opc<1> == '1');

boolean cmp_with_zero = (opc<0> == '1');

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

bits(datasize) operand1 = V[n];

bits(datasize) operand2;

operand2 = if cmp_with_zero then FPZero('0') else V[m];

PSTATE.<N,Z,C,V> = FPCompare(operand1, operand2, signal_all_nans, FPCR[]);

Operational information

FCSEL

Floating-point Conditional Select (scalar). This instruction allows the SIMD&FP destination register to take the value
from either one or the other of two SIMD&FP source registers. If the condition passes, the first SIMD&FP source
register value is taken, otherwise the second SIMD&FP source register value is taken.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm cond 1 1 Rn Rd

Half-precision (ftype == 11)

(FEAT_FP16)

FCSEL <Hd>, <Hn>, <Hm>, <cond>

Single-precision (ftype == 00)

FCSEL <Sd>, <Sn>, <Sm>, <cond>

Double-precision (ftype == 01)

FCSEL <Dd>, <Dn>, <Dm>, <cond>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) result;

result = if ConditionHolds(cond) then V[n] else V[m];

V[d] = result;
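The selection can be modelled in Python as below. ConditionHolds() is not defined in this section; the condition table in `condition_holds` is recalled from the architecture's shared pseudocode and should be treated as an assumption to verify there. The function names and the packing of the flags as a 4-bit N:Z:C:V value are choices made for this sketch.

```python
def condition_holds(cond, nzcv):
    """Evaluate a 4-bit condition code against flags packed as N:Z:C:V."""
    n, z, c, v = (nzcv >> 3) & 1, (nzcv >> 2) & 1, (nzcv >> 1) & 1, nzcv & 1
    base = cond >> 1
    if base == 0b000:
        result = z == 1                # EQ or NE
    elif base == 0b001:
        result = c == 1                # CS or CC
    elif base == 0b010:
        result = n == 1                # MI or PL
    elif base == 0b011:
        result = v == 1                # VS or VC
    elif base == 0b100:
        result = c == 1 and z == 0     # HI or LS
    elif base == 0b101:
        result = n == v                # GE or LT
    elif base == 0b110:
        result = n == v and z == 0     # GT or LE
    else:
        result = True                  # AL
    # The low bit inverts the test, except for the encoding '1111'.
    if (cond & 1) == 1 and cond != 0b1111:
        result = not result
    return result


def fcsel(cond, nzcv, vn, vm):
    """Return vn if the condition passes, otherwise vm."""
    return vn if condition_holds(cond, nzcv) else vm


GT = 0b1100
print(fcsel(GT, 0b0010, "first", "second"))
```

With the flags a floating-point compare is understood to produce for "greater than" (0010), the GT condition passes and the first source is selected; for "less than" (1000) or unordered (0011) the second source is selected.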

Operational information

FCVT

Floating-point Convert precision (scalar). This instruction converts the floating-point value in the SIMD&FP source
register to the precision for the destination register data type using the rounding mode that is determined by the
FPCR and writes the result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31-24     23-22  21  20-17  16-15  14-10  9-5  4-0
Field  00011110  ftype  1   0001   opc    10000  Rn   Rd

Half-precision to single-precision (ftype == 11 && opc == 00)

FCVT <Sd>, <Hn>

Half-precision to double-precision (ftype == 11 && opc == 01)

FCVT <Dd>, <Hn>

Single-precision to half-precision (ftype == 00 && opc == 11)

FCVT <Hd>, <Sn>

Single-precision to double-precision (ftype == 00 && opc == 01)

FCVT <Dd>, <Sn>

Double-precision to half-precision (ftype == 01 && opc == 11)

FCVT <Hd>, <Dn>

Double-precision to single-precision (ftype == 01 && opc == 00)

FCVT <Sd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer srcsize;
integer dstsize;

if ftype == opc then UNDEFINED;

case ftype of
    when '00' srcsize = 32;
    when '01' srcsize = 64;
    when '10' UNDEFINED;
    when '11' srcsize = 16;
case opc of
    when '00' dstsize = 32;
    when '01' dstsize = 64;
    when '10' UNDEFINED;
    when '11' dstsize = 16;
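
The decode rules above can be restated as a small Python sketch. The names `decode_fcvt_sizes`, `SIZE_BY_FIELD` and `UndefinedEncoding` are hypothetical; the exception stands in for the pseudocode's UNDEFINED.

```python
class UndefinedEncoding(Exception):
    """Raised where the pseudocode reaches UNDEFINED."""


# Field value -> size in bits; '10' has no entry because it is UNDEFINED.
SIZE_BY_FIELD = {0b00: 32, 0b01: 64, 0b11: 16}


def decode_fcvt_sizes(ftype: int, opc: int) -> tuple:
    """Return (srcsize, dstsize) for an FCVT encoding."""
    if ftype == opc:
        raise UndefinedEncoding("source and destination precision are equal")
    if ftype not in SIZE_BY_FIELD or opc not in SIZE_BY_FIELD:
        raise UndefinedEncoding("ftype or opc is '10'")
    return SIZE_BY_FIELD[ftype], SIZE_BY_FIELD[opc]
```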

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.


<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();

bits(srcsize) operand = V[n];

FPCRType fpcr = FPCR[];
boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

Elem[result, 0, dstsize] = FPConvert(operand, fpcr);

V[d] = result;
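
A minimal Python sketch of this operation, working on raw bit patterns, is shown below. It is an approximation under stated assumptions: `struct` always rounds to nearest with ties to even, so the FPCR rounding mode and floating-point exceptions are not modelled, and values that overflow the destination format raise `OverflowError` instead of producing an architectural result. The `merge` parameter stands in for `IsMerging(fpcr)`; all function names are hypothetical.

```python
import struct

FLOAT_FMT = {16: "<e", 32: "<f", 64: "<d"}
UINT_FMT = {16: "<H", 32: "<I", 64: "<Q"}
MASK128 = (1 << 128) - 1


def fp_convert(bits: int, srcsize: int, dstsize: int) -> int:
    """Reinterpret bits as a srcsize float and re-encode it at dstsize."""
    value = struct.unpack(FLOAT_FMT[srcsize], struct.pack(UINT_FMT[srcsize], bits))[0]
    return struct.unpack(UINT_FMT[dstsize], struct.pack(FLOAT_FMT[dstsize], value))[0]


def fcvt(vd_old: int, operand: int, srcsize: int, dstsize: int, merge: bool) -> int:
    """Return the new 128-bit value of V[d]."""
    # Start from the old register when merging, otherwise from zeros.
    result = (vd_old & MASK128) if merge else 0
    low_mask = (1 << dstsize) - 1
    # Only element 0 (the low dstsize bits) receives the converted value.
    return (result & ~low_mask) | fp_convert(operand, srcsize, dstsize)
```

With `merge` false the bits above the converted element are zero; with `merge` true they keep the previous contents of the destination register.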


FCVTAS (scalar)

Floating-point Convert to Signed integer, rounding to nearest with ties to Away (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round to Nearest
with Ties to Away rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30-24    23-22  21  20-19       18-16         15-10   9-5  4-0
Field  sf  0011110  ftype  1   rmode (00)  opcode (100)  000000  Rn   Rd

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTAS <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTAS <Xd>, <Hn>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTAS <Wd>, <Sn>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTAS <Xd>, <Sn>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTAS <Wd>, <Dn>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTAS <Xd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;


Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, fpcr, FPRounding_TIEAWAY);
X[d] = intval;
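
The rounding behaviour can be illustrated with the Python sketch below. `FPToFixed` is not defined in this section, so the sketch assumes that NaN converts to 0 and that out-of-range values saturate to the limits of the signed result; FPSR flags and trapped exceptions are not modelled, and the function name `fcvtas` is hypothetical. Exact rational arithmetic is used so that adding one half cannot itself introduce a rounding error.

```python
import math
from fractions import Fraction


def fcvtas(value: float, intsize: int) -> int:
    """Round to nearest with ties away from zero, then saturate (signed)."""
    if math.isnan(value):
        return 0                                   # assumed NaN result
    lo = -(1 << (intsize - 1))
    hi = (1 << (intsize - 1)) - 1
    if math.isinf(value):
        return hi if value > 0 else lo
    # Round the magnitude half-up, then restore the sign: ties go away from zero.
    magnitude = math.floor(Fraction(abs(value)) + Fraction(1, 2))
    rounded = magnitude if value >= 0 else -magnitude
    return max(lo, min(hi, rounded))
```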


FCVTAS (vector)

Floating-point Convert to Signed integer, rounding to nearest with ties to Away (vector). This instruction converts a scalar or
each element in a vector from a floating-point value to a signed integer value using the Round to Nearest with Ties to Away
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

Bits   31-30  29     28-24  23-10           9-5  4-0
Field  01     U (0)  11110  01111001110010  Rn   Rd

FCVTAS <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPRounding_TIEAWAY;

boolean unsigned = (U == '1');

Scalar single-precision and double-precision

Bits   31-30  29     28-24  23  22  21-10         9-5  4-0
Field  01     U (0)  11110  0   sz  100001110010  Rn   Rd

FCVTAS <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPRounding_TIEAWAY;

boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

Bits   31  30  29     28-24  23-10           9-5  4-0
Field  0   Q   U (0)  01110  01111001110010  Rn   Rd


FCVTAS <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPRounding_TIEAWAY;

boolean unsigned = (U == '1');

Vector single-precision and double-precision

Bits   31  30  29     28-24  23  22  21-10         9-5  4-0
Field  0   Q   U (0)  01110  0   sz  100001110010  Rn   Rd

FCVTAS <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPRounding_TIEAWAY;

boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.


Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);

V[d] = result;
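
The element loop can be sketched in Python on packed bit patterns, as below. This is an illustration under assumptions rather than the architectural definition: NaN is assumed to convert to 0, out-of-range values are assumed to saturate, floating-point exceptions are ignored, and `merge` stands in for `IsMerging(fpcr)`. The names `to_int_tieaway` and `fcvta_vector` are hypothetical.

```python
import math
import struct
from fractions import Fraction

FLOAT_FMT = {16: "<e", 32: "<f", 64: "<d"}
UINT_FMT = {16: "<H", 32: "<I", 64: "<Q"}


def to_int_tieaway(value: float, esize: int, unsigned: bool) -> int:
    """Round to nearest, ties away from zero, saturating to esize bits."""
    if math.isnan(value):
        return 0
    if unsigned:
        lo, hi = 0, (1 << esize) - 1
    else:
        lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    if math.isinf(value):
        return hi if value > 0 else lo
    magnitude = math.floor(Fraction(abs(value)) + Fraction(1, 2))
    rounded = magnitude if value >= 0 else -magnitude
    return max(lo, min(hi, rounded))


def fcvta_vector(operand: int, esize: int, elements: int,
                 unsigned: bool = False, vd_old: int = 0, merge: bool = False) -> int:
    """Convert each esize-bit float element of operand to an integer element."""
    # Merging applies only to the scalar (single-element) forms.
    result = vd_old if (merge and elements == 1) else 0
    mask = (1 << esize) - 1
    for e in range(elements):
        shift = e * esize
        raw = (operand >> shift) & mask
        value = struct.unpack(FLOAT_FMT[esize], struct.pack(UINT_FMT[esize], raw))[0]
        converted = to_int_tieaway(value, esize, unsigned) & mask  # two's complement
        result = (result & ~(mask << shift)) | (converted << shift)
    return result
```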


FCVTAU (scalar)

Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (scalar). This instruction converts
the floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round to
Nearest with Ties to Away rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30-24    23-22  21  20-19       18-16         15-10   9-5  4-0
Field  sf  0011110  ftype  1   rmode (00)  opcode (101)  000000  Rn   Rd

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTAU <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTAU <Xd>, <Hn>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTAU <Wd>, <Sn>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTAU <Xd>, <Sn>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTAU <Wd>, <Dn>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTAU <Xd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;


Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, fpcr, FPRounding_TIEAWAY);
X[d] = intval;
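
For the unsigned form, a hedged Python sketch looks like this. It assumes, since `FPToFixed` is not shown in this section, that NaN converts to 0 and that results outside the unsigned range saturate, so any input that rounds to a negative integer yields 0. The function name `fcvtau` is hypothetical and exception flags are not modelled.

```python
import math
from fractions import Fraction


def fcvtau(value: float, intsize: int) -> int:
    """Round to nearest with ties away from zero, then saturate (unsigned)."""
    if math.isnan(value):
        return 0                                   # assumed NaN result
    hi = (1 << intsize) - 1
    if math.isinf(value):
        return hi if value > 0 else 0
    magnitude = math.floor(Fraction(abs(value)) + Fraction(1, 2))
    rounded = magnitude if value >= 0 else -magnitude
    return max(0, min(hi, rounded))
```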


FCVTAU (vector)

Floating-point Convert to Unsigned integer, rounding to nearest with ties to Away (vector). This instruction converts a scalar
or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest with Ties
to Away rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

Bits   31-30  29     28-24  23-10           9-5  4-0
Field  01     U (1)  11110  01111001110010  Rn   Rd

FCVTAU <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPRounding_TIEAWAY;

boolean unsigned = (U == '1');

Scalar single-precision and double-precision

Bits   31-30  29     28-24  23  22  21-10         9-5  4-0
Field  01     U (1)  11110  0   sz  100001110010  Rn   Rd

FCVTAU <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPRounding_TIEAWAY;

boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

Bits   31  30  29     28-24  23-10           9-5  4-0
Field  0   Q   U (1)  01110  01111001110010  Rn   Rd


FCVTAU <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPRounding_TIEAWAY;

boolean unsigned = (U == '1');

Vector single-precision and double-precision

Bits   31  30  29     28-24  23  22  21-10         9-5  4-0
Field  0   Q   U (1)  01110  0   sz  100001110010  Rn   Rd

FCVTAU <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPRounding_TIEAWAY;

boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.


Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);

V[d] = result;


FCVTL, FCVTL2

Floating-point Convert to higher precision Long (vector). This instruction reads each element in a vector in the
SIMD&FP source register, converts each value to double the precision of the source element using the rounding mode
that is determined by the FPCR, and writes each result to the equivalent element of the vector in the SIMD&FP
destination register.
Where the operation lengthens a 64-bit vector to a 128-bit vector, the FCVTL2 variant operates on the elements in the
top 64 bits of the source register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29-23    22  21-10         9-5  4-0
Field  0   Q   0011100  sz  100001011110  Rn   Rd

FCVTL{2} <Vd>.<Ta>, <Vn>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16 << UInt(sz);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “sz”:

sz <Ta>
0 4S
1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “sz:Q”:

sz Q <Tb>
0 0 4H
0 1 8H
1 0 2S
1 1 4S


Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = Vpart[n, part];
bits(2*datasize) result;

for e = 0 to elements-1
    Elem[result, e, 2*esize] = FPConvert(Elem[operand, e, esize], FPCR[]);

V[d] = result;
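
The lengthening operation can be sketched in Python on raw 128-bit register images. The function name `fcvtl` is hypothetical; `sz` and `q` correspond to the encoding fields, with `q` selecting the lower (0) or upper (1) 64 bits of the source as `Vpart[n, part]` does. Widening an ordinary value to a larger format is exact, but the sketch does not model floating-point exceptions or any special handling of NaN inputs.

```python
import struct

MASK64 = (1 << 64) - 1
FLOAT_FMT = {16: "<e", 32: "<f", 64: "<d"}
UINT_FMT = {16: "<H", 32: "<I", 64: "<Q"}


def fcvtl(vn: int, sz: int, q: int) -> int:
    """Widen each element of the selected 64-bit half of vn to twice its size."""
    esize = 16 << sz                      # source element size in bits
    elements = 64 // esize
    operand = (vn >> (64 * q)) & MASK64   # lower or upper half of the source
    result = 0
    for e in range(elements):
        raw = (operand >> (e * esize)) & ((1 << esize) - 1)
        value = struct.unpack(FLOAT_FMT[esize], struct.pack(UINT_FMT[esize], raw))[0]
        wide = struct.unpack(UINT_FMT[2 * esize], struct.pack(FLOAT_FMT[2 * esize], value))[0]
        result |= wide << (e * 2 * esize)
    return result
```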


FCVTMS (scalar)

Floating-point Convert to Signed integer, rounding toward Minus infinity (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round towards
Minus Infinity rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30-24    23-22  21  20-19       18-16         15-10   9-5  4-0
Field  sf  0011110  ftype  1   rmode (10)  opcode (000)  000000  Rn   Rd

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTMS <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTMS <Xd>, <Hn>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTMS <Wd>, <Sn>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTMS <Xd>, <Sn>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTMS <Wd>, <Dn>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTMS <Xd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

rounding = FPDecodeRounding(rmode);


Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, fpcr, rounding);
X[d] = intval;
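
The decode and operation above can be sketched together in Python. `FPDecodeRounding` is not defined in this section; the table below assumes the mapping '00' to ties-to-even, '01' to plus infinity, '10' to minus infinity and '11' to zero, which is consistent with the fixed `rmode` value of 10 in this encoding. NaN is assumed to convert to 0 and out-of-range values to saturate. All names are hypothetical and exceptions are not modelled.

```python
import math
from fractions import Fraction

# Assumed FPDecodeRounding mapping from the rmode field to a rounding mode.
ROUNDING_BY_RMODE = {0b00: "TIEEVEN", 0b01: "POSINF", 0b10: "NEGINF", 0b11: "ZERO"}


def round_exact(value: float, rounding: str) -> int:
    """Round a finite float to an integer using exact rational arithmetic."""
    exact = Fraction(value)
    if rounding == "NEGINF":
        return math.floor(exact)
    if rounding == "POSINF":
        return math.ceil(exact)
    if rounding == "ZERO":
        return math.trunc(exact)
    return round(exact)                  # TIEEVEN: Fraction rounds half to even


def fcvtms(value: float, intsize: int, rmode: int = 0b10) -> int:
    """Convert to a signed integer, rounding toward minus infinity by default."""
    if math.isnan(value):
        return 0                         # assumed NaN result
    lo = -(1 << (intsize - 1))
    hi = (1 << (intsize - 1)) - 1
    if math.isinf(value):
        return hi if value > 0 else lo
    rounded = round_exact(value, ROUNDING_BY_RMODE[rmode])
    return max(lo, min(hi, rounded))
```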


FCVTMS (vector)

Floating-point Convert to Signed integer, rounding toward Minus infinity (vector). This instruction converts a scalar or
each element in a vector from a floating-point value to a signed integer value using the Round towards Minus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

Bits   31-30  29     28-24  23      22-13       12      11-10  9-5  4-0
Field  01     U (0)  11110  o2 (0)  1111001101  o1 (1)  10     Rn   Rd

FCVTMS <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Scalar single-precision and double-precision

Bits   31-30  29     28-24  23      22  21-13      12      11-10  9-5  4-0
Field  01     U (0)  11110  o2 (0)  sz  100001101  o1 (1)  10     Rn   Rd

FCVTMS <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

Bits   31  30  29     28-24  23      22-13       12      11-10  9-5  4-0
Field  0   Q   U (0)  01110  o2 (0)  1111001101  o1 (1)  10     Rn   Rd


FCVTMS <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector single-precision and double-precision

Bits   31  30  29     28-24  23      22  21-13      12      11-10  9-5  4-0
Field  0   Q   U (0)  01110  o2 (0)  sz  100001101  o1 (1)  10     Rn   Rd

FCVTMS <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.


Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);

V[d] = result;


FCVTMU (scalar)

Floating-point Convert to Unsigned integer, rounding toward Minus infinity (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round towards
Minus Infinity rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30-24    23-22  21  20-19       18-16         15-10   9-5  4-0
Field  sf  0011110  ftype  1   rmode (10)  opcode (001)  000000  Rn   Rd

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTMU <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTMU <Xd>, <Hn>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTMU <Wd>, <Sn>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTMU <Xd>, <Sn>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTMU <Wd>, <Dn>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTMU <Xd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

rounding = FPDecodeRounding(rmode);


Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, fpcr, rounding);
X[d] = intval;


FCVTMU (vector)

Floating-point Convert to Unsigned integer, rounding toward Minus infinity (vector). This instruction converts a scalar
or each element in a vector from a floating-point value to an unsigned integer value using the Round towards Minus
Infinity rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

Bits   31-30  29     28-24  23      22-13       12      11-10  9-5  4-0
Field  01     U (1)  11110  o2 (0)  1111001101  o1 (1)  10     Rn   Rd

FCVTMU <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Scalar single-precision and double-precision

Bits   31-30  29     28-24  23      22  21-13      12      11-10  9-5  4-0
Field  01     U (1)  11110  o2 (0)  sz  100001101  o1 (1)  10     Rn   Rd

FCVTMU <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

Bits   31  30  29     28-24  23      22-13       12      11-10  9-5  4-0
Field  0   Q   U (1)  01110  o2 (0)  1111001101  o1 (1)  10     Rn   Rd


FCVTMU <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector single-precision and double-precision

Bits   31  30  29     28-24  23      22  21-13      12      11-10  9-5  4-0
Field  0   Q   U (1)  01110  o2 (0)  sz  100001101  o1 (1)  10     Rn   Rd

FCVTMU <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.


Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);

V[d] = result;


FCVTN, FCVTN2

Floating-point Convert to lower precision Narrow (vector). This instruction reads each vector element in the SIMD&FP
source register, converts each element to half the precision of the source element, places the results in a vector, and
writes the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are
half as long as the source vector elements. The rounding mode is determined by the FPCR.
The FCVTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
FCVTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
Bits   31  30  29-23    22  21-10         9-5  4-0
Field  0   Q   0011100  sz  100001011010  Rn   Rd

FCVTN{2} <Vd>.<Tb>, <Vn>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16 << UInt(sz);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “sz:Q”:

sz Q <Tb>
0 0 4H
0 1 8H
1 0 2S
1 1 4S

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “sz”:

sz <Ta>
0 4S
1 2D


Operation

CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand = V[n];
bits(datasize) result;

for e = 0 to elements-1
    Elem[result, e, esize] = FPConvert(Elem[operand, e, 2*esize], FPCR[]);

Vpart[d, part] = result;
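
The narrowing operation and the two destination behaviours described for FCVTN and FCVTN2 can be sketched in Python as follows. This is an approximation: `struct` always rounds to nearest with ties to even, so the FPCR rounding mode is not honoured, exceptions are not modelled, and a value too large for the narrower format raises `OverflowError` instead of producing an architectural result. The function name `fcvtn` is hypothetical.

```python
import struct

MASK64 = (1 << 64) - 1
FLOAT_FMT = {16: "<e", 32: "<f", 64: "<d"}
UINT_FMT = {16: "<H", 32: "<I", 64: "<Q"}


def fcvtn(vd_old: int, vn: int, sz: int, q: int) -> int:
    """Narrow each element of vn and write the 64-bit result into half of V[d]."""
    esize = 16 << sz                      # destination element size in bits
    elements = 64 // esize
    src_mask = (1 << (2 * esize)) - 1
    result = 0
    for e in range(elements):
        raw = (vn >> (e * 2 * esize)) & src_mask
        value = struct.unpack(FLOAT_FMT[2 * esize], struct.pack(UINT_FMT[2 * esize], raw))[0]
        narrow = struct.unpack(UINT_FMT[esize], struct.pack(FLOAT_FMT[esize], value))[0]
        result |= narrow << (e * esize)
    if q == 0:
        return result                             # FCVTN: upper half cleared
    return (vd_old & MASK64) | (result << 64)     # FCVTN2: lower half preserved
```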


FCVTNS (scalar)

Floating-point Convert to Signed integer, rounding to nearest with ties to even (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round to Nearest
rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30-24    23-22  21  20-19       18-16         15-10   9-5  4-0
Field  sf  0011110  ftype  1   rmode (00)  opcode (000)  000000  Rn   Rd

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTNS <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTNS <Xd>, <Hn>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTNS <Wd>, <Sn>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTNS <Xd>, <Sn>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTNS <Wd>, <Dn>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTNS <Xd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

rounding = FPDecodeRounding(rmode);
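
The decode above passes the two-bit rmode field to FPDecodeRounding(). The Python sketch below shows the mapping that call is assumed to perform. The definition of FPDecodeRounding() is not reproduced in this section, so the table is an assumption, although it is consistent with the rmode values shown in the encoding diagrams for FCVTNS (00), FCVTPS (01), and FCVTZS (11). The names FPRounding and fp_decode_rounding are illustrative.

```python
from enum import Enum

class FPRounding(Enum):
    TIEEVEN = "round to nearest, ties to even"
    POSINF = "round toward plus infinity"
    NEGINF = "round toward minus infinity"
    ZERO = "round toward zero"

# Assumed mapping from the 2-bit rmode field to a rounding mode.
_RMODE_TABLE = {
    "00": FPRounding.TIEEVEN,
    "01": FPRounding.POSINF,
    "10": FPRounding.NEGINF,
    "11": FPRounding.ZERO,
}

def fp_decode_rounding(rmode: str) -> FPRounding:
    """Map a 2-bit field, given as a bit string such as '00', to a rounding mode."""
    return _RMODE_TABLE[rmode]
```

The Advanced SIMD forms of these instructions build the same two-bit value by concatenating the o1 and o2 fields.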

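Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.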

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, fpcr, rounding);
X[d] = intval;
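
The conversion performed by FPToFixed() with zero fraction bits and ties-to-even rounding can be approximated in Python as follows. This is a sketch under stated assumptions, not the architecture definition: FPToFixed() is not reproduced in this section, so the handling of NaN (result zero) and of out-of-range values (saturation) is assumed, and the floating-point exception flags are not modelled. The function name fcvtns is illustrative.

```python
import math

def fcvtns(value: float, intsize: int = 32) -> int:
    """Model of a signed, ties-to-even float-to-integer conversion."""
    lo = -(1 << (intsize - 1))
    hi = (1 << (intsize - 1)) - 1
    if math.isnan(value):
        return 0                      # assumed: NaN converts to zero
    if math.isinf(value):
        return hi if value > 0 else lo
    rounded = round(value)            # Python rounds float ties to even
    return max(lo, min(hi, rounded))  # assumed: saturate on overflow
```

For example, fcvtns(2.5) and fcvtns(3.5) give 2 and 4 respectively, because a tie is resolved toward the even integer.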

FCVTNS (vector)

Floating-point Convert to Signed integer, rounding to nearest with ties to even (vector). This instruction converts a
scalar or each element in a vector from a floating-point value to a signed integer value using the Round to Nearest
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31  30  29   28-24  23    22-17   16-13  12    11-10  9-5  4-0
0   1   U=0  11110  o2=0  111100  1101   o1=0  10     Rn   Rd

FCVTNS <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Scalar single-precision and double-precision

31  30  29   28-24  23    22  21-17  16-13  12    11-10  9-5  4-0
0   1   U=0  11110  o2=0  sz  10000  1101   o1=0  10     Rn   Rd

FCVTNS <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

31  30  29   28-24  23    22-17   16-13  12    11-10  9-5  4-0
0   Q   U=0  01110  o2=0  111100  1101   o1=0  10     Rn   Rd

FCVTNS <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector single-precision and double-precision

31  30  29   28-24  23    22  21-17  16-13  12    11-10  9-5  4-0
0   Q   U=0  01110  o2=0  sz  10000  1101   o1=0  10     Rn   Rd

FCVTNS <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');
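
The decode for the vector single-precision and double-precision class can be illustrated with a short Python sketch. It follows the pseudocode above and the sz:Q arrangement table for <T>. The function name decode_vector_sd and the use of None to stand for UNDEFINED are illustrative choices.

```python
def decode_vector_sd(sz: int, q: int):
    """Return (esize, datasize, elements, arrangement), or None if UNDEFINED."""
    if (sz, q) == (1, 0):
        return None                      # sz:Q == '10' is reserved
    esize = 32 << sz                     # 32-bit or 64-bit elements
    datasize = 128 if q == 1 else 64     # full or half register
    elements = datasize // esize
    suffix = "S" if esize == 32 else "D"
    return esize, datasize, elements, f"{elements}{suffix}"
```

The reserved combination corresponds to a single 64-bit element in a 64-bit vector, which is covered by the scalar class instead.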

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);

V[d] = result;
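
The per-element loop and the merge behaviour can be sketched in Python as follows. The model holds registers as Python integers and covers only the signed, ties-to-even case. The merging parameter stands in for IsMerging(fpcr), whose definition is not part of this section. NaN and overflow handling are assumptions about FPToFixed(), and exception flags are not modelled. All function names are illustrative.

```python
import math
import struct

# (integer format, float format) for each element size, little-endian.
_FMT = {16: ("<H", "<e"), 32: ("<I", "<f"), 64: ("<Q", "<d")}

def _to_signed_sat(value: float, esize: int) -> int:
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    if math.isnan(value):
        return 0                                  # assumed NaN result
    if math.isinf(value):
        return hi if value > 0 else lo
    return max(lo, min(hi, round(value)))         # ties-to-even, saturating

def convert_lanes(operand: int, old_dest: int, esize: int,
                  elements: int, merging: bool = False) -> int:
    int_fmt, flt_fmt = _FMT[esize]
    mask = (1 << esize) - 1
    # Scalar forms may keep the other destination bits when merging is enabled.
    result = old_dest if (merging and elements == 1) else 0
    for e in range(elements):
        raw = (operand >> (e * esize)) & mask
        value = struct.unpack(flt_fmt, struct.pack(int_fmt, raw))[0]
        lane = _to_signed_sat(value, esize) & mask    # two's complement bits
        result = (result & ~(mask << (e * esize))) | (lane << (e * esize))
    return result
```

With merging disabled, every bit of the destination above the converted elements is zero, matching the Zeros() initialisation in the pseudocode.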

FCVTNU (scalar)

Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (scalar). This instruction converts
the floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round to
Nearest rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29  28-24  23-22  21  20-19     18-16       15-10   9-5  4-0
sf  0   0   11110  ftype  1   rmode=00  opcode=001  000000  Rn   Rd

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTNU <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTNU <Xd>, <Hn>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTNU <Wd>, <Sn>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTNU <Xd>, <Sn>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTNU <Wd>, <Dn>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTNU <Xd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

rounding = FPDecodeRounding(rmode);

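Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.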

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, fpcr, rounding);
X[d] = intval;
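
The unsigned case differs from the signed one only in the range of the result. The Python sketch below assumes that FPToFixed() saturates negative inputs to zero and large inputs to the maximum unsigned value, and that a NaN input gives zero; these behaviours are not stated in this section. Exception flags are not modelled and the name fcvtnu is illustrative.

```python
import math

def fcvtnu(value: float, intsize: int = 32) -> int:
    """Model of an unsigned, ties-to-even float-to-integer conversion."""
    hi = (1 << intsize) - 1
    if math.isnan(value):
        return 0                          # assumed: NaN converts to zero
    if math.isinf(value):
        return hi if value > 0 else 0
    return max(0, min(hi, round(value)))  # assumed: clamp to [0, 2^intsize - 1]
```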

FCVTNU (vector)

Floating-point Convert to Unsigned integer, rounding to nearest with ties to even (vector). This instruction converts a
scalar or each element in a vector from a floating-point value to an unsigned integer value using the Round to Nearest
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31  30  29   28-24  23    22-17   16-13  12    11-10  9-5  4-0
0   1   U=1  11110  o2=0  111100  1101   o1=0  10     Rn   Rd

FCVTNU <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Scalar single-precision and double-precision

31  30  29   28-24  23    22  21-17  16-13  12    11-10  9-5  4-0
0   1   U=1  11110  o2=0  sz  10000  1101   o1=0  10     Rn   Rd

FCVTNU <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

31  30  29   28-24  23    22-17   16-13  12    11-10  9-5  4-0
0   Q   U=1  01110  o2=0  111100  1101   o1=0  10     Rn   Rd

FCVTNU <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector single-precision and double-precision

31  30  29   28-24  23    22  21-17  16-13  12    11-10  9-5  4-0
0   Q   U=1  01110  o2=0  sz  10000  1101   o1=0  10     Rn   Rd

FCVTNU <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);

V[d] = result;

FCVTPS (scalar)

Floating-point Convert to Signed integer, rounding toward Plus infinity (scalar). This instruction converts the floating-
point value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round towards Plus Infinity
rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29  28-24  23-22  21  20-19     18-16       15-10   9-5  4-0
sf  0   0   11110  ftype  1   rmode=01  opcode=000  000000  Rn   Rd

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTPS <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTPS <Xd>, <Hn>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTPS <Wd>, <Sn>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTPS <Xd>, <Sn>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTPS <Wd>, <Dn>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTPS <Xd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

rounding = FPDecodeRounding(rmode);

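Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.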

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, fpcr, rounding);
X[d] = intval;
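
Rounding toward plus infinity corresponds to taking the ceiling of the input. The Python sketch below models the signed conversion on that basis. As before, the NaN and saturation behaviour is an assumption about FPToFixed(), which is not reproduced in this section. Exception flags are not modelled and the name fcvtps is illustrative.

```python
import math

def fcvtps(value: float, intsize: int = 32) -> int:
    """Model of a signed float-to-integer conversion rounding toward +infinity."""
    lo = -(1 << (intsize - 1))
    hi = (1 << (intsize - 1)) - 1
    if math.isnan(value):
        return 0                               # assumed: NaN converts to zero
    if math.isinf(value):
        return hi if value > 0 else lo
    return max(lo, min(hi, math.ceil(value)))  # ceiling, then saturate
```

Note that negative fractions round toward zero under this mode, so fcvtps(-2.9) is -2, whereas a ties-to-even conversion of the same input gives -3.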

FCVTPS (vector)

Floating-point Convert to Signed integer, rounding toward Plus infinity (vector). This instruction converts a scalar or
each element in a vector from a floating-point value to a signed integer value using the Round towards Plus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31  30  29   28-24  23    22-17   16-13  12    11-10  9-5  4-0
0   1   U=0  11110  o2=1  111100  1101   o1=0  10     Rn   Rd

FCVTPS <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Scalar single-precision and double-precision

31  30  29   28-24  23    22  21-17  16-13  12    11-10  9-5  4-0
0   1   U=0  11110  o2=1  sz  10000  1101   o1=0  10     Rn   Rd

FCVTPS <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

31  30  29   28-24  23    22-17   16-13  12    11-10  9-5  4-0
0   Q   U=0  01110  o2=1  111100  1101   o1=0  10     Rn   Rd

FCVTPS <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector single-precision and double-precision

31  30  29   28-24  23    22  21-17  16-13  12    11-10  9-5  4-0
0   Q   U=0  01110  o2=1  sz  10000  1101   o1=0  10     Rn   Rd

FCVTPS <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);

V[d] = result;

FCVTPU (scalar)

Floating-point Convert to Unsigned integer, rounding toward Plus infinity (scalar). This instruction converts the
floating-point value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round towards
Plus Infinity rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29  28-24  23-22  21  20-19     18-16       15-10   9-5  4-0
sf  0   0   11110  ftype  1   rmode=01  opcode=001  000000  Rn   Rd

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTPU <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTPU <Xd>, <Hn>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTPU <Wd>, <Sn>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTPU <Xd>, <Sn>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTPU <Wd>, <Dn>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTPU <Xd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

rounding = FPDecodeRounding(rmode);

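Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.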

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, fpcr, rounding);
X[d] = intval;

FCVTPU (vector)

Floating-point Convert to Unsigned integer, rounding toward Plus infinity (vector). This instruction converts a scalar or
each element in a vector from a floating-point value to an unsigned integer value using the Round towards Plus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31  30  29   28-24  23    22-17   16-13  12    11-10  9-5  4-0
0   1   U=1  11110  o2=1  111100  1101   o1=0  10     Rn   Rd

FCVTPU <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Scalar single-precision and double-precision

31  30  29   28-24  23    22  21-17  16-13  12    11-10  9-5  4-0
0   1   U=1  11110  o2=1  sz  10000  1101   o1=0  10     Rn   Rd

FCVTPU <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

31  30  29   28-24  23    22-17   16-13  12    11-10  9-5  4-0
0   Q   U=1  01110  o2=1  111100  1101   o1=0  10     Rn   Rd

FCVTPU <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector single-precision and double-precision

31  30  29   28-24  23    22  21-17  16-13  12    11-10  9-5  4-0
0   Q   U=1  01110  o2=1  sz  10000  1101   o1=0  10     Rn   Rd

FCVTPU <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);

V[d] = result;

FCVTXN, FCVTXN2

Floating-point Convert to lower precision Narrow, rounding to odd (vector). This instruction reads each vector element
in the source SIMD&FP register, narrows each value to half the precision of the source element using the Round to
Odd rounding mode, writes the result to a vector, and writes the vector to the destination SIMD&FP register.

Note

This instruction uses the Round to Odd rounding mode which is not defined by the IEEE 754-2008 standard. This
rounding mode ensures that if the result of the conversion is inexact the least significant bit of the mantissa is
forced to 1. This rounding mode enables a floating-point value to be converted to a lower precision format via an
intermediate precision format while avoiding double rounding errors. For example, a 64-bit floating-point value
can be converted to a correctly rounded 16-bit floating-point value by first using this instruction to produce a
32-bit value and then using another instruction with the wanted rounding mode to convert the 32-bit value to the
final 16-bit floating-point value.
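
The double rounding problem described in this note can be demonstrated with a short Python sketch. It converts one 64-bit value to 16-bit precision in three ways: directly, through a 32-bit intermediate rounded to nearest, and through a 32-bit intermediate rounded to odd. The helper names are illustrative. The round-to-odd helper is a software model written for this example, valid only for finite inputs within single-precision range, and is not the architecture definition.

```python
import struct

def to_f32_nearest(x: float) -> float:
    """Round a double to single precision (nearest, ties to even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_f32_round_to_odd(x: float) -> float:
    """Round a finite, in-range double to single precision using round to odd."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    y = struct.unpack("<f", struct.pack("<I", bits))[0]
    if y == x:
        return y                      # exact: nothing to adjust
    if abs(y) > abs(x):
        bits -= 1                     # step back toward zero (truncate)
    # Inexact: force the least significant mantissa bit to 1.
    return struct.unpack("<f", struct.pack("<I", bits | 1))[0]

def to_f16(x: float) -> float:
    """Round to half precision (nearest, ties to even)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Just above the midpoint between two adjacent half-precision values.
x = 1 + 2**-11 + 2**-30

direct = to_f16(x)                          # correctly rounded result
double_rounded = to_f16(to_f32_nearest(x))  # intermediate lands on an exact tie
via_odd = to_f16(to_f32_round_to_odd(x))    # sticky bit keeps it above the tie
```

In this example the round-to-nearest intermediate discards the small excess over the midpoint, so the second rounding sees an exact tie and rounds down to 1.0. The round-to-odd intermediate records the inexactness in its lowest bit, so the second rounding matches the direct conversion.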
The FCVTXN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the FCVTXN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector.

Scalar

31  30  29  28-24  23  22  21-17  16-12  11-10  9-5  4-0
0   1   1   11110  0   sz  10000  10110  10     Rn   Rd

FCVTXN <Vb><d>, <Va><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz == '0' then UNDEFINED;

integer esize = 32;
integer datasize = esize;
integer elements = 1;
integer part = 0;

Vector

31  30  29  28-24  23  22  21-17  16-12  11-10  9-5  4-0
0   Q   1   01110  0   sz  10000  10110  10     Rn   Rd

FCVTXN{2} <Vd>.<Tb>, <Vn>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz == '0' then UNDEFINED;

integer esize = 32;
integer datasize = 64;
integer elements = 2;
integer part = UInt(Q);

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “sz:Q”:

sz Q <Tb>
0 x RESERVED
1 0 2S
1 1 4S

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “sz”:

sz <Ta>
0 RESERVED
1 2D

<Vb> Is the destination width specifier, encoded in “sz”:

sz <Vb>
0 RESERVED
1 S

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Va> Is the source width specifier, encoded in “sz”:

sz <Va>
0 RESERVED
1 D

<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();

bits(2*datasize) operand = V[n];

FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    Elem[result, e, esize] = FPConvert(Elem[operand, e, 2*esize], fpcr, FPRounding_ODD);

if merge then
    V[d] = result;
else
    Vpart[d, part] = Elem[result, 0, datasize];

FCVTZS (scalar, fixed-point)

Floating-point Convert to Signed fixed-point, rounding toward Zero (scalar). This instruction converts the floating-
point value in the SIMD&FP source register to a 32-bit or 64-bit fixed-point signed integer using the Round towards
Zero rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
31  30  29  28-24  23-22  21  20-19     18-16       15-10  9-5  4-0
sf  0   0   11110  ftype  0   rmode=11  opcode=000  scale  Rn   Rd

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTZS <Wd>, <Hn>, #<fbits>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTZS <Xd>, <Hn>, #<fbits>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTZS <Wd>, <Sn>, #<fbits>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTZS <Xd>, <Sn>, #<fbits>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTZS <Wd>, <Dn>, #<fbits>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTZS <Xd>, <Dn>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;

case ftype of
    when '00' fltsize = 32;
    when '01' fltsize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

if sf == '0' && scale<5> == '0' then UNDEFINED;

integer fracbits = 64 - UInt(scale);

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<fbits> For the double-precision to 32-bit, half-precision to 32-bit and single-precision to 32-bit variant: is the
number of bits after the binary point in the fixed-point destination, in the range 1 to 32, encoded as 64
minus "scale".
For the double-precision to 64-bit, half-precision to 64-bit and single-precision to 64-bit variant: is the
number of bits after the binary point in the fixed-point destination, in the range 1 to 64, encoded as 64
minus "scale".
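
The relationship between <fbits> and the "scale" field can be illustrated in Python. The sketch follows the decode pseudocode and the ranges given above: the field holds 64 minus <fbits>, and for the 32-bit variants bit 5 of "scale" must be 1. The function names and the use of None to stand for UNDEFINED are illustrative.

```python
def encode_scale(fbits: int, sf: int) -> int:
    """Return the 6-bit scale field for a given number of fraction bits."""
    max_fbits = 64 if sf == 1 else 32
    if not 1 <= fbits <= max_fbits:
        raise ValueError(f"fbits must be in the range 1 to {max_fbits}")
    return 64 - fbits

def decode_fbits(scale: int, sf: int):
    """Return the fraction bit count, or None if the encoding is UNDEFINED."""
    if sf == 0 and (scale >> 5) & 1 == 0:
        return None               # sf == '0' && scale<5> == '0'
    return 64 - scale
```

For a 32-bit destination, the valid scale values are therefore 32 to 63, which correspond to fraction bit counts of 32 down to 1.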

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, fracbits, FALSE, fpcr, FPRounding_ZERO);
X[d] = intval;
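
The fixed-point conversion can be modelled in Python as scaling by two to the power of the fraction bit count and then truncating toward zero. This is a sketch under stated assumptions: the NaN and saturation behaviour of FPToFixed() is assumed rather than taken from this section, exception flags are not modelled, and the name fcvtzs_fixed is illustrative. Exact rational arithmetic is used so that the scaling step introduces no rounding of its own.

```python
import math
from fractions import Fraction

def fcvtzs_fixed(value: float, fbits: int, intsize: int = 32) -> int:
    """Model of a signed float-to-fixed-point conversion rounding toward zero."""
    lo = -(1 << (intsize - 1))
    hi = (1 << (intsize - 1)) - 1
    if math.isnan(value):
        return 0                               # assumed: NaN converts to zero
    if math.isinf(value):
        return hi if value > 0 else lo
    scaled = Fraction(value) * (1 << fbits)    # exact scaling by 2^fbits
    truncated = math.trunc(scaled)             # round toward zero
    return max(lo, min(hi, truncated))         # assumed: saturate on overflow
```

For example, 1.75 with 4 fraction bits gives 28, which represents 28/16 = 1.75 in the fixed-point format.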

FCVTZS (scalar, integer)

Floating-point Convert to Signed integer, rounding toward Zero (scalar). This instruction converts the floating-point
value in the SIMD&FP source register to a 32-bit or 64-bit signed integer using the Round towards Zero rounding
mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29  28-24  23-22  21  20-19     18-16       15-10   9-5  4-0
sf  0   0   11110  ftype  1   rmode=11  opcode=000  000000  Rn   Rd

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTZS <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTZS <Xd>, <Hn>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTZS <Wd>, <Sn>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTZS <Xd>, <Sn>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTZS <Wd>, <Dn>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTZS <Xd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

rounding = FPDecodeRounding(rmode);
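
The decode step above can be sketched in Python as field extraction from the 32-bit encoding. This is an illustration only: the function name, the have_fp16 parameter (standing in for HaveFP16Ext()), and the returned dictionary are hypothetical, and the fixed opcode bits are not checked.

```python
def decode_fcvtz_scalar_integer(word: int, have_fp16: bool = True) -> dict:
    """Extract operand sizes and register numbers from a 32-bit encoding."""
    sf = (word >> 31) & 0x1      # bit 31
    ftype = (word >> 22) & 0x3   # bits 23:22
    n = (word >> 5) & 0x1F       # Rn, bits 9:5
    d = word & 0x1F              # Rd, bits 4:0

    intsize = 64 if sf == 1 else 32
    if ftype == 0b00:
        fltsize = 32
    elif ftype == 0b01:
        fltsize = 64
    elif ftype == 0b11 and have_fp16:
        fltsize = 16
    else:
        # ftype == '10', or half-precision without FEAT_FP16
        raise ValueError("UNDEFINED encoding")
    return {"d": d, "n": n, "intsize": intsize, "fltsize": fltsize}
```

The word 0x1E380020, assembled from the bit layout above with sf == 0, ftype == 00, Rn == 1 and Rd == 0, decodes to a single-precision to 32-bit conversion.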

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, 0, FALSE, fpcr, rounding);
X[d] = intval;

FCVTZS (vector, fixed-point)

Floating-point Convert to Signed fixed-point, rounding toward Zero (vector). This instruction converts a scalar or each
element in a vector from floating-point to fixed-point signed integer using the Round towards Zero rounding mode, and
writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh

FCVTZS <V><d>, <V><n>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '000x' || (immh == '001x' && !HaveFP16Ext()) then UNDEFINED;

integer esize = if immh == '1xxx' then 64 else if immh == '01xx' then 32 else 16;
integer datasize = esize;
integer elements = 1;

integer fracbits = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
FPRounding rounding = FPRounding_ZERO;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh

FCVTZS <Vd>.<T>, <Vn>.<T>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh == '000x' || (immh == '001x' && !HaveFP16Ext()) then UNDEFINED;
if immh<3>:Q == '10' then UNDEFINED;
integer esize = if immh == '1xxx' then 64 else if immh == '01xx' then 32 else 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer fracbits = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
FPRounding rounding = FPRounding_ZERO;
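
The relationship between immh:immb, the element size, and the number of fractional bits can be checked with a short Python sketch. The function name and the have_fp16 parameter are hypothetical, and the Q-dependent check of the vector class is omitted.

```python
def decode_immh_immb(immh: int, immb: int, have_fp16: bool = True) -> tuple:
    """Return (esize, fracbits) for a 4-bit immh and a 3-bit immb."""
    if immh == 0b0000:
        raise ValueError("immh == '0000' selects a different instruction class")
    if immh == 0b0001 or (immh in (0b0010, 0b0011) and not have_fp16):
        raise ValueError("UNDEFINED encoding")

    if immh & 0b1000:        # immh == '1xxx'
        esize = 64
    elif immh & 0b0100:      # immh == '01xx'
        esize = 32
    else:                    # immh == '001x'
        esize = 16

    shift = (immh << 3) | immb          # UInt(immh:immb)
    fracbits = (esize * 2) - shift
    return esize, fracbits
```

Because the leading set bit of immh already fixes the element size, the remaining bits of immh:immb select a value in the range 1 to esize.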

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
000x RESERVED
001x H
01xx S
1xxx D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width, encoded in
“immh:immb”:

immh <fbits>
000x RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

For the vector variant: is the number of fractional bits, in the range 1 to the element width, encoded in
“immh:immb”:

immh <fbits>
0000 SEE Advanced SIMD modified immediate
0001 RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, fracbits, unsigned, fpcr, rounding);

V[d] = result;

FCVTZS (vector, integer)

Floating-point Convert to Signed integer, rounding toward Zero (vector). This instruction converts a scalar or each
element in a vector from a floating-point value to a signed integer value using the Round towards Zero rounding mode,
and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision and Vector single-precision and double-precision

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1

FCVTZS <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1

FCVTZS <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1

FCVTZS <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1

FCVTZS <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);

V[d] = result;
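
The element loop above can be pictured with a Python sketch for the 4S arrangement. The function name fcvtzs_4s is hypothetical. The sketch assumes little-endian lane order in a 16-byte register image, a NaN result of zero, and saturation for out-of-range values; floating-point exception flags are not modelled.

```python
import math
import struct

INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def fcvtzs_4s(vn: bytes) -> bytes:
    """Convert four single-precision lanes to signed 32-bit integers."""
    lanes = struct.unpack("<4f", vn)      # Elem[operand, e, 32] for e = 0..3
    out = []
    for x in lanes:
        if math.isnan(x):
            out.append(0)                 # assumption: NaN converts to zero
        elif math.isinf(x):
            out.append(INT32_MAX if x > 0 else INT32_MIN)
        else:
            # Truncate toward zero, then saturate to the lane width.
            out.append(max(INT32_MIN, min(INT32_MAX, math.trunc(x))))
    return struct.pack("<4i", *out)       # Elem[result, e, 32]
```

Each lane is converted independently, so a saturating or NaN lane does not affect its neighbours.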

FCVTZU (scalar, fixed-point)

Floating-point Convert to Unsigned fixed-point, rounding toward Zero (scalar). This instruction converts the floating-
point value in the SIMD&FP source register to a 32-bit or 64-bit fixed-point unsigned integer using the Round towards
Zero rounding mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 0 1 1 0 0 1 scale Rn Rd
rmode opcode

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTZU <Wd>, <Hn>, #<fbits>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTZU <Xd>, <Hn>, #<fbits>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTZU <Wd>, <Sn>, #<fbits>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTZU <Xd>, <Sn>, #<fbits>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTZU <Wd>, <Dn>, #<fbits>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTZU <Xd>, <Dn>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;

case ftype of
when '00' fltsize = 32;
when '01' fltsize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;

if sf == '0' && scale<5> == '0' then UNDEFINED;

integer fracbits = 64 - UInt(scale);
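
The scale check and the fracbits computation can be sketched in Python as follows. The function names are hypothetical; encode_scale is simply the inverse mapping implied by "64 minus scale".

```python
def decode_scale(sf: int, scale: int) -> int:
    """Return fracbits for a 6-bit scale field, or raise if UNDEFINED."""
    if sf == 0 and ((scale >> 5) & 1) == 0:
        # A 32-bit destination cannot hold more than 32 fractional bits.
        raise ValueError("UNDEFINED encoding")
    return 64 - scale


def encode_scale(fbits: int) -> int:
    """Return the scale field for a given number of fractional bits."""
    return 64 - fbits
```

With sf == 0 the requirement that scale<5> is 1 restricts scale to 32..63, which is why <fbits> is limited to the range 1 to 32 for the 32-bit variants.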

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<fbits> For the double-precision to 32-bit, half-precision to 32-bit and single-precision to 32-bit variant: is the
number of bits after the binary point in the fixed-point destination, in the range 1 to 32, encoded as 64
minus "scale".
For the double-precision to 64-bit, half-precision to 64-bit and single-precision to 64-bit variant: is the
number of bits after the binary point in the fixed-point destination, in the range 1 to 64, encoded as 64
minus "scale".

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, fracbits, TRUE, fpcr, FPRounding_ZERO);
X[d] = intval;

FCVTZU (scalar, integer)

Floating-point Convert to Unsigned integer, rounding toward Zero (scalar). This instruction converts the floating-point
value in the SIMD&FP source register to a 32-bit or 64-bit unsigned integer using the Round towards Zero rounding
mode, and writes the result to the general-purpose destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 1 1 0 0 1 0 0 0 0 0 0 Rn Rd
rmode opcode

Half-precision to 32-bit (sf == 0 && ftype == 11)

(FEAT_FP16)

FCVTZU <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11)

(FEAT_FP16)

FCVTZU <Xd>, <Hn>

Single-precision to 32-bit (sf == 0 && ftype == 00)

FCVTZU <Wd>, <Sn>

Single-precision to 64-bit (sf == 1 && ftype == 00)

FCVTZU <Xd>, <Sn>

Double-precision to 32-bit (sf == 0 && ftype == 01)

FCVTZU <Wd>, <Dn>

Double-precision to 64-bit (sf == 1 && ftype == 01)

FCVTZU <Xd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
when '00'
fltsize = 32;
when '01'
fltsize = 64;
when '10'
UNDEFINED;
when '11'
if HaveFP16Ext() then
fltsize = 16;
else
UNDEFINED;

rounding = FPDecodeRounding(rmode);

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(fltsize) fltval;
bits(intsize) intval;

fltval = V[n];
intval = FPToFixed(fltval, 0, TRUE, fpcr, rounding);
X[d] = intval;

FCVTZU (vector, fixed-point)

Floating-point Convert to Unsigned fixed-point, rounding toward Zero (vector). This instruction converts a scalar or
each element in a vector from floating-point to fixed-point unsigned integer using the Round towards Zero rounding
mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh

FCVTZU <V><d>, <V><n>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '000x' || (immh == '001x' && !HaveFP16Ext()) then UNDEFINED;

integer esize = if immh == '1xxx' then 64 else if immh == '01xx' then 32 else 16;
integer datasize = esize;
integer elements = 1;

integer fracbits = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
FPRounding rounding = FPRounding_ZERO;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 1 1 1 1 1 Rn Rd
U immh

FCVTZU <Vd>.<T>, <Vn>.<T>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

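if immh == '0000' then SEE(asimdimm);

if immh == '000x' || (immh == '001x' && !HaveFP16Ext()) then UNDEFINED;
if immh<3>:Q == '10' then UNDEFINED;
integer esize = if immh == '1xxx' then 64 else if immh == '01xx' then 32 else 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer fracbits = (esize * 2) - UInt(immh:immb);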

boolean unsigned = (U == '1');
FPRounding rounding = FPRounding_ZERO;

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
000x RESERVED
001x H
01xx S
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width, encoded in
“immh:immb”:

immh <fbits>
000x RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

For the vector variant: is the number of fractional bits, in the range 1 to the element width, encoded in
“immh:immb”:

immh <fbits>
0000 SEE Advanced SIMD modified immediate
0001 RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

V[d] = result;

FCVTZU (vector, integer)

Floating-point Convert to Unsigned integer, rounding toward Zero (vector). This instruction converts a scalar or each
element in a vector from a floating-point value to an unsigned integer value using the Round towards Zero rounding
mode, and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision and Vector single-precision and double-precision

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1

FCVTZU <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1

FCVTZU <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1

FCVTZU <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 1 1 1 0 Rn Rd
U o2 o1

FCVTZU <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

FPRounding rounding = FPDecodeRounding(o1:o2);

boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
element = Elem[operand, e, esize];
Elem[result, e, esize] = FPToFixed(element, 0, unsigned, fpcr, rounding);

V[d] = result;

FDIV (scalar)

Floating-point Divide (scalar). This instruction divides the floating-point value of the first source SIMD&FP register by
the floating-point value of the second source SIMD&FP register, and writes the result to the destination SIMD&FP
register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 0 1 1 0 Rn Rd

Half-precision (ftype == 11)

(FEAT_FP16)

FDIV <Hd>, <Hn>, <Hm>

Single-precision (ftype == 00)

FDIV <Sd>, <Sn>, <Sm>

Double-precision (ftype == 01)

FDIV <Dd>, <Dn>, <Dm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

Elem[result, 0, esize] = FPDiv(operand1, operand2, FPCR[]);

V[d] = result;
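
The untrapped results of the division can be approximated with a Python sketch. The function name fp_div is hypothetical. The sketch assumes the IEEE 754 default results (a signed infinity for a non-zero value divided by zero, NaN for 0/0 and for infinity divided by infinity) and does not model FPSR flags, trapped exceptions, or the merging behavior selected by IsMerging.

```python
import math


def fp_div(a: float, b: float) -> float:
    """Divide a by b, returning IEEE 754 default results instead of raising."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if b == 0.0:
        if a == 0.0:
            return math.nan                      # 0/0 is an invalid operation
        # Division by zero: infinity whose sign is the product of the signs.
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)
    return a / b                                 # inf/inf yields NaN here
```

Python raises ZeroDivisionError for a zero divisor, so the explicit branch is what makes the sketch follow the floating-point default result rather than the language default.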

FDIV (vector)

Floating-point Divide (vector). This instruction divides the floating-point values in the elements in the first source
SIMD&FP register, by the floating-point values in the corresponding elements in the second source SIMD&FP register,
places the results in a vector, and writes the vector to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 1 1 1 1 Rn Rd

FDIV <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 1 1 1 1 Rn Rd

FDIV <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
element1 = Elem[operand1, e, esize];
element2 = Elem[operand2, e, esize];
Elem[result, e, esize] = FPDiv(element1, element2, FPCR[]);

V[d] = result;

FJCVTZS

Floating-point Javascript Convert to Signed fixed-point, rounding toward Zero. This instruction converts the double-
precision floating-point value in the SIMD&FP source register to a 32-bit signed integer using the Round towards Zero
rounding mode, and writes the result to the general-purpose destination register. If the result is too large to be
represented as a signed 32-bit integer, then the result is the integer modulo 2^32, as held in a 32-bit signed integer.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Double-precision to 32-bit
(FEAT_JSCVT)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 Rn Rd
sf ftype rmode opcode

FJCVTZS <Wd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

if !HaveFJCVTZSExt() then UNDEFINED;

Assembler Symbols

<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

bits(64) fltval;
bits(32) intval;

bit Z;
fltval = V[n];
(intval, Z) = FPToFixedJS(fltval, fpcr, TRUE);
PSTATE.<N,Z,C,V> = '0':Z:'00';
X[d] = intval;
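
The modulo 2^32 behavior of the result value can be illustrated with a Python sketch. The function name fjcvtzs_value is hypothetical. The sketch assumes that NaN and infinite inputs produce zero, as in the ECMAScript ToInt32 operation, and it computes only the integer result; the Z flag and floating-point exceptions are not modelled.

```python
import math


def fjcvtzs_value(x: float) -> int:
    """Truncate toward zero, then wrap modulo 2**32 into a signed 32-bit value."""
    if math.isnan(x) or math.isinf(x):
        return 0                       # assumption: NaN and infinities give zero
    n = math.trunc(x)                  # Round towards Zero
    n &= 0xFFFFFFFF                    # integer modulo 2**32
    # Reinterpret the low 32 bits as a signed integer.
    return n - (1 << 32) if n >= (1 << 31) else n
```

Unlike FCVTZS, which saturates, an input of 2147483648.0 wraps to -2147483648 in this sketch.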

FMADD

Floating-point fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP source
registers, adds the product to the value of the third SIMD&FP source register, and writes the result to the SIMD&FP
destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 0 Rm 0 Ra Rn Rd
o1 o0

Half-precision (ftype == 11)

(FEAT_FP16)

FMADD <Hd>, <Hn>, <Hm>, <Ha>

Single-precision (ftype == 00)

FMADD <Sd>, <Sn>, <Sm>, <Sa>

Double-precision (ftype == 01)

FMADD <Dd>, <Dn>, <Dm>, <Da>

integer d = UInt(Rd);
integer a = UInt(Ra);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Dm> Is the 64-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Hm> Is the 16-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Ha> Is the 16-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Sn> Is the 32-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Sm> Is the 32-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Sa> Is the 32-bit name of the third SIMD&FP source register holding the addend, encoded in the "Ra" field.

Operation

CheckFPAdvSIMDEnabled64();

bits(esize) operanda = V[a];

bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[a] else Zeros();

Elem[result, 0, esize] = FPMulAdd(operanda, operand1, operand2, fpcr);

V[d] = result;
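
The effect of fusing the multiply and the add, that is, rounding once instead of twice, can be demonstrated with a Python sketch. The function name is hypothetical and the argument order follows FPMulAdd(operanda, operand1, operand2). The sketch uses exact rational arithmetic, handles only finite double-precision inputs, does not preserve the sign of a zero result, and does not model exception flags.

```python
from fractions import Fraction


def fused_multiply_add(addend: float, op1: float, op2: float) -> float:
    """Compute addend + op1 * op2 exactly, then round once to a float."""
    exact = Fraction(op1) * Fraction(op2) + Fraction(addend)
    return float(exact)                # single rounding to nearest


# Operands chosen so that the product is just below 1.0.
x = 1.0 + 2.0**-30
y = 1.0 - 2.0**-30

fused = fused_multiply_add(-1.0, x, y)   # keeps the low-order bits of x * y
unfused = (x * y) - 1.0                  # x * y is rounded to 1.0 first
```

Here the exact product is 1 - 2^-60, so the fused form returns -2^-60 while the separate multiply and add return 0.0.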

FMAX (scalar)

Floating-point Maximum (scalar). This instruction compares the two source SIMD&FP registers, and writes the larger
of the two floating-point values to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 1 0 0 1 0 Rn Rd
op

Half-precision (ftype == 11)

(FEAT_FP16)

FMAX <Hd>, <Hn>, <Hm>

Single-precision (ftype == 00)

FMAX <Sd>, <Sn>, <Sm>

Double-precision (ftype == 01)

FMAX <Dd>, <Dn>, <Dm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
when '00' esize = 32;
when '01' esize = 64;
when '10' UNDEFINED;
when '11'
if HaveFP16Ext() then
esize = 16;
else
UNDEFINED;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

Elem[result, 0, esize] = FPMax(operand1, operand2, fpcr);

V[d] = result;
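
A Python sketch can show where FPMax differs from an ordinary maximum. The function name fp_max is hypothetical. The sketch assumes default FPCR settings, under which a NaN operand produces a NaN result and +0.0 is treated as larger than -0.0; exception flags and NaN payloads are not modelled.

```python
import math


def fp_max(a: float, b: float) -> float:
    """Return the larger value, propagating NaN and ordering signed zeros."""
    if math.isnan(a) or math.isnan(b):
        return math.nan                # a NaN operand gives a NaN result
    if a == 0.0 and b == 0.0:
        # The result is -0.0 only when both zeros are negative.
        both_negative = math.copysign(1.0, a) < 0 and math.copysign(1.0, b) < 0
        return -0.0 if both_negative else 0.0
    return a if a > b else b
```

Python's built-in max() compares -0.0 and 0.0 as equal and gives order-dependent results for NaN, which is why both cases are handled explicitly.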

FMAX (vector)

Floating-point Maximum (vector). This instruction compares corresponding vector elements in the two source
SIMD&FP registers, places the larger of each of the two floating-point values into a vector, and writes the vector to
the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

Bits   31  30  29   28-24  23    22-21  20-16  15-10   9-5  4-0
Value  0   Q   U=0  01110  o1=0  10     Rm     001101  Rn   Rd

FMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Single-precision and double-precision

Bits   31  30  29   28-24  23    22  21  20-16  15-10   9-5  4-0
Value  0   Q   U=0  01110  o1=0  sz  1   Rm     111101  Rn   Rd

FMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
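
As an illustration of the sz:Q table above, the following Python sketch maps the two fields to an arrangement specifier for the single-precision and double-precision variant. The function name decode_arrangement is hypothetical, and the element-size rule (32 << sz) and register-size rule (128 bits when Q is 1, otherwise 64) are assumptions consistent with the decode pseudocode used elsewhere in this family of instructions.

```python
def decode_arrangement(sz: int, q: int):
    """Return (T, esize, elements) for sz:Q, or None for the RESERVED case."""
    if sz == 1 and q == 0:
        return None  # sz:Q == '10' is RESERVED
    esize = 32 << sz                 # 32-bit or 64-bit elements
    datasize = 128 if q else 64      # full or half SIMD&FP register
    elements = datasize // esize
    suffix = "S" if sz == 0 else "D"
    return (f"{elements}{suffix}", esize, elements)

for sz in (0, 1):
    for q in (0, 1):
        print(sz, q, decode_arrangement(sz, q))
```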

Operation

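CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1; // operand1 supplies the low-numbered elements
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];

    if minimum then
        Elem[result, e, esize] = FPMin(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = FPMax(element1, element2, FPCR[]);

V[d] = result;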
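
The element loop above is shared by the element-wise and pairwise forms, selected by the pair and minimum flags. A minimal Python sketch on lists of floats follows. All names are hypothetical, the first source is assumed to occupy the low-numbered elements of the concatenated vector, and the helpers model only NaN propagation (signed zeros and exception flags are not modelled).

```python
import math

def fp_max(a, b):
    # Simplified: a NaN operand gives NaN; zero signs are not modelled.
    return math.nan if math.isnan(a) or math.isnan(b) else max(a, b)

def fp_min(a, b):
    return math.nan if math.isnan(a) or math.isnan(b) else min(a, b)

def max_min_vector(operand1, operand2, pair=False, minimum=False):
    """Model of the shared loop: element-wise or pairwise max/min."""
    concat = operand1 + operand2  # assumption: operand1 is the low half
    op = fp_min if minimum else fp_max
    result = []
    for e in range(len(operand1)):
        if pair:
            e1, e2 = concat[2 * e], concat[2 * e + 1]
        else:
            e1, e2 = operand1[e], operand2[e]
        result.append(op(e1, e2))
    return result

a = [1.0, 5.0, 3.0, 2.0]
b = [4.0, 0.0, 9.0, 9.0]
print(max_min_vector(a, b))             # element-wise maximum
print(max_min_vector(a, b, pair=True))  # pairwise maximum
```

In the pairwise case the low half of the result comes from adjacent pairs of the first source and the high half from adjacent pairs of the second source.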

FMAXNM (scalar)

Floating-point Maximum Number (scalar). This instruction compares the first and second source SIMD&FP register
values, and writes the larger of the two floating-point values to the destination SIMD&FP register.
NaNs are handled according to the IEEE 754-2008 standard. If one operand is numeric and the other is a quiet NaN, the
result is the numerical value; otherwise the result is identical to FMAX (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29  28-24  23-22  21  20-16  15-14  13-12  11-10  9-5  4-0
Value  0   0   0   11110  ftype  1   Rm     01     op=10  10     Rn   Rd

Half-precision (ftype == 11)

(FEAT_FP16)

FMAXNM <Hd>, <Hn>, <Hm>

Single-precision (ftype == 00)

FMAXNM <Sd>, <Sn>, <Sm>

Double-precision (ftype == 01)

FMAXNM <Dd>, <Dn>, <Dm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

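Assembler Symbols

<Hd>, <Sd>, <Dd> Is the 16-bit, 32-bit, or 64-bit name (matching the variant) of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn>, <Sn>, <Dn> Is the 16-bit, 32-bit, or 64-bit name (matching the variant) of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm>, <Sm>, <Dm> Is the 16-bit, 32-bit, or 64-bit name (matching the variant) of the second SIMD&FP source register, encoded in the "Rm" field.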

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

Elem[result, 0, esize] = FPMaxNum(operand1, operand2, fpcr);

V[d] = result;
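
The difference between the "Number" form and FMAX lies only in how a single quiet NaN is treated. The Python sketch below illustrates that rule under stated assumptions: fp_max_num is a hypothetical name, every NaN is taken to be a quiet NaN (Python floats do not expose signaling NaNs), and signed zeros, exception flags, and FPCR settings are not modelled.

```python
import math

def fp_max_num(a: float, b: float) -> float:
    """Hypothetical model of the FMAXNM rule for quiet NaNs."""
    # A quiet NaN paired with a number is treated as -infinity,
    # so the numeric operand always wins the comparison.
    if math.isnan(a) and not math.isnan(b):
        a = -math.inf
    elif math.isnan(b) and not math.isnan(a):
        b = -math.inf
    if math.isnan(a) or math.isnan(b):
        return math.nan  # both operands were NaN
    return max(a, b)

print(fp_max_num(math.nan, 3.0))
print(fp_max_num(2.0, 5.0))
```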

FMAXNM (vector)

Floating-point Maximum Number (vector). This instruction compares corresponding vector elements in the two source
SIMD&FP registers, writes the larger of the two floating-point values into a vector, and writes the vector to the
destination SIMD&FP register.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result placed in the vector is the numerical value; otherwise the result for that element is identical to that of FMAX (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

Bits   31  30  29   28-24  23   22-21  20-16  15-10   9-5  4-0
Value  0   Q   U=0  01110  a=0  10     Rm     000001  Rn   Rd

FMAXNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean pair = (U == '1');

boolean minimum = (a == '1');

Single-precision and double-precision

Bits   31  30  29   28-24  23    22  21  20-16  15-10   9-5  4-0
Value  0   Q   U=0  01110  o1=0  sz  1   Rm     110001  Rn   Rd

FMAXNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

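CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1; // operand1 supplies the low-numbered elements
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];

    if minimum then
        Elem[result, e, esize] = FPMinNum(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR[]);

V[d] = result;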

FMAXNMP (scalar)

Floating-point Maximum Number of Pair of elements (scalar). This instruction compares two vector elements in the
source SIMD&FP register and writes the larger of the two floating-point values as a scalar to the destination SIMD&FP
register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

Bits   31  30  29  28-24  23    22  21-17  16-12  11-10  9-5  4-0
Value  0   1   0   11110  o1=0  sz  11000  01100  10     Rn   Rd

FMAXNMP <V><d>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

if sz == '1' then UNDEFINED;
integer datasize = 32;

Single-precision and double-precision

Bits   31  30  29  28-24  23    22  21-17  16-12  11-10  9-5  4-0
Value  0   1   1   11110  o1=0  sz  11000  01100  10     Rn   Rd

FMAXNMP <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize * 2;

Assembler Symbols

<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 H
1 RESERVED

For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:

sz <T>
0 2H
1 RESERVED

For the single-precision and double-precision variant: is the source arrangement specifier, encoded in
“sz”:

sz <T>
0 2S
1 2D

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMAXNUM, operand, esize, FALSE);

FMAXNMP (vector)

Floating-point Maximum Number Pairwise (vector). This instruction creates a vector by concatenating the vector
elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register,
reads each pair of adjacent vector elements from the concatenated vector, writes the larger of each pair of
values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are
floating-point values.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result is the numerical value; otherwise the result is identical to FMAX (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

Bits   31  30  29   28-24  23   22-21  20-16  15-10   9-5  4-0
Value  0   Q   U=1  01110  a=0  10     Rm     000001  Rn   Rd

FMAXNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean pair = (U == '1');

boolean minimum = (a == '1');

Single-precision and double-precision

Bits   31  30  29   28-24  23    22  21  20-16  15-10   9-5  4-0
Value  0   Q   U=1  01110  o1=0  sz  1   Rm     110001  Rn   Rd

FMAXNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

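CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1; // operand1 supplies the low-numbered elements
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];

    if minimum then
        Elem[result, e, esize] = FPMinNum(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR[]);

V[d] = result;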

FMAXNMV

Floating-point Maximum Number across Vector. This instruction compares all the vector elements in the source
SIMD&FP register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values
in this instruction are floating-point values.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result of the comparison is the numerical value; otherwise the result is identical to FMAX (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

Bits   31  30  29  28-24  23    22  21-17  16-12  11-10  9-5  4-0
Value  0   Q   0   01110  o1=0  0   11000  01100  10     Rn   Rd

FMAXNMV <V><d>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;

Single-precision and double-precision

Bits   31  30  29  28-24  23    22  21-17  16-12  11-10  9-5  4-0
Value  0   Q   1   01110  o1=0  sz  11000  01100  10     Rn   Rd

FMAXNMV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q != '01' then UNDEFINED; // .4S only

integer esize = 32 << UInt(sz);

integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<V> For the half-precision variant: is the destination width specifier, H.

For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 S
1 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:

Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMAXNUM, operand, esize, FALSE);
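
The across-vector reduction can be pictured as repeatedly combining halves of the source vector. The Python sketch below is an assumption about how Reduce orders its comparisons (recursive halving), not a transcription of the Arm pseudocode. The names reduce_tree and fp_max_num are hypothetical, and all NaNs are treated as quiet NaNs.

```python
import math

def fp_max_num(a, b):
    # Simplified maximum-number rule: a single NaN loses to a number.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)

def reduce_tree(op, elems):
    """Combine elements by recursively reducing each half of the vector."""
    if len(elems) == 1:
        return elems[0]
    half = len(elems) // 2
    lo = reduce_tree(op, elems[:half])   # low-numbered elements
    hi = reduce_tree(op, elems[half:])   # high-numbered elements
    return op(lo, hi)

print(reduce_tree(fp_max_num, [1.0, math.nan, 7.0, 3.0]))
```

For a maximum the pairing order does not change the numeric result when all inputs are numbers, but it can matter for which NaN or exception behaviour is observed.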

FMAXP (scalar)

Floating-point Maximum of Pair of elements (scalar). This instruction compares two vector elements in the source
SIMD&FP register and writes the larger of the two floating-point values as a scalar to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

Bits   31  30  29  28-24  23    22  21-17  16-12  11-10  9-5  4-0
Value  0   1   0   11110  o1=0  sz  11000  01111  10     Rn   Rd

FMAXP <V><d>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

if sz == '1' then UNDEFINED;
integer datasize = 32;

Single-precision and double-precision

Bits   31  30  29  28-24  23    22  21-17  16-12  11-10  9-5  4-0
Value  0   1   1   11110  o1=0  sz  11000  01111  10     Rn   Rd

FMAXP <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize * 2;

Assembler Symbols

<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 H
1 RESERVED

For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:

sz <T>
0 2H
1 RESERVED

For the single-precision and double-precision variant: is the source arrangement specifier, encoded in
“sz”:

sz <T>
0 2S
1 2D

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

V[d] = Reduce(ReduceOp_FMAX, operand, esize);

FMAXP (vector)

Floating-point Maximum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of
the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair
of adjacent vector elements from the concatenated vector, writes the larger of each pair of values into a vector, and
writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

Bits   31  30  29   28-24  23    22-21  20-16  15-10   9-5  4-0
Value  0   Q   U=1  01110  o1=0  10     Rm     001101  Rn   Rd

FMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Single-precision and double-precision

Bits   31  30  29   28-24  23    22  21  20-16  15-10   9-5  4-0
Value  0   Q   U=1  01110  o1=0  sz  1   Rm     111101  Rn   Rd

FMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

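CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1; // operand1 supplies the low-numbered elements
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];

    if minimum then
        Elem[result, e, esize] = FPMin(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = FPMax(element1, element2, FPCR[]);

V[d] = result;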

FMAXV

Floating-point Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP
register, and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this
instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

Bits   31  30  29  28-24  23    22  21-17  16-12  11-10  9-5  4-0
Value  0   Q   0   01110  o1=0  0   11000  01111  10     Rn   Rd

FMAXV <V><d>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;

Single-precision and double-precision

Bits   31  30  29  28-24  23    22  21-17  16-12  11-10  9-5  4-0
Value  0   Q   1   01110  o1=0  sz  11000  01111  10     Rn   Rd

FMAXV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q != '01' then UNDEFINED;

integer esize = 32 << UInt(sz);

integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<V> For the half-precision variant: is the destination width specifier, H.

For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 S
1 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:

Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED
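
The two arrangement tables above can be summarised in a small Python lookup. This is an illustrative sketch: the function name and the half_precision parameter are hypothetical, and the function returns None where the tables say RESERVED.

```python
def across_vector_arrangement(half_precision: bool, q: int, sz: int = 0):
    """Return the <T> arrangement for FMAXV, or None if the encoding is reserved."""
    if half_precision:
        return "8H" if q == 1 else "4H"   # encoded in Q only
    if q == 1 and sz == 0:
        return "4S"                       # the only valid Q:sz combination
    return None                           # Q == 0, or sz == 1: RESERVED

print(across_vector_arrangement(True, 0))
print(across_vector_arrangement(False, 1, 0))
print(across_vector_arrangement(False, 0, 0))
```

A reduction over two single-precision elements is not offered here, which matches the decode check that rejects every sz:Q value other than '01'.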

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

V[d] = Reduce(ReduceOp_FMAX, operand, esize);

FMIN (scalar)

Floating-point Minimum (scalar). This instruction compares the first and second source SIMD&FP register values, and
writes the smaller of the two floating-point values to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29  28-24  23-22  21  20-16  15-14  13-12  11-10  9-5  4-0
Value  0   0   0   11110  ftype  1   Rm     01     op=01  10     Rn   Rd

Half-precision (ftype == 11)

(FEAT_FP16)

FMIN <Hd>, <Hn>, <Hm>

Single-precision (ftype == 00)

FMIN <Sd>, <Sn>, <Sm>

Double-precision (ftype == 01)

FMIN <Dd>, <Dn>, <Dm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

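Assembler Symbols

<Hd>, <Sd>, <Dd> Is the 16-bit, 32-bit, or 64-bit name (matching the variant) of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn>, <Sn>, <Dn> Is the 16-bit, 32-bit, or 64-bit name (matching the variant) of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm>, <Sm>, <Dm> Is the 16-bit, 32-bit, or 64-bit name (matching the variant) of the second SIMD&FP source register, encoded in the "Rm" field.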

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

Elem[result, 0, esize] = FPMin(operand1, operand2, fpcr);

V[d] = result;

FMIN (vector)

Floating-point Minimum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the smaller of each of the two floating-point values into a vector, and writes the vector to
the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

Bits   31  30  29   28-24  23    22-21  20-16  15-10   9-5  4-0
Value  0   Q   U=0  01110  o1=1  10     Rm     001101  Rn   Rd

FMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Single-precision and double-precision

Bits   31  30  29   28-24  23    22  21  20-16  15-10   9-5  4-0
Value  0   Q   U=0  01110  o1=1  sz  1   Rm     111101  Rn   Rd

FMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

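CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1; // operand1 supplies the low-numbered elements
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];

    if minimum then
        Elem[result, e, esize] = FPMin(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = FPMax(element1, element2, FPCR[]);

V[d] = result;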

FMINNM (scalar)

Floating-point Minimum Number (scalar). This instruction compares the first and second source SIMD&FP register
values, and writes the smaller of the two floating-point values to the destination SIMD&FP register.
NaNs are handled according to the IEEE 754-2008 standard. If one operand is numeric and the other is a quiet NaN, the
result is the numerical value; otherwise the result is identical to FMIN (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29  28-24  23-22  21  20-16  15-14  13-12  11-10  9-5  4-0
Value  0   0   0   11110  ftype  1   Rm     01     op=11  10     Rn   Rd

Half-precision (ftype == 11)

(FEAT_FP16)

FMINNM <Hd>, <Hn>, <Hm>

Single-precision (ftype == 00)

FMINNM <Sd>, <Sn>, <Sm>

Double-precision (ftype == 01)

FMINNM <Dd>, <Dn>, <Dm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

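Assembler Symbols

<Hd>, <Sd>, <Dd> Is the 16-bit, 32-bit, or 64-bit name (matching the variant) of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn>, <Sn>, <Dn> Is the 16-bit, 32-bit, or 64-bit name (matching the variant) of the first SIMD&FP source register, encoded in the "Rn" field.
<Hm>, <Sm>, <Dm> Is the 16-bit, 32-bit, or 64-bit name (matching the variant) of the second SIMD&FP source register, encoded in the "Rm" field.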

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

Elem[result, 0, esize] = FPMinNum(operand1, operand2, fpcr);

V[d] = result;
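
The scalar FMAX, FMIN, FMAXNM, and FMINNM encodings share one layout and differ only in the op field (bits 13:12). As an illustration, the Python sketch below assembles the 32-bit instruction word from the fields shown in the encoding diagrams. The function and table names are hypothetical, and the sketch does not check whether FEAT_FP16 is implemented for the half-precision form.

```python
# op values read from bits 13:12 of the four scalar encodings
OP_BITS = {"FMAX": 0b00, "FMIN": 0b01, "FMAXNM": 0b10, "FMINNM": 0b11}
# ftype values: single, double, half ('10' is UNDEFINED)
FTYPE_BITS = {"S": 0b00, "D": 0b01, "H": 0b11}

def encode_scalar_minmax(mnemonic: str, width: str, rd: int, rn: int, rm: int) -> int:
    """Build the instruction word for a scalar min/max instruction."""
    for reg in (rd, rn, rm):
        if not 0 <= reg <= 31:
            raise ValueError("register number must be in 0..31")
    return ((0b00011110 << 24)            # bits 31:24, fixed
            | (FTYPE_BITS[width] << 22)   # bits 23:22, ftype
            | (1 << 21)                   # bit 21, fixed
            | (rm << 16)                  # bits 20:16, Rm
            | (0b01 << 14)                # bits 15:14, fixed
            | (OP_BITS[mnemonic] << 12)   # bits 13:12, op
            | (0b10 << 10)                # bits 11:10, fixed
            | (rn << 5)                   # bits 9:5, Rn
            | rd)                         # bits 4:0, Rd

print(hex(encode_scalar_minmax("FMAX", "S", 0, 1, 2)))
```

Comparing the printed word against an assembler's output for FMAX S0, S1, S2 is a quick way to validate the field positions.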

FMINNM (vector)

Floating-point Minimum Number (vector). This instruction compares corresponding vector elements in the two source
SIMD&FP registers, writes the smaller of the two floating-point values into a vector, and writes the vector to the
destination SIMD&FP register.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result placed in the vector is the numerical value; otherwise the result for that element is identical to that of FMIN (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision, and Single-precision and double-precision.

Half-precision
(FEAT_FP16)

Bits   31  30  29   28-24  23   22-21  20-16  15-10   9-5  4-0
Value  0   Q   U=0  01110  a=1  10     Rm     000001  Rn   Rd

FMINNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean pair = (U == '1');

boolean minimum = (a == '1');

Single-precision and double-precision

Bits   31  30  29   28-24  23    22  21  20-16  15-10   9-5  4-0
Value  0   Q   U=0  01110  o1=1  sz  1   Rm     110001  Rn   Rd

FMINNM <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

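CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1; // operand1 supplies the low-numbered elements
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];

    if minimum then
        Elem[result, e, esize] = FPMinNum(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR[]);

V[d] = result;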

FMINNMP (scalar)

Floating-point Minimum Number of Pair of elements (scalar). This instruction compares two vector elements in the
source SIMD&FP register and writes the smaller of the two floating-point values as a scalar to the destination SIMD&FP
register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1

FMINNMP <V><d>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

if sz == '1' then UNDEFINED;
integer datasize = 32;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1

FMINNMP <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize * 2;

Assembler Symbols

<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 H
1 RESERVED

For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:

sz <T>
0 2H
1 RESERVED

For the single-precision and double-precision variant: is the source arrangement specifier, encoded in
“sz”:

sz <T>
0 2S
1 2D

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMINNUM, operand, esize, FALSE);
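As a rough illustration of the decode rules for the two encoding classes above, the following Python sketch maps the class and the sz field to the operand sizes and assembler specifiers. The function name and the returned dictionary are illustrative assumptions and are not part of the architecture pseudocode.

```python
def decode_fminnmp_scalar(half_precision, sz):
    """Return the sizes and specifiers implied by the encoding class and sz."""
    if half_precision:
        if sz == 1:
            # Mirrors "if sz == '1' then UNDEFINED;" in the half-precision class
            raise ValueError("UNDEFINED: sz == '1' in the half-precision class")
        esize, datasize = 16, 32
    else:
        esize = 32 << sz          # 32 for sz == 0, 64 for sz == 1
        datasize = esize * 2      # the source always holds exactly two elements
    width = {16: "H", 32: "S", 64: "D"}[esize]
    return {"V": width, "T": "2" + width, "esize": esize, "datasize": datasize}


print(decode_fminnmp_scalar(False, 1))
```

Under these assumptions, the double-precision form reads a 2D source (128 bits) and writes a D register, while the half-precision form reads a 2H source (32 bits) and writes an H register.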

FMINNMP (vector)

Floating-point Minimum Number Pairwise (vector). This instruction creates a vector by concatenating the vector
elements of the first source SIMD&FP register after the vector elements of the second source SIMD&FP register,
reads each pair of adjacent vector elements from the concatenated vector, writes the smaller of each pair of
floating-point values into a vector, and writes the vector to the destination SIMD&FP register. All the values in this
instruction are floating-point values.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result is the numerical value; otherwise the result is identical to FMIN (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 0 0 0 1 Rn Rd
U a

FMINNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean pair = (U == '1');

boolean minimum = (a == '1');

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 0 0 0 1 Rn Rd
U o1

FMINNMP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];

    if minimum then
        Elem[result, e, esize] = FPMinNum(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR[]);

V[d] = result;
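The pairwise loop above can be sketched in Python as follows. This is a simplified model, not the architecture's definition: lanes are plain Python floats, signaling NaNs and FPCR controls are ignored, the first source is assumed to occupy the low-numbered elements of the concatenation, and -0.0 is assumed to order below +0.0. The helper names `fp_min_num` and `fminnmp` are hypothetical.

```python
import math


def fp_min_num(a, b):
    """Minimum that returns the numeric operand when the other is a quiet NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == b == 0.0:
        # Assumption: -0.0 is treated as smaller than +0.0
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def fminnmp(vn, vm):
    """Pairwise minimum-number over the concatenation of two equal-length lane lists."""
    concat = vn + vm  # assumed layout: Vn in the low elements, Vm in the high elements
    return [fp_min_num(concat[2 * e], concat[2 * e + 1]) for e in range(len(vn))]


result = fminnmp([1.0, math.nan, 3.0, 2.0], [5.0, 4.0, math.nan, math.nan])
print(result)  # [1.0, 2.0, 4.0, nan]
```

The low half of the result comes from adjacent pairs in the first source and the high half from adjacent pairs in the second source. A lane is NaN only when both inputs of its pair are NaN.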

FMINNMV

Floating-point Minimum Number across Vector. This instruction compares all the vector elements in the source
SIMD&FP register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the
values in this instruction are floating-point values.
NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet
NaN, the result of the comparison is the numerical value; otherwise the result is identical to FMIN (scalar).
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1

FMINNMV <V><d>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 0 0 1 0 Rn Rd
o1

FMINNMV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q != '01' then UNDEFINED; // .4S only

integer esize = 32 << UInt(sz);

integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<V> For the half-precision variant: is the destination width specifier, H.

For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 S
1 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:

Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
V[d] = Reduce(ReduceOp_FMINNUM, operand, esize, FALSE);
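The across-vector reduction can be pictured with the short Python sketch below. It assumes that the reduction repeatedly splits the vector into a low half and a high half and combines the two partial results; this section does not spell out the order, so treat that as an assumption. The helper names are hypothetical, and signaling NaNs, FPCR controls and signed zeros are not modeled.

```python
import math


def fp_min_num(a, b):
    """Minimum that returns the numeric operand when the other is a quiet NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


def reduce_min_num(elements):
    """Recursively halve the element list and combine the two partial results."""
    if len(elements) == 1:
        return elements[0]
    half = len(elements) // 2
    lo = reduce_min_num(elements[:half])
    hi = reduce_min_num(elements[half:])
    return fp_min_num(lo, hi)


lanes_4s = [3.0, math.nan, -1.5, 7.0]
print(reduce_min_num(lanes_4s))  # -1.5
```

Because a quiet NaN loses to any number at every combining step, the result is NaN only when every lane is NaN.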

FMINP (scalar)

Floating-point Minimum of Pair of elements (scalar). This instruction compares the two vector elements in the source
SIMD&FP register and writes the smaller of the two floating-point values as a scalar to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1

FMINP <V><d>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

if sz == '1' then UNDEFINED;
integer datasize = 32;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1

FMINP <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize * 2;

Assembler Symbols

<V> For the half-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 H
1 RESERVED

For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> For the half-precision variant: is the source arrangement specifier, encoded in “sz”:

sz <T>
0 2H
1 RESERVED

For the single-precision and double-precision variant: is the source arrangement specifier, encoded in
“sz”:

sz <T>
0 2S
1 2D

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

V[d] = Reduce(ReduceOp_FMIN, operand, esize);
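The only difference between this operation and FMINNMP (scalar) is the reduction operator (ReduceOp_FMIN instead of ReduceOp_FMINNUM). The Python sketch below contrasts the two on a pair that contains a quiet NaN. It assumes that the plain minimum returns a NaN whenever either input is a NaN, which is not stated in this section; the helper names are hypothetical, and default-NaN selection, signaling NaNs and FPCR controls are not modeled.

```python
import math


def fp_min(a, b):
    """Assumed plain minimum: any NaN input yields a NaN result."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return min(a, b)


def fp_min_num(a, b):
    """Minimum-number: a quiet NaN loses to a numeric operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


pair = (math.nan, 2.5)
print(fp_min(*pair))      # nan  (FMINP-style result)
print(fp_min_num(*pair))  # 2.5  (FMINNMP-style result)
```

For pairs without NaNs the two functions return the same value.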

FMINP (vector)

Floating-point Minimum Pairwise (vector). This instruction creates a vector by concatenating the vector elements of
the first source SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair
of adjacent vector elements from the concatenated vector, writes the smaller of each pair of values into a vector, and
writes the vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 0 Rm 0 0 1 1 0 1 Rn Rd
U o1

FMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 Rm 1 1 1 1 0 1 Rn Rd
U o1

FMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean pair = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
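The arrangement tables above can be expressed as a small lookup. The following Python sketch is illustrative only; the function names are assumptions, and a reserved combination is reported as a Python exception rather than as an UNDEFINED instruction.

```python
def arrangement(sz, q, half_precision=False):
    """Return the <T> arrangement specifier for the given field values."""
    if half_precision:
        return "8H" if q else "4H"
    table = {(0, 0): "2S", (0, 1): "4S", (1, 1): "2D"}
    if (sz, q) not in table:
        raise ValueError("RESERVED: sz:Q == '10'")
    return table[(sz, q)]


def lanes_and_width(spec):
    """Split a specifier such as '4S' into (lane count, element bits, total bits)."""
    lanes = int(spec[:-1])
    width = {"H": 16, "S": 32, "D": 64}[spec[-1]]
    return lanes, width, lanes * width


print(arrangement(0, 1), lanes_and_width(arrangement(0, 1)))  # 4S (4, 32, 128)
```

The total width always comes out as 64 bits for Q == 0 and 128 bits for Q == 1, which is why a 64-bit vector of double-precision elements (sz:Q == '10') has no valid arrangement.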

Operation

for e = 0 to elements-1
    if pair then
        element1 = Elem[concat, 2*e, esize];
        element2 = Elem[concat, (2*e)+1, esize];
    else
        element1 = Elem[operand1, e, esize];
        element2 = Elem[operand2, e, esize];

    if minimum then
        Elem[result, e, esize] = FPMin(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = FPMax(element1, element2, FPCR[]);

V[d] = result;

FMINV

Floating-point Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP
register, and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this
instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1

FMINV <V><d>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 1 0 0 0 0 1 1 1 1 1 0 Rn Rd
o1

FMINV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q != '01' then UNDEFINED;

integer esize = 32 << UInt(sz);

integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<V> For the half-precision variant: is the destination width specifier, H.

For the single-precision and double-precision variant: is the destination width specifier, encoded in “sz”:

sz <V>
0 S
1 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:

Q sz <T>
0 x RESERVED
1 0 4S
1 1 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

V[d] = Reduce(ReduceOp_FMIN, operand, esize);

FMLA (by element)

Floating-point fused Multiply-Add to accumulator (by element). This instruction multiplies the vector elements in the
first source SIMD&FP register by the specified value in the second source SIMD&FP register, and accumulates the
results in the vector elements of the destination SIMD&FP register. All the values in this instruction are floating-point
values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar, half-precision; Scalar, single-precision and double-precision; Vector,
half-precision; and Vector, single-precision and double-precision

Scalar, half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 0 L M Rm 0 0 0 1 H 0 Rn Rd
o2

FMLA <Hd>, <Hn>, <Vm>.H[<index>]

if !HaveFP16Ext() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);
integer index = UInt(H:L:M);

integer esize = 16;

integer datasize = esize;
integer elements = 1;
boolean sub_op = (o2 == '1');

Scalar, single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 1 sz L M Rm 0 0 0 1 H 0 Rn Rd
o2

FMLA <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi = M;
case sz:L of
    when '0x' index = UInt(H:L);
    when '10' index = UInt(H);
    when '11' UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;
boolean sub_op = (o2 == '1');

Vector, half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 0 0 0 1 H 0 Rn Rd
o2

FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]

if !HaveFP16Ext() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);
integer index = UInt(H:L:M);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean sub_op = (o2 == '1');

Vector, single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 sz L M Rm 0 0 0 1 H 0 Rn Rd
o2

FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi = M;
case sz:L of
    when '0x' index = UInt(H:L);
    when '10' index = UInt(H);
    when '11' UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean sub_op = (o2 == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:

Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> For the half-precision variant: is the name of the second SIMD&FP source register, in the range V0 to
V15, encoded in the "Rm" field.
For the single-precision and double-precision variant: is the name of the second SIMD&FP source
register, encoded in the "M:Rm" fields.

<Ts> Is an element size specifier, encoded in “sz”:

sz <Ts>
0 S
1 D

<index> For the half-precision variant: is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.

For the single-precision and double-precision variant: is the element index, encoded in “sz:L:H”:

sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
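The index and register-number rules above can be summarized with the following Python sketch. The function names are hypothetical, the arguments are the raw field values as integers, and the reserved case is reported as a Python exception.

```python
def decode_index_half(H, L, M, Rm):
    """Half-precision forms: 3-bit index H:L:M, register number from Rm only (V0 to V15)."""
    index = (H << 2) | (L << 1) | M
    return index, Rm


def decode_index_sd(sz, H, L, M, Rm):
    """Single-precision and double-precision forms: index from sz:L, register from M:Rm."""
    if sz == 0:
        index = (H << 1) | L      # single precision: 2-bit index H:L
    elif L == 0:
        index = H                 # double precision: 1-bit index H
    else:
        raise ValueError("UNDEFINED: sz:L == '11'")
    m = (M << 4) | Rm             # M supplies the top bit of the register number
    return index, m


print(decode_index_half(1, 0, 1, 7))    # (5, 7)
print(decode_index_sd(0, 1, 1, 1, 3))   # (3, 19)
print(decode_index_sd(1, 1, 0, 0, 5))   # (1, 5)
```

In the half-precision forms the M bit is consumed by the index, which is why the second source register is limited to V0 to V15 there.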

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(esize) element1;
bits(esize) element2 = Elem[operand2, index, esize];
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, fpcr);

V[d] = result;
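A minimal Python sketch of the by-element loop above, assuming double-precision lanes held as Python floats. The names `fused_mul_add` and `fmla_by_element` are hypothetical. The fused step is emulated with exact rational arithmetic followed by one rounding, which is valid only for finite inputs; the merging behavior for the scalar forms, NaNs, infinities, the sign of an exact zero result and FPCR controls are not modeled.

```python
from fractions import Fraction


def fused_mul_add(addend, a, b):
    """Compute addend + a*b exactly, then round once to double (finite inputs only)."""
    return float(Fraction(addend) + Fraction(a) * Fraction(b))


def fmla_by_element(vd, vn, vm, index, sub_op=False):
    """Multiply every lane of vn by the single lane vm[index] and accumulate into vd."""
    scalar = vm[index]  # element2 is read once, outside the loop
    return [
        fused_mul_add(acc, -x if sub_op else x, scalar)
        for acc, x in zip(vd, vn)
    ]


print(fmla_by_element([1.0, 2.0], [3.0, 4.0], [10.0, 0.5], index=1))               # [2.5, 4.0]
print(fmla_by_element([1.0, 2.0], [3.0, 4.0], [10.0, 0.5], index=1, sub_op=True))  # [-0.5, 0.0]
```

Setting `sub_op` negates the first multiplicand before the fused step, so the same loop can serve both the accumulate and the subtract form.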

FMLA (vector)

Floating-point fused Multiply-Add to accumulator (vector). This instruction multiplies corresponding floating-point
values in the vectors in the two source SIMD&FP registers, adds the product to the corresponding vector element of
the destination SIMD&FP register, and writes the result to the destination SIMD&FP register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 0 1 1 Rn Rd
a

FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean sub_op = (a == '1');

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 0 1 1 Rn Rd
op

FMLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean sub_op = (op == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, FPCR[]);

V[d] = result;
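To show why a fused multiply-add differs from a separate multiply followed by an add, the Python sketch below emulates the element-wise loop for double-precision lanes. The helper names are hypothetical; the fused step is modeled with exact rational arithmetic and one final rounding, and only finite inputs are handled.

```python
from fractions import Fraction


def fused_mul_add(addend, a, b):
    """Compute addend + a*b exactly, then round once to double (finite inputs only)."""
    return float(Fraction(addend) + Fraction(a) * Fraction(b))


def fmla_vector(vd, vn, vm, sub_op=False):
    """Element-wise fused multiply-add of vn and vm into the accumulator lanes vd."""
    return [
        fused_mul_add(acc, -x if sub_op else x, y)
        for acc, x, y in zip(vd, vn, vm)
    ]


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30          # a * b is exactly 1 - 2**-60
fused = fmla_vector([-1.0], [a], [b])[0]
unfused = -1.0 + a * b        # the product is rounded to 1.0 before the add

print(fused == -(2.0 ** -60))  # True
print(unfused)                 # 0.0
```

The separate multiply rounds the product to 1.0 and the small term is lost, whereas the fused form keeps it because rounding happens only once, after the addition.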

FMLAL, FMLAL2 (by element)

Floating-point fused Multiply-Add Long to accumulator (by element). This instruction multiplies the vector elements in
the first source SIMD&FP register by the specified value in the second source SIMD&FP register, and accumulates the
product to the corresponding vector element of the destination SIMD&FP register. The instruction does not round the
result of the multiply before the accumulation.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.

Note

ID_AA64ISAR0_EL1.FHM indicates whether this instruction is supported.

It has encodings from 2 classes: FMLAL and FMLAL2

FMLAL
(FEAT_FHM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 0 L M Rm 0 0 0 0 H 0 Rn Rd
sz S

FMLAL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]

if !HaveFP16MulNoRoundingToFP32Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt('0':Rm); // Vm can only be in bottom 16 registers.
if sz == '1' then UNDEFINED;
integer index = UInt(H:L:M);

integer esize = 32;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean sub_op = (S == '1');

integer part = 0;

FMLAL2
(FEAT_FHM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 1 0 L M Rm 1 0 0 0 H 0 Rn Rd
sz S

FMLAL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]

if !HaveFP16MulNoRoundingToFP32Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt('0':Rm); // Vm can only be in bottom 16 registers.
if sz == '1' then UNDEFINED;
integer index = UInt(H:L:M);

integer esize = 32;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean sub_op = (S == '1');

integer part = 1;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 2H
1 4H

<Vm> Is the name of the second SIMD&FP source register, in the range V0 to V15, encoded in the "Rm" field.
<index> Is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize DIV 2) operand1 = Vpart[n, part];
bits(128) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize DIV 2) element1;
bits(esize DIV 2) element2 = Elem[operand2, index, esize DIV 2];

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize DIV 2];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAddH(Elem[operand3, e, esize], element1, element2, FPCR[]);
V[d] = result;
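The only decode difference between FMLAL and FMLAL2 is the value of part, which selects the half of the first source that supplies the half-precision lanes. The Python sketch below illustrates that lane selection. It is a simplified model with a hypothetical function name: lanes are plain Python floats, so half-precision and single-precision rounding are not modeled, and part is assumed to select consecutive groups of half-precision lanes of the first source.

```python
def fmlal_by_element(vd, vn, vm, index, part, sub_op=False):
    """Widening multiply-accumulate by element.

    vd: accumulator lanes (2 for Q == 0, 4 for Q == 1)
    vn: half-precision lanes of the first source register
    vm: the 8 half-precision lanes of the second source register
    part: 0 for FMLAL, 1 for FMLAL2
    """
    elements = len(vd)
    src = vn[part * elements:(part + 1) * elements]  # assumed Vpart[n, part] lane selection
    scalar = vm[index]
    return [acc + (-x if sub_op else x) * scalar for acc, x in zip(vd, src)]


vn = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
vm = [0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(fmlal_by_element([0.0] * 4, vn, vm, index=1, part=0))  # [2.0, 4.0, 6.0, 8.0]
print(fmlal_by_element([0.0] * 4, vn, vm, index=1, part=1))  # [10.0, 12.0, 14.0, 16.0]
```

Using the two instructions back to back with different destination registers therefore processes all eight half-precision lanes of a 128-bit source.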

FMLAL, FMLAL2 (vector)

Floating-point fused Multiply-Add Long to accumulator (vector). This instruction multiplies corresponding half-
precision floating-point values in the vectors in the two source SIMD&FP registers, and accumulates the product to
the corresponding vector element of the destination SIMD&FP register. The instruction does not round the result of
the multiply before the accumulation.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.

Note

ID_AA64ISAR0_EL1.FHM indicates whether this instruction is supported.

It has encodings from 2 classes: FMLAL and FMLAL2

FMLAL
(FEAT_FHM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 1 Rm 1 1 1 0 1 1 Rn Rd
S sz

FMLAL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

if !HaveFP16MulNoRoundingToFP32Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if sz == '1' then UNDEFINED;
integer esize = 32;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean sub_op = (S == '1');
integer part = 0;

FMLAL2
(FEAT_FHM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 Rm 1 1 0 0 1 1 Rn Rd
S sz

FMLAL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

if !HaveFP16MulNoRoundingToFP32Ext() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 2H
1 4H

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize DIV 2) operand1 = Vpart[n, part];
bits(datasize DIV 2) operand2 = Vpart[m, part];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize DIV 2) element1;
bits(esize DIV 2) element2;

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize DIV 2];
    element2 = Elem[operand2, e, esize DIV 2];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAddH(Elem[operand3, e, esize], element1, element2, FPCR[]);
V[d] = result;
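The statement that the product is not rounded before the accumulation can be illustrated for a single lane with the Python sketch below. It uses the standard `struct` module to round values to half precision and single precision. The function names are hypothetical, and the final result is computed in double precision and then rounded to single precision, which only approximates a single rounding step in general.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_single(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fmlal_lane(acc, a, b):
    """Accumulate the unrounded product of two half-precision values."""
    # a * b is exact here: two 11-bit significands give at most 22 significant bits
    return to_single(acc + a * b)


def rounded_product_lane(acc, a, b):
    """For contrast: round the product to half precision before accumulating."""
    return to_single(acc + to_half(a * b))


a = b = to_half(1.0 + 2.0 ** -10)
acc = to_single(-(1.0 + 2.0 ** -9))
print(fmlal_lane(acc, a, b) == 2.0 ** -20)   # True
print(rounded_product_lane(acc, a, b))       # 0.0
```

Rounding the product to half precision first discards the 2**-20 term, so the contrast case returns zero, while the widening form preserves it in the single-precision accumulator.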

FMLS (by element)

Floating-point fused Multiply-Subtract from accumulator (by element). This instruction multiplies the vector elements
in the first source SIMD&FP register by the specified value in the second source SIMD&FP register, and subtracts the
results from the vector elements of the destination SIMD&FP register. All the values in this instruction are floating-
point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar, half-precision; Scalar, single-precision and double-precision; Vector,
half-precision; and Vector, single-precision and double-precision

Scalar, half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 0 L M Rm 0 1 0 1 H 0 Rn Rd
o2

FMLS <Hd>, <Hn>, <Vm>.H[<index>]

if !HaveFP16Ext() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);
integer index = UInt(H:L:M);

integer esize = 16;

integer datasize = esize;
integer elements = 1;
boolean sub_op = (o2 == '1');

Scalar, single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 1 sz L M Rm 0 1 0 1 H 0 Rn Rd
o2

FMLS <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi = M;
case sz:L of
    when '0x' index = UInt(H:L);
    when '10' index = UInt(H);
    when '11' UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;
boolean sub_op = (o2 == '1');

Vector, half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 0 1 0 1 H 0 Rn Rd
o2

FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]

if !HaveFP16Ext() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);
integer index = UInt(H:L:M);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean sub_op = (o2 == '1');

Vector, single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 sz L M Rm 0 1 0 1 H 0 Rn Rd
o2

FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi = M;
case sz:L of
    when '0x' index = UInt(H:L);
    when '10' index = UInt(H);
    when '11' UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean sub_op = (o2 == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:

Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D

<Ts> Is an element size specifier, encoded in “sz”:

sz <Ts>
0 S
1 D

<index> For the half-precision variant: is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.

For the single-precision and double-precision variant: is the element index, encoded in “sz:L:H”:

sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
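
The index tables above can be sketched in Python as follows. This is an illustrative model only: the function names are hypothetical, the field values are passed as plain integers, and UNDEFINED is modelled as a Python exception.

```python
def decode_half_index(H: int, L: int, M: int, Rm: int) -> tuple[int, int]:
    """Half-precision forms: index = UInt(H:L:M), m = UInt(Rm) (4-bit Rm field)."""
    index = (H << 2) | (L << 1) | M
    return index, Rm


def decode_sd_index(sz: int, L: int, H: int, M: int, Rm: int) -> tuple[int, int]:
    """Single/double-precision forms: index depends on sz:L, m = UInt(M:Rm)."""
    if sz == 0:
        index = (H << 1) | L          # when '0x': index = UInt(H:L)
    elif L == 0:
        index = H                     # when '10': index = UInt(H)
    else:
        raise ValueError("sz:L == '11' is UNDEFINED")
    m = (M << 4) | Rm                 # Rmhi = M supplies the top register bit
    return index, m
```

For the half-precision forms the M bit is part of the index, so only a 4-bit register number remains for Vm. For the single-precision and double-precision forms M extends the register number instead.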

Operation

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, fpcr);

V[d] = result;
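
As a rough illustration of the loop above, the following Python sketch models the by-element fused multiply-subtract on lists of double-precision values. It is not the architectural definition: the function name is hypothetical, only finite inputs are handled, and FPCR rounding modes, exceptions, NaN handling and the sign of zero are all ignored. Exact rational arithmetic followed by a single conversion to float stands in for the single rounding of a fused operation.

```python
from fractions import Fraction


def fmls_by_element(vd, vn, vm, index):
    """Return d[e] - n[e] * vm[index] for every lane, rounded once per lane."""
    scalar = Fraction(vm[index])      # the one element selected from Vm
    return [
        float(Fraction(d) - Fraction(n) * scalar)   # exact, then one rounding
        for d, n in zip(vd, vn)
    ]


# The product (1 + 2**-30) * (1 - 2**-30) = 1 - 2**-60 is not representable
# as a double, so an unfused multiply rounds it to 1.0 before subtracting.
n = 1.0 + 2.0 ** -30
m = 1.0 - 2.0 ** -30
fused = fmls_by_element([1.0], [n], [m], 0)[0]
unfused = 1.0 - n * m
```

In this example the fused form keeps the 2**-60 term, while the unfused form yields zero.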

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

FMLS (vector)

Floating-point fused Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding floating-
point values in the vectors in the two source SIMD&FP registers, negates the product, adds the result to the
corresponding vector element of the destination SIMD&FP register, and writes the result to the destination SIMD&FP
register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 0 0 1 1 Rn Rd
a

FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean sub_op = (a == '1');

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 0 0 1 1 Rn Rd
op

FMLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean sub_op = (op == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
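
The single-precision and double-precision arrangement table above follows from the element size and register size. A minimal Python sketch, assuming the usual decode relations esize = 32 << UInt(sz) and datasize = 128 when Q is 1, otherwise 64 (the function name is hypothetical):

```python
def vector_arrangement(sz: int, Q: int) -> tuple[str, int, int]:
    """Return (arrangement, esize, elements) for the sz:Q encoding."""
    if sz == 1 and Q == 0:
        # A 64-bit register cannot hold two 64-bit elements.
        raise ValueError("sz:Q == '10' is RESERVED")
    esize = 32 << sz                      # 32-bit (S) or 64-bit (D) elements
    datasize = 128 if Q == 1 else 64      # full or half vector register
    elements = datasize // esize
    suffix = "S" if sz == 0 else "D"
    return f"{elements}{suffix}", esize, elements
```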

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;

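for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if sub_op then element1 = FPNeg(element1);
    Elem[result, e, esize] = FPMulAdd(Elem[operand3, e, esize], element1, element2, FPCR[]);

V[d] = result;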

FMLSL, FMLSL2 (by element)

Floating-point fused Multiply-Subtract Long from accumulator (by element). This instruction multiplies the negated
vector elements in the first source SIMD&FP register by the specified value in the second source SIMD&FP register,
and accumulates the product to the corresponding vector element of the destination SIMD&FP register. The
instruction does not round the result of the multiply before the accumulation.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.

Note

ID_AA64ISAR0_EL1.FHM indicates whether this instruction is supported.

It has encodings from 2 classes: FMLSL and FMLSL2

FMLSL
(FEAT_FHM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 0 L M Rm 0 1 0 0 H 0 Rn Rd
sz S

FMLSL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]

if !HaveFP16MulNoRoundingToFP32Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt('0':Rm); // Vm can only be in bottom 16 registers.
if sz == '1' then UNDEFINED;
integer index = UInt(H:L:M);

integer esize = 32;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean sub_op = (S == '1');

integer part = 0;

FMLSL2
(FEAT_FHM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 1 0 L M Rm 1 1 0 0 H 0 Rn Rd
sz S

FMLSL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.H[<index>]

if !HaveFP16MulNoRoundingToFP32Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt('0':Rm); // Vm can only be in bottom 16 registers.
if sz == '1' then UNDEFINED;
integer index = UInt(H:L:M);

integer esize = 32;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean sub_op = (S == '1');

integer part = 1;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 2H
1 4H

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
<index> Is the element index, encoded in the "H:L:M" fields.

Operation

FMLSL, FMLSL2 (vector)

Floating-point fused Multiply-Subtract Long from accumulator (vector). This instruction negates the values in the
vector of one SIMD&FP register, multiplies these with the corresponding values in another vector, and accumulates
the product to the corresponding vector element of the destination SIMD&FP register. The instruction does not round
the result of the multiply before the accumulation.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.

Note

ID_AA64ISAR0_EL1.FHM indicates whether this instruction is supported.

It has encodings from 2 classes: FMLSL and FMLSL2

FMLSL
(FEAT_FHM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 Rm 1 1 1 0 1 1 Rn Rd
S sz

FMLSL <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

if !HaveFP16MulNoRoundingToFP32Ext() then UNDEFINED;

FMLSL2
(FEAT_FHM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 0 1 Rm 1 1 0 0 1 1 Rn Rd
S sz

FMLSL2 <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

if !HaveFP16MulNoRoundingToFP32Ext() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 2H
1 4H

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

FMOV (general)

Floating-point Move to or from general-purpose register without conversion. This instruction transfers the contents of
a SIMD&FP register to a general-purpose register, or the contents of a general-purpose register to a SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 x 1 1 x 0 0 0 0 0 0 Rn Rd
rmode opcode

Half-precision to 32-bit (sf == 0 && ftype == 11 && rmode == 00 && opcode == 110)
(FEAT_FP16)

FMOV <Wd>, <Hn>

Half-precision to 64-bit (sf == 1 && ftype == 11 && rmode == 00 && opcode == 110)
(FEAT_FP16)

FMOV <Xd>, <Hn>

32-bit to half-precision (sf == 0 && ftype == 11 && rmode == 00 && opcode == 111)
(FEAT_FP16)

FMOV <Hd>, <Wn>

32-bit to single-precision (sf == 0 && ftype == 00 && rmode == 00 && opcode == 111)

FMOV <Sd>, <Wn>

Single-precision to 32-bit (sf == 0 && ftype == 00 && rmode == 00 && opcode == 110)

FMOV <Wd>, <Sn>

64-bit to half-precision (sf == 1 && ftype == 11 && rmode == 00 && opcode == 111)
(FEAT_FP16)

FMOV <Hd>, <Xn>

64-bit to double-precision (sf == 1 && ftype == 01 && rmode == 00 && opcode == 111)

FMOV <Dd>, <Xn>

64-bit to top half of 128-bit (sf == 1 && ftype == 10 && rmode == 01 && opcode == 111)

FMOV <Vd>.D[1], <Xn>

Double-precision to 64-bit (sf == 1 && ftype == 01 && rmode == 00 && opcode == 110)

FMOV <Xd>, <Dn>

Top half of 128-bit to 64-bit (sf == 1 && ftype == 10 && rmode == 01 && opcode == 110)

FMOV <Xd>, <Vn>.D[1]

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPConvOp op;
FPRounding rounding;
boolean unsigned;
integer part;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        if opcode<2:1>:rmode != '11 01' then UNDEFINED;
        fltsize = 128;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

case opcode<2:1>:rmode of
    when '00 xx' // FCVT[NPMZ][US]
        rounding = FPDecodeRounding(rmode);
        unsigned = (opcode<0> == '1');
        op = FPConvOp_CVT_FtoI;
    when '01 00' // [US]CVTF
        rounding = FPRoundingMode(FPCR[]);
        unsigned = (opcode<0> == '1');
        op = FPConvOp_CVT_ItoF;
    when '10 00' // FCVTA[US]
        rounding = FPRounding_TIEAWAY;
        unsigned = (opcode<0> == '1');
        op = FPConvOp_CVT_FtoI;
    when '11 00' // FMOV
        if fltsize != 16 && fltsize != intsize then UNDEFINED;
        op = if opcode<0> == '1' then FPConvOp_MOV_ItoF else FPConvOp_MOV_FtoI;
        part = 0;
    when '11 01' // FMOV D[1]
        if intsize != 64 || fltsize != 128 then UNDEFINED;
        op = if opcode<0> == '1' then FPConvOp_MOV_ItoF else FPConvOp_MOV_FtoI;
        part = 1;
        fltsize = 64; // size of D[1] is 64
    when '11 11' // FJCVTZS
        if !HaveFJCVTZSExt() then UNDEFINED;
        rounding = FPRounding_ZERO;
        unsigned = (opcode<0> == '1');
        op = FPConvOp_CVT_FtoI_JS;
    otherwise
        UNDEFINED;
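
The ten FMOV (general) forms listed earlier can be summarised as a lookup on the sf, ftype, rmode and opcode fields. The Python sketch below is an illustration under stated assumptions: the table and function names are hypothetical, the bit positions are read off the encoding diagram above, and the FEAT_FP16 check for the half-precision forms is not modelled. Field combinations that are not in the table either belong to other instructions sharing this encoding or are UNDEFINED, so the function returns None for them.

```python
FMOV_GENERAL_FORMS = {
    # (sf, ftype, rmode, opcode): assembler form
    (0, 0b11, 0b00, 0b110): "FMOV <Wd>, <Hn>",
    (1, 0b11, 0b00, 0b110): "FMOV <Xd>, <Hn>",
    (0, 0b11, 0b00, 0b111): "FMOV <Hd>, <Wn>",
    (0, 0b00, 0b00, 0b111): "FMOV <Sd>, <Wn>",
    (0, 0b00, 0b00, 0b110): "FMOV <Wd>, <Sn>",
    (1, 0b11, 0b00, 0b111): "FMOV <Hd>, <Xn>",
    (1, 0b01, 0b00, 0b111): "FMOV <Dd>, <Xn>",
    (1, 0b10, 0b01, 0b111): "FMOV <Vd>.D[1], <Xn>",
    (1, 0b01, 0b00, 0b110): "FMOV <Xd>, <Dn>",
    (1, 0b10, 0b01, 0b110): "FMOV <Xd>, <Vn>.D[1]",
}


def fmov_general_form(word: int):
    """Return the FMOV (general) form for a 32-bit instruction word, or None."""
    fixed_ok = (
        (word >> 24) & 0x7F == 0b0011110    # bits 30:24
        and (word >> 21) & 1 == 1           # bit 21
        and (word >> 10) & 0x3F == 0        # bits 15:10
    )
    if not fixed_ok:
        return None
    key = (
        (word >> 31) & 1,      # sf
        (word >> 22) & 0b11,   # ftype
        (word >> 19) & 0b11,   # rmode
        (word >> 16) & 0b111,  # opcode
    )
    return FMOV_GENERAL_FORMS.get(key)
```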

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wd> Is the 32-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Xd> Is the 64-bit name of the general-purpose destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
integer fsize = if op == FPConvOp_CVT_ItoF && merge then 128 else fltsize;
bits(fsize) fltval;
bits(intsize) intval;

case op of
    when FPConvOp_CVT_FtoI
        fltval = V[n];
        intval = FPToFixed(fltval, 0, unsigned, fpcr, rounding);
        X[d] = intval;
    when FPConvOp_CVT_ItoF
        intval = X[n];
        fltval = if merge then V[d] else Zeros();
        Elem[fltval, 0, fltsize] = FixedToFP(intval, 0, unsigned, fpcr, rounding);
        V[d] = fltval;
    when FPConvOp_MOV_FtoI
        fltval = Vpart[n, part];
        intval = ZeroExtend(fltval, intsize);
        X[d] = intval;
    when FPConvOp_MOV_ItoF
        intval = X[n];
        fltval = intval<fsize-1:0>;
        Vpart[d, part] = fltval;
    when FPConvOp_CVT_FtoI_JS
        bit Z;
        fltval = V[n];
        (intval, Z) = FPToFixedJS(fltval, fpcr, TRUE);
        PSTATE.<N,Z,C,V> = '0':Z:'00';
        X[d] = intval;
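
For the two FMOV cases (FPConvOp_MOV_FtoI and FPConvOp_MOV_ItoF), the transfer is a copy of the bit pattern with no numeric conversion. The Python sketch below illustrates this with the standard struct module. The function names are hypothetical, and a Python float is used only as a convenient way to build the source bit pattern.

```python
import struct


def fmov_x_from_d(value: float) -> int:
    """Model FMOV <Xd>, <Dn>: reinterpret the 64 bits of a double as an integer."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def fmov_d_from_x(bits: int) -> float:
    """Model FMOV <Dd>, <Xn>: reinterpret a 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", bits & 0xFFFFFFFFFFFFFFFF))[0]


def fmov_w_from_h(value: float) -> int:
    """Model FMOV <Wd>, <Hn>: the 16-bit pattern is zero-extended to 32 bits."""
    return struct.unpack("<H", struct.pack("<e", value))[0]
```

Because the bits are copied unchanged, the value 1.0 moved to a general-purpose register appears as the integer 0x3FF0000000000000, not as the integer 1.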

FMOV (register)

Floating-point Move register without conversion. This instruction copies the floating-point value in the SIMD&FP
source register to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 0 0 1 0 0 0 0 Rn Rd
opc

Half-precision (ftype == 11)

(FEAT_FP16)

FMOV <Hd>, <Hn>

Single-precision (ftype == 00)

FMOV <Sd>, <Sn>

Double-precision (ftype == 01)

FMOV <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

bits(esize) operand = V[n];

bits(128) result = Zeros();
Elem[result, 0, esize] = operand;

V[d] = result;

FMOV (scalar, immediate)

Floating-point move immediate (scalar). This instruction copies a floating-point immediate constant into the SIMD&FP
destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 imm8 1 0 0 0 0 0 0 0 Rd

Half-precision (ftype == 11)

(FEAT_FP16)

FMOV <Hd>, #<imm>

Single-precision (ftype == 00)

FMOV <Sd>, #<imm>

Double-precision (ftype == 01)

FMOV <Dd>, #<imm>

integer d = UInt(Rd);

integer datasize;
case ftype of
    when '00' datasize = 32;
    when '01' datasize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            datasize = 16;
        else
            UNDEFINED;

bits(datasize) imm = VFPExpandImm(imm8);

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

V[d] = imm;

FMOV (vector, immediate)

Floating-point move immediate (vector). This instruction copies an immediate floating-point constant into every
element of the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 0 0 0 a b c 1 1 1 1 1 1 d e f g h Rd

FMOV <Vd>.<T>, #<imm>

if !HaveFP16Ext() then UNDEFINED;

integer rd = UInt(Rd);

integer datasize = if Q == '1' then 128 else 64;

bits(datasize) imm;

imm8 = a:b:c:d:e:f:g:h;
imm16 = imm8<7>:NOT(imm8<6>):Replicate(imm8<6>, 2):imm8<5:0>:Zeros(6);

imm = Replicate(imm16, datasize DIV 16);
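
The half-precision immediate expansion above can be checked with a short Python sketch. The function names are hypothetical. The half-precision bit pattern is decoded with the standard struct module's "e" format, which is IEEE 754 binary16.

```python
import struct


def expand_imm8_to_half(imm8: int) -> int:
    """imm16 = imm8<7> : NOT(imm8<6>) : Replicate(imm8<6>, 2) : imm8<5:0> : Zeros(6)."""
    sign = (imm8 >> 7) & 1
    b6 = (imm8 >> 6) & 1
    low6 = imm8 & 0x3F
    return (sign << 15) | ((b6 ^ 1) << 14) | ((b6 * 0b11) << 12) | (low6 << 6)


def half_bits_to_float(bits16: int) -> float:
    """Decode a 16-bit pattern as an IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits16))[0]


def replicate(imm16: int, datasize: int) -> int:
    """Copy imm16 into every 16-bit lane of a datasize-bit vector."""
    out = 0
    for lane in range(datasize // 16):
        out |= imm16 << (16 * lane)
    return out
```

For example, imm8 = 0x70 expands to 0x3C00, which is 1.0 in half precision, and imm8 = 0x00 expands to 0x4000, which is 2.0.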

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q op 0 1 1 1 1 0 0 0 0 0 a b c 1 1 1 1 0 1 d e f g h Rd
cmode

Single-precision (op == 0)

FMOV <Vd>.<T>, #<imm>

Double-precision (Q == 1 && op == 1)

FMOV <Vd>.2D, #<imm>

integer rd = UInt(Rd);

integer datasize = if Q == '1' then 128 else 64;

bits(datasize) imm;
bits(64) imm64;

if cmode:op == '11111' then
    // FMOV Dn,#imm is in main FP instruction set
    if Q == '0' then UNDEFINED;

imm64 = AdvSIMDExpandImm(op, cmode, a:b:c:d:e:f:g:h);

imm = Replicate(imm64, datasize DIV 64);

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 2S
1 4S

<imm> Is a signed floating-point constant with 3-bit exponent and normalized 4 bits of precision, encoded in
"a:b:c:d:e:f:g:h". For details of the range of constants available and the encoding of <imm>, see
Modified immediate constants in A64 floating-point instructions.

Operation

CheckFPAdvSIMDEnabled64();

V[rd] = imm;

FMSUB

Floating-point Fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two SIMD&FP source
registers, negates the product, adds that to the value of the third SIMD&FP source register, and writes the result to
the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 0 Rm 1 Ra Rn Rd
o1 o0

Half-precision (ftype == 11)

(FEAT_FP16)

FMSUB <Hd>, <Hn>, <Hm>, <Ha>

Single-precision (ftype == 00)

FMSUB <Sd>, <Sn>, <Sm>, <Sa>

Double-precision (ftype == 01)

FMSUB <Dd>, <Dn>, <Dm>, <Da>

integer d = UInt(Rd);
integer a = UInt(Ra);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Dm> Is the 64-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Hm> Is the 16-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.

<Ha> Is the 16-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Sm> Is the 32-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Sa> Is the 32-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.

Operation

CheckFPAdvSIMDEnabled64();

bits(esize) operanda = V[a];

bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[a] else Zeros();

operand1 = FPNeg(operand1);
Elem[result, 0, esize] = FPMulAdd(operanda, operand1, operand2, fpcr);

V[d] = result;

FMUL (by element)

Floating-point Multiply (by element). This instruction multiplies the vector elements in the first source SIMD&FP
register by the specified value in the second source SIMD&FP register, places the results in a vector, and writes the
vector to the destination SIMD&FP register. All the values in this instruction are floating-point values.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar, half-precision; Scalar, single-precision and double-precision; Vector, half-precision; and Vector, single-precision and double-precision.

Scalar, half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U

FMUL <Hd>, <Hn>, <Vm>.H[<index>]

if !HaveFP16Ext() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);
integer index = UInt(H:L:M);

integer esize = 16;

integer datasize = esize;
integer elements = 1;
boolean mulx_op = (U == '1');

Scalar, single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U

FMUL <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi = M;
case sz:L of
    when '0x' index = UInt(H:L);
    when '10' index = UInt(H);
    when '11' UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;
boolean mulx_op = (U == '1');

Vector, half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U

FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]

if !HaveFP16Ext() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);
integer index = UInt(H:L:M);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean mulx_op = (U == '1');

Vector, single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U

FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi = M;
case sz:L of
    when '0x' index = UInt(H:L);
    when '10' index = UInt(H);
    when '11' UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean mulx_op = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:

Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D

<Ts> Is an element size specifier, encoded in “sz”:

sz <Ts>
0 S
1 D

<index> For the half-precision variant: is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.

For the single-precision and double-precision variant: is the element index, encoded in “sz:L:H”:

sz L <index>
0 x H:L
1 0 H
1 1 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(esize) element1;
bits(esize) element2 = Elem[operand2, index, esize];
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    if mulx_op then
        Elem[result, e, esize] = FPMulX(element1, element2, fpcr);
    else
        Elem[result, e, esize] = FPMul(element1, element2, fpcr);

V[d] = result;

FMUL (scalar)

Floating-point Multiply (scalar). This instruction multiplies the floating-point values of the two source SIMD&FP
registers, and writes the result to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 0 0 1 0 Rn Rd
op

Half-precision (ftype == 11)

(FEAT_FP16)

FMUL <Hd>, <Hn>, <Hm>

Single-precision (ftype == 00)

FMUL <Sd>, <Sn>, <Sm>

Double-precision (ftype == 01)

FMUL <Dd>, <Dn>, <Dm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

bits(esize) product = FPMul(operand1, operand2, fpcr);

Elem[result, 0, esize] = product;

V[d] = result;

FMUL (vector)

Floating-point Multiply (vector). This instruction multiplies corresponding floating-point values in the vectors in the
two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP
register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 0 Rm 0 0 0 1 1 1 Rn Rd

FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 1 1 Rn Rd

FMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPMul(element1, element2, FPCR[]);

V[d] = result;

FMULX

Floating-point Multiply extended. This instruction multiplies corresponding floating-point values in the vectors of the
two source SIMD&FP registers, places the resulting floating-point values in a vector, and writes the vector to the
destination SIMD&FP register.
If one value is zero and the other value is infinite, the result is 2.0. In this case, the result is negative if only one of the
values is negative, otherwise the result is positive.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.
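
The zero-times-infinity rule described above can be sketched in Python. This is a minimal illustration that uses host double-precision floats as a stand-in for the architectural FPMulX function; the name `fmulx` is hypothetical, and NaN processing, FPCR settings and exception flags are not modelled.

```python
import math

def fmulx(a: float, b: float) -> float:
    """Multiply a by b, except that (zero x infinity) yields +/-2.0."""
    zero_times_inf = (a == 0.0 and math.isinf(b)) or (math.isinf(a) and b == 0.0)
    if zero_times_inf:
        # The result is negative only when exactly one operand is negative.
        a_negative = math.copysign(1.0, a) < 0
        b_negative = math.copysign(1.0, b) < 0
        return -2.0 if a_negative != b_negative else 2.0
    # Every other case behaves like an ordinary multiply in this sketch.
    return a * b
```

For example, `fmulx(-0.0, float("inf"))` takes the special-case path, whereas an ordinary multiply of the same operands would produce a NaN.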

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 0 Rm 0 0 0 1 1 1 Rn Rd

FMULX <Hd>, <Hn>, <Hm>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 Rm 1 1 0 1 1 1 Rn Rd

FMULX <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 0 1 1 1 Rn Rd

FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 0 1 1 1 Rn Rd

FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
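
The "sz:Q" arrangement table can be read as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the lane geometry assumes the `esize = 32 << UInt(sz)` and `datasize = if Q == '1' then 128 else 64` formulas used by the vector single-precision and double-precision decode of neighbouring instructions.

```python
def decode_arrangement(sz: int, q: int) -> str:
    """Map the sz and Q bits to the <T> arrangement specifier."""
    table = {(0, 0): "2S", (0, 1): "4S", (1, 1): "2D"}
    if (sz, q) not in table:
        # sz:Q == '10' would be a single 64-bit lane, which is RESERVED.
        raise ValueError("sz:Q == '10' is RESERVED")
    return table[(sz, q)]

def lane_geometry(sz: int, q: int) -> tuple[int, int]:
    """Return (esize, elements) for a given sz and Q."""
    esize = 32 << sz                   # 32-bit or 64-bit elements
    datasize = 128 if q == 1 else 64   # full or half register
    return esize, datasize // esize
```

The element count in the specifier (2, 4 or 2) is simply `datasize DIV esize`.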

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];

bits(esize) element1;
bits(esize) element2;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPMulX(element1, element2, fpcr);

V[d] = result;

FMULX (by element)

Floating-point Multiply extended (by element). This instruction multiplies the floating-point values in the vector
elements in the first source SIMD&FP register by the specified floating-point value in the second source SIMD&FP
register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If one value is zero and the other value is infinite, the result is 2.0. In this case, the result is negative if only one of the
values is negative, otherwise the result is positive.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar, half-precision; Scalar, single-precision and double-precision; Vector,
half-precision; and Vector, single-precision and double-precision.

Scalar, half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U

FMULX <Hd>, <Hn>, <Vm>.H[<index>]

if !HaveFP16Ext() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);
integer index = UInt(H:L:M);

integer esize = 16;

integer datasize = esize;
integer elements = 1;
boolean mulx_op = (U == '1');

Scalar, single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U

FMULX <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi = M;
case sz:L of
    when '0x' index = UInt(H:L);
    when '10' index = UInt(H);
    when '11' UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;
boolean mulx_op = (U == '1');

Vector, half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 0 L M Rm 1 0 0 1 H 0 Rn Rd
U

FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.H[<index>]

if !HaveFP16Ext() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);
integer index = UInt(H:L:M);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean mulx_op = (U == '1');

Vector, single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 1 sz L M Rm 1 0 0 1 H 0 Rn Rd
U

FMULX <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi = M;
case sz:L of
    when '0x' index = UInt(H:L);
    when '10' index = UInt(H);
    when '11' UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean mulx_op = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “Q:sz”:

Q sz <T>
0 0 2S
0 1 RESERVED
1 0 4S
1 1 2D

<Ts> Is an element size specifier, encoded in “sz”:

sz <Ts>
0 S
1 D

<index> For the half-precision variant: is the element index, in the range 0 to 7, encoded in the "H:L:M" fields.

For the single-precision and double-precision variant: is the element index, encoded in “sz:L:H”:

sz L <index>
0 x H:L
1 0 H
1 1 RESERVED
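
The index and register-number decode for the single-precision and double-precision by-element forms can be sketched as follows. This mirrors the `case sz:L of` decode pseudocode; the function name and argument order are hypothetical, and Rm is taken to be the 4-bit field shown in the encoding diagram.

```python
def decode_index(sz: int, L: int, H: int, M: int, Rm: int) -> tuple[int, int]:
    """Return (index, m) for the single/double-precision by-element forms."""
    if sz == 0:
        index = (H << 1) | L    # UInt(H:L): one of four S lanes
    elif L == 0:
        index = H               # UInt(H): one of two D lanes
    else:
        raise ValueError("sz:L == '11' is UNDEFINED")
    m = (M << 4) | Rm           # UInt(Rmhi:Rm) with Rmhi = M
    return index, m
```

In the half-precision forms, M instead becomes the low bit of the index (`UInt(H:L:M)`) and the register number is `UInt(Rm)` alone.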

Operation

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    if mulx_op then
        Elem[result, e, esize] = FPMulX(element1, element2, fpcr);
    else
        Elem[result, e, esize] = FPMul(element1, element2, fpcr);

V[d] = result;

FNEG (scalar)

Floating-point Negate (scalar). This instruction negates the value in the SIMD&FP source register and writes the
result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 1 0 1 0 0 0 0 Rn Rd
opc

Half-precision (ftype == 11)

(FEAT_FP16)

FNEG <Hd>, <Hn>

Single-precision (ftype == 00)

FNEG <Sd>, <Sn>

Double-precision (ftype == 01)

FNEG <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

bits(esize) operand = V[n];

Elem[result, 0, esize] = FPNeg(operand);

V[d] = result;
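
As a rough illustration of the Operation above, the sketch below assumes that FPNeg simply inverts the sign bit of the encoding, which is the usual IEEE 754 negate. The helper names are hypothetical, and the merging of upper register bits is not modelled.

```python
import struct

def fpneg_bits(bits: int, esize: int) -> int:
    """Invert the sign bit (the top bit) of an esize-bit floating-point encoding."""
    return bits ^ (1 << (esize - 1))

def fneg_single(x: float) -> float:
    """Negate a single-precision value by going through its 32-bit encoding."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", fpneg_bits(raw, 32)))[0]
```

Because only the sign bit changes, zeros, infinities and NaN payloads keep all of their other bits.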

FNEG (vector)

Floating-point Negate (vector). This instruction negates the value of each vector element in the source SIMD&FP
register, writes the result to a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 0 1 1 1 1 1 0 Rn Rd
U

FNEG <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean neg = (U == '1');

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 0 1 1 1 1 1 0 Rn Rd
U

FNEG <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean neg = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    if neg then
        element = FPNeg(element);
    else
        element = FPAbs(element);
    Elem[result, e, esize] = element;

V[d] = result;

FNMADD

Floating-point Negated fused Multiply-Add (scalar). This instruction multiplies the values of the first two SIMD&FP
source registers, negates the product, subtracts the value of the third SIMD&FP source register, and writes the result
to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 1 Rm 0 Ra Rn Rd
o1 o0

Half-precision (ftype == 11)

(FEAT_FP16)

FNMADD <Hd>, <Hn>, <Hm>, <Ha>

Single-precision (ftype == 00)

FNMADD <Sd>, <Sn>, <Sm>, <Sa>

Double-precision (ftype == 01)

FNMADD <Dd>, <Dn>, <Dm>, <Da>

integer d = UInt(Rd);
integer a = UInt(Ra);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

bits(esize) operanda = V[a];

bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[a] else Zeros();

operanda = FPNeg(operanda);
operand1 = FPNeg(operand1);
Elem[result, 0, esize] = FPMulAdd(operanda, operand1, operand2, fpcr);

V[d] = result;
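
The Operation above negates both the addend and the first multiplicand before a single fused multiply-add, which gives -(n * m) - a. A minimal Python sketch of that arithmetic follows. It is an approximation under stated assumptions: the name `fnmadd` is hypothetical, exact rational arithmetic followed by one conversion to a double stands in for the fused operation, and only finite inputs are supported (the sign of an exact zero result, NaNs, infinities and FPCR rounding modes are not modelled).

```python
from fractions import Fraction

def fnmadd(n: float, m: float, a: float) -> float:
    """Model of -(n * m) - a with a single rounding, for finite inputs only."""
    # Mirrors FPMulAdd(-a, -n, m): the addend and first multiplicand are negated.
    exact = Fraction(-a) + Fraction(-n) * Fraction(m)
    # Converting the exact rational to float performs the only rounding step.
    return float(exact)
```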

FNMSUB

Floating-point Negated fused Multiply-Subtract (scalar). This instruction multiplies the values of the first two
SIMD&FP source registers, subtracts the value of the third SIMD&FP source register, and writes the result to the
destination SIMD&FP register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 1 ftype 1 Rm 1 Ra Rn Rd
o1 o0

Half-precision (ftype == 11)

(FEAT_FP16)

FNMSUB <Hd>, <Hn>, <Hm>, <Ha>

Single-precision (ftype == 00)

FNMSUB <Sd>, <Sn>, <Sm>, <Sa>

Double-precision (ftype == 01)

FNMSUB <Dd>, <Dn>, <Dm>, <Da>

integer d = UInt(Rd);
integer a = UInt(Ra);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Dm> Is the 64-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.
<Da> Is the 64-bit name of the third SIMD&FP source register holding the minuend, encoded in the "Ra"
field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the first SIMD&FP source register holding the multiplicand, encoded in the "Rn"
field.
<Hm> Is the 16-bit name of the second SIMD&FP source register holding the multiplier, encoded in the "Rm"
field.

Operation

CheckFPAdvSIMDEnabled64();

bits(esize) operanda = V[a];

bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[a] else Zeros();

operanda = FPNeg(operanda);
Elem[result, 0, esize] = FPMulAdd(operanda, operand1, operand2, fpcr);

V[d] = result;
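
Because the multiply and the subtract are fused, the product n * m is not rounded before the third operand is subtracted. The sketch below illustrates the difference on host doubles. It is an assumption-laden model: `fnmsub` is a hypothetical name, exact rational arithmetic with one final conversion stands in for the fused operation, and only finite inputs are handled.

```python
from fractions import Fraction

def fnmsub(n: float, m: float, a: float) -> float:
    """Model of (n * m) - a with a single rounding, for finite inputs only."""
    # Mirrors FPMulAdd(-a, n, m): only the addend is negated.
    exact = Fraction(-a) + Fraction(n) * Fraction(m)
    return float(exact)

n = 1.0 + 2.0**-30
m = 1.0 - 2.0**-30
fused = fnmsub(n, m, 1.0)   # the exact product 1 - 2**-60 survives until the subtract
unfused = n * m - 1.0       # the product is rounded to 1.0 before the subtract
```

Here the fused form keeps the small negative difference, while the separately rounded multiply and subtract return zero.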

FNMUL (scalar)

Floating-point Multiply-Negate (scalar). This instruction multiplies the floating-point values of the two source
SIMD&FP registers, and writes the negation of the result to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 1 0 0 0 1 0 Rn Rd
op

Half-precision (ftype == 11)

(FEAT_FP16)

FNMUL <Hd>, <Hn>, <Hm>

Single-precision (ftype == 00)

FNMUL <Sd>, <Sn>, <Sm>

Double-precision (ftype == 01)

FNMUL <Dd>, <Dn>, <Dm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

bits(esize) product = FPMul(operand1, operand2, fpcr);

product = FPNeg(product);
Elem[result, 0, esize] = product;

V[d] = result;

FRECPE

Floating-point Reciprocal Estimate. This instruction finds an approximate reciprocal estimate for each vector element
in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP
register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd

FRECPE <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd

FRECPE <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd

FRECPE <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd

FRECPE <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

FPCRType fpcr = FPCR[];

boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRecipEstimate(element, FPCR[]);

V[d] = result;
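
For the scalar forms, the Operation above starts from either the old value of V[d] or zeros, then overwrites element 0. The sketch below models that write using a Python integer as the 128-bit register. The name `write_scalar` is hypothetical, and `merge` is passed in as a plain flag because the definition of IsMerging is not part of this section.

```python
MASK128 = (1 << 128) - 1

def write_scalar(old_vd: int, element: int, esize: int, merge: bool) -> int:
    """Return the new 128-bit V[d] after an esize-bit scalar result is written."""
    lane_mask = (1 << esize) - 1
    # result = if merge then V[d] else Zeros();
    base = old_vd if merge else 0
    # Elem[result, 0, esize] = element;
    return (base & ~lane_mask & MASK128) | (element & lane_mask)
```

With `merge` false, the bits above the scalar result are cleared; with `merge` true, they keep their previous value.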

FRECPS

Floating-point Reciprocal Step. This instruction multiplies the corresponding floating-point values in the vectors of the
two source SIMD&FP registers, subtracts each of the products from 2.0, places the resulting floating-point values in a
vector, and writes the vector to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 0 Rm 0 0 1 1 1 1 Rn Rd

FRECPS <Hd>, <Hn>, <Hm>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 Rm 1 1 1 1 1 1 Rn Rd

FRECPS <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 0 Rm 0 0 1 1 1 1 Rn Rd

FRECPS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 Rm 1 1 1 1 1 1 Rn Rd

FRECPS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];

bits(esize) element1;
bits(esize) element2;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPRecipStepFused(element1, element2);

V[d] = result;
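
The value 2.0 - a * b is the correction factor of a Newton-Raphson reciprocal iteration, so a typical use (an assumption here, not stated in this section) is to refine an initial reciprocal estimate. The sketch below shows the idea on host doubles. The function names are hypothetical, the step is not fused, and special values are not modelled.

```python
def frecps(a: float, b: float) -> float:
    """Reciprocal step: 2.0 - a * b (plain host arithmetic, not fused)."""
    return 2.0 - a * b

def refine_reciprocal(d: float, estimate: float, steps: int = 4) -> float:
    """Refine an estimate of 1/d; each step roughly squares the relative error."""
    x = estimate
    for _ in range(steps):
        x = x * frecps(d, x)   # x <- x * (2 - d * x)
    return x
```

Starting from a coarse estimate such as 0.3 for 1/3, a few steps are enough to reach double-precision accuracy in this model.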

FRECPX

Floating-point Reciprocal exponent (scalar). This instruction finds an approximate reciprocal exponent for the source
SIMD&FP register and writes the result to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 0 Rn Rd

FRECPX <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 1 1 1 0 Rn Rd

FRECPX <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) operand = V[n];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

Elem[result, 0, esize] = FPRecpX(operand, fpcr);

V[d] = result;
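
The behaviour of FPRecpX is defined in the shared pseudocode, which is not reproduced in this section. The sketch below is therefore an assumption about that function for single precision: the sign is kept, the fraction is cleared, and the exponent field is bitwise inverted, with zeros and denormals mapped to the largest finite exponent. The name `frecpx_single` is hypothetical and NaN inputs are rejected instead of being modelled.

```python
import struct

def frecpx_single(x: float) -> float:
    """Assumed model of the reciprocal exponent for a single-precision value."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    sign = raw & 0x80000000
    exp = (raw >> 23) & 0xFF
    if exp == 0xFF and (raw & 0x7FFFFF):
        raise ValueError("NaN inputs are not modelled in this sketch")
    # Zero and denormal inputs take the maximum finite exponent (assumption).
    new_exp = 0xFE if exp == 0 else (~exp & 0xFF)
    # The fraction field of the result is all zeros.
    return struct.unpack("<f", struct.pack("<I", sign | (new_exp << 23)))[0]
```

Under this model the result is always a power of two (or a zero for an infinite input), which makes it usable as a cheap scaling factor.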

FRINT32X (scalar)

Floating-point Round to 32-bit Integer, using current rounding mode (scalar). This instruction rounds a floating-point
value in the SIMD&FP source register to an integral floating-point value that fits into a 32-bit integer size using the
rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the input
value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the instruction returns the
most negative integer representable in the destination size, and an Invalid Operation floating-point exception is
raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Floating-point
(FEAT_FRINTTS)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 0 x 1 0 1 0 0 0 1 1 0 0 0 0 Rn Rd
ftype op

Single-precision (ftype == 00)

FRINT32X <Sd>, <Sn>

Double-precision (ftype == 01)

FRINT32X <Dd>, <Dn>

if !HaveFrintExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '1x' UNDEFINED;

FPRounding rounding = FPRoundingMode(FPCR[]);

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundIntN(operand, fpcr, rounding, 32);

V[d] = result;
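
The rounding and out-of-range behaviour described for this instruction can be sketched in Python. This is a minimal model under stated assumptions: `frint32x` is a hypothetical name, the FPCR rounding mode is assumed to be Round to Nearest with ties to even (which is what Python's `round` implements), host doubles stand in for the operand, and exception flags are not modelled.

```python
import math

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1

def frint32x(x: float) -> float:
    """Round to an integral value that fits in a 32-bit integer, ties to even."""
    if math.isnan(x) or math.isinf(x):
        return float(INT32_MIN)          # NaN or infinite input
    r = round(x)                         # Python rounds ties to even
    if r < INT32_MIN or r > INT32_MAX:
        return float(INT32_MIN)          # out-of-range input
    # copysign keeps the sign of a zero input.
    return math.copysign(float(r), x)
```

Note that the result is still a floating-point value; only its range is constrained to that of a signed 32-bit integer.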

FRINT32X (vector)

Floating-point Round to 32-bit Integer, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size
using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When one of the result values is not numerically equal to the
corresponding input value, an Inexact exception is raised. When an input is infinite, NaN or out-of-range, the
instruction returns for the corresponding result value the most negative integer representable in the destination size,
and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Vector single-precision and double-precision

(FEAT_FRINTTS)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 1 0 1 0 Rn Rd
U op

FRINT32X <Vd>.<T>, <Vn>.<T>

if !HaveFrintExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
integer intsize = if op == '0' then 32 else 64;
FPRounding rounding = if U == '0' then FPRounding_ZERO else FPRoundingMode(FPCR[]);

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundIntN(element, FPCR[], rounding, intsize);

V[d] = result;

FRINT32Z (scalar)

Floating-point Round to 32-bit Integer toward Zero (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value that fits into a 32-bit integer size using the Round towards
Zero rounding mode, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the
input value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the
instruction returns the most negative integer representable in the destination
size, and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Floating-point
(FEAT_FRINTTS)

Bits    31-24     23-22       21  20-17  16-15    14-10  9-5  4-0
Value   00011110  0x (ftype)  1   0100   00 (op)  10000  Rn   Rd

Single-precision (ftype == 00)

FRINT32Z <Sd>, <Sn>

Double-precision (ftype == 01)

FRINT32Z <Dd>, <Dn>

if !HaveFrintExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '1x' UNDEFINED;

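Assembler Symbols

<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.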

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundIntN(operand, fpcr, FPRounding_ZERO, 32);

V[d] = result;
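
The behavior described above for zero, inexact, and out-of-range inputs can be modelled with a short Python sketch. It is a simplified illustration for double-precision inputs only, not the architectural FPRoundIntN function; the name frint_toward_zero and the use of a set of flag names in place of FPSR updates are assumptions made for this example.

```python
import math

def frint_toward_zero(value, intsize):
    """Model FRINT32Z/FRINT64Z on a Python float; returns (result, flags)."""
    most_negative = -float(2 ** (intsize - 1))
    if math.isnan(value) or math.isinf(value):
        return most_negative, {"InvalidOp"}
    if value == 0.0:
        return value, set()                       # zero keeps its sign
    truncated = math.trunc(value)                 # exact integer, rounded toward zero
    if not (-(2 ** (intsize - 1)) <= truncated <= 2 ** (intsize - 1) - 1):
        return most_negative, {"InvalidOp"}       # does not fit the integer size
    result = math.copysign(float(truncated), value)   # -0.3 becomes -0.0
    flags = {"Inexact"} if result != value else set()
    return result, flags
```

With intsize set to 32, an input of 3e9 is out of range and produces -2147483648.0 with the Invalid Operation flag, while 2.7 produces 2.0 with the Inexact flag.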


FRINT32Z (vector)

Floating-point Round to 32-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in
the SIMD&FP source register to integral floating-point values that fit into a 32-bit integer size using the Round
towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When one of the result values is not numerically equal to the
corresponding input value, an Inexact exception is raised. When an input is infinite, NaN or out-of-range, the
instruction returns for the corresponding result value the most negative integer representable in the destination size,
and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Vector single-precision and double-precision

(FEAT_FRINTTS)

Bits    31  30  29     28-24  23  22  21-17  16-13  12      11-10  9-5  4-0
Value   0   Q   0 (U)  01110  0   sz  10000  1111   0 (op)  10     Rn   Rd

FRINT32Z <Vd>.<T>, <Vn>.<T>

if !HaveFrintExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

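if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
integer intsize = if op == '0' then 32 else 64;
FPRounding rounding = if U == '0' then FPRounding_ZERO else FPRoundingMode(FPCR[]);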

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundIntN(element, FPCR[], rounding, intsize);

V[d] = result;
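
The element loop above reads and writes fixed-width lanes of the register. A minimal Python sketch of that lane access, with the register held as an integer and element 0 in the least significant bits, might look as follows. The helper names elem_get, elem_set, and map_lanes are hypothetical and are not part of the Arm pseudocode library.

```python
def elem_get(vector, e, esize):
    """Return element e (esize bits wide) of a register held as an int."""
    return (vector >> (e * esize)) & ((1 << esize) - 1)

def elem_set(vector, e, esize, value):
    """Return a copy of vector with element e replaced by value."""
    mask = ((1 << esize) - 1) << (e * esize)
    return (vector & ~mask) | ((value << (e * esize)) & mask)

def map_lanes(operand, datasize, esize, fn):
    """Apply fn to every lane of operand, mirroring the 'for e' loop."""
    result = 0
    for e in range(datasize // esize):
        result = elem_set(result, e, esize, fn(elem_get(operand, e, esize)))
    return result
```

Here fn stands in for the per-element rounding step and receives the raw bit pattern of each lane.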


FRINT64X (scalar)

Floating-point Round to 64-bit Integer, using current rounding mode (scalar). This instruction rounds a floating-point
value in the SIMD&FP source register to an integral floating-point value that fits into a 64-bit integer size using the
rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the input
value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the instruction returns the
most negative integer representable in the destination size, and an Invalid Operation
floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Floating-point
(FEAT_FRINTTS)

Bits    31-24     23-22       21  20-17  16-15    14-10  9-5  4-0
Value   00011110  0x (ftype)  1   0100   11 (op)  10000  Rn   Rd

Single-precision (ftype == 00)

FRINT64X <Sd>, <Sn>

Double-precision (ftype == 01)

FRINT64X <Dd>, <Dn>

if !HaveFrintExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '1x' UNDEFINED;

FPRounding rounding = FPRoundingMode(FPCR[]);

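Assembler Symbols

<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.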

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundIntN(operand, fpcr, rounding, 64);

V[d] = result;
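
The scalar write-back above either merges the new element into the existing destination register or zeroes the upper bits, depending on the value returned by IsMerging. A minimal Python sketch of that step is shown here, with the 128-bit register held as an integer. The function name write_scalar_result is hypothetical, and the merge decision is passed in as a plain boolean rather than derived from FPCR.

```python
REGISTER_MASK = (1 << 128) - 1   # SIMD&FP registers are 128 bits wide

def write_scalar_result(old_vd, element, esize, merge):
    """Return the new 128-bit register value after writing element 0."""
    base = old_vd if merge else 0          # keep or clear the upper bits
    low_mask = (1 << esize) - 1
    return ((base & ~low_mask) & REGISTER_MASK) | (element & low_mask)
```

When merge is false, every bit above the element is zero in the result; when merge is true, those bits keep their previous values.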


FRINT64X (vector)

Floating-point Round to 64-bit Integer, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size
using the rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When one of the result values is not numerically equal to the
corresponding input value, an Inexact exception is raised. When an input is infinite, NaN or out-of-range, the
instruction returns for the corresponding result value the most negative integer representable in the destination size,
and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Vector single-precision and double-precision

(FEAT_FRINTTS)

Bits    31  30  29     28-24  23  22  21-17  16-13  12      11-10  9-5  4-0
Value   0   Q   1 (U)  01110  0   sz  10000  1111   1 (op)  10     Rn   Rd

FRINT64X <Vd>.<T>, <Vn>.<T>

if !HaveFrintExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

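if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
integer intsize = if op == '0' then 32 else 64;
FPRounding rounding = if U == '0' then FPRounding_ZERO else FPRoundingMode(FPCR[]);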

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundIntN(element, FPCR[], rounding, intsize);

V[d] = result;


FRINT64Z (scalar)

Floating-point Round to 64-bit Integer toward Zero (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value that fits into a 64-bit integer size using the Round towards
Zero rounding mode, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When the result value is not numerically equal to the
input value, an Inexact exception is raised. When the input is infinite, NaN or out-of-range, the
instruction returns the most negative integer representable in the destination
size, and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Floating-point
(FEAT_FRINTTS)

Bits    31-24     23-22       21  20-17  16-15    14-10  9-5  4-0
Value   00011110  0x (ftype)  1   0100   10 (op)  10000  Rn   Rd

Single-precision (ftype == 00)

FRINT64Z <Sd>, <Sn>

Double-precision (ftype == 01)

FRINT64Z <Dd>, <Dn>

if !HaveFrintExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '1x' UNDEFINED;

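Assembler Symbols

<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.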

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundIntN(operand, fpcr, FPRounding_ZERO, 64);

V[d] = result;
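
As an illustration of the scalar encoding diagram, the following Python sketch assembles a 32-bit instruction word from its fields. It is a sketch under stated assumptions rather than a reference assembler: the function name encode_frint_scalar is hypothetical, and op is passed as the raw two-bit field value shown in the diagram ('10' for FRINT64Z).

```python
def encode_frint_scalar(op, ftype, rn, rd):
    """Build the instruction word for the scalar FRINT32/FRINT64 encoding class."""
    if ftype not in (0b00, 0b01):
        raise ValueError("UNDEFINED: ftype == '1x'")
    if not (0 <= op < 4 and 0 <= rn < 32 and 0 <= rd < 32):
        raise ValueError("field value out of range")
    word = 0b00011110 << 24    # bits 31-24, fixed
    word |= ftype << 22        # bits 23-22, 00 = single, 01 = double
    word |= 1 << 21            # bit 21, fixed
    word |= 0b0100 << 17       # bits 20-17, fixed
    word |= op << 15           # bits 16-15, selects the FRINT32/64 variant
    word |= 0b10000 << 10      # bits 14-10, fixed
    word |= (rn << 5) | rd     # source and destination register numbers
    return word
```

Under these assumptions, FRINT64Z S0, S1 corresponds to encode_frint_scalar(0b10, 0b00, 1, 0).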


FRINT64Z (vector)

Floating-point Round to 64-bit Integer toward Zero (vector). This instruction rounds a vector of floating-point values in
the SIMD&FP source register to integral floating-point values that fit into a 64-bit integer size using the Round
towards Zero rounding mode, and writes the result to the SIMD&FP destination register.
A zero input returns a zero result with the same sign. When one of the result values is not numerically equal to the
corresponding input value, an Inexact exception is raised. When an input is infinite, NaN or out-of-range, the
instruction returns for the corresponding result value the most negative integer representable in the destination size,
and an Invalid Operation floating-point exception is raised.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Vector single-precision and double-precision

(FEAT_FRINTTS)

Bits    31  30  29     28-24  23  22  21-17  16-13  12      11-10  9-5  4-0
Value   0   Q   0 (U)  01110  0   sz  10000  1111   1 (op)  10     Rn   Rd

FRINT64Z <Vd>.<T>, <Vn>.<T>

if !HaveFrintExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

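if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
integer intsize = if op == '0' then 32 else 64;
FPRounding rounding = if U == '0' then FPRounding_ZERO else FPRoundingMode(FPCR[]);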

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundIntN(element, FPCR[], rounding, intsize);

V[d] = result;


FRINTA (scalar)

Floating-point Round to Integral, to nearest with ties to Away (scalar). This instruction rounds a floating-point value in
the SIMD&FP source register to an integral floating-point value of the same size using the Round to Nearest with Ties
to Away rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Bits    31-24     23-22  21  20-18  17-15        14-10  9-5  4-0
Value   00011110  ftype  1   001    100 (rmode)  10000  Rn   Rd

Half-precision (ftype == 11)

(FEAT_FP16)

FRINTA <Hd>, <Hn>

Single-precision (ftype == 00)

FRINTA <Sd>, <Sn>

Double-precision (ftype == 01)

FRINTA <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

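Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.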

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundInt(operand, fpcr, FPRounding_TIEAWAY, FALSE);

V[d] = result;
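
Round to Nearest with Ties to Away can be illustrated with a short Python sketch on double-precision values. It is a simplified model, not the architectural FPRoundInt function: the name round_ties_away is hypothetical, NaN inputs are returned unchanged rather than processed as for normal arithmetic, and no exception flags are modelled.

```python
import math

def round_ties_away(x):
    """Round x to an integral float, with ties going away from zero."""
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        return x                             # zero and infinity keep their sign
    frac, whole = math.modf(abs(x))          # both parts are exact for finite floats
    if frac >= 0.5:
        whole += 1.0                         # a tie or above rounds away from zero
    return math.copysign(whole, x)           # restore the sign, including -0.0
```

Splitting the magnitude with math.modf avoids the rounding error that adding 0.5 before truncation can introduce for inputs just below one half.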


FRINTA (vector)

Floating-point Round to Integral, to nearest with ties to Away (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the
Round to Nearest with Ties to Away rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

Bits    31  30  29     28-24  23      22-17   16-13  12      11-10  9-5  4-0
Value   0   Q   1 (U)  01110  0 (o2)  111100  1100   0 (o1)  10     Rn   Rd

FRINTA <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean exact = FALSE;

FPRounding rounding;
case U:o1:o2 of
    when '0xx' rounding = FPDecodeRounding(o1:o2);
    when '100' rounding = FPRounding_TIEAWAY;
    when '101' UNDEFINED;
    when '110' rounding = FPRoundingMode(FPCR[]); exact = TRUE;
    when '111' rounding = FPRoundingMode(FPCR[]);

Single-precision and double-precision

Bits    31  30  29     28-24  23      22  21-17  16-13  12      11-10  9-5  4-0
Value   0   Q   1 (U)  01110  0 (o2)  sz  10000  1100   0 (o1)  10     Rn   Rd

FRINTA <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

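boolean exact = FALSE;

FPRounding rounding;
case U:o1:o2 of
    when '0xx' rounding = FPDecodeRounding(o1:o2);
    when '100' rounding = FPRounding_TIEAWAY;
    when '101' UNDEFINED;
    when '110' rounding = FPRoundingMode(FPCR[]); exact = TRUE;
    when '111' rounding = FPRoundingMode(FPCR[]);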

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);

V[d] = result;
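
The "case U:o1:o2 of" decode used by the vector round-to-integral encodings can be sketched in Python as follows. This is an illustrative model only; the function name decode_vector_rounding is hypothetical, the rounding modes are returned as short strings, and "FPCR" stands for the mode returned by FPRoundingMode(FPCR[]).

```python
# Two-bit encodings accepted by FPDecodeRounding, indexed by o1:o2.
_DECODED = ("TIEEVEN", "POSINF", "NEGINF", "ZERO")

def decode_vector_rounding(u, o1, o2):
    """Return (rounding, exact) for the U:o1:o2 bits of the encoding."""
    if u == 0:
        return _DECODED[(o1 << 1) | o2], False     # '0xx'
    if (o1, o2) == (0, 0):
        return "TIEAWAY", False                    # '100', as used by FRINTA
    if (o1, o2) == (0, 1):
        raise ValueError("UNDEFINED: U:o1:o2 == '101'")
    if (o1, o2) == (1, 0):
        return "FPCR", True                        # '110', sets exact = TRUE
    return "FPCR", False                           # '111'
```

For the FRINTA encodings shown above, U is 1 and both o1 and o2 are 0, which selects the ties-to-away mode.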


FRINTI (scalar)

Floating-point Round to Integral, using current rounding mode (scalar). This instruction rounds a floating-point value
in the SIMD&FP source register to an integral floating-point value of the same size using the rounding mode that is
determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Bits    31-24     23-22  21  20-18  17-15        14-10  9-5  4-0
Value   00011110  ftype  1   001    111 (rmode)  10000  Rn   Rd

Half-precision (ftype == 11)

(FEAT_FP16)

FRINTI <Hd>, <Hn>

Single-precision (ftype == 00)

FRINTI <Sd>, <Sn>

Double-precision (ftype == 01)

FRINTI <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

FPRounding rounding;
rounding = FPRoundingMode(FPCR[]);

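Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.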

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundInt(operand, fpcr, rounding, FALSE);

V[d] = result;
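
FRINTI takes its rounding mode from the FPCR. As a rough illustration, the Python sketch below extracts the RMode field, assumed here to occupy FPCR bits [23:22], and maps it to a rounding mode name. The function name fp_rounding_mode and the tuple of names are constructs of this example, not the architectural FPRoundingMode function.

```python
# Mode names indexed by the two-bit RMode value (00, 01, 10, 11).
_RMODE_NAMES = (
    "FPRounding_TIEEVEN",   # 00: round to nearest, ties to even
    "FPRounding_POSINF",    # 01: round towards plus infinity
    "FPRounding_NEGINF",    # 10: round towards minus infinity
    "FPRounding_ZERO",      # 11: round towards zero
)

def fp_rounding_mode(fpcr):
    """Return the rounding mode name selected by FPCR.RMode."""
    return _RMODE_NAMES[(fpcr >> 22) & 0b11]
```

An FPCR value of zero therefore selects round to nearest with ties to even.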


FRINTI (vector)

Floating-point Round to Integral, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the
rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

Bits    31  30  29     28-24  23      22-17   16-13  12      11-10  9-5  4-0
Value   0   Q   1 (U)  01110  1 (o2)  111100  1100   1 (o1)  10     Rn   Rd

FRINTI <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

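boolean exact = FALSE;

FPRounding rounding;
case U:o1:o2 of
    when '0xx' rounding = FPDecodeRounding(o1:o2);
    when '100' rounding = FPRounding_TIEAWAY;
    when '101' UNDEFINED;
    when '110' rounding = FPRoundingMode(FPCR[]); exact = TRUE;
    when '111' rounding = FPRoundingMode(FPCR[]);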

Single-precision and double-precision

Bits    31  30  29     28-24  23      22  21-17  16-13  12      11-10  9-5  4-0
Value   0   Q   1 (U)  01110  1 (o2)  sz  10000  1100   1 (o1)  10     Rn   Rd

FRINTI <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

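boolean exact = FALSE;

FPRounding rounding;
case U:o1:o2 of
    when '0xx' rounding = FPDecodeRounding(o1:o2);
    when '100' rounding = FPRounding_TIEAWAY;
    when '101' UNDEFINED;
    when '110' rounding = FPRoundingMode(FPCR[]); exact = TRUE;
    when '111' rounding = FPRoundingMode(FPCR[]);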

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);

V[d] = result;


FRINTM (scalar)

Floating-point Round to Integral, toward Minus infinity (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value of the same size using the Round towards Minus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Bits    31-24     23-22  21  20-18  17-15        14-10  9-5  4-0
Value   00011110  ftype  1   001    010 (rmode)  10000  Rn   Rd

Half-precision (ftype == 11)

(FEAT_FP16)

FRINTM <Hd>, <Hn>

Single-precision (ftype == 00)

FRINTM <Sd>, <Sn>

Double-precision (ftype == 01)

FRINTM <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

FPRounding rounding;
rounding = FPDecodeRounding('10');

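Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.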

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundInt(operand, fpcr, rounding, FALSE);

V[d] = result;
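
Rounding towards Minus Infinity corresponds to the mathematical floor function. A minimal Python sketch on double-precision values is given below. The name frintm is used only for this example, NaN inputs are returned unchanged rather than processed as for normal arithmetic, and exception flags are not modelled.

```python
import math

def frintm(x):
    """Round x to an integral float towards minus infinity."""
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        return x                    # zero and infinity keep their sign
    # math.floor returns an exact int; converting back to float is exact here.
    return float(math.floor(x))
```

A negative non-integral input such as -2.1 rounds to -3.0, while a positive input below 1.0 rounds to +0.0.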


FRINTM (vector)

Floating-point Round to Integral, toward Minus infinity (vector). This instruction rounds a vector of floating-point
values in the SIMD&FP source register to integral floating-point values of the same size using the Round towards
Minus Infinity rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

Bits    31  30  29     28-24  23      22-17   16-13  12      11-10  9-5  4-0
Value   0   Q   0 (U)  01110  0 (o2)  111100  1100   1 (o1)  10     Rn   Rd

FRINTM <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

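boolean exact = FALSE;

FPRounding rounding;
case U:o1:o2 of
    when '0xx' rounding = FPDecodeRounding(o1:o2);
    when '100' rounding = FPRounding_TIEAWAY;
    when '101' UNDEFINED;
    when '110' rounding = FPRoundingMode(FPCR[]); exact = TRUE;
    when '111' rounding = FPRoundingMode(FPCR[]);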

Single-precision and double-precision

Bits    31  30  29     28-24  23      22  21-17  16-13  12      11-10  9-5  4-0
Value   0   Q   0 (U)  01110  0 (o2)  sz  10000  1100   1 (o1)  10     Rn   Rd

FRINTM <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

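boolean exact = FALSE;

FPRounding rounding;
case U:o1:o2 of
    when '0xx' rounding = FPDecodeRounding(o1:o2);
    when '100' rounding = FPRounding_TIEAWAY;
    when '101' UNDEFINED;
    when '110' rounding = FPRoundingMode(FPCR[]); exact = TRUE;
    when '111' rounding = FPRoundingMode(FPCR[]);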

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);

V[d] = result;
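
The per-element operation above can be modelled on packed floating-point data with Python's struct module, which supports half, single, and double precision through the format codes "e", "f", and "d". The sketch below is illustrative only: the function name frintm_vector is hypothetical, the register is represented as little-endian bytes, and exception flags and NaN processing are not modelled.

```python
import math
import struct

_LANE_FORMAT = {16: "e", 32: "f", 64: "d"}   # struct codes for each element size

def frintm_vector(register, esize):
    """Round every lane of a packed register towards minus infinity."""
    lanes = len(register) * 8 // esize
    fmt = "<%d%s" % (lanes, _LANE_FORMAT[esize])
    rounded = []
    for value in struct.unpack(fmt, register):
        if math.isnan(value) or math.isinf(value):
            rounded.append(value)
        else:
            # copysign keeps -0.0 as -0.0 after the integer round trip.
            rounded.append(math.copysign(float(math.floor(value)), value))
    return struct.pack(fmt, *rounded)
```

An 8-byte input with esize 16 models the 4H arrangement, and a 16-byte input with esize 64 models 2D.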


FRINTN (scalar)

Floating-point Round to Integral, to nearest with ties to even (scalar). This instruction rounds a floating-point value in
the SIMD&FP source register to an integral floating-point value of the same size using the Round to Nearest rounding
mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Bits    31-24     23-22  21  20-18  17-15        14-10  9-5  4-0
Value   00011110  ftype  1   001    000 (rmode)  10000  Rn   Rd

Half-precision (ftype == 11)

(FEAT_FP16)

FRINTN <Hd>, <Hn>

Single-precision (ftype == 00)

FRINTN <Sd>, <Sn>

Double-precision (ftype == 01)

FRINTN <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

FPRounding rounding;
rounding = FPDecodeRounding('00');

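Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.

<Dn> Is the 64-bit name of the SIMD&FP source register, encoded in the "Rn" field.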

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundInt(operand, fpcr, rounding, FALSE);

V[d] = result;
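
Python's built-in round function also rounds to nearest with ties to even when called with a single float argument, so it can be used to illustrate this rounding mode. The sketch below is a simplified model: the name frintn is used only for this example, NaN inputs are returned unchanged, and exception flags are not modelled.

```python
import math

def frintn(x):
    """Round x to an integral float, with ties going to the even integer."""
    if math.isnan(x) or math.isinf(x):
        return x                    # round() would raise for these inputs
    # round() returns an int; copysign restores the sign of a zero result.
    return math.copysign(float(round(x)), x)
```

Both 1.5 and 2.5 round to 2.0 under this mode, and -0.5 rounds to -0.0.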


FRINTN (vector)

Floating-point Round to Integral, to nearest with ties to even (vector). This instruction rounds a vector of floating-point
values in the SIMD&FP source register to integral floating-point values of the same size using the Round to Nearest
rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

Bits    31  30  29     28-24  23      22-17   16-13  12      11-10  9-5  4-0
Value   0   Q   0 (U)  01110  0 (o2)  111100  1100   0 (o1)  10     Rn   Rd

FRINTN <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

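boolean exact = FALSE;

FPRounding rounding;
case U:o1:o2 of
    when '0xx' rounding = FPDecodeRounding(o1:o2);
    when '100' rounding = FPRounding_TIEAWAY;
    when '101' UNDEFINED;
    when '110' rounding = FPRoundingMode(FPCR[]); exact = TRUE;
    when '111' rounding = FPRoundingMode(FPCR[]);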

Single-precision and double-precision

Bits    31  30  29     28-24  23      22  21-17  16-13  12      11-10  9-5  4-0
Value   0   Q   0 (U)  01110  0 (o2)  sz  10000  1100   0 (o1)  10     Rn   Rd

FRINTN <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

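boolean exact = FALSE;

FPRounding rounding;
case U:o1:o2 of
    when '0xx' rounding = FPDecodeRounding(o1:o2);
    when '100' rounding = FPRounding_TIEAWAY;
    when '101' UNDEFINED;
    when '110' rounding = FPRoundingMode(FPCR[]); exact = TRUE;
    when '111' rounding = FPRoundingMode(FPCR[]);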

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);

V[d] = result;
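
The single-precision and double-precision decode above maps the "sz" and "Q" fields to an element size, a vector size, and an arrangement specifier. A minimal Python sketch of that mapping follows. The function name `decode_arrangement` is hypothetical and is not part of the architecture pseudocode.

```python
def decode_arrangement(sz: int, q: int) -> tuple[int, int, int, str]:
    """Return (esize, datasize, elements, arrangement) for an sz:Q pair.

    Follows the decode pseudocode: sz:Q == '10' is UNDEFINED (RESERVED in
    the <T> table), esize = 32 << UInt(sz), datasize = 128 if Q == '1' else 64.
    """
    if sz not in (0, 1) or q not in (0, 1):
        raise ValueError("sz and Q are single-bit fields")
    if (sz, q) == (1, 0):
        raise ValueError("sz:Q == '10' is UNDEFINED")
    esize = 32 << sz
    datasize = 128 if q == 1 else 64
    elements = datasize // esize
    suffix = "S" if esize == 32 else "D"
    return esize, datasize, elements, f"{elements}{suffix}"
```

For example, `decode_arrangement(0, 1)` describes the 4S arrangement: four 32-bit elements in a 128-bit vector.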

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

FRINTP (scalar)

Floating-point Round to Integral, toward Plus infinity (scalar). This instruction rounds a floating-point value in the
SIMD&FP source register to an integral floating-point value of the same size using the Round towards Plus Infinity
rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 0 0 1 1 0 0 0 0 Rn Rd
rmode

Half-precision (ftype == 11)

(FEAT_FP16)

FRINTP <Hd>, <Hn>

Single-precision (ftype == 00)

FRINTP <Sd>, <Sn>

Double-precision (ftype == 01)

FRINTP <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

FPRounding rounding;
rounding = FPDecodeRounding('01');

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundInt(operand, fpcr, rounding, FALSE);

V[d] = result;
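
The numeric behavior described for FRINTP (round toward plus infinity, with the sign of a zero result preserved) can be illustrated in Python. This is a sketch for double-precision values only. The function name `frintp` is hypothetical, NaN handling is simplified to returning the input unchanged, and floating-point exception flags are not modelled.

```python
import math


def frintp(x: float) -> float:
    """Round x to an integral float toward plus infinity."""
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        # Zeros and infinities keep their sign; NaN processing is simplified.
        return x
    # math.ceil returns an int, which cannot represent -0.0, so the sign of
    # the input is copied back onto the result for inputs in (-1, 0).
    return math.copysign(float(math.ceil(x)), x)
```

An input of -0.5 rounds to -0.0 rather than +0.0, because the zero result takes the sign of the input.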

FRINTP (vector)

Floating-point Round to Integral, toward Plus infinity (vector). This instruction rounds a vector of floating-point values
in the SIMD&FP source register to integral floating-point values of the same size using the Round towards Plus
Infinity rounding mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1

FRINTP <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean exact = FALSE;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 0 0 1 0 Rn Rd
U o2 o1

FRINTP <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean exact = FALSE;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);

V[d] = result;

FRINTX (scalar)

Floating-point Round to Integral exact, using current rounding mode (scalar). This instruction rounds a floating-point
value in the SIMD&FP source register to an integral floating-point value of the same size using the rounding mode
that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
When the result value is not numerically equal to the input value, an Inexact exception is raised. A zero input gives a
zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as
for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 1 1 0 1 0 0 0 0 Rn Rd
rmode

Half-precision (ftype == 11)

(FEAT_FP16)

FRINTX <Hd>, <Hn>

Single-precision (ftype == 00)

FRINTX <Sd>, <Sn>

Double-precision (ftype == 01)

FRINTX <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

FPRounding rounding;
rounding = FPRoundingMode(FPCR[]);

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundInt(operand, fpcr, rounding, TRUE);

V[d] = result;
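
FRINTX differs from the other round-to-integral instructions in that it reports inexact results (the final argument to FPRoundInt is TRUE). The following Python sketch illustrates this for double-precision values, assuming that FPCR selects Round to Nearest with ties to even. The function name `frintx_nearest_even` is hypothetical, and the returned boolean stands in for the Inexact exception rather than modelling FPSR.

```python
import math


def frintx_nearest_even(x: float) -> tuple[float, bool]:
    """Return (rounded value, inexact) under round-to-nearest, ties-to-even."""
    if math.isnan(x) or math.isinf(x) or x == 0.0:
        # Zeros and infinities are returned with their sign; NaN processing
        # is simplified to returning the input.
        return x, False
    # Python's built-in round() rounds halfway cases to the even integer.
    rounded = math.copysign(float(round(x)), x)
    # Inexact is signalled when the result differs numerically from the input.
    return rounded, rounded != x
```

An input that is already integral, such as 4.0, is returned unchanged with the inexact indication clear.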

FRINTX (vector)

Floating-point Round to Integral exact, using current rounding mode (vector). This instruction rounds a vector of
floating-point values in the SIMD&FP source register to integral floating-point values of the same size using the
rounding mode that is determined by the FPCR, and writes the result to the SIMD&FP destination register.
When a result value is not numerically equal to the corresponding input value, an Inexact exception is raised. A zero
input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is
propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 1 1 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1

FRINTX <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

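boolean exact = FALSE;
FPRounding rounding;
case U:o1:o2 of
    when '0xx' rounding = FPDecodeRounding(o1:o2);
    when '100' rounding = FPRounding_TIEAWAY;
    when '110' rounding = FPRoundingMode(FPCR[]); exact = TRUE;
    when '111' rounding = FPRoundingMode(FPCR[]);
    otherwise  UNDEFINED;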

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1

FRINTX <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

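boolean exact = FALSE;
FPRounding rounding;
case U:o1:o2 of
    when '0xx' rounding = FPDecodeRounding(o1:o2);
    when '100' rounding = FPRounding_TIEAWAY;
    when '110' rounding = FPRoundingMode(FPCR[]); exact = TRUE;
    when '111' rounding = FPRoundingMode(FPCR[]);
    otherwise  UNDEFINED;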

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);

V[d] = result;

FRINTZ (scalar)

Floating-point Round to Integral, toward Zero (scalar). This instruction rounds a floating-point value in the SIMD&FP
source register to an integral floating-point value of the same size using the Round towards Zero rounding mode, and
writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 1 0 1 1 1 0 0 0 0 Rn Rd
rmode

Half-precision (ftype == 11)

(FEAT_FP16)

FRINTZ <Hd>, <Hn>

Single-precision (ftype == 00)

FRINTZ <Sd>, <Sn>

Double-precision (ftype == 01)

FRINTZ <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

FPRounding rounding;
rounding = FPDecodeRounding('11');

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();
bits(esize) operand = V[n];

Elem[result, 0, esize] = FPRoundInt(operand, fpcr, rounding, FALSE);

V[d] = result;
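
The encoding diagram for FRINTZ (scalar) can be turned into a small instruction-word builder. The sketch below assembles the fixed bits, "ftype", "Rn" and "Rd" exactly as laid out in the diagram. The function name `encode_frintz_scalar` and the `FTYPE` table are hypothetical helpers, and the FEAT_FP16 check for the half-precision variant is not modelled.

```python
# ftype values from the variant headings: single '00', double '01', half '11'.
FTYPE = {"S": 0b00, "D": 0b01, "H": 0b11}


def encode_frintz_scalar(width: str, rd: int, rn: int) -> int:
    """Build the 32-bit word for FRINTZ <width>d, <width>n."""
    if width not in FTYPE:
        raise ValueError("width must be 'H', 'S' or 'D'")
    if not (0 <= rd <= 31 and 0 <= rn <= 31):
        raise ValueError("register numbers are 5-bit fields")
    word = 0b00011110 << 24        # bits 31:24
    word |= FTYPE[width] << 22     # bits 23:22, ftype
    word |= 0b1 << 21              # bit 21
    word |= 0b001011 << 15         # bits 20:15, ending in '11' for FRINTZ
    word |= 0b10000 << 10          # bits 14:10
    word |= rn << 5                # bits 9:5, Rn
    word |= rd                     # bits 4:0, Rd
    return word
```

With these fields, `encode_frintz_scalar("D", 0, 1)` corresponds to FRINTZ D0, D1.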

FRINTZ (vector)

Floating-point Round to Integral, toward Zero (vector). This instruction rounds a vector of floating-point values in the
SIMD&FP source register to integral floating-point values of the same size using the Round towards Zero rounding
mode, and writes the result to the SIMD&FP destination register.
A zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a
NaN is propagated as for normal arithmetic.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 1 1 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1

FRINTZ <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean exact = FALSE;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 0 0 1 1 0 Rn Rd
U o2 o1

FRINTZ <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean exact = FALSE;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);

V[d] = result;
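
The element loop in the Operation above can be sketched in Python for the 4S arrangement, treating a 128-bit register as 16 little-endian bytes. The function name `frintz_4s` is hypothetical. NaN handling is simplified to passing the lane through, and floating-point exceptions are not modelled.

```python
import math
import struct


def frintz_4s(vector: bytes) -> bytes:
    """Round each of the four 32-bit lanes of a 16-byte vector toward zero."""
    if len(vector) != 16:
        raise ValueError("a 4S vector is 16 bytes")
    lanes = struct.unpack("<4f", vector)
    out = []
    for x in lanes:
        if math.isnan(x) or math.isinf(x):
            out.append(x)  # infinities keep their sign; NaNs are passed through
        else:
            # math.trunc drops the fraction; copysign keeps the sign of zero.
            out.append(math.copysign(float(math.trunc(x)), x))
    return struct.pack("<4f", *out)
```

Each lane is processed independently, which mirrors the `for e = 0 to elements-1` loop.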

FRSQRTE

Floating-point Reciprocal Square Root Estimate. This instruction calculates an approximate reciprocal square root for each
vector element in the source SIMD&FP register, places the result in a vector, and writes the vector to the destination
SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision and Vector single-precision and double-precision

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd

FRSQRTE <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd

FRSQRTE <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd

FRSQRTE <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd

FRSQRTE <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRSqrtEstimate(element, fpcr);

V[d] = result;

FRSQRTS

Floating-point Reciprocal Square Root Step. This instruction multiplies corresponding floating-point values in the
vectors of the two source SIMD&FP registers, subtracts each of the products from 3.0, divides these results by 2.0,
places the results into a vector, and writes the vector to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision and Vector single-precision and double-precision

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 1 0 Rm 0 0 1 1 1 1 Rn Rd

FRSQRTS <Hd>, <Hn>, <Hm>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = esize;
integer elements = 1;

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 1 sz 1 Rm 1 1 1 1 1 1 Rn Rd

FRSQRTS <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 32 << UInt(sz);
integer datasize = esize;
integer elements = 1;

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 1 1 1 1 Rn Rd

FRSQRTS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 1 1 1 1 Rn Rd

FRSQRTS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];

bits(esize) element1;
bits(esize) element2;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPRSqrtStepFused(element1, element2);

V[d] = result;
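
The value that FRSQRTS computes per element, (3.0 - a * b) / 2.0, is the correction factor of a Newton-Raphson iteration for 1/sqrt(d). The Python sketch below shows how that step can refine an initial estimate. The function names `frsqrts` and `refine_rsqrt` are hypothetical, and the arithmetic uses ordinary double-precision operations rather than the fused computation and special-case handling of FPRSqrtStepFused.

```python
def frsqrts(a: float, b: float) -> float:
    """Per-element step described above: (3.0 - a * b) / 2.0."""
    return (3.0 - a * b) / 2.0


def refine_rsqrt(d: float, estimate: float, steps: int = 3) -> float:
    """Refine an estimate of 1/sqrt(d) using Newton-Raphson iterations."""
    x = estimate
    for _ in range(steps):
        # x_next = x * (3 - d * x * x) / 2, with a = d * x and b = x.
        x = x * frsqrts(d * x, x)
    return x
```

Starting from a rough estimate such as 0.4 for d = 4.0, each iteration roughly squares the relative error, so a few steps bring the result close to 0.5.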

FSQRT (scalar)

Floating-point Square Root (scalar). This instruction calculates the square root of the value in the SIMD&FP source
register and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 0 0 0 0 1 1 1 0 0 0 0 Rn Rd
opc

Half-precision (ftype == 11)

(FEAT_FP16)

FSQRT <Hd>, <Hn>

Single-precision (ftype == 00)

FSQRT <Sd>, <Sn>

Double-precision (ftype == 01)

FSQRT <Dd>, <Dn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

bits(esize) operand = V[n];

Elem[result, 0, esize] = FPSqrt(operand, fpcr);

V[d] = result;
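
The Operation above writes only element 0 of the result and takes the remaining bits either from the old value of V[d] (when merging) or from zeros. The Python sketch below models this for the single-precision variant, with 128-bit registers held as Python integers. The function name `fsqrt_scalar_s` is hypothetical, the `merge` argument stands in for IsMerging(fpcr), and exception behavior (for example for negative inputs) is not modelled.

```python
import math
import struct


def fsqrt_scalar_s(vd: int, vn: int, merge: bool) -> int:
    """Model FSQRT <Sd>, <Sn> on 128-bit register values held as ints."""
    mask32 = (1 << 32) - 1
    # bits(esize) operand = V[n]: the low 32 bits, read as a single-precision value.
    operand = struct.unpack("<f", (vn & mask32).to_bytes(4, "little"))[0]
    root = math.sqrt(operand)  # raises ValueError for negative inputs
    element = int.from_bytes(struct.pack("<f", root), "little")
    # result = if merge then V[d] else Zeros(); element 0 is then overwritten.
    result = vd if merge else 0
    return (result & ~mask32) | element
```

Without merging, the upper 96 bits of the destination are cleared. With merging, they keep their previous contents.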

FSQRT (vector)

Floating-point Square Root (vector). This instruction calculates the square root for each vector element in the source
SIMD&FP register, places the result in a vector, and writes the vector to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 1 1 1 1 0 0 1 1 1 1 1 1 0 Rn Rd

FSQRT <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 1 1 1 0 Rn Rd

FSQRT <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPSqrt(element, FPCR[]);

V[d] = result;

FSUB (scalar)

Floating-point Subtract (scalar). This instruction subtracts the floating-point value of the second source SIMD&FP
register from the floating-point value of the first source SIMD&FP register, and writes the result to the destination
SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 1 1 1 1 0 ftype 1 Rm 0 0 1 1 1 0 Rn Rd
op

Half-precision (ftype == 11)

(FEAT_FP16)

FSUB <Hd>, <Hn>, <Hm>

Single-precision (ftype == 00)

FSUB <Sd>, <Sn>, <Sm>

Double-precision (ftype == 01)

FSUB <Dd>, <Dn>, <Dm>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer esize;
case ftype of
    when '00' esize = 32;
    when '01' esize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            esize = 16;
        else
            UNDEFINED;

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) operand1 = V[n];
bits(esize) operand2 = V[m];

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
bits(128) result = if merge then V[n] else Zeros();

Elem[result, 0, esize] = FPSub(operand1, operand2, fpcr);

V[d] = result;

FSUB (vector)

Floating-point Subtract (vector). This instruction subtracts the elements in the vector in the second source SIMD&FP
register, from the corresponding elements in the vector in the first source SIMD&FP register, places each result into
elements of a vector, and writes the vector to the destination SIMD&FP register.
This instruction can generate a floating-point exception. Depending on the settings in FPCR, the exception results in
either a flag being set in FPSR, or a synchronous exception being generated. For more information, see Floating-point
exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Half-precision and Single-precision and double-precision

Half-precision
(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 0 Rm 0 0 0 1 0 1 Rn Rd
U

FSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean abs = (U == '1');

Single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 sz 1 Rm 1 1 0 1 0 1 Rn Rd
U

FSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];

bits(esize) element1;
bits(esize) element2;
bits(esize) diff;
FPCRType fpcr = FPCR[];
bits(datasize) result;

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    diff = FPSub(element1, element2, fpcr);
    Elem[result, e, esize] = if abs then FPAbs(diff) else diff;

V[d] = result;
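
The element-wise subtraction in the Operation above can be sketched in Python for the 2D arrangement, where Python's own double-precision subtraction stands in for FPSub. This assumes round-to-nearest and no flushing of denormals. The function name `fsub_2d` is hypothetical, and the `abs_result` parameter mirrors the `abs` flag, which is FALSE for FSUB because U is 0 in both encodings shown.

```python
import struct


def fsub_2d(vn: bytes, vm: bytes, abs_result: bool = False) -> bytes:
    """Subtract the two 64-bit lanes of vm from the matching lanes of vn."""
    if len(vn) != 16 or len(vm) != 16:
        raise ValueError("a 2D vector is 16 bytes")
    first = struct.unpack("<2d", vn)
    second = struct.unpack("<2d", vm)
    out = []
    for e in range(2):
        diff = first[e] - second[e]
        # Mirrors: if abs then FPAbs(diff) else diff
        out.append(abs(diff) if abs_result else diff)
    return struct.pack("<2d", *out)
```

The second operand is subtracted from the first, so the lane order of the arguments matters.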

INS (element)

Insert vector element from another vector element. This instruction copies the vector element of the source SIMD&FP
register to the specified vector element of the destination SIMD&FP register.
This instruction can insert data into individual elements within a SIMD&FP register without clearing the remaining
bits to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MOV (element).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 0 0 0 imm5 0 imm4 1 Rn Rd

INS <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]

integer d = UInt(Rd);
integer n = UInt(Rn);

integer size = LowestSetBit(imm5);

if size > 3 then UNDEFINED;

integer dst_index = UInt(imm5<4:size+1>);

integer src_index = UInt(imm4<3:size>);
integer idxdsize = if imm4<3> == '1' then 128 else 64;
// imm4<size-1:0> is IGNORED

integer esize = 8 << size;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ts> Is an element size specifier, encoded in “imm5”:

imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

<index1> Is the destination element index encoded in “imm5”:

imm5 <index1>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<index2> Is the source element index encoded in “imm5:imm4”:

imm5 <index2>
x0000 RESERVED
xxxx1 imm4<3:0>
xxx10 imm4<3:1>
xx100 imm4<3:2>
x1000 imm4<3>
Unspecified bits in "imm4" are ignored but should be set to zero by an assembler.

Operation

CheckFPAdvSIMDEnabled64();
bits(idxdsize) operand = V[n];
bits(128) result;

result = V[d];
Elem[result, dst_index, esize] = Elem[operand, src_index, esize];
V[d] = result;
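
The decode of "imm5" and "imm4" into an element size and two indexes can be sketched in Python as follows. The function name `decode_ins_element` and the dictionary keys are hypothetical. The bit slicing follows the decode pseudocode and the <Ts>, <index1> and <index2> tables above.

```python
def decode_ins_element(imm5: int, imm4: int) -> dict:
    """Decode the INS (element) immediates into size and index fields."""
    if not (0 <= imm5 < 32 and 0 <= imm4 < 16):
        raise ValueError("imm5 is a 5-bit field and imm4 is a 4-bit field")
    if imm5 & 0b1111 == 0:
        # imm5 == 'x0000' gives LowestSetBit(imm5) > 3, which is UNDEFINED.
        raise ValueError("imm5 == 'x0000' is UNDEFINED")
    size = (imm5 & -imm5).bit_length() - 1    # LowestSetBit(imm5)
    return {
        "ts": "BHSD"[size],                   # element size specifier <Ts>
        "esize": 8 << size,
        "dst_index": imm5 >> (size + 1),      # UInt(imm5<4:size+1>)
        "src_index": imm4 >> size,            # UInt(imm4<3:size>); low bits ignored
    }
```

For example, imm5 = '01100' with imm4 = '1000' decodes as the operands Vd.S[1] and Vn.S[2].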

Operational information

INS (general)

Insert vector element from general-purpose register. This instruction copies the contents of the source general-
purpose register to the specified vector element in the destination SIMD&FP register.
This instruction can insert data into individual elements within a SIMD&FP register without clearing the remaining
bits to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MOV (from general).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 0 imm5 0 0 0 1 1 1 Rn Rd

INS <Vd>.<Ts>[<index>], <R><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer size = LowestSetBit(imm5);

if size > 3 then UNDEFINED;

integer index = UInt(imm5<4:size+1>);

integer esize = 8 << size;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ts> Is an element size specifier, encoded in “imm5”:

imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

<index> Is the element index encoded in “imm5”:

imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>

<R> Is the width specifier for the general-purpose source register, encoded in “imm5”:

imm5 <R>
x0000 RESERVED
xxxx1 W
xxx10 W
xx100 W
x1000 X

<n> Is the number [0-30] of the general-purpose source register or ZR (31), encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(esize) element = X[n];
bits(128) result;

result = V[d];
Elem[result, index, esize] = element;
V[d] = result;
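The imm5 tables and the operation above can be sketched in Python as follows. The names `decode_ins_general` and `ins_general` are hypothetical, and registers are modelled as Python integers purely for illustration.

```python
def decode_ins_general(imm5):
    """Return (Ts, index, R) for an INS (general) imm5 field."""
    if imm5 & 0b01111 == 0:
        raise ValueError("RESERVED imm5 encoding")
    size = (imm5 & -imm5).bit_length() - 1   # position of lowest set bit, 0..3
    ts = "BHSD"[size]
    index = imm5 >> (size + 1)               # imm5<4:size+1>
    r = "X" if size == 3 else "W"
    return ts, index, r


def ins_general(vd, xn, imm5):
    """Model INS (general): write the low esize bits of xn into one element of vd."""
    ts, index, _ = decode_ins_general(imm5)
    esize = 8 << "BHSD".index(ts)
    mask = (1 << esize) - 1
    cleared = vd & ~(mask << (index * esize))
    return (cleared | ((xn & mask) << (index * esize))) & ((1 << 128) - 1)
```

In this model, `imm5 = 0b00110` selects `INS <Vd>.H[1], <Wn>`.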

Operational information

LD1 (multiple structures)

Load multiple single-element structures to one, two, three, or four registers. This instruction loads multiple single-
element structures from memory and writes the result to one, two, three, or four SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 x x 1 x size Rn Rt
L opcode

One register (opcode == 0111)

LD1 { <Vt>.<T> }, [<Xn|SP>]

Two registers (opcode == 1010)

LD1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]

Three registers (opcode == 0110)

LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]

Four registers (opcode == 0010)

LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm x x 1 x size Rn Rt
L opcode

One register, immediate offset (Rm == 11111 && opcode == 0111)

LD1 { <Vt>.<T> }, [<Xn|SP>], <imm>

One register, register offset (Rm != 11111 && opcode == 0111)

LD1 { <Vt>.<T> }, [<Xn|SP>], <Xm>

Two registers, immediate offset (Rm == 11111 && opcode == 1010)

LD1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>

Two registers, register offset (Rm != 11111 && opcode == 1010)

LD1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>

Three registers, immediate offset (Rm == 11111 && opcode == 0110)

LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>

Three registers, register offset (Rm != 11111 && opcode == 0110)

LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>

Four registers, immediate offset (Rm == 11111 && opcode == 0010)

LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>

Four registers, register offset (Rm != 11111 && opcode == 0010)

LD1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D

<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm> For the one register, immediate offset variant: is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #8
1 #16

For the two registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #16
1 #32

For the three registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #24
1 #48

For the four registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #32
1 #64

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
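Read together, the four <imm> tables above give an immediate equal to the number of registers transferred multiplied by the register size in bytes (8 when Q is 0, 16 when Q is 1). A minimal Python sketch of that relationship, using the hypothetical name `ld1_post_index_imm`:

```python
def ld1_post_index_imm(num_regs, q):
    """Post-index immediate, in bytes, for LD1 (multiple structures)."""
    if num_regs not in (1, 2, 3, 4) or q not in (0, 1):
        raise ValueError("invalid register count or Q value")
    bytes_per_register = 16 if q else 8
    return num_regs * bytes_per_register
```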

Shared Decode

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << UInt(size);
integer elements = datasize DIV esize;

integer rpt; // number of iterations

integer selem; // structure elements

case opcode of
    when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
    when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
    when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
    when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
    when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
    when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
    when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
    otherwise UNDEFINED;

// .1D format only permitted with LD1 & ST1

if size:Q == '110' && selem != 1 then UNDEFINED;
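The shared decode can be mirrored by a small lookup table. The sketch below is illustrative only. `OPCODE_TABLE` and `decode_multiple` are hypothetical names, and the fields are passed as plain integers.

```python
# opcode -> (rpt, selem), following the case statement in the shared decode
OPCODE_TABLE = {
    0b0000: (1, 4),  # LD/ST4 (4 registers)
    0b0010: (4, 1),  # LD/ST1 (4 registers)
    0b0100: (1, 3),  # LD/ST3 (3 registers)
    0b0110: (3, 1),  # LD/ST1 (3 registers)
    0b0111: (1, 1),  # LD/ST1 (1 register)
    0b1000: (1, 2),  # LD/ST2 (2 registers)
    0b1010: (2, 1),  # LD/ST1 (2 registers)
}


def decode_multiple(opcode, size, q):
    """Return (rpt, selem, datasize, esize, elements) or raise for UNDEFINED."""
    if opcode not in OPCODE_TABLE:
        raise ValueError("UNDEFINED opcode")
    rpt, selem = OPCODE_TABLE[opcode]
    # The .1D arrangement (size == 11, Q == 0) is only permitted with LD1 and ST1
    if size == 0b11 and q == 0 and selem != 1:
        raise ValueError("UNDEFINED: .1D requires selem == 1")
    datasize = 128 if q else 64
    esize = 8 << size
    return rpt, selem, datasize, esize, datasize // esize
```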

Operation

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

offs = Zeros();
for r = 0 to rpt-1
    for e = 0 to elements-1
        tt = (t + r) MOD 32;
        for s = 0 to selem-1
            rval = V[tt];
            if memop == MemOp_LOAD then
                Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
                V[tt] = rval;
            else // memop == MemOp_STORE
                Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
            offs = offs + ebytes;
            tt = (tt + 1) MOD 32;

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
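The load path of the loop nest above can be modelled in Python as a rough sketch. It assumes little-endian data, represents memory as a `bytes` object starting at the base address, and returns the registers as a list of integers. The name `load_multiple` is hypothetical.

```python
def load_multiple(mem, rpt, selem, elements, esize):
    """Model the load loops: return (register values, bytes consumed).

    Register r + s in the result corresponds to V[(t + r + s) MOD 32].
    Either rpt or selem is always 1 in the valid encodings.
    """
    ebytes = esize // 8
    regs = [0] * (rpt + selem - 1)
    offs = 0
    for r in range(rpt):
        for e in range(elements):
            for s in range(selem):
                element = int.from_bytes(mem[offs:offs + ebytes], "little")
                regs[r + s] |= element << (e * esize)
                offs += ebytes
    return regs, offs
```

The returned byte count matches the immediate post-index offset for the corresponding variant, for example 16 for the two-register form with Q == 0.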

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LD1 (single structure)

Load one single-element structure to one lane of one register. This instruction loads a single-element structure from
memory and writes the result to the specified lane of the SIMD&FP register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 x x 0 S size Rn Rt
L R opcode

8-bit (opcode == 000)

LD1 { <Vt>.B }[<index>], [<Xn|SP>]

16-bit (opcode == 010 && size == x0)

LD1 { <Vt>.H }[<index>], [<Xn|SP>]

32-bit (opcode == 100 && size == 00)

LD1 { <Vt>.S }[<index>], [<Xn|SP>]

64-bit (opcode == 100 && S == 0 && size == 01)

LD1 { <Vt>.D }[<index>], [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm x x 0 S size Rn Rt
L R opcode

8-bit, immediate offset (Rm == 11111 && opcode == 000)

LD1 { <Vt>.B }[<index>], [<Xn|SP>], #1

8-bit, register offset (Rm != 11111 && opcode == 000)

LD1 { <Vt>.B }[<index>], [<Xn|SP>], <Xm>

16-bit, immediate offset (Rm == 11111 && opcode == 010 && size == x0)

LD1 { <Vt>.H }[<index>], [<Xn|SP>], #2

16-bit, register offset (Rm != 11111 && opcode == 010 && size == x0)

LD1 { <Vt>.H }[<index>], [<Xn|SP>], <Xm>

32-bit, immediate offset (Rm == 11111 && opcode == 100 && size == 00)

LD1 { <Vt>.S }[<index>], [<Xn|SP>], #4

32-bit, register offset (Rm != 11111 && opcode == 100 && size == 00)

LD1 { <Vt>.S }[<index>], [<Xn|SP>], <Xm>

64-bit, immediate offset (Rm == 11111 && opcode == 100 && S == 0 && size == 01)

LD1 { <Vt>.D }[<index>], [<Xn|SP>], #8

64-bit, register offset (Rm != 11111 && opcode == 100 && S == 0 && size == 01)

LD1 { <Vt>.D }[<index>], [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;
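The lane index selection in the shared decode can be sketched in Python. This covers only the non-replicating cases (scale 0 to 2). The function name `decode_lane` is hypothetical, and the fields are passed as plain integers.

```python
def decode_lane(opcode, q, s, size):
    """Return (esize, index) for a single-structure, non-replicating encoding."""
    scale = (opcode >> 1) & 0b11          # opcode<2:1>
    if scale == 0:
        index = (q << 3) | (s << 2) | size            # Q:S:size, B[0-15]
    elif scale == 1:
        if size & 0b01:
            raise ValueError("UNDEFINED: size<0> must be 0")
        index = (q << 2) | (s << 1) | (size >> 1)     # Q:S:size<1>, H[0-7]
    elif scale == 2:
        if size & 0b10:
            raise ValueError("UNDEFINED: size<1> must be 0")
        if (size & 0b01) == 0:
            index = (q << 1) | s                      # Q:S, S[0-3]
        else:
            if s:
                raise ValueError("UNDEFINED: S must be 0")
            index = q                                 # Q, D[0-1]
            scale = 3
    else:
        raise ValueError("scale 3 selects the load-and-replicate form")
    return 8 << scale, index
```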

Operation

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
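The non-replicating load path above can be modelled as follows. This is a sketch under stated assumptions: the register file is a 32-entry Python list of 128-bit integers, memory is a little-endian `bytes` object starting at the base address, and `load_lane` is a hypothetical name.

```python
def load_lane(regs, t, index, esize, selem, mem):
    """Insert selem consecutive memory elements into lane `index` of
    registers t, t+1, ... (modulo 32). Returns the bytes consumed."""
    ebytes = esize // 8
    mask = (1 << esize) - 1
    shift = index * esize
    offs = 0
    for _ in range(selem):
        element = int.from_bytes(mem[offs:offs + ebytes], "little")
        # Only the selected lane changes; other bits of the register are kept
        regs[t] = (regs[t] & ~(mask << shift)) | (element << shift)
        offs += ebytes
        t = (t + 1) % 32
    return offs
```

With `selem` equal to 1 this models the LD1 form. Larger values model the forms that write the same lane of consecutive registers.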

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LD1R

Load one single-element structure and Replicate to all lanes (of one register). This instruction loads a single-element
structure from memory and replicates the structure to all the lanes of the SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 0 0 size Rn Rt
L R opcode S

LD1R { <Vt>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm 1 1 0 0 size Rn Rt
L R opcode S

Immediate offset (Rm == 11111)

LD1R { <Vt>.<T> }, [<Xn|SP>], <imm>

Register offset (Rm != 11111)

LD1R { <Vt>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D

<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm> Is the post-index immediate offset, encoded in “size”:

size <imm>
00 #1
01 #2
10 #4
11 #8

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
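The effect of LD1R can be sketched in Python as reading one element and copying it into every lane. The sketch assumes little-endian memory supplied as a `bytes` object, and `ld1r` is a hypothetical name. The second return value is the element size in bytes, which matches the <imm> table above.

```python
def ld1r(mem, size, q):
    """Return (register value, element bytes) for a load-and-replicate."""
    ebytes = 1 << size                 # 1, 2, 4 or 8 bytes
    esize = 8 * ebytes
    datasize = 128 if q else 64
    element = int.from_bytes(mem[:ebytes], "little")
    value = 0
    for lane in range(datasize // esize):
        value |= element << (lane * esize)
    return value, ebytes
```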

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

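if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;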

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LD2 (multiple structures)

Load multiple 2-element structures to two registers. This instruction loads multiple 2-element structures from memory
and writes the result to the two SIMD&FP registers, with de-interleaving.
For an example of de-interleaving, see LD3 (multiple structures).
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 size Rn Rt
L opcode

LD2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm 1 0 0 0 size Rn Rt
L opcode

Immediate offset (Rm == 11111)

LD2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>

Register offset (Rm != 11111)

LD2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.

<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm> Is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #16
1 #32

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
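The de-interleaving performed by LD2 (multiple structures) can be illustrated at the level of element values. In this minimal Python sketch, `ld2_deinterleave` is a hypothetical name, and the input list stands for consecutive elements in memory order.

```python
def ld2_deinterleave(values):
    """Split [a0, b0, a1, b1, ...] into ([a0, a1, ...], [b0, b1, ...]).

    The first list models the lanes of <Vt> and the second those of <Vt2>,
    with lane 0 first.
    """
    if len(values) % 2:
        raise ValueError("expected a whole number of 2-element structures")
    return values[0::2], values[1::2]
```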

Shared Decode

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << UInt(size);
integer elements = datasize DIV esize;

integer rpt; // number of iterations

integer selem; // structure elements

case opcode of
    when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
    when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
    when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
    when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
    when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
    when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
    when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
    otherwise UNDEFINED;

// .1D format only permitted with LD1 & ST1

if size:Q == '110' && selem != 1 then UNDEFINED;

Operation

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;

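if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

offs = Zeros();
for r = 0 to rpt-1
    for e = 0 to elements-1
        tt = (t + r) MOD 32;
        for s = 0 to selem-1
            rval = V[tt];
            if memop == MemOp_LOAD then
                Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
                V[tt] = rval;
            else // memop == MemOp_STORE
                Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
            offs = offs + ebytes;
            tt = (tt + 1) MOD 32;

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;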

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LD2 (single structure)

Load single 2-element structure to one lane of two registers. This instruction loads a 2-element structure from memory
and writes the result to the corresponding elements of the two SIMD&FP registers without affecting the other bits of
the registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 x x 0 S size Rn Rt
L R opcode

8-bit (opcode == 000)

LD2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>]

16-bit (opcode == 010 && size == x0)

LD2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>]

32-bit (opcode == 100 && size == 00)

LD2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>]

64-bit (opcode == 100 && S == 0 && size == 01)

LD2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm x x 0 S size Rn Rt
L R opcode

8-bit, immediate offset (Rm == 11111 && opcode == 000)

LD2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], #2

8-bit, register offset (Rm != 11111 && opcode == 000)

LD2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], <Xm>

16-bit, immediate offset (Rm == 11111 && opcode == 010 && size == x0)

LD2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], #4

16-bit, register offset (Rm != 11111 && opcode == 010 && size == x0)

LD2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], <Xm>

32-bit, immediate offset (Rm == 11111 && opcode == 100 && size == 00)

LD2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], #8

32-bit, register offset (Rm != 11111 && opcode == 100 && size == 00)

LD2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], <Xm>

64-bit, immediate offset (Rm == 11111 && opcode == 100 && S == 0 && size == 01)

LD2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], #16

64-bit, register offset (Rm != 11111 && opcode == 100 && S == 0 && size == 01)

LD2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

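if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;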

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LD2R

Load single 2-element structure and Replicate to all lanes of two registers. This instruction loads a 2-element structure
from memory and replicates the structure to all the lanes of the two SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 1 0 0 size Rn Rt
L R opcode S

LD2R { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm 1 1 0 0 size Rn Rt
L R opcode S

Immediate offset (Rm == 11111)

LD2R { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>

Register offset (Rm != 11111)

LD2R { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D

<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm> Is the post-index immediate offset, encoded in “size”:

size <imm>
00 #2
01 #4
10 #8
11 #16

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
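The <imm> table above follows from the structure size: two elements of (1 << size) bytes each. A small Python sketch of that arithmetic, using the hypothetical name `replicate_post_index_imm` and generalised over the number of structure elements as an assumption:

```python
def replicate_post_index_imm(selem, size):
    """Bytes read by a load-and-replicate of selem elements of (1 << size) bytes."""
    if selem not in (1, 2, 3, 4) or size not in (0, 1, 2, 3):
        raise ValueError("invalid selem or size")
    return selem << size
```

For LD2R, `selem` is 2, giving #2, #4, #8, and #16 for the four values of size.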

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

case scale of
    when 3
        // load and replicate
        if L == '0' || S == '1' then UNDEFINED;
        scale = UInt(size);
        replicate = TRUE;
    when 0
        index = UInt(Q:S:size); // B[0-15]
    when 1
        if size<0> == '1' then UNDEFINED;
        index = UInt(Q:S:size<1>); // H[0-7]
    when 2
        if size<1> == '1' then UNDEFINED;
        if size<0> == '0' then
            index = UInt(Q:S); // S[0-3]
        else
            if S == '1' then UNDEFINED;
            index = UInt(Q); // D[0-1]
            scale = 3;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

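if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

offs = Zeros();
if replicate then
    // load and replicate to all elements
    for s = 0 to selem-1
        element = Mem[address+offs, ebytes, AccType_VEC];
        // replicate to fill 128- or 64-bit register
        V[t] = Replicate(element, datasize DIV esize);
        offs = offs + ebytes;
        t = (t + 1) MOD 32;
else
    // load/store one element per register
    for s = 0 to selem-1
        rval = V[t];
        if memop == MemOp_LOAD then
            // insert into one lane of 128-bit register
            Elem[rval, index, esize] = Mem[address+offs, ebytes, AccType_VEC];
            V[t] = rval;
        else // memop == MemOp_STORE
            // extract from one lane of 128-bit register
            Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, index, esize];
        offs = offs + ebytes;
        t = (t + 1) MOD 32;

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;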

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LD3 (multiple structures)

Load multiple 3-element structures to three registers. This instruction loads multiple 3-element structures from
memory and writes the result to the three SIMD&FP registers, with de-interleaving.
The following figure shows an example of the de-interleaving performed by an LD3.16 (multiple 3-element structures)
instruction.
[Figure: A is a packed array of 3-element structures in memory, stored in the order A[0].x, A[0].y, A[0].z, A[1].x,
A[1].y, A[1].z, A[2].x, A[2].y, A[2].z, A[3].x, A[3].y, A[3].z. Each element is a 16-bit halfword. After the load,
register D0 holds X3 X2 X1 X0, register D1 holds Y3 Y2 Y1 Y0, and register D2 holds Z3 Z2 Z1 Z0.]
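The LD3 de-interleaving example above, with four structures of 16-bit halfwords, can be reproduced with a short Python sketch. It assumes little-endian memory supplied as a `bytes` object. The name `ld3_4h` is hypothetical, and each returned list gives lane 0 first.

```python
import struct


def ld3_4h(mem):
    """De-interleave four 3-element structures of 16-bit halfwords.

    Returns (d0, d1, d2): the x, y and z components respectively.
    """
    halfwords = struct.unpack("<12H", mem[:24])
    d0 = list(halfwords[0::3])   # A[0].x, A[1].x, A[2].x, A[3].x
    d1 = list(halfwords[1::3])   # A[0].y, A[1].y, A[2].y, A[3].y
    d2 = list(halfwords[2::3])   # A[0].z, A[1].z, A[2].z, A[3].z
    return d0, d1, d2
```

The 24 bytes consumed match the immediate post-index offset of the 64-bit (Q == 0) form.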

Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 size Rn Rt
L opcode

LD3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm 0 1 0 0 size Rn Rt
L opcode

Immediate offset (Rm == 11111)

LD3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>

Register offset (Rm != 11111)

LD3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm> Is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #24
1 #48

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.

Shared Decode

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << UInt(size);
integer elements = datasize DIV esize;

integer rpt; // number of iterations

integer selem; // structure elements

case opcode of
    when '0000' rpt = 1; selem = 4; // LD/ST4 (4 registers)
    when '0010' rpt = 4; selem = 1; // LD/ST1 (4 registers)
    when '0100' rpt = 1; selem = 3; // LD/ST3 (3 registers)
    when '0110' rpt = 3; selem = 1; // LD/ST1 (3 registers)
    when '0111' rpt = 1; selem = 1; // LD/ST1 (1 register)
    when '1000' rpt = 1; selem = 2; // LD/ST2 (2 registers)
    when '1010' rpt = 2; selem = 1; // LD/ST1 (2 registers)
    otherwise UNDEFINED;

// .1D format only permitted with LD1 & ST1

if size:Q == '110' && selem != 1 then UNDEFINED;

Operation

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;

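if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

offs = Zeros();
for r = 0 to rpt-1
    for e = 0 to elements-1
        tt = (t + r) MOD 32;
        for s = 0 to selem-1
            rval = V[tt];
            if memop == MemOp_LOAD then
                Elem[rval, e, esize] = Mem[address+offs, ebytes, AccType_VEC];
                V[tt] = rval;
            else // memop == MemOp_STORE
                Mem[address+offs, ebytes, AccType_VEC] = Elem[rval, e, esize];
            offs = offs + ebytes;
            tt = (tt + 1) MOD 32;

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;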

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LD3 (single structure)

Load single 3-element structure to one lane of three registers. This instruction loads a 3-element structure from
memory and writes the result to the corresponding elements of the three SIMD&FP registers without affecting the
other bits of the registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 x x 1 S size Rn Rt
L R opcode

8-bit (opcode == 001)

LD3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>]

16-bit (opcode == 011 && size == x0)

LD3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>]

32-bit (opcode == 101 && size == 00)

LD3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>]

64-bit (opcode == 101 && S == 0 && size == 01)

LD3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm x x 1 S size Rn Rt
L R opcode

8-bit, immediate offset (Rm == 11111 && opcode == 001)

LD3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], #3

8-bit, register offset (Rm != 11111 && opcode == 001)

LD3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], <Xm>

16-bit, immediate offset (Rm == 11111 && opcode == 011 && size == x0)

LD3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], #6

16-bit, register offset (Rm != 11111 && opcode == 011 && size == x0)

LD3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], <Xm>

32-bit, immediate offset (Rm == 11111 && opcode == 101 && size == 00)

LD3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], #12

32-bit, register offset (Rm != 11111 && opcode == 101 && size == 00)

LD3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], <Xm>

64-bit, immediate offset (Rm == 11111 && opcode == 101 && S == 0 && size == 01)

LD3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], #24

64-bit, register offset (Rm != 11111 && opcode == 101 && S == 0 && size == 01)

LD3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
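
As an illustration of the <index> encodings listed above, the following Python sketch derives the element index from the Q, S and size fields. The function name `lane_index` and its integer-argument interface are assumptions made for this example; they are not part of the architecture pseudocode.

```python
def lane_index(esize: int, q: int, s: int, size: int) -> int:
    """Return the element index for a single-structure load or store.

    esize is the element size in bits (8, 16, 32 or 64). q and s are
    single-bit fields and size is the 2-bit "size" field.
    """
    if esize == 8:
        return (q << 3) | (s << 2) | size          # Q:S:size
    if esize == 16:
        return (q << 2) | (s << 1) | (size >> 1)   # Q:S:size<1>
    if esize == 32:
        return (q << 1) | s                        # Q:S
    if esize == 64:
        return q                                   # Q
    raise ValueError("esize must be 8, 16, 32 or 64")
```

The wider the element, the fewer index bits are needed, which is why the low-order fields drop out of the concatenation one by one.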

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

LD3R

Load single 3-element structure and Replicate to all lanes of three registers. This instruction loads a 3-element
structure from memory and replicates the structure to all the lanes of the three SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 0 0 0 0 0 0 1 1 1 0 size Rn Rt
L R opcode S

LD3R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 0 Rm 1 1 1 0 size Rn Rt
L R opcode S

Immediate offset (Rm == 11111)

LD3R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>

Register offset (Rm != 11111)

LD3R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D

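<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.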

<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm> Is the post-index immediate offset, encoded in “size”:

size <imm>
00 #3
01 #6
10 #12
11 #24

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
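
The values in the <imm> table above are consistent with the number of registers multiplied by the element size in bytes. The Python sketch below checks that relationship. The helper name `ldnr_post_index_imm` and the `selem` parameter (number of structure elements, 3 for LD3R) are assumptions introduced for illustration only.

```python
def ldnr_post_index_imm(selem: int, size: int) -> int:
    """Post-index immediate, in bytes, for a load-and-replicate instruction.

    selem is the number of registers in the list (3 for LD3R) and size is
    the 2-bit "size" field, so each element occupies 1 << size bytes.
    """
    if not 0 <= size <= 3:
        raise ValueError("size is a 2-bit field")
    return selem * (1 << size)


# Reproduce the <imm> column for LD3R: 3, 6, 12, 24.
ld3r_immediates = [ldnr_post_index_imm(3, size) for size in range(4)]
```

In other words, the immediate form advances the base register past exactly the bytes that were read.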

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LD4 (multiple structures)

Load multiple 4-element structures to four registers. This instruction loads multiple 4-element structures from
memory and writes the result to the four SIMD&FP registers, with de-interleaving.
For an example of de-interleaving, see LD3 (multiple structures).
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
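
To make the de-interleaving described above concrete, here is a minimal Python sketch that models memory as a list of elements in address order and the four destination registers as lists. The function name `deinterleave` and the list-based model are assumptions for illustration; real registers hold fixed-width lanes.

```python
def deinterleave(elements, nregs=4):
    """Distribute consecutive nregs-element structures across nregs registers.

    `elements` lists the memory contents in address order. Register r
    receives element r of every structure.
    """
    if len(elements) % nregs != 0:
        raise ValueError("memory must hold a whole number of structures")
    return [elements[r::nregs] for r in range(nregs)]


# Two RGBA pixels stored in memory as R0 G0 B0 A0 R1 G1 B1 A1.
pixels = ["R0", "G0", "B0", "A0", "R1", "G1", "B1", "A1"]
red, green, blue, alpha = deinterleave(pixels)
```

After the call, `red` holds every first component, `green` every second, and so on, which is the layout a per-channel SIMD operation wants.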

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 size Rn Rt
L opcode

LD4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 0 1 1 0 Rm 0 0 0 0 size Rn Rt
L opcode

Immediate offset (Rm == 11111)

LD4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>

Register offset (Rm != 11111)

LD4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.

<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm> Is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #32
1 #64

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.

Shared Decode

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << UInt(size);
integer elements = datasize DIV esize;

integer rpt; // number of iterations

integer selem; // structure elements

// .1D format only permitted with LD1 & ST1

if size:Q == '110' && selem != 1 then UNDEFINED;
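
The decode above can be mirrored in a short Python sketch that maps the "size" and "Q" fields to the arrangement specifier <T> and rejects the reserved combination. The function names and the use of `ValueError` to stand in for UNDEFINED are assumptions made for this example.

```python
def arrangement(size: int, q: int, selem: int) -> str:
    """Return the arrangement specifier, for example "8B" or "4S".

    size is the 2-bit "size" field, q is the Q bit and selem is the number
    of structure elements (4 for LD4).
    """
    if size == 0b11 and q == 0 and selem != 1:
        # Corresponds to: if size:Q == '110' && selem != 1 then UNDEFINED
        raise ValueError("reserved: .1D is only permitted with LD1 and ST1")
    datasize = 128 if q == 1 else 64
    esize = 8 << size
    elements = datasize // esize
    return f"{elements}{'BHSD'[size]}"


def ld4_post_index_imm(q: int) -> int:
    """Bytes transferred by LD4 (multiple structures): four whole registers."""
    datasize = 128 if q == 1 else 64
    return 4 * datasize // 8
```

The second helper reproduces the <imm> table for this instruction: 32 bytes when Q is 0 and 64 bytes when Q is 1.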

Operation

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LD4 (single structure)

Load single 4-element structure to one lane of four registers. This instruction loads a 4-element structure from
memory and writes the result to the corresponding elements of the four SIMD&FP registers without affecting the
other bits of the registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
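
The phrase "without affecting the other bits of the registers" can be illustrated with a small Python sketch that models each register as an unsigned integer and overwrites a single lane. The names `insert_lane` and `ld4_lane` are hypothetical helpers for this example, not architecture pseudocode.

```python
def insert_lane(reg: int, index: int, esize: int, value: int) -> int:
    """Return reg with lane `index` (esize bits wide) replaced by value."""
    shift = index * esize
    mask = ((1 << esize) - 1) << shift
    # Clear the target lane, then merge in the new element.
    return (reg & ~mask) | ((value << shift) & mask)


def ld4_lane(regs, structure, index: int, esize: int):
    """Write one 4-element structure into lane `index` of four registers."""
    return [insert_lane(r, index, esize, v) for r, v in zip(regs, structure)]
```

For example, `insert_lane(0xFFFFFFFF, 1, 8, 0x00)` clears only the second byte and leaves the remaining bits unchanged.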

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 x x 1 S size Rn Rt
L R opcode

8-bit (opcode == 001)

LD4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>]

16-bit (opcode == 011 && size == x0)

LD4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>]

32-bit (opcode == 101 && size == 00)

LD4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>]

64-bit (opcode == 101 && S == 0 && size == 01)

LD4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm x x 1 S size Rn Rt
L R opcode

8-bit, immediate offset (Rm == 11111 && opcode == 001)

LD4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], #4

8-bit, register offset (Rm != 11111 && opcode == 001)

LD4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], <Xm>

16-bit, immediate offset (Rm == 11111 && opcode == 011 && size == x0)

LD4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], #8

16-bit, register offset (Rm != 11111 && opcode == 011 && size == x0)

LD4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], <Xm>

32-bit, immediate offset (Rm == 11111 && opcode == 101 && size == 00)

LD4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], #16

32-bit, register offset (Rm != 11111 && opcode == 101 && size == 00)

LD4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], <Xm>

64-bit, immediate offset (Rm == 11111 && opcode == 101 && S == 0 && size == 01)

LD4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], #32

64-bit, register offset (Rm != 11111 && opcode == 101 && S == 0 && size == 01)

LD4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LD4R

Load single 4-element structure and Replicate to all lanes of four registers. This instruction loads a 4-element
structure from memory and replicates the structure to all the lanes of the four SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index
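
As a rough model of the replication described above, the following Python sketch copies one element into every lane of a register represented as an unsigned integer. The helper names `replicate` and `ld4r` are assumptions for this example, and memory is modeled as a list of already-loaded elements.

```python
def replicate(element: int, esize: int, datasize: int) -> int:
    """Fill a datasize-bit register with copies of an esize-bit element."""
    element &= (1 << esize) - 1
    result = 0
    for lane in range(datasize // esize):
        result |= element << (lane * esize)
    return result


def ld4r(structure, esize: int, datasize: int):
    """Replicate each of the four loaded elements across its own register."""
    return [replicate(e, esize, datasize) for e in structure[:4]]
```

Each of the four registers ends up holding a different element, repeated in all of its lanes.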

No offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 1 1 0 size Rn Rt
L R opcode S

LD4R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 0 1 1 1 1 Rm 1 1 1 0 size Rn Rt
L R opcode S

Immediate offset (Rm == 11111)

LD4R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>

Register offset (Rm != 11111)

LD4R { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D

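<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.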

<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm> Is the post-index immediate offset, encoded in “size”:

size <imm>
00 #4
01 #8
10 #16
11 #32

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDNP (SIMD&FP)

Load Pair of SIMD&FP registers, with Non-temporal hint. This instruction loads a pair of SIMD&FP registers from
memory, issuing a hint to the memory system that the access is non-temporal. The address that is used for the load is
calculated from a base register value and an optional immediate offset.
For information about non-temporal pair instructions, see Load/Store SIMD and Floating-point Non-temporal pair.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 0 0 1 imm7 Rt2 Rn Rt
L

32-bit (opc == 00)

LDNP <St1>, <St2>, [<Xn|SP>{, #<imm>}]

64-bit (opc == 01)

LDNP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]

128-bit (opc == 10)

LDNP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]

// Empty.

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDNP (SIMD&FP).

Assembler Symbols

<Dt1> Is the 64-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt2> Is the 64-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Qt1> Is the 128-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt2> Is the 128-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<St1> Is the 32-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<St2> Is the 32-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit variant: is the optional signed immediate byte offset, a multiple of 4 in the range -256 to
252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit variant: is the optional signed immediate byte offset, a multiple of 8 in the range -512 to
504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
For the 128-bit variant: is the optional signed immediate byte offset, a multiple of 16 in the range -1024 to
1008, defaulting to 0 and encoded in the "imm7" field as <imm>/16.

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer t2 = UInt(Rt2);
if opc == '11' then UNDEFINED;
integer scale = 2 + UInt(opc);
integer datasize = 8 << scale;
bits(64) offset = LSL(SignExtend(imm7, 64), scale);
boolean tag_checked = n != 31;

boolean rt_unknown = FALSE;
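
The offset computation in the decode above, `LSL(SignExtend(imm7, 64), scale)`, can be sketched in Python as follows. The helper names are assumptions for this example, and the result is returned as a signed Python integer rather than as a 64-bit bitstring.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def pair_offset(opc: int, imm7: int) -> int:
    """Byte offset for a SIMD&FP load pair, following the decode above."""
    if opc == 0b11:
        raise ValueError("UNDEFINED: opc == '11'")
    scale = 2 + opc                      # 4-, 8- or 16-byte registers
    return sign_extend(imm7, 7) << scale
```

The extreme values of the 7-bit field give the ranges quoted for the immediate: -256 to 252 for the 32-bit variant and -1024 to 1008 for the 128-bit variant.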

Operation

CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data1 = Mem[address, dbytes, AccType_VECSTREAM];
data2 = Mem[address+dbytes, dbytes, AccType_VECSTREAM];
if rt_unknown then
    data1 = bits(datasize) UNKNOWN;
    data2 = bits(datasize) UNKNOWN;
V[t] = data1;
V[t2] = data2;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDP (SIMD&FP)

Load Pair of SIMD&FP registers. This instruction loads a pair of SIMD&FP registers from memory. The address that is
used for the load is calculated from a base register value and an optional immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 0 1 1 imm7 Rt2 Rn Rt
L

32-bit (opc == 00)

LDP <St1>, <St2>, [<Xn|SP>], #<imm>

64-bit (opc == 01)

LDP <Dt1>, <Dt2>, [<Xn|SP>], #<imm>

128-bit (opc == 10)

LDP <Qt1>, <Qt2>, [<Xn|SP>], #<imm>

boolean wback = TRUE;

boolean postindex = TRUE;

Pre-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 1 1 1 imm7 Rt2 Rn Rt
L

32-bit (opc == 00)

LDP <St1>, <St2>, [<Xn|SP>, #<imm>]!

64-bit (opc == 01)

LDP <Dt1>, <Dt2>, [<Xn|SP>, #<imm>]!

128-bit (opc == 10)

LDP <Qt1>, <Qt2>, [<Xn|SP>, #<imm>]!

boolean wback = TRUE;

boolean postindex = FALSE;

Signed offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 1 0 1 1 0 1 0 1 imm7 Rt2 Rn Rt
L

32-bit (opc == 00)

LDP <St1>, <St2>, [<Xn|SP>{, #<imm>}]

64-bit (opc == 01)

LDP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]

128-bit (opc == 10)

LDP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]

boolean wback = FALSE;

boolean postindex = FALSE;

For information about the CONSTRAINED UNPREDICTABLE behavior of this instruction, see Architectural Constraints on
UNPREDICTABLE behaviors, and particularly LDP (SIMD&FP).

Assembler Symbols

<Dt1> Is the 64-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt2> Is the 64-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Qt1> Is the 128-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt2> Is the 128-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<St1> Is the 32-bit name of the first SIMD&FP register to be transferred, encoded in the "Rt" field.
<St2> Is the 32-bit name of the second SIMD&FP register to be transferred, encoded in the "Rt2" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> For the 32-bit post-index and 32-bit pre-index variant: is the signed immediate byte offset, a multiple of
4 in the range -256 to 252, encoded in the "imm7" field as <imm>/4.
For the 32-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 4 in the
range -256 to 252, defaulting to 0 and encoded in the "imm7" field as <imm>/4.
For the 64-bit post-index and 64-bit pre-index variant: is the signed immediate byte offset, a multiple of
8 in the range -512 to 504, encoded in the "imm7" field as <imm>/8.
For the 64-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 8 in the
range -512 to 504, defaulting to 0 and encoded in the "imm7" field as <imm>/8.
For the 128-bit post-index and 128-bit pre-index variant: is the signed immediate byte offset, a multiple
of 16 in the range -1024 to 1008, encoded in the "imm7" field as <imm>/16.
For the 128-bit signed offset variant: is the optional signed immediate byte offset, a multiple of 16 in the
range -1024 to 1008, defaulting to 0 and encoded in the "imm7" field as <imm>/16.

Shared Decode

boolean rt_unknown = FALSE;

Operation

CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

data1 = Mem[address, dbytes, AccType_VEC];
data2 = Mem[address+dbytes, dbytes, AccType_VEC];
if rt_unknown then
    data1 = bits(datasize) UNKNOWN;
    data2 = bits(datasize) UNKNOWN;
V[t] = data1;
V[t2] = data2;

if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
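
The way `wback` and `postindex` select between the three encoding classes can be summarized with a small Python sketch. It only models the address arithmetic of the pseudocode above; the function name `ldp_addresses` and the tuple it returns are assumptions for this example.

```python
MASK64 = (1 << 64) - 1


def ldp_addresses(base: int, offset: int, dbytes: int,
                  wback: bool, postindex: bool):
    """Return (first load address, second load address, final base value)."""
    address = base
    if not postindex:
        address = (address + offset) & MASK64   # pre-index or signed offset
    first = address
    second = (address + dbytes) & MASK64
    if wback:
        if postindex:
            address = (address + offset) & MASK64
        new_base = address                      # written back to Xn or SP
    else:
        new_base = base                         # base register unchanged
    return first, second, new_base
```

Post-index loads from the original base and then updates it, pre-index updates first and loads from the new value, and signed offset loads from base plus offset while leaving the base untouched.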

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDR (immediate, SIMD&FP)

Load SIMD&FP Register (immediate offset). This instruction loads an element from memory, and writes the result as a
scalar to the SIMD&FP register. The address that is used for the load is calculated from a base register value, a signed
immediate offset, and an optional offset that is a multiple of the element size.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset

Post-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 1 0 imm9 0 1 Rn Rt
opc

8-bit (size == 00 && opc == 01)

LDR <Bt>, [<Xn|SP>], #<simm>

16-bit (size == 01 && opc == 01)

LDR <Ht>, [<Xn|SP>], #<simm>

32-bit (size == 10 && opc == 01)

LDR <St>, [<Xn|SP>], #<simm>

64-bit (size == 11 && opc == 01)

LDR <Dt>, [<Xn|SP>], #<simm>

128-bit (size == 00 && opc == 11)

LDR <Qt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
integer scale = UInt(opc<1>:size);
if scale > 4 then UNDEFINED;
bits(64) offset = SignExtend(imm9, 64);

Pre-index

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 1 0 imm9 1 1 Rn Rt
opc

8-bit (size == 00 && opc == 01)

LDR <Bt>, [<Xn|SP>, #<simm>]!

16-bit (size == 01 && opc == 01)

LDR <Ht>, [<Xn|SP>, #<simm>]!

32-bit (size == 10 && opc == 01)

LDR <St>, [<Xn|SP>, #<simm>]!

64-bit (size == 11 && opc == 01)

LDR <Dt>, [<Xn|SP>, #<simm>]!

128-bit (size == 00 && opc == 11)

LDR <Qt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
integer scale = UInt(opc<1>:size);
if scale > 4 then UNDEFINED;
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 1 x 1 imm12 Rn Rt
opc

8-bit (size == 00 && opc == 01)

LDR <Bt>, [<Xn|SP>{, #<pimm>}]

16-bit (size == 01 && opc == 01)

LDR <Ht>, [<Xn|SP>{, #<pimm>}]

32-bit (size == 10 && opc == 01)

LDR <St>, [<Xn|SP>{, #<pimm>}]

64-bit (size == 11 && opc == 01)

LDR <Dt>, [<Xn|SP>{, #<pimm>}]

128-bit (size == 00 && opc == 11)

LDR <Qt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
integer scale = UInt(opc<1>:size);
if scale > 4 then UNDEFINED;
bits(64) offset = LSL(ZeroExtend(imm12, 64), scale);

Assembler Symbols

<Bt> Is the 8-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Dt> Is the 64-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Ht> Is the 16-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be transferred, encoded in the "Rt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<simm> Is the signed immediate byte offset, in the range -256 to 255, encoded in the "imm9" field.
<pimm> For the 8-bit variant: is the optional positive immediate byte offset, in the range 0 to 4095, defaulting to
0 and encoded in the "imm12" field.
For the 16-bit variant: is the optional positive immediate byte offset, a multiple of 2 in the range 0 to
8190, defaulting to 0 and encoded in the "imm12" field as <pimm>/2.
For the 32-bit variant: is the optional positive immediate byte offset, a multiple of 4 in the range 0 to
16380, defaulting to 0 and encoded in the "imm12" field as <pimm>/4.
For the 64-bit variant: is the optional positive immediate byte offset, a multiple of 8 in the range 0 to
32760, defaulting to 0 and encoded in the "imm12" field as <pimm>/8.
For the 128-bit variant: is the optional positive immediate byte offset, a multiple of 16 in the range 0 to
65520, defaulting to 0 and encoded in the "imm12" field as <pimm>/16.
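
The <pimm> rules above amount to dividing the byte offset by the access size and checking that the quotient fits in 12 bits. A minimal Python sketch, assuming a hypothetical helper `encode_pimm` that takes the access size in bits:

```python
def encode_pimm(pimm: int, datasize: int) -> int:
    """Return the "imm12" field for an unsigned-offset LDR (SIMD&FP).

    datasize is the access size in bits: 8, 16, 32, 64 or 128.
    """
    scale = {8: 0, 16: 1, 32: 2, 64: 3, 128: 4}[datasize]
    if pimm < 0:
        raise ValueError("<pimm> must not be negative")
    if pimm % (1 << scale) != 0:
        raise ValueError("<pimm> must be a multiple of the access size")
    imm12 = pimm >> scale
    if imm12 > 0xFFF:
        raise ValueError("<pimm> is out of range for this variant")
    return imm12
```

The largest offsets in the list (4095, 8190, 16380, 32760 and 65520) all encode to the same field value, 4095.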

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
integer datasize = 8 << scale;
boolean tag_checked = memop != MemOp_PREFETCH && (wback || n != 31);

Operation

CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;

    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;

if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDR (literal, SIMD&FP)

Load SIMD&FP Register (PC-relative literal). This instruction loads a SIMD&FP register from memory. The address
that is used for the load is calculated from the PC value and an immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
opc 0 1 1 1 0 0 imm19 Rt

32-bit (opc == 00)

LDR <St>, <label>

64-bit (opc == 01)

LDR <Dt>, <label>

128-bit (opc == 10)

LDR <Qt>, <label>

integer t = UInt(Rt);
integer size;
bits(64) offset;

case opc of
    when '00'
        size = 4;
    when '01'
        size = 8;
    when '10'
        size = 16;
    when '11'
        UNDEFINED;

offset = SignExtend(imm19:'00', 64);

Assembler Symbols

<Dt> Is the 64-bit name of the SIMD&FP register to be loaded, encoded in the "Rt" field.
<Qt> Is the 128-bit name of the SIMD&FP register to be loaded, encoded in the "Rt" field.
<St> Is the 32-bit name of the SIMD&FP register to be loaded, encoded in the "Rt" field.
<label> Is the program label from which the data is to be loaded. Its offset from the address of this instruction,
in the range +/-1MB, is encoded as "imm19" times 4.
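
The address calculation for the literal form, `SignExtend(imm19:'00', 64)` added to the PC, can be sketched in Python as below. The function names are assumptions for this example, and `pc` stands for the address of the LDR instruction itself.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def literal_address(pc: int, imm19: int) -> int:
    """Address loaded by LDR (literal): PC plus imm19 scaled by 4."""
    offset = sign_extend((imm19 & 0x7FFFF) << 2, 21)   # imm19:'00'
    return (pc + offset) & MASK64
```

Because the 19-bit field is scaled by 4 and sign-extended, the reachable offsets run from -1048576 to +1048572 bytes, which is the +/-1MB range quoted for <label>.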

Operation

bits(64) address = PC[] + offset;

bits(size*8) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(FALSE);

CheckFPAdvSIMDEnabled64();

data = Mem[address, size, AccType_VEC];

V[t] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

LDR (register, SIMD&FP)

Load SIMD&FP Register (register offset). This instruction loads a SIMD&FP register from memory. The address that is
used for the load is calculated from a base register value and an offset register value. The offset can be optionally
shifted and extended.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 1 1 Rm option S 1 0 Rn Rt
opc

8-bit (size == 00 && opc == 01 && option != 011)

LDR <Bt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]

8-bit (size == 00 && opc == 01 && option == 011)

LDR <Bt>, [<Xn|SP>, <Xm>{, LSL <amount>}]

16-bit (size == 01 && opc == 01)

LDR <Ht>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

32-bit (size == 10 && opc == 01)

LDR <St>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

64-bit (size == 11 && opc == 01)

LDR <Dt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

128-bit (size == 00 && opc == 11)

LDR <Qt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

integer scale = UInt(opc<1>:size);

if scale > 4 then UNDEFINED;
if option<1> == '0' then UNDEFINED; // sub-word index
ExtendType extend_type = DecodeRegExtend(option);
integer shift = if S == '1' then scale else 0;
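
The following Python sketch approximates how the index register contributes to the address, based on the decode above. It assumes that option 010 zero-extends the low 32 bits of the index register (UXTW), option 110 sign-extends them (SXTW), and options 011 and 111 use the full 64-bit register. The function names are hypothetical and stand in for the architecture's DecodeRegExtend and ExtendReg.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def index_offset(rm_value: int, option: int, s: int, scale: int) -> int:
    """64-bit offset added to the base register by LDR (register, SIMD&FP)."""
    if (option & 0b010) == 0:
        raise ValueError("UNDEFINED: sub-word index")
    if option == 0b010:                       # UXTW
        value = rm_value & 0xFFFFFFFF
    elif option == 0b110:                     # SXTW
        value = sign_extend(rm_value, 32)
    else:                                     # 011 (LSL) or 111 (SXTX)
        value = rm_value & MASK64
    shift = scale if s == 1 else 0
    return (value << shift) & MASK64
```

With S set, the index is scaled by the access size, so an index register can be used directly as an array subscript.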

Assembler Symbols

<extend> For the 8-bit variant: is the index extend specifier, encoded in “option”:

option <extend>
010 UXTW
110 SXTW
111 SXTX

For the 16-bit, 32-bit, 64-bit and 128-bit variants: is the index extend/shift specifier, defaulting to LSL,
and which must be omitted for the LSL option when <amount> is omitted. It is encoded in “option”:

option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX

<amount> For the 8-bit variant: is the index shift amount, which must be #0. It is encoded in "S" as 0 if omitted,
or as 1 if present.

For the 16-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #1

For the 32-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #2

For the 64-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #3

For the 128-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #4

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
integer m = UInt(Rm);
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
integer datasize = 8 << scale;
boolean tag_checked = memop != MemOp_PREFETCH;

Operation

bits(64) offset = ExtendReg(m, extend_type, shift);

CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;

    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

LDUR (SIMD&FP)

Load SIMD&FP Register (unscaled offset). This instruction loads a SIMD&FP register from memory. The address that
is used for the load is calculated from a base register value and an optional immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
size 1 1 1 1 0 0 x 1 0 imm9 0 0 Rn Rt
opc

8-bit (size == 00 && opc == 01)

LDUR <Bt>, [<Xn|SP>{, #<simm>}]

16-bit (size == 01 && opc == 01)

LDUR <Ht>, [<Xn|SP>{, #<simm>}]

32-bit (size == 10 && opc == 01)

LDUR <St>, [<Xn|SP>{, #<simm>}]

64-bit (size == 11 && opc == 01)

LDUR <Dt>, [<Xn|SP>{, #<simm>}]

128-bit (size == 00 && opc == 11)

LDUR <Qt>, [<Xn|SP>{, #<simm>}]

integer scale = UInt(opc<1>:size);

if scale > 4 then UNDEFINED;
bits(64) offset = SignExtend(imm9, 64);
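
The decode above can be sketched in Python to make the field arithmetic concrete: "opc<1>:size" selects the access size, and the 9-bit immediate is sign-extended, giving byte offsets from -256 to 255. The helper name decode_ldur_simd and its tuple return value are assumptions made for this illustration.

```python
def decode_ldur_simd(size: int, opc: int, imm9: int) -> tuple[int, int]:
    """Return (datasize_in_bits, signed_byte_offset) for LDUR (SIMD&FP)."""
    scale = ((opc >> 1) << 2) | size          # UInt(opc<1>:size)
    if scale > 4:
        raise ValueError("UNDEFINED encoding")
    # SignExtend(imm9, 64): bit 8 is the sign bit of the 9-bit field.
    offset = imm9 - (1 << 9) if imm9 & (1 << 8) else imm9
    return 8 << scale, offset
```

With size == 0b00 and opc == 0b11 this yields the 128-bit variant, and imm9 == 0x1FF corresponds to `#-1`.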

Assembler Symbols

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
integer datasize = 8 << scale;
boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);

Operation

CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;

    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

MLA (by element)

Multiply-Add to accumulator (vector, by element). This instruction multiplies the vector elements in the first source
SIMD&FP register by the specified value in the second source SIMD&FP register, and accumulates the results with
the vector elements of the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 0 0 0 H 0 Rn Rd
o2

MLA <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean sub_op = (o2 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) product;

element2 = UInt(Elem[operand2, index, esize]);

for e = 0 to elements-1
    element1 = UInt(Elem[operand1, e, esize]);
    product = (element1*element2)<esize-1:0>;
    if sub_op then
        Elem[result, e, esize] = Elem[operand3, e, esize] - product;
    else
        Elem[result, e, esize] = Elem[operand3, e, esize] + product;

V[d] = result;
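
The loop above can be modelled on plain Python lists to show the wrap-around behaviour of the truncated product and accumulate. This is a minimal sketch: the function name mla_by_element and the representation of registers as lists of unsigned integers are assumptions made for this example.

```python
def mla_by_element(acc, vn, vm, index, esize, sub_op=False):
    """Model MLA/MLS (by element) on lists of unsigned element values.

    acc, vn -- destination and first-source elements (same length)
    vm      -- elements of the second source register
    index   -- which element of vm multiplies every element of vn
    esize   -- element size in bits (16 or 32 for this instruction)
    """
    mask = (1 << esize) - 1
    element2 = vm[index] & mask
    result = []
    for a, e1 in zip(acc, vn):
        product = ((e1 & mask) * element2) & mask   # keep the low esize bits
        total = a - product if sub_op else a + product
        result.append(total & mask)                 # modular accumulate
    return result
```

Calling it with sub_op=True models the MLS form, which shares the same decode and differs only in the "o2" bit.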

Operational information

MLA (vector)

Multiply-Add to accumulator (vector). This instruction multiplies corresponding elements in the vectors of the two
source SIMD&FP registers, and accumulates the results with the vector elements of the destination SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 1 0 1 Rn Rd
U

MLA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean sub_op = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

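CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
bits(esize) product;

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    product = (UInt(element1)*UInt(element2))<esize-1:0>;
    if sub_op then
        Elem[result, e, esize] = Elem[operand3, e, esize] - product;
    else
        Elem[result, e, esize] = Elem[operand3, e, esize] + product;

V[d] = result;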

Operational information

MLS (by element)

Multiply-Subtract from accumulator (vector, by element). This instruction multiplies the vector elements in the first
source SIMD&FP register by the specified value in the second source SIMD&FP register, and subtracts the results
from the vector elements of the destination SIMD&FP register. All the values in this instruction are unsigned integer
values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 1 0 0 H 0 Rn Rd
o2

MLS <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean sub_op = (o2 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

element2 = UInt(Elem[operand2, index, esize]);

Operational information

MLS (vector)

Multiply-Subtract from accumulator (vector). This instruction multiplies corresponding elements in the vectors of the
two source SIMD&FP registers, and subtracts the results from the vector elements of the destination SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 1 0 1 Rn Rd
U

MLS <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean sub_op = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

V[d] = result;

Operational information

MOV (element)

Move vector element to another vector element. This instruction copies the vector element of the source SIMD&FP
register to the specified vector element of the destination SIMD&FP register.
This instruction can insert data into individual elements within a SIMD&FP register without clearing the remaining
bits to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

This is an alias of INS (element). This means:

• The encodings in this description are named to match the encodings of INS (element).
• The description of INS (element) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 0 0 0 imm5 0 imm4 1 Rn Rd

MOV <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]

is equivalent to

INS <Vd>.<Ts>[<index1>], <Vn>.<Ts>[<index2>]

and is always the preferred disassembly.

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ts> Is an element size specifier, encoded in “imm5”:

imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

<index1> Is the destination element index encoded in “imm5”:

imm5 <index1>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<index2> Is the source element index encoded in “imm5:imm4”:

imm5 <index2>
x0000 RESERVED
xxxx1 imm4<3:0>
xxx10 imm4<3:1>
xx100 imm4<3:2>
x1000 imm4<3>
Unspecified bits in "imm4" are ignored but should be set to zero by an assembler.
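
The three tables above share one pattern: the position of the lowest set bit in "imm5" selects the element size, and the bits above it (in "imm5" and "imm4") give the indices. A small Python sketch of that decoding follows; the function name decode_ins_element is an assumption made for this illustration.

```python
def decode_ins_element(imm5: int, imm4: int) -> tuple[str, int, int]:
    """Return (<Ts>, <index1>, <index2>) from the imm5 and imm4 fields."""
    if imm5 & 0b01111 == 0:
        raise ValueError("RESERVED encoding (imm5 == x0000)")
    # Position of the lowest set bit of imm5: 0 -> B, 1 -> H, 2 -> S, 3 -> D.
    size = (imm5 & -imm5).bit_length() - 1
    ts = "BHSD"[size]
    index1 = imm5 >> (size + 1)   # imm5<4:size+1>
    index2 = imm4 >> size         # imm4<3:size>; lower imm4 bits are ignored
    return ts, index1, index2
```

For example, imm5 == 0b00110 with imm4 == 0b0100 decodes as `MOV Vd.H[1], Vn.H[2]`.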

Operation

The description of INS (element) gives the operational pseudocode for this instruction.

Operational information

MOV (from general)

Move general-purpose register to a vector element. This instruction copies the contents of the source general-purpose
register to the specified vector element in the destination SIMD&FP register.
This instruction can insert data into individual elements within a SIMD&FP register without clearing the remaining
bits to zero.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

This is an alias of INS (general). This means:

• The encodings in this description are named to match the encodings of INS (general).
• The description of INS (general) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 0 0 0 imm5 0 0 0 1 1 1 Rn Rd

MOV <Vd>.<Ts>[<index>], <R><n>

is equivalent to

INS <Vd>.<Ts>[<index>], <R><n>

and is always the preferred disassembly.

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ts> Is an element size specifier, encoded in “imm5”:

imm5 <Ts>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

<index> Is the element index encoded in “imm5”:

imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>

<R> Is the width specifier for the general-purpose source register, encoded in “imm5”:

imm5 <R>
x0000 RESERVED
xxxx1 W
xxx10 W
xx100 W
x1000 X

<n> Is the number [0-30] of the general-purpose source register or ZR (31), encoded in the "Rn" field.

Operation

The description of INS (general) gives the operational pseudocode for this instruction.

Operational information

MOV (scalar)

Move vector element to scalar. This instruction duplicates the specified vector element in the SIMD&FP source
register into a scalar, and writes the result to the SIMD&FP destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

This is an alias of DUP (element). This means:

• The encodings in this description are named to match the encodings of DUP (element).
• The description of DUP (element) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 imm5 0 0 0 0 0 1 Rn Rd

MOV <V><d>, <Vn>.<T>[<index>]

is equivalent to

DUP <V><d>, <Vn>.<T>[<index>]

and is always the preferred disassembly.

Assembler Symbols

<V> Is the destination width specifier, encoded in “imm5”:

imm5 <V>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> Is the element width specifier, encoded in “imm5”:

imm5 <T>
x0000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D

<index> Is the element index encoded in “imm5”:

imm5 <index>
x0000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>
x1000 imm5<4>

Operation

The description of DUP (element) gives the operational pseudocode for this instruction.

Operational information

If PSTATE.DIT is 1:

MOV (to general)

Move vector element to general-purpose register. This instruction reads the unsigned integer from the source
SIMD&FP register, zero-extends it to form a 32-bit or 64-bit value, and writes the result to the destination general-
purpose register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

This is an alias of UMOV. This means:

• The encodings in this description are named to match the encodings of UMOV.
• The description of UMOV gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 x x x 0 0 0 0 1 1 1 1 Rn Rd
imm5

32-bit (Q == 0 && imm5 == xx100)

MOV <Wd>, <Vn>.S[<index>]

is equivalent to

UMOV <Wd>, <Vn>.S[<index>]

and is always the preferred disassembly.

64-bit (Q == 1 && imm5 == x1000)

MOV <Xd>, <Vn>.D[<index>]

is equivalent to

UMOV <Xd>, <Vn>.D[<index>]

and is always the preferred disassembly.

Assembler Symbols

Operation

The description of UMOV gives the operational pseudocode for this instruction.

Operational information

MOV (vector)

Move vector. This instruction copies the vector in the source SIMD&FP register into the destination SIMD&FP
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

This is an alias of ORR (vector, register). This means:

• The encodings in this description are named to match the encodings of ORR (vector, register).
• The description of ORR (vector, register) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 Rm 0 0 0 1 1 1 Rn Rd
size

MOV <Vd>.<T>, <Vn>.<T>

is equivalent to

ORR <Vd>.<T>, <Vn>.<T>, <Vn>.<T>

and is the preferred disassembly when Rm == Rn.
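
A disassembler might apply this alias rule as in the following Python sketch. The function name disassemble_orr_vector and the output formatting are assumptions made for this example; only the Rm == Rn condition and the 8B/16B arrangements come from the description above.

```python
def disassemble_orr_vector(q: int, rm: int, rn: int, rd: int) -> str:
    """Render ORR (vector, register), preferring the MOV alias when Rm == Rn."""
    t = "16B" if q else "8B"
    if rm == rn:
        # Both sources are the same register, so the OR is a plain copy.
        return f"MOV V{rd}.{t}, V{rn}.{t}"
    return f"ORR V{rd}.{t}, V{rn}.{t}, V{rm}.{t}"
```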

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

Operation

The description of ORR (vector, register) gives the operational pseudocode for this instruction.

Operational information

MOVI

Move Immediate (vector). This instruction places an immediate constant into every vector element of the destination
SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q op 0 1 1 1 1 0 0 0 0 0 a b c cmode 0 1 d e f g h Rd

8-bit (op == 0 && cmode == 1110)

MOVI <Vd>.<T>, #<imm8>{, LSL #0}

16-bit shifted immediate (op == 0 && cmode == 10x0)

MOVI <Vd>.<T>, #<imm8>{, LSL #<amount>}

32-bit shifted immediate (op == 0 && cmode == 0xx0)

MOVI <Vd>.<T>, #<imm8>{, LSL #<amount>}

32-bit shifting ones (op == 0 && cmode == 110x)

MOVI <Vd>.<T>, #<imm8>, MSL #<amount>

64-bit scalar (Q == 0 && op == 1 && cmode == 1110)

MOVI <Dd>, #<imm>

64-bit vector (Q == 1 && op == 1 && cmode == 1110)

MOVI <Vd>.2D, #<imm>

integer rd = UInt(Rd);

integer datasize = if Q == '1' then 128 else 64;

bits(datasize) imm;
bits(64) imm64;

ImmediateOp operation;
case cmode:op of
    when '0xx00' operation = ImmediateOp_MOVI;
    when '0xx01' operation = ImmediateOp_MVNI;
    when '0xx10' operation = ImmediateOp_ORR;
    when '0xx11' operation = ImmediateOp_BIC;
    when '10x00' operation = ImmediateOp_MOVI;
    when '10x01' operation = ImmediateOp_MVNI;
    when '10x10' operation = ImmediateOp_ORR;
    when '10x11' operation = ImmediateOp_BIC;
    when '110x0' operation = ImmediateOp_MOVI;
    when '110x1' operation = ImmediateOp_MVNI;
    when '1110x' operation = ImmediateOp_MOVI;
    when '11110' operation = ImmediateOp_MOVI;
    when '11111'
        // FMOV Dn,#imm is in main FP instruction set
        if Q == '0' then UNDEFINED;
        operation = ImmediateOp_MOVI;

imm64 = AdvSIMDExpandImm(op, cmode, a:b:c:d:e:f:g:h);

imm = Replicate(imm64, datasize DIV 64);

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<imm> Is a 64-bit immediate 'aaaaaaaabbbbbbbbccccccccddddddddeeeeeeeeffffffffgggggggghhhhhhhh',
encoded in "a:b:c:d:e:f:g:h".

<T> For the 8-bit variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

For the 16-bit variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the 32-bit variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 2S
1 4S

<imm8> Is an 8-bit immediate encoded in "a:b:c:d:e:f:g:h".

<amount> For the 16-bit shifted immediate variant: is the shift amount encoded in “cmode<1>”:

cmode<1> <amount>
0 0
1 8
defaulting to 0 if LSL is omitted.

For the 32-bit shifted immediate variant: is the shift amount encoded in “cmode<2:1>”:

cmode<2:1> <amount>
00 0
01 8
10 16
11 24
defaulting to 0 if LSL is omitted.

For the 32-bit shifting ones variant: is the shift amount encoded in “cmode<0>”:

cmode<0> <amount>
0 8
1 16
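
To make the variant and shift tables concrete, the Python sketch below builds the 64-bit pattern for the integer MOVI forms listed above (8-bit, 16-bit and 32-bit shifted, 32-bit shifting ones, and the 64-bit byte mask). It is an illustrative reading of those tables, not a copy of AdvSIMDExpandImm(); the function name expand_imm64 is an assumption, and the floating-point encodings are deliberately left out.

```python
def expand_imm64(op: int, cmode: int, imm8: int) -> int:
    """Return the 64-bit immediate pattern for the integer MOVI variants."""
    hi3, lo = cmode >> 1, cmode & 1
    if hi3 <= 0b011:                       # 32-bit, LSL #0/#8/#16/#24
        elem, width = imm8 << (8 * hi3), 32
    elif hi3 <= 0b101:                     # 16-bit, LSL #0/#8
        elem, width = imm8 << (8 * (hi3 & 1)), 16
    elif hi3 == 0b110:                     # 32-bit shifting ones, MSL #8/#16
        amount = 16 if lo else 8
        elem, width = (imm8 << amount) | ((1 << amount) - 1), 32
    elif lo == 0 and op == 0:              # 8-bit: imm8 in every byte
        elem, width = imm8, 8
    elif lo == 0 and op == 1:              # 64-bit: each imm8 bit fills a byte
        elem, width = 0, 64
        for bit in range(8):
            if (imm8 >> bit) & 1:
                elem |= 0xFF << (8 * bit)
    else:
        raise ValueError("floating-point forms are not covered by this sketch")
    imm64 = 0
    for i in range(64 // width):           # replicate the element across 64 bits
        imm64 |= elem << (i * width)
    return imm64
```

For a 128-bit destination (Q == 1) the same 64-bit pattern is used for both halves, as the Replicate() call in the decode shows.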

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand;
bits(datasize) result;

V[rd] = result;

Operational information

MUL (by element)

Multiply (vector, by element). This instruction multiplies the vector elements in the first source SIMD&FP register by
the specified value in the second source SIMD&FP register, places the results in a vector, and writes the vector to the
destination SIMD&FP register. All the values in this instruction are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 0 0 0 H 0 Rn Rd

MUL <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) product;

element2 = UInt(Elem[operand2, index, esize]);

for e = 0 to elements-1
    element1 = UInt(Elem[operand1, e, esize]);
    product = (element1*element2)<esize-1:0>;
    Elem[result, e, esize] = product;

V[d] = result;

Operational information

MUL (vector)

Multiply (vector). This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP
registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 1 1 1 Rn Rd
U

MUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if U == '1' && size != '00' then UNDEFINED;
if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean poly = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
bits(esize) product;

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    if poly then
        product = PolynomialMult(element1, element2)<esize-1:0>;
    else
        product = (UInt(element1)*UInt(element2))<esize-1:0>;
    Elem[result, e, esize] = product;

V[d] = result;
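
The two branches of the loop differ only in how the product is formed. The Python sketch below contrasts them for a single element. PolynomialMult() is not defined in this section, so the carry-less (GF(2)) multiplication shown here is an assumption about its behaviour, and the names polynomial_mult and mul_element are chosen for this example only.

```python
def polynomial_mult(op1: int, op2: int, esize: int = 8) -> int:
    """Carry-less multiply: XOR together shifted copies of op2."""
    result = 0
    for i in range(esize):
        if (op1 >> i) & 1:
            result ^= op2 << i
    return result

def mul_element(e1: int, e2: int, esize: int, poly: bool = False) -> int:
    """Return the low esize bits of the integer or polynomial product."""
    mask = (1 << esize) - 1
    product = polynomial_mult(e1, e2, esize) if poly else e1 * e2
    return product & mask
```

Because only the low esize bits are kept, the integer result is the same whether the inputs are read as signed or unsigned.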

Operational information

MVN

This is an alias of NOT. This means:

• The encodings in this description are named to match the encodings of NOT.
• The description of NOT gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd

MVN <Vd>.<T>, <Vn>.<T>

is equivalent to

NOT <Vd>.<T>, <Vn>.<T>

and is always the preferred disassembly.

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

The description of NOT gives the operational pseudocode for this instruction.

Operational information

MVNI

Move inverted Immediate (vector). This instruction places the inverse of an immediate constant into every vector
element of the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 0 0 0 0 a b c cmode 0 1 d e f g h Rd
op

16-bit shifted immediate (cmode == 10x0)

MVNI <Vd>.<T>, #<imm8>{, LSL #<amount>}

32-bit shifted immediate (cmode == 0xx0)

MVNI <Vd>.<T>, #<imm8>{, LSL #<amount>}

32-bit shifting ones (cmode == 110x)

MVNI <Vd>.<T>, #<imm8>, MSL #<amount>

integer rd = UInt(Rd);

integer datasize = if Q == '1' then 128 else 64;

bits(datasize) imm;
bits(64) imm64;

imm64 = AdvSIMDExpandImm(op, cmode, a:b:c:d:e:f:g:h);

imm = Replicate(imm64, datasize DIV 64);

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> For the 16-bit variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the 32-bit variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 2S
1 4S

<imm8> Is an 8-bit immediate encoded in "a:b:c:d:e:f:g:h".

<amount> For the 16-bit shifted immediate variant: is the shift amount encoded in “cmode<1>”:

cmode<1> <amount>
0 0
1 8
defaulting to 0 if LSL is omitted.

For the 32-bit shifted immediate variant: is the shift amount encoded in “cmode<2:1>”:

cmode<2:1> <amount>
00 0
01 8
10 16
11 24
defaulting to 0 if LSL is omitted.

For the 32-bit shifting ones variant: is the shift amount encoded in “cmode<0>”:

cmode<0> <amount>
0 8
1 16

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand;
bits(datasize) result;

V[rd] = result;

Operational information

NEG (vector)

Negate (vector). This instruction reads each vector element from the source SIMD&FP register, negates each value,
puts the result into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U

NEG <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size != '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean neg = (U == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Rn Rd
U

NEG <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean neg = (U == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;

for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    if neg then
        element = -element;
    else
        element = Abs(element);
    Elem[result, e, esize] = element<esize-1:0>;

V[d] = result;
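
One consequence of truncating the result to esize bits is that the most negative value maps to itself. The Python sketch below models the loop on a list of raw element values to show this; the function name neg_or_abs and the list-based interface are assumptions made for this illustration.

```python
def neg_or_abs(elements, esize, neg=True):
    """Model the NEG (neg=True) or ABS (neg=False) loop on raw element values."""
    mask = (1 << esize) - 1
    sign = 1 << (esize - 1)
    out = []
    for raw in elements:
        value = (raw & mask) - ((raw & sign) << 1)   # SInt(): two's complement
        value = -value if neg else abs(value)
        out.append(value & mask)                     # keep bits <esize-1:0>
    return out
```

With 8-bit elements, 0x80 (-128) negates to +128, which truncates back to 0x80.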

Operational information

NOT

Bitwise NOT (vector). This instruction reads each vector element from the source SIMD&FP register, places the
inverse of each value into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MVN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd

NOT <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 8;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV 8;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = NOT(element);

V[d] = result;

Operational information


ORN (vector)

Bitwise inclusive OR NOT (vector). This instruction performs a bitwise OR NOT between the two source SIMD&FP
registers, and writes the result to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 1 1 Rm 0 0 0 1 1 1 Rn Rd
size

ORN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

operand2 = NOT(operand2);

result = operand1 OR operand2;

V[d] = result;
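
As an informal illustration of the operation above, the Python sketch below models the registers as unsigned integers of datasize bits. The function name orn and the integer representation are assumptions for this example only.

```python
def orn(vn, vm, datasize):
    """Return vn OR NOT(vm), with both operands treated as datasize-bit values."""
    mask = (1 << datasize) - 1
    # NOT(operand2) is confined to datasize bits before the OR.
    return (vn | (~vm & mask)) & mask


print(hex(orn(0x0F, 0xFF, 64)))
```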

Operational information


ORR (vector, immediate)

Bitwise inclusive OR (vector, immediate). This instruction reads each vector element from the destination SIMD&FP
register, performs a bitwise OR between each element and an immediate constant, places the result into a vector, and
writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 0 0 0 0 a b c x x x 1 0 1 d e f g h Rd
op cmode

16-bit (cmode == 10x1)

ORR <Vd>.<T>, #<imm8>{, LSL #<amount>}

32-bit (cmode == 0xx1)

ORR <Vd>.<T>, #<imm8>{, LSL #<amount>}

integer rd = UInt(Rd);

integer datasize = if Q == '1' then 128 else 64;

bits(datasize) imm;
bits(64) imm64;

ImmediateOp operation;
case cmode:op of
    when '0xx00' operation = ImmediateOp_MOVI;
    when '0xx10' operation = ImmediateOp_ORR;
    when '10x00' operation = ImmediateOp_MOVI;
    when '10x10' operation = ImmediateOp_ORR;
    when '110x0' operation = ImmediateOp_MOVI;
    when '1110x' operation = ImmediateOp_MOVI;
    when '11110' operation = ImmediateOp_MOVI;
imm64 = AdvSIMDExpandImm(op, cmode, a:b:c:d:e:f:g:h);
imm = Replicate(imm64, datasize DIV 64);

Assembler Symbols

<Vd> Is the name of the SIMD&FP register, encoded in the "Rd" field.

<T> For the 16-bit variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the 32-bit variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 2S
1 4S

<imm8> Is an 8-bit immediate encoded in "a:b:c:d:e:f:g:h".

<amount> For the 16-bit variant: is the shift amount encoded in “cmode<1>”:

cmode<1> <amount>
0 0
1 8
defaulting to 0 if LSL is omitted.

For the 32-bit variant: is the shift amount encoded in “cmode<2:1>”:

cmode<2:1> <amount>
00 0
01 8
10 16
11 24
defaulting to 0 if LSL is omitted.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand;
bits(datasize) result;

V[rd] = result;
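
The Python sketch below is an informal model of the effect described for this instruction. It assumes that the expanded immediate is imm8 shifted left by <amount> and replicated into every 16-bit or 32-bit lane, as the assembler syntax and the <amount> tables suggest, and that the result is the bitwise OR of that constant with the destination register. The function name orr_immediate and its parameters are hypothetical.

```python
def orr_immediate(vd, imm8, amount, lane_bits, datasize):
    """OR (imm8 << amount) into every lane_bits-wide lane of the datasize-bit value vd."""
    assert lane_bits in (16, 32) and datasize in (64, 128)
    assert 0 <= imm8 <= 0xFF
    assert amount % 8 == 0 and 0 <= amount < lane_bits
    lane_imm = imm8 << amount
    imm = 0
    # Replicate the shifted immediate across all lanes (assumed expansion).
    for lane in range(datasize // lane_bits):
        imm |= lane_imm << (lane * lane_bits)
    return vd | imm


print(hex(orr_immediate(0, 0x12, 8, 16, 64)))
```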

Operational information


ORR (vector, register)

Bitwise inclusive OR (vector, register). This instruction performs a bitwise OR between the two source SIMD&FP
registers, and writes the result to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MOV (vector).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 1 Rm 0 0 0 1 1 1 Rn Rd
size

ORR <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer datasize = if Q == '1' then 128 else 64;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Alias Conditions

Alias Is preferred when

MOV (vector) Rm == Rn

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

result = operand1 OR operand2;

V[d] = result;

Operational information


PMUL

Polynomial Multiply. This instruction multiplies corresponding elements in the vectors of the two source SIMD&FP
registers, places the results in a vector, and writes the vector to the destination SIMD&FP register.
For information about multiplying polynomials see Polynomial arithmetic over {0, 1}.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 1 1 1 Rn Rd
U

PMUL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean poly = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 x RESERVED
1x x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;
bits(esize) product;

V[d] = result;
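
As an informal illustration of polynomial multiplication over {0, 1}, the Python sketch below computes a carry-less product and then keeps the low 8 bits of each lane. The truncation to 8 bits is an assumption based on the 8B and 16B arrangements, in which source and destination elements have the same width. The names polynomial_mult and pmul_lanes are hypothetical.

```python
def polynomial_mult(a, b):
    """Carry-less multiply: XOR together copies of a shifted by each set bit of b."""
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result ^= a << shift
        shift += 1
    return result


def pmul_lanes(lanes_n, lanes_m):
    """Per-lane polynomial product, truncated to 8 bits (assumed for PMUL)."""
    return [polynomial_mult(x, y) & 0xFF for x, y in zip(lanes_n, lanes_m)]


print(pmul_lanes([0b11, 0x80], [0b11, 0x02]))
```

Here (x + 1) * (x + 1) gives x^2 + 1 because the cross terms cancel under XOR, which is what distinguishes this from integer multiplication.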

Operational information


PMULL, PMULL2

Polynomial Multiply Long. This instruction multiplies corresponding elements in the lower or upper half of the vectors
of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the destination SIMD&FP
register. The destination vector elements are twice as long as the elements that are multiplied.
For information about multiplying polynomials see Polynomial arithmetic over {0, 1}.
The PMULL instruction extracts each source vector from the lower half of each source register. The PMULL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 1 1 0 0 0 Rn Rd

PMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '01' || size == '10' then UNDEFINED;

if size == '11' && !HaveBit128PMULLExt() then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 RESERVED
10 RESERVED
11 1Q
The '1Q' arrangement is only allocated in an implementation that includes the Cryptographic Extension,
and is otherwise RESERVED.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 x RESERVED
10 x RESERVED
11 0 1D
11 1 2D

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
bits(esize) element1;
bits(esize) element2;

for e = 0 to elements-1
    element1 = Elem[operand1, e, esize];
    element2 = Elem[operand2, e, esize];
    Elem[result, e, 2*esize] = PolynomialMult(element1, element2);

V[d] = result;
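
The Python sketch below is an informal model of the operation above, with 128-bit registers represented as unsigned integers. The names pmull and polynomial_mult are hypothetical, and polynomial_mult is a plain carry-less multiply standing in for PolynomialMult.

```python
def polynomial_mult(a, b):
    """Carry-less multiply over {0, 1}."""
    result = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            result ^= a << shift
        shift += 1
    return result


def pmull(vn, vm, esize, part):
    """part = 0 models PMULL (lower halves), part = 1 models PMULL2 (upper halves)."""
    datasize = 64
    half_mask = (1 << datasize) - 1
    elem_mask = (1 << esize) - 1
    operand1 = (vn >> (part * datasize)) & half_mask  # Vpart[n, part]
    operand2 = (vm >> (part * datasize)) & half_mask  # Vpart[m, part]
    result = 0
    for e in range(datasize // esize):
        element1 = (operand1 >> (e * esize)) & elem_mask
        element2 = (operand2 >> (e * esize)) & elem_mask
        # Each product occupies a lane twice as wide as the source elements.
        result |= polynomial_mult(element1, element2) << (e * 2 * esize)
    return result


print(hex(pmull(0x0303, 0x0303, 8, 0)))
```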

Operational information


RADDHN, RADDHN2

Rounding Add returning High Narrow. This instruction adds each vector element in the first source SIMD&FP register
to the corresponding vector element in the second source SIMD&FP register, places the most significant half of the
result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register.
The results are rounded. For truncated results, see ADDHN.
The RADDHN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the RADDHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 0 0 0 Rn Rd
U o1

RADDHN{2} <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean round = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

Vpart[d, part] = result;
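
The Python sketch below is an informal model of the behaviour described for RADDHN: add, round, and keep the most significant half of each sum. It assumes the rounding constant is 1 << (esize - 1), that is, half of the weight of the discarded low half, and that the sum wraps at 2*esize bits. The name raddhn_lanes and the list representation are hypothetical.

```python
def raddhn_lanes(lanes_n, lanes_m, esize):
    """esize is the destination element width; source lanes are 2*esize bits wide."""
    wide_mask = (1 << (2 * esize)) - 1
    round_const = 1 << (esize - 1)  # assumed rounding constant
    narrowed = []
    for a, b in zip(lanes_n, lanes_m):
        total = (a + b + round_const) & wide_mask
        narrowed.append(total >> esize)  # most significant half
    return narrowed


print(raddhn_lanes([0x0180, 0xFFFF], [0x0000, 0x0001], 8))
```

For the first lane the truncating form would return 1, whereas rounding yields 2 because the discarded low byte is 0x80.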

Operational information


RAX1

Rotate and Exclusive OR. This instruction rotates each 64-bit element of the 128-bit vector in a source SIMD&FP
register left by 1, performs a bitwise exclusive OR of the resulting 128-bit vector and the vector in another source
SIMD&FP register, and writes the result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA3 is implemented.

Advanced SIMD
(FEAT_SHA3)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 1 1 Rn Rd

RAX1 <Vd>.2D, <Vn>.2D, <Vm>.2D

if !HaveSHA3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
V[d] = Vn EOR (ROL(Vm<127:64>, 1):ROL(Vm<63:0>, 1));
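
As an informal illustration of the operation above, the Python sketch below models the 128-bit registers as unsigned integers. The names rol64 and rax1 are hypothetical.

```python
MASK64 = (1 << 64) - 1


def rol64(value, count):
    """Rotate a 64-bit value left by count bits."""
    return ((value << count) | (value >> (64 - count))) & MASK64


def rax1(vn, vm):
    """Vn EOR (ROL(Vm<127:64>, 1) : ROL(Vm<63:0>, 1))."""
    low = rol64(vm & MASK64, 1)
    high = rol64((vm >> 64) & MASK64, 1)
    return vn ^ ((high << 64) | low)


print(hex(rax1(0, (1 << 127) | (1 << 63))))
```

The example shows that each 64-bit half rotates independently: the top bit of each half wraps to bit 0 of the same half.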

Operational information


RBIT (vector)

Reverse Bit order (vector). This instruction reads each vector element from the source SIMD&FP register, reverses the
bits of the element, places the results into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 Rn Rd

RBIT <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 8;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV 8;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “Q”:

Q <T>
0 8B
1 16B

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(esize) element;
bits(esize) rev;

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    for i = 0 to esize-1
        rev<(esize-1)-i> = element<i>;
    Elem[result, e, esize] = rev;

V[d] = result;
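
The Python sketch below is an informal model of the operation above, treating the vector as a bytes object with one byte per element. The name rbit_bytes is hypothetical.

```python
def rbit_bytes(data: bytes) -> bytes:
    """Reverse the bit order inside each byte; byte positions are unchanged."""
    return bytes(int(f"{b:08b}"[::-1], 2) for b in data)


print(rbit_bytes(bytes([0x01, 0x80, 0xF0, 0x00])).hex())
```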

Operational information


REV16 (vector)

Reverse elements in 16-bit halfwords (vector). This instruction reverses the order of 8-bit elements in each halfword of
the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 0 1 1 0 Rn Rd
U o0

REV16 <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

// size=esize: B(0), H(1), S(2), D(3)

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;

// op=REVx: 64(0), 32(1), 16(2)

bits(2) op = o0:U;

// => op+size:
// 64+B = 0, 64+H = 1, 64+S = 2, 64+D = X
// 32+B = 1, 32+H = 2, 32+S = X, 32+D = X
// 16+B = 2, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X
// => 3-(op+size) (index bits in group)
// 64+B = 3, 64+H = 2, 64+S = 1, 64+D = X
// 32+B = 2, 32+H = 1, 32+S = X, 32+D = X
// 16+B = 1, 16+H = X, 16+S = X, 16+D = X
// 8+B = X, 8+H = X, 8+S = X, 8+D = X

// index bits within group: 1, 2, 3

if UInt(op) + UInt(size) >= 3 then UNDEFINED;

integer container_size;
case op of
    when '10' container_size = 16;
    when '01' container_size = 32;
    when '00' container_size = 64;

integer containers = datasize DIV container_size;

integer elements_per_container = container_size DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 x RESERVED
1x x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element = 0;
integer rev_element;
for c = 0 to containers-1
    rev_element = element + elements_per_container - 1;
    for e = 0 to elements_per_container-1
        Elem[result, rev_element, esize] = Elem[operand, element, esize];
        element = element + 1;
        rev_element = rev_element - 1;

V[d] = result;
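
The Python sketch below is an informal model of the container-based reversal in the pseudocode above. The vector is represented as a list of elements, lowest-numbered element first. The name rev_elements is hypothetical. container_size and esize correspond to the decode variables of the same names.

```python
def rev_elements(lanes, container_size, esize):
    """Reverse the order of esize-bit elements within each container_size-bit container."""
    elements_per_container = container_size // esize
    result = []
    for start in range(0, len(lanes), elements_per_container):
        container = lanes[start:start + elements_per_container]
        result.extend(reversed(container))
    return result


print(rev_elements([0, 1, 2, 3, 4, 5, 6, 7], 16, 8))
```

With container_size set to 32 or 64, the same function models the other container sizes selected by the decode case statement.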

Operational information


REV32 (vector)

Reverse elements in 32-bit words (vector). This instruction reverses the order of 8-bit or 16-bit elements in each word
of the vector in the source SIMD&FP register, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 0 0 0 1 0 Rn Rd
U o0

REV32 <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

// size=esize: B(0), H(1), S(2), D(3)

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;

// op=REVx: 64(0), 32(1), 16(2)

bits(2) op = o0:U;

// index bits within group: 1, 2, 3

if UInt(op) + UInt(size) >= 3 then UNDEFINED;

integer container_size;
case op of
    when '10' container_size = 16;
    when '01' container_size = 32;
    when '00' container_size = 64;

integer containers = datasize DIV container_size;

integer elements_per_container = container_size DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
1x x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

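CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element = 0;
integer rev_element;
for c = 0 to containers-1
    rev_element = element + elements_per_container - 1;
    for e = 0 to elements_per_container-1
        Elem[result, rev_element, esize] = Elem[operand, element, esize];
        element = element + 1;
        rev_element = rev_element - 1;

V[d] = result;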

Operational information


REV64

Reverse elements in 64-bit doublewords (vector). This instruction reverses the order of 8-bit, 16-bit, or 32-bit elements
in each doubleword of the vector in the source SIMD&FP register, places the results into a vector, and writes the
vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 0 0 1 0 Rn Rd
U o0

REV64 <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

// size=esize: B(0), H(1), S(2), D(3)

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;

// op=REVx: 64(0), 32(1), 16(2)

bits(2) op = o0:U;

// index bits within group: 1, 2, 3

if UInt(op) + UInt(size) >= 3 then UNDEFINED;

integer container_size;
case op of
    when '10' container_size = 16;
    when '01' container_size = 32;
    when '00' container_size = 64;

integer containers = datasize DIV container_size;

integer elements_per_container = container_size DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

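CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element = 0;
integer rev_element;
for c = 0 to containers-1
    rev_element = element + elements_per_container - 1;
    for e = 0 to elements_per_container-1
        Elem[result, rev_element, esize] = Elem[operand, element, esize];
        element = element + 1;
        rev_element = rev_element - 1;

V[d] = result;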

Operational information


RSHRN, RSHRN2

Rounding Shift Right Narrow (immediate). This instruction reads each unsigned integer value from the vector in the
source SIMD&FP register, right shifts each value by an immediate value, writes the final result to a vector, and writes
the vector to the lower or upper half of the destination SIMD&FP register. The destination vector elements are half as
long as the source vector elements. The results are rounded. For truncated results, see SHRN.
The RSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
RSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 0 0 1 1 Rn Rd
immh op

RSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<shift> Is the right shift amount, in the range 1 to the destination element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;

for e = 0 to elements-1
    element = (UInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
    Elem[result, e, esize] = element<esize-1:0>;

Vpart[d, part] = result;
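
The Python sketch below is an informal model of the decode of the shift amount and of the rounding loop above. The names decode_shift and rshrn_lanes and the list representation of the source vector are hypothetical.

```python
def decode_shift(immh, immb):
    """Return (esize, shift) from the immh and immb fields, for immh in 0001 to 0111."""
    assert 1 <= immh <= 7 and 0 <= immb <= 7
    esize = 8 << (immh.bit_length() - 1)        # 8 << HighestSetBit(immh)
    shift = (2 * esize) - ((immh << 3) | immb)  # (2 * esize) - UInt(immh:immb)
    return esize, shift


def rshrn_lanes(lanes, esize, shift):
    """Rounding right shift of 2*esize-bit unsigned lanes, narrowed to esize bits."""
    assert 1 <= shift <= esize
    round_const = 1 << (shift - 1)
    mask = (1 << esize) - 1
    return [((value + round_const) >> shift) & mask for value in lanes]


print(decode_shift(0b0001, 0b000), rshrn_lanes([0x00FF, 0x0180, 0xFFFF], 8, 8))
```

The last lane shows that the narrowed result is truncated, not saturated: 0xFFFF rounds up to 0x100, whose low 8 bits are zero.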

Operational information


RSUBHN, RSUBHN2

Rounding Subtract returning High Narrow. This instruction subtracts each vector element of the second source
SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the most
significant half of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP
register.
The results are rounded. For truncated results, see SUBHN.
The RSUBHN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the RSUBHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 0 0 0 Rn Rd
U o1

RSUBHN{2} <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean round = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

Vpart[d, part] = result;
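
The Python sketch below is an informal model of the behaviour described for RSUBHN and RSUBHN2, including the write to the lower or upper half of the destination. It assumes a rounding constant of 1 << (esize - 1) and a difference that wraps at 2*esize bits. The name rsubhn is hypothetical, and the 128-bit destination is modelled as an unsigned integer.

```python
def rsubhn(vd, lanes_n, lanes_m, esize, part):
    """lanes_n and lanes_m hold 2*esize-bit source elements; part = 1 models RSUBHN2."""
    wide_mask = (1 << (2 * esize)) - 1
    round_const = 1 << (esize - 1)  # assumed rounding constant
    narrow = 0
    for e, (a, b) in enumerate(zip(lanes_n, lanes_m)):
        high = ((a - b + round_const) & wide_mask) >> esize
        narrow |= high << (e * esize)
    if part == 0:
        # Lower half written, upper half cleared.
        return narrow
    # Upper half written, lower 64 bits of the destination preserved.
    return (narrow << 64) | (vd & ((1 << 64) - 1))


print(hex(rsubhn(0xBBBB, [0x0200], [0x0081], 8, 1)))
```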

Operational information


SABA

Signed Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second source
SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the
absolute values of the results into the elements of the vector of the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 1 1 1 Rn Rd
U ac

SABA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean accumulate = (ac == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) absdiff;

result = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    absdiff = Abs(element1-element2)<esize-1:0>;
    Elem[result, e, esize] = Elem[result, e, esize] + absdiff;
V[d] = result;
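
The Python sketch below is an informal model of the accumulate loop above for the signed case. The name saba_lanes and the representation of each register as a list of unsigned lane values are hypothetical.

```python
def saba_lanes(acc, lanes_n, lanes_m, esize):
    """Accumulate |n - m| into acc per lane, with signed inputs and esize-bit wraparound."""
    mask = (1 << esize) - 1

    def sint(value):
        # Reinterpret an esize-bit pattern as a signed integer.
        return value - (1 << esize) if value >> (esize - 1) else value

    return [(d + abs(sint(a) - sint(b))) & mask
            for d, a, b in zip(acc, lanes_n, lanes_m)]


print(saba_lanes([10, 0xFF], [0x7F, 0x01], [0x80, 0x03], 8))
```

In the first lane, 127 - (-128) gives an absolute difference of 255, and the accumulated value 265 wraps to 9 in an 8-bit lane.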

Operational information


SABAL, SABAL2

Signed Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or upper
half of the second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP
register, and accumulates the absolute values of the results into the vector elements of the destination SIMD&FP
register. The destination vector elements are twice as long as the source vector elements.
The SABAL instruction extracts each source vector from the lower half of each source register. The SABAL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 1 0 0 Rn Rd
U op

SABAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean accumulate = (op == '0');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) absdiff;

result = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    absdiff = Abs(element1-element2)<2*esize-1:0>;
    Elem[result, e, 2*esize] = Elem[result, e, 2*esize] + absdiff;
V[d] = result;
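
The Python sketch below is an informal model of the widening accumulate above for the signed case, with 128-bit registers represented as unsigned integers. The name sabal is hypothetical. part = 0 models SABAL and part = 1 models SABAL2.

```python
def sabal(vd, vn, vm, esize, part):
    """Accumulate |n - m| of signed esize-bit source lanes into 2*esize-bit lanes of vd."""
    datasize = 64
    half_mask = (1 << datasize) - 1
    elem_mask = (1 << esize) - 1
    wide_mask = (1 << (2 * esize)) - 1
    operand1 = (vn >> (part * datasize)) & half_mask  # Vpart[n, part]
    operand2 = (vm >> (part * datasize)) & half_mask  # Vpart[m, part]

    def sint(value):
        return value - (1 << esize) if value >> (esize - 1) else value

    result = vd
    for e in range(datasize // esize):
        a = sint((operand1 >> (e * esize)) & elem_mask)
        b = sint((operand2 >> (e * esize)) & elem_mask)
        pos = e * 2 * esize
        lane = ((result >> pos) & wide_mask) + abs(a - b)
        result = (result & ~(wide_mask << pos)) | ((lane & wide_mask) << pos)
    return result


print(sabal(0, 0x7F, 0x80, 8, 0))
```

Because the destination lanes are twice as wide as the source lanes, the absolute difference of 255 in the example is stored without wrapping.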

Operational information


SABD

Signed Absolute Difference. This instruction subtracts the elements of the vector of the second source SIMD&FP
register from the corresponding elements of the first source SIMD&FP register, places the absolute values of the
results into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 1 0 1 Rn Rd
U ac

SABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean accumulate = (ac == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) absdiff;

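result = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    absdiff = Abs(element1-element2)<esize-1:0>;
    Elem[result, e, esize] = Elem[result, e, esize] + absdiff;
V[d] = result;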

Operational information

SABDL, SABDL2

Signed Absolute Difference Long. This instruction subtracts the vector elements of the second source SIMD&FP
register from the corresponding vector elements of the first source SIMD&FP register, places the absolute value of the
results into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The
destination vector elements are twice as long as the source vector elements.
The SABDL instruction extracts each source vector from the lower half of each source register, while the SABDL2
instruction extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 1 0 0 Rn Rd
U op

SABDL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean accumulate = (op == '0');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) absdiff;

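result = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    absdiff = Abs(element1-element2)<2*esize-1:0>;
    Elem[result, e, 2*esize] = Elem[result, e, 2*esize] + absdiff;
V[d] = result;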

Operational information

SADALP

Signed Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent signed integer values from the
vector in the source SIMD&FP register and accumulates the results into the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 0 1 0 Rn Rd
U op

SADALP <Vd>.<Ta>, <Vn>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV (2 * esize);
boolean acc = (op == '1');
boolean unsigned = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size:Q”:

size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;

bits(2*esize) sum;
integer op1;
integer op2;

if acc then result = V[d];

for e = 0 to elements-1
    op1 = Int(Elem[operand, 2*e+0, esize], unsigned);
    op2 = Int(Elem[operand, 2*e+1, esize], unsigned);
    sum = (op1+op2)<2*esize-1:0>;
    if acc then
        Elem[result, e, 2*esize] = Elem[result, e, 2*esize] + sum;
    else
        Elem[result, e, 2*esize] = sum;

V[d] = result;
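
As an illustration of the pairwise loop above, the following Python sketch models the register contents as plain integers. The function name `sadalp` and its parameter layout are assumptions made for this example; `acc` and `unsigned` mirror the decode flags of the same names.

```python
def sadalp(vd, vn, esize, datasize=128, acc=True, unsigned=False):
    """Add adjacent element pairs of vn and optionally accumulate into vd."""
    elements = datasize // (2 * esize)
    wide = 2 * esize
    mask = (1 << esize) - 1
    wmask = (1 << wide) - 1

    def val(i):
        # Int(Elem[operand, i, esize], unsigned)
        x = (vn >> (i * esize)) & mask
        return x if unsigned or x < (1 << (esize - 1)) else x - (1 << esize)

    result = 0
    for e in range(elements):
        total = val(2 * e) + val(2 * e + 1)
        if acc:
            total += (vd >> (e * wide)) & wmask   # existing destination lane
        result |= (total & wmask) << (e * wide)   # truncate to 2*esize bits
    return result
```

With `acc=False` the same function describes the non-accumulating form, in which the destination lanes are simply overwritten with the pair sums.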

Operational information

SADDL, SADDL2

Signed Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source
SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the results
into a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as
long as the source vector elements. All the values in this instruction are signed integer values.
The SADDL instruction extracts each source vector from the lower half of each source register. The SADDL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 0 0 0 Rn Rd
U o1

SADDL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    if sub_op then
        sum = element1 - element2;
    else
        sum = element1 + element2;
    Elem[result, e, 2*esize] = sum<2*esize-1:0>;

V[d] = result;

Operational information

SADDLP

Signed Add Long Pairwise. This instruction adds pairs of adjacent signed integer values from the vector in the source
SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register. The
destination vector elements are twice as long as the source vector elements.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 0 1 0 1 0 Rn Rd
U op

SADDLP <Vd>.<Ta>, <Vn>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV (2 * esize);
boolean acc = (op == '1');
boolean unsigned = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size:Q”:

size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;

bits(2*esize) sum;
integer op1;
integer op2;

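if acc then result = V[d];

for e = 0 to elements-1
    op1 = Int(Elem[operand, 2*e+0, esize], unsigned);
    op2 = Int(Elem[operand, 2*e+1, esize], unsigned);
    sum = (op1+op2)<2*esize-1:0>;
    if acc then
        Elem[result, e, 2*esize] = Elem[result, e, 2*esize] + sum;
    else
        Elem[result, e, 2*esize] = sum;

V[d] = result;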

Operational information

SADDLV

Signed Add Long across Vector. This instruction adds every vector element in the source SIMD&FP register together,
and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as the source
vector elements. All the values in this instruction are signed integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 0 0 0 1 1 1 0 Rn Rd
U

SADDLV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '100' then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

Assembler Symbols

<V> Is the destination width specifier, encoded in “size”:

size <V>
00 H
01 S
10 D
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer sum;

sum = Int(Elem[operand, 0, esize], unsigned);

for e = 1 to elements-1
    sum = sum + Int(Elem[operand, e, esize], unsigned);

V[d] = sum<2*esize-1:0>;
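
The reduction above can be illustrated with a short Python sketch. The function name `saddlv` and the use of a plain integer as the register image are assumptions made for this example. Because the result is twice the element width, the sum of up to 16 elements cannot overflow the destination.

```python
def saddlv(vn, esize, datasize=128, unsigned=False):
    """Sum every element of vn and return the 2*esize-bit scalar result."""
    elements = datasize // esize
    mask = (1 << esize) - 1
    total = 0
    for e in range(elements):
        x = (vn >> (e * esize)) & mask
        if not unsigned and x >> (esize - 1):
            x -= 1 << esize                     # sign-extend the element
        total += x
    return total & ((1 << (2 * esize)) - 1)     # sum<2*esize-1:0>
```

For sixteen byte elements that all hold -128, the sum is -2048, which is returned as the 16-bit two's-complement pattern 0xF800.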

Operational information

SADDW, SADDW2

Signed Add Wide. This instruction adds vector elements of the first source SIMD&FP register to the corresponding
vector elements in the lower or upper half of the second source SIMD&FP register, places the results in a vector, and
writes the vector to the SIMD&FP destination register.
The SADDW instruction extracts the second source vector from the lower half of the second source register. The SADDW2
instruction extracts the second source vector from the upper half of the second source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 1 0 0 Rn Rd
U o1

SADDW{2} <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, 2*esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    if sub_op then
        sum = element1 - element2;
    else
        sum = element1 + element2;
    Elem[result, e, 2*esize] = sum<2*esize-1:0>;

V[d] = result;
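
A minimal Python sketch of the wide form follows. It highlights that the first operand is read at the destination width (2*esize) while the second operand is read from one half of its register at the narrow width. The function name `saddw` and the integer register model are assumptions made for this example; `sub_op`, `part`, and `unsigned` mirror the decode variables.

```python
def saddw(vn, vm, esize, part=0, sub_op=False, unsigned=False):
    """Model of the Operation loop for the wide add (or subtract) form."""
    def as_int(x, bits):
        # Int(x, unsigned): two's-complement value unless unsigned is set
        return x if unsigned or x < (1 << (bits - 1)) else x - (1 << bits)

    wide = 2 * esize
    wmask = (1 << wide) - 1
    op2 = (vm >> (64 * part)) & ((1 << 64) - 1)          # Vpart[m, part]
    result = 0
    for e in range(64 // esize):
        element1 = as_int((vn >> (e * wide)) & wmask, wide)
        element2 = as_int((op2 >> (e * esize)) & ((1 << esize) - 1), esize)
        total = element1 - element2 if sub_op else element1 + element2
        result |= (total & wmask) << (e * wide)           # sum<2*esize-1:0>
    return result
```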

Operational information

SCVTF (scalar, fixed-point)

Signed fixed-point Convert to Floating-point (scalar). This instruction converts the signed value in the 32-bit or 64-bit
general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and
writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 0 0 0 0 1 0 scale Rn Rd
rmode opcode

32-bit to half-precision (sf == 0 && ftype == 11)

(FEAT_FP16)

SCVTF <Hd>, <Wn>, #<fbits>

32-bit to single-precision (sf == 0 && ftype == 00)

SCVTF <Sd>, <Wn>, #<fbits>

32-bit to double-precision (sf == 0 && ftype == 01)

SCVTF <Dd>, <Wn>, #<fbits>

64-bit to half-precision (sf == 1 && ftype == 11)

(FEAT_FP16)

SCVTF <Hd>, <Xn>, #<fbits>

64-bit to single-precision (sf == 1 && ftype == 00)

SCVTF <Sd>, <Xn>, #<fbits>

64-bit to double-precision (sf == 1 && ftype == 01)

SCVTF <Dd>, <Xn>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00' fltsize = 32;
    when '01' fltsize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

if sf == '0' && scale<5> == '0' then UNDEFINED;

integer fracbits = 64 - UInt(scale);

rounding = FPRoundingMode(FPCR[]);

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<fbits> For the 32-bit to double-precision, 32-bit to half-precision and 32-bit to single-precision variant: is the
number of bits after the binary point in the fixed-point source, in the range 1 to 32, encoded as 64
minus "scale".
For the 64-bit to double-precision, 64-bit to half-precision and 64-bit to single-precision variant: is the
number of bits after the binary point in the fixed-point source, in the range 1 to 64, encoded as 64
minus "scale".

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
integer fsize = if merge then 128 else fltsize;
bits(fsize) fltval;
bits(intsize) intval;

intval = X[n];
fltval = if merge then V[d] else Zeros();
Elem[fltval, 0, fltsize] = FixedToFP(intval, fracbits, FALSE, fpcr, rounding);
V[d] = fltval;
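
The relationship between <fbits>, the "scale" field, and the converted value can be sketched in Python. This is an illustration only: the function name `scvtf_fixed` is hypothetical, the result is a Python float (binary64), and Python's integer division rounds to nearest, so the sketch corresponds to a double-precision destination with round-to-nearest selected in FPCR. Other rounding modes, the half-precision and single-precision destinations, and floating-point exceptions are not modelled.

```python
def scvtf_fixed(xn, intsize, fbits):
    """Return (scale field, converted value) for a signed fixed-point source."""
    if intsize not in (32, 64) or not 1 <= fbits <= intsize:
        raise ValueError("fbits must be in the range 1 to intsize")
    scale = 64 - fbits                       # <fbits> is encoded as 64 minus "scale"
    value = xn & ((1 << intsize) - 1)
    if value >> (intsize - 1):
        value -= 1 << intsize                # interpret the source as signed
    return scale, value / (1 << fbits)       # fixed-point value = integer / 2^fbits
```

For a 32-bit source every valid <fbits> gives a scale of at least 32, which is why the decode treats `sf == '0' && scale<5> == '0'` as UNDEFINED.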

SCVTF (scalar, integer)

Signed integer Convert to Floating-point (scalar). This instruction converts the signed integer value in the general-
purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and writes the
result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
sf 0 0 1 1 1 1 0 ftype 1 0 0 0 1 0 0 0 0 0 0 0 Rn Rd
rmode opcode

32-bit to half-precision (sf == 0 && ftype == 11)

(FEAT_FP16)

SCVTF <Hd>, <Wn>

32-bit to single-precision (sf == 0 && ftype == 00)

SCVTF <Sd>, <Wn>

32-bit to double-precision (sf == 0 && ftype == 01)

SCVTF <Dd>, <Wn>

64-bit to half-precision (sf == 1 && ftype == 11)

(FEAT_FP16)

SCVTF <Hd>, <Xn>

64-bit to single-precision (sf == 1 && ftype == 00)

SCVTF <Sd>, <Xn>

64-bit to double-precision (sf == 1 && ftype == 01)

SCVTF <Dd>, <Xn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

rounding = FPRoundingMode(FPCR[]);

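Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.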

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
integer fsize = if merge then 128 else fltsize;
bits(fsize) fltval;
bits(intsize) intval;

intval = X[n];
fltval = if merge then V[d] else Zeros();
Elem[fltval, 0, fltsize] = FixedToFP(intval, 0, FALSE, fpcr, rounding);
V[d] = fltval;

SCVTF (vector, fixed-point)

Signed fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-
point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP
destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector.

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 1 1 1 0 0 1 Rn Rd
U immh

SCVTF <V><d>, <V><n>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '000x' || (immh == '001x' && !HaveFP16Ext()) then UNDEFINED;

integer esize = if immh == '1xxx' then 64 else if immh == '01xx' then 32 else 16;
integer datasize = esize;
integer elements = 1;

integer fracbits = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
FPRounding rounding = FPRoundingMode(FPCR[]);

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 1 1 0 0 1 Rn Rd
U immh

SCVTF <Vd>.<T>, <Vn>.<T>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

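if immh == '0000' then SEE(asimdimm);
if immh == '000x' || (immh == '001x' && !HaveFP16Ext()) then UNDEFINED;
if immh<3>:Q == '10' then UNDEFINED;

integer esize = if immh == '1xxx' then 64 else if immh == '01xx' then 32 else 16;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer fracbits = (esize * 2) - UInt(immh:immb);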

boolean unsigned = (U == '1');
FPRounding rounding = FPRoundingMode(FPCR[]);

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
000x RESERVED
001x H
01xx S
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width, encoded in
“immh:immb”:

immh <fbits>
000x RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

For the vector variant: is the number of fractional bits, in the range 1 to the element width, encoded in
“immh:immb”:

immh <fbits>
0000 SEE Advanced SIMD modified immediate
0001 RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
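
The <fbits> tables above follow one rule: the 7-bit value immh:immb equals twice the element size minus <fbits>, and the position of the highest set bit of immh identifies the element size. A minimal Python sketch of that mapping follows; `encode_fbits` and `decode_fbits` are hypothetical helper names, and the sketch does not model the FEAT_FP16 check or the immh == 0000 redirection.

```python
def encode_fbits(esize, fbits):
    """Return (immh, immb) for an element size of 16, 32, or 64 bits."""
    if esize not in (16, 32, 64) or not 1 <= fbits <= esize:
        raise ValueError("fbits must be in the range 1 to esize")
    imm = 2 * esize - fbits                  # UInt(immh:immb)
    return imm >> 3, imm & 0b111


def decode_fbits(immh, immb):
    """Return (esize, fbits) from the 4-bit immh and 3-bit immb fields."""
    imm = (immh << 3) | immb
    if immh >> 3:                            # immh == '1xxx'
        esize = 64
    elif immh >> 2:                          # immh == '01xx'
        esize = 32
    elif immh >> 1:                          # immh == '001x'
        esize = 16
    else:
        raise ValueError("immh == '000x' is reserved for this instruction")
    return esize, 2 * esize - imm
```

For example, 16 fractional bits with 32-bit elements give immh:immb = 0110:000, which falls in the 01xx row of the table.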

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FixedToFP(element, fracbits, unsigned, fpcr, rounding);

V[d] = result;

SCVTF (vector, integer)

Signed integer Convert to Floating-point (vector). This instruction converts each element in a vector from signed
integer to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP
destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
U

SCVTF <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');

Scalar single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
U

SCVTF <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 1 1 1 1 0 0 1 1 1 0 1 1 0 Rn Rd
U

SCVTF <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');

Vector single-precision and double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 sz 1 0 0 0 0 1 1 1 0 1 1 0 Rn Rd
U

SCVTF <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

FPCRType fpcr = FPCR[];

boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

FPRounding rounding = FPRoundingMode(fpcr);

bits(esize) element;
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FixedToFP(element, 0, unsigned, fpcr, rounding);

V[d] = result;

SDOT (by element)

Dot Product signed arithmetic (vector, by element). This instruction performs the dot product of the four 8-bit
elements in each 32-bit element of the first source register with the four 8-bit elements of an indexed 32-bit element in
the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.

Note

ID_AA64ISAR0_EL1.DP indicates whether this instruction is supported.

Vector
(FEAT_DotProd)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 1 1 0 H 0 Rn Rd
U

SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]

if !HaveDOTPExt() then UNDEFINED;

if size != '10' then UNDEFINED;
boolean signed = (U == '0');

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(M:Rm);
integer index = UInt(H:L);

integer esize = 8 << UInt(size);

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 8B
1 16B

<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index> Is the element index, encoded in the "H:L" fields.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(128) operand2 = V[m];
bits(datasize) result = V[d];
for e = 0 to elements-1
    integer res = 0;
    integer element1, element2;
    for i = 0 to 3
        if signed then
            element1 = SInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = SInt(Elem[operand2, 4*index+i, esize DIV 4]);
        else
            element1 = UInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = UInt(Elem[operand2, 4*index+i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = Elem[result, e, esize] + res;
V[d] = result;
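
The nested loop above can be sketched in Python as follows. The function name `sdot_by_element` and the integer register model are assumptions made for this example. Every 32-bit lane of the first source is multiplied against the same four bytes of the second source, selected by `index`.

```python
def sdot_by_element(vd, vn, vm, index, datasize=128, signed=True):
    """Accumulate 4-way byte dot products into each 32-bit lane of vd."""
    def byte(reg, i):
        b = (reg >> (8 * i)) & 0xFF
        return b - 256 if signed and b >= 128 else b

    result = 0
    for e in range(datasize // 32):
        # Four products per lane; the vm bytes depend on index, not on e
        res = sum(byte(vn, 4 * e + i) * byte(vm, 4 * index + i) for i in range(4))
        acc = (vd >> (32 * e)) & 0xFFFFFFFF
        result |= ((acc + res) & 0xFFFFFFFF) << (32 * e)
    return result
```

Usage: with `vm` holding the bytes 1, 1, 1, 1 in its second 32-bit element and `index=1`, each destination lane receives the sum of the four signed bytes in the matching lane of `vn`.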

SDOT (vector)

Dot Product signed arithmetic (vector). This instruction performs the dot product of the four signed 8-bit elements in
each 32-bit element of the first source register with the four signed 8-bit elements of the corresponding 32-bit element
in the second source register, accumulating the result into the corresponding 32-bit element of the destination
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.

Note

ID_AA64ISAR0_EL1.DP indicates whether this instruction is supported.

Vector
(FEAT_DotProd)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 0 Rm 1 0 0 1 0 1 Rn Rd
U

SDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

if !HaveDOTPExt() then UNDEFINED;

if size != '10' then UNDEFINED;
boolean signed = (U == '0');
integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 8B
1 16B

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

result = V[d];
for e = 0 to elements-1
    integer res = 0;
    integer element1, element2;
    for i = 0 to 3
        if signed then
            element1 = SInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = SInt(Elem[operand2, 4*e+i, esize DIV 4]);
        else
            element1 = UInt(Elem[operand1, 4*e+i, esize DIV 4]);
            element2 = UInt(Elem[operand2, 4*e+i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = Elem[result, e, esize] + res;
V[d] = result;

SHA1C

SHA1 hash update (choose).

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 0 0 0 0 0 Rn Rd

SHA1C <Qd>, <Sn>, <Vm>.4S

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA1Ext() then UNDEFINED;

Assembler Symbols

<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) X = V[d];
bits(32) Y = V[n]; // Note: 32 not 128 bits wide
bits(128) W = V[m];
bits(32) t;

for e = 0 to 3
    t = SHAchoose(X<63:32>, X<95:64>, X<127:96>);
    Y = Y + ROL(X<31:0>, 5) + t + Elem[W, e, 32];
    X<63:32> = ROL(X<63:32>, 30);
    <Y, X> = ROL(Y:X, 32);
V[d] = X;
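
The four-round loop above can be sketched in Python. This is an illustration under stated assumptions: X and W are modelled as lists of four 32-bit words with element 0 first, the names `rol32`, `sha_choose`, and `sha1c` are hypothetical, and SHAchoose, which is not defined in this section, is assumed to be the standard SHA-1 choose function ((y EOR z) AND x) EOR z.

```python
MASK32 = 0xFFFFFFFF


def rol32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sha_choose(x, y, z):
    """Assumed definition of SHAchoose: select y where x is 1, else z."""
    return ((y ^ z) & x) ^ z


def sha1c(qd, sn, vm):
    """Run four SHA-1 'choose' rounds; returns the new four-word X."""
    x = [w & MASK32 for w in qd]
    y = sn & MASK32
    for e in range(4):
        t = sha_choose(x[1], x[2], x[3])
        y = (y + rol32(x[0], 5) + t + vm[e]) & MASK32
        x[1] = rol32(x[1], 30)
        # <Y, X> = ROL(Y:X, 32): the top word of X moves into Y and the
        # old Y becomes element 0 of X
        y, x = x[3], [y, x[0], x[1], x[2]]
    return x
```

In SHA-1 terms, X holds the working variables a to d in elements 0 to 3 and Y holds e. The final value of Y is discarded, because only X is written back to the destination register.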

Operational information

SHA1H

SHA1 fixed rotate.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0 Rn Rd

SHA1H <Sd>, <Sn>

integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveSHA1Ext() then UNDEFINED;

Assembler Symbols

<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(32) operand = V[n]; // read element [0] only, [1-3] zeroed

V[d] = ROL(operand, 30);

Operational information

SHA1M

SHA1 hash update (majority).

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 0 1 0 0 0 Rn Rd

SHA1M <Qd>, <Sn>, <Vm>.4S

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA1Ext() then UNDEFINED;

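Assembler Symbols

<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.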

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) X = V[d];
bits(32) Y = V[n]; // Note: 32 not 128 bits wide
bits(128) W = V[m];
bits(32) t;

for e = 0 to 3
    t = SHAmajority(X<63:32>, X<95:64>, X<127:96>);
    Y = Y + ROL(X<31:0>, 5) + t + Elem[W, e, 32];
    X<63:32> = ROL(X<63:32>, 30);
    <Y, X> = ROL(Y:X, 32);
V[d] = X;

Operational information

SHA1P

SHA1 hash update (parity).

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 0 0 1 0 0 Rn Rd

SHA1P <Qd>, <Sn>, <Vm>.4S

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA1Ext() then UNDEFINED;

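Assembler Symbols

<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Sn> Is the 32-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.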

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) X = V[d];
bits(32) Y = V[n]; // Note: 32 not 128 bits wide
bits(128) W = V[m];
bits(32) t;

for e = 0 to 3
    t = SHAparity(X<63:32>, X<95:64>, X<127:96>);
    Y = Y + ROL(X<31:0>, 5) + t + Elem[W, e, 32];
    X<63:32> = ROL(X<63:32>, 30);
    <Y, X> = ROL(Y:X, 32);
V[d] = X;

Operational information

SHA1SU0

SHA1 schedule update 0.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 0 1 1 0 0 Rn Rd

SHA1SU0 <Vd>.4S, <Vn>.4S, <Vm>.4S

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA1Ext() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) operand1 = V[d];

bits(128) operand2 = V[n];
bits(128) operand3 = V[m];
bits(128) result;

result = operand2<63:0>:operand1<127:64>;
result = result EOR operand1 EOR operand3;
V[d] = result;
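
The concatenation and the two exclusive ORs above can be checked with a short Python sketch that treats each register as a 128-bit integer. The function name `sha1su0` and the integer representation are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def sha1su0(vd, vn, vm):
    """Model of SHA1SU0 on 128-bit integers (vd is operand1, vn operand2, vm operand3)."""
    # operand2<63:0>:operand1<127:64> places the low half of vn above the high half of vd.
    mixed = ((vn & MASK64) << 64) | (vd >> 64)
    return (mixed ^ vd ^ vm) & MASK128
```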

Operational information

SHA1SU1

SHA1 schedule update 1.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 1 0 1 0 0 0 0 0 0 1 1 0 Rn Rd

SHA1SU1 <Vd>.4S, <Vn>.4S

integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveSHA1Ext() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) operand1 = V[d];

bits(128) operand2 = V[n];
bits(128) result;
bits(128) T = operand1 EOR LSR(operand2, 32);
result<31:0> = ROL(T<31:0>, 1);
result<63:32> = ROL(T<63:32>, 1);
result<95:64> = ROL(T<95:64>, 1);
result<127:96> = ROL(T<127:96>, 1) EOR ROL(T<31:0>, 2);
V[d] = result;

Operational information

SHA256H

SHA256 hash update (part 1).

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 1 0 0 0 0 Rn Rd
P

SHA256H <Qd>, <Qn>, <Vm>.4S

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA256Ext() then UNDEFINED;

Assembler Symbols

<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Qn> Is the 128-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) result;
result = SHA256hash(V[d], V[n], V[m], TRUE);
V[d] = result;

Operational information

SHA256H2

SHA256 hash update (part 2).

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 1 0 1 0 0 Rn Rd
P

SHA256H2 <Qd>, <Qn>, <Vm>.4S

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA256Ext() then UNDEFINED;

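Assembler Symbols

<Qd> Is the 128-bit name of the SIMD&FP source and destination, encoded in the "Rd" field.
<Qn> Is the 128-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.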

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) result;
result = SHA256hash(V[n], V[d], V[m], FALSE);
V[d] = result;

Operational information

SHA256SU0

SHA256 schedule update 0.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 Rn Rd

SHA256SU0 <Vd>.4S, <Vn>.4S

integer d = UInt(Rd);
integer n = UInt(Rn);
if !HaveSHA256Ext() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) operand1 = V[d];

bits(128) operand2 = V[n];
bits(128) result;
bits(128) T = operand2<31:0>:operand1<127:32>;
bits(32) elt;

for e = 0 to 3
    elt = Elem[T, e, 32];
    elt = ROR(elt, 7) EOR ROR(elt, 18) EOR LSR(elt, 3);
    Elem[result, e, 32] = elt + Elem[operand1, e, 32];
V[d] = result;
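
A minimal Python sketch of the loop above, assuming each register is given as a list of four 32-bit words with index 0 holding bits <31:0>. The names `ror32` and `sha256su0` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ror32(x, n):
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sha256su0(vd, vn):
    """Model of SHA256SU0; vd is operand1 and vn is operand2."""
    # T = operand2<31:0>:operand1<127:32>, i.e. words 1..3 of vd followed by word 0 of vn.
    t = vd[1:] + vn[:1]
    return [
        ((ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3)) + w) & MASK32
        for x, w in zip(t, vd)
    ]
```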

Operational information

SHA256SU1

SHA256 schedule update 1.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 0 0 0 Rm 0 1 1 0 0 0 Rn Rd

SHA256SU1 <Vd>.4S, <Vn>.4S, <Vm>.4S

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if !HaveSHA256Ext() then UNDEFINED;

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) operand1 = V[d];

bits(128) operand2 = V[n];
bits(128) operand3 = V[m];
bits(128) result;
bits(128) T0 = operand3<31:0>:operand2<127:32>;
bits(64) T1;
bits(32) elt;

T1 = operand3<127:64>;
for e = 0 to 1
    elt = Elem[T1, e, 32];
    elt = ROR(elt, 17) EOR ROR(elt, 19) EOR LSR(elt, 10);
    elt = elt + Elem[operand1, e, 32] + Elem[T0, e, 32];
    Elem[result, e, 32] = elt;

T1 = result<63:0>;
for e = 2 to 3
    elt = Elem[T1, e-2, 32];
    elt = ROR(elt, 17) EOR ROR(elt, 19) EOR LSR(elt, 10);
    elt = elt + Elem[operand1, e, 32] + Elem[T0, e, 32];
    Elem[result, e, 32] = elt;

V[d] = result;
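
The two-stage structure above, in which the upper two result words depend on the lower two, can be sketched in Python. The sketch assumes registers are lists of four 32-bit words with index 0 holding bits <31:0>; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def ror32(x, n):
    """Rotate a 32-bit value right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sigma1(x):
    """ROR 17, ROR 19, LSR 10 combination used by both loops."""
    return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10)


def sha256su1(vd, vn, vm):
    """Model of SHA256SU1; vd, vn, vm are operand1, operand2, operand3."""
    t0 = vn[1:] + vm[:1]          # operand3<31:0>:operand2<127:32>
    result = [0, 0, 0, 0]
    t1 = vm[2:4]                  # operand3<127:64>
    for e in range(2):
        result[e] = (sigma1(t1[e]) + vd[e] + t0[e]) & MASK32
    t1 = result[0:2]              # result<63:0>, produced by the first loop
    for e in range(2, 4):
        result[e] = (sigma1(t1[e - 2]) + vd[e] + t0[e]) & MASK32
    return result
```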

Operational information

SHA512H

SHA512 Hash update part 1 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit
output value that combines the sigma1 and chi functions of two iterations of the SHA512 computation. It returns this
value to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA512 is implemented.

Advanced SIMD
(FEAT_SHA512)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 0 0 Rn Rd

SHA512H <Qd>, <Qn>, <Vm>.2D

if !HaveSHA512Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

Assembler Symbols

<Qd> Is the 128-bit name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Qn> Is the 128-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vtmp;
bits(64) MSigma1;
bits(64) tmp;
bits(128) X = V[n];
bits(128) Y = V[m];
bits(128) W = V[d];

MSigma1 = ROR(Y<127:64>, 14) EOR ROR(Y<127:64>, 18) EOR ROR(Y<127:64>, 41);

Vtmp<127:64> = (Y<127:64> AND X<63:0>) EOR (NOT(Y<127:64>) AND X<127:64>);
Vtmp<127:64> = (Vtmp<127:64> + MSigma1 + W<127:64>);
tmp = Vtmp<127:64> + Y<63:0>;
MSigma1 = ROR(tmp, 14) EOR ROR(tmp, 18) EOR ROR(tmp, 41);
Vtmp<63:0> = (tmp AND Y<127:64>) EOR (NOT(tmp) AND X<63:0>);
Vtmp<63:0> = (Vtmp<63:0> + MSigma1 + W<63:0>);
V[d] = Vtmp;
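
As a sketch only, the sequence above can be written in Python with each 128-bit register represented as a `(low, high)` pair of 64-bit integers. That representation and the names `big_sigma1` and `sha512h` are assumptions for illustration.

```python
MASK64 = (1 << 64) - 1


def ror64(x, n):
    """Rotate a 64-bit value right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def big_sigma1(x):
    """The MSigma1 expression: ROR 14, ROR 18, ROR 41."""
    return ror64(x, 14) ^ ror64(x, 18) ^ ror64(x, 41)


def sha512h(w, x, y):
    """Model of SHA512H; w = V[d], x = V[n], y = V[m], each a (low, high) pair."""
    w_lo, w_hi = w
    x_lo, x_hi = x
    y_lo, y_hi = y
    hi = (y_hi & x_lo) ^ (~y_hi & MASK64 & x_hi)
    hi = (hi + big_sigma1(y_hi) + w_hi) & MASK64
    tmp = (hi + y_lo) & MASK64
    lo = (tmp & y_hi) ^ (~tmp & MASK64 & x_lo)
    lo = (lo + big_sigma1(tmp) + w_lo) & MASK64
    return (lo, hi)
```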

Operational information

SHA512H2

SHA512 Hash update part 2 takes the values from the three 128-bit source SIMD&FP registers and produces a 128-bit
output value that combines the sigma0 and majority functions of two iterations of the SHA512 computation. It returns
this value to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA512 is implemented.

Advanced SIMD
(FEAT_SHA512)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 0 1 Rn Rd

SHA512H2 <Qd>, <Qn>, <Vm>.2D

if !HaveSHA512Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

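Assembler Symbols

<Qd> Is the 128-bit name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Qn> Is the 128-bit name of the second SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the third SIMD&FP source register, encoded in the "Rm" field.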

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vtmp;
bits(64) NSigma0;
bits(128) X = V[n];
bits(128) Y = V[m];
bits(128) W = V[d];

NSigma0 = ROR(Y<63:0>, 28) EOR ROR(Y<63:0>, 34) EOR ROR(Y<63:0>, 39);

Vtmp<127:64> = (X<63:0> AND Y<127:64>) EOR (X<63:0> AND Y<63:0>) EOR (Y<127:64> AND Y<63:0>);
Vtmp<127:64> = (Vtmp<127:64> + NSigma0 + W<127:64>);
NSigma0 = ROR(Vtmp<127:64>, 28) EOR ROR(Vtmp<127:64>, 34) EOR ROR(Vtmp<127:64>, 39);
Vtmp<63:0> = (Vtmp<127:64> AND Y<63:0>) EOR (Vtmp<127:64> AND Y<127:64>) EOR (Y<127:64> AND Y<63:0>);
Vtmp<63:0> = (Vtmp<63:0> + NSigma0 + W<63:0>);

V[d] = Vtmp;

Operational information

SHA512SU0

SHA512 Schedule Update 0 takes the values from the two 128-bit source SIMD&FP registers and produces a 128-bit
output value that combines the gamma0 functions of two iterations of the SHA512 schedule update that are performed
after the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA512 is implemented.

Advanced SIMD
(FEAT_SHA512)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 0 Rn Rd

SHA512SU0 <Vd>.2D, <Vn>.2D

if !HaveSHA512Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

Assembler Symbols

<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(64) sig0;
bits(128) Vtmp;
bits(128) X = V[n];
bits(128) W = V[d];
sig0 = ROR(W<127:64>, 1) EOR ROR(W<127:64>, 8) EOR ('0000000':W<127:71>);
Vtmp<63:0> = W<63:0> + sig0;
sig0 = ROR(X<63:0>, 1) EOR ROR(X<63:0>, 8) EOR ('0000000':X<63:7>);
Vtmp<127:64> = W<127:64> + sig0;
V[d] = Vtmp;
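
In the pseudocode above, the term '0000000':W<127:71> is a logical shift right by 7 of the 64-bit value. A small Python sketch makes that explicit, assuming each register is a `(low, high)` pair of 64-bit integers; the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def ror64(x, n):
    """Rotate a 64-bit value right by n bits."""
    return ((x >> n) | (x << (64 - n))) & MASK64


def small_sigma0(x):
    """The sig0 expression: ROR 1, ROR 8, and a logical shift right by 7."""
    return ror64(x, 1) ^ ror64(x, 8) ^ (x >> 7)


def sha512su0(w, x):
    """Model of SHA512SU0; w = V[d] and x = V[n], each a (low, high) pair."""
    w_lo, w_hi = w
    x_lo, _ = x
    lo = (w_lo + small_sigma0(w_hi)) & MASK64
    hi = (w_hi + small_sigma0(x_lo)) & MASK64
    return (lo, hi)
```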

Operational information

SHA512SU1

SHA512 Schedule Update 1 takes the values from the three source SIMD&FP registers and produces a 128-bit output
value that combines the gamma1 functions of two iterations of the SHA512 schedule update that are performed after
the first 16 iterations within a block. It returns this value to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA512 is implemented.

Advanced SIMD
(FEAT_SHA512)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 0 0 0 1 0 Rn Rd

SHA512SU1 <Vd>.2D, <Vn>.2D, <Vm>.2D

if !HaveSHA512Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(64) sig1;
bits(128) Vtmp;
bits(128) X = V[n];
bits(128) Y = V[m];
bits(128) W = V[d];

sig1 = ROR(X<127:64>, 19) EOR ROR(X<127:64>, 61) EOR ('000000':X<127:70>);

Vtmp<127:64> = W<127:64> + sig1 + Y<127:64>;
sig1 = ROR(X<63:0>, 19) EOR ROR(X<63:0>, 61) EOR ('000000':X<63:6>);
Vtmp<63:0> = W<63:0> + sig1 + Y<63:0>;
V[d] = Vtmp;

Operational information

SHADD

Signed Halving Add. This instruction adds corresponding signed integer values from the two source SIMD&FP
registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
The results are truncated. For rounded results, see SRHADD.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 0 0 1 Rn Rd
U

SHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer sum;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    sum = element1 + element2;
    Elem[result, e, esize] = sum<esize:1>;

V[d] = result;
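
Because the sum is formed as an unbounded integer before bits <esize:1> are extracted, the halving add cannot overflow. A per-element Python sketch of the signed case follows; the names `to_signed` and `shadd` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def shadd(a, b, esize=8):
    """Signed halving add of two esize-bit elements, truncating (rounding toward minus infinity)."""
    total = to_signed(a, esize) + to_signed(b, esize)
    # sum<esize:1> drops bit 0 and keeps the next esize bits.
    return (total >> 1) & ((1 << esize) - 1)
```

For example, `shadd(0x7F, 0x7F)` gives `0x7F`, whereas an ordinary 8-bit add followed by a shift would have lost the carry.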

Operational information

If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.

SHL

Shift Left (immediate). This instruction reads each value from a vector, left shifts each result by an immediate value,
writes the final result to a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh

SHL <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = UInt(immh:immb) - esize;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh

SHL <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = UInt(immh:immb) - esize;

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the left shift amount, in the range 0 to 63, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (UInt(immh:immb)-64)

For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)
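
The shift tables above amount to storing esize + shift in the 7-bit field immh:immb, with the position of the highest set bit of immh selecting the element size. A hedged Python sketch of the vector variant follows; the function names are hypothetical, and the additional restriction that 64-bit elements require Q == 1 is not modelled.

```python
def decode_shl_vector(immh, immb):
    """Return (esize, shift) for the vector variant from the immh and immb fields."""
    if not 0 < immh <= 0xF or not 0 <= immb <= 0x7:
        # immh == 0 belongs to the Advanced SIMD modified immediate encodings.
        raise ValueError("immh must be 1..15 and immb must be 0..7")
    esize = 8 << (immh.bit_length() - 1)   # 8 << HighestSetBit(immh)
    shift = ((immh << 3) | immb) - esize   # UInt(immh:immb) - esize
    return esize, shift


def encode_shl_vector(esize, shift):
    """Return (immh, immb) for a given element size and left shift amount."""
    if esize not in (8, 16, 32, 64) or not 0 <= shift < esize:
        raise ValueError("shift must be in the range 0 to esize-1")
    field = esize + shift
    return field >> 3, field & 0b111
```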

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;

for e = 0 to elements-1
    Elem[result, e, esize] = LSL(Elem[operand, e, esize], shift);

V[d] = result;

Operational information

SHLL, SHLL2

Shift Left Long (by element size). This instruction reads each vector element in the lower or upper half of the source
SIMD&FP register, left shifts each result by the element size, writes the final result to a vector, and writes the vector
to the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
The SHLL instruction extracts vector elements from the lower half of the source register. The SHLL2 instruction
extracts vector elements from the upper half of the source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 1 0 0 1 1 1 0 Rn Rd

SHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = esize;

boolean unsigned = FALSE; // Or TRUE without change of functionality

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<shift> Is the left shift amount, which must be equal to the source element width in bits, encoded in “size”:

size <shift>
00 8
01 16
10 32
11 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = Vpart[n, part];
bits(2*datasize) result;
integer element;

for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], unsigned) << shift;
    Elem[result, e, 2*esize] = element<2*esize-1:0>;

V[d] = result;

Operational information

SHRN, SHRN2

Shift Right Narrow (immediate). This instruction reads each unsigned integer value from the source SIMD&FP
register, right shifts each result by an immediate value, puts the final result into a vector, and writes the vector to the
lower or upper half of the destination SIMD&FP register. The destination vector elements are half as long as the
source vector elements. The results are truncated. For rounded results, see RSHRN.
The SHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
SHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 0 0 0 1 Rn Rd
immh op

SHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<shift> Is the right shift amount, in the range 1 to the destination element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;

for e = 0 to elements-1
    element = (UInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
    Elem[result, e, esize] = element<esize-1:0>;

Vpart[d, part] = result;
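
The narrowing loop above can be sketched per element in Python. The function name `shrn_elements` and the use of plain lists are assumptions; `esize` is the destination element width, so each source element is 2*esize bits wide.

```python
def shrn_elements(src, esize, shift, rounding=False):
    """Shift each 2*esize-bit source element right and keep the low esize bits."""
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    round_const = (1 << (shift - 1)) if rounding else 0
    mask = (1 << esize) - 1
    return [((x + round_const) >> shift) & mask for x in src]
```

With `rounding=False` this models the truncating behaviour described for SHRN; `rounding=True` corresponds to the `round` flag in the decode pseudocode.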

Operational information

SHSUB

Signed Halving Subtract. This instruction subtracts the elements in the vector in the second source SIMD&FP register
from the corresponding elements in the vector in the first source SIMD&FP register, shifts each result right one bit,
places each result into elements of a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 0 0 1 Rn Rd
U

SHSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer diff;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    diff = element1 - element2;
    Elem[result, e, esize] = diff<esize:1>;

V[d] = result;

Operational information

If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.

SLI

Shift Left and Insert (immediate). This instruction reads each vector element in the source SIMD&FP register, left
shifts each vector element by an immediate value, and inserts the result into the corresponding vector element in the
destination SIMD&FP register such that the new zero bits created by the shift are not inserted but retain their existing
value. Bits shifted out of the left of each vector element in the source register are lost.
The following figure shows an example of the operation of shift left by 3 for an 8-bit vector element.
[Figure: bit layouts (bits 63:56 and 55:0) of Vn.B[7], of Vd.B[7] before the operation, and of Vd.B[7] after the operation.]

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh

SLI <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = UInt(immh:immb) - esize;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 1 0 1 0 1 Rn Rd
immh

SLI <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = UInt(immh:immb) - esize;

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the left shift amount, in the range 0 to 63, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (UInt(immh:immb)-64)

For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2 = V[d];
bits(datasize) result;
bits(esize) mask = LSL(Ones(esize), shift);
bits(esize) shifted;

for e = 0 to elements-1
    shifted = LSL(Elem[operand, e, esize], shift);
    Elem[result, e, esize] = (Elem[operand2, e, esize] AND NOT(mask)) OR shifted;
V[d] = result;
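
The mask-and-merge step above can be illustrated for a single element in Python. The function name `sli` is hypothetical, and the example values are chosen only to show that the low `shift` bits of the destination survive.

```python
def sli(dst, src, shift, esize=8):
    """Shift src left by `shift` and insert it into dst, preserving dst's low `shift` bits."""
    ones = (1 << esize) - 1
    mask = (ones << shift) & ones          # LSL(Ones(esize), shift)
    shifted = (src << shift) & ones        # bits shifted out of the top are lost
    return (dst & ~mask & ones) | shifted
```

For a shift of 3, `sli(0b10101111, 0b00011011, 3)` yields `0b11011111`: the upper five bits come from the shifted source and the lower three bits are the destination's original value.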

Operational information

If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.

SM3PARTW1

SM3PARTW1 takes three 128-bit vectors from the three source SIMD&FP registers and returns a 128-bit result in the
destination SIMD&FP register. The result is obtained by a three-way exclusive OR of the elements within the input
vectors with some fixed rotations, see the Operation pseudocode for more information.
This instruction is implemented only when FEAT_SM3 is implemented.

Advanced SIMD
(FEAT_SM3)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 1 0 0 0 0 Rn Rd

SM3PARTW1 <Vd>.4S, <Vn>.4S, <Vm>.4S

if !HaveSM3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(128) result;

result<95:0> = (Vd EOR Vn)<95:0> EOR (ROL(Vm<127:96>, 15):ROL(Vm<95:64>, 15):ROL(Vm<63:32>, 15));

for i = 0 to 3
    if i == 3 then
        result<127:96> = (Vd EOR Vn)<127:96> EOR (ROL(result<31:0>, 15));
    result<(32*i)+31:(32*i)> = result<(32*i)+31:(32*i)> EOR ROL(result<(32*i)+31:(32*i)>, 15) EOR ROL(res
V[d] = result;

Operational information

SM3PARTW2

SM3PARTW2 takes three 128-bit vectors from three source SIMD&FP registers and returns a 128-bit result in the
destination SIMD&FP register. The result is obtained by a three-way exclusive OR of the elements within the input
vectors with some fixed rotations, see the Operation pseudocode for more information.
This instruction is implemented only when FEAT_SM3 is implemented.

Advanced SIMD
(FEAT_SM3)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 1 0 0 0 1 Rn Rd

SM3PARTW2 <Vd>.4S, <Vn>.4S, <Vm>.4S

if !HaveSM3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(128) result;
bits(128) tmp;
bits(32) tmp2;
tmp<127:0> = Vn EOR (ROL(Vm<127:96>, 7):ROL(Vm<95:64>, 7):ROL(Vm<63:32>, 7):ROL(Vm<31:0>, 7));
result<127:0> = Vd<127:0> EOR tmp<127:0>;
tmp2 = ROL(tmp<31:0>, 15);
tmp2 = tmp2 EOR ROL(tmp2, 15) EOR ROL(tmp2, 23);
result<127:96> = result<127:96> EOR tmp2;
V[d] = result;
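
A minimal Python sketch of the sequence above, assuming each register is a list of four 32-bit words with index 0 holding bits <31:0>. The names `rol32` and `sm3partw2` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sm3partw2(vd, vn, vm):
    """Model of SM3PARTW2 on four-word lists."""
    tmp = [n ^ rol32(m, 7) for n, m in zip(vn, vm)]
    result = [d ^ t for d, t in zip(vd, tmp)]
    tmp2 = rol32(tmp[0], 15)
    tmp2 = tmp2 ^ rol32(tmp2, 15) ^ rol32(tmp2, 23)
    result[3] ^= tmp2                      # only the top word receives the extra term
    return result
```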

Operational information

SM3SS1

SM3SS1 rotates the top 32 bits of the 128-bit vector in the first source SIMD&FP register by 12, and adds that 32-bit
value to the two other 32-bit values held in the top 32 bits of each of the 128-bit vectors in the second and third source
SIMD&FP registers, rotating this result left by 7 and writing the final result into the top 32 bits of the vector in the
destination SIMD&FP register, with the bottom 96 bits of the vector being written to 0.
This instruction is implemented only when FEAT_SM3 is implemented.

Advanced SIMD
(FEAT_SM3)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 0 Ra Rn Rd

SM3SS1 <Vd>.4S, <Vn>.4S, <Vm>.4S, <Va>.4S

if !HaveSM3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer a = UInt(Ra);

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Va = V[a];
bits(128) result;
result<127:96> = ROL((ROL(Vn<127:96>, 12) + Vm<127:96> + Va<127:96>), 7);
result<95:0> = Zeros();
V[d] = result;
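
Only the top 32-bit word of each source takes part in the operation above. A short Python sketch, assuming registers are lists of four 32-bit words with index 3 holding bits <127:96>; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol32(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def sm3ss1(vn, vm, va):
    """Model of SM3SS1: rotate, add modulo 2**32, rotate again, zero the low 96 bits."""
    total = (rol32(vn[3], 12) + vm[3] + va[3]) & MASK32
    return [0, 0, 0, rol32(total, 7)]
```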

Operational information

SM3TT1A

SM3TT1A takes three 128-bit vectors from three source SIMD&FP registers and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a three-way exclusive OR of the three 32-bit
fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the following
three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the three-way exclusive OR.
• The result of the exclusive OR of the top 32-bit element of the second source vector, Vn, with a rotation left by
12 of the top 32-bit element of the first source vector.
• A 32-bit element indexed out of the third source vector, Vm.
The result of this addition is returned as the top element of the result. The other elements of the result are taken from
elements of the first source vector, with the element returned in bits<63:32> being rotated left by 9.
This instruction is implemented only when FEAT_SM3 is implemented.

Advanced SIMD
(FEAT_SM3)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 0 0 Rn Rd

SM3TT1A <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]

if !HaveSM3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer i = UInt(imm2);

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(32) WjPrime;
bits(128) result;
bits(32) TT1;
bits(32) SS2;

WjPrime = Elem[Vm, i, 32];

SS2 = Vn<127:96> EOR ROL(Vd<127:96>, 12);
TT1 = Vd<63:32> EOR (Vd<127:96> EOR Vd<95:64>);
TT1 = (TT1+Vd<31:0>+SS2+WjPrime)<31:0>;
result<31:0> = Vd<63:32>;
result<63:32> = ROL(Vd<95:64>, 9);
result<95:64> = Vd<127:96>;
result<127:96> = TT1;
V[d] = result;

Operational information

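If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.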


SM3TT1B

SM3TT1B takes three 128-bit vectors from three source SIMD&FP registers and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a 32-bit majority function between the three
32-bit fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the
following three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the 32-bit majority function.
• The result of the exclusive OR of the top 32-bit element of the second source vector, Vn, with a rotation left by
12 of the top 32-bit element of the first source vector.
• A 32-bit element indexed out of the third source vector, Vm.
The result of this addition is returned as the top element of the result. The other elements of the result are taken from
elements of the first source vector, with the element returned in bits<63:32> being rotated left by 9.
This instruction is implemented only when FEAT_SM3 is implemented.

Advanced SIMD
(FEAT_SM3)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 0 1 Rn Rd

SM3TT1B <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]

if !HaveSM3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer i = UInt(imm2);

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(32) WjPrime;
bits(128) result;
bits(32) TT1;
bits(32) SS2;

WjPrime = Elem[Vm, i, 32];

SS2 = Vn<127:96> EOR ROL(Vd<127:96>, 12);
TT1 = (Vd<127:96> AND Vd<63:32>) OR (Vd<127:96> AND Vd<95:64>) OR (Vd<63:32> AND Vd<95:64>);
TT1 = (TT1+Vd<31:0>+SS2+WjPrime)<31:0>;
result<31:0> = Vd<63:32>;
result<63:32> = ROL(Vd<95:64>, 9);
result<95:64> = Vd<127:96>;
result<127:96> = TT1;
V[d] = result;
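
SM3TT1A and SM3TT1B differ only in the Boolean function applied to the upper three elements of Vd. The following Python sketch models both under that reading of the pseudocode. It assumes registers are 128-bit Python integers with element 0 in bits <31:0>; the helper names (rol32, elems, pack, sm3tt1) and the variant argument are hypothetical and exist only for this example.

```python
MASK32 = 0xFFFFFFFF

def rol32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def elems(v):
    """Split a 128-bit int into four 32-bit elements, element 0 = bits <31:0>."""
    return [(v >> (32 * i)) & MASK32 for i in range(4)]

def pack(e):
    """Inverse of elems(): build a 128-bit int from four 32-bit elements."""
    return sum((x & MASK32) << (32 * i) for i, x in enumerate(e))

def sm3tt1(vd, vn, vm, imm2, variant):
    """Model of SM3TT1A (variant 'A') or SM3TT1B (variant 'B')."""
    d0, d1, d2, d3 = elems(vd)
    wj_prime = elems(vm)[imm2]                  # Elem[Vm, i, 32]
    ss2 = elems(vn)[3] ^ rol32(d3, 12)
    if variant == "A":
        ff = d1 ^ d3 ^ d2                       # three-way exclusive OR
    else:
        ff = (d3 & d1) | (d3 & d2) | (d1 & d2)  # bitwise majority
    tt1 = (ff + d0 + ss2 + wj_prime) & MASK32
    return pack([d1, rol32(d2, 9), d3, tt1])
```

For example, with Vd elements [1, 2, 4, 8], a top Vn element of 0x10, and a selected Vm element of 6, variant A yields the elements [2, 0x800, 8, 0x8025].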

Operational information

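If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.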


SM3TT2A

SM3TT2A takes three 128-bit vectors from three source SIMD&FP registers and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a three-way exclusive OR of the three 32-bit
fields held in the upper three elements of the first source vector, and adds the resulting 32-bit value and the following
three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, that was used for the three-way exclusive OR.
• The 32-bit element held in the top 32 bits of the second source vector, Vn.
• A 32-bit element indexed out of the third source vector, Vm.
A three-way exclusive OR is performed of the result of this addition, the result of the addition rotated left by 9, and the
result of the addition rotated left by 17. The result of this exclusive OR is returned as the top element of the returned
result. The other elements of this result are taken from elements of the first source vector, with the element returned
in bits<63:32> being rotated left by 19.
This instruction is implemented only when FEAT_SM3 is implemented.

Advanced SIMD
(FEAT_SM3)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 1 0 Rn Rd

SM3TT2A <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]

if !HaveSM3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer i = UInt(imm2);

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(32) Wj;
bits(128) result;
bits(32) TT2;

Wj = Elem[Vm, i, 32];
TT2 = Vd<63:32> EOR (Vd<127:96> EOR Vd<95:64>);
TT2 = (TT2+Vd<31:0>+Vn<127:96>+Wj)<31:0>;

result<31:0> = Vd<63:32>;
result<63:32> = ROL(Vd<95:64>, 19);
result<95:64> = Vd<127:96>;
result<127:96> = TT2 EOR ROL(TT2, 9) EOR ROL(TT2, 17);
V[d] = result;

Operational information

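If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.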


SM3TT2B

SM3TT2B takes three 128-bit vectors from three source SIMD&FP registers and a 2-bit immediate index value, and
returns a 128-bit result in the destination SIMD&FP register. It performs a 32-bit bitwise select function on the three
32-bit fields held in the upper three elements of the first source vector, in which each bit of the top element selects
the corresponding bit of bits<95:64> when it is 1 and of bits<63:32> when it is 0, and adds the resulting 32-bit value
and the following three other 32-bit values:
• The bottom 32-bit element of the first source vector, Vd, the vector that supplied the inputs to the select function.
• The 32-bit element held in the top 32 bits of the second source vector, Vn.
• A 32-bit element indexed out of the third source vector, Vm.
A three-way exclusive OR is performed of the result of this addition, the result of the addition rotated left by 9, and the
result of the addition rotated left by 17. The result of this exclusive OR is returned as the top element of the returned
result. The other elements of this result are taken from elements of the first source vector, with the element returned
in bits<63:32> being rotated left by 19.
This instruction is implemented only when FEAT_SM3 is implemented.

Advanced SIMD
(FEAT_SM3)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 0 Rm 1 0 imm2 1 1 Rn Rd

SM3TT2B <Vd>.4S, <Vn>.4S, <Vm>.S[<imm2>]

if !HaveSM3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer i = UInt(imm2);

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) Vd = V[d];
bits(32) Wj;
bits(128) result;
bits(32) TT2;

Wj = Elem[Vm, i, 32];
TT2 = (Vd<127:96> AND Vd<95:64>) OR (NOT(Vd<127:96>) AND Vd<63:32>);
TT2 = (TT2+Vd<31:0>+Vn<127:96>+Wj)<31:0>;

result<31:0> = Vd<63:32>;
result<63:32> = ROL(Vd<95:64>, 19);
result<95:64> = Vd<127:96>;
result<127:96> = TT2 EOR ROL(TT2, 9) EOR ROL(TT2, 17);
V[d] = result;
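
SM3TT2A and SM3TT2B share everything except the Boolean function applied to the upper three elements of Vd. The Python sketch below models both from the pseudocode above. It assumes 128-bit Python integers with element 0 in bits <31:0>; the helper names (rol32, elems, pack, sm3tt2) and the variant argument are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rol32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def elems(v):
    """Split a 128-bit int into four 32-bit elements, element 0 = bits <31:0>."""
    return [(v >> (32 * i)) & MASK32 for i in range(4)]

def pack(e):
    """Inverse of elems(): build a 128-bit int from four 32-bit elements."""
    return sum((x & MASK32) << (32 * i) for i, x in enumerate(e))

def sm3tt2(vd, vn, vm, imm2, variant):
    """Model of SM3TT2A (variant 'A') or SM3TT2B (variant 'B')."""
    d0, d1, d2, d3 = elems(vd)
    wj = elems(vm)[imm2]                        # Elem[Vm, i, 32]
    if variant == "A":
        gg = d1 ^ d3 ^ d2                       # three-way exclusive OR
    else:
        gg = (d3 & d2) | (~d3 & d1 & MASK32)    # d3 selects between d2 and d1
    tt2 = (gg + d0 + elems(vn)[3] + wj) & MASK32
    top = tt2 ^ rol32(tt2, 9) ^ rol32(tt2, 17)
    return pack([d1, rol32(d2, 19), d3, top])
```

Note that the B variant, as written in the pseudocode, is a bitwise select rather than a majority: each bit of Vd<127:96> chooses between the corresponding bits of Vd<95:64> and Vd<63:32>.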

Operational information

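If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.
• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.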


SM4E

SM4 Encode takes input data as a 128-bit vector from the first source SIMD&FP register, and four iterations of the
round key held as the elements of the 128-bit vector in the second source SIMD&FP register. It encrypts the data by
four rounds, in accordance with the SM4 standard, returning the 128-bit result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SM4 is implemented.

Advanced SIMD
(FEAT_SM4)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 1 0 0 0 0 1 Rn Rd

SM4E <Vd>.4S, <Vn>.4S

if !HaveSM4Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

Assembler Symbols

<Vd> Is the name of the SIMD&FP source and destination register, encoded in the "Rd" field.
<Vn> Is the name of the second SIMD&FP source register, encoded in the "Rn" field.

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vn = V[n];
bits(32) intval;
bits(128) roundresult;
bits(32) roundkey;

roundresult = V[d];
for index = 0 to 3
    roundkey = Elem[Vn, index, 32];

    intval = roundresult<127:96> EOR roundresult<95:64> EOR roundresult<63:32> EOR roundkey;

    for i = 0 to 3
        Elem[intval, i, 8] = Sbox(Elem[intval, i, 8]);

    intval = intval EOR ROL(intval, 2) EOR ROL(intval, 10) EOR ROL(intval, 18) EOR ROL(intval, 24);
    intval = intval EOR roundresult<31:0>;

    roundresult<31:0> = roundresult<63:32>;
    roundresult<63:32> = roundresult<95:64>;
    roundresult<95:64> = roundresult<127:96>;
    roundresult<127:96> = intval;

V[d] = roundresult;
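
The four-round structure of SM4E can be sketched in Python as follows. This is a hedged illustration, not a usable SM4 implementation: the SM4 S-box table is not reproduced here, so the sketch takes it as a caller-supplied 256-entry list named sbox, which is an assumption of this example. Registers are modelled as 128-bit Python integers, and rol32 and sm4e are hypothetical helper names.

```python
MASK32 = 0xFFFFFFFF

def rol32(x, n):
    """Rotate a 32-bit value left by n bits (0 < n < 32)."""
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32

def sm4e(vd, vn, sbox):
    """Model of SM4E. sbox is a caller-supplied 256-entry byte substitution table."""
    r = [(vd >> (32 * i)) & MASK32 for i in range(4)]     # r[0] = bits <31:0>
    keys = [(vn >> (32 * i)) & MASK32 for i in range(4)]  # one round key per element
    for rk in keys:
        t = r[3] ^ r[2] ^ r[1] ^ rk
        # Substitute each of the four bytes of t independently.
        t = sum(sbox[(t >> (8 * i)) & 0xFF] << (8 * i) for i in range(4))
        # Linear diffusion step used by the encryption rounds.
        t = t ^ rol32(t, 2) ^ rol32(t, 10) ^ rol32(t, 18) ^ rol32(t, 24)
        t ^= r[0]
        r = [r[1], r[2], r[3], t]                         # shift the state down one element
    return sum(x << (32 * i) for i, x in enumerate(r))
```

Passing the S-box as a parameter keeps the round structure visible and lets the data flow be checked with trivial tables, for example an all-zero table, under which four rounds return the state unchanged.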

Operational information


SM4EKEY

SM4 Key takes an input as a 128-bit vector from the first source SIMD&FP register and a 128-bit constant from the
second SIMD&FP register. It derives four iterations of the output key, in accordance with the SM4 standard, returning
the 128-bit result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SM4 is implemented.

Advanced SIMD
(FEAT_SM4)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 1 1 1 0 0 1 1 Rm 1 1 0 0 1 0 Rn Rd

SM4EKEY <Vd>.4S, <Vn>.4S, <Vm>.4S

if !HaveSM4Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

Assembler Symbols

Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(32) intval;
bits(128) result;
bits(32) const;
bits(128) roundresult;

roundresult = V[n];
for index = 0 to 3
    const = Elem[Vm, index, 32];

    intval = roundresult<127:96> EOR roundresult<95:64> EOR roundresult<63:32> EOR const;

    for i = 0 to 3
        Elem[intval, i, 8] = Sbox(Elem[intval, i, 8]);

    intval = intval EOR ROL(intval, 13) EOR ROL(intval, 23);

    intval = intval EOR roundresult<31:0>;

    roundresult<31:0> = roundresult<63:32>;
    roundresult<63:32> = roundresult<95:64>;
    roundresult<95:64> = roundresult<127:96>;
    roundresult<127:96> = intval;

V[d] = roundresult;

Operational information

If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.

• The response of this instruction to asynchronous exceptions does not vary based on:
◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.


SMAX

Signed Maximum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the larger of each pair of signed integer values into a vector, and writes the vector to the
destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 0 0 1 Rn Rd
U o1

SMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer maxmin;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;

V[d] = result;
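
The element-wise comparison can be illustrated with a short Python model. It is a sketch under stated assumptions: register images are plain Python integers with lane 0 in the lowest bits, and to_lanes, from_lanes, and smax are hypothetical helper names. The minimum flag mirrors the pseudocode variable of the same name, so the same function also models SMIN.

```python
def to_lanes(value, esize, count):
    """Split an integer register image into signed lanes, lane 0 lowest."""
    mask = (1 << esize) - 1
    sign = 1 << (esize - 1)
    out = []
    for e in range(count):
        raw = (value >> (e * esize)) & mask
        out.append(raw - (1 << esize) if raw & sign else raw)
    return out

def from_lanes(lanes, esize):
    """Pack signed lanes back into an integer register image."""
    mask = (1 << esize) - 1
    return sum((x & mask) << (e * esize) for e, x in enumerate(lanes))

def smax(vn, vm, esize, datasize=128, minimum=False):
    """Model of SMAX (or SMIN when minimum is True)."""
    count = datasize // esize
    pick = min if minimum else max
    a = to_lanes(vn, esize, count)
    b = to_lanes(vm, esize, count)
    return from_lanes([pick(x, y) for x, y in zip(a, b)], esize)
```

Because the lanes are interpreted as signed, an 8-bit lane holding 0xFF compares as -1, not 255.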

Operational information


SMAXP

Signed Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements in the two source SIMD&FP registers, writes the larger of each pair of signed integer values into a
vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 0 0 1 Rn Rd
U o1

SMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
integer element1;
integer element2;
integer maxmin;

for e = 0 to elements-1
    element1 = Int(Elem[concat, 2*e, esize], unsigned);
    element2 = Int(Elem[concat, (2*e)+1, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;

V[d] = result;
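
The pairwise form can be modelled in Python as shown below. This is an illustrative sketch: register images are assumed to be Python integers with lane 0 in the lowest bits, and to_lanes, from_lanes, and smaxp are hypothetical helper names. As in the pseudocode, the lanes of the first source occupy the low half of the concatenation.

```python
def to_lanes(value, esize, count):
    """Split an integer register image into signed lanes, lane 0 lowest."""
    mask = (1 << esize) - 1
    sign = 1 << (esize - 1)
    out = []
    for e in range(count):
        raw = (value >> (e * esize)) & mask
        out.append(raw - (1 << esize) if raw & sign else raw)
    return out

def from_lanes(lanes, esize):
    """Pack signed lanes back into an integer register image."""
    mask = (1 << esize) - 1
    return sum((x & mask) << (e * esize) for e, x in enumerate(lanes))

def smaxp(vn, vm, esize, datasize=128, minimum=False):
    """Model of SMAXP (or SMINP when minimum is True)."""
    count = datasize // esize
    # concat = operand2:operand1, so Vn supplies the low lanes.
    concat = to_lanes(vn, esize, count) + to_lanes(vm, esize, count)
    pick = min if minimum else max
    return from_lanes([pick(concat[2 * e], concat[2 * e + 1]) for e in range(count)], esize)
```

The lower half of the result therefore comes from adjacent pairs in Vn and the upper half from adjacent pairs in Vm.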

Operational information


SMAXV

Signed Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are signed integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 0 1 0 1 0 1 0 Rn Rd
U op

SMAXV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '100' then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

boolean min = (op == '1');

Assembler Symbols

<V> Is the destination width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer maxmin;
integer element;

maxmin = Int(Elem[operand, 0, esize], unsigned);

for e = 1 to elements-1
    element = Int(Elem[operand, e, esize], unsigned);
    maxmin = if min then Min(maxmin, element) else Max(maxmin, element);

V[d] = maxmin<esize-1:0>;
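
The across-vector reduction can be sketched in Python as follows. The model assumes the source register image is a Python integer with lane 0 in the lowest bits; to_lanes, from_lanes, and smaxv are hypothetical names. The minimum flag plays the role of the pseudocode variable min, so the same function also models SMINV.

```python
def to_lanes(value, esize, count):
    """Split an integer register image into signed lanes, lane 0 lowest."""
    mask = (1 << esize) - 1
    sign = 1 << (esize - 1)
    out = []
    for e in range(count):
        raw = (value >> (e * esize)) & mask
        out.append(raw - (1 << esize) if raw & sign else raw)
    return out

def from_lanes(lanes, esize):
    """Pack signed lanes back into an integer register image."""
    mask = (1 << esize) - 1
    return sum((x & mask) << (e * esize) for e, x in enumerate(lanes))

def smaxv(vn, esize, datasize=128, minimum=False):
    """Model of SMAXV (or SMINV when minimum is True); returns the esize-bit scalar."""
    lanes = to_lanes(vn, esize, datasize // esize)
    best = min(lanes) if minimum else max(lanes)
    return best & ((1 << esize) - 1)   # maxmin<esize-1:0>
```

The returned value is the raw esize-bit pattern, so a negative minimum such as -5 in a 32-bit lane comes back as 0xFFFFFFFB.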

Operational information


SMIN

Signed Minimum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the smaller of each pair of signed integer values into a vector, and writes the vector to
the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 1 0 1 1 Rn Rd
U o1

SMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
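integer maxmin;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;

V[d] = result;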

Operational information


SMINP

Signed Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements in the two source SIMD&FP registers, writes the smaller of each pair of signed integer values into a
vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 0 1 1 Rn Rd
U o1

SMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

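CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(2*datasize) concat = operand2:operand1;
integer element1;
integer element2;
integer maxmin;

for e = 0 to elements-1
    element1 = Int(Elem[concat, 2*e, esize], unsigned);
    element2 = Int(Elem[concat, (2*e)+1, esize], unsigned);
    maxmin = if minimum then Min(element1, element2) else Max(element1, element2);
    Elem[result, e, esize] = maxmin<esize-1:0>;

V[d] = result;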

Operational information


SMINV

Signed Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are signed integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 1 0 0 0 1 1 0 1 0 1 0 Rn Rd
U op

SMINV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '100' then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

boolean min = (op == '1');

Assembler Symbols

<V> Is the destination width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer maxmin;
integer element;

maxmin = Int(Elem[operand, 0, esize], unsigned);

for e = 1 to elements-1
    element = Int(Elem[operand, e, esize], unsigned);
    maxmin = if min then Min(maxmin, element) else Max(maxmin, element);

V[d] = maxmin<esize-1:0>;

Operational information


SMLAL, SMLAL2 (by element)

Signed Multiply-Add Long (vector, by element). This instruction multiplies each vector element in the lower or upper
half of the first source SIMD&FP register by the specified vector element in the second source SIMD&FP register, and
accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector
elements are twice as long as the elements that are multiplied. All the values in this instruction are signed integer
values.
The SMLAL instruction extracts vector elements from the lower half of the first source register. The SMLAL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 0 1 0 H 0 Rn Rd
U o2

SMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

boolean sub_op = (o2 == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;

element2 = Int(Elem[operand2, index, esize], unsigned);

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    product = (element1*element2)<2*esize-1:0>;
    if sub_op then
        Elem[result, e, 2*esize] = Elem[operand3, e, 2*esize] - product;
    else
        Elem[result, e, 2*esize] = Elem[operand3, e, 2*esize] + product;

V[d] = result;
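
The by-element multiply-accumulate can be sketched in Python as follows. This is an illustrative model that assumes register images are Python integers with lane 0 in the lowest bits. The names to_lanes, from_lanes, and smlal_by_element are hypothetical; the part and sub_op arguments mirror the pseudocode variables, so part=1 models SMLAL2 and sub_op=True models the subtracting form.

```python
def to_lanes(value, esize, count):
    """Split an integer register image into signed lanes, lane 0 lowest."""
    mask = (1 << esize) - 1
    sign = 1 << (esize - 1)
    out = []
    for e in range(count):
        raw = (value >> (e * esize)) & mask
        out.append(raw - (1 << esize) if raw & sign else raw)
    return out

def from_lanes(lanes, esize):
    """Pack signed lanes back into an integer register image (wraps to esize bits)."""
    mask = (1 << esize) - 1
    return sum((x & mask) << (e * esize) for e, x in enumerate(lanes))

def smlal_by_element(vd, vn, vm, esize, index, part=0, sub_op=False):
    """Model of SMLAL/SMLAL2 (by element); sub_op=True gives the subtracting form."""
    count = 64 // esize
    a = to_lanes(vn >> (64 * part), esize, count)     # Vpart[n, part]
    scalar = to_lanes(vm, esize, 128 // esize)[index]  # the single selected element
    acc = to_lanes(vd, 2 * esize, count)               # double-width accumulators
    out = [c - x * scalar if sub_op else c + x * scalar for c, x in zip(acc, a)]
    return from_lanes(out, 2 * esize)
```

Every narrow lane of the selected half of Vn is multiplied by the same indexed element of Vm, and each product is accumulated into a destination lane twice as wide.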

Operational information

SMLAL, SMLAL2 (vector)

Signed Multiply-Add Long (vector). This instruction multiplies corresponding signed integer values in the lower or
upper half of the vectors of the two source SIMD&FP registers, and accumulates the results with the vector elements
of the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are
multiplied.
The SMLAL instruction extracts each source vector from the lower half of each source register. The SMLAL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 0 0 0 Rn Rd
U o1

SMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;
boolean sub_op = (o1 == '1');
boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
bits(2*esize) accum;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    product = (element1*element2)<2*esize-1:0>;
    if sub_op then
        accum = Elem[operand3, e, 2*esize] - product;
    else
        accum = Elem[operand3, e, 2*esize] + product;
    Elem[result, e, 2*esize] = accum;

V[d] = result;
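
The vector form of the widening multiply-accumulate can be sketched in Python as follows. The model assumes register images are Python integers with lane 0 in the lowest bits; to_lanes, from_lanes, and smlal are hypothetical names. The part and sub_op arguments mirror the pseudocode variables.

```python
def to_lanes(value, esize, count):
    """Split an integer register image into signed lanes, lane 0 lowest."""
    mask = (1 << esize) - 1
    sign = 1 << (esize - 1)
    out = []
    for e in range(count):
        raw = (value >> (e * esize)) & mask
        out.append(raw - (1 << esize) if raw & sign else raw)
    return out

def from_lanes(lanes, esize):
    """Pack signed lanes back into an integer register image (wraps to esize bits)."""
    mask = (1 << esize) - 1
    return sum((x & mask) << (e * esize) for e, x in enumerate(lanes))

def smlal(vd, vn, vm, esize, part=0, sub_op=False):
    """Model of SMLAL/SMLAL2 (vector); sub_op=True gives the subtracting form."""
    count = 64 // esize
    a = to_lanes(vn >> (64 * part), esize, count)   # Vpart[n, part]
    b = to_lanes(vm >> (64 * part), esize, count)   # Vpart[m, part]
    acc = to_lanes(vd, 2 * esize, count)            # double-width accumulators
    out = [c - x * y if sub_op else c + x * y for c, x, y in zip(acc, a, b)]
    return from_lanes(out, 2 * esize)
```

Unlike the by-element form, both sources are read from the same half (lower for part=0, upper for part=1) and multiplied lane by lane.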

Operational information


SMLSL, SMLSL2 (by element)

Signed Multiply-Subtract Long (vector, by element). This instruction multiplies each vector element in the lower or
upper half of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register and subtracts the results from the vector elements of the destination SIMD&FP register. The destination
vector elements are twice as long as the elements that are multiplied.
The SMLSL instruction extracts vector elements from the lower half of the first source register. The SMLSL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 1 1 0 H 0 Rn Rd
U o2

SMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

boolean sub_op = (o2 == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

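CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) operand3 = V[d];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;

element2 = Int(Elem[operand2, index, esize], unsigned);

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    product = (element1*element2)<2*esize-1:0>;
    if sub_op then
        Elem[result, e, 2*esize] = Elem[operand3, e, 2*esize] - product;
    else
        Elem[result, e, 2*esize] = Elem[operand3, e, 2*esize] + product;

V[d] = result;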

Operational information

SMLSL, SMLSL2 (vector)

Signed Multiply-Subtract Long (vector). This instruction multiplies corresponding signed integer values in the lower or
upper half of the vectors of the two source SIMD&FP registers, and subtracts the results from the vector elements of
the destination SIMD&FP register. The destination vector elements are twice as long as the elements that are
multiplied.
The SMLSL instruction extracts each source vector from the lower half of each source register. The SMLSL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 0 0 0 Rn Rd
(U is bit 29; o1 is bit 13)

SMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;
boolean sub_op = (o1 == '1');
boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

V[d] = result;
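
The Operation pseudocode is not reproduced in full here, so the following Python sketch is based only on the description and decode above. It is a minimal model, not the architectural definition: registers are passed as lists of raw element values, and the names `to_signed` and `smlsl` are hypothetical. It assumes the subtraction wraps modulo 2^(2*esize), because this instruction is not one of the saturating forms.

```python
def to_signed(raw, bits):
    """Interpret the low `bits` bits of raw as a two's-complement integer."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw >> (bits - 1) else raw


def smlsl(vd, vn, vm, esize, part=0):
    """Sketch of SMLSL (part=0) and SMLSL2 (part=1).

    vn, vm: the 128 // esize raw elements of each source register.
    vd:     the 64 // esize accumulator elements, each 2 * esize bits wide.
    Returns the new accumulator elements as raw bit patterns.
    """
    elements = 64 // esize
    first = part * elements              # the upper half starts at `elements`
    wide_mask = (1 << (2 * esize)) - 1
    result = []
    for e in range(elements):
        product = to_signed(vn[first + e], esize) * to_signed(vm[first + e], esize)
        # Assumed to wrap rather than saturate (see the lead-in).
        result.append((vd[e] - product) & wide_mask)
    return result
```

For example, with 16-bit source elements, `smlsl([0]*4, [1,2,3,4,5,6,7,8], [0xFFFF]*8, 16)` subtracts 1*(-1) through 4*(-1) from a zero accumulator, and `part=1` selects elements 4 to 7 instead.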

Operational information

SMMLA (vector)

Signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of signed 8-bit integer
values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source vector. The
resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the
destination vector. This is equivalent to performing an 8-way dot product per destination element.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.

Vector
(FEAT_I8MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 1 0 0 Rm 1 0 1 0 0 1 Rn Rd
(U is bit 29; B is bit 11)

SMMLA <Vd>.4S, <Vn>.16B, <Vm>.16B

if !HaveInt8MatMulExt() then UNDEFINED;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) addend = V[d];

V[d] = MatMulAdd(addend, operand1, operand2, FALSE, FALSE);
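
MatMulAdd() is a shared pseudocode function that is not reproduced in this section. The Python sketch below shows one reading of the description: each source register holds two rows of eight signed bytes, and destination element 2*i + j accumulates the dot product of row i of the first source with row j of the second. The function name `smmla`, the list-based register representation, and the 32-bit wrap-around are assumptions made for illustration.

```python
def smmla(acc, vn, vm):
    """Sketch of SMMLA.

    acc:    four raw 32-bit accumulator elements (a 2x2 matrix, row-major).
    vn, vm: sixteen raw bytes each, read as two rows of eight signed values.
    """
    def s8(byte):
        return byte - 256 if byte & 0x80 else byte

    result = []
    for i in range(2):
        for j in range(2):
            total = acc[2 * i + j]
            for k in range(8):
                total += s8(vn[8 * i + k]) * s8(vm[8 * j + k])
            result.append(total & 0xFFFFFFFF)   # assumed to wrap to 32 bits
    return result
```

Each of the four results is an 8-way dot product added to the existing accumulator, which matches the "destructively added" wording above.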

SMOV

Signed Move vector element to general-purpose register. This instruction reads the signed integer from the source
SIMD&FP register, sign-extends it to form a 32-bit or 64-bit value, and writes the result to the destination
general-purpose register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 1 0 1 1 Rn Rd

32-bit (Q == 0)

SMOV <Wd>, <Vn>.<Ts>[<index>]

64-bit (Q == 1)

SMOV <Xd>, <Vn>.<Ts>[<index>]

integer d = UInt(Rd);
integer n = UInt(Rn);

integer size;
case Q:imm5 of
when 'xxxxx1' size = 0; // SMOV [WX]d, Vn.B
when 'xxxx10' size = 1; // SMOV [WX]d, Vn.H
when '1xx100' size = 2; // SMOV Xd, Vn.S
otherwise UNDEFINED;

integer idxdsize = if imm5<4> == '1' then 128 else 64;

integer index = UInt(imm5<4:size+1>);
integer esize = 8 << size;
integer datasize = if Q == '1' then 64 else 32;

Assembler Symbols

<Ts> For the 32-bit variant: is an element size specifier, encoded in “imm5”:

imm5 <Ts>
xxx00 RESERVED
xxxx1 B
xxx10 H

For the 64-bit variant: is an element size specifier, encoded in “imm5”:

imm5 <Ts>
xx000 RESERVED
xxxx1 B
xxx10 H
xx100 S

<index> For the 32-bit variant: is the element index encoded in “imm5”:

imm5 <index>
xxx00 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>

For the 64-bit variant: is the element index encoded in “imm5”:

imm5 <index>
xx000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>

Operation

CheckFPAdvSIMDEnabled64();
bits(idxdsize) operand = V[n];

X[d] = SignExtend(Elem[operand, index, esize], datasize);
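
The decode and operation above can be traced with a small Python sketch. It is illustrative only: the function name `smov` is hypothetical, the source register is passed as a single 128-bit integer, and an UNDEFINED encoding is modelled as a `ValueError`.

```python
def smov(q, imm5, vn):
    """Sketch of SMOV: q selects Wd (0) or Xd (1); vn is the 128-bit source value."""
    if imm5 & 0b1:                          # 'xxxxx1': byte element
        size = 0
    elif imm5 & 0b11 == 0b10:               # 'xxxx10': halfword element
        size = 1
    elif q == 1 and imm5 & 0b111 == 0b100:  # '1xx100': word element, Xd only
        size = 2
    else:
        raise ValueError("UNDEFINED")

    index = imm5 >> (size + 1)              # UInt(imm5<4:size+1>)
    esize = 8 << size
    datasize = 64 if q else 32

    element = (vn >> (index * esize)) & ((1 << esize) - 1)
    if element >> (esize - 1):              # sign bit set
        element -= 1 << esize
    return element & ((1 << datasize) - 1)  # raw value written to Wd or Xd
```

With byte 3 of the source equal to 0x80, `imm5 = 0b00111` selects B[3]; the 32-bit form returns 0xFFFFFF80 and the 64-bit form returns 0xFFFFFFFFFFFFFF80. A word element with `q = 0` is rejected, matching the RESERVED row in the 32-bit table.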

Operational information

SMULL, SMULL2 (by element)

Signed Multiply Long (vector, by element). This instruction multiplies each vector element in the lower or upper half of
the first source SIMD&FP register by the specified vector element of the second source SIMD&FP register, places the
result in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are
twice as long as the elements that are multiplied.
The SMULL instruction extracts vector elements from the lower half of the first source register. The SMULL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 0 1 0 H 0 Rn Rd
(U is bit 29)

SMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;
otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;

element2 = Int(Elem[operand2, index, esize], unsigned);

for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
product = (element1*element2)<2*esize-1:0>;
Elem[result, e, 2*esize] = product;

V[d] = result;
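
The by-element decode shared by this instruction family can be sketched in Python as follows. The function name `decode_element_operand` and the tuple it returns are hypothetical; the fields are passed as small integers, with Rm taken as the 4-bit field that the decode concatenates with Rmhi.

```python
def decode_element_operand(size, L, H, M, Rm):
    """Return (m, ts, index, idxdsize) for a by-element operand."""
    if size == 0b01:                 # halfword elements
        index = (H << 2) | (L << 1) | M
        rmhi = 0                     # register number limited to V0-V15
        ts = "H"
    elif size == 0b10:               # word elements
        index = (H << 1) | L
        rmhi = M                     # M supplies the top register bit
        ts = "S"
    else:
        raise ValueError("UNDEFINED")
    m = (rmhi << 4) | Rm
    idxdsize = 128 if H else 64      # H set means the index reaches the upper half
    return m, ts, index, idxdsize
```

This shows why <Vm> is restricted to V0-V15 for halfword elements: the M bit is consumed by the 3-bit index, so only four register bits remain.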

Operational information

SMULL, SMULL2 (vector)

Signed Multiply Long (vector). This instruction multiplies corresponding signed integer values in the lower or upper
half of the vectors of the two source SIMD&FP registers, places the results in a vector, and writes the vector to the
destination SIMD&FP register.
The destination vector elements are twice as long as the elements that are multiplied.
The SMULL instruction extracts each source vector from the lower half of each source register. The SMULL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 1 0 0 0 0 Rn Rd
(U is bit 29)

SMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;

for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
Elem[result, e, 2*esize] = (element1*element2)<2*esize-1:0>;

V[d] = result;
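
Because each destination element is twice as wide as the source elements, the truncation `<2*esize-1:0>` in the loop above never discards significant bits of a signed product. The Python sketch below models one lane and checks this exhaustively for 8-bit elements; the names `to_signed`, `smull_lane`, and `widening_is_exact` are hypothetical.

```python
def to_signed(raw, bits):
    """Interpret the low `bits` bits of raw as a two's-complement integer."""
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw >> (bits - 1) else raw


def smull_lane(a_raw, b_raw, esize):
    """One lane of SMULL: signed esize-bit inputs, raw 2*esize-bit product."""
    product = to_signed(a_raw, esize) * to_signed(b_raw, esize)
    return product & ((1 << (2 * esize)) - 1)


def widening_is_exact(esize):
    """Check every input pair: rereading the stored product as signed gives the exact value."""
    for a in range(1 << esize):
        for b in range(1 << esize):
            exact = to_signed(a, esize) * to_signed(b, esize)
            if to_signed(smull_lane(a, b, esize), 2 * esize) != exact:
                return False
    return True
```

The largest magnitude case is (-128) * (-128) = 16384, which is 0x4000 and fits comfortably in a signed 16-bit element.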

Operational information

SQABS

Signed saturating Absolute value. This instruction reads each vector element from the source SIMD&FP register, puts
the absolute value of each element into a vector, and writes the vector to the destination SIMD&FP register. All the values
in this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
(U is bit 29)

SQABS <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;
boolean neg = (U == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
(U is bit 29)

SQABS <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean neg = (U == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean sat;

for e = 0 to elements-1
element = SInt(Elem[operand, e, esize]);
if neg then
element = -element;
else
element = Abs(element);
(Elem[result, e, esize], sat) = SignedSatQ(element, esize);
if sat then FPSR.QC = '1';

V[d] = result;
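
The only input that saturates is the most negative value, whose magnitude does not fit in a signed element. A minimal Python sketch of the `neg == FALSE` path above, with the hypothetical name `sqabs` and elements passed as a list of raw values, is:

```python
def sqabs(elements, esize):
    """Sketch of SQABS: returns (result elements, QC flag)."""
    limit = (1 << (esize - 1)) - 1       # largest signed value
    qc = False
    result = []
    for raw in elements:
        raw &= (1 << esize) - 1
        value = raw - (1 << esize) if raw >> (esize - 1) else raw
        magnitude = abs(value)
        if magnitude > limit:            # only for value == -(2^(esize-1))
            magnitude = limit
            qc = True
        result.append(magnitude)
    return result, qc
```

For 8-bit elements, 0x80 (-128) becomes 0x7F and sets the flag, while 0xFF (-1) becomes 1 without saturating.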

SQADD

Signed saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP
registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 0 0 0 1 1 Rn Rd
(U is bit 29)

SQADD <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 0 1 1 Rn Rd
(U is bit 29)

SQADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer sum;
boolean sat;

for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
sum = element1 + element2;
(Elem[result, e, esize], sat) = SatQ(sum, esize, unsigned);
if sat then FPSR.QC = '1';

V[d] = result;
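
The loop above can be modelled in Python for the signed case (`unsigned == FALSE`). This is a sketch only: the name `sqadd` is hypothetical, operands are lists of raw element values, and the returned flag stands in for FPSR.QC.

```python
def sqadd(a_elems, b_elems, esize):
    """Sketch of signed SQADD: returns (raw result elements, QC flag)."""
    lo = -(1 << (esize - 1))
    hi = (1 << (esize - 1)) - 1
    mask = (1 << esize) - 1

    def signed(raw):
        raw &= mask
        return raw - (1 << esize) if raw >> (esize - 1) else raw

    qc = False
    result = []
    for a, b in zip(a_elems, b_elems):
        total = signed(a) + signed(b)
        clamped = max(lo, min(hi, total))   # saturate to the signed range
        if clamped != total:
            qc = True
        result.append(clamped & mask)
    return result, qc
```

With 8-bit elements, 0x7F + 0x01 stays at 0x7F and 0x80 + 0xFF stays at 0x80; either case sets the flag, while 0x01 + 0x02 gives 0x03 as usual.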

SQDMLAL, SQDMLAL2 (by element)

Signed saturating Doubling Multiply-Add Long (by element). This instruction multiplies each vector element in the
lower or upper half of the first source SIMD&FP register by the specified vector element of the second source
SIMD&FP register, doubles the results, and accumulates the final results with the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMLAL instruction extracts vector elements from the lower half of the first source register. The SQDMLAL2
instruction extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 0 0 1 1 H 0 Rn Rd
(o2 is bit 14)

SQDMLAL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;
otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;
integer part = 0;

boolean sub_op = (o2 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 0 1 1 H 0 Rn Rd
(o2 is bit 14)

SQDMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;
otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o2 == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Va> Is the destination width specifier, encoded in “size”:

size <Va>
00 RESERVED
01 S
10 D
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Vb> Is the source width specifier, encoded in “size”:

size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED

<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

element2 = SInt(Elem[operand2, index, esize]);

for e = 0 to elements-1
element1 = SInt(Elem[operand1, e, esize]);
(product, sat1) = SignedSatQ(2 * element1 * element2, 2 * esize);
if sub_op then
accum = SInt(Elem[operand3, e, 2*esize]) - SInt(product);
else
accum = SInt(Elem[operand3, e, 2*esize]) + SInt(product);
(Elem[result, e, 2*esize], sat2) = SignedSatQ(accum, 2 * esize);
if sat1 || sat2 then FPSR.QC = '1';

V[d] = result;
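
The loop above saturates twice per element: once on the doubled product and once on the accumulation. The Python sketch below models a single lane to make the two steps visible. The names `signed_sat` and `sqdmlal_lane` are hypothetical, and the returned flag stands in for FPSR.QC.

```python
def signed_sat(value, bits):
    """Clamp value to the signed range of `bits` bits; return (clamped, saturated)."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value


def to_signed(raw, bits):
    raw &= (1 << bits) - 1
    return raw - (1 << bits) if raw >> (bits - 1) else raw


def sqdmlal_lane(acc_raw, a_raw, b_raw, esize, sub_op=False):
    """One lane of SQDMLAL (sub_op=False) or SQDMLSL (sub_op=True)."""
    wide = 2 * esize
    a, b = to_signed(a_raw, esize), to_signed(b_raw, esize)
    product, sat1 = signed_sat(2 * a * b, wide)        # first saturation
    acc = to_signed(acc_raw, wide)
    accum = acc - product if sub_op else acc + product
    result, sat2 = signed_sat(accum, wide)             # second saturation
    return result & ((1 << wide) - 1), sat1 or sat2
```

With 16-bit inputs, multiplying 0x8000 by 0x8000 saturates the doubled product to 0x7FFFFFFF before the accumulator is involved, whereas adding a small product to an accumulator of 0x7FFFFFFF saturates only in the second step.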

SQDMLAL, SQDMLAL2 (vector)

Signed saturating Doubling Multiply-Add Long. This instruction multiplies corresponding signed integer values in the
lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and accumulates the final
results with the vector elements of the destination SIMD&FP register. The destination vector elements are twice as
long as the elements that are multiplied.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMLAL instruction extracts each source vector from the lower half of each source register. The SQDMLAL2
instruction extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 0 1 0 0 Rn Rd
(o1 is bit 13)

SQDMLAL <Va><d>, <Vb><n>, <Vb><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '00' || size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
integer part = 0;

boolean sub_op = (o1 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 0 1 0 0 Rn Rd
(o1 is bit 13)

SQDMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '00' || size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<Va> Is the destination width specifier, encoded in “size”:

size <Va>
00 RESERVED
01 S
10 D
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Vb> Is the source width specifier, encoded in “size”:

size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED

<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

for e = 0 to elements-1
element1 = SInt(Elem[operand1, e, esize]);
element2 = SInt(Elem[operand2, e, esize]);
(product, sat1) = SignedSatQ(2 * element1 * element2, 2 * esize);
if sub_op then
accum = SInt(Elem[operand3, e, 2*esize]) - SInt(product);
else
accum = SInt(Elem[operand3, e, 2*esize]) + SInt(product);
(Elem[result, e, 2*esize], sat2) = SignedSatQ(accum, 2 * esize);
if sat1 || sat2 then FPSR.QC = '1';

V[d] = result;

SQDMLSL, SQDMLSL2 (by element)

Signed saturating Doubling Multiply-Subtract Long (by element). This instruction multiplies each vector element in
the lower or upper half of the first source SIMD&FP register by the specified vector element of the second source
SIMD&FP register, doubles the results, and subtracts the final results from the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the
values in this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMLSL instruction extracts vector elements from the lower half of the first source register. The SQDMLSL2
instruction extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 0 1 1 1 H 0 Rn Rd
(o2 is bit 14)

SQDMLSL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;
otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;
integer part = 0;

boolean sub_op = (o2 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 0 1 1 1 H 0 Rn Rd
(o2 is bit 14)

SQDMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;
otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o2 == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Va> Is the destination width specifier, encoded in “size”:

size <Va>
00 RESERVED
01 S
10 D
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Vb> Is the source width specifier, encoded in “size”:

size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED

<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

element2 = SInt(Elem[operand2, index, esize]);

V[d] = result;

SQDMLSL, SQDMLSL2 (vector)

Signed saturating Doubling Multiply-Subtract Long. This instruction multiplies corresponding signed integer values in
the lower or upper half of the vectors of the two source SIMD&FP registers, doubles the results, and subtracts the
final results from the vector elements of the destination SIMD&FP register. The destination vector elements are twice
as long as the elements that are multiplied.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMLSL instruction extracts each source vector from the lower half of each source register. The SQDMLSL2
instruction extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 1 1 0 0 Rn Rd
(o1 is bit 13)

SQDMLSL <Va><d>, <Vb><n>, <Vb><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '00' || size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
integer part = 0;

boolean sub_op = (o1 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 1 0 0 Rn Rd
(o1 is bit 13)

SQDMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '00' || size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<Va> Is the destination width specifier, encoded in “size”:

size <Va>
00 RESERVED
01 S
10 D
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Vb> Is the source width specifier, encoded in “size”:

size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED

<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

V[d] = result;

SQDMULH (by element)

Signed saturating Doubling Multiply returning High half (by element). This instruction multiplies each vector element
in the first source SIMD&FP register by the specified vector element of the second source SIMD&FP register, doubles
the results, places the most significant half of the final results into a vector, and writes the vector to the destination
SIMD&FP register.
The results are truncated. For rounded results, see SQRDMULH.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 1 1 0 0 H 0 Rn Rd
(op is bit 12)

SQDMULH <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;
otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;

boolean round = (op == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 1 0 0 H 0 Rn Rd
(op is bit 12)

SQDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;
otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean round = (op == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 RESERVED

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED
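
The <Vm> and <index> tables above can be read together as one small decoder. The following Python sketch is illustrative only: the function name `decode_by_element` and its argument layout are assumptions made for this example, not part of the architecture. It takes the raw field values and returns the register number, the element index, and the element size.

```python
def decode_by_element(size, H, L, M, Rm):
    """Return (m, index, esize) for a by-element encoding.

    size is the 2-bit size field as an int; H, L and M are single bits;
    Rm is the 4-bit register field. Hypothetical helper for illustration.
    """
    if size == 0b01:
        # H elements: index = H:L:M, register = 0:Rm (so only V0-V15)
        index = (H << 2) | (L << 1) | M
        m = Rm
    elif size == 0b10:
        # S elements: index = H:L, register = M:Rm (V0-V31)
        index = (H << 1) | L
        m = (M << 4) | Rm
    else:
        # size '00' and '11' are RESERVED in the tables above
        raise ValueError("reserved size encoding")
    esize = 8 << size
    return m, index, esize
```

Because M is consumed by the index when the element size is H, it is not available as the top register bit, which is why <Vm> is restricted to V0-V15 in that case.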

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) result;
integer round_const = if round then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer product;
boolean sat;

element2 = SInt(Elem[operand2, index, esize]);

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    product = (2 * element1 * element2) + round_const;
    // The following only saturates if element1 and element2 equal -(2^(esize-1))
    (Elem[result, e, esize], sat) = SignedSatQ(product >> esize, esize);
    if sat then FPSR.QC = '1';

V[d] = result;
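
As a rough executable model of the operation above, the following Python sketch works on lists of signed integers rather than bit vectors. The names `signed_sat_q` and `sqdmulh_by_element` are assumptions for this example, and the returned flag only mirrors the point at which FPSR.QC would be set.

```python
def signed_sat_q(value, n):
    """Clamp value to the signed n-bit range; return (clamped, saturated)."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if value > hi:
        return hi, True
    if value < lo:
        return lo, True
    return value, False


def sqdmulh_by_element(vn, vm, index, esize, rounding=False):
    """Model the per-element loop on lists of signed Python ints.

    vn holds the elements of the first source, vm those of the second.
    Returns (result_elements, qc).
    """
    round_const = (1 << (esize - 1)) if rounding else 0
    element2 = vm[index]            # one element is reused for every lane
    result, qc = [], False
    for element1 in vn:
        product = 2 * element1 * element2 + round_const
        # Python's >> on int is an arithmetic (flooring) shift
        value, sat = signed_sat_q(product >> esize, esize)
        result.append(value)
        qc = qc or sat
    return result, qc
```

With 16-bit elements, the only lane that saturates is the one where both inputs are -32768, which matches the comment in the pseudocode.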

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

SQDMULH (vector)

Signed saturating Doubling Multiply returning High half. This instruction multiplies the values of corresponding
elements of the two source SIMD&FP registers, doubles the results, places the most significant half of the final results
into a vector, and writes the vector to the destination SIMD&FP register.
The results are truncated. For rounded results, see SQRDMULH.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U

SQDMULH <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean rounding = (U == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U

SQDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean rounding = (U == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 RESERVED

SQDMULH (vector) Page 1422

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer product;
boolean sat;

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    product = (2 * element1 * element2) + round_const;
    (Elem[result, e, esize], sat) = SignedSatQ(product >> esize, esize);
    if sat then FPSR.QC = '1';

V[d] = result;
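
The difference between truncated and rounded results is easiest to see on a single lane. This Python sketch assumes a hypothetical helper `sqdmulh_lane` whose `rounding` argument plays the role of the `rounding` variable in the pseudocode above. It is a model of the arithmetic only, not of register access.

```python
def sqdmulh_lane(a, b, esize=16, rounding=False):
    """One lane: doubling multiply, keep the high half, saturate.

    rounding=False models the truncating form, rounding=True the
    rounding form. Returns (result, saturated).
    """
    round_const = (1 << (esize - 1)) if rounding else 0
    high = (2 * a * b + round_const) >> esize   # flooring shift
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    clamped = max(lo, min(hi, high))
    return clamped, clamped != high
```

For example, with a = 3 and b = 16384 the doubled product is 1.5 units of the high half: truncation gives 1 and rounding gives 2. For a = -3 the flooring shift gives -2 when truncating and -1 when rounding.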

SQDMULL, SQDMULL2 (by element)

Signed saturating Doubling Multiply Long (by element). This instruction multiplies each vector element in the lower or
upper half of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register, doubles the results, places the final results in a vector, and writes the vector to the destination SIMD&FP
register. All the values in this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMULL instruction extracts the first source vector from the lower half of the first source register. The SQDMULL2
instruction extracts the first source vector from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 1 0 1 1 H 0 Rn Rd

SQDMULL <Va><d>, <Vb><n>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;
integer part = 0;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 0 1 1 H 0 Rn Rd

SQDMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Va> Is the destination width specifier, encoded in “size”:

size <Va>
00 RESERVED
01 S
10 D
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Vb> Is the source width specifier, encoded in “size”:

size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED

<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

CheckFPAdvSIMDEnabled64();

bits(datasize) operand1 = Vpart[n, part];

bits(idxdsize) operand2 = V[m];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
boolean sat;

element2 = SInt(Elem[operand2, index, esize]);

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    (product, sat) = SignedSatQ(2 * element1 * element2, 2 * esize);
    Elem[result, e, 2*esize] = product;
    if sat then FPSR.QC = '1';

V[d] = result;
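
The widening behaviour and the lower or upper half selection can be sketched in Python as follows. The function name `sqdmull_by_element` and the list-based register model are assumptions for this example: `vn` stands for all elements of the 128-bit first source, and `part` selects the lower (0) or upper (1) 64 bits in the way `Vpart[n, part]` does above.

```python
def sqdmull_by_element(vn, vm, index, esize, part):
    """Model SQDMULL (part=0) and SQDMULL2 (part=1) on lists of signed ints.

    Each result element is 2*esize bits wide. Returns (result_elements, qc).
    """
    elements = 64 // esize
    half = vn[part * elements:(part + 1) * elements]
    element2 = vm[index]
    lo = -(1 << (2 * esize - 1))
    hi = (1 << (2 * esize - 1)) - 1
    result, qc = [], False
    for element1 in half:
        product = 2 * element1 * element2
        clamped = max(lo, min(hi, product))   # saturate to 2*esize bits
        qc = qc or clamped != product
        result.append(clamped)
    return result, qc
```

Because the destination elements are twice as wide as the sources, the doubled product overflows only when both inputs are the most negative value.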

SQDMULL, SQDMULL2 (vector)

Signed saturating Doubling Multiply Long. This instruction multiplies corresponding vector elements in the lower or
upper half of the two source SIMD&FP registers, doubles the results, places the final results in a vector, and writes
the vector to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQDMULL instruction extracts each source vector from the lower half of each source register. The SQDMULL2
instruction extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 1 1 0 1 0 0 Rn Rd

SQDMULL <Va><d>, <Vb><n>, <Vb><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '00' || size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
integer part = 0;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 1 1 0 1 0 0 Rn Rd

SQDMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '00' || size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<Va> Is the destination width specifier, encoded in “size”:

size <Va>
00 RESERVED
01 S
10 D
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Vb> Is the source width specifier, encoded in “size”:

size <Vb>
00 RESERVED
01 H
10 S
11 RESERVED

<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;
boolean sat;

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    (product, sat) = SignedSatQ(2 * element1 * element2, 2 * esize);
    Elem[result, e, 2*esize] = product;
    if sat then FPSR.QC = '1';

V[d] = result;

SQNEG

Signed saturating Negate. This instruction reads each vector element from the source SIMD&FP register, negates
each value, places the result into a vector, and writes the vector to the destination SIMD&FP register. All the values in
this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
U

SQNEG <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;
boolean neg = (U == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 1 1 1 1 0 Rn Rd
U

SQNEG <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean neg = (U == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
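boolean sat;

for e = 0 to elements-1
    element = SInt(Elem[operand, e, esize]);
    if neg then
        element = -element;
    else
        element = Abs(element);
    (Elem[result, e, esize], sat) = SignedSatQ(element, esize);
    if sat then FPSR.QC = '1';
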
V[d] = result;
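
The saturating behaviour described for this instruction can be illustrated with a short Python sketch. The function name `sqneg` and the list-of-integers model are assumptions for this example. Negation overflows only for the most negative representable value, which is clamped to the most positive one.

```python
def sqneg(elements, esize):
    """Negate each signed element with saturation.

    Returns (result_elements, qc), where qc mirrors FPSR.QC being set.
    """
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    result, qc = [], False
    for element in elements:
        negated = -element
        clamped = max(lo, min(hi, negated))   # only -lo can exceed hi
        qc = qc or clamped != negated
        result.append(clamped)
    return result, qc
```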

SQRDMLAH (by element)

Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (by element). This instruction
multiplies the vector elements of the first source SIMD&FP register with the value of a vector element of the second
source SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most
significant half of the final results with the vector elements of the destination SIMD&FP register. The results are
rounded.
If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar
(FEAT_RDM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
S

SQRDMLAH <V><d>, <V><n>, <Vm>.<Ts>[<index>]

if !HaveQRDMLAHExt() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;

boolean rounding = TRUE;

boolean sub_op = (S == '1');

Vector
(FEAT_RDM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
S

SQRDMLAH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

if !HaveQRDMLAHExt() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean rounding = TRUE;

boolean sub_op = (S == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 RESERVED

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer rounding_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer element3;
integer product;
integer accum;
boolean sat;

element2 = SInt(Elem[operand2, index, esize]);

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element3 = SInt(Elem[operand3, e, esize]);
    if sub_op then
        accum = ((element3 << esize) - 2 * (element1 * element2) + rounding_const);
    else
        accum = ((element3 << esize) + 2 * (element1 * element2) + rounding_const);
    (Elem[result, e, esize], sat) = SignedSatQ(accum >> esize, esize);
    if sat then FPSR.QC = '1';

V[d] = result;
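
A minimal Python model of the accumulate loop above, working on lists of signed integers. The function name `sqrdmlah_by_element` is an assumption for this example; its `sub_op` argument corresponds to the `sub_op` variable in the pseudocode, and the rounding constant is always applied because `rounding` is TRUE for this instruction.

```python
def sqrdmlah_by_element(vd, vn, vm, index, esize, sub_op=False):
    """Model the rounding doubling multiply-accumulate loop.

    vd holds the accumulator elements, vn the first source elements and
    vm the second source elements. Returns (result_elements, qc).
    """
    rounding_const = 1 << (esize - 1)
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    element2 = vm[index]
    result, qc = [], False
    for element3, element1 in zip(vd, vn):
        doubled = 2 * element1 * element2
        if sub_op:
            accum = (element3 << esize) - doubled + rounding_const
        else:
            accum = (element3 << esize) + doubled + rounding_const
        high = accum >> esize                 # flooring shift
        clamped = max(lo, min(hi, high))
        qc = qc or clamped != high
        result.append(clamped)
    return result, qc
```

The accumulator is shifted up by esize before the product is added, so rounding and saturation are applied once, to the combined value.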

SQRDMLAH (vector)

Signed Saturating Rounding Doubling Multiply Accumulate returning High Half (vector). This instruction multiplies
the vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source
SIMD&FP register without saturating the multiply results, doubles the results, and accumulates the most significant
half of the final results with the vector elements of the destination SIMD&FP register. The results are rounded.
If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar
(FEAT_RDM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 0 Rm 1 0 0 0 0 1 Rn Rd
S

SQRDMLAH <V><d>, <V><n>, <V><m>

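if !HaveQRDMLAHExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean rounding = TRUE;
boolean sub_op = (S == '1');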

Vector
(FEAT_RDM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 0 0 0 0 1 Rn Rd
S

SQRDMLAH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

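if !HaveQRDMLAHExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean rounding = TRUE;
boolean sub_op = (S == '1');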

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 RESERVED

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer rounding_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer element3;
integer product;
integer accum;
boolean sat;

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    element3 = SInt(Elem[operand3, e, esize]);
    if sub_op then
        accum = ((element3 << esize) - 2 * (element1 * element2) + rounding_const);
    else
        accum = ((element3 << esize) + 2 * (element1 * element2) + rounding_const);
    (Elem[result, e, esize], sat) = SignedSatQ(accum >> esize, esize);
    if sat then FPSR.QC = '1';

V[d] = result;
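
The phrase "without saturating the multiply results" matters in one corner case. The Python sketch below compares a single lane of the fused operation with a two-step sequence that first takes the saturated rounded high half of the product and then performs a saturating add. All function names are assumptions for this example, and the two-step sequence is only a comparison model, not something this page defines.

```python
def clamp(value, esize):
    """Saturate to the signed esize-bit range; return (value, saturated)."""
    lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value


def sqrdmlah_lane(acc, a, b, esize=16):
    """Fused form: one rounding and one saturation, after accumulation."""
    accum = (acc << esize) + 2 * a * b + (1 << (esize - 1))
    return clamp(accum >> esize, esize)


def multiply_then_add_lane(acc, a, b, esize=16):
    """Two-step form: the product saturates before the add."""
    high, sat1 = clamp((2 * a * b + (1 << (esize - 1))) >> esize, esize)
    total, sat2 = clamp(acc + high, esize)
    return total, sat1 or sat2
```

With acc = -1 and a = b = -32768, the fused form yields 32767 with no saturation, while the two-step form saturates the product first and yields 32766 with the saturation flag set. For inputs that do not hit this corner the two forms agree.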

SQRDMLSH (by element)

Signed Saturating Rounding Doubling Multiply Subtract returning High Half (by element). This instruction multiplies
the vector elements of the first source SIMD&FP register with the value of a vector element of the second source
SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half
of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar
(FEAT_RDM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 size L M Rm 1 1 1 1 H 0 Rn Rd
S

SQRDMLSH <V><d>, <V><n>, <Vm>.<Ts>[<index>]

if !HaveQRDMLAHExt() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;

boolean rounding = TRUE;

boolean sub_op = (S == '1');

Vector
(FEAT_RDM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 1 1 1 H 0 Rn Rd
S

SQRDMLSH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

if !HaveQRDMLAHExt() then UNDEFINED;

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean rounding = TRUE;

boolean sub_op = (S == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 RESERVED

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

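CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer rounding_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer element3;
integer product;
integer accum;
boolean sat;

element2 = SInt(Elem[operand2, index, esize]);

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element3 = SInt(Elem[operand3, e, esize]);
    if sub_op then
        accum = ((element3 << esize) - 2 * (element1 * element2) + rounding_const);
    else
        accum = ((element3 << esize) + 2 * (element1 * element2) + rounding_const);
    (Elem[result, e, esize], sat) = SignedSatQ(accum >> esize, esize);
    if sat then FPSR.QC = '1';

V[d] = result;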

SQRDMLSH (vector)

Signed Saturating Rounding Doubling Multiply Subtract returning High Half (vector). This instruction multiplies the
vector elements of the first source SIMD&FP register with the corresponding vector elements of the second source
SIMD&FP register without saturating the multiply results, doubles the results, and subtracts the most significant half
of the final results from the vector elements of the destination SIMD&FP register. The results are rounded.
If any of the results overflow, they are saturated. The cumulative saturation bit, FPSR.QC, is set if saturation occurs.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar
(FEAT_RDM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 0 Rm 1 0 0 0 1 1 Rn Rd
S

SQRDMLSH <V><d>, <V><n>, <V><m>

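if !HaveQRDMLAHExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean rounding = TRUE;
boolean sub_op = (S == '1');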

Vector
(FEAT_RDM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 0 0 0 1 1 Rn Rd
S

SQRDMLSH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

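if !HaveQRDMLAHExt() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean rounding = TRUE;
boolean sub_op = (S == '1');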

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 RESERVED

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

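CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
integer rounding_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer element3;
integer product;
integer accum;
boolean sat;

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    element3 = SInt(Elem[operand3, e, esize]);
    if sub_op then
        accum = ((element3 << esize) - 2 * (element1 * element2) + rounding_const);
    else
        accum = ((element3 << esize) + 2 * (element1 * element2) + rounding_const);
    (Elem[result, e, esize], sat) = SignedSatQ(accum >> esize, esize);
    if sat then FPSR.QC = '1';

V[d] = result;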

SQRDMULH (by element)

Signed saturating Rounding Doubling Multiply returning High half (by element). This instruction multiplies each
vector element in the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register, doubles the results, places the most significant half of the final results into a vector, and writes the vector to
the destination SIMD&FP register.
The results are rounded. For truncated results, see SQDMULH.
If any of the results overflow, those results are saturated. If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
op

SQRDMULH <V><d>, <V><n>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;

boolean round = (op == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 size L M Rm 1 1 0 1 H 0 Rn Rd
op

SQRDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
    when '01' index = UInt(H:L:M); Rmhi = '0';
    when '10' index = UInt(H:L); Rmhi = M;
    otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean round = (op == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 RESERVED

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

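CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(idxdsize) operand2 = V[m];
bits(datasize) result;
integer round_const = if round then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer product;
boolean sat;

element2 = SInt(Elem[operand2, index, esize]);

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    product = (2 * element1 * element2) + round_const;
    // The following only saturates if element1 and element2 equal -(2^(esize-1))
    (Elem[result, e, esize], sat) = SignedSatQ(product >> esize, esize);
    if sat then FPSR.QC = '1';

V[d] = result;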

SQRDMULH (vector)

Signed saturating Rounding Doubling Multiply returning High half. This instruction multiplies the values of
corresponding elements of the two source SIMD&FP registers, doubles the results, places the most significant half of
the final results into a vector, and writes the vector to the destination SIMD&FP register.
The results are rounded. For truncated results, see SQDMULH.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U

SQRDMULH <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean rounding = (U == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 1 0 1 Rn Rd
U

SQRDMULH <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size == '11' || size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean rounding = (U == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 RESERVED

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

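CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer round_const = if rounding then 1 << (esize - 1) else 0;
integer element1;
integer element2;
integer product;
boolean sat;

for e = 0 to elements-1
    element1 = SInt(Elem[operand1, e, esize]);
    element2 = SInt(Elem[operand2, e, esize]);
    product = (2 * element1 * element2) + round_const;
    (Elem[result, e, esize], sat) = SignedSatQ(product >> esize, esize);
    if sat then FPSR.QC = '1';

V[d] = result;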

SQRSHL

Signed saturating Rounding Shift Left (register). This instruction takes each vector element in the first source
SIMD&FP register, shifts it by a value from the least significant byte of the corresponding vector element of the
second source SIMD&FP register, places the results into a vector, and writes the vector to the destination SIMD&FP
register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. For
truncated results, see SQSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 1 1 1 Rn Rd
U R S

SQRSHL <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');
if S == '0' && size != '11' then UNDEFINED;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 1 1 1 Rn Rd
U R S

SQRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
if size:Q == '110' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');
boolean rounding = (R == '1');
boolean saturating = (S == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

integer round_const = 0;
integer shift;
integer element;
boolean sat;

for e = 0 to elements-1
    shift = SInt(Elem[operand2, e, esize]<7:0>);
    if rounding then
        round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
    element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
    if saturating then
        (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
        if sat then FPSR.QC = '1';
    else
        Elem[result, e, esize] = element<esize-1:0>;

V[d] = result;
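
The per-element behaviour of the loop above can be sketched in Python. This is an illustrative model only, not Arm pseudocode: the names sat_signed and sqrshl_element are hypothetical, elements are passed as Python integers rather than bit vectors, and only the signed, saturating case is modelled.

```python
def sat_signed(value, esize):
    """Clamp value to the signed esize-bit range; return (result, saturated)."""
    lo = -(1 << (esize - 1))
    hi = (1 << (esize - 1)) - 1
    if value > hi:
        return hi, True
    if value < lo:
        return lo, True
    return value, False


def sqrshl_element(value, shift_byte, esize, rounding=True):
    """Model one lane of a signed saturating shift by register (hypothetical helper).

    value      -- signed source element
    shift_byte -- least significant byte (0..255) of the second source element
    rounding   -- False models the truncating form
    """
    # SInt(<7:0>): interpret the byte as a signed shift count.
    shift = shift_byte - 256 if shift_byte & 0x80 else shift_byte
    if shift >= 0:
        element = value << shift              # left shift, rounding constant is 0
    else:
        n = -shift
        round_const = (1 << (n - 1)) if rounding else 0
        element = (value + round_const) >> n  # floor shift on Python integers
    return sat_signed(element, esize)


print(sqrshl_element(100, 1, 8))   # (127, True): left shift overflows 8 bits
print(sqrshl_element(5, 0xFF, 8))  # (3, False): shift of -1, rounded
```

The second tuple member corresponds to the condition under which the pseudocode sets FPSR.QC.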

SQRSHRN, SQRSHRN2

Signed saturating Rounded Shift Right Narrow (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each element by an immediate value, saturates each shifted result to a value that is half
the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination
SIMD&FP register. All the values in this instruction are signed integer values. The destination vector elements are half
as long as the source vector elements. The results are rounded. For truncated results, see SQSHRN.
The SQRSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQRSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31 30 29 28:23  22:19  18:16 15:12 11 10 9:5 4:0
Value  0  1  0  111110 !=0000 immb  1001  1  1  Rn  Rd
Label  -  -  U  -      immh   -     -     op -  -   -

SQRSHRN <Vb><d>, <Va><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then UNDEFINED;

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = esize;
integer elements = 1;
integer part = 0;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');
boolean unsigned = (U == '1');

Vector

Bits   31 30 29 28:23  22:19  18:16 15:12 11 10 9:5 4:0
Value  0  Q  0  011110 !=0000 immb  1001  1  1  Rn  Rd
Label  -  -  U  -      immh   -     -     op -  -   -

SQRSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');
boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<Vb> Is the destination width specifier, encoded in “immh”:

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Va> Is the source width specifier, encoded in “immh”:

immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED

<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:

immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED
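
The mapping between "immh:immb" and the shift amount in the tables above can be sketched as a pair of Python functions. The names decode_narrow_shift and encode_narrow_shift are hypothetical; the arithmetic follows the decode pseudocode (esize = 8 << HighestSetBit(immh), shift = (2 * esize) - UInt(immh:immb)), and esize here means the destination element size.

```python
def decode_narrow_shift(immh, immb):
    """Return (esize, shift) for a narrowing right shift encoding."""
    if immh == 0:
        raise ValueError("immh == 0000 is not a shift encoding")
    if immh & 0b1000:
        raise ValueError("immh == 1xxx is reserved for narrowing shifts")
    esize = 8 << (immh.bit_length() - 1)      # 8 << HighestSetBit(immh)
    shift = 2 * esize - ((immh << 3) | immb)  # (2 * esize) - UInt(immh:immb)
    return esize, shift


def encode_narrow_shift(esize, shift):
    """Inverse mapping: return (immh, immb) for a destination element size."""
    if esize not in (8, 16, 32) or not 1 <= shift <= esize:
        raise ValueError("shift must be in 1..esize for esize 8, 16 or 32")
    value = 2 * esize - shift
    return value >> 3, value & 0b111


print(decode_narrow_shift(0b0001, 0b111))  # (8, 1)
print(encode_narrow_shift(16, 16))         # (2, 0)
```

Because the shift range is 1 to esize, the encoded value 2*esize - shift always falls in esize to 2*esize - 1, which is why the highest set bit of immh identifies the element size.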

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;

for e = 0 to elements-1
    element = (Int(Elem[operand, e, 2*esize], unsigned) + round_const) >> shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
    if sat then FPSR.QC = '1';

Vpart[d, part] = result;
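
As an illustration of the loop above, the following Python sketch applies a rounded, saturating, narrowing right shift to a list of signed source elements. The function name sqrshrn and the list-based interface are assumptions made for this example; the write to the lower or upper half of the destination is not modelled.

```python
def sqrshrn(src, esize, shift, rounding=True):
    """Narrow signed (2*esize)-bit values to esize bits.

    Returns (narrowed_elements, qc), where qc is True if any lane saturated.
    """
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1..esize")
    round_const = (1 << (shift - 1)) if rounding else 0
    lo = -(1 << (esize - 1))
    hi = (1 << (esize - 1)) - 1
    out, qc = [], False
    for value in src:
        element = (value + round_const) >> shift  # arithmetic (floor) shift
        clamped = max(lo, min(hi, element))
        qc = qc or clamped != element
        out.append(clamped)
    return out, qc


print(sqrshrn([7, -7], 8, 2))                  # ([2, -2], False)
print(sqrshrn([7, -7], 8, 2, rounding=False))  # ([1, -2], False)
```

Passing rounding=False sets the rounding constant to zero, which corresponds to the truncating behaviour.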

SQRSHRUN, SQRSHRUN2

Signed saturating Rounded Shift Right Unsigned Narrow (immediate). This instruction reads each signed integer value
in the vector of the source SIMD&FP register, right shifts each value by an immediate value, saturates the result to an
unsigned integer value that is half the original width, places the final result into a vector, and writes the vector to the
destination SIMD&FP register. The results are rounded. For truncated results, see SQSHRUN.
The SQRSHRUN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQRSHRUN2 instruction writes the vector to the upper half of the destination register without affecting the other
bits of the register.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31 30 29 28:23  22:19  18:16 15:12 11 10 9:5 4:0
Value  0  1  1  111110 !=0000 immb  1000  1  1  Rn  Rd
Label  -  -  -  -      immh   -     -     op -  -   -

SQRSHRUN <Vb><d>, <Va><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then UNDEFINED;

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = esize;
integer elements = 1;
integer part = 0;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');

Vector

Bits   31 30 29 28:23  22:19  18:16 15:12 11 10 9:5 4:0
Value  0  Q  1  011110 !=0000 immb  1000  1  1  Rn  Rd
Label  -  -  -  -      immh   -     -     op -  -   -

SQRSHRUN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<Vb> Is the destination width specifier, encoded in “immh”:

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Va> Is the source width specifier, encoded in “immh”:

immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED

<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:

immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;

for e = 0 to elements-1
    element = (SInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
    (Elem[result, e, esize], sat) = UnsignedSatQ(element, esize);
    if sat then FPSR.QC = '1';

Vpart[d, part] = result;
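
The effect of saturating a signed input to an unsigned range can be seen in a small Python sketch. The helper names unsigned_sat_q and sqrshrun_element are hypothetical stand-ins for the pseudocode's UnsignedSatQ and for one iteration of the loop; values are plain Python integers.

```python
def unsigned_sat_q(value, esize):
    """Clamp value to 0 .. 2^esize - 1; return (result, saturated)."""
    hi = (1 << esize) - 1
    if value > hi:
        return hi, True
    if value < 0:
        return 0, True
    return value, False


def sqrshrun_element(value, esize, shift, rounding=True):
    """One lane: signed (2*esize)-bit input, unsigned esize-bit output."""
    round_const = (1 << (shift - 1)) if rounding else 0
    return unsigned_sat_q((value + round_const) >> shift, esize)


print(sqrshrun_element(-1, 8, 1))    # (0, False): -1 rounds up to 0
print(sqrshrun_element(-2, 8, 1))    # (0, True): negative result clamps to 0
print(sqrshrun_element(1023, 8, 2))  # (255, True): exceeds the 8-bit maximum
```

Note that a negative input does not always saturate: in this model -1 shifted right by 1 with rounding gives exactly 0.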

SQSHL (immediate)

Signed saturating Shift Left (immediate). This instruction reads each vector element in the source SIMD&FP register,
shifts each element left by an immediate value, places the final result in a vector, and writes the vector to the destination
SIMD&FP register. The results are truncated. For rounded results, see UQRSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31 30 29 28:23  22:19  18:16 15:13 12 11:10 9:5 4:0
Value  0  1  0  111110 !=0000 immb  011   1  01    Rn  Rd
Label  -  -  U  -      immh   -     -     op -     -   -

SQSHL <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then UNDEFINED;

integer esize = 8 << HighestSetBit(immh);
integer datasize = esize;
integer elements = 1;

integer shift = UInt(immh:immb) - esize;

boolean src_unsigned;
boolean dst_unsigned;
case op:U of
    when '00' UNDEFINED;
    when '01' src_unsigned = FALSE; dst_unsigned = TRUE;
    when '10' src_unsigned = FALSE; dst_unsigned = FALSE;
    when '11' src_unsigned = TRUE; dst_unsigned = TRUE;

Vector

Bits   31 30 29 28:23  22:19  18:16 15:13 12 11:10 9:5 4:0
Value  0  Q  0  011110 !=0000 immb  011   1  01    Rn  Rd
Label  -  -  U  -      immh   -     -     op -     -   -

SQSHL <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = UInt(immh:immb) - esize;

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0000 RESERVED
0001 B
001x H
01xx S
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the left shift amount, in the range 0 to the operand width in bits minus 1,
encoded in “immh:immb”:

immh <shift>
0000 RESERVED
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)

For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean sat;

for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], src_unsigned) << shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, dst_unsigned);
    if sat then FPSR.QC = '1';

V[d] = result;
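
The way the "op:U" case selects source and destination signedness, and how that changes the saturated result, can be sketched in Python. The names sat_q, SIGNEDNESS and qshl_imm are hypothetical; elements are passed and returned as raw esize-bit patterns, and the dictionary mirrors the case statement in the scalar decode.

```python
def sat_q(value, esize, unsigned):
    """Clamp value to the unsigned or signed esize-bit range."""
    if unsigned:
        lo, hi = 0, (1 << esize) - 1
    else:
        lo, hi = -(1 << (esize - 1)), (1 << (esize - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value


# op:U -> (src_unsigned, dst_unsigned); '00' is UNDEFINED and has no entry.
SIGNEDNESS = {"01": (False, True), "10": (False, False), "11": (True, True)}


def qshl_imm(bits, esize, shift, op_u):
    """Saturating left shift of one element given as an esize-bit pattern."""
    src_unsigned, dst_unsigned = SIGNEDNESS[op_u]
    mask = (1 << esize) - 1
    value = bits & mask
    if not src_unsigned and value >> (esize - 1):
        value -= 1 << esize                    # reinterpret as signed
    result, sat = sat_q(value << shift, esize, dst_unsigned)
    return result & mask, sat


print(qshl_imm(0x40, 8, 1, "10"))  # (127, True): signed result saturates to 0x7F
print(qshl_imm(0xFF, 8, 1, "10"))  # (254, False): -1 << 1 is -2, pattern 0xFE
print(qshl_imm(0xFF, 8, 1, "01"))  # (0, True): negative value, unsigned result
```

The same input pattern 0xFF gives three different outcomes depending only on the signedness pair, which is the reason the decode carries both src_unsigned and dst_unsigned.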

SQSHL (register)

Signed saturating Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP
register, shifts each element by a value from the least significant byte of the corresponding element of the second
source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For
rounded results, see SQRSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31 30 29 28:24 23:22 21 20:16 15:13 12 11 10 9:5 4:0
Value  0  1  0  11110 size  1  Rm    010   0  1  1  Rn  Rd
Label  -  -  U  -     -     -  -     -     R  S  -  -   -

SQSHL <V><d>, <V><n>, <V><m>

Vector

Bits   31 30 29 28:24 23:22 21 20:16 15:13 12 11 10 9:5 4:0
Value  0  Q  0  01110 size  1  Rm    010   0  1  1  Rn  Rd
Label  -  -  U  -     -     -  -     -     R  S  -  -   -

SQSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

integer round_const = 0;
integer shift;
integer element;
boolean sat;

for e = 0 to elements-1
    shift = SInt(Elem[operand2, e, esize]<7:0>);
    if rounding then
        round_const = 1 << (-shift - 1); // 0 for left shift, 2^(n-1) for right shift
    element = (Int(Elem[operand1, e, esize], unsigned) + round_const) << shift;
    if saturating then
        (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
        if sat then FPSR.QC = '1';
    else
        Elem[result, e, esize] = element<esize-1:0>;

V[d] = result;

SQSHLU

Signed saturating Shift Left Unsigned (immediate). This instruction reads each signed integer value in the vector of
the source SIMD&FP register, shifts each value by an immediate value, saturates the shifted result to an unsigned
integer value, places the result in a vector, and writes the vector to the destination SIMD&FP register. The results are
truncated. For rounded results, see UQRSHL.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31 30 29 28:23  22:19  18:16 15:13 12 11:10 9:5 4:0
Value  0  1  1  111110 !=0000 immb  011   0  01    Rn  Rd
Label  -  -  U  -      immh   -     -     op -     -   -

SQSHLU <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then UNDEFINED;

integer esize = 8 << HighestSetBit(immh);
integer datasize = esize;
integer elements = 1;

integer shift = UInt(immh:immb) - esize;

Vector

Bits   31 30 29 28:23  22:19  18:16 15:13 12 11:10 9:5 4:0
Value  0  Q  1  011110 !=0000 immb  011   0  01    Rn  Rd
Label  -  -  U  -      immh   -     -     op -     -   -

SQSHLU <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = UInt(immh:immb) - esize;

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0000 RESERVED
0001 B
001x H
01xx S
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the left shift amount, in the range 0 to the operand width in bits minus 1,
encoded in “immh:immb”:

immh <shift>
0000 RESERVED
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)

For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean sat;

for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], src_unsigned) << shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, dst_unsigned);
    if sat then FPSR.QC = '1';

V[d] = result;

SQSHRN, SQSHRN2

Signed saturating Shift Right Narrow (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts each element by an immediate value and truncates the result, saturates each shifted result to a
value that is half the original width, puts the final result into a vector, and writes the vector to the lower or upper half
of the destination SIMD&FP register. All the values in this instruction are signed integer values. The destination vector
elements are half as long as the source vector elements. For rounded results, see SQRSHRN.
The SQSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31 30 29 28:23  22:19  18:16 15:12 11 10 9:5 4:0
Value  0  1  0  111110 !=0000 immb  1001  0  1  Rn  Rd
Label  -  -  U  -      immh   -     -     op -  -   -

SQSHRN <Vb><d>, <Va><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then UNDEFINED;

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = esize;
integer elements = 1;
integer part = 0;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');
boolean unsigned = (U == '1');

Vector

Bits   31 30 29 28:23  22:19  18:16 15:12 11 10 9:5 4:0
Value  0  Q  0  011110 !=0000 immb  1001  0  1  Rn  Rd
Label  -  -  U  -      immh   -     -     op -  -   -

SQSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');
boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<Vb> Is the destination width specifier, encoded in “immh”:

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Va> Is the source width specifier, encoded in “immh”:

immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED

<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:

immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;

for e = 0 to elements-1
    element = (Int(Elem[operand, e, 2*esize], unsigned) + round_const) >> shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
    if sat then FPSR.QC = '1';

Vpart[d, part] = result;
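
The final assignment to Vpart[d, part] is what distinguishes the base instruction from its "2" form. A rough Python model, treating the destination register as a 128-bit integer, is shown here; pack_elements and write_vpart are hypothetical helper names, and the behaviour is taken from the description above (part 0 clears the upper half, part 1 leaves the lower half unchanged).

```python
MASK64 = (1 << 64) - 1


def pack_elements(elements, esize):
    """Pack integer lanes (lane 0 in the low bits) into one bit image."""
    image = 0
    for e, value in enumerate(elements):
        image |= (value & ((1 << esize) - 1)) << (e * esize)
    return image


def write_vpart(old_reg, part, result):
    """Return the new 128-bit register image after writing a 64-bit result."""
    result &= MASK64
    if part == 0:
        return result                           # upper 64 bits become zero
    return (result << 64) | (old_reg & MASK64)  # lower 64 bits are preserved


old = (0xAAAA << 64) | 0x1111
print(hex(write_vpart(old, 0, 0x22)))  # 0x22
print(hex(write_vpart(old, 1, 0x22)))  # 0x220000000000001111
```

A common use of the pair is to narrow two wide source vectors into one full-width destination: the base form fills the lower half first, then the "2" form fills the upper half.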

SQSHRUN, SQSHRUN2

Signed saturating Shift Right Unsigned Narrow (immediate). This instruction reads each signed integer value in the
vector of the source SIMD&FP register, right shifts each value by an immediate value, saturates the result to an
unsigned integer value that is half the original width, places the final result into a vector, and writes the vector to the
destination SIMD&FP register. The results are truncated. For rounded results, see SQRSHRUN.
The SQSHRUN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQSHRUN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31 30 29 28:23  22:19  18:16 15:12 11 10 9:5 4:0
Value  0  1  1  111110 !=0000 immb  1000  0  1  Rn  Rd
Label  -  -  -  -      immh   -     -     op -  -   -

SQSHRUN <Vb><d>, <Va><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then UNDEFINED;

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = esize;
integer elements = 1;
integer part = 0;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');

Vector

Bits   31 30 29 28:23  22:19  18:16 15:12 11 10 9:5 4:0
Value  0  Q  1  011110 !=0000 immb  1000  0  1  Rn  Rd
Label  -  -  -  -      immh   -     -     op -  -   -

SQSHRUN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<Vb> Is the destination width specifier, encoded in “immh”:

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Va> Is the source width specifier, encoded in “immh”:

immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED

<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:

immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;

for e = 0 to elements-1
    element = (SInt(Elem[operand, e, 2*esize]) + round_const) >> shift;
    (Elem[result, e, esize], sat) = UnsignedSatQ(element, esize);
    if sat then FPSR.QC = '1';

Vpart[d, part] = result;

SQSUB

Signed saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register
from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and
writes the vector to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31 30 29 28:24 23:22 21 20:16 15:10  9:5 4:0
Value  0  1  0  11110 size  1  Rm    001011 Rn  Rd
Label  -  -  U  -     -     -  -     -      -   -

SQSUB <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');

Vector

Bits   31 30 29 28:24 23:22 21 20:16 15:10  9:5 4:0
Value  0  Q  0  01110 size  1  Rm    001011 Rn  Rd
Label  -  -  U  -     -     -  -     -      -   -

SQSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer diff;
boolean sat;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    diff = element1 - element2;
    (Elem[result, e, esize], sat) = SatQ(diff, esize, unsigned);
    if sat then FPSR.QC = '1';

V[d] = result;
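
A lane-wise saturating subtract along the lines of the loop above might look as follows in Python. The function name sqsub and the use of lists of signed integers are assumptions for illustration; the returned flag plays the role of the cumulative FPSR.QC bit for this one operation.

```python
def sqsub(a, b, esize):
    """Signed saturating subtract of two equal-length lists of lanes.

    Returns (results, qc), where qc is True if any lane saturated.
    """
    lo = -(1 << (esize - 1))
    hi = (1 << (esize - 1)) - 1
    out, qc = [], False
    for x, y in zip(a, b):
        diff = x - y                       # exact, unbounded difference
        clamped = max(lo, min(hi, diff))
        qc = qc or clamped != diff
        out.append(clamped)
    return out, qc


print(sqsub([100, -100, 5], [-100, 100, 3], 8))  # ([127, -128, 2], True)
```

Computing the difference at full precision before clamping is what separates saturating arithmetic from wrapping arithmetic, where 100 - (-100) in 8 bits would give -56.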

SQXTN, SQXTN2

Signed saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register,
saturates the value to half the original width, places the result into a vector, and writes the vector to the lower or
upper half of the destination SIMD&FP register. The destination vector elements are half as long as the source vector
elements. All the values in this instruction are signed integer values.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
The SQXTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
SQXTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31 30 29 28:24 23:22 21:10        9:5 4:0
Value  0  1  0  11110 size  100001010010 Rn  Rd
Label  -  -  U  -     -     -            -   -

SQXTN <Vb><d>, <Va><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer part = 0;
integer elements = 1;

boolean unsigned = (U == '1');

Vector

Bits   31 30 29 28:24 23:22 21:10        9:5 4:0
Value  0  Q  0  01110 size  100001010010 Rn  Rd
Label  -  -  U  -     -     -            -   -

SQXTN{2} <Vd>.<Tb>, <Vn>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vb> Is the destination width specifier, encoded in “size”:

size <Vb>
00 B
01 H
10 S
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Va> Is the source width specifier, encoded in “size”:

size <Va>
00 H
01 S
10 D
11 RESERVED

<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) element;
boolean sat;

for e = 0 to elements-1
    element = Elem[operand, e, 2*esize];
    (Elem[result, e, esize], sat) = SatQ(Int(element, unsigned), esize, unsigned);
    if sat then FPSR.QC = '1';

Vpart[d, part] = result;
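
The signed case of the Operation pseudocode above can be modelled in a few lines of Python. This is only an illustrative sketch: the names sat_signed and sqxtn are hypothetical, elements are passed as ordinary Python integers rather than bit vectors, and the returned flag merely stands in for FPSR.QC.

```python
def sat_signed(value, bits):
    """Clamp value to the signed range of `bits` bits.

    Returns (clamped value, True if clamping changed the value).
    """
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    clamped = max(lo, min(hi, value))
    return clamped, clamped != value


def sqxtn(src, esize):
    """Narrow signed 2*esize-bit elements to esize bits with saturation.

    `src` is a list of signed integers (hypothetical stand-in for the
    source vector). The second return value stands in for FPSR.QC.
    """
    out = []
    qc = False
    for element in src:
        narrowed, sat = sat_signed(element, esize)
        out.append(narrowed)
        qc = qc or sat
    return out, qc
```

Under these assumptions, narrowing the 16-bit values 300 and -300 to 8 bits yields 127 and -128 and reports saturation, while values already in range pass through unchanged.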

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

SQXTUN, SQXTUN2

Signed saturating extract Unsigned Narrow. This instruction reads each signed integer value in the vector of the
source SIMD&FP register, saturates the value to an unsigned integer value that is half the original width, places the
result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. The
destination vector elements are half as long as the source vector elements.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
The SQXTUN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the SQXTUN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 1 0 0 1 0 1 0 Rn Rd

SQXTUN <Vb><d>, <Va><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer part = 0;
integer elements = 1;

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 1 0 0 1 0 1 0 Rn Rd

SQXTUN{2} <Vd>.<Tb>, <Vn>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vb> Is the destination width specifier, encoded in “size”:

size <Vb>
00 B
01 H
10 S
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Va> Is the source width specifier, encoded in “size”:

size <Va>
00 H
01 S
10 D
11 RESERVED

<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) element;
boolean sat;

for e = 0 to elements-1
    element = Elem[operand, e, 2*esize];
    (Elem[result, e, esize], sat) = UnsignedSatQ(SInt(element), esize);
    if sat then FPSR.QC = '1';

Vpart[d, part] = result;
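
The difference from a signed narrow is that the source is read as signed but clamped to an unsigned range, so every negative input saturates to zero. A minimal Python sketch of the loop above, assuming elements are given as signed Python integers; the function name sqxtun is hypothetical and the returned flag stands in for FPSR.QC.

```python
def sqxtun(src, esize):
    """Narrow signed 2*esize-bit elements to unsigned esize-bit values.

    Values below 0 clamp to 0 and values above 2**esize - 1 clamp to
    that maximum. The second return value stands in for FPSR.QC.
    """
    hi = (1 << esize) - 1
    out = []
    qc = False
    for element in src:
        clamped = max(0, min(hi, element))
        out.append(clamped)
        qc = qc or (clamped != element)
    return out, qc
```

With 8-bit destination elements, -5 becomes 0 and 256 becomes 255, and either one is enough to set the saturation flag.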

SRHADD

Signed Rounding Halving Add. This instruction adds corresponding signed integer values from the two source
SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the
destination SIMD&FP register.
The results are rounded. For truncated results, see SHADD.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 0 1 0 1 Rn Rd
U

SRHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    Elem[result, e, esize] = (element1+element2+1)<esize:1>;

V[d] = result;
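
Taking bits <esize:1> of the sum plus one is the same as computing (a + b + 1) >> 1 at full precision, which is why the intermediate sum cannot overflow. A small Python sketch of the signed case, with the hypothetical helper names to_signed and srhadd and with lanes passed as signed Python integers:

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def srhadd(a, b, esize=8):
    """Rounding halving add of two equal-length lists of signed lanes."""
    return [to_signed((x + y + 1) >> 1, esize) for x, y in zip(a, b)]
```

Halves round toward positive infinity in this model: the average of -1 and -2 is -1, and the average of 1 and 2 is 2.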

SRI

Shift Right and Insert (immediate). This instruction reads each vector element in the source SIMD&FP register, right
shifts each vector element by an immediate value, and inserts the result into the corresponding vector element in the
destination SIMD&FP register such that the new zero bits created by the shift are not inserted but retain their existing
value. Bits shifted out of the right of each vector element of the source register are lost.
The following figure shows an example of the operation of shift right by 3 for an 8-bit vector element.
[Figure: element B[7] (register bits 63:56) of Vn, of Vd after the operation, and of Vd before the operation.]

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 1 0 0 0 1 Rn Rd
immh

SRI <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = (esize * 2) - UInt(immh:immb);

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 1 0 0 0 1 Rn Rd
immh

SRI <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = (esize * 2) - UInt(immh:immb);

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))

For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
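
The vector table above follows a single rule: the highest set bit of immh selects the element size, and the shift is twice the element size minus UInt(immh:immb). A minimal Python sketch of that decode, assuming the fields are passed as small integers; the name decode_right_shift is hypothetical, and the Q-dependent RESERVED case for 64-bit elements is not modelled.

```python
def decode_right_shift(immh, immb):
    """Decode (esize, shift) from a 4-bit immh and a 3-bit immb field."""
    if not (0 <= immh <= 0xF and 0 <= immb <= 0x7):
        raise ValueError("immh is 4 bits wide and immb is 3 bits wide")
    if immh == 0:
        raise ValueError("immh == 0000 selects the Advanced SIMD modified immediate encodings")
    esize = 8 << (immh.bit_length() - 1)   # 8 << HighestSetBit(immh)
    shift = 2 * esize - ((immh << 3) | immb)
    return esize, shift
```

For instance, immh = 0001 with immb = 101 decodes to an 8-bit element shifted right by 3, and immh = 1000 with immb = 000 decodes to a 64-bit element shifted right by 64.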

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2 = V[d];
bits(datasize) result;
bits(esize) mask = LSR(Ones(esize), shift);
bits(esize) shifted;

for e = 0 to elements-1
    shifted = LSR(Elem[operand, e, esize], shift);
    Elem[result, e, esize] = (Elem[operand2, e, esize] AND NOT(mask)) OR shifted;
V[d] = result;
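
The mask in the pseudocode above marks the low (esize - shift) bit positions that receive source bits; the top `shift` bits of each destination element are left untouched. A per-element Python sketch, with the hypothetical name sri_element and elements treated as unsigned integers:

```python
def sri_element(dest, src, shift, esize=8):
    """Shift src right by `shift` and insert it into dest.

    The top `shift` bits of dest keep their existing value.
    """
    full = (1 << esize) - 1
    mask = full >> shift                 # positions written by the shifted source
    shifted = (src & full) >> shift
    return (dest & full & ~mask) | shifted
```

For the shift-by-3, 8-bit case described earlier, a destination of 0xAA and a source of 0xFF give 0xBF: the top three bits (101) come from the destination and the low five bits from the shifted source.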

Operational information

If PSTATE.DIT is 1, the execution time of this instruction is independent of:

◦ The values of the data supplied in any of its registers.
◦ The values of the NZCV flags.

SRSHL

Signed Rounding Shift Left (register). This instruction takes each signed integer value in the vector of the first source
SIMD&FP register, shifts it by a value from the least significant byte of the corresponding element of the second
source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift. For a
truncating shift, see SSHL.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 1 0 1 Rn Rd
U R S

SRSHL <V><d>, <V><n>, <V><m>

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 1 0 1 Rn Rd
U R S

SRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

integer round_const = 0;
integer shift;
integer element;
boolean sat;

V[d] = result;
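
The Operation listing above does not show the per-element loop, so the following Python sketch is based on the prose description only. It assumes the shift amount is the signed least significant byte of the second operand's element, that a negative shift of n adds 2^(n-1) before shifting right by n, and that the result is truncated to esize bits without saturation. The names to_signed and srshl_element are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def srshl_element(value, shift_reg, esize=8):
    """Shift a signed element by a signed per-lane amount, rounding right shifts."""
    shift = to_signed(shift_reg, 8)      # only the low byte is used (assumption from the prose)
    if shift >= 0:
        result = value << shift          # left shift, no rounding needed
    else:
        n = -shift
        result = (value + (1 << (n - 1))) >> n   # rounding right shift
    return to_signed(result, esize)
```

In this model, shifting 5 by -1 gives 3 rather than 2, and shifting 64 left by 1 in an 8-bit lane wraps to -128 because no saturation is applied.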

SRSHR

Signed Rounding Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP register,
right shifts each element by an immediate value, places the final result into a vector, and writes the vector to the
destination SIMD&FP register. All the values in this instruction are signed integer values. The results are rounded. For
truncated results, see SSHR.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 1 0 0 1 Rn Rd
U immh o1 o0

SRSHR <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 1 0 0 1 Rn Rd
U immh o1 o0

SRSHR <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))

For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;

operand2 = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
    Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;

V[d] = result;
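
With round set and accumulate clear, the loop above reduces to adding 2^(shift-1) and then shifting right arithmetically. A Python sketch of that signed case, using the hypothetical names to_signed and srshr and passing lanes as signed Python integers:

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def srshr(values, shift, esize=8):
    """Rounding arithmetic right shift of each signed lane by an immediate."""
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    round_const = 1 << (shift - 1)
    return [to_signed((v + round_const) >> shift, esize) for v in values]
```

For a shift of 2, the value 7 (1.75 after division by 4) rounds to 2 and -7 rounds to -2, whereas a truncating shift would give 1 and -2.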

SRSRA

Signed Rounding Shift Right and Accumulate (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each element by an immediate value, and accumulates the final results with the vector
elements of the destination SIMD&FP register. All the values in this instruction are signed integer values. The results
are rounded. For truncated results, see SSRA.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
U immh o1 o0

SRSRA <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
U immh o1 o0

SRSRA <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))

For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;

operand2 = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
    Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;

V[d] = result;
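
Because the accumulation in the loop above is a bit-vector addition of esize bits, it wraps on overflow instead of saturating. A Python sketch of the signed, rounding, accumulating case; the names to_signed and srsra are hypothetical, and lanes are passed as signed Python integers.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def srsra(acc, values, shift, esize=8):
    """Add the rounded right shift of each source lane to the matching accumulator lane."""
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    round_const = 1 << (shift - 1)
    return [
        to_signed(a + ((v + round_const) >> shift), esize)   # wraps modulo 2**esize
        for a, v in zip(acc, values)
    ]
```

An 8-bit accumulator lane holding 127 that receives a contribution of 2 therefore ends up at -127 in this model.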

SSHL

Signed Shift Left (register). This instruction takes each signed integer value in the vector of the first source SIMD&FP
register, shifts each value by a value from the least significant byte of the corresponding element of the second source
SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. For a
rounding shift, see SRSHL.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
U R S

SSHL <V><d>, <V><n>, <V><m>

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
U R S

SSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.
<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

integer round_const = 0;
integer shift;
integer element;
boolean sat;

V[d] = result;
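
The Operation listing above omits the per-element loop, so this Python sketch follows the prose description: the shift amount is taken as the signed least significant byte of the second operand's element, negative amounts shift right with truncation, and the result is cut to esize bits without saturation. Those points, and the names to_signed and sshl_element, are assumptions made for illustration.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sshl_element(value, shift_reg, esize=8):
    """Shift a signed element left, or right with truncation, by a signed per-lane amount."""
    shift = to_signed(shift_reg, 8)      # only the low byte is used (assumption from the prose)
    if shift >= 0:
        result = value << shift
    else:
        result = value >> -shift         # arithmetic shift, rounds toward minus infinity
    return to_signed(result, esize)
```

Shifting 5 by -1 gives 2 here, and shifting -5 by -1 gives -3, since no rounding constant is added before the right shift.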

Operational information

SSHLL, SSHLL2

Signed Shift Left Long (immediate). This instruction reads each vector element from the source SIMD&FP register,
left shifts each vector element by the specified shift amount, places the result into a vector, and writes the vector to
the destination SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
All the values in this instruction are signed integer values.
The SSHLL instruction extracts vector elements from the lower half of the source register. The SSHLL2 instruction
extracts vector elements from the upper half of the source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias SXTL, SXTL2.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 1 0 1 0 0 1 Rn Rd
U immh

SSHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = UInt(immh:immb) - esize;

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

<shift> Is the left shift amount, in the range 0 to the source element width in bits minus 1, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx RESERVED

Alias Conditions

Alias Is preferred when

SXTL, SXTL2 immb == '000' && BitCount(immh) == 1

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = Vpart[n, part];
bits(datasize*2) result;
integer element;

for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], unsigned) << shift;
    Elem[result, e, 2*esize] = element<2*esize-1:0>;

V[d] = result;
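
Since the shift is at most esize - 1 and the destination lanes are 2*esize bits wide, the widened result always fits. The alias condition (immb all zeros and exactly one bit set in immh) corresponds to a shift of zero, which is a plain sign extension. A Python sketch with the hypothetical names to_signed, sshll and sxtl; selecting the lower or upper half of the source register is left to the caller.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sshll(src, shift, esize=8):
    """Sign-extend each esize-bit lane to 2*esize bits and shift it left."""
    if not 0 <= shift < esize:
        raise ValueError("shift must be in the range 0 to esize-1")
    return [to_signed(v << shift, 2 * esize) for v in src]


def sxtl(src, esize=8):
    """Model of the alias: a shift-left-long by zero."""
    return sshll(src, 0, esize)
```

For 8-bit lanes and a shift of 7, the inputs -1, 127 and -128 become -128, 16256 and -16384, all of which are representable in 16 bits.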

Operational information

SSHR

Signed Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP register, right
shifts each element by an immediate value, places the final result into a vector, and writes the vector to the destination
SIMD&FP register. All the values in this instruction are signed integer values. The results are truncated. For rounded
results, see SRSHR.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
U immh o1 o0

SSHR <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
U immh o1 o0

SSHR <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))

For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;

operand2 = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
    Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;

V[d] = result;
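
With round and accumulate both clear, the loop above is a plain arithmetic right shift. "Truncated" here means that the discarded low bits are simply dropped, so negative values move toward minus infinity rather than toward zero. A short Python sketch with the hypothetical names to_signed and sshr:

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sshr(values, shift, esize=8):
    """Truncating arithmetic right shift of each signed lane by an immediate."""
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    return [to_signed(v >> shift, esize) for v in values]
```

A shift equal to the element width leaves only the sign: every negative lane becomes -1 and every non-negative lane becomes 0.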

Operational information

SSRA

Signed Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts each element by an immediate value, and accumulates the final results with the vector elements of
the destination SIMD&FP register. All the values in this instruction are signed integer values. The results are
truncated. For rounded results, see SRSRA.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 1 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
U immh o1 o0

SSRA <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
U immh o1 o0

SSRA <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))

For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;

operand2 = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
    Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;

V[d] = result;

Operational information

SSUBL, SSUBL2

Signed Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second source
SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the results into
a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are signed
integer values. The destination vector elements are twice as long as the source vector elements.
The SSUBL instruction extracts each source vector from the lower half of each source register. The SSUBL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 0 0 0 Rn Rd
U o1

SSUBL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;

V[d] = result;
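
The Operation listing above does not include the per-element loop, so this Python sketch follows the prose description: both sources contribute the same half (lower for SSUBL, upper for SSUBL2), and each difference is produced at twice the source width, where it always fits. The function name ssubl and the list-of-lanes representation are assumptions for illustration.

```python
def ssubl(vn, vm, upper=False):
    """Widening signed subtract of matching lanes from one half of each source.

    `vn` and `vm` are equal-length lists of signed lanes covering the whole
    register; `upper=True` models the "2" form of the instruction.
    """
    if len(vn) != len(vm) or len(vn) % 2:
        raise ValueError("sources must have the same even number of lanes")
    half = len(vn) // 2
    start = half if upper else 0
    first = vn[start:start + half]
    second = vm[start:start + half]
    # The difference of two signed esize-bit values always fits in 2*esize bits.
    return [x - y for x, y in zip(first, second)]
```

With 8-bit lanes, 127 minus -128 yields 255, a value that is out of range for the source lanes but representable in the 16-bit destination lanes.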

Operational information

SSUBW, SSUBW2

Signed Subtract Wide. This instruction subtracts each vector element in the lower or upper half of the second source
SIMD&FP register from the corresponding vector element in the first source SIMD&FP register, places the result in a
vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are signed integer
values.
The SSUBW instruction extracts the second source vector from the lower half of the second source register. The SSUBW2
instruction extracts the second source vector from the upper half of the second source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 size 1 Rm 0 0 1 1 0 0 Rn Rd
U o1

SSUBW{2} <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;

V[d] = result;
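
As an illustration of the operation, the following Python sketch models SSUBW and SSUBW2. It assumes that each 128-bit register is held as a Python integer; the names `to_signed` and `ssubw` are hypothetical. Unlike the long form, the first operand is the whole 128-bit register with elements of width 2*esize, and only the second operand is taken from a 64-bit half.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def ssubw(vn, vm, esize, upper=False):
    """Model SSUBW (upper=False) or SSUBW2 (upper=True).

    vn: 128-bit first source with wide elements (2*esize bits each).
    vm: 128-bit second source; its lower or upper half holds narrow elements.
    esize: narrow element width in bits (8, 16 or 32).
    """
    datasize = 64
    elements = datasize // esize
    wide = 2 * esize
    part = 1 if upper else 0
    # Vpart[m, part]: lower or upper 64 bits of the second source.
    operand2 = (vm >> (part * datasize)) & ((1 << datasize) - 1)
    result = 0
    for e in range(elements):
        element1 = to_signed(vn >> (e * wide), wide)
        element2 = to_signed(operand2 >> (e * esize), esize)
        diff = (element1 - element2) & ((1 << wide) - 1)
        result |= diff << (e * wide)
    return result
```

The result of each subtraction is truncated to the wide element width, so wide-element overflow wraps.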

Operational information

ST1 (multiple structures)

Store multiple single-element structures from one, two, three, or four registers. This instruction stores elements to
memory from one, two, three, or four SIMD&FP registers, without interleaving. Every element of each register is
stored.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

Bits:   31  30  29:23    22  21:16   15:12   11:10  9:5  4:0
Value:  0   Q   0011000  0   000000  xx1x    size   Rn   Rt
Field:                   L           opcode

One register (opcode == 0111)

ST1 { <Vt>.<T> }, [<Xn|SP>]

Two registers (opcode == 1010)

ST1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]

Three registers (opcode == 0110)

ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]

Four registers (opcode == 0010)

ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

Bits:   31  30  29:23    22  21  20:16  15:12   11:10  9:5  4:0
Value:  0   Q   0011001  0   0   Rm     xx1x    size   Rn   Rt
Field:                   L              opcode

One register, immediate offset (Rm == 11111 && opcode == 0111)

ST1 { <Vt>.<T> }, [<Xn|SP>], <imm>

One register, register offset (Rm != 11111 && opcode == 0111)

ST1 { <Vt>.<T> }, [<Xn|SP>], <Xm>

Two registers, immediate offset (Rm == 11111 && opcode == 1010)

ST1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>

Two registers, register offset (Rm != 11111 && opcode == 1010)

ST1 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>

Three registers, immediate offset (Rm == 11111 && opcode == 0110)

ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>

Three registers, register offset (Rm != 11111 && opcode == 0110)

ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>

Four registers, immediate offset (Rm == 11111 && opcode == 0010)

ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>

Four registers, register offset (Rm != 11111 && opcode == 0010)

ST1 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 1D
11 1 2D

<imm> For the one register, immediate offset variant: is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #8
1 #16

For the two registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #16
1 #32

For the three registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #24
1 #48

For the four registers, immediate offset variant: is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #32
1 #64

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
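
The four `<imm>` tables above follow one pattern: the post-index immediate equals the number of bytes transferred, that is, the register count multiplied by 8 (Q == 0) or 16 (Q == 1). A minimal Python sketch of that relationship, where the function name `st1_post_index_imm` is a hypothetical helper:

```python
def st1_post_index_imm(registers, q):
    """Return the post-index immediate for ST1 (multiple structures).

    registers: number of registers in the list (1 to 4).
    q:         value of the Q bit (0 for 64-bit, 1 for 128-bit registers).
    """
    if registers not in (1, 2, 3, 4):
        raise ValueError("ST1 transfers one to four registers")
    if q not in (0, 1):
        raise ValueError("Q is a single bit")
    bytes_per_register = 16 if q else 8
    return registers * bytes_per_register
```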

Shared Decode

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << UInt(size);
integer elements = datasize DIV esize;

integer rpt; // number of iterations

integer selem; // structure elements

// .1D format only permitted with LD1 & ST1

if size:Q == '110' && selem != 1 then UNDEFINED;
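
The following Python sketch shows one way the shared decode could be modelled. The opcode values are those listed in the encodings for ST1, ST2, ST3, and ST4 (multiple structures); the `rpt` and `selem` values paired with them, and the names `MULTI_STRUCT_OPCODES` and `decode_multi`, are assumptions made for illustration and should be checked against the full architecture pseudocode.

```python
# opcode -> (rpt, selem); rpt is the number of iterations (registers for the
# ST1 forms), selem is the number of elements per structure.
MULTI_STRUCT_OPCODES = {
    0b0000: (1, 4),  # ST4
    0b0010: (4, 1),  # ST1, four registers
    0b0100: (1, 3),  # ST3
    0b0110: (3, 1),  # ST1, three registers
    0b0111: (1, 1),  # ST1, one register
    0b1000: (1, 2),  # ST2
    0b1010: (2, 1),  # ST1, two registers
}


def decode_multi(opcode, size, q):
    """Return (rpt, selem, datasize, esize, elements); raise if UNDEFINED."""
    if opcode not in MULTI_STRUCT_OPCODES:
        raise ValueError("UNDEFINED opcode")
    rpt, selem = MULTI_STRUCT_OPCODES[opcode]
    datasize = 128 if q else 64
    esize = 8 << size
    elements = datasize // esize
    # size:Q == '110' selects the .1D arrangement, permitted only for selem == 1.
    if size == 0b11 and q == 0 and selem != 1:
        raise ValueError("UNDEFINED: .1D is only permitted with LD1 and ST1")
    return rpt, selem, datasize, esize, elements
```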

Operation

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
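
As a rough model of the store described for ST1 (multiple structures), the following Python sketch writes every element of each register to memory in register order, without interleaving. It assumes a `bytearray` for memory, Python integers for registers, and little-endian element layout; the function name `st1_multiple` is hypothetical.

```python
def st1_multiple(mem, address, regs, esize, q):
    """Store one to four registers contiguously and return the end address.

    mem:     bytearray standing in for memory (illustrative assumption).
    address: byte address of the first element.
    regs:    list of register values as Python ints, in transfer order.
    esize:   element size in bits (8, 16, 32 or 64).
    q:       Q bit; selects 64-bit (0) or 128-bit (1) registers.
    """
    if not 1 <= len(regs) <= 4:
        raise ValueError("ST1 transfers one to four registers")
    datasize = 128 if q else 64
    ebytes = esize // 8
    elements = datasize // esize
    mask = (1 << esize) - 1
    offs = 0
    for value in regs:                # one iteration per register
        for e in range(elements):     # every element of the register
            element = (value >> (e * esize)) & mask
            start = address + offs
            mem[start:start + ebytes] = element.to_bytes(ebytes, "little")
            offs += ebytes
    # With the immediate post-index form, this is the value written back.
    return address + offs
```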

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

ST1 (single structure)

Store a single-element structure from one lane of one register. This instruction stores the specified element of a
SIMD&FP register to memory.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

Bits:   31  30  29:24   23  22  21  20:16  15:13   12  11:10  9:5  4:0
Value:  0   Q   001101  0   0   0   00000  xx0     S   size   Rn   Rt
Field:                      L   R          opcode

8-bit (opcode == 000)

ST1 { <Vt>.B }[<index>], [<Xn|SP>]

16-bit (opcode == 010 && size == x0)

ST1 { <Vt>.H }[<index>], [<Xn|SP>]

32-bit (opcode == 100 && size == 00)

ST1 { <Vt>.S }[<index>], [<Xn|SP>]

64-bit (opcode == 100 && S == 0 && size == 01)

ST1 { <Vt>.D }[<index>], [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

Bits:   31  30  29:24   23  22  21  20:16  15:13   12  11:10  9:5  4:0
Value:  0   Q   001101  1   0   0   Rm     xx0     S   size   Rn   Rt
Field:                      L   R          opcode

8-bit, immediate offset (Rm == 11111 && opcode == 000)

ST1 { <Vt>.B }[<index>], [<Xn|SP>], #1

8-bit, register offset (Rm != 11111 && opcode == 000)

ST1 { <Vt>.B }[<index>], [<Xn|SP>], <Xm>

16-bit, immediate offset (Rm == 11111 && opcode == 010 && size == x0)

ST1 { <Vt>.H }[<index>], [<Xn|SP>], #2

16-bit, register offset (Rm != 11111 && opcode == 010 && size == x0)

ST1 { <Vt>.H }[<index>], [<Xn|SP>], <Xm>

32-bit, immediate offset (Rm == 11111 && opcode == 100 && size == 00)

ST1 { <Vt>.S }[<index>], [<Xn|SP>], #4

32-bit, register offset (Rm != 11111 && opcode == 100 && size == 00)

ST1 { <Vt>.S }[<index>], [<Xn|SP>], <Xm>

64-bit, immediate offset (Rm == 11111 && opcode == 100 && S == 0 && size == 01)

ST1 { <Vt>.D }[<index>], [<Xn|SP>], #8

64-bit, register offset (Rm != 11111 && opcode == 100 && S == 0 && size == 01)

ST1 { <Vt>.D }[<index>], [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
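
As a rough model of ST1 (single structure), the following Python sketch stores one lane of a register and returns the base register value after the optional writeback. It assumes a `bytearray` for memory, a Python integer for the register, and little-endian element layout; the name `st1_single` and the `post_index` parameter are hypothetical.

```python
def st1_single(mem, address, vt, esize, index, post_index=None):
    """Store element `index` of vt and return the resulting base address.

    mem:        bytearray standing in for memory (illustrative assumption).
    vt:         128-bit register value as a Python int.
    esize:      element size in bits (8, 16, 32 or 64).
    index:      lane number within the 128-bit register.
    post_index: None for the no-offset form; otherwise the byte offset that
                is added to the base (the immediate, or the value of Xm).
    """
    ebytes = esize // 8
    if not 0 <= index < 128 // esize:
        raise ValueError("lane index out of range for this element size")
    element = (vt >> (index * esize)) & ((1 << esize) - 1)
    mem[address:address + ebytes] = element.to_bytes(ebytes, "little")
    if post_index is None:
        return address
    return address + post_index
```

For the immediate post-index forms, the offset passed in would equal the element size in bytes (#1, #2, #4, or #8).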

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

ST2 (multiple structures)

Store multiple 2-element structures from two registers. This instruction stores multiple 2-element structures from two
SIMD&FP registers to memory, with interleaving. Every element of each register is stored.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

Bits:   31  30  29:23    22  21:16   15:12   11:10  9:5  4:0
Value:  0   Q   0011000  0   000000  1000    size   Rn   Rt
Field:                   L           opcode

ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

Bits:   31  30  29:23    22  21  20:16  15:12   11:10  9:5  4:0
Value:  0   Q   0011001  0   0   Rm     1000    size   Rn   Rt
Field:                   L              opcode

Immediate offset (Rm == 11111)

ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <imm>

Register offset (Rm != 11111)

ST2 { <Vt>.<T>, <Vt2>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<imm> Is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #16
1 #32

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.

Shared Decode

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << UInt(size);
integer elements = datasize DIV esize;

integer rpt; // number of iterations

integer selem; // structure elements

// .1D format only permitted with LD1 & ST1

if size:Q == '110' && selem != 1 then UNDEFINED;

Operation

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
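
As a rough model of the interleaving described for ST2 (multiple structures), the following Python sketch alternates elements from the two registers so that each 2-element structure is contiguous in memory. It assumes a `bytearray` for memory, Python integers for registers, and little-endian element layout; the name `st2_multiple` is hypothetical.

```python
def st2_multiple(mem, address, vt, vt2, esize, q):
    """Store 2-element structures from vt and vt2; return the end address."""
    if esize == 64 and not q:
        raise ValueError("UNDEFINED: .1D is not permitted for ST2")
    datasize = 128 if q else 64
    ebytes = esize // 8
    elements = datasize // esize
    mask = (1 << esize) - 1
    offs = 0
    for e in range(elements):
        # One structure: element e of vt, then element e of vt2.
        for value in (vt, vt2):
            element = (value >> (e * esize)) & mask
            start = address + offs
            mem[start:start + ebytes] = element.to_bytes(ebytes, "little")
            offs += ebytes
    return address + offs
```

With even-numbered bytes in the first register and odd-numbered bytes in the second, the store produces the bytes in ascending order.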

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

ST2 (single structure)

Store single 2-element structure from one lane of two registers. This instruction stores a 2-element structure to
memory from corresponding elements of two SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

Bits:   31  30  29:24   23  22  21  20:16  15:13   12  11:10  9:5  4:0
Value:  0   Q   001101  0   0   1   00000  xx0     S   size   Rn   Rt
Field:                      L   R          opcode

8-bit (opcode == 000)

ST2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>]

16-bit (opcode == 010 && size == x0)

ST2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>]

32-bit (opcode == 100 && size == 00)

ST2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>]

64-bit (opcode == 100 && S == 0 && size == 01)

ST2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

Bits:   31  30  29:24   23  22  21  20:16  15:13   12  11:10  9:5  4:0
Value:  0   Q   001101  1   0   1   Rm     xx0     S   size   Rn   Rt
Field:                      L   R          opcode

8-bit, immediate offset (Rm == 11111 && opcode == 000)

ST2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], #2

8-bit, register offset (Rm != 11111 && opcode == 000)

ST2 { <Vt>.B, <Vt2>.B }[<index>], [<Xn|SP>], <Xm>

16-bit, immediate offset (Rm == 11111 && opcode == 010 && size == x0)

ST2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], #4

16-bit, register offset (Rm != 11111 && opcode == 010 && size == x0)

ST2 { <Vt>.H, <Vt2>.H }[<index>], [<Xn|SP>], <Xm>

32-bit, immediate offset (Rm == 11111 && opcode == 100 && size == 00)

ST2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], #8

32-bit, register offset (Rm != 11111 && opcode == 100 && size == 00)

ST2 { <Vt>.S, <Vt2>.S }[<index>], [<Xn|SP>], <Xm>

64-bit, immediate offset (Rm == 11111 && opcode == 100 && S == 0 && size == 01)

ST2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], #16

64-bit, register offset (Rm != 11111 && opcode == 100 && S == 0 && size == 01)

ST2 { <Vt>.D, <Vt2>.D }[<index>], [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;
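
The immediate offsets in the post-index variants above (#2, #4, #8, and #16) equal the size of one structure in bytes: the number of registers multiplied by the element size. A minimal Python sketch of that relationship, where the name `single_struct_post_index_imm` is a hypothetical helper:

```python
def single_struct_post_index_imm(selem, esize):
    """Return the post-index immediate for a single-structure store.

    selem: number of elements per structure (2 for ST2).
    esize: element size in bits (8, 16, 32 or 64).
    """
    if selem not in (1, 2, 3, 4):
        raise ValueError("a structure has one to four elements")
    if esize not in (8, 16, 32, 64):
        raise ValueError("unsupported element size")
    return selem * (esize // 8)
```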

Assembler Symbols

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

ST3 (multiple structures)

Store multiple 3-element structures from three registers. This instruction stores multiple 3-element structures to
memory from three SIMD&FP registers, with interleaving. Every element of each register is stored.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

Bits:   31  30  29:23    22  21:16   15:12   11:10  9:5  4:0
Value:  0   Q   0011000  0   000000  0100    size   Rn   Rt
Field:                   L           opcode

ST3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

Bits:   31  30  29:23    22  21  20:16  15:12   11:10  9:5  4:0
Value:  0   Q   0011001  0   0   Rm     0100    size   Rn   Rt
Field:                   L              opcode

Immediate offset (Rm == 11111)

ST3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <imm>

Register offset (Rm != 11111)

ST3 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<imm> Is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #24
1 #48

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.

Shared Decode

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << UInt(size);
integer elements = datasize DIV esize;

integer rpt; // number of iterations

integer selem; // structure elements

// .1D format only permitted with LD1 & ST1

if size:Q == '110' && selem != 1 then UNDEFINED;

Operation

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
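
A typical use of the interleaving described for ST3 (multiple structures) is converting three planar channels into packed triples. The following Python sketch models that behaviour; it assumes a `bytearray` for memory, Python integers for registers, and little-endian element layout, and the name `st3_multiple` is hypothetical.

```python
def st3_multiple(mem, address, vt, vt2, vt3, esize, q):
    """Store 3-element structures from three registers; return the end address."""
    if esize == 64 and not q:
        raise ValueError("UNDEFINED: .1D is not permitted for ST3")
    datasize = 128 if q else 64
    ebytes = esize // 8
    elements = datasize // esize
    mask = (1 << esize) - 1
    offs = 0
    for e in range(elements):
        # One structure: element e of each register, in register order.
        for value in (vt, vt2, vt3):
            element = (value >> (e * esize)) & mask
            start = address + offs
            mem[start:start + ebytes] = element.to_bytes(ebytes, "little")
            offs += ebytes
    return address + offs
```

For the .8B arrangement this writes 24 bytes, which matches the #24 post-index immediate for Q == 0.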

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

ST3 (single structure)

Store single 3-element structure from one lane of three registers. This instruction stores a 3-element structure to
memory from corresponding elements of three SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

Bits:   31  30  29:24   23  22  21  20:16  15:13   12  11:10  9:5  4:0
Value:  0   Q   001101  0   0   0   00000  xx1     S   size   Rn   Rt
Field:                      L   R          opcode

8-bit (opcode == 001)

ST3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>]

16-bit (opcode == 011 && size == x0)

ST3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>]

32-bit (opcode == 101 && size == 00)

ST3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>]

64-bit (opcode == 101 && S == 0 && size == 01)

ST3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

Bits:   31  30  29:24   23  22  21  20:16  15:13   12  11:10  9:5  4:0
Value:  0   Q   001101  1   0   0   Rm     xx1     S   size   Rn   Rt
Field:                      L   R          opcode

8-bit, immediate offset (Rm == 11111 && opcode == 001)

ST3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], #3

8-bit, register offset (Rm != 11111 && opcode == 001)

ST3 { <Vt>.B, <Vt2>.B, <Vt3>.B }[<index>], [<Xn|SP>], <Xm>

16-bit, immediate offset (Rm == 11111 && opcode == 011 && size == x0)

ST3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], #6

16-bit, register offset (Rm != 11111 && opcode == 011 && size == x0)

ST3 { <Vt>.H, <Vt2>.H, <Vt3>.H }[<index>], [<Xn|SP>], <Xm>

32-bit, immediate offset (Rm == 11111 && opcode == 101 && size == 00)

ST3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], #12

32-bit, register offset (Rm != 11111 && opcode == 101 && size == 00)

ST3 { <Vt>.S, <Vt2>.S, <Vt3>.S }[<index>], [<Xn|SP>], <Xm>

64-bit, immediate offset (Rm == 11111 && opcode == 101 && S == 0 && size == 01)

ST3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], #24

64-bit, register offset (Rm != 11111 && opcode == 101 && S == 0 && size == 01)

ST3 { <Vt>.D, <Vt2>.D, <Vt3>.D }[<index>], [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.
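
The `<index>` encodings above can be sketched in Python as follows. The field concatenations for each variant are taken from the symbol descriptions; the UNDEFINED checks on the unused `size` and `S` bits follow the encoding conditions listed for each variant (for example, size == x0 for the 16-bit variant). The function name `decode_index` is hypothetical.

```python
def decode_index(opcode, q, s, size):
    """Return (esize, index) for a single-structure store, or raise if UNDEFINED.

    opcode: 3-bit opcode field; bits <2:1> give the initial scale.
    q, s:   single-bit fields.
    size:   2-bit size field.
    """
    scale = (opcode >> 1) & 0b11
    if scale == 0:                       # 8-bit: index = Q:S:size
        return 8, (q << 3) | (s << 2) | size
    if scale == 1:                       # 16-bit: index = Q:S:size<1>
        if size & 0b01:
            raise ValueError("UNDEFINED: size<0> must be 0")
        return 16, (q << 2) | (s << 1) | (size >> 1)
    if scale == 2:
        if size & 0b10:
            raise ValueError("UNDEFINED: size<1> must be 0")
        if size == 0b00:                 # 32-bit: index = Q:S
            return 32, (q << 1) | s
        if s:                            # 64-bit requires S == 0
            raise ValueError("UNDEFINED: S must be 0")
        return 64, q                     # 64-bit: index = Q
    raise ValueError("opcode<2:1> == 11 is not a single-lane store")
```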

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

ST4 (multiple structures)

Store multiple 4-element structures from four registers. This instruction stores multiple 4-element structures to
memory from four SIMD&FP registers, with interleaving. Every element of each register is stored.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

Bits:   31  30  29:23    22  21:16   15:12   11:10  9:5  4:0
Value:  0   Q   0011000  0   000000  0000    size   Rn   Rt
Field:                   L           opcode

ST4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

Bits:   31  30  29:23    22  21  20:16  15:12   11:10  9:5  4:0
Value:  0   Q   0011001  0   0   Rm     0000    size   Rn   Rt
Field:                   L              opcode

Immediate offset (Rm == 11111)

ST4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <imm>

Register offset (Rm != 11111)

ST4 { <Vt>.<T>, <Vt2>.<T>, <Vt3>.<T>, <Vt4>.<T> }, [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<imm> Is the post-index immediate offset, encoded in “Q”:

Q <imm>
0 #32
1 #64

<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.

Shared Decode

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << UInt(size);
integer elements = datasize DIV esize;

integer rpt; // number of iterations

integer selem; // structure elements

// .1D format only permitted with LD1 & ST1

if size:Q == '110' && selem != 1 then UNDEFINED;

Operation

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(datasize) rval;
integer tt;
constant integer ebytes = esize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
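
With interleaving, the memory position of any element stored by ST4 (multiple structures) can be computed directly. The following Python sketch assumes that structures are written in element order and that, within a structure, registers are written in list order; the names `st4_element_offset` and `st4_total_bytes` are hypothetical.

```python
def st4_element_offset(e, s, esize):
    """Byte offset from the base address of element e of register s (0 to 3)."""
    selem = 4                      # elements per structure for ST4
    ebytes = esize // 8
    if not 0 <= s < selem:
        raise ValueError("register position must be 0 to 3")
    return (e * selem + s) * ebytes


def st4_total_bytes(q):
    """Total bytes written: four registers of 8 (Q == 0) or 16 (Q == 1) bytes."""
    return 4 * (16 if q else 8)
```

For the .4S arrangement, for example, element 2 of the third register lands 40 bytes above the base address, and the last element ends exactly at the #64 post-index immediate.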

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

ST4 (single structure)

Store single 4-element structure from one lane of four registers. This instruction stores a 4-element structure to
memory from corresponding elements of four SIMD&FP registers.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: No offset and Post-index

No offset

Bits:   31  30  29:24   23  22  21  20:16  15:13   12  11:10  9:5  4:0
Value:  0   Q   001101  0   0   1   00000  xx1     S   size   Rn   Rt
Field:                      L   R          opcode

8-bit (opcode == 001)

ST4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>]

16-bit (opcode == 011 && size == x0)

ST4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>]

32-bit (opcode == 101 && size == 00)

ST4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>]

64-bit (opcode == 101 && S == 0 && size == 01)

ST4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>]

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = integer UNKNOWN;
boolean wback = FALSE;
boolean tag_checked = wback || n != 31;

Post-index

Bits:   31  30  29:24   23  22  21  20:16  15:13   12  11:10  9:5  4:0
Value:  0   Q   001101  1   0   1   Rm     xx1     S   size   Rn   Rt
Field:                      L   R          opcode

8-bit, immediate offset (Rm == 11111 && opcode == 001)

ST4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], #4

8-bit, register offset (Rm != 11111 && opcode == 001)

ST4 { <Vt>.B, <Vt2>.B, <Vt3>.B, <Vt4>.B }[<index>], [<Xn|SP>], <Xm>

16-bit, immediate offset (Rm == 11111 && opcode == 011 && size == x0)

ST4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], #8

16-bit, register offset (Rm != 11111 && opcode == 011 && size == x0)

ST4 { <Vt>.H, <Vt2>.H, <Vt3>.H, <Vt4>.H }[<index>], [<Xn|SP>], <Xm>

32-bit, immediate offset (Rm == 11111 && opcode == 101 && size == 00)

ST4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], #16

32-bit, register offset (Rm != 11111 && opcode == 101 && size == 00)

ST4 { <Vt>.S, <Vt2>.S, <Vt3>.S, <Vt4>.S }[<index>], [<Xn|SP>], <Xm>

64-bit, immediate offset (Rm == 11111 && opcode == 101 && S == 0 && size == 01)

ST4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], #32

64-bit, register offset (Rm != 11111 && opcode == 101 && S == 0 && size == 01)

ST4 { <Vt>.D, <Vt2>.D, <Vt3>.D, <Vt4>.D }[<index>], [<Xn|SP>], <Xm>

integer t = UInt(Rt);
integer n = UInt(Rn);
integer m = UInt(Rm);
boolean wback = TRUE;
boolean tag_checked = wback || n != 31;

Assembler Symbols

<Vt> Is the name of the first or only SIMD&FP register to be transferred, encoded in the "Rt" field.
<Vt2> Is the name of the second SIMD&FP register to be transferred, encoded as "Rt" plus 1 modulo 32.
<Vt3> Is the name of the third SIMD&FP register to be transferred, encoded as "Rt" plus 2 modulo 32.
<Vt4> Is the name of the fourth SIMD&FP register to be transferred, encoded as "Rt" plus 3 modulo 32.
<index> For the 8-bit variant: is the element index, encoded in "Q:S:size".
For the 16-bit variant: is the element index, encoded in "Q:S:size<1>".
For the 32-bit variant: is the element index, encoded in "Q:S".
For the 64-bit variant: is the element index, encoded in "Q".
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose post-index register, excluding XZR, encoded in the "Rm" field.

Shared Decode

integer init_scale = UInt(opcode<2:1>);

integer scale = init_scale;
integer selem = UInt(opcode<0>:R) + 1;
boolean replicate = FALSE;
integer index;

MemOp memop = if L == '1' then MemOp_LOAD else MemOp_STORE;

integer datasize = if Q == '1' then 128 else 64;
integer esize = 8 << scale;

Operation

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

CheckFPAdvSIMDEnabled64();

bits(64) address;
bits(64) offs;
bits(128) rval;
bits(esize) element;
constant integer ebytes = esize DIV 8;

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if wback then
    if m != 31 then
        offs = X[m];
    if n == 31 then
        SP[] = address + offs;
    else
        X[n] = address + offs;
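
As a rough model of ST4 (single structure), the following Python sketch stores the same lane from four consecutively numbered registers, with the register number wrapping modulo 32 as described for `<Vt2>`, `<Vt3>`, and `<Vt4>`. It assumes a `bytearray` for memory, a list of 32 Python integers for the register file, and little-endian element layout; the name `st4_single` is hypothetical.

```python
def st4_single(mem, address, vregs, t, esize, index):
    """Store lane `index` of registers t, t+1, t+2, t+3 (modulo 32).

    mem:   bytearray standing in for memory (illustrative assumption).
    vregs: list of 32 register values as Python ints.
    Returns the address just past the stored 4-element structure.
    """
    if len(vregs) != 32:
        raise ValueError("expected 32 SIMD&FP registers")
    ebytes = esize // 8
    mask = (1 << esize) - 1
    offs = 0
    for s in range(4):
        tt = (t + s) % 32          # register numbers wrap from V31 to V0
        element = (vregs[tt] >> (index * esize)) & mask
        start = address + offs
        mem[start:start + ebytes] = element.to_bytes(ebytes, "little")
        offs += ebytes
    return address + offs
```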

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STNP (SIMD&FP)

Store Pair of SIMD&FP registers, with Non-temporal hint. This instruction stores a pair of SIMD&FP registers to
memory, issuing a hint to the memory system that the access is non-temporal. The address used for the store is
calculated from a base register value and an immediate offset. For information about non-temporal
pair instructions, see Load/Store SIMD and Floating-point Non-temporal pair.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Bits:   31:30  29:27  26  25:23  22  21:15  14:10  9:5  4:0
Value:  opc    101    1   000    0   imm7   Rt2    Rn   Rt
Field:                           L

32-bit (opc == 00)

STNP <St1>, <St2>, [<Xn|SP>{, #<imm>}]

64-bit (opc == 01)

STNP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]

128-bit (opc == 10)

STNP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]

// Empty.

Assembler Symbols

Shared Decode

Operation

CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

data1 = V[t];
data2 = V[t2];
Mem[address, dbytes, AccType_VECSTREAM] = data1;
Mem[address+dbytes, dbytes, AccType_VECSTREAM] = data2;
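
As a rough model of the operation, the following Python sketch stores the register pair at base plus offset. The decode shown here (register size of 4, 8, or 16 bytes selected by `opc`, and a signed 7-bit immediate scaled by that size) is an assumption based on the 32-bit, 64-bit, and 128-bit variants listed above and should be checked against the architecture decode pseudocode. The non-temporal hint has no functional effect in this model. The names `sign_extend` and `stnp_simd` are hypothetical.

```python
def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def stnp_simd(mem, base, imm7, opc, vt, vt2):
    """Store vt then vt2 at base + scaled offset; return the address used.

    mem:  bytearray standing in for memory (illustrative assumption).
    opc:  0b00 (32-bit), 0b01 (64-bit) or 0b10 (128-bit) registers.
    imm7: raw 7-bit immediate field.
    """
    if opc == 0b11:
        raise ValueError("UNDEFINED")
    scale = 2 + opc
    dbytes = 1 << scale                     # 4, 8 or 16 bytes per register
    offset = sign_extend(imm7, 7) << scale  # immediate is a multiple of dbytes
    address = base + offset
    mask = (1 << (8 * dbytes)) - 1
    mem[address:address + dbytes] = (vt & mask).to_bytes(dbytes, "little")
    mem[address + dbytes:address + 2 * dbytes] = (vt2 & mask).to_bytes(dbytes, "little")
    return address
```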

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STP (SIMD&FP)

Store Pair of SIMD&FP registers. This instruction stores a pair of SIMD&FP registers to memory. The address used for
the store is calculated from a base register value and an immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 3 classes: Post-index, Pre-index and Signed offset.

Post-index

31-30  29-27  26  25-23  22     21-15  14-10  9-5  4-0
opc    1 0 1  1   0 0 1  0 (L)  imm7   Rt2    Rn   Rt

32-bit (opc == 00)

STP <St1>, <St2>, [<Xn|SP>], #<imm>

64-bit (opc == 01)

STP <Dt1>, <Dt2>, [<Xn|SP>], #<imm>

128-bit (opc == 10)

STP <Qt1>, <Qt2>, [<Xn|SP>], #<imm>

boolean wback = TRUE;

boolean postindex = TRUE;

Pre-index

31-30  29-27  26  25-23  22     21-15  14-10  9-5  4-0
opc    1 0 1  1   0 1 1  0 (L)  imm7   Rt2    Rn   Rt

32-bit (opc == 00)

STP <St1>, <St2>, [<Xn|SP>, #<imm>]!

64-bit (opc == 01)

STP <Dt1>, <Dt2>, [<Xn|SP>, #<imm>]!

128-bit (opc == 10)

STP <Qt1>, <Qt2>, [<Xn|SP>, #<imm>]!

boolean wback = TRUE;

boolean postindex = FALSE;

Signed offset

31-30  29-27  26  25-23  22     21-15  14-10  9-5  4-0
opc    1 0 1  1   0 1 0  0 (L)  imm7   Rt2    Rn   Rt

32-bit (opc == 00)

STP <St1>, <St2>, [<Xn|SP>{, #<imm>}]

64-bit (opc == 01)

STP <Dt1>, <Dt2>, [<Xn|SP>{, #<imm>}]

128-bit (opc == 10)

STP <Qt1>, <Qt2>, [<Xn|SP>{, #<imm>}]

boolean wback = FALSE;

boolean postindex = FALSE;

Assembler Symbols

Shared Decode

Operation

CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data1;
bits(datasize) data2;
constant integer dbytes = datasize DIV 8;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

data1 = V[t];
data2 = V[t2];
Mem[address, dbytes, AccType_VEC] = data1;
Mem[address+dbytes, dbytes, AccType_VEC] = data2;

if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;
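
The three encoding classes differ only in when the offset is applied and whether the base register is written back. The following Python sketch mirrors the pseudocode above under stated assumptions: `stp_addresses` is a hypothetical helper name, `offset` is taken as an already-decoded signed byte offset (the decode of imm7 is not shown in this section), and memory accesses, alignment checks and traps are not modelled.

```python
MASK64 = (1 << 64) - 1  # addresses are 64-bit values


def stp_addresses(base, offset, dbytes, wback, postindex):
    """Return (addr1, addr2, new_base) for a pair store.

    base   -- value of Xn or SP before the instruction
    offset -- already-decoded signed byte offset (assumption: decoded elsewhere)
    dbytes -- bytes per stored register (4, 8 or 16)
    """
    address = base
    if not postindex:
        # Pre-index and Signed offset: apply the offset before the access.
        address = (address + offset) & MASK64
    addr1 = address
    addr2 = (address + dbytes) & MASK64
    new_base = base
    if wback:
        if postindex:
            # Post-index: the access uses the old base, then the base moves.
            address = (address + offset) & MASK64
        new_base = address
    return addr1, addr2, new_base
```

For example, a Post-index store of two Q registers with an offset of 32 from base 0x1000 would access 0x1000 and 0x1010 and leave the base at 0x1020, whereas the Pre-index form would access 0x1020 and 0x1030.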

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STR (immediate, SIMD&FP)

Store SIMD&FP register (immediate offset). This instruction stores a single SIMD&FP register to memory. The
address that is used for the store is calculated from a base register value and an immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 3 classes: Post-index, Pre-index and Unsigned offset.

Post-index

31-30  29-27  26  25-24  23-22      21  20-12  11-10  9-5  4-0
size   1 1 1  1   0 0    x 0 (opc)  0   imm9   0 1    Rn   Rt

8-bit (size == 00 && opc == 00)

STR <Bt>, [<Xn|SP>], #<simm>

16-bit (size == 01 && opc == 00)

STR <Ht>, [<Xn|SP>], #<simm>

32-bit (size == 10 && opc == 00)

STR <St>, [<Xn|SP>], #<simm>

64-bit (size == 11 && opc == 00)

STR <Dt>, [<Xn|SP>], #<simm>

128-bit (size == 00 && opc == 10)

STR <Qt>, [<Xn|SP>], #<simm>

boolean wback = TRUE;

boolean postindex = TRUE;
integer scale = UInt(opc<1>:size);
if scale > 4 then UNDEFINED;
bits(64) offset = SignExtend(imm9, 64);

Pre-index

31-30  29-27  26  25-24  23-22      21  20-12  11-10  9-5  4-0
size   1 1 1  1   0 0    x 0 (opc)  0   imm9   1 1    Rn   Rt

8-bit (size == 00 && opc == 00)

STR <Bt>, [<Xn|SP>, #<simm>]!

16-bit (size == 01 && opc == 00)

STR <Ht>, [<Xn|SP>, #<simm>]!

32-bit (size == 10 && opc == 00)

STR <St>, [<Xn|SP>, #<simm>]!

64-bit (size == 11 && opc == 00)

STR <Dt>, [<Xn|SP>, #<simm>]!

128-bit (size == 00 && opc == 10)

STR <Qt>, [<Xn|SP>, #<simm>]!

boolean wback = TRUE;

boolean postindex = FALSE;
integer scale = UInt(opc<1>:size);
if scale > 4 then UNDEFINED;
bits(64) offset = SignExtend(imm9, 64);

Unsigned offset

31-30  29-27  26  25-24  23-22      21-10  9-5  4-0
size   1 1 1  1   0 1    x 0 (opc)  imm12  Rn   Rt

8-bit (size == 00 && opc == 00)

STR <Bt>, [<Xn|SP>{, #<pimm>}]

16-bit (size == 01 && opc == 00)

STR <Ht>, [<Xn|SP>{, #<pimm>}]

32-bit (size == 10 && opc == 00)

STR <St>, [<Xn|SP>{, #<pimm>}]

64-bit (size == 11 && opc == 00)

STR <Dt>, [<Xn|SP>{, #<pimm>}]

128-bit (size == 00 && opc == 10)

STR <Qt>, [<Xn|SP>{, #<pimm>}]

boolean wback = FALSE;

boolean postindex = FALSE;
integer scale = UInt(opc<1>:size);
if scale > 4 then UNDEFINED;
bits(64) offset = LSL(ZeroExtend(imm12, 64), scale);
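
The decode fragments above derive a scale from opc<1>:size and then form the byte offset differently per class: the Post-index and Pre-index classes sign-extend imm9 without scaling, while the Unsigned offset class zero-extends imm12 and shifts it left by the scale. A minimal Python sketch of that arithmetic follows; the function names are hypothetical, and field values are assumed to be passed as plain integers.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode_scale(opc, size):
    """scale = UInt(opc<1>:size); values above 4 are UNDEFINED."""
    scale = ((opc >> 1) << 2) | size
    if scale > 4:
        raise ValueError("UNDEFINED")
    return scale


def str_offset(opc, size, imm, unsigned_offset):
    """Byte offset for STR (immediate, SIMD&FP).

    unsigned_offset=True  -> imm is imm12, scaled by the access size
    unsigned_offset=False -> imm is imm9, signed and unscaled
    """
    scale = decode_scale(opc, size)
    if unsigned_offset:
        return (imm & 0xFFF) << scale
    return sign_extend(imm & 0x1FF, 9)
```

Under these assumptions a 128-bit store (size == 00, opc == 10) has scale 4, so an imm12 value of 3 corresponds to a byte offset of 48.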

Assembler Symbols

Shared Decode

Operation

CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

if !postindex then
    address = address + offset;

case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;

    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;

if wback then
    if postindex then
        address = address + offset;
    if n == 31 then
        SP[] = address;
    else
        X[n] = address;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STR (register, SIMD&FP)

Store SIMD&FP register (register offset). This instruction stores a single SIMD&FP register to memory. The address
that is used for the store is calculated from a base register value and an offset register value. The offset can be
optionally shifted and extended.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31-30  29-27  26  25-24  23-22      21  20-16  15-13   12  11-10  9-5  4-0
size   1 1 1  1   0 0    x 0 (opc)  1   Rm     option  S   1 0    Rn   Rt

8-bit (size == 00 && opc == 00 && option != 011)

STR <Bt>, [<Xn|SP>, (<Wm>|<Xm>), <extend> {<amount>}]

8-bit (size == 00 && opc == 00 && option == 011)

STR <Bt>, [<Xn|SP>, <Xm>{, LSL <amount>}]

16-bit (size == 01 && opc == 00)

STR <Ht>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

32-bit (size == 10 && opc == 00)

STR <St>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

64-bit (size == 11 && opc == 00)

STR <Dt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

128-bit (size == 00 && opc == 10)

STR <Qt>, [<Xn|SP>, (<Wm>|<Xm>){, <extend> {<amount>}}]

integer scale = UInt(opc<1>:size);

if scale > 4 then UNDEFINED;
if option<1> == '0' then UNDEFINED; // sub-word index
ExtendType extend_type = DecodeRegExtend(option);
integer shift = if S == '1' then scale else 0;

Assembler Symbols

<extend> For the 8-bit variant: is the index extend specifier, encoded in “option”:

option <extend>
010 UXTW
110 SXTW
111 SXTX

For the 16-bit, 32-bit, 64-bit and 128-bit variants: is the index extend/shift specifier, defaulting to LSL, which must be omitted for the LSL option when <amount> is omitted, encoded in “option”:

option <extend>
010 UXTW
011 LSL
110 SXTW
111 SXTX

<amount> For the 8-bit variant: is the index shift amount, it must be #0, encoded in "S" as 0 if omitted, or as 1 if
present.

For the 16-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #1

For the 32-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #2

For the 64-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #3

For the 128-bit variant: is the index shift amount, optional only when <extend> is not LSL. Where it is
permitted to be optional, it defaults to #0. It is encoded in “S”:

S <amount>
0 #0
1 #4

Shared Decode

Operation

bits(64) offset = ExtendReg(m, extend_type, shift);

CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;

    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;
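
The register offset is formed by extending the index register according to "option" and then shifting it left by the amount selected by "S". The Python sketch below illustrates that step only; `extend_reg` is a hypothetical stand-in for the ExtendReg pseudocode function, it takes the register value directly rather than a register number, and it assumes the option-to-extend mapping given in the Assembler Symbols tables.

```python
MASK64 = (1 << 64) - 1


def extend_reg(value, option, shift):
    """Extend and shift an index register value, returning a 64-bit offset.

    option -- 3-bit field: 010 UXTW, 011 LSL, 110 SXTW, 111 SXTX
    shift  -- 0, or the access-size scale when S == 1
    """
    if not option & 0b010:
        # option<1> == '0' selects a sub-word index, which is UNDEFINED here.
        raise ValueError("UNDEFINED: sub-word index")
    width = 64 if option & 0b001 else 32   # Xm for LSL/SXTX, Wm for UXTW/SXTW
    signed = bool(option & 0b100)
    v = value & ((1 << width) - 1)
    if signed and v >> (width - 1):
        v -= 1 << width                    # apply sign extension
    return (v << shift) & MASK64
```

With a Wm value of 0xFFFFFFFF and a shift of 2, UXTW would give 0x3FFFFFFFC, while SXTW would give 0xFFFFFFFFFFFFFFFC (that is, -4).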

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

STUR (SIMD&FP)

Store SIMD&FP register (unscaled offset). This instruction stores a single SIMD&FP register to memory. The address
that is used for the store is calculated from a base register value and an optional immediate offset.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31-30  29-27  26  25-24  23-22      21  20-12  11-10  9-5  4-0
size   1 1 1  1   0 0    x 0 (opc)  0   imm9   0 0    Rn   Rt

8-bit (size == 00 && opc == 00)

STUR <Bt>, [<Xn|SP>{, #<simm>}]

16-bit (size == 01 && opc == 00)

STUR <Ht>, [<Xn|SP>{, #<simm>}]

32-bit (size == 10 && opc == 00)

STUR <St>, [<Xn|SP>{, #<simm>}]

64-bit (size == 11 && opc == 00)

STUR <Dt>, [<Xn|SP>{, #<simm>}]

128-bit (size == 00 && opc == 10)

STUR <Qt>, [<Xn|SP>{, #<simm>}]

integer scale = UInt(opc<1>:size);

if scale > 4 then UNDEFINED;
bits(64) offset = SignExtend(imm9, 64);

Assembler Symbols

Shared Decode

integer n = UInt(Rn);
integer t = UInt(Rt);
MemOp memop = if opc<0> == '1' then MemOp_LOAD else MemOp_STORE;
integer datasize = 8 << scale;
boolean tag_checked = memop != MemOp_PREFETCH && (n != 31);

Operation

CheckFPAdvSIMDEnabled64();
bits(64) address;
bits(datasize) data;

if HaveMTE2Ext() then
    SetTagCheckedInstruction(tag_checked);

if n == 31 then
    CheckSPAlignment();
    address = SP[];
else
    address = X[n];

address = address + offset;

case memop of
    when MemOp_STORE
        data = V[t];
        Mem[address, datasize DIV 8, AccType_VEC] = data;

    when MemOp_LOAD
        data = Mem[address, datasize DIV 8, AccType_VEC];
        V[t] = data;

Operational information

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored.

SUB (vector)

Subtract (vector). This instruction subtracts each vector element in the second source SIMD&FP register from the
corresponding vector element in the first source SIMD&FP register, places the result into a vector, and writes the
vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector.

Scalar

31-30  29     28-24      23-22  21  20-16  15-11      10  9-5  4-0
0 1    1 (U)  1 1 1 1 0  size   1   Rm     1 0 0 0 0  1   Rn   Rd

SUB <V><d>, <V><n>, <V><m>

Vector

31  30  29     28-24      23-22  21  20-16  15-11      10  9-5  4-0
0   Q   1 (U)  0 1 1 1 0  size   1   Rm     1 0 0 0 0  1   Rn   Rd

SUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
bits(esize) element1;
bits(esize) element2;

V[d] = result;

Operational information

SUBHN, SUBHN2

Subtract returning High Narrow. This instruction subtracts each vector element in the second source SIMD&FP
register from the corresponding vector element in the first source SIMD&FP register, places the most significant half
of the result into a vector, and writes the vector to the lower or upper half of the destination SIMD&FP register. All the
values in this instruction are signed integer values.
The results are truncated. For rounded results, see RSUBHN.
The SUBHN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
SUBHN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29     28-24      23-22  21  20-16  15-14  13      12  11-10  9-5  4-0
0   Q   0 (U)  0 1 1 1 0  size   1   Rm     0 1    1 (o1)  0   0 0    Rn   Rd

SUBHN{2} <Vd>.<Tb>, <Vn>.<Ta>, <Vm>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean round = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

Vpart[d, part] = result;
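
The Operation listing above retains only the final write, so the following Python sketch reconstructs the arithmetic from the instruction description rather than from the pseudocode. It is illustrative only: `subhn` and `place_result` are hypothetical names, vectors are modelled as lists of unsigned element values, and `esize` is the narrow (destination) element width, so each source element is 2*esize bits wide.

```python
def subhn(vn, vm, esize):
    """Subtract wide elements and keep the most significant half (truncated)."""
    wide_mask = (1 << (2 * esize)) - 1
    narrow_mask = (1 << esize) - 1
    return [(((a - b) & wide_mask) >> esize) & narrow_mask
            for a, b in zip(vn, vm)]


def place_result(old_dest, narrow, part):
    """Model the Vpart write: part 0 is SUBHN, part 1 is SUBHN2."""
    half = len(narrow)
    if part == 0:
        return narrow + [0] * half      # lower half written, upper half cleared
    return old_dest[:half] + narrow     # upper half written, lower half kept
```

For 16-bit source elements, 0x1234 - 0x0034 gives 0x1200, whose high byte 0x12 is the narrowed result; no rounding is applied.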

Operational information

SUDOT (by element)

Dot product index form with signed and unsigned integers. This instruction performs the dot product of the four
signed 8-bit integer values in each 32-bit element of the first source register with the four unsigned 8-bit integer
values in an indexed 32-bit element of the second source register, accumulating the result into the corresponding
32-bit element of the destination vector.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.

Vector
(FEAT_I8MM)

31  30  29  28-24      23      22  21  20  19-16  15-12    11  10  9-5  4-0
0   Q   0   0 1 1 1 1  0 (US)  0   L   M   Rm     1 1 1 1  H   0   Rn   Rd

SUDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]

if !HaveInt8MatMulExt() then UNDEFINED;

boolean op1_unsigned = (US == '1');
boolean op2_unsigned = (US == '0');
integer n = UInt(Rn);
integer m = UInt(M:Rm);
integer d = UInt(Rd);
integer i = UInt(H:L);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV 32;

Assembler Symbols

<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 8B
1 16B

<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index> Is the immediate index of a quadtuplet of four 8-bit elements in the range 0 to 3, encoded in the "H:L"
fields.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(128) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;

for e = 0 to elements-1
    bits(32) res = Elem[operand3, e, 32];
    for b = 0 to 3
        integer element1 = Int(Elem[operand1, 4*e+b, 8], op1_unsigned);
        integer element2 = Int(Elem[operand2, 4*i+b, 8], op2_unsigned);
        res = res + element1 * element2;
    Elem[result, e, 32] = res;
V[d] = result;
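
The loop above can be sketched in Python as follows. This is a minimal model, not a definitive implementation: `sudot_by_element` is a hypothetical name, registers are represented as lists (32-bit lanes for the accumulator, bytes for the sources), and the signed/unsigned roles are fixed to the SUDOT case in which the first source is signed and the indexed second source is unsigned.

```python
def to_signed8(b):
    """Interpret a byte value 0..255 as a signed 8-bit integer."""
    return b - 256 if b & 0x80 else b


def sudot_by_element(acc, vn_bytes, vm_bytes, index):
    """acc: 32-bit lanes; vn_bytes: 4 bytes per lane; vm_bytes: 16 bytes;
    index: which group of four bytes in vm_bytes to use (0..3)."""
    result = []
    for e, res in enumerate(acc):
        for b in range(4):
            element1 = to_signed8(vn_bytes[4 * e + b])   # signed operand
            element2 = vm_bytes[4 * index + b]           # unsigned operand
            res += element1 * element2
        result.append(res & 0xFFFFFFFF)                  # wrap to 32 bits
    return result
```

Every lane of the result uses the same four bytes of the second source, selected by the index.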

SUQADD

Signed saturating Accumulate of Unsigned value. This instruction adds the unsigned integer values of the vector
elements in the source SIMD&FP register to corresponding signed integer values of the vector elements in the
destination SIMD&FP register, and writes the resulting signed integer values to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector.

Scalar

31-30  29     28-24      23-22  21-10                    9-5  4-0
0 1    0 (U)  1 1 1 1 0  size   1 0 0 0 0 0 0 0 1 1 1 0  Rn   Rd

SUQADD <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;

boolean unsigned = (U == '1');

Vector

31  30  29     28-24      23-22  21-10                    9-5  4-0
0   Q   0 (U)  0 1 1 1 0  size   1 0 0 0 0 0 0 0 1 1 1 0  Rn   Rd

SUQADD <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;

bits(datasize) operand2 = V[d];

integer op1;
integer op2;
boolean sat;

for e = 0 to elements-1
    op1 = Int(Elem[operand, e, esize], !unsigned);
    op2 = Int(Elem[operand2, e, esize], unsigned);
    (Elem[result, e, esize], sat) = SatQ(op1 + op2, esize, unsigned);
    if sat then FPSR.QC = '1';
V[d] = result;
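
As an illustration of the saturating accumulate, the Python sketch below adds unsigned source lanes to signed destination lanes and clamps the result to the signed range. The name `suqadd` and the list-based register model are assumptions made for the example, and the returned flag stands in for the sticky FPSR.QC bit.

```python
def suqadd(dest, src, esize):
    """dest: lanes holding signed values as bit patterns; src: unsigned lanes.

    Returns (result_lanes, qc) where qc is True if any lane saturated.
    """
    hi = (1 << (esize - 1)) - 1          # largest signed value
    mask = (1 << esize) - 1
    qc = False
    out = []
    for d, s in zip(dest, src):
        signed_d = d - (1 << esize) if d >> (esize - 1) else d
        total = signed_d + s
        # The addend is never negative, so only the upper bound can be exceeded.
        if total > hi:
            total, qc = hi, True
        out.append(total & mask)
    return out, qc
```

For 8-bit lanes, 0x7F plus 1 saturates to 0x7F and sets the flag, whereas 0x80 (-128) plus 0xFF (255) gives exactly 0x7F without saturating.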

SXTL, SXTL2

Signed extend Long. This instruction duplicates each vector element in the lower or upper half of the source
SIMD&FP register into a vector, and writes the vector to the destination SIMD&FP register. The destination vector
elements are twice as long as the source vector elements. All the values in this instruction are signed integer values.
The SXTL instruction extracts the source vector from the lower half of the source register. The SXTL2 instruction
extracts the source vector from the upper half of the source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

This is an alias of SSHLL, SSHLL2. This means:

• The encodings in this description are named to match the encodings of SSHLL, SSHLL2.
• The description of SSHLL, SSHLL2 gives the operational pseudocode for this instruction.
31  30  29     28-23        22-19           18-16         15-11      10  9-5  4-0
0   Q   0 (U)  0 1 1 1 1 0  != 0000 (immh)  0 0 0 (immb)  1 0 1 0 0  1   Rn   Rd

SXTL{2} <Vd>.<Ta>, <Vn>.<Tb>

is equivalent to

SSHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #0

and is the preferred disassembly when BitCount(immh) == 1.
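
Because the alias is a shift-left-long by zero, its effect reduces to sign-extending each selected element to twice its width. A minimal Python sketch of that behaviour follows, based on the description above; `sxtl` is a hypothetical name, the source register is modelled as a list of unsigned element values, and `part` selects the lower (0, SXTL) or upper (1, SXTL2) half.

```python
def sxtl(vn, esize, part):
    """Sign-extend the elements in one half of vn to 2*esize bits."""
    half = len(vn) // 2
    src = vn[part * half:(part + 1) * half]
    wide_mask = (1 << (2 * esize)) - 1
    out = []
    for v in src:
        if v >> (esize - 1):
            v -= 1 << esize          # negative element: apply the sign
        out.append(v & wide_mask)
    return out
```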

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

Operation

The description of SSHLL, SSHLL2 gives the operational pseudocode for this instruction.

Operational information

TBL

Table vector Lookup. This instruction reads each value from the vector elements in the index source SIMD&FP
register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four source
table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination SIMD&FP
register. If an index is out of range for the table, the result for that lookup is 0. If more than one source register is
used to describe the table, the first source register describes the lowest bytes of the table.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29-24        23-22  21  20-16  15  14-13  12      11-10  9-5  4-0
0   Q   0 0 1 1 1 0  0 0    0   Rm     0   len    0 (op)  0 0    Rn   Rd

Two register table (len == 01)

TBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B }, <Vm>.<Ta>

Three register table (len == 10)

TBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B }, <Vm>.<Ta>

Four register table (len == 11)

TBL <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B, <Vn+3>.16B }, <Vm>.<Ta>

Single register table (len == 00)

TBL <Vd>.<Ta>, { <Vn>.16B }, <Vm>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer datasize = if Q == '1' then 128 else 64;

integer elements = datasize DIV 8;
integer regs = UInt(len) + 1;
boolean is_tbl = (op == '0');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 8B
1 16B

<Vn> For the four register table, three register table and two register table variant: is the name of the first
SIMD&FP table register, encoded in the "Rn" field.
For the single register table variant: is the name of the SIMD&FP table register, encoded in the "Rn"
field.
<Vn+1> Is the name of the second SIMD&FP table register, encoded as "Rn" plus 1 modulo 32.
<Vn+2> Is the name of the third SIMD&FP table register, encoded as "Rn" plus 2 modulo 32.
<Vn+3> Is the name of the fourth SIMD&FP table register, encoded as "Rn" plus 3 modulo 32.
<Vm> Is the name of the SIMD&FP index register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) indices = V[m];
bits(128*regs) table = Zeros();
bits(datasize) result;
integer index;

// Create table from registers

for i = 0 to regs-1
    table<128*i+127:128*i> = V[n];
    n = (n + 1) MOD 32;

result = if is_tbl then Zeros() else V[d];

for i = 0 to elements-1
    index = UInt(Elem[indices, i, 8]);
    if index < 16 * regs then
        Elem[result, i, 8] = Elem[table, index, 8];

V[d] = result;
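
The lookup can be sketched in Python as below. This is an illustrative model under stated assumptions: `table_lookup` is a hypothetical name, each table register is a list of 16 byte values, and the `is_tbl` flag plays the same role as in the pseudocode, selecting between zeroing out-of-range results and leaving the existing destination bytes unchanged.

```python
def table_lookup(table_regs, indices, old_dest, is_tbl):
    """table_regs: one to four lists of 16 bytes; indices: index bytes.

    Out-of-range indices give 0 when is_tbl is True, otherwise the
    corresponding byte of old_dest is kept.
    """
    # The first register supplies the lowest bytes of the table.
    table = [b for reg in table_regs for b in reg]
    result = [0] * len(indices) if is_tbl else list(old_dest)
    for i, index in enumerate(indices):
        if index < len(table):
            result[i] = table[index]
    return result
```

With a two-register table, index 17 selects byte 1 of the second register, and index 40 is out of range.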

Operational information

TBX

Table vector lookup extension. This instruction reads each value from the vector elements in the index source
SIMD&FP register, uses each result as an index to perform a lookup in a table of bytes that is described by one to four
source table SIMD&FP registers, places the lookup result in a vector, and writes the vector to the destination
SIMD&FP register. If an index is out of range for the table, the existing value in the vector element of the destination
register is left unchanged. If more than one source register is used to describe the table, the first source register
describes the lowest bytes of the table.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29-24        23-22  21  20-16  15  14-13  12      11-10  9-5  4-0
0   Q   0 0 1 1 1 0  0 0    0   Rm     0   len    1 (op)  0 0    Rn   Rd

Two register table (len == 01)

TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B }, <Vm>.<Ta>

Three register table (len == 10)

TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B }, <Vm>.<Ta>

Four register table (len == 11)

TBX <Vd>.<Ta>, { <Vn>.16B, <Vn+1>.16B, <Vn+2>.16B, <Vn+3>.16B }, <Vm>.<Ta>

Single register table (len == 00)

TBX <Vd>.<Ta>, { <Vn>.16B }, <Vm>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

integer datasize = if Q == '1' then 128 else 64;

integer elements = datasize DIV 8;
integer regs = UInt(len) + 1;
boolean is_tbl = (op == '0');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 8B
1 16B

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) indices = V[m];
bits(128*regs) table = Zeros();
bits(datasize) result;
integer index;

// Create table from registers

for i = 0 to regs-1
    table<128*i+127:128*i> = V[n];
    n = (n + 1) MOD 32;

result = if is_tbl then Zeros() else V[d];

for i = 0 to elements-1
    index = UInt(Elem[indices, i, 8]);
    if index < 16 * regs then
        Elem[result, i, 8] = Elem[table, index, 8];

V[d] = result;

Operational information

TRN1

Transpose vectors (primary). This instruction reads corresponding even-numbered vector elements from the two
source SIMD&FP registers, starting at zero, places each result into consecutive elements of a vector, and writes the
vector to the destination SIMD&FP register. Vector elements from the first source register are placed into even-
numbered elements of the destination vector, starting at zero, while vector elements from the second source register
are placed into odd-numbered elements of the destination vector.

Note

By using this instruction with TRN2, a 2 x 2 matrix can be transposed.

The following figure shows an example of the operation of TRN1 and TRN2 halfword operations where Q = 0.
[Figure: two panels, TRN1.16 and TRN2.16, each showing elements 3 to 0 of Vn and Vm and their placement in Vd.]

TRN1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
integer part = UInt(op);
integer pairs = elements DIV 2;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

for p = 0 to pairs-1
    Elem[result, 2*p+0, esize] = Elem[operand1, 2*p+part, esize];
    Elem[result, 2*p+1, esize] = Elem[operand2, 2*p+part, esize];

V[d] = result;
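
The pairing loop above can be illustrated with a short Python sketch. The name `trn` and the list-based register model are assumptions for the example; `part` is 0 for TRN1 and 1 for TRN2, matching the decode of "op".

```python
def trn(vn, vm, part):
    """Interleave the even (part=0) or odd (part=1) elements of vn and vm."""
    result = [0] * len(vn)
    for p in range(len(vn) // 2):
        result[2 * p] = vn[2 * p + part]       # even result lanes come from vn
        result[2 * p + 1] = vm[2 * p + part]   # odd result lanes come from vm
    return result
```

Treating two 2-element vectors as the rows of a 2 x 2 matrix, `trn(row0, row1, 0)` and `trn(row0, row1, 1)` would return the rows of the transposed matrix.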

Operational information

TRN2

Transpose vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two
source SIMD&FP registers, places each result into consecutive elements of a vector, and writes the vector to the
destination SIMD&FP register. Vector elements from the first source register are placed into even-numbered elements
of the destination vector, starting at zero, while vector elements from the second source register are placed into odd-
numbered elements of the destination vector.

Note

By using this instruction with TRN1, a 2 x 2 matrix can be transposed.

The following figure shows an example of the operation of TRN1 and TRN2 halfword operations where Q = 0.
[Figure: two panels, TRN1.16 and TRN2.16, each showing elements 3 to 0 of Vn and Vm and their placement in Vd.]

TRN2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
integer part = UInt(op);
integer pairs = elements DIV 2;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

for p = 0 to pairs-1
    Elem[result, 2*p+0, esize] = Elem[operand1, 2*p+part, esize];
    Elem[result, 2*p+1, esize] = Elem[operand2, 2*p+part, esize];

V[d] = result;

Operational information

UABA

Unsigned Absolute difference and Accumulate. This instruction subtracts the elements of the vector of the second
source SIMD&FP register from the corresponding elements of the first source SIMD&FP register, and accumulates the
absolute values of the results into the elements of the vector of the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31  30  29     28-24      23-22  21  20-16  15-12    11      10  9-5  4-0
0   Q   1 (U)  0 1 1 1 0  size   1   Rm     0 1 1 1  1 (ac)  1   Rn   Rd

UABA <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean accumulate = (ac == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) absdiff;

result = if accumulate then V[d] else Zeros();
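
The listing above stops after `result` is initialised; the per-element loop is not reproduced here. Going only by the prose description at the top of this entry, the behaviour can be sketched in Python as follows. This is an illustration, not the architectural pseudocode: `uaba` and its parameter names are hypothetical, elements are plain integers, and the accumulation is assumed to wrap modulo 2^esize.

```python
def uaba(acc, operand1, operand2, esize, accumulate=True):
    """Element-wise |operand1 - operand2|, optionally added to acc."""
    mask = (1 << esize) - 1
    result = []
    for d, x, y in zip(acc, operand1, operand2):
        absdiff = abs((x & mask) - (y & mask))
        base = d if accumulate else 0
        result.append((base + absdiff) & mask)  # assumed wrap-around
    return result


print(uaba([250, 0], [10, 200], [20, 5], esize=8))  # [4, 195]
```

Calling the sketch with `accumulate=False` starts from zero instead of the old destination value, which corresponds to the description of UABD.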

Operational information

UABAL, UABAL2

Unsigned Absolute difference and Accumulate Long. This instruction subtracts the vector elements in the lower or
upper half of the second source SIMD&FP register from the corresponding vector elements of the first source
SIMD&FP register, and accumulates the absolute values of the results into the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the source vector elements. All the values in
this instruction are unsigned integer values.
The UABAL instruction extracts each source vector from the lower half of each source register. The UABAL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits:   31  30  29   28-24  23-22  21  20-16  15-14  13    12  11-10  9-5  4-0
Field:  0   Q   U=1  01110  size   1   Rm     01     op=0  1   00     Rn   Rd

UABAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean accumulate = (op == '0');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) absdiff;

result = if accumulate then V[d] else Zeros();
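
The element loop is missing from the listing above. Based on the prose description of this entry, a widening absolute-difference-accumulate can be sketched as below. The names are hypothetical: `vn` and `vm` hold all narrow elements of a 128-bit register, `part` selects the lower (0) or upper (1) half as `Vpart` does, and `acc` holds the wide destination elements. Wrap-around modulo 2^(2*esize) is an assumption of the sketch.

```python
def uabal(acc, vn, vm, esize, part, accumulate=True):
    """Widening |vn - vm| over one half of the sources, added to acc."""
    half = len(vn) // 2
    lo = part * half
    wide_mask = (1 << (2 * esize)) - 1
    result = []
    for e in range(half):
        absdiff = abs(vn[lo + e] - vm[lo + e])
        base = acc[e] if accumulate else 0
        result.append((base + absdiff) & wide_mask)
    return result


vn = [1, 2, 3, 4]                # four 32-bit elements (4S)
vm = [10, 0, 0xFFFFFFFF, 4]
print(uabal([100, 200], vn, vm, esize=32, part=0))  # [109, 202]
print(uabal([100, 200], vn, vm, esize=32, part=1))  # [4294967392, 200]
```

The second call shows why the destination elements are twice as wide: the difference plus the accumulator no longer fits in 32 bits.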

Operational information

UABD

Unsigned Absolute Difference (vector). This instruction subtracts the elements of the vector of the second source
SIMD&FP register from the corresponding elements of the first source SIMD&FP register, places the absolute
values of the results into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits:   31  30  29   28-24  23-22  21  20-16  15-12  11    10  9-5  4-0
Field:  0   Q   U=1  01110  size   1   Rm     0111   ac=0  1   Rn   Rd

UABD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean accumulate = (ac == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
bits(esize) absdiff;

result = if accumulate then V[d] else Zeros();

Operational information

UABDL, UABDL2

Unsigned Absolute Difference Long. This instruction subtracts the vector elements in the lower or upper half of the
second source SIMD&FP register from the corresponding vector elements of the first source SIMD&FP register,
places the absolute value of the result into a vector, and writes the vector to the destination SIMD&FP register. The
destination vector elements are twice as long as the source vector elements. All the values in this instruction are
unsigned integer values.
The UABDL instruction extracts each source vector from the lower half of each source register. The UABDL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits:   31  30  29   28-24  23-22  21  20-16  15-14  13    12  11-10  9-5  4-0
Field:  0   Q   U=1  01110  size   1   Rm     01     op=1  1   00     Rn   Rd

UABDL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean accumulate = (op == '0');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) absdiff;

result = if accumulate then V[d] else Zeros();

Operational information

UADALP

Unsigned Add and Accumulate Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the
vector in the source SIMD&FP register and accumulates the results with the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the source vector elements.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits:   31  30  29   28-24  23-22  21-17  16-15  14    13-12  11-10  9-5  4-0
Field:  0   Q   U=1  01110  size   10000  00     op=1  10     10     Rn   Rd

UADALP <Vd>.<Ta>, <Vn>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV (2 * esize);
boolean acc = (op == '1');
boolean unsigned = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size:Q”:

size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;

bits(2*esize) sum;
integer op1;
integer op2;

if acc then result = V[d];

V[d] = result;
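
The pairwise loop between the two statements above is not reproduced in this listing. Following the prose description, the operation can be sketched as below. The function name and arguments are hypothetical, elements are plain integers with index 0 least significant, and the accumulation is assumed to wrap modulo 2^(2*esize).

```python
def uadalp(acc, operand, esize, accumulate=True):
    """Add adjacent pairs of narrow elements into double-width accumulators."""
    wide_mask = (1 << (2 * esize)) - 1
    result = []
    for e in range(len(operand) // 2):
        pair_sum = operand[2 * e] + operand[2 * e + 1]
        base = acc[e] if accumulate else 0
        result.append((base + pair_sum) & wide_mask)
    return result


# 8B source, 4H destination
print(uadalp([1, 0, 0, 0xFFFF], [255, 255, 1, 2, 3, 4, 5, 6], esize=8))
# [511, 3, 7, 10]
```

With `accumulate=False` the sketch produces the plain pairwise sums described for UADDLP.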

Operational information

UADDL, UADDL2

Unsigned Add Long (vector). This instruction adds each vector element in the lower or upper half of the first source
SIMD&FP register to the corresponding vector element of the second source SIMD&FP register, places the result into
a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long
as the source vector elements. All the values in this instruction are unsigned integer values.
The UADDL instruction extracts each source vector from the lower half of each source register. The UADDL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits:   31  30  29   28-24  23-22  21  20-16  15-14  13    12  11-10  9-5  4-0
Field:  0   Q   U=1  01110  size   1   Rm     00     o1=0  0   00     Rn   Rd

UADDL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;

V[d] = result;

Operational information

UADDLP

Unsigned Add Long Pairwise. This instruction adds pairs of adjacent unsigned integer values from the vector in the
source SIMD&FP register, places the result into a vector, and writes the vector to the destination SIMD&FP register.
The destination vector elements are twice as long as the source vector elements.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits:   31  30  29   28-24  23-22  21-17  16-15  14    13-12  11-10  9-5  4-0
Field:  0   Q   U=1  01110  size   10000  00     op=0  10     10     Rn   Rd

UADDLP <Vd>.<Ta>, <Vn>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV (2 * esize);
boolean acc = (op == '1');
boolean unsigned = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size:Q”:

size Q <Ta>
00 0 4H
00 1 8H
01 0 2S
01 1 4S
10 0 1D
10 1 2D
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;

bits(2*esize) sum;
integer op1;
integer op2;

if acc then result = V[d];

V[d] = result;

Operational information

UADDLV

Unsigned sum Long across Vector. This instruction adds every vector element in the source SIMD&FP register
together, and writes the scalar result to the destination SIMD&FP register. The destination scalar is twice as long as
the source vector elements. All the values in this instruction are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits:   31  30  29   28-24  23-22  21-17  16-12  11-10  9-5  4-0
Field:  0   Q   U=1  01110  size   11000  00011  10     Rn   Rd

UADDLV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '100' then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

Assembler Symbols

<V> Is the destination width specifier, encoded in “size”:

size <V>
00 H
01 S
10 D
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer sum;

sum = Int(Elem[operand, 0, esize], unsigned);

for e = 1 to elements-1
    sum = sum + Int(Elem[operand, e, esize], unsigned);

V[d] = sum<2*esize-1:0>;
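
The reduction above maps onto a simple sum. The sketch below is illustrative only; `uaddlv` is a hypothetical name and the source vector is a list of unsigned integers.

```python
def uaddlv(operand, esize):
    """Sum all esize-bit unsigned elements into one 2*esize-bit scalar."""
    elem_mask = (1 << esize) - 1
    total = sum(x & elem_mask for x in operand)
    return total & ((1 << (2 * esize)) - 1)  # mirrors sum<2*esize-1:0>


print(uaddlv([255] * 16, esize=8))  # 4080
```

The final truncation never discards bits for the permitted arrangements: at most 16 elements are summed, and 16 * (2^esize - 1) is below 2^(2*esize) whenever esize is at least 8.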

Operational information

UADDW, UADDW2

Unsigned Add Wide. This instruction adds the vector elements of the first source SIMD&FP register to the
corresponding vector elements in the lower or upper half of the second source SIMD&FP register, places the result in
a vector, and writes the vector to the SIMD&FP destination register. The vector elements of the destination register
and the first source register are twice as long as the vector elements of the second source register. All the values in
this instruction are unsigned integer values.
The UADDW instruction extracts vector elements from the lower half of the second source register. The UADDW2
instruction extracts vector elements from the upper half of the second source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits:   31  30  29   28-24  23-22  21  20-16  15-14  13    12  11-10  9-5  4-0
Field:  0   Q   U=1  01110  size   1   Rm     00     o1=0  1   00     Rn   Rd

UADDW{2} <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;

V[d] = result;
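
The element loop is not included in the listing above. Following the prose description, the mixed-width addition can be sketched as below. All names are hypothetical: `vn_wide` holds the double-width elements of the first source, `vm` holds all narrow elements of the 128-bit second source, and `part` selects its lower (0) or upper (1) half. Wrap-around modulo 2^(2*esize) is an assumption of the sketch.

```python
def uaddw(vn_wide, vm, esize, part):
    """Add the selected half of narrow vm elements to wide vn elements."""
    half = len(vm) // 2
    wide_mask = (1 << (2 * esize)) - 1
    return [(vn_wide[e] + vm[part * half + e]) & wide_mask
            for e in range(half)]


vn_wide = [2**64 - 1, 5]   # 2D
vm = [1, 2, 3, 4]          # 4S
print(uaddw(vn_wide, vm, esize=32, part=0))  # [0, 7]
print(uaddw(vn_wide, vm, esize=32, part=1))  # [2, 9]
```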

Operational information

UCVTF (scalar, fixed-point)

Unsigned fixed-point Convert to Floating-point (scalar). This instruction converts the unsigned value in the 32-bit or
64-bit general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR,
and writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
Bits:   31  30  29  28-24  23-22  21  20-19     18-16       15-10  9-5  4-0
Field:  sf  0   0   11110  ftype  0   rmode=00  opcode=011  scale  Rn   Rd

32-bit to half-precision (sf == 0 && ftype == 11)

(FEAT_FP16)

UCVTF <Hd>, <Wn>, #<fbits>

32-bit to single-precision (sf == 0 && ftype == 00)

UCVTF <Sd>, <Wn>, #<fbits>

32-bit to double-precision (sf == 0 && ftype == 01)

UCVTF <Dd>, <Wn>, #<fbits>

64-bit to half-precision (sf == 1 && ftype == 11)

(FEAT_FP16)

UCVTF <Hd>, <Xn>, #<fbits>

64-bit to single-precision (sf == 1 && ftype == 00)

UCVTF <Sd>, <Xn>, #<fbits>

64-bit to double-precision (sf == 1 && ftype == 01)

UCVTF <Dd>, <Xn>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00' fltsize = 32;
    when '01' fltsize = 64;
    when '10' UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

if sf == '0' && scale<5> == '0' then UNDEFINED;

integer fracbits = 64 - UInt(scale);

rounding = FPRoundingMode(FPCR[]);

Assembler Symbols

<Dd> Is the 64-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Sd> Is the 32-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Xn> Is the 64-bit name of the general-purpose source register, encoded in the "Rn" field.
<Wn> Is the 32-bit name of the general-purpose source register, encoded in the "Rn" field.
<fbits> For the 32-bit to double-precision, 32-bit to half-precision and 32-bit to single-precision variant: is the
number of bits after the binary point in the fixed-point source, in the range 1 to 32, encoded as 64
minus "scale".
For the 64-bit to double-precision, 64-bit to half-precision and 64-bit to single-precision variant: is the
number of bits after the binary point in the fixed-point source, in the range 1 to 64, encoded as 64
minus "scale".

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
integer fsize = if merge then 128 else fltsize;
bits(fsize) fltval;
bits(intsize) intval;

intval = X[n];
fltval = if merge then V[d] else Zeros();
Elem[fltval, 0, fltsize] = FixedToFP(intval, fracbits, TRUE, fpcr, rounding);
V[d] = fltval;
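
The relationship between `#<fbits>`, the "scale" field and the converted value can be illustrated with a small Python sketch. Both function names are hypothetical. The conversion models only a double-precision destination with round-to-nearest, and assumes that Python's `float` is an IEEE 754 double; it ignores FPCR rounding modes, exception flags and merging.

```python
def encode_scale(fbits, sf):
    """Return the 6-bit 'scale' field for #<fbits>: 64 minus fbits."""
    limit = 64 if sf == 1 else 32
    if not 1 <= fbits <= limit:
        raise ValueError("fbits out of range for this variant")
    return 64 - fbits


def ucvtf_fixed(intval, fbits, intsize=32):
    """Interpret intval as unsigned fixed-point with fbits fraction bits."""
    intval &= (1 << intsize) - 1
    return intval / (1 << fbits)


print(encode_scale(16, sf=0))       # 48 (0b110000, so scale<5> is 1)
print(ucvtf_fixed(0x00018000, 16))  # 1.5
```

For the 32-bit variants the range 1 to 32 keeps scale between 32 and 63, which is why the decode rejects `sf == '0'` with `scale<5> == '0'`.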

UCVTF (scalar, integer)

Unsigned integer Convert to Floating-point (scalar). This instruction converts the unsigned integer value in the
general-purpose source register to a floating-point value using the rounding mode that is specified by the FPCR, and
writes the result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits:   31  30  29  28-24  23-22  21  20-19     18-16       15-10   9-5  4-0
Field:  sf  0   0   11110  ftype  1   rmode=00  opcode=011  000000  Rn   Rd

32-bit to half-precision (sf == 0 && ftype == 11)

(FEAT_FP16)

UCVTF <Hd>, <Wn>

32-bit to single-precision (sf == 0 && ftype == 00)

UCVTF <Sd>, <Wn>

32-bit to double-precision (sf == 0 && ftype == 01)

UCVTF <Dd>, <Wn>

64-bit to half-precision (sf == 1 && ftype == 11)

(FEAT_FP16)

UCVTF <Hd>, <Xn>

64-bit to single-precision (sf == 1 && ftype == 00)

UCVTF <Sd>, <Xn>

64-bit to double-precision (sf == 1 && ftype == 01)

UCVTF <Dd>, <Xn>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer intsize = if sf == '1' then 64 else 32;

integer fltsize;
FPRounding rounding;

case ftype of
    when '00'
        fltsize = 32;
    when '01'
        fltsize = 64;
    when '10'
        UNDEFINED;
    when '11'
        if HaveFP16Ext() then
            fltsize = 16;
        else
            UNDEFINED;

rounding = FPRoundingMode(FPCR[]);

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();

FPCRType fpcr = FPCR[];

boolean merge = IsMerging(fpcr);
integer fsize = if merge then 128 else fltsize;
bits(fsize) fltval;
bits(intsize) intval;

intval = X[n];
fltval = if merge then V[d] else Zeros();
Elem[fltval, 0, fltsize] = FixedToFP(intval, 0, TRUE, fpcr, rounding);
V[d] = fltval;

UCVTF (vector, fixed-point)

Unsigned fixed-point Convert to Floating-point (vector). This instruction converts each element in a vector from fixed-
point to floating-point using the rounding mode that is specified by the FPCR, and writes the result to the SIMD&FP
destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector.

Scalar

Bits:   31-30  29   28-23   22-19       18-16  15-10   9-5  4-0
Field:  01     U=1  111110  immh!=0000  immb   111001  Rn   Rd

UCVTF <V><d>, <V><n>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '000x' || (immh == '001x' && !HaveFP16Ext()) then UNDEFINED;

integer esize = if immh == '1xxx' then 64 else if immh == '01xx' then 32 else 16;
integer datasize = esize;
integer elements = 1;

integer fracbits = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
FPRounding rounding = FPRoundingMode(FPCR[]);

Vector

Bits:   31  30  29   28-23   22-19       18-16  15-10   9-5  4-0
Field:  0   Q   U=1  011110  immh!=0000  immb   111001  Rn   Rd

UCVTF <Vd>.<T>, <Vn>.<T>, #<fbits>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

integer fracbits = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
FPRounding rounding = FPRoundingMode(FPCR[]);

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
000x RESERVED
001x H
01xx S
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 x RESERVED
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<fbits> For the scalar variant: is the number of fractional bits, in the range 1 to the operand width, encoded in
“immh:immb”:

immh <fbits>
000x RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

For the vector variant: is the number of fractional bits, in the range 1 to the element width, encoded in
“immh:immb”:

immh <fbits>
0000 SEE Advanced SIMD modified immediate
0001 RESERVED
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))
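
The tables above combine two steps: the position of the highest set bit in "immh" gives the element size, and the 7-bit value "immh:immb" is subtracted from twice that size to give `<fbits>`. A small Python sketch of that decode, with a hypothetical function name, follows. It does not check FEAT_FP16 for the 16-bit case.

```python
def decode_fbits(immh, immb):
    """Return (esize, fbits) from the 4-bit immh and 3-bit immb fields."""
    if immh == 0:
        raise ValueError("immh == 0000 is not a UCVTF encoding")
    if immh & 0b1000:
        esize = 64
    elif immh & 0b0100:
        esize = 32
    elif immh & 0b0010:
        esize = 16
    else:
        raise ValueError("immh == 0001 is reserved")
    fbits = 2 * esize - ((immh << 3) | immb)
    return esize, fbits


print(decode_fbits(0b0100, 0b000))  # (32, 32)
print(decode_fbits(0b0111, 0b111))  # (32, 1)
```

Within each "immh" pattern the result always falls in the range 1 to esize, matching the stated range of `<fbits>`.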

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

bits(esize) element;
FPCRType fpcr = FPCR[];
boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FixedToFP(element, fracbits, unsigned, fpcr, rounding);

V[d] = result;

UCVTF (vector, integer)

Unsigned integer Convert to Floating-point (vector). This instruction converts each element in a vector from an
unsigned integer value to a floating-point value using the rounding mode that is specified by the FPCR, and writes the
result to the SIMD&FP destination register.
A floating-point exception can be generated by this instruction. Depending on the settings in FPCR, the exception
results in either a flag being set in FPSR, or a synchronous exception being generated. For more information, see
Floating-point exception traps.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the Security state and
Exception level in which the instruction is executed, an attempt to execute the instruction might be trapped.
It has encodings from 4 classes: Scalar half precision, Scalar single-precision and double-precision, Vector half
precision, and Vector single-precision and double-precision.

Scalar half precision

(FEAT_FP16)

Bits:   31-30  29   28-24  23  22-17   16-12  11-10  9-5  4-0
Field:  01     U=1  11110  0   111100  11101  10     Rn   Rd

UCVTF <Hd>, <Hn>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');

Scalar single-precision and double-precision

Bits:   31-30  29   28-24  23  22  21-17  16-12  11-10  9-5  4-0
Field:  01     U=1  11110  0   sz  10000  11101  10     Rn   Rd

UCVTF <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 32 << UInt(sz);

integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');

Vector half precision

(FEAT_FP16)

Bits:   31  30  29   28-24  23  22-17   16-12  11-10  9-5  4-0
Field:  0   Q   U=1  01110  0   111100  11101  10     Rn   Rd

UCVTF <Vd>.<T>, <Vn>.<T>

if !HaveFP16Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 16;

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');

Vector single-precision and double-precision

Bits:   31  30  29   28-24  23  22  21-17  16-12  11-10  9-5  4-0
Field:  0   Q   U=1  01110  0   sz  10000  11101  10     Rn   Rd

UCVTF <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz:Q == '10' then UNDEFINED;

integer esize = 32 << UInt(sz);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');

Assembler Symbols

<Hd> Is the 16-bit name of the SIMD&FP destination register, encoded in the "Rd" field.
<Hn> Is the 16-bit name of the SIMD&FP source register, encoded in the "Rn" field.

<V> Is a width specifier, encoded in “sz”:

sz <V>
0 S
1 D

<T> For the half-precision variant: is an arrangement specifier, encoded in “Q”:

Q <T>
0 4H
1 8H

For the single-precision and double-precision variant: is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 0 RESERVED
1 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];

FPCRType fpcr = FPCR[];

boolean merge = elements == 1 && IsMerging(fpcr);
bits(128) result = if merge then V[d] else Zeros();

FPRounding rounding = FPRoundingMode(fpcr);

bits(esize) element;
for e = 0 to elements-1
    element = Elem[operand, e, esize];
    Elem[result, e, esize] = FixedToFP(element, 0, unsigned, fpcr, rounding);

V[d] = result;

UDOT (by element)

Dot Product unsigned arithmetic (vector, by element). This instruction performs the dot product of the four 8-bit
elements in each 32-bit element of the first source register with the four 8-bit elements of an indexed 32-bit element in
the second source register, accumulating the result into the corresponding 32-bit element of the destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.

Note

ID_AA64ISAR0_EL1.DP indicates whether this instruction is supported.

Vector
(FEAT_DotProd)

Bits:   31  30  29   28-24  23-22  21  20  19-16  15-12  11  10  9-5  4-0
Field:  0   Q   U=1  01111  size   L   M   Rm     1110   H   0   Rn   Rd

UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]

if !HaveDOTPExt() then UNDEFINED;

if size != '10' then UNDEFINED;
boolean signed = (U == '0');

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(M:Rm);
integer index = UInt(H:L);

integer esize = 8 << UInt(size);

integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 8B
1 16B

<Vm> Is the name of the second SIMD&FP source register, encoded in the "M:Rm" fields.
<index> Is the element index, encoded in the "H:L" fields.

Operation
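
The operation listing is not reproduced in this extract. Going only by the prose description at the top of this entry, the by-element dot product can be sketched in Python as below. This is an illustration, not the architectural pseudocode. The names are hypothetical: `acc` holds the 32-bit destination elements, `vn` the unsigned bytes of the first source, `vm` the 16 unsigned bytes of the second source, and `index` selects one group of four bytes from `vm`. Wrap-around modulo 2^32 is an assumption of the sketch.

```python
def udot_by_element(acc, vn, vm, index):
    """Accumulate 4-way unsigned byte dot products into 32-bit elements."""
    group = vm[4 * index: 4 * index + 4]  # the indexed 32-bit element of vm
    result = []
    for e, d in enumerate(acc):
        dot = sum(vn[4 * e + i] * group[i] for i in range(4))
        result.append((d + dot) & 0xFFFFFFFF)
    return result


acc = [0, 10]                    # 2S
vn = [1, 2, 3, 4, 5, 6, 7, 8]    # 8B
vm = list(range(16))             # bytes 0..15
print(udot_by_element(acc, vn, vm, index=1))  # [60, 158]
```

Every destination element uses the same four bytes of the second source, which is what distinguishes this form from the vector form of UDOT.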

UDOT (vector)

Dot Product unsigned arithmetic (vector). This instruction performs the dot product of the four unsigned 8-bit
elements in each 32-bit element of the first source register with the four unsigned 8-bit elements of the corresponding
32-bit element in the second source register, accumulating the result into the corresponding 32-bit element of the
destination register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
In Armv8.2 and Armv8.3, this is an OPTIONAL instruction. From Armv8.4 it is mandatory for all implementations to
support it.
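
The vector form can be sketched in the same spirit. This is an illustrative Python model under the assumption that registers are lists of lane values; `udot_vector` is a hypothetical name, not an architectural function.

```python
def udot_vector(acc, vn, vm):
    """Sketch of UDOT (vector).

    acc    : list of 32-bit accumulator lanes (2 for 2S, 4 for 4S)
    vn, vm : lists of unsigned bytes (8 for 8B, 16 for 16B)
    """
    if len(vn) != len(vm) or len(vn) != 4 * len(acc):
        raise ValueError("sources must hold four bytes per accumulator lane")
    result = []
    for e, lane in enumerate(acc):
        # Each lane uses its own group of four bytes from both sources.
        dot = sum(vn[4 * e + i] * vm[4 * e + i] for i in range(4))
        result.append((lane + dot) & 0xFFFFFFFF)
    return result
```

The largest possible dot product of one group is 4 * 255 * 255 = 260100, so a single step cannot overflow a 32-bit lane that starts at zero; repeated accumulation can still wrap.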

Note

ID_AA64ISAR0_EL1.DP indicates whether this instruction is supported.

Vector
(FEAT_DotProd)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 0 Rm 1 0 0 1 0 1 Rn Rd
U

UDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

if !HaveDOTPExt() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 8B
1 16B

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

UHADD

Unsigned Halving Add. This instruction adds corresponding unsigned integer values from the two source SIMD&FP
registers, shifts each result right one bit, places the results into a vector, and writes the vector to the destination
SIMD&FP register.
The results are truncated. For rounded results, see URHADD.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 0 0 0 1 Rn Rd
U

UHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer sum;

for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
sum = element1 + element2;
Elem[result, e, esize] = sum<esize:1>;

V[d] = result;
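
The loop above can be modeled with a short Python sketch. It assumes each register is a list of unsigned lane values; `uhadd` is a hypothetical helper name.

```python
def uhadd(vn, vm, esize):
    """Sketch of UHADD: per-lane (a + b) >> 1 without losing the carry bit."""
    mask = (1 << esize) - 1
    # The sum is formed at full precision, then bits <esize:1> are kept,
    # so the carry out of the top bit becomes the top bit of the result.
    return [((a + b) >> 1) & mask for a, b in zip(vn, vm)]
```

For example, with 8-bit lanes, 255 + 255 = 510 halves to 255 instead of wrapping, and 1 + 2 = 3 truncates to 1.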

Operational information

If PSTATE.DIT is 1:

UHSUB

Unsigned Halving Subtract. This instruction subtracts the vector elements in the second source SIMD&FP register
from the corresponding vector elements in the first source SIMD&FP register, shifts each result right one bit, places
each result into a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 0 0 1 Rn Rd
U

UHSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer diff;

for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
element2 = Int(Elem[operand2, e, esize], unsigned);
diff = element1 - element2;
Elem[result, e, esize] = diff<esize:1>;

V[d] = result;
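
A Python sketch of the same loop follows, assuming list-of-lanes registers and a hypothetical helper name `uhsub`. It relies on Python integers behaving as unbounded two's complement values under `>>` and `&`, which matches taking bits <esize:1> of the difference.

```python
def uhsub(vn, vm, esize):
    """Sketch of UHSUB: per-lane bits <esize:1> of (a - b)."""
    mask = (1 << esize) - 1
    # A negative difference keeps its sign bits after the shift,
    # and the mask then selects the low esize bits of that value.
    return [((a - b) >> 1) & mask for a, b in zip(vn, vm)]
```

With 8-bit lanes, 0 - 1 gives -1, whose bits <8:1> are all ones, so the lane result is 255.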

Operational information

If PSTATE.DIT is 1:
• The execution time of this instruction is independent of:

UMAX

Unsigned Maximum (vector). This instruction compares corresponding elements in the vectors in the two source
SIMD&FP registers, places the larger of each pair of unsigned integer values into a vector, and writes the vector to
the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 0 0 1 Rn Rd
U o1

UMAX <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean minimum = (o1 == '1');
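
The two decoded flags select between maximum and minimum and between unsigned and signed interpretation. A minimal Python sketch of that selection follows; the names `to_int` and `max_min_vector` are hypothetical, and registers are assumed to be lists of raw lane bit patterns.

```python
def to_int(bits, esize, unsigned):
    """Interpret a raw esize-bit pattern as an unsigned or signed integer."""
    if unsigned or bits < (1 << (esize - 1)):
        return bits
    return bits - (1 << esize)


def max_min_vector(vn, vm, esize, unsigned=True, minimum=False):
    """Per-lane maximum or minimum, returned as raw esize-bit patterns."""
    pick = min if minimum else max
    mask = (1 << esize) - 1
    return [
        pick(to_int(a, esize, unsigned), to_int(b, esize, unsigned)) & mask
        for a, b in zip(vn, vm)
    ]
```

For UMAX both defaults apply (`unsigned=True`, `minimum=False`), so the 8-bit pattern 0x80 compares greater than 0x7F; under a signed interpretation the order would reverse.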

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer maxmin;

V[d] = result;

Operational information

UMAXP

Unsigned Maximum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements from the concatenated vector, writes the larger of each pair of unsigned integer values into a
vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
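
The pairwise behaviour can be sketched in Python as follows. This is an illustration only: it assumes registers are lists of unsigned lane values with the least significant lane first, and `umaxp` is a hypothetical name.

```python
def umaxp(vn, vm):
    """Sketch of UMAXP: maximum of each adjacent pair across both sources."""
    # The first source supplies the low lanes of the concatenated vector,
    # the second source supplies the high lanes.
    concat = list(vn) + list(vm)
    return [max(concat[2 * e], concat[2 * e + 1]) for e in range(len(vn))]
```

The low half of the result therefore comes from pairs within the first source, and the high half from pairs within the second source.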
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 0 0 1 Rn Rd
U o1

UMAXP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

V[d] = result;

Operational information

UMAXV

Unsigned Maximum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the largest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 1 0 0 0 0 1 0 1 0 1 0 Rn Rd
U op

UMAXV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '100' then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

boolean min = (op == '1');

Assembler Symbols

<V> Is the destination width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer maxmin;
integer element;

maxmin = Int(Elem[operand, 0, esize], unsigned);

for e = 1 to elements-1
element = Int(Elem[operand, e, esize], unsigned);
maxmin = if min then Min(maxmin, element) else Max(maxmin, element);

V[d] = maxmin<esize-1:0>;
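
A Python sketch of the reduction, including the reserved-arrangement checks from the decode above, might look as follows. The function name `umaxv` and the list-of-lanes register model are assumptions made for illustration.

```python
def umaxv(lanes, size, q, minimum=False):
    """Sketch of the across-vector reduction for UMAXV (or UMINV if minimum)."""
    # size:Q == '100' (arrangement 2S) and size == '11' are not allocated.
    if (size, q) == (0b10, 0) or size == 0b11:
        raise ValueError("UNDEFINED encoding")
    esize = 8 << size
    elements = (128 if q else 64) // esize
    if len(lanes) != elements:
        raise ValueError("lane count does not match the arrangement")
    result = lanes[0]
    for element in lanes[1:]:
        result = min(result, element) if minimum else max(result, element)
    return result  # scalar of width esize, written to the destination register
```

The reduction of a 2S vector is excluded by the encoding, so the smallest reduction over 32-bit lanes uses the 4S arrangement.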

Operational information

UMIN

Unsigned Minimum (vector). This instruction compares corresponding vector elements in the two source SIMD&FP
registers, places the smaller of each pair of unsigned integer values into a vector, and writes the vector to the
destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 1 0 1 1 Rn Rd
U o1

UMIN <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer maxmin;

V[d] = result;

Operational information

UMINP

Unsigned Minimum Pairwise. This instruction creates a vector by concatenating the vector elements of the first source
SIMD&FP register after the vector elements of the second source SIMD&FP register, reads each pair of adjacent
vector elements from the concatenated vector, writes the smaller of each pair of unsigned integer values into
a vector, and writes the vector to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 0 1 1 Rn Rd
U o1

UMINP <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

boolean minimum = (o1 == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

V[d] = result;

Operational information

UMINV

Unsigned Minimum across Vector. This instruction compares all the vector elements in the source SIMD&FP register,
and writes the smallest of the values as a scalar to the destination SIMD&FP register. All the values in this instruction
are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 1 0 0 0 1 1 0 1 0 1 0 Rn Rd
U op

UMINV <V><d>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '100' then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

boolean min = (op == '1');

Assembler Symbols

<V> Is the destination width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 RESERVED
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
integer maxmin;
integer element;

maxmin = Int(Elem[operand, 0, esize], unsigned);

for e = 1 to elements-1
element = Int(Elem[operand, e, esize], unsigned);
maxmin = if min then Min(maxmin, element) else Max(maxmin, element);

V[d] = maxmin<esize-1:0>;

Operational information

UMLAL, UMLAL2 (by element)

Unsigned Multiply-Add Long (vector, by element). This instruction multiplies each vector element in the lower or
upper half of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register and accumulates the results with the vector elements of the destination SIMD&FP register. The destination
vector elements are twice as long as the elements that are multiplied.
The UMLAL instruction extracts vector elements from the lower half of the first source register. The UMLAL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 0 1 0 H 0 Rn Rd
U o2

UMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;
otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

boolean sub_op = (o2 == '1');
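
The element index and second source register number depend on the element size, as the case statement above shows. A small Python sketch of that part of the decode follows; `decode_by_element` is a hypothetical helper and each field is passed as an integer.

```python
def decode_by_element(size, h, l, m, rm):
    """Return (index, register number) for a by-element long multiply.

    size      : 2-bit size field
    h, l, m   : single-bit fields H, L and M
    rm        : 4-bit Rm field
    """
    if size == 0b01:          # 16-bit elements: index is H:L:M, register is 0:Rm
        index = (h << 2) | (l << 1) | m
        rmhi = 0
    elif size == 0b10:        # 32-bit elements: index is H:L, register is M:Rm
        index = (h << 1) | l
        rmhi = m
    else:
        raise ValueError("UNDEFINED encoding")
    return index, (rmhi << 4) | rm
```

Because M is consumed by the index for 16-bit elements, only registers V0-V15 can be named in that case.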

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

element2 = Int(Elem[operand2, index, esize], unsigned);

V[d] = result;

Operational information

UMLAL, UMLAL2 (vector)

Unsigned Multiply-Add Long (vector). This instruction multiplies the vector elements in the lower or upper half of the
first source SIMD&FP register by the corresponding vector elements of the second source SIMD&FP register, and
accumulates the results with the vector elements of the destination SIMD&FP register. The destination vector
elements are twice as long as the elements that are multiplied.
The UMLAL instruction extracts vector elements from the lower half of the first source register. The UMLAL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 0 0 0 0 Rn Rd
U o1

UMLAL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;
boolean sub_op = (o1 == '1');
boolean unsigned = (U == '1');
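
The decoded values `part` and `sub_op` control which half of the sources is read and whether the product is added or subtracted. The following Python sketch models that behaviour for unsigned lanes; it is an illustration under the assumption that registers are lists of lane values, and `multiply_accumulate_long` is a hypothetical name.

```python
def multiply_accumulate_long(acc, vn, vm, esize, part=0, sub_op=False):
    """Sketch of the unsigned long multiply-accumulate (vector) forms.

    acc    : list of 2*esize-bit destination lanes (64 // esize of them)
    vn, vm : lists of esize-bit source lanes covering the full 128-bit register
    part   : 0 reads the lower 64 bits of the sources, 1 reads the upper 64 bits
    sub_op : False accumulates the product, True subtracts it
    """
    elements = 64 // esize
    base = part * elements
    wide_mask = (1 << (2 * esize)) - 1
    result = []
    for e in range(elements):
        product = vn[base + e] * vm[base + e]
        total = acc[e] - product if sub_op else acc[e] + product
        result.append(total & wide_mask)  # destination lanes wrap at 2*esize bits
    return result
```

With `part=0` and `sub_op=False` this corresponds to UMLAL, and with `part=1` to UMLAL2.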

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

V[d] = result;

Operational information

UMLSL, UMLSL2 (by element)

Unsigned Multiply-Subtract Long (vector, by element). This instruction multiplies each vector element in the lower or
upper half of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP
register and subtracts the results from the vector elements of the destination SIMD&FP register. The destination
vector elements are twice as long as the elements that are multiplied.
The UMLSL instruction extracts vector elements from the lower half of the first source register. The UMLSL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 0 1 1 0 H 0 Rn Rd
U o2

UMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;
otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

boolean sub_op = (o2 == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

element2 = Int(Elem[operand2, index, esize], unsigned);

V[d] = result;

Operational information

UMLSL, UMLSL2 (vector)

Unsigned Multiply-Subtract Long (vector). This instruction multiplies corresponding vector elements in the lower or
upper half of the two source SIMD&FP registers, and subtracts the results from the vector elements of the destination
SIMD&FP register. The destination vector elements are twice as long as the elements that are multiplied. All the
values in this instruction are unsigned integer values.
The UMLSL instruction extracts each source vector from the lower half of each source register. The UMLSL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 1 0 1 0 0 0 Rn Rd
U o1

UMLSL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;
boolean sub_op = (o1 == '1');
boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

V[d] = result;

Operational information

UMMLA (vector)

Unsigned 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned 8-bit integer
values in the first source vector by the 8x2 matrix of unsigned 8-bit integer values in the second source vector. The
resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator in the
destination vector. This is equivalent to performing an 8-way dot product per destination element.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.

Vector
(FEAT_I8MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 1 1 1 0 1 0 0 Rm 1 0 1 0 0 1 Rn Rd
U B

UMMLA <Vd>.4S, <Vn>.16B, <Vm>.16B

if !HaveInt8MatMulExt() then UNDEFINED;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) addend = V[d];

V[d] = MatMulAdd(addend, operand1, operand2, TRUE, TRUE);
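
The body of MatMulAdd is not shown in this section. The Python sketch below is one reading of the description above, and its data layout is an assumption: the first source is taken as two rows of eight bytes, the second source as two columns of eight bytes (each 64-bit half holding one column), and the accumulator as a row-major 2x2 matrix of 32-bit lanes. The name `ummla` is hypothetical.

```python
def ummla(acc, vn, vm):
    """Sketch of UMMLA: acc (2x2, 32-bit) += vn (2x8, u8) * vm (8x2, u8)."""
    result = list(acc)
    for i in range(2):          # row of the first source
        for j in range(2):      # column of the second source
            total = acc[2 * i + j]
            for k in range(8):  # 8-way dot product per destination element
                total += vn[8 * i + k] * vm[8 * j + k]
            result[2 * i + j] = total & 0xFFFFFFFF
    return result
```

Under this layout each destination element is the dot product of one 8-byte half of the first source with one 8-byte half of the second source, added to the existing accumulator value.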

UMOV

Unsigned Move vector element to general-purpose register. This instruction reads the unsigned integer from the
source SIMD&FP register, zero-extends it to form a 32-bit or 64-bit value, and writes the result to the destination
general-purpose register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias MOV (to general).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 0 0 0 imm5 0 0 1 1 1 1 Rn Rd

32-bit (Q == 0)

UMOV <Wd>, <Vn>.<Ts>[<index>]

64-bit (Q == 1 && imm5 == x1000)

UMOV <Xd>, <Vn>.<Ts>[<index>]

integer d = UInt(Rd);
integer n = UInt(Rn);

integer size;
case Q:imm5 of
when '0xxxx1' size = 0; // UMOV Wd, Vn.B
when '0xxx10' size = 1; // UMOV Wd, Vn.H
when '0xx100' size = 2; // UMOV Wd, Vn.S
when '1x1000' size = 3; // UMOV Xd, Vn.D
otherwise UNDEFINED;

integer idxdsize = if imm5<4> == '1' then 128 else 64;

integer index = UInt(imm5<4:size+1>);
integer esize = 8 << size;
integer datasize = if Q == '1' then 64 else 32;

Assembler Symbols

<Ts> For the 32-bit variant: is an element size specifier, encoded in “imm5”:

imm5 <Ts>
xx000 RESERVED
xxxx1 B
xxx10 H
xx100 S

For the 64-bit variant: is an element size specifier, encoded in “imm5”:

imm5 <Ts>
x0000 RESERVED
xxxx1 RESERVED
xxx10 RESERVED
xx100 RESERVED
x1000 D

<index> For the 32-bit variant: is the element index encoded in “imm5”:

imm5 <index>
xx000 RESERVED
xxxx1 imm5<4:1>
xxx10 imm5<4:2>
xx100 imm5<4:3>

For the 64-bit variant: is the element index encoded in "imm5<4>".

Alias Conditions

Alias Is preferred when

MOV (to general) imm5 == 'x1000'
MOV (to general) imm5 == 'xx100'

Operation

CheckFPAdvSIMDEnabled64();
bits(idxdsize) operand = V[n];

X[d] = ZeroExtend(Elem[operand, index, esize], datasize);
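
The decode and operation can be combined into a short Python sketch. It assumes the source register is modeled as a single 128-bit integer; the name `umov` and its parameters are hypothetical.

```python
def umov(vreg, q, imm5):
    """Sketch of UMOV: extract one element of vreg, zero-extended.

    vreg : 128-bit register contents as a Python integer
    q    : the Q bit (0 for the 32-bit variant, 1 for the 64-bit variant)
    imm5 : 5-bit field whose lowest set bit selects the element size
    """
    if q == 0 and imm5 & 0b1 == 0b1:
        size = 0                      # B
    elif q == 0 and imm5 & 0b11 == 0b10:
        size = 1                      # H
    elif q == 0 and imm5 & 0b111 == 0b100:
        size = 2                      # S
    elif q == 1 and imm5 & 0b1111 == 0b1000:
        size = 3                      # D
    else:
        raise ValueError("UNDEFINED encoding")
    index = imm5 >> (size + 1)        # the bits above the size marker
    esize = 8 << size
    # A non-negative Python integer is already zero-extended.
    return (vreg >> (index * esize)) & ((1 << esize) - 1)
```

For example, `imm5 = 0b00110` selects halfword elements (low bits `10`) and index 1 (the remaining upper bits).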

Operational information

UMULL, UMULL2 (by element)

Unsigned Multiply Long (vector, by element). This instruction multiplies each vector element in the lower or upper half
of the first source SIMD&FP register by the specified vector element of the second source SIMD&FP register, places
the results in a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are
twice as long as the elements that are multiplied.
The UMULL instruction extracts vector elements from the lower half of the first source register. The UMULL2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 size L M Rm 1 0 1 0 H 0 Rn Rd
U

UMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Ts>[<index>]

integer idxdsize = if H == '1' then 128 else 64;

integer index;
bit Rmhi;
case size of
when '01' index = UInt(H:L:M); Rmhi = '0';
when '10' index = UInt(H:L); Rmhi = M;
otherwise UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rmhi:Rm);

integer esize = 8 << UInt(size);

integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;
boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 RESERVED
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 x RESERVED
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in “size:M:Rm”:

size <Vm>
00 RESERVED
01 0:Rm
10 M:Rm
11 RESERVED
Restricted to V0-V15 when element size <Ts> is H.

<Ts> Is an element size specifier, encoded in “size”:

size <Ts>
00 RESERVED
01 H
10 S
11 RESERVED

<index> Is the element index, encoded in “size:L:H:M”:

size <index>
00 RESERVED
01 H:L:M
10 H:L
11 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(idxdsize) operand2 = V[m];
bits(2*datasize) result;
integer element1;
integer element2;
bits(2*esize) product;

element2 = Int(Elem[operand2, index, esize], unsigned);

for e = 0 to elements-1
element1 = Int(Elem[operand1, e, esize], unsigned);
product = (element1*element2)<2*esize-1:0>;
Elem[result, e, 2*esize] = product;

V[d] = result;
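
The operation above can be modeled in Python as follows. This is a sketch assuming registers are lists of unsigned lane values; `umull_by_element` is a hypothetical name.

```python
def umull_by_element(vn, vm, index, esize, part=0):
    """Sketch of UMULL/UMULL2 (by element).

    vn    : list of esize-bit lanes covering the full 128-bit first source
    vm    : list of esize-bit lanes of the second source
    index : lane of vm used as the common multiplier
    part  : 0 for UMULL (lower half of vn), 1 for UMULL2 (upper half)
    """
    elements = 64 // esize
    base = part * elements
    wide_mask = (1 << (2 * esize)) - 1
    element2 = vm[index]  # read once, outside the loop
    return [(vn[base + e] * element2) & wide_mask for e in range(elements)]
```

The multiplier lane is fetched once and reused for every element, mirroring the single `element2` assignment before the loop.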

Operational information

UMULL, UMULL2 (vector)

Unsigned Multiply Long (vector). This instruction multiplies corresponding vector elements in the lower or upper half
of the two source SIMD&FP registers, places the result in a vector, and writes the vector to the destination SIMD&FP
register. The destination vector elements are twice as long as the elements that are multiplied. All the values in this
instruction are unsigned integer values.
The UMULL instruction extracts each source vector from the lower half of each source register. The UMULL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29   28:24  23:22  21  20:16  15:12  11:10  9:5  4:0
Value  0   Q   U=1  01110  size   1   Rm     1100   00     Rn   Rd

UMULL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    Elem[result, e, 2*esize] = (element1*element2)<2*esize-1:0>;

V[d] = result;
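
As an illustration of how the lower-half and upper-half forms differ, the vector operation can be sketched in Python. The function name umull_vector and the list-of-lanes register model are assumptions for this sketch, not part of the architecture.

```python
def umull_vector(vn_lanes, vm_lanes, esize, part=0):
    """Model UMULL (part=0) or UMULL2 (part=1) on lists of unsigned lanes."""
    elements = 64 // esize              # lanes taken from one 64-bit half
    mask_in = (1 << esize) - 1
    lo, hi = part * elements, (part + 1) * elements
    # Corresponding lanes of the selected halves are multiplied pairwise.
    return [(a & mask_in) * (b & mask_in)
            for a, b in zip(vn_lanes[lo:hi], vm_lanes[lo:hi])]


# Roughly UMULL v0.8h, v1.8b, v2.8b and UMULL2 v0.8h, v1.16b, v2.16b
vn = list(range(16))
vm = [255] * 16
lower = umull_vector(vn, vm, esize=8)
upper = umull_vector(vn, vm, esize=8, part=1)
```

A common use is to pair the two forms: UMULL widens the low eight bytes and UMULL2 the high eight, so together they cover a full 128-bit source.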

Operational information

UQADD

Unsigned saturating Add. This instruction adds the values of corresponding elements of the two source SIMD&FP
registers, places the results into a vector, and writes the vector to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31  30  29   28:24  23:22  21  20:16  15:11  10  9:5  4:0
Value  0   1   U=1  11110  size   1   Rm     00001  1   Rn   Rd

UQADD <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');

Vector

Bits   31  30  29   28:24  23:22  21  20:16  15:11  10  9:5  4:0
Value  0   Q   U=1  01110  size   1   Rm     00001  1   Rn   Rd

UQADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer sum;
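boolean sat;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    sum = element1 + element2;
    (Elem[result, e, esize], sat) = SatQ(sum, esize, unsigned);
    if sat then FPSR.QC = '1';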

V[d] = result;
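
The saturating behaviour described for UQADD can be sketched in Python as follows. The names uqadd_lane and uqadd are hypothetical, registers are modelled as lists of unsigned lane values, and the returned boolean stands in for the cumulative FPSR.QC bit.

```python
def uqadd_lane(a, b, esize):
    """Unsigned saturating add of two esize-bit lanes: (result, saturated)."""
    max_val = (1 << esize) - 1
    total = (a & max_val) + (b & max_val)
    if total > max_val:
        return max_val, True        # clamp to the largest unsigned value
    return total, False


def uqadd(vn_lanes, vm_lanes, esize):
    """Apply uqadd_lane across a vector; qc is set if any lane saturated."""
    qc = False
    out = []
    for a, b in zip(vn_lanes, vm_lanes):
        value, sat = uqadd_lane(a, b, esize)
        out.append(value)
        qc = qc or sat
    return out, qc


result, qc = uqadd([250, 1], [10, 2], esize=8)
```

Because QC is cumulative, software that needs to detect saturation for one instruction would normally clear it first and read it back afterwards.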

UQRSHL

Unsigned saturating Rounding Shift Left (register). This instruction takes each vector element of the first source
SIMD&FP register, shifts the vector element by a value from the least significant byte of the corresponding vector
element of the second source SIMD&FP register, places the results into a vector, and writes the vector to the
destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. For
truncated results, see UQSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31  30  29   28:24  23:22  21  20:16  15:13  12   11   10  9:5  4:0
Value  0   1   U=1  11110  size   1   Rm     010    R=1  S=1  1   Rn   Rd

UQRSHL <V><d>, <V><n>, <V><m>

Vector

Bits   31  30  29   28:24  23:22  21  20:16  15:13  12   11   10  9:5  4:0
Value  0   Q   U=1  01110  size   1   Rm     010    R=1  S=1  1   Rn   Rd

UQRSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

integer round_const = 0;
integer shift;
integer element;
boolean sat;

V[d] = result;
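
The behaviour described for UQRSHL (a signed shift count taken from the least significant byte of each second-source element, a left shift for positive counts, a rounding right shift for negative counts, and unsigned saturation of the result) can be sketched per lane in Python. The function name uqrshl_lane is an assumption, and the sketch follows the prose description rather than reproducing the architecture pseudocode.

```python
def uqrshl_lane(value, shift_lane, esize):
    """Model one UQRSHL lane: returns (result, saturated)."""
    max_val = (1 << esize) - 1
    value &= max_val
    # The shift count is the signed value of the lowest byte of the lane.
    shift = shift_lane & 0xFF
    if shift >= 0x80:
        shift -= 0x100
    if shift >= 0:
        element = value << shift                 # left shift, no rounding
    else:
        n = -shift
        element = (value + (1 << (n - 1))) >> n  # rounding right shift
    if element > max_val:
        return max_val, True
    return element, False


left = uqrshl_lane(0x40, 1, 8)
overflow = uqrshl_lane(0x80, 1, 8)
right = uqrshl_lane(5, 0xFF, 8)      # shift count -1
```

Only left shifts can saturate for unsigned data; a right shift always yields a value that fits in the element.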

UQRSHRN, UQRSHRN2

Unsigned saturating Rounded Shift Right Narrow (immediate). This instruction reads each vector element in the
source SIMD&FP register, right shifts each element by an immediate value, saturates each shifted result to a value that
is half the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the
destination SIMD&FP register. All the values in this instruction are unsigned integer values. The results are rounded.
For truncated results, see UQSHRN.
The UQRSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the UQRSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31  30  29   28:23   22:19         18:16  15:12  11    10  9:5  4:0
Value  0   1   U=1  111110  immh != 0000  immb   1001   op=1  1   Rn   Rd

UQRSHRN <Vb><d>, <Va><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then UNDEFINED;

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = esize;
integer elements = 1;
integer part = 0;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');
boolean unsigned = (U == '1');

Vector

Bits   31  30  29   28:23   22:19         18:16  15:12  11    10  9:5  4:0
Value  0   Q   U=1  011110  immh != 0000  immb   1001   op=1  1   Rn   Rd

UQRSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');
boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<Vb> Is the destination width specifier, encoded in “immh”:

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Va> Is the source width specifier, encoded in “immh”:

immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED

<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:

immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;

for e = 0 to elements-1
    element = (Int(Elem[operand, e, 2*esize], unsigned) + round_const) >> shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
    if sat then FPSR.QC = '1';

Vpart[d, part] = result;
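
The per-element step of the loop above can be sketched in Python to show how rounding interacts with saturation. The function name uq_shift_right_narrow and its rounding parameter are assumptions; with rounding=False the same sketch corresponds to the truncating form.

```python
def uq_shift_right_narrow(value, shift, esize, rounding=True):
    """Shift a 2*esize-bit unsigned value right, then saturate to esize bits.

    Returns (result, saturated). shift is expected to be in 1..esize.
    """
    round_const = (1 << (shift - 1)) if rounding else 0
    element = (value + round_const) >> shift
    max_val = (1 << esize) - 1
    if element > max_val:
        return max_val, True
    return element, False


# 16-bit source elements narrowed to 8 bits with a shift of 4
rounded = uq_shift_right_narrow(0x0FF8, 4, 8)
truncated = uq_shift_right_narrow(0x0FF8, 4, 8, rounding=False)
```

In this example the rounding constant carries the value past 0xFF, so the rounded form saturates while the truncating form returns 0xFF without saturating.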

UQSHL (immediate)

Unsigned saturating Shift Left (immediate). This instruction takes each vector element in the source SIMD&FP
register, shifts it by an immediate value, places the results in a vector, and writes the vector to the destination
SIMD&FP register. The results are truncated. For rounded results, see UQRSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31  30  29   28:23   22:19         18:16  15:13  12    11  10  9:5  4:0
Value  0   1   U=1  111110  immh != 0000  immb   011    op=1  0   1   Rn   Rd

UQSHL <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then UNDEFINED;

integer esize = 8 << HighestSetBit(immh);
integer datasize = esize;
integer elements = 1;

integer shift = UInt(immh:immb) - esize;

Vector

Bits   31  30  29   28:23   22:19         18:16  15:13  12    11  10  9:5  4:0
Value  0   Q   U=1  011110  immh != 0000  immb   011    op=1  0   1   Rn   Rd

UQSHL <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = UInt(immh:immb) - esize;

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0000 RESERVED
0001 B
001x H
01xx S
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the left shift amount, in the range 0 to the operand width in bits minus 1,
encoded in “immh:immb”:

immh <shift>
0000 RESERVED
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)

For the vector variant: is the left shift amount, in the range 0 to the element width in bits minus 1,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx (UInt(immh:immb)-64)
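
The immh:immb tables above combine the element size and the shift amount in one 7-bit field. A minimal Python sketch of the decode, assuming a hypothetical helper name decode_left_shift and integer inputs for the two fields:

```python
def decode_left_shift(immh, immb):
    """Return (esize, shift) for a left shift encoded in immh:immb.

    immh is the 4-bit field and immb the 3-bit field, both as integers.
    """
    if immh == 0:
        # immh == '0000' does not encode a shift (RESERVED or another group)
        raise ValueError("immh == 0 does not encode a shift amount")
    esize = 8 << (immh.bit_length() - 1)    # 8 << HighestSetBit(immh)
    shift = ((immh << 3) | immb) - esize    # UInt(immh:immb) - esize
    return esize, shift


byte_shift = decode_left_shift(0b0001, 0b011)
word_shift = decode_left_shift(0b0101, 0b010)
```

The position of the highest set bit of immh selects the element size, and the remaining lower bits of immh:immb hold the shift, which is why the range is 0 to the element width minus 1.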

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
integer element;
boolean sat;

for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], src_unsigned) << shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, dst_unsigned);
    if sat then FPSR.QC = '1';

V[d] = result;

UQSHL (register)

Unsigned saturating Shift Left (register). This instruction takes each element in the vector of the first source
SIMD&FP register, shifts the element by a value from the least significant byte of the corresponding element of the
second source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP
register.
If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. For
rounded results, see UQRSHL.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31  30  29   28:24  23:22  21  20:16  15:13  12   11   10  9:5  4:0
Value  0   1   U=1  11110  size   1   Rm     010    R=0  S=1  1   Rn   Rd

UQSHL <V><d>, <V><n>, <V><m>

Vector

Bits   31  30  29   28:24  23:22  21  20:16  15:13  12   11   10  9:5  4:0
Value  0   Q   U=1  01110  size   1   Rm     010    R=0  S=1  1   Rn   Rd

UQSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

integer round_const = 0;
integer shift;
integer element;
boolean sat;

V[d] = result;

UQSHRN, UQSHRN2

Unsigned saturating Shift Right Narrow (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each element by an immediate value, saturates each shifted result to a value that is half
the original width, puts the final result into a vector, and writes the vector to the lower or upper half of the destination
SIMD&FP register. All the values in this instruction are unsigned integer values. The results are truncated. For
rounded results, see UQRSHRN.
The UQSHRN instruction writes the vector to the lower half of the destination register and clears the upper half, while
the UQSHRN2 instruction writes the vector to the upper half of the destination register without affecting the other bits
of the register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31  30  29   28:23   22:19         18:16  15:12  11    10  9:5  4:0
Value  0   1   U=1  111110  immh != 0000  immb   1001   op=0  1   Rn   Rd

UQSHRN <Vb><d>, <Va><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then UNDEFINED;

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = esize;
integer elements = 1;
integer part = 0;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');
boolean unsigned = (U == '1');

Vector

Bits   31  30  29   28:23   22:19         18:16  15:12  11    10  9:5  4:0
Value  0   Q   U=1  011110  immh != 0000  immb   1001   op=0  1   Rn   Rd

UQSHRN{2} <Vd>.<Tb>, <Vn>.<Ta>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = (2 * esize) - UInt(immh:immb);

boolean round = (op == '1');
boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<Vb> Is the destination width specifier, encoded in “immh”:

immh <Vb>
0000 RESERVED
0001 B
001x H
01xx S
1xxx RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Va> Is the source width specifier, encoded in “immh”:

immh <Va>
0000 RESERVED
0001 H
001x S
01xx D
1xxx RESERVED

<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to the destination operand width in bits,
encoded in “immh:immb”:

immh <shift>
0000 RESERVED
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

For the vector variant: is the right shift amount, in the range 1 to the destination element width in bits,
encoded in “immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize*2) operand = V[n];
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;
boolean sat;

for e = 0 to elements-1
    element = (Int(Elem[operand, e, 2*esize], unsigned) + round_const) >> shift;
    (Elem[result, e, esize], sat) = SatQ(element, esize, unsigned);
    if sat then FPSR.QC = '1';

Vpart[d, part] = result;

UQSUB

Unsigned saturating Subtract. This instruction subtracts the element values of the second source SIMD&FP register
from the corresponding element values of the first source SIMD&FP register, places the results into a vector, and
writes the vector to the destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31  30  29   28:24  23:22  21  20:16  15:11  10  9:5  4:0
Value  0   1   U=1  11110  size   1   Rm     00101  1   Rn   Rd

UQSUB <V><d>, <V><n>, <V><m>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer esize = 8 << UInt(size);
integer datasize = esize;
integer elements = 1;
boolean unsigned = (U == '1');

Vector

Bits   31  30  29   28:24  23:22  21  20:16  15:11  10  9:5  4:0
Value  0   Q   U=1  01110  size   1   Rm     00101  1   Rn   Rd

UQSUB <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;
integer diff;
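boolean sat;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    diff = element1 - element2;
    (Elem[result, e, esize], sat) = SatQ(diff, esize, unsigned);
    if sat then FPSR.QC = '1';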

V[d] = result;
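
For unsigned subtraction, saturation can only occur at the lower bound, where a negative difference is clamped to zero. A minimal Python sketch, with the hypothetical name uqsub_lane and a boolean standing in for the effect on FPSR.QC:

```python
def uqsub_lane(a, b, esize):
    """Unsigned saturating subtract of two esize-bit lanes: (result, saturated)."""
    mask = (1 << esize) - 1
    diff = (a & mask) - (b & mask)
    if diff < 0:
        return 0, True      # result would be negative: clamp to zero
    return diff, False


clamped = uqsub_lane(5, 10, 8)
exact = uqsub_lane(10, 5, 8)
```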

UQXTN, UQXTN2

Unsigned saturating extract Narrow. This instruction reads each vector element from the source SIMD&FP register,
saturates each value to half the original width, places the result into a vector, and writes the vector to the destination
SIMD&FP register. All the values in this instruction are unsigned integer values.
If saturation occurs, the cumulative saturation bit FPSR.QC is set.
The UQXTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
UQXTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31  30  29   28:24  23:22  21:17  16:12  11:10  9:5  4:0
Value  0   1   U=1  11110  size   10000  10100  10     Rn   Rd

UQXTN <Vb><d>, <Va><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = esize;
integer part = 0;
integer elements = 1;

boolean unsigned = (U == '1');

Vector

Bits   31  30  29   28:24  23:22  21:17  16:12  11:10  9:5  4:0
Value  0   Q   U=1  01110  size   10000  10100  10     Rn   Rd

UQXTN{2} <Vd>.<Tb>, <Vn>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vb> Is the destination width specifier, encoded in “size”:

size <Vb>
00 B
01 H
10 S
11 RESERVED

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.

<Va> Is the source width specifier, encoded in “size”:

size <Va>
00 H
01 S
10 D
11 RESERVED

<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) element;
boolean sat;

for e = 0 to elements-1
    element = Elem[operand, e, 2*esize];
    (Elem[result, e, esize], sat) = SatQ(Int(element, unsigned), esize, unsigned);
    if sat then FPSR.QC = '1';

Vpart[d, part] = result;
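
The narrowing loop above amounts to clamping each wide element to the range of the narrow element. A minimal Python sketch, assuming the hypothetical name uqxtn and registers modelled as lists of unsigned lane values:

```python
def uqxtn(src_lanes, esize):
    """Narrow 2*esize-bit unsigned lanes to esize bits with saturation.

    Returns (narrowed_lanes, qc), where qc models whether FPSR.QC is set.
    """
    max_val = (1 << esize) - 1
    qc = False
    out = []
    for value in src_lanes:
        if value > max_val:
            out.append(max_val)   # too large for the narrow element: clamp
            qc = True
        else:
            out.append(value)
    return out, qc


narrowed, qc = uqxtn([0x0012, 0x0100, 0xFFFF, 0x00FF], esize=8)
```

The sketch returns only the narrowed half; where that half is written (lower half with the upper half cleared, or upper half with the lower half preserved) depends on whether the UQXTN or UQXTN2 form is used.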

URECPE

Unsigned Reciprocal Estimate. This instruction reads each vector element from the source SIMD&FP register,
calculates an approximate inverse for the unsigned integer value, places the result into a vector, and writes the vector
to the destination SIMD&FP register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29  28:24  23  22  21:17  16:12  11:10  9:5  4:0
Value  0   Q   0   01110  1   sz  10000  11100  10     Rn   Rd

URECPE <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz == '1' then UNDEFINED;

integer esize = 32;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(32) element;

for e = 0 to elements-1
    element = Elem[operand, e, 32];
    Elem[result, e, 32] = UnsignedRecipEstimate(element);

V[d] = result;
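
UnsignedRecipEstimate is a shared pseudocode function that is not reproduced in this section. The Python sketch below models it as it is commonly described for Armv8-A: the operand is treated as a fixed-point value in [0.5, 1.0), and the estimate is built from its top nine bits. This is an assumption and should be checked against the architecture's shared pseudocode; the names recip_estimate and unsigned_recip_estimate are hypothetical.

```python
def recip_estimate(a):
    """9-bit reciprocal estimate for a in 256..511 (assumed algorithm)."""
    assert 256 <= a < 512
    a = a * 2 + 1               # round to nearest
    b = (1 << 19) // a
    r = (b + 1) // 2            # round to nearest
    assert 256 <= r < 512
    return r


def unsigned_recip_estimate(operand):
    """Model of the 32-bit unsigned reciprocal estimate (assumed behaviour)."""
    operand &= 0xFFFFFFFF
    if operand & 0x80000000 == 0:
        # Operands of 0x7FFFFFFF or less produce all ones
        return 0xFFFFFFFF
    # Top nine bits select the estimate; the low 23 result bits are zero
    return recip_estimate(operand >> 23) << 23


smallest = unsigned_recip_estimate(0x80000000)
largest = unsigned_recip_estimate(0xFFFFFFFF)
```

Under these assumptions, only the top nine bits of the operand influence the result, so the estimate has roughly eight bits of accuracy.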

URHADD

Unsigned Rounding Halving Add. This instruction adds corresponding unsigned integer values from the two source
SIMD&FP registers, shifts each result right one bit, places the results into a vector, and writes the vector to the
destination SIMD&FP register.
The results are rounded. For truncated results, see UHADD.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
Bits   31  30  29   28:24  23:22  21  20:16  15:11  10  9:5  4:0
Value  0   Q   U=1  01110  size   1   Rm     00010  1   Rn   Rd

URHADD <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

boolean unsigned = (U == '1');

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;
integer element1;
integer element2;

for e = 0 to elements-1
    element1 = Int(Elem[operand1, e, esize], unsigned);
    element2 = Int(Elem[operand2, e, esize], unsigned);
    Elem[result, e, esize] = (element1+element2+1)<esize:1>;

V[d] = result;
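
The expression (element1+element2+1)<esize:1> computes the rounded average of the two elements using one extra bit of intermediate precision, so it cannot overflow. A minimal Python sketch, with the hypothetical function name urhadd_lane:

```python
def urhadd_lane(a, b, esize):
    """Unsigned rounding halving add of two esize-bit lanes."""
    mask = (1 << esize) - 1
    # The +1 rounds the halved sum up when the exact sum is odd.
    return ((a & mask) + (b & mask) + 1) >> 1


avg = [urhadd_lane(a, b, 8) for a, b in [(1, 2), (255, 255), (0, 0), (254, 255)]]
```

This is the operation typically used to average pixel values without widening the data first.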

URSHL

Unsigned Rounding Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP
register, shifts the vector element by a value from the least significant byte of the corresponding element of the second
source SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31  30  29   28:24  23:22  21  20:16  15:13  12   11   10  9:5  4:0
Value  0   1   U=1  11110  size   1   Rm     010    R=1  S=0  1   Rn   Rd

URSHL <V><d>, <V><n>, <V><m>

Vector

Bits   31  30  29   28:24  23:22  21  20:16  15:13  12   11   10  9:5  4:0
Value  0   Q   U=1  01110  size   1   Rm     010    R=1  S=0  1   Rn   Rd

URSHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

integer round_const = 0;
integer shift;
integer element;
boolean sat;

V[d] = result;

URSHR

Unsigned Rounding Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts each element by an immediate value, writes the final result to a vector, and writes the vector to the
destination SIMD&FP register. All the values in this instruction are unsigned integer values. The results are rounded.
For truncated results, see USHR.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

Bits   31  30  29   28:23   22:19         18:16  15:14  13    12    11  10  9:5  4:0
Value  0   1   U=1  111110  immh != 0000  immb   00     o1=1  o0=0  0   1   Rn   Rd

URSHR <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Vector

Bits   31  30  29   28:23   22:19         18:16  15:14  13    12    11  10  9:5  4:0
Value  0   Q   U=1  011110  immh != 0000  immb   00     o1=1  o0=0  0   1   Rn   Rd

URSHR <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))

For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;

operand2 = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
    Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;

V[d] = result;
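
As an informal illustration (not Arm's reference model), the per-element rounding shift above can be sketched in Python. The function name `urshr_lanes` and the representation of a vector as a list of unsigned integers are assumptions made for this example only.

```python
def urshr_lanes(lanes, esize, shift):
    """Rounding right shift of each unsigned lane.

    lanes: list of unsigned integers, each in the range 0 .. 2**esize - 1
    esize: element width in bits (8, 16, 32 or 64)
    shift: immediate shift amount, in the range 1 .. esize
    """
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    mask = (1 << esize) - 1
    # Adding half of the weight of the discarded bits rounds to nearest.
    round_const = 1 << (shift - 1)
    # Python integers are unbounded, so the addition cannot overflow
    # before the shift, matching the integer arithmetic of the pseudocode.
    return [((x + round_const) >> shift) & mask for x in lanes]
```

With `shift` equal to `esize`, every lane at or above half of the maximum value rounds to 1 and every other lane rounds to 0.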

URSQRTE

Unsigned Reciprocal Square Root Estimate. This instruction reads each vector element from the source SIMD&FP
register, calculates an approximate inverse square root for each value, places the result into a vector, and writes the
vector to the destination SIMD&FP register. All the values in this instruction are unsigned integer values.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 1 sz 1 0 0 0 0 1 1 1 0 0 1 0 Rn Rd

URSQRTE <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if sz == '1' then UNDEFINED;

integer esize = 32;
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “sz:Q”:

sz Q <T>
0 0 2S
0 1 4S
1 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;
bits(32) element;

for e = 0 to elements-1
    element = Elem[operand, e, 32];
    Elem[result, e, 32] = UnsignedRSqrtEstimate(element);

V[d] = result;

URSRA

Unsigned Rounding Shift Right and Accumulate (immediate). This instruction reads each vector element in the source
SIMD&FP register, right shifts each element by an immediate value, and adds the shifted results to the corresponding
vector elements of the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The
shifted values are rounded before accumulation. For truncated results, see USRA.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
U immh o1 o0

URSRA <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 1 1 0 1 Rn Rd
U immh o1 o0

URSRA <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))

For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;

operand2 = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
    Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;

V[d] = result;
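
The accumulate step can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: `ursra_lanes` is a hypothetical name, and vectors are modelled as lists of unsigned integers. The point it shows is that the final addition is a bit-vector addition, so it wraps modulo 2**esize and does not saturate.

```python
def ursra_lanes(acc, src, esize, shift):
    """Rounding right shift of each src lane, accumulated into acc.

    acc, src: lists of unsigned integers in the range 0 .. 2**esize - 1
    shift:    immediate shift amount, in the range 1 .. esize
    """
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    mask = (1 << esize) - 1
    round_const = 1 << (shift - 1)
    out = []
    for a, x in zip(acc, src):
        shifted = (x + round_const) >> shift
        # The sum is truncated to esize bits: wraparound, no saturation.
        out.append((a + shifted) & mask)
    return out
```

For example, an 8-bit accumulator lane holding 250 wraps to 0 when 6 is added to it.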

USDOT (by element)

Dot Product index form with unsigned and signed integers. This instruction performs the dot product of the four
unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer
values in an indexed 32-bit element of the second source register, accumulating the result into the corresponding
32-bit element of the destination register.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.

Vector
(FEAT_I8MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 1 1 0 L M Rm 1 1 1 1 H 0 Rn Rd
US

USDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.4B[<index>]

if !HaveInt8MatMulExt() then UNDEFINED;

Assembler Symbols

<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 8B
1 16B

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(128) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;
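
The Operation listing above stops after the declarations in this copy. Going only by the prose description of the instruction, the indexed dot product might be sketched in Python as below. The names `to_signed8` and `usdot_by_element`, and the modelling of registers as lists of byte values, are assumptions for illustration and are not taken from the Arm pseudocode.

```python
def to_signed8(b):
    """Reinterpret an unsigned byte value (0..255) as a signed 8-bit integer."""
    return b - 256 if b >= 128 else b


def usdot_by_element(acc, vn_bytes, vm_bytes, index):
    """Unsigned-by-signed dot product against one indexed 32-bit element.

    acc:      list of 32-bit accumulator lanes (unsigned representation)
    vn_bytes: 4 * len(acc) unsigned bytes of the first source
    vm_bytes: 16 bytes of the second source register
    index:    which 32-bit element (group of four bytes) of vm to use, 0..3
    """
    if not 0 <= index <= 3:
        raise ValueError("index must be in the range 0 to 3")
    group = [to_signed8(b) for b in vm_bytes[4 * index:4 * index + 4]]
    out = []
    for e, a in enumerate(acc):
        total = a
        for b in range(4):
            total += vn_bytes[4 * e + b] * group[b]
        out.append(total & 0xFFFFFFFF)  # keep the low 32 bits
    return out
```

Every destination lane reuses the same four signed bytes, which is what distinguishes this form from the vector form.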

USDOT (vector)

Dot Product vector form with unsigned and signed integers. This instruction performs the dot product of the four
unsigned 8-bit integer values in each 32-bit element of the first source register with the four signed 8-bit integer
values in the corresponding 32-bit element of the second source register, accumulating the result into the
corresponding 32-bit element of the destination register.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.

Vector
(FEAT_I8MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 0 0 1 1 1 0 1 0 0 Rm 1 0 0 1 1 1 Rn Rd

USDOT <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

if !HaveInt8MatMulExt() then UNDEFINED;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV 32;

Assembler Symbols

<Vd> Is the name of the SIMD&FP third source and destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “Q”:

Q <Ta>
0 2S
1 4S

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “Q”:

Q <Tb>
0 8B
1 16B

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) operand3 = V[d];
bits(datasize) result;

for e = 0 to elements-1
    bits(32) res = Elem[operand3, e, 32];
    for b = 0 to 3
        integer element1 = UInt(Elem[operand1, 4*e+b, 8]);
        integer element2 = SInt(Elem[operand2, 4*e+b, 8]);
        res = res + element1 * element2;
    Elem[result, e, 32] = res;

V[d] = result;

USHL

Unsigned Shift Left (register). This instruction takes each element in the vector of the first source SIMD&FP register,
shifts each element by a value from the least significant byte of the corresponding element of the second source
SIMD&FP register, places the results in a vector, and writes the vector to the destination SIMD&FP register.
If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift. For a
rounding shift, see URSHL.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
U R S

USHL <V><d>, <V><n>, <V><m>

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 1 0 0 0 1 Rn Rd
U R S

USHL <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
0x RESERVED
10 RESERVED
11 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the first SIMD&FP source register, encoded in the "Rn" field.

<m> Is the number of the second SIMD&FP source register, encoded in the "Rm" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

integer round_const = 0;
integer shift;
integer element;
boolean sat;

V[d] = result;
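
The element loop is not reproduced in the listing above. Based on the description of USHL (a signed shift count taken from the least significant byte of each second-source element, left for positive counts and truncating right for negative counts), a minimal Python sketch might look like this. The name `ushl_lanes` and the list-based vector model are assumptions for illustration.

```python
def ushl_lanes(src, shifts, esize):
    """Shift each unsigned lane of src by the signed low byte of the matching lane of shifts.

    src, shifts: lists of unsigned integers in the range 0 .. 2**esize - 1
    """
    mask = (1 << esize) - 1
    out = []
    for x, s in zip(src, shifts):
        amount = s & 0xFF            # only the least significant byte is used
        if amount >= 128:            # reinterpret that byte as signed
            amount -= 256
        if amount >= 0:
            out.append((x << amount) & mask)   # left shift, high bits discarded
        else:
            out.append(x >> -amount)           # truncating right shift
    return out
```

Bits above the least significant byte of each shift lane are ignored, so a 16-bit shift lane of 0x0102 shifts left by 2.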

Operational information

USHLL, USHLL2

Unsigned Shift Left Long (immediate). This instruction reads each vector element in the lower or upper half of the
source SIMD&FP register, shifts the unsigned integer value left by the specified number of bits, places the result into
a vector, and writes the vector to the destination SIMD&FP register. The destination vector elements are twice as long
as the source vector elements.
The USHLL instruction extracts vector elements from the lower half of the source register. The USHLL2 instruction
extracts vector elements from the upper half of the source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
This instruction is used by the alias UXTL, UXTL2.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 1 0 1 0 0 1 Rn Rd
U immh

USHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3> == '1' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

integer shift = UInt(immh:immb) - esize;

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

<shift> Is the left shift amount, in the range 0 to the source element width in bits minus 1, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (UInt(immh:immb)-8)
001x (UInt(immh:immb)-16)
01xx (UInt(immh:immb)-32)
1xxx RESERVED

Alias Conditions

Alias Is preferred when

UXTL, UXTL2 immb == '000' && BitCount(immh) == 1

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = Vpart[n, part];
bits(datasize*2) result;
integer element;

for e = 0 to elements-1
    element = Int(Elem[operand, e, esize], unsigned) << shift;
    Elem[result, e, 2*esize] = element<2*esize-1:0>;

V[d] = result;
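
The decode and operation steps for USHLL can be illustrated with a short Python sketch. The function names `decode_ushll` and `ushll_lanes` are hypothetical, and the source half is modelled as a list of unsigned integers. This is a sketch under those assumptions rather than a definitive implementation.

```python
def decode_ushll(immh, immb):
    """Return (esize, shift) for a 4-bit immh and a 3-bit immb."""
    if immh == 0:
        raise ValueError("immh == 0000 selects a different instruction class")
    if immh & 0b1000:
        raise ValueError("immh<3> == 1 is UNDEFINED for USHLL")
    esize = 8 << (immh.bit_length() - 1)      # 8 << HighestSetBit(immh)
    shift = ((immh << 3) | immb) - esize      # UInt(immh:immb) - esize
    return esize, shift


def ushll_lanes(src_half, esize, shift):
    """Widen each unsigned esize-bit lane to 2*esize bits and shift it left."""
    mask = (1 << (2 * esize)) - 1
    return [(x << shift) & mask for x in src_half]
```

Because the shift is at most `esize - 1` and the destination lane is twice as wide as the source lane, no bits are lost. A shift of 0 gives the plain zero-extension that the UXTL alias names.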

Operational information

USHR

Unsigned Shift Right (immediate). This instruction reads each vector element in the source SIMD&FP register, right
shifts each element by an immediate value, writes the results to a vector, and writes the vector to the destination
SIMD&FP register. All the values in this instruction are unsigned integer values. The results are truncated. For
rounded results, see URSHR.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
U immh o1 o0

USHR <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 0 0 0 1 Rn Rd
U immh o1 o0

USHR <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))

For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;

operand2 = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
    Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;

V[d] = result;
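
How the vector variant derives the element size, lane count and shift amount from `immh:immb` and `Q` can be sketched in Python. The helper name `decode_right_shift` is an assumption for this example. The arithmetic follows the vector decode pseudocode and the `<shift>` table for this instruction.

```python
def decode_right_shift(immh, immb, q):
    """Return (esize, elements, shift) for the vector variant.

    immh: 4-bit field, immb: 3-bit field, q: 0 or 1
    """
    if immh == 0:
        raise ValueError("immh == 0000 selects a different instruction class")
    if (immh & 0b1000) and q == 0:
        raise ValueError("immh<3>:Q == '10' is UNDEFINED")
    esize = 8 << (immh.bit_length() - 1)        # 8 << HighestSetBit(immh)
    datasize = 128 if q else 64
    shift = 2 * esize - ((immh << 3) | immb)    # (esize * 2) - UInt(immh:immb)
    return esize, datasize // esize, shift
```

For example, `immh:immb` = `0010:101` with `Q` = 1 decodes to eight 16-bit lanes shifted right by 11. The truncating shift itself is then simply `x >> shift` on each lane.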

Operational information

USMMLA (vector)

Unsigned and signed 8-bit integer matrix multiply-accumulate. This instruction multiplies the 2x8 matrix of unsigned
8-bit integer values in the first source vector by the 8x2 matrix of signed 8-bit integer values in the second source
vector. The resulting 2x2 32-bit integer matrix product is destructively added to the 32-bit integer matrix accumulator
in the destination vector. This is equivalent to performing an 8-way dot product per destination element.
From Armv8.2 to Armv8.5, this is an OPTIONAL instruction. From Armv8.6 it is mandatory for implementations that
include Advanced SIMD to support it. ID_AA64ISAR1_EL1.I8MM indicates whether this instruction is supported.

Vector
(FEAT_I8MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 1 1 1 0 1 0 0 Rm 1 0 1 0 1 1 Rn Rd
U B

USMMLA <Vd>.4S, <Vn>.16B, <Vm>.16B

if !HaveInt8MatMulExt() then UNDEFINED;

integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Rd);

Assembler Symbols

Operation

CheckFPAdvSIMDEnabled64();
bits(128) operand1 = V[n];
bits(128) operand2 = V[m];
bits(128) addend = V[d];

V[d] = MatMulAdd(addend, operand1, operand2, TRUE, FALSE);
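
The `MatMulAdd` helper is defined elsewhere in the Arm pseudocode and is not reproduced here. The following Python sketch shows one way to read the operation for the USMMLA case (first operand unsigned, second operand signed). The element layout is an assumption stated here for illustration: each source holds two rows of eight bytes, and the accumulator holds the 2x2 result in row-major order. The names `usmmla` and `to_signed8` are hypothetical.

```python
def to_signed8(b):
    """Reinterpret an unsigned byte value (0..255) as a signed 8-bit integer."""
    return b - 256 if b >= 128 else b


def usmmla(acc, vn_bytes, vm_bytes):
    """2x8 (unsigned) by 8x2 (signed) matrix multiply-accumulate into a 2x2 matrix.

    acc:      4 accumulator lanes of 32 bits, assumed row-major: [c00, c01, c10, c11]
    vn_bytes: 16 unsigned bytes, assumed to be row 0 then row 1 of the first matrix
    vm_bytes: 16 bytes, assumed to be column 0 then column 1 of the second matrix
    """
    out = []
    for i in range(2):
        for j in range(2):
            total = acc[2 * i + j]
            for k in range(8):       # 8-way dot product per destination lane
                total += vn_bytes[8 * i + k] * to_signed8(vm_bytes[8 * j + k])
            out.append(total & 0xFFFFFFFF)
    return out
```

Each destination lane is an 8-way dot product, as the instruction description says.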

USQADD

Unsigned saturating Accumulate of Signed value. This instruction adds the signed integer values of the vector
elements in the source SIMD&FP register to the corresponding unsigned integer values of the vector elements in the
destination SIMD&FP register, and writes the resulting unsigned integer values back to the vector elements of the
destination SIMD&FP register.
If overflow occurs with any of the results, those results are saturated. If saturation occurs, the cumulative saturation
bit FPSR.QC is set.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 0 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Rd
U

USQADD <V><d>, <V><n>

integer d = UInt(Rd);
integer n = UInt(Rn);

integer esize = 8 << UInt(size);

integer datasize = esize;
integer elements = 1;

boolean unsigned = (U == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Rd
U

USQADD <Vd>.<T>, <Vn>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

boolean unsigned = (U == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) result;

bits(datasize) operand2 = V[d];

integer op1;
integer op2;
boolean sat;
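
The Operation listing above ends at the declarations in this copy. Following the prose description only, the saturating accumulate might be sketched in Python as below. The name `usqadd_lanes` is hypothetical, and the returned boolean stands in for the cumulative saturation bit FPSR.QC being set. Neither is taken from the Arm pseudocode.

```python
def usqadd_lanes(dst, src, esize):
    """Add signed src lanes to unsigned dst lanes, saturating to 0 .. 2**esize - 1.

    dst, src: lists of raw lane values in the range 0 .. 2**esize - 1
    Returns (result_lanes, saturated), where saturated is True if any lane clamped.
    """
    max_val = (1 << esize) - 1
    sign_bit = 1 << (esize - 1)
    out = []
    saturated = False
    for d, s in zip(dst, src):
        signed_s = s - (1 << esize) if s & sign_bit else s   # reinterpret as signed
        total = d + signed_s
        clamped = min(max(total, 0), max_val)
        if clamped != total:
            saturated = True
        out.append(clamped)
    return out, saturated
```

Saturation can occur in both directions: a large positive addend clamps at the maximum, and a negative addend larger in magnitude than the destination value clamps at zero.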

USRA

Unsigned Shift Right and Accumulate (immediate). This instruction reads each vector element in the source SIMD&FP
register, right shifts each element by an immediate value, and adds the shifted results to the corresponding vector
elements of the destination SIMD&FP register. All the values in this instruction are unsigned integer values. The
shifted values are truncated before accumulation. For rounded results, see URSRA.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
It has encodings from 2 classes: Scalar and Vector

Scalar

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 1 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
U immh o1 o0

USRA <V><d>, <V><n>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh<3> != '1' then UNDEFINED;

integer esize = 8 << 3;
integer datasize = esize;
integer elements = 1;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Vector

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 immb 0 0 0 1 0 1 Rn Rd
U immh o1 o0

USRA <Vd>.<T>, <Vn>.<T>, #<shift>

integer d = UInt(Rd);
integer n = UInt(Rn);

if immh == '0000' then SEE(asimdimm);

if immh<3>:Q == '10' then UNDEFINED;
integer esize = 8 << HighestSetBit(immh);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;

integer shift = (esize * 2) - UInt(immh:immb);

boolean unsigned = (U == '1');
boolean round = (o1 == '1');
boolean accumulate = (o0 == '1');

Assembler Symbols

<V> Is a width specifier, encoded in “immh”:

immh <V>
0xxx RESERVED
1xxx D

<d> Is the number of the SIMD&FP destination register, encoded in the "Rd" field.
<n> Is the number of the SIMD&FP source register, encoded in the "Rn" field.
<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <T>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx 0 RESERVED
1xxx 1 2D

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<shift> For the scalar variant: is the right shift amount, in the range 1 to 64, encoded in “immh:immb”:

immh <shift>
0xxx RESERVED
1xxx (128-UInt(immh:immb))

For the vector variant: is the right shift amount, in the range 1 to the element width in bits, encoded in
“immh:immb”:

immh <shift>
0000 SEE Advanced SIMD modified immediate
0001 (16-UInt(immh:immb))
001x (32-UInt(immh:immb))
01xx (64-UInt(immh:immb))
1xxx (128-UInt(immh:immb))

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand = V[n];
bits(datasize) operand2;
bits(datasize) result;
integer round_const = if round then (1 << (shift - 1)) else 0;
integer element;

operand2 = if accumulate then V[d] else Zeros();

for e = 0 to elements-1
    element = (Int(Elem[operand, e, esize], unsigned) + round_const) >> shift;
    Elem[result, e, esize] = Elem[operand2, e, esize] + element<esize-1:0>;

V[d] = result;

Operational information

USUBL, USUBL2

Unsigned Subtract Long. This instruction subtracts each vector element in the lower or upper half of the second
source SIMD&FP register from the corresponding vector element of the first source SIMD&FP register, places the
result into a vector, and writes the vector to the destination SIMD&FP register. All the values in this instruction are
unsigned integer values. The destination vector elements are twice as long as the source vector elements.
The USUBL instruction extracts each source vector from the lower half of each source register. The USUBL2 instruction
extracts each source vector from the upper half of each source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 0 0 0 Rn Rd
U o1

USUBL{2} <Vd>.<Ta>, <Vn>.<Tb>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = Vpart[n, part];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;

V[d] = result;
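
The element loop is not shown in the listing above. From the description, each pair of narrow lanes is subtracted and the difference is stored in a lane twice as wide. A hedged Python sketch, with the hypothetical name `usubl` and full 128-bit registers modelled as lists of unsigned lanes (lane 0 first), could be:

```python
def usubl(vn, vm, esize, upper=False):
    """Subtract the lanes of one 64-bit half of vm from the same half of vn.

    vn, vm: lists of 128 // esize unsigned lanes, lane 0 first
    upper:  False models USUBL (lower half), True models USUBL2 (upper half)
    Returns 64 // esize lanes, each 2 * esize bits wide.
    """
    half = 64 // esize
    start = half if upper else 0
    mask = (1 << (2 * esize)) - 1
    # A negative difference wraps modulo 2**(2*esize), like a bit-vector result.
    return [(vn[start + e] - vm[start + e]) & mask for e in range(half)]
```

A difference that would be negative appears as a large value in the wider lane, which is the two's complement form of the true difference.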

Operational information

USUBW, USUBW2

Unsigned Subtract Wide. This instruction subtracts each vector element of the second source SIMD&FP register from
the corresponding vector element in the lower or upper half of the first source SIMD&FP register, places the result in
a vector, and writes the vector to the SIMD&FP destination register. All the values in this instruction are unsigned
integer values.
The vector elements of the destination register and the first source register are twice as long as the vector elements of
the second source register.
The USUBW instruction extracts vector elements from the lower half of the first source register. The USUBW2 instruction
extracts vector elements from the upper half of the first source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 0 size 1 Rm 0 0 1 1 0 0 Rn Rd
U o1

USUBW{2} <Vd>.<Ta>, <Vn>.<Ta>, <Vm>.<Tb>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

boolean sub_op = (o1 == '1');

boolean unsigned = (U == '1');

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand1 = V[n];
bits(datasize) operand2 = Vpart[m, part];
bits(2*datasize) result;
integer element1;
integer element2;
integer sum;

V[d] = result;
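
The element loop is not shown in the listing above. Going by the description, the first source already holds wide lanes, and only the second source is taken from a 64-bit half and widened. A minimal Python sketch under that reading, with the hypothetical name `usubw` and registers modelled as lists of unsigned lanes, might be:

```python
def usubw(vn_wide, vm, esize, upper=False):
    """Subtract narrow lanes of one half of vm from the wide lanes of vn_wide.

    vn_wide: 64 // esize lanes, each 2 * esize bits wide
    vm:      128 // esize lanes, each esize bits wide, lane 0 first
    upper:   False models USUBW (lower half of vm), True models USUBW2
    """
    half = 64 // esize
    start = half if upper else 0
    mask = (1 << (2 * esize)) - 1
    # Results wrap modulo 2**(2*esize), like a bit-vector subtraction.
    return [(vn_wide[e] - vm[start + e]) & mask for e in range(half)]
```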

Operational information

UXTL, UXTL2

Unsigned extend Long. This instruction copies each vector element from the lower or upper half of the source
SIMD&FP register into a vector, and writes the vector to the destination SIMD&FP register. The destination vector
elements are twice as long as the source vector elements.
The UXTL instruction extracts vector elements from the lower half of the source register. The UXTL2 instruction
extracts vector elements from the upper half of the source register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

This is an alias of USHLL, USHLL2. This means:

• The encodings in this description are named to match the encodings of USHLL, USHLL2.
• The description of USHLL, USHLL2 gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 Q 1 0 1 1 1 1 0 != 0000 0 0 0 1 0 1 0 0 1 Rn Rd
U immh immb

UXTL{2} <Vd>.<Ta>, <Vn>.<Tb>

is equivalent to

USHLL{2} <Vd>.<Ta>, <Vn>.<Tb>, #0

and is the preferred disassembly when BitCount(immh) == 1.
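
A disassembler's choice between the two spellings can be illustrated with a small Python predicate. The name `prefers_uxtl` is hypothetical. The condition combines `BitCount(immh) == 1` with `immb == '000'`, which is fixed in the UXTL encoding and means the USHLL shift amount is 0.

```python
def prefers_uxtl(immh, immb):
    """Return True when a USHLL encoding should be disassembled as UXTL.

    immh: 4-bit field (non-zero, bit 3 clear for a valid USHLL encoding)
    immb: 3-bit field
    """
    # Exactly one bit set in immh together with immb == 0 means
    # UInt(immh:immb) equals esize, so the shift amount is 0.
    return immb == 0 and bin(immh).count("1") == 1
```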

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Ta> Is an arrangement specifier, encoded in “immh”:

immh <Ta>
0000 SEE Advanced SIMD modified immediate
0001 8H
001x 4S
01xx 2D
1xxx RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Tb> Is an arrangement specifier, encoded in “immh:Q”:

immh Q <Tb>
0000 x SEE Advanced SIMD modified immediate
0001 0 8B
0001 1 16B
001x 0 4H
001x 1 8H
01xx 0 2S
01xx 1 4S
1xxx x RESERVED

Operation

The description of USHLL, USHLL2 gives the operational pseudocode for this instruction.

Operational information

UZP1

Unzip vectors (primary). This instruction reads corresponding even-numbered vector elements from the two source
SIMD&FP registers, starting at zero, places the result from the first source register into consecutive elements in the
lower half of a vector, and the result from the second source register into consecutive elements in the upper half of a
vector, and writes the vector to the destination SIMD&FP register.

Note

This instruction can be used with UZP2 to de-interleave two vectors.

The following figure shows an example of the operation of UZP1 and UZP2 with the arrangement specifier 8B.

Vn: A7 A6 A5 A4 A3 A2 A1 A0
Vm: B7 B6 B5 B4 B3 B2 B1 B0

UZP1.8, doubleword: Vd = B6 B4 B2 B0 A6 A4 A2 A0
UZP2.8, doubleword: Vd = B7 B5 B3 B1 A7 A5 A3 A1

UZP1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
integer part = UInt(op);

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.
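
The size:Q table above follows directly from the decode arithmetic (esize = 8 << size, datasize = 64 or 128). The sketch below reproduces the table from that arithmetic; the function name `decode_arrangement` and the use of `None` for the reserved encoding are assumptions made for this example.

```python
def decode_arrangement(size, q):
    """Return the <T> arrangement string for a 2-bit size and 1-bit Q field.

    Returns None for size:Q == '110', which the table marks RESERVED and
    the decode pseudocode treats as UNDEFINED.
    """
    if size not in (0, 1, 2, 3) or q not in (0, 1):
        raise ValueError("size must be 0..3 and q must be 0 or 1")
    if size == 3 and q == 0:
        return None
    esize = 8 << size                # element size in bits
    datasize = 128 if q else 64      # vector length in bits
    elements = datasize // esize     # number of lanes
    return "%d%s" % (elements, "BHSD"[size])
```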

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operandl = V[n];
bits(datasize) operandh = V[m];
bits(datasize) result;

bits(datasize*2) zipped = operandh:operandl;

for e = 0 to elements-1
    Elem[result, e, esize] = Elem[zipped, 2*e+part, esize];

V[d] = result;
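
The pseudocode above can be mirrored in a few lines of Python. This is a minimal sketch, assuming registers are modelled as lists with element 0 first; the function name `uzp` is hypothetical, and `part` plays the role of the `op` field (0 for UZP1, 1 for UZP2).

```python
def uzp(vn, vm, part):
    """Model UZP1 (part=0) or UZP2 (part=1) on two equal-length lists."""
    if part not in (0, 1):
        raise ValueError("part must be 0 (UZP1) or 1 (UZP2)")
    if len(vn) != len(vm):
        raise ValueError("source vectors must have the same element count")
    # operandh:operandl, so Vn occupies the low half of the concatenation.
    zipped = vn + vm
    # Take every second element, starting at index 'part'.
    return [zipped[2 * e + part] for e in range(len(vn))]
```

For the 8B case in the figure, `uzp(A, B, 0)` yields A0, A2, A4, A6, B0, B2, B4, B6 from element 0 upward, which is the UZP1 row read from right to left.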

Operational information

UZP2

Unzip vectors (secondary). This instruction reads corresponding odd-numbered vector elements from the two source
SIMD&FP registers, places the result from the first source register into consecutive elements in the lower half of a
vector, and the result from the second source register into consecutive elements in the upper half of a vector, and
writes the vector to the destination SIMD&FP register.

Note

This instruction can be used with UZP1 to de-interleave two vectors.

The following figure shows an example of the operation of UZP1 and UZP2 with the arrangement specifier 8B.

                     Vn  A7 A6 A5 A4 A3 A2 A1 A0
                     Vm  B7 B6 B5 B4 B3 B2 B1 B0

UZP1.8, doubleword   Vd  B6 B4 B2 B0 A6 A4 A2 A0
UZP2.8, doubleword   Vd  B7 B5 B3 B1 A7 A5 A3 A1

UZP2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
integer part = UInt(op);

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operandl = V[n];
bits(datasize) operandh = V[m];
bits(datasize) result;

bits(datasize*2) zipped = operandh:operandl;

for e = 0 to elements-1
    Elem[result, e, esize] = Elem[zipped, 2*e+part, esize];

V[d] = result;

Operational information

XAR

Exclusive OR and Rotate performs a bitwise exclusive OR of the 128-bit vectors in the two source SIMD&FP registers,
rotates each 64-bit element of the resulting 128-bit vector right by the value specified by a 6-bit immediate value, and
writes the result to the destination SIMD&FP register.
This instruction is implemented only when FEAT_SHA3 is implemented.

Advanced SIMD
(FEAT_SHA3)

Bits    31-21        20-16  15-10  9-5  4-0
Value   11001110100  Rm     imm6   Rn   Rd

XAR <Vd>.2D, <Vn>.2D, <Vm>.2D, #<imm6>

if !HaveSHA3Ext() then UNDEFINED;

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

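Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.

<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

<imm6> Is the rotate right amount, in the range 0 to 63, encoded in the "imm6" field.
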
Operation

AArch64.CheckFPAdvSIMDEnabled();

bits(128) Vm = V[m];
bits(128) Vn = V[n];
bits(128) tmp;
tmp = Vn EOR Vm;
V[d] = ROR(tmp<127:64>, UInt(imm6)):ROR(tmp<63:0>, UInt(imm6));
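
The operation above can be checked with ordinary integer arithmetic. The following is a sketch only, assuming each 128-bit register is modelled as a Python integer with bit 0 as the least significant bit; the names `ror64` and `xar` are introduced for this example.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def ror64(x, shift):
    """Rotate a 64-bit value right by 'shift' bit positions."""
    shift %= 64
    return ((x >> shift) | (x << (64 - shift))) & MASK64

def xar(vn, vm, imm6):
    """Model XAR: XOR two 128-bit values, then rotate each 64-bit half right."""
    if not 0 <= imm6 <= 63:
        raise ValueError("imm6 is a 6-bit unsigned immediate")
    tmp = (vn ^ vm) & MASK128
    lo = ror64(tmp & MASK64, imm6)   # tmp<63:0>
    hi = ror64(tmp >> 64, imm6)      # tmp<127:64>
    return (hi << 64) | lo
```

Note that the two 64-bit elements rotate independently: a bit rotated out of the bottom of one element re-enters at the top of the same element, not the other one.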

Operational information

XTN, XTN2

Extract Narrow. This instruction reads each vector element from the source SIMD&FP register, narrows each value to
half the original width, places the result into a vector, and writes the vector to the lower or upper half of the
destination SIMD&FP register. The destination vector elements are half as long as the source vector elements.
The XTN instruction writes the vector to the lower half of the destination register and clears the upper half, while the
XTN2 instruction writes the vector to the upper half of the destination register without affecting the other bits of the
register.
Depending on the settings in the CPACR_EL1, CPTR_EL2, and CPTR_EL3 registers, and the current Security state and
Exception level, an attempt to execute the instruction might be trapped.

Bits    31  30  29-24   23-22  21-10         9-5  4-0
Value   0   Q   001110  size   100001001010  Rn   Rd

XTN{2} <Vd>.<Tb>, <Vn>.<Ta>

integer d = UInt(Rd);
integer n = UInt(Rn);

if size == '11' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = 64;
integer part = UInt(Q);
integer elements = datasize DIV esize;

Assembler Symbols

2 Is the second and upper half specifier. If present it causes the operation to be performed on the upper
64 bits of the registers holding the narrower elements, and is encoded in “Q”:

Q 2
0 [absent]
1 [present]

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<Tb> Is an arrangement specifier, encoded in “size:Q”:

size Q <Tb>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 x RESERVED

<Vn> Is the name of the SIMD&FP source register, encoded in the "Rn" field.

<Ta> Is an arrangement specifier, encoded in “size”:

size <Ta>
00 8H
01 4S
10 2D
11 RESERVED

Operation

CheckFPAdvSIMDEnabled64();
bits(2*datasize) operand = V[n];
bits(datasize) result;
bits(2*esize) element;

for e = 0 to elements-1
    element = Elem[operand, e, 2*esize];
    Elem[result, e, esize] = element<esize-1:0>;
Vpart[d, part] = result;
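
The difference between XTN and XTN2 lies only in the final Vpart write. The sketch below illustrates this under the assumption that registers are lists of unsigned integers, element 0 first; the function name `xtn` and its `upper` and `vd` parameters are hypothetical.

```python
def xtn(vn, esize, upper=False, vd=None):
    """Model XTN (upper=False) or XTN2 (upper=True).

    vn    : source elements, each 2*esize bits wide (64 // esize of them)
    esize : destination element size in bits (8, 16, or 32)
    vd    : current destination register as 128 // esize elements;
            required for XTN2, which preserves the lower half
    Returns the new destination register as a list of 128 // esize elements.
    """
    if esize not in (8, 16, 32):
        raise ValueError("destination elements must be 8, 16, or 32 bits wide")
    elements = 64 // esize
    if len(vn) != elements:
        raise ValueError("expected %d source elements" % elements)
    mask = (1 << esize) - 1
    # Keep only the low esize bits of every source element.
    narrowed = [x & mask for x in vn]
    if not upper:
        # XTN writes the lower half and clears the upper half.
        return narrowed + [0] * elements
    if vd is None or len(vd) != 2 * elements:
        raise ValueError("XTN2 needs the current destination register")
    # XTN2 writes the upper half and leaves the lower half unchanged.
    return list(vd[:elements]) + narrowed
```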

Operational information

ZIP1

Zip vectors (primary). This instruction reads adjacent vector elements from the lower half of two source SIMD&FP
registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination
SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with
subsequent pairs taken alternately from each source register.

Note

This instruction can be used with ZIP2 to interleave two vectors.

The following figure shows an example of the operation of ZIP1 and ZIP2 with the arrangement specifier 8B.

                     Vn  A7 A6 A5 A4 A3 A2 A1 A0
                     Vm  B7 B6 B5 B4 B3 B2 B1 B0

ZIP1.8, doubleword   Vd  B3 A3 B2 A2 B1 A1 B0 A0
ZIP2.8, doubleword   Vd  B7 A7 B6 A6 B5 A5 B4 A4

ZIP1 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
integer part = UInt(op);
integer pairs = elements DIV 2;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

integer base = part * pairs;

for p = 0 to pairs-1
    Elem[result, 2*p+0, esize] = Elem[operand1, base+p, esize];
    Elem[result, 2*p+1, esize] = Elem[operand2, base+p, esize];

V[d] = result;
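
A direct transcription of this loop into Python may help when checking expected lane orders. It is a minimal sketch, assuming registers are lists with element 0 first; the name `zip_vectors` is hypothetical, and `part` stands for the `op` field (0 for ZIP1, 1 for ZIP2).

```python
def zip_vectors(vn, vm, part):
    """Model ZIP1 (part=0) or ZIP2 (part=1) on two equal-length lists."""
    if part not in (0, 1):
        raise ValueError("part must be 0 (ZIP1) or 1 (ZIP2)")
    if len(vn) != len(vm) or len(vn) % 2 != 0:
        raise ValueError("sources need the same, even, element count")
    pairs = len(vn) // 2
    base = part * pairs          # 0 selects the lower half, pairs the upper
    result = []
    for p in range(pairs):
        result.append(vn[base + p])   # even result lanes come from Vn
        result.append(vm[base + p])   # odd result lanes come from Vm
    return result
```

For the 8B case in the figure, `zip_vectors(A, B, 0)` yields A0, B0, A1, B1, A2, B2, A3, B3 from element 0 upward, which is the ZIP1 row read from right to left.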

Operational information

ZIP2

Zip vectors (secondary). This instruction reads adjacent vector elements from the upper half of two source SIMD&FP
registers as pairs, interleaves the pairs and places them into a vector, and writes the vector to the destination
SIMD&FP register. The first pair from the first source register is placed into the two lowest vector elements, with
subsequent pairs taken alternately from each source register.

Note

This instruction can be used with ZIP1 to interleave two vectors.

The following figure shows an example of the operation of ZIP1 and ZIP2 with the arrangement specifier 8B.

                     Vn  A7 A6 A5 A4 A3 A2 A1 A0
                     Vm  B7 B6 B5 B4 B3 B2 B1 B0

ZIP1.8, doubleword   Vd  B3 A3 B2 A2 B1 A1 B0 A0
ZIP2.8, doubleword   Vd  B7 A7 B6 A6 B5 A5 B4 A4

ZIP2 <Vd>.<T>, <Vn>.<T>, <Vm>.<T>

integer d = UInt(Rd);
integer n = UInt(Rn);
integer m = UInt(Rm);

if size:Q == '110' then UNDEFINED;

integer esize = 8 << UInt(size);
integer datasize = if Q == '1' then 128 else 64;
integer elements = datasize DIV esize;
integer part = UInt(op);
integer pairs = elements DIV 2;

Assembler Symbols

<Vd> Is the name of the SIMD&FP destination register, encoded in the "Rd" field.

<T> Is an arrangement specifier, encoded in “size:Q”:

size Q <T>
00 0 8B
00 1 16B
01 0 4H
01 1 8H
10 0 2S
10 1 4S
11 0 RESERVED
11 1 2D

<Vn> Is the name of the first SIMD&FP source register, encoded in the "Rn" field.
<Vm> Is the name of the second SIMD&FP source register, encoded in the "Rm" field.

Operation

CheckFPAdvSIMDEnabled64();
bits(datasize) operand1 = V[n];
bits(datasize) operand2 = V[m];
bits(datasize) result;

integer base = part * pairs;

for p = 0 to pairs-1
    Elem[result, 2*p+0, esize] = Elem[operand1, base+p, esize];
    Elem[result, 2*p+1, esize] = Elem[operand2, base+p, esize];

V[d] = result;
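
The notes on ZIP1, ZIP2, UZP1, and UZP2 describe them as interleave and de-interleave pairs. The sketch below shows that relationship as a round trip, assuming list-based register models with element 0 first. All four function names are hypothetical; `zip_part` follows the ZIP pseudocode above and `uzp_part` follows the UZP1 and UZP2 pseudocode.

```python
def zip_part(vn, vm, part):
    """ZIP1 (part=0) or ZIP2 (part=1): interleave one half of each source."""
    pairs = len(vn) // 2
    base = part * pairs
    out = []
    for p in range(pairs):
        out += [vn[base + p], vm[base + p]]
    return out

def uzp_part(vn, vm, part):
    """UZP1 (part=0) or UZP2 (part=1): pick even or odd elements of Vm:Vn."""
    zipped = vn + vm
    return [zipped[2 * e + part] for e in range(len(vn))]

def interleave(a, b):
    """Interleave a and b into two registers using ZIP1 and ZIP2."""
    return zip_part(a, b, 0), zip_part(a, b, 1)

def deinterleave(lo, hi):
    """Recover the two original streams using UZP1 and UZP2."""
    return uzp_part(lo, hi, 0), uzp_part(lo, hi, 1)
```

Applying `deinterleave` to the pair returned by `interleave(a, b)` gives back `a` and `b`, provided both inputs have the same even element count.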

Operational information

A64 -- SVE Instructions (alphabetic order)

ABS: Absolute value (predicated).

ADD (immediate): Add immediate (unpredicated).

ADD (vectors, predicated): Add vectors (predicated).

ADD (vectors, unpredicated): Add vectors (unpredicated).

ADDPL: Add multiple of predicate register size to scalar register.

ADDVL: Add multiple of vector register size to scalar register.

ADR: Compute vector address.

AND (immediate): Bitwise AND with immediate (unpredicated).

AND (predicates): Bitwise AND predicates.

AND (vectors, predicated): Bitwise AND vectors (predicated).

AND (vectors, unpredicated): Bitwise AND vectors (unpredicated).

ANDS: Bitwise AND predicates, setting the condition flags.

ANDV: Bitwise AND reduction to scalar.

ASR (immediate, predicated): Arithmetic shift right by immediate (predicated).

ASR (immediate, unpredicated): Arithmetic shift right by immediate (unpredicated).

ASR (vectors): Arithmetic shift right by vector (predicated).

ASR (wide elements, predicated): Arithmetic shift right by 64-bit wide elements (predicated).

ASR (wide elements, unpredicated): Arithmetic shift right by 64-bit wide elements (unpredicated).

ASRD: Arithmetic shift right for divide by immediate (predicated).

ASRR: Reversed arithmetic shift right by vector (predicated).

BFCVT: Floating-point down convert to BFloat16 format (predicated).

BFCVTNT: Floating-point down convert and narrow to BFloat16 (top, predicated).

BFDOT (indexed): BFloat16 floating-point indexed dot product.

BFDOT (vectors): BFloat16 floating-point dot product.

BFMLALB (indexed): BFloat16 floating-point multiply-add long to single-precision (bottom, indexed).

BFMLALB (vectors): BFloat16 floating-point multiply-add long to single-precision (bottom).

BFMLALT (indexed): BFloat16 floating-point multiply-add long to single-precision (top, indexed).

BFMLALT (vectors): BFloat16 floating-point multiply-add long to single-precision (top).

BFMMLA: BFloat16 floating-point matrix multiply-accumulate.

BIC (immediate): Bitwise clear bits using immediate (unpredicated): an alias of AND (immediate).

BIC (predicates): Bitwise clear predicates.

BIC (vectors, predicated): Bitwise clear vectors (predicated).

BIC (vectors, unpredicated): Bitwise clear vectors (unpredicated).

BICS: Bitwise clear predicates, setting the condition flags.

BRKA: Break after first true condition.

BRKAS: Break after first true condition, setting the condition flags.

BRKB: Break before first true condition.

BRKBS: Break before first true condition, setting the condition flags.

BRKN: Propagate break to next partition.

BRKNS: Propagate break to next partition, setting the condition flags.

BRKPA: Break after first true condition, propagating from previous partition.

BRKPAS: Break after first true condition, propagating from previous partition and setting the condition flags.

BRKPB: Break before first true condition, propagating from previous partition.

BRKPBS: Break before first true condition, propagating from previous partition and setting the condition flags.

CLASTA (scalar): Conditionally extract element after last to general-purpose register.

CLASTA (SIMD&FP scalar): Conditionally extract element after last to SIMD&FP scalar register.

CLASTA (vectors): Conditionally extract element after last to vector register.

CLASTB (scalar): Conditionally extract last element to general-purpose register.

CLASTB (SIMD&FP scalar): Conditionally extract last element to SIMD&FP scalar register.

CLASTB (vectors): Conditionally extract last element to vector register.

CLS: Count leading sign bits (predicated).

CLZ: Count leading zero bits (predicated).

CMP<cc> (immediate): Compare vector to immediate.

CMP<cc> (vectors): Compare vectors.

CMP<cc> (wide elements): Compare vector to 64-bit wide elements.

CMPLE (vectors): Compare signed less than or equal to vector, setting the condition flags: an alias of CMP<cc> (vectors).

CMPLO (vectors): Compare unsigned lower than vector, setting the condition flags: an alias of CMP<cc> (vectors).

CMPLS (vectors): Compare unsigned lower or same as vector, setting the condition flags: an alias of CMP<cc> (vectors).

CMPLT (vectors): Compare signed less than vector, setting the condition flags: an alias of CMP<cc> (vectors).

CNOT: Logically invert boolean condition in vector (predicated).

CNT: Count non-zero bits (predicated).

CNTB, CNTD, CNTH, CNTW: Set scalar to multiple of predicate constraint element count.

CNTP: Set scalar to count of true predicate elements.

COMPACT: Shuffle active elements of vector to the right and fill with zero.

CPY (immediate, merging): Copy signed integer immediate to vector elements (merging).

CPY (immediate, zeroing): Copy signed integer immediate to vector elements (zeroing).

CPY (scalar): Copy general-purpose register to vector elements (predicated).

CPY (SIMD&FP scalar): Copy SIMD&FP scalar register to vector elements (predicated).

CTERMEQ, CTERMNE: Compare and terminate loop.

DECB, DECD, DECH, DECW (scalar): Decrement scalar by multiple of predicate constraint element count.

DECD, DECH, DECW (vector): Decrement vector by multiple of predicate constraint element count.

DECP (scalar): Decrement scalar by count of true predicate elements.

DECP (vector): Decrement vector by count of true predicate elements.

DUP (immediate): Broadcast signed immediate to vector elements (unpredicated).

DUP (indexed): Broadcast indexed element to vector (unpredicated).

DUP (scalar): Broadcast general-purpose register to vector elements (unpredicated).

DUPM: Broadcast logical bitmask immediate to vector (unpredicated).

EON: Bitwise exclusive OR with inverted immediate (unpredicated): an alias of EOR (immediate).

EOR (immediate): Bitwise exclusive OR with immediate (unpredicated).

EOR (predicates): Bitwise exclusive OR predicates.

EOR (vectors, predicated): Bitwise exclusive OR vectors (predicated).

EOR (vectors, unpredicated): Bitwise exclusive OR vectors (unpredicated).

EORS: Bitwise exclusive OR predicates, setting the condition flags.

EORV: Bitwise exclusive OR reduction to scalar.

EXT: Extract vector from pair of vectors.

FABD: Floating-point absolute difference (predicated).

FABS: Floating-point absolute value (predicated).

FAC<cc>: Floating-point absolute compare vectors.

FACLE: Floating-point absolute compare less than or equal: an alias of FAC<cc>.

FACLT: Floating-point absolute compare less than: an alias of FAC<cc>.

FADD (immediate): Floating-point add immediate (predicated).

FADD (vectors, predicated): Floating-point add vector (predicated).

FADD (vectors, unpredicated): Floating-point add vector (unpredicated).

FADDA: Floating-point add strictly-ordered reduction, accumulating in scalar.

FADDV: Floating-point add recursive reduction to scalar.

FCADD: Floating-point complex add with rotate (predicated).

FCM<cc> (vectors): Floating-point compare vectors.

FCM<cc> (zero): Floating-point compare vector with zero.

FCMLA (indexed): Floating-point complex multiply-add by indexed values with rotate.

FCMLA (vectors): Floating-point complex multiply-add with rotate (predicated).

FCMLE (vectors): Floating-point compare less than or equal to vector: an alias of FCM<cc> (vectors).

FCMLT (vectors): Floating-point compare less than vector: an alias of FCM<cc> (vectors).

FCPY: Copy 8-bit floating-point immediate to vector elements (predicated).

FCVT: Floating-point convert precision (predicated).

FCVTZS: Floating-point convert to signed integer, rounding toward zero (predicated).

FCVTZU: Floating-point convert to unsigned integer, rounding toward zero (predicated).

FDIV: Floating-point divide by vector (predicated).

FDIVR: Floating-point reversed divide by vector (predicated).

FDUP: Broadcast 8-bit floating-point immediate to vector elements (unpredicated).

FEXPA: Floating-point exponential accelerator.

FMAD: Floating-point fused multiply-add vectors (predicated), writing multiplicand [Zdn = Za + Zdn * Zm].

FMAX (immediate): Floating-point maximum with immediate (predicated).

FMAX (vectors): Floating-point maximum (predicated).

FMAXNM (immediate): Floating-point maximum number with immediate (predicated).

FMAXNM (vectors): Floating-point maximum number (predicated).

FMAXNMV: Floating-point maximum number recursive reduction to scalar.

FMAXV: Floating-point maximum recursive reduction to scalar.

FMIN (immediate): Floating-point minimum with immediate (predicated).

FMIN (vectors): Floating-point minimum (predicated).

FMINNM (immediate): Floating-point minimum number with immediate (predicated).

FMINNM (vectors): Floating-point minimum number (predicated).

FMINNMV: Floating-point minimum number recursive reduction to scalar.

FMINV: Floating-point minimum recursive reduction to scalar.

FMLA (indexed): Floating-point fused multiply-add by indexed elements (Zda = Zda + Zn * Zm[indexed]).

FMLA (vectors): Floating-point fused multiply-add vectors (predicated), writing addend [Zda = Zda + Zn * Zm].

FMLS (indexed): Floating-point fused multiply-subtract by indexed elements (Zda = Zda + -Zn * Zm[indexed]).

FMLS (vectors): Floating-point fused multiply-subtract vectors (predicated), writing addend [Zda = Zda + -Zn * Zm].

FMMLA: Floating-point matrix multiply-accumulate.

FMOV (immediate, predicated): Move 8-bit floating-point immediate to vector elements (predicated): an alias of FCPY.

FMOV (immediate, unpredicated): Move 8-bit floating-point immediate to vector elements (unpredicated): an alias of FDUP.

FMOV (zero, predicated): Move floating-point +0.0 to vector elements (predicated): an alias of CPY (immediate, merging).

FMOV (zero, unpredicated): Move floating-point +0.0 to vector elements (unpredicated): an alias of DUP (immediate).

FMSB: Floating-point fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = Za + -Zdn * Zm].

FMUL (immediate): Floating-point multiply by immediate (predicated).

FMUL (indexed): Floating-point multiply by indexed elements.

FMUL (vectors, predicated): Floating-point multiply vectors (predicated).

FMUL (vectors, unpredicated): Floating-point multiply vectors (unpredicated).

FMULX: Floating-point multiply-extended vectors (predicated).

FNEG: Floating-point negate (predicated).

FNMAD: Floating-point negated fused multiply-add vectors (predicated), writing multiplicand [Zdn = -Za + -Zdn * Zm].

FNMLA: Floating-point negated fused multiply-add vectors (predicated), writing addend [Zda = -Zda + -Zn * Zm].

FNMLS: Floating-point negated fused multiply-subtract vectors (predicated), writing addend [Zda = -Zda + Zn * Zm].

FNMSB: Floating-point negated fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = -Za + Zdn * Zm].

FRECPE: Floating-point reciprocal estimate (unpredicated).

FRECPS: Floating-point reciprocal step (unpredicated).

FRECPX: Floating-point reciprocal exponent (predicated).

FRINT<r>: Floating-point round to integral value (predicated).

FRSQRTE: Floating-point reciprocal square root estimate (unpredicated).

FRSQRTS: Floating-point reciprocal square root step (unpredicated).

FSCALE: Floating-point adjust exponent by vector (predicated).

FSQRT: Floating-point square root (predicated).

FSUB (immediate): Floating-point subtract immediate (predicated).

FSUB (vectors, predicated): Floating-point subtract vectors (predicated).

FSUB (vectors, unpredicated): Floating-point subtract vectors (unpredicated).

FSUBR (immediate): Floating-point reversed subtract from immediate (predicated).

FSUBR (vectors): Floating-point reversed subtract vectors (predicated).

FTMAD: Floating-point trigonometric multiply-add coefficient.

FTSMUL: Floating-point trigonometric starting value.

FTSSEL: Floating-point trigonometric select coefficient.

INCB, INCD, INCH, INCW (scalar): Increment scalar by multiple of predicate constraint element count.

INCD, INCH, INCW (vector): Increment vector by multiple of predicate constraint element count.

INCP (scalar): Increment scalar by count of true predicate elements.

INCP (vector): Increment vector by count of true predicate elements.

INDEX (immediate, scalar): Create index starting from immediate and incremented by general-purpose register.

INDEX (immediates): Create index starting from and incremented by immediate.

INDEX (scalar, immediate): Create index starting from general-purpose register and incremented by immediate.

INDEX (scalars): Create index starting from and incremented by general-purpose register.

INSR (scalar): Insert general-purpose register in shifted vector.

INSR (SIMD&FP scalar): Insert SIMD&FP scalar register in shifted vector.

LASTA (scalar): Extract element after last to general-purpose register.

LASTA (SIMD&FP scalar): Extract element after last to SIMD&FP scalar register.

LASTB (scalar): Extract last element to general-purpose register.

LASTB (SIMD&FP scalar): Extract last element to SIMD&FP scalar register.

LD1B (scalar plus immediate): Contiguous load unsigned bytes to vector (immediate index).

LD1B (scalar plus scalar): Contiguous load unsigned bytes to vector (scalar index).

LD1B (scalar plus vector): Gather load unsigned bytes to vector (vector index).

LD1B (vector plus immediate): Gather load unsigned bytes to vector (immediate index).

LD1D (scalar plus immediate): Contiguous load doublewords to vector (immediate index).

LD1D (scalar plus scalar): Contiguous load doublewords to vector (scalar index).

LD1D (scalar plus vector): Gather load doublewords to vector (vector index).

LD1D (vector plus immediate): Gather load doublewords to vector (immediate index).

LD1H (scalar plus immediate): Contiguous load unsigned halfwords to vector (immediate index).

LD1H (scalar plus scalar): Contiguous load unsigned halfwords to vector (scalar index).

LD1H (scalar plus vector): Gather load unsigned halfwords to vector (vector index).

LD1H (vector plus immediate): Gather load unsigned halfwords to vector (immediate index).

LD1RB: Load and broadcast unsigned byte to vector.

LD1RD: Load and broadcast doubleword to vector.

LD1RH: Load and broadcast unsigned halfword to vector.

LD1ROB (scalar plus immediate): Contiguous load and replicate thirty-two bytes (immediate index).

LD1ROB (scalar plus scalar): Contiguous load and replicate thirty-two bytes (scalar index).

LD1ROD (scalar plus immediate): Contiguous load and replicate four doublewords (immediate index).

LD1ROD (scalar plus scalar): Contiguous load and replicate four doublewords (scalar index).

LD1ROH (scalar plus immediate): Contiguous load and replicate sixteen halfwords (immediate index).

LD1ROH (scalar plus scalar): Contiguous load and replicate sixteen halfwords (scalar index).

LD1ROW (scalar plus immediate): Contiguous load and replicate eight words (immediate index).

LD1ROW (scalar plus scalar): Contiguous load and replicate eight words (scalar index).

LD1RQB (scalar plus immediate): Contiguous load and replicate sixteen bytes (immediate index).

LD1RQB (scalar plus scalar): Contiguous load and replicate sixteen bytes (scalar index).

LD1RQD (scalar plus immediate): Contiguous load and replicate two doublewords (immediate index).

LD1RQD (scalar plus scalar): Contiguous load and replicate two doublewords (scalar index).

LD1RQH (scalar plus immediate): Contiguous load and replicate eight halfwords (immediate index).

LD1RQH (scalar plus scalar): Contiguous load and replicate eight halfwords (scalar index).

LD1RQW (scalar plus immediate): Contiguous load and replicate four words (immediate index).

LD1RQW (scalar plus scalar): Contiguous load and replicate four words (scalar index).

LD1RSB: Load and broadcast signed byte to vector.

LD1RSH: Load and broadcast signed halfword to vector.

LD1RSW: Load and broadcast signed word to vector.

LD1RW: Load and broadcast unsigned word to vector.

LD1SB (scalar plus immediate): Contiguous load signed bytes to vector (immediate index).

LD1SB (scalar plus scalar): Contiguous load signed bytes to vector (scalar index).

LD1SB (scalar plus vector): Gather load signed bytes to vector (vector index).

LD1SB (vector plus immediate): Gather load signed bytes to vector (immediate index).

LD1SH (scalar plus immediate): Contiguous load signed halfwords to vector (immediate index).

LD1SH (scalar plus scalar): Contiguous load signed halfwords to vector (scalar index).

LD1SH (scalar plus vector): Gather load signed halfwords to vector (vector index).

LD1SH (vector plus immediate): Gather load signed halfwords to vector (immediate index).

LD1SW (scalar plus immediate): Contiguous load signed words to vector (immediate index).

LD1SW (scalar plus scalar): Contiguous load signed words to vector (scalar index).

LD1SW (scalar plus vector): Gather load signed words to vector (vector index).

LD1SW (vector plus immediate): Gather load signed words to vector (immediate index).

LD1W (scalar plus immediate): Contiguous load unsigned words to vector (immediate index).

LD1W (scalar plus scalar): Contiguous load unsigned words to vector (scalar index).

LD1W (scalar plus vector): Gather load unsigned words to vector (vector index).

LD1W (vector plus immediate): Gather load unsigned words to vector (immediate index).

LD2B (scalar plus immediate): Contiguous load two-byte structures to two vectors (immediate index).

LD2B (scalar plus scalar): Contiguous load two-byte structures to two vectors (scalar index).

LD2D (scalar plus immediate): Contiguous load two-doubleword structures to two vectors (immediate index).

LD2D (scalar plus scalar): Contiguous load two-doubleword structures to two vectors (scalar index).

LD2H (scalar plus immediate): Contiguous load two-halfword structures to two vectors (immediate index).

LD2H (scalar plus scalar): Contiguous load two-halfword structures to two vectors (scalar index).

LD2W (scalar plus immediate): Contiguous load two-word structures to two vectors (immediate index).

LD2W (scalar plus scalar): Contiguous load two-word structures to two vectors (scalar index).

LD3B (scalar plus immediate): Contiguous load three-byte structures to three vectors (immediate index).

LD3B (scalar plus scalar): Contiguous load three-byte structures to three vectors (scalar index).

LD3D (scalar plus immediate): Contiguous load three-doubleword structures to three vectors (immediate index).

LD3D (scalar plus scalar): Contiguous load three-doubleword structures to three vectors (scalar index).

LD3H (scalar plus immediate): Contiguous load three-halfword structures to three vectors (immediate index).

LD3H (scalar plus scalar): Contiguous load three-halfword structures to three vectors (scalar index).

LD3W (scalar plus immediate): Contiguous load three-word structures to three vectors (immediate index).

LD3W (scalar plus scalar): Contiguous load three-word structures to three vectors (scalar index).

LD4B (scalar plus immediate): Contiguous load four-byte structures to four vectors (immediate index).

LD4B (scalar plus scalar): Contiguous load four-byte structures to four vectors (scalar index).

LD4D (scalar plus immediate): Contiguous load four-doubleword structures to four vectors (immediate index).

LD4D (scalar plus scalar): Contiguous load four-doubleword structures to four vectors (scalar index).

LD4H (scalar plus immediate): Contiguous load four-halfword structures to four vectors (immediate index).

LD4H (scalar plus scalar): Contiguous load four-halfword structures to four vectors (scalar index).

LD4W (scalar plus immediate): Contiguous load four-word structures to four vectors (immediate index).

LD4W (scalar plus scalar): Contiguous load four-word structures to four vectors (scalar index).

LDFF1B (scalar plus scalar): Contiguous load first-fault unsigned bytes to vector (scalar index).

LDFF1B (scalar plus vector): Gather load first-fault unsigned bytes to vector (vector index).

LDFF1B (vector plus immediate): Gather load first-fault unsigned bytes to vector (immediate index).

LDFF1D (scalar plus scalar): Contiguous load first-fault doublewords to vector (scalar index).

LDFF1D (scalar plus vector): Gather load first-fault doublewords to vector (vector index).

LDFF1D (vector plus immediate): Gather load first-fault doublewords to vector (immediate index).

LDFF1H (scalar plus scalar): Contiguous load first-fault unsigned halfwords to vector (scalar index).

LDFF1H (scalar plus vector): Gather load first-fault unsigned halfwords to vector (vector index).

LDFF1H (vector plus immediate): Gather load first-fault unsigned halfwords to vector (immediate index).

LDFF1SB (scalar plus scalar): Contiguous load first-fault signed bytes to vector (scalar index).

LDFF1SB (scalar plus vector): Gather load first-fault signed bytes to vector (vector index).

LDFF1SB (vector plus immediate): Gather load first-fault signed bytes to vector (immediate index).

LDFF1SH (scalar plus scalar): Contiguous load first-fault signed halfwords to vector (scalar index).

LDFF1SH (scalar plus vector): Gather load first-fault signed halfwords to vector (vector index).

LDFF1SH (vector plus immediate): Gather load first-fault signed halfwords to vector (immediate index).

LDFF1SW (scalar plus scalar): Contiguous load first-fault signed words to vector (scalar index).

LDFF1SW (scalar plus vector): Gather load first-fault signed words to vector (vector index).

LDFF1SW (vector plus immediate): Gather load first-fault signed words to vector (immediate index).

LDFF1W (scalar plus scalar): Contiguous load first-fault unsigned words to vector (scalar index).

LDFF1W (scalar plus vector): Gather load first-fault unsigned words to vector (vector index).

LDFF1W (vector plus immediate): Gather load first-fault unsigned words to vector (immediate index).

LDNF1B: Contiguous load non-fault unsigned bytes to vector (immediate index).

LDNF1D: Contiguous load non-fault doublewords to vector (immediate index).

LDNF1H: Contiguous load non-fault unsigned halfwords to vector (immediate index).

LDNF1SB: Contiguous load non-fault signed bytes to vector (immediate index).

LDNF1SH: Contiguous load non-fault signed halfwords to vector (immediate index).

LDNF1SW: Contiguous load non-fault signed words to vector (immediate index).

LDNF1W: Contiguous load non-fault unsigned words to vector (immediate index).

LDNT1B (scalar plus immediate): Contiguous load non-temporal bytes to vector (immediate index).

LDNT1B (scalar plus scalar): Contiguous load non-temporal bytes to vector (scalar index).

LDNT1D (scalar plus immediate): Contiguous load non-temporal doublewords to vector (immediate index).

LDNT1D (scalar plus scalar): Contiguous load non-temporal doublewords to vector (scalar index).

LDNT1H (scalar plus immediate): Contiguous load non-temporal halfwords to vector (immediate index).

LDNT1H (scalar plus scalar): Contiguous load non-temporal halfwords to vector (scalar index).

LDNT1W (scalar plus immediate): Contiguous load non-temporal words to vector (immediate index).

LDNT1W (scalar plus scalar): Contiguous load non-temporal words to vector (scalar index).

LDR (predicate): Load predicate register.

LDR (vector): Load vector register.

LSL (immediate, predicated): Logical shift left by immediate (predicated).

LSL (immediate, unpredicated): Logical shift left by immediate (unpredicated).

LSL (vectors): Logical shift left by vector (predicated).

LSL (wide elements, predicated): Logical shift left by 64-bit wide elements (predicated).

LSL (wide elements, unpredicated): Logical shift left by 64-bit wide elements (unpredicated).

LSLR: Reversed logical shift left by vector (predicated).

LSR (immediate, predicated): Logical shift right by immediate (predicated).

LSR (immediate, unpredicated): Logical shift right by immediate (unpredicated).

LSR (vectors): Logical shift right by vector (predicated).

LSR (wide elements, predicated): Logical shift right by 64-bit wide elements (predicated).

LSR (wide elements, unpredicated): Logical shift right by 64-bit wide elements (unpredicated).

LSRR: Reversed logical shift right by vector (predicated).

MAD: Multiply-add vectors (predicated), writing multiplicand [Zdn = Za + Zdn * Zm].

MLA: Multiply-add vectors (predicated), writing addend [Zda = Zda + Zn * Zm].

MLS: Multiply-subtract vectors (predicated), writing addend [Zda = Zda - Zn * Zm].

MOV: Move logical bitmask immediate to vector (unpredicated): an alias of DUPM.

MOV: Move predicate (unpredicated): an alias of ORR (predicates).

MOV (immediate, predicated, merging): Move signed integer immediate to vector elements (merging): an alias of CPY (immediate, merging).

MOV (immediate, predicated, zeroing): Move signed integer immediate to vector elements (zeroing): an alias of CPY (immediate, zeroing).

MOV (immediate, unpredicated): Move signed immediate to vector elements (unpredicated): an alias of DUP (immediate).

MOV (predicate, predicated, merging): Move predicates (merging): an alias of SEL (predicates).

MOV (predicate, predicated, zeroing): Move predicates (zeroing): an alias of AND (predicates).

MOV (scalar, predicated): Move general-purpose register to vector elements (predicated): an alias of CPY (scalar).

MOV (scalar, unpredicated): Move general-purpose register to vector elements (unpredicated): an alias of DUP (scalar).

MOV (SIMD&FP scalar, predicated): Move SIMD&FP scalar register to vector elements (predicated): an alias of CPY (SIMD&FP scalar).

MOV (SIMD&FP scalar, unpredicated): Move indexed element or SIMD&FP scalar to vector (unpredicated): an alias of DUP (indexed).

MOV (vector, predicated): Move vector elements (predicated): an alias of SEL (vectors).

MOV (vector, unpredicated): Move vector register (unpredicated): an alias of ORR (vectors, unpredicated).

MOVPRFX (predicated): Move prefix (predicated).

MOVPRFX (unpredicated): Move prefix (unpredicated).

MOVS (predicated): Move predicates (zeroing), setting the condition flags: an alias of ANDS.

MOVS (unpredicated): Move predicate (unpredicated), setting the condition flags: an alias of ORRS.

MSB: Multiply-subtract vectors (predicated), writing multiplicand [Zdn = Za - Zdn * Zm].

MUL (immediate): Multiply by immediate (unpredicated).

MUL (vectors): Multiply vectors (predicated).

NAND: Bitwise NAND predicates.

NANDS: Bitwise NAND predicates, setting the condition flags.

NEG: Negate (predicated).

NOR: Bitwise NOR predicates.

NORS: Bitwise NOR predicates, setting the condition flags.

NOT (predicate): Bitwise invert predicate: an alias of EOR (predicates).

NOT (vector): Bitwise invert vector (predicated).

NOTS: Bitwise invert predicate, setting the condition flags: an alias of EORS.

ORN (immediate): Bitwise inclusive OR with inverted immediate (unpredicated): an alias of ORR (immediate).

ORN (predicates): Bitwise inclusive OR inverted predicate.

ORNS: Bitwise inclusive OR inverted predicate, setting the condition flags.

ORR (immediate): Bitwise inclusive OR with immediate (unpredicated).

ORR (predicates): Bitwise inclusive OR predicates.

ORR (vectors, predicated): Bitwise inclusive OR vectors (predicated).

ORR (vectors, unpredicated): Bitwise inclusive OR vectors (unpredicated).

ORRS: Bitwise inclusive OR predicates, setting the condition flags.

ORV: Bitwise inclusive OR reduction to scalar.

PFALSE: Set all predicate elements to false.

PFIRST: Set the first active predicate element to true.

PNEXT: Find next active predicate.

PRFB (scalar plus immediate): Contiguous prefetch bytes (immediate index).

PRFB (scalar plus scalar): Contiguous prefetch bytes (scalar index).

PRFB (scalar plus vector): Gather prefetch bytes (scalar plus vector).

PRFB (vector plus immediate): Gather prefetch bytes (vector plus immediate).

PRFD (scalar plus immediate): Contiguous prefetch doublewords (immediate index).

PRFD (scalar plus scalar): Contiguous prefetch doublewords (scalar index).

PRFD (scalar plus vector): Gather prefetch doublewords (scalar plus vector).

PRFD (vector plus immediate): Gather prefetch doublewords (vector plus immediate).

PRFH (scalar plus immediate): Contiguous prefetch halfwords (immediate index).

PRFH (scalar plus scalar): Contiguous prefetch halfwords (scalar index).

PRFH (scalar plus vector): Gather prefetch halfwords (scalar plus vector).

PRFH (vector plus immediate): Gather prefetch halfwords (vector plus immediate).

PRFW (scalar plus immediate): Contiguous prefetch words (immediate index).

PRFW (scalar plus scalar): Contiguous prefetch words (scalar index).

PRFW (scalar plus vector): Gather prefetch words (scalar plus vector).

PRFW (vector plus immediate): Gather prefetch words (vector plus immediate).

PTEST: Set condition flags for predicate.

PTRUE: Initialise predicate from named constraint.

PTRUES: Initialise predicate from named constraint and set the condition flags.

PUNPKHI, PUNPKLO: Unpack and widen half of predicate.

RBIT: Reverse bits (predicated).

RDFFR (predicated): Return predicate of successfully loaded elements.

RDFFR (unpredicated): Read the first-fault register.

RDFFRS: Return predicate of successfully loaded elements, setting the condition flags.

RDVL: Read multiple of vector register size to scalar register.

REV (predicate): Reverse all elements in a predicate.

REV (vector): Reverse all elements in a vector (unpredicated).

REVB, REVH, REVW: Reverse bytes / halfwords / words within elements (predicated).

SABD: Signed absolute difference (predicated).

SADDV: Signed add reduction to scalar.

SCVTF: Signed integer convert to floating-point (predicated).

SDIV: Signed divide (predicated).

SDIVR: Signed reversed divide (predicated).

SDOT (indexed): Signed integer indexed dot product.

SDOT (vectors): Signed integer dot product.

SEL (predicates): Conditionally select elements from two predicates.

SEL (vectors): Conditionally select elements from two vectors.

SETFFR: Initialise the first-fault register to all true.

SMAX (immediate): Signed maximum with immediate (unpredicated).

SMAX (vectors): Signed maximum vectors (predicated).

SMAXV: Signed maximum reduction to scalar.

SMIN (immediate): Signed minimum with immediate (unpredicated).

SMIN (vectors): Signed minimum vectors (predicated).

SMINV: Signed minimum reduction to scalar.

SMMLA: Signed integer matrix multiply-accumulate.

SMULH: Signed multiply returning high half (predicated).

SPLICE: Splice two vectors under predicate control.

SQADD (immediate): Signed saturating add immediate (unpredicated).

SQADD (vectors): Signed saturating add vectors (unpredicated).

SQDECB: Signed saturating decrement scalar by multiple of 8-bit predicate constraint element count.

SQDECD (scalar): Signed saturating decrement scalar by multiple of 64-bit predicate constraint element count.

SQDECD (vector): Signed saturating decrement vector by multiple of 64-bit predicate constraint element count.

SQDECH (scalar): Signed saturating decrement scalar by multiple of 16-bit predicate constraint element count.

SQDECH (vector): Signed saturating decrement vector by multiple of 16-bit predicate constraint element count.

SQDECP (scalar): Signed saturating decrement scalar by count of true predicate elements.

SQDECP (vector): Signed saturating decrement vector by count of true predicate elements.

SQDECW (scalar): Signed saturating decrement scalar by multiple of 32-bit predicate constraint element count.

SQDECW (vector): Signed saturating decrement vector by multiple of 32-bit predicate constraint element count.

SQINCB: Signed saturating increment scalar by multiple of 8-bit predicate constraint element count.

SQINCD (scalar): Signed saturating increment scalar by multiple of 64-bit predicate constraint element count.

SQINCD (vector): Signed saturating increment vector by multiple of 64-bit predicate constraint element count.

SQINCH (scalar): Signed saturating increment scalar by multiple of 16-bit predicate constraint element count.

SQINCH (vector): Signed saturating increment vector by multiple of 16-bit predicate constraint element count.

SQINCP (scalar): Signed saturating increment scalar by count of true predicate elements.

SQINCP (vector): Signed saturating increment vector by count of true predicate elements.

SQINCW (scalar): Signed saturating increment scalar by multiple of 32-bit predicate constraint element count.

SQINCW (vector): Signed saturating increment vector by multiple of 32-bit predicate constraint element count.

SQSUB (immediate): Signed saturating subtract immediate (unpredicated).

SQSUB (vectors): Signed saturating subtract vectors (unpredicated).

ST1B (scalar plus immediate): Contiguous store bytes from vector (immediate index).

ST1B (scalar plus scalar): Contiguous store bytes from vector (scalar index).

ST1B (scalar plus vector): Scatter store bytes from a vector (vector index).

ST1B (vector plus immediate): Scatter store bytes from a vector (immediate index).

ST1D (scalar plus immediate): Contiguous store doublewords from vector (immediate index).

ST1D (scalar plus scalar): Contiguous store doublewords from vector (scalar index).

ST1D (scalar plus vector): Scatter store doublewords from a vector (vector index).

ST1D (vector plus immediate): Scatter store doublewords from a vector (immediate index).

ST1H (scalar plus immediate): Contiguous store halfwords from vector (immediate index).

ST1H (scalar plus scalar): Contiguous store halfwords from vector (scalar index).

ST1H (scalar plus vector): Scatter store halfwords from a vector (vector index).

ST1H (vector plus immediate): Scatter store halfwords from a vector (immediate index).

ST1W (scalar plus immediate): Contiguous store words from vector (immediate index).

ST1W (scalar plus scalar): Contiguous store words from vector (scalar index).

ST1W (scalar plus vector): Scatter store words from a vector (vector index).

ST1W (vector plus immediate): Scatter store words from a vector (immediate index).

ST2B (scalar plus immediate): Contiguous store two-byte structures from two vectors (immediate index).

ST2B (scalar plus scalar): Contiguous store two-byte structures from two vectors (scalar index).

ST2D (scalar plus immediate): Contiguous store two-doubleword structures from two vectors (immediate index).

ST2D (scalar plus scalar): Contiguous store two-doubleword structures from two vectors (scalar index).

ST2H (scalar plus immediate): Contiguous store two-halfword structures from two vectors (immediate index).

ST2H (scalar plus scalar): Contiguous store two-halfword structures from two vectors (scalar index).

ST2W (scalar plus immediate): Contiguous store two-word structures from two vectors (immediate index).

ST2W (scalar plus scalar): Contiguous store two-word structures from two vectors (scalar index).

ST3B (scalar plus immediate): Contiguous store three-byte structures from three vectors (immediate index).

ST3B (scalar plus scalar): Contiguous store three-byte structures from three vectors (scalar index).

ST3D (scalar plus immediate): Contiguous store three-doubleword structures from three vectors (immediate index).

ST3D (scalar plus scalar): Contiguous store three-doubleword structures from three vectors (scalar index).

ST3H (scalar plus immediate): Contiguous store three-halfword structures from three vectors (immediate index).

ST3H (scalar plus scalar): Contiguous store three-halfword structures from three vectors (scalar index).

ST3W (scalar plus immediate): Contiguous store three-word structures from three vectors (immediate index).

ST3W (scalar plus scalar): Contiguous store three-word structures from three vectors (scalar index).

ST4B (scalar plus immediate): Contiguous store four-byte structures from four vectors (immediate index).

ST4B (scalar plus scalar): Contiguous store four-byte structures from four vectors (scalar index).

ST4D (scalar plus immediate): Contiguous store four-doubleword structures from four vectors (immediate index).

ST4D (scalar plus scalar): Contiguous store four-doubleword structures from four vectors (scalar index).

ST4H (scalar plus immediate): Contiguous store four-halfword structures from four vectors (immediate index).

ST4H (scalar plus scalar): Contiguous store four-halfword structures from four vectors (scalar index).

ST4W (scalar plus immediate): Contiguous store four-word structures from four vectors (immediate index).

ST4W (scalar plus scalar): Contiguous store four-word structures from four vectors (scalar index).

STNT1B (scalar plus immediate): Contiguous store non-temporal bytes from vector (immediate index).

STNT1B (scalar plus scalar): Contiguous store non-temporal bytes from vector (scalar index).

STNT1D (scalar plus immediate): Contiguous store non-temporal doublewords from vector (immediate index).

STNT1D (scalar plus scalar): Contiguous store non-temporal doublewords from vector (scalar index).

STNT1H (scalar plus immediate): Contiguous store non-temporal halfwords from vector (immediate index).

STNT1H (scalar plus scalar): Contiguous store non-temporal halfwords from vector (scalar index).

STNT1W (scalar plus immediate): Contiguous store non-temporal words from vector (immediate index).

STNT1W (scalar plus scalar): Contiguous store non-temporal words from vector (scalar index).

STR (predicate): Store predicate register.

STR (vector): Store vector register.

SUB (immediate): Subtract immediate (unpredicated).

SUB (vectors, predicated): Subtract vectors (predicated).

SUB (vectors, unpredicated): Subtract vectors (unpredicated).

SUBR (immediate): Reversed subtract from immediate (unpredicated).

SUBR (vectors): Reversed subtract vectors (predicated).

SUDOT: Signed by unsigned integer indexed dot product.

SUNPKHI, SUNPKLO: Signed unpack and extend half of vector.

SXTB, SXTH, SXTW: Signed byte / halfword / word extend (predicated).

TBL: Programmable table lookup in single vector table.

TRN1, TRN2 (predicates): Interleave even or odd elements from two predicates.

TRN1, TRN2 (vectors): Interleave even or odd elements from two vectors.

UABD: Unsigned absolute difference (predicated).

UADDV: Unsigned add reduction to scalar.

UCVTF: Unsigned integer convert to floating-point (predicated).

UDIV: Unsigned divide (predicated).

UDIVR: Unsigned reversed divide (predicated).

UDOT (indexed): Unsigned integer indexed dot product.

UDOT (vectors): Unsigned integer dot product.

UMAX (immediate): Unsigned maximum with immediate (unpredicated).

UMAX (vectors): Unsigned maximum vectors (predicated).

UMAXV: Unsigned maximum reduction to scalar.

UMIN (immediate): Unsigned minimum with immediate (unpredicated).

UMIN (vectors): Unsigned minimum vectors (predicated).

UMINV: Unsigned minimum reduction to scalar.

UMMLA: Unsigned integer matrix multiply-accumulate.

UMULH: Unsigned multiply returning high half (predicated).

UQADD (immediate): Unsigned saturating add immediate (unpredicated).

UQADD (vectors): Unsigned saturating add vectors (unpredicated).

UQDECB: Unsigned saturating decrement scalar by multiple of 8-bit predicate constraint element count.

UQDECD (scalar): Unsigned saturating decrement scalar by multiple of 64-bit predicate constraint element count.

UQDECD (vector): Unsigned saturating decrement vector by multiple of 64-bit predicate constraint element count.

UQDECH (scalar): Unsigned saturating decrement scalar by multiple of 16-bit predicate constraint element count.

UQDECH (vector): Unsigned saturating decrement vector by multiple of 16-bit predicate constraint element count.

UQDECP (scalar): Unsigned saturating decrement scalar by count of true predicate elements.

UQDECP (vector): Unsigned saturating decrement vector by count of true predicate elements.

UQDECW (scalar): Unsigned saturating decrement scalar by multiple of 32-bit predicate constraint element count.

UQDECW (vector): Unsigned saturating decrement vector by multiple of 32-bit predicate constraint element count.

UQINCB: Unsigned saturating increment scalar by multiple of 8-bit predicate constraint element count.

UQINCD (scalar): Unsigned saturating increment scalar by multiple of 64-bit predicate constraint element count.

UQINCD (vector): Unsigned saturating increment vector by multiple of 64-bit predicate constraint element count.

UQINCH (scalar): Unsigned saturating increment scalar by multiple of 16-bit predicate constraint element count.

UQINCH (vector): Unsigned saturating increment vector by multiple of 16-bit predicate constraint element count.

UQINCP (scalar): Unsigned saturating increment scalar by count of true predicate elements.

UQINCP (vector): Unsigned saturating increment vector by count of true predicate elements.

UQINCW (scalar): Unsigned saturating increment scalar by multiple of 32-bit predicate constraint element count.

UQINCW (vector): Unsigned saturating increment vector by multiple of 32-bit predicate constraint element count.

UQSUB (immediate): Unsigned saturating subtract immediate (unpredicated).

UQSUB (vectors): Unsigned saturating subtract vectors (unpredicated).

USDOT (indexed): Unsigned by signed integer indexed dot product.

USDOT (vectors): Unsigned by signed integer dot product.

USMMLA: Unsigned by signed integer matrix multiply-accumulate.

UUNPKHI, UUNPKLO: Unsigned unpack and extend half of vector.

UXTB, UXTH, UXTW: Unsigned byte / halfword / word extend (predicated).

UZP1, UZP2 (predicates): Concatenate even or odd elements from two predicates.

UZP1, UZP2 (vectors): Concatenate even or odd elements from two vectors.

WHILELE: While incrementing signed scalar less than or equal to scalar.

WHILELO: While incrementing unsigned scalar lower than scalar.

WHILELS: While incrementing unsigned scalar lower or same as scalar.

WHILELT: While incrementing signed scalar less than scalar.

WRFFR: Write the first-fault register.

ZIP1, ZIP2 (predicates): Interleave elements from two half predicates.

ZIP1, ZIP2 (vectors): Interleave elements from two half vectors.

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

ABS

Absolute value (predicated)

Compute the absolute value of the signed integer in each active element of the source vector, and place the results in
the corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
31-24            23-22  21-13              12-10  9-5  4-0
0 0 0 0 0 1 0 0  size   0 1 0 1 1 0 1 0 1  Pg     Zn   Zd

ABS <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = SInt(Elem[operand, e, esize]);
        element = Abs(element);
        Elem[result, e, esize] = element<esize-1:0>;

Z[d] = result;
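
As an informal illustration only (not part of the architecture definition), the merging behaviour of the pseudocode above can be modelled in Python. The function name `sve_abs` and the representation of registers as lists of signed integers (with the predicate as a list of booleans) are assumptions made for this sketch.

```python
def sve_abs(zd, zn, pg, esize):
    """Toy model of ABS <Zd>.<T>, <Pg>/M, <Zn>.<T>.

    zd, zn: lists of signed element values (one entry per element).
    pg:     list of booleans, True for an active element.
    esize:  element size in bits (8, 16, 32 or 64).
    """
    mask = (1 << esize) - 1
    result = []
    for old, src, active in zip(zd, zn, pg):
        if not active:
            # Inactive elements keep the previous destination value.
            result.append(old)
            continue
        bits = abs(src) & mask          # element<esize-1:0>
        if bits >> (esize - 1):         # reinterpret the bits as signed
            bits -= 1 << esize
        result.append(bits)
    return result


# The most negative value has no positive counterpart in esize bits,
# so the truncated result keeps the same bit pattern.
print(sve_abs([9, 9, 9, 9], [-3, 4, -128, -7], [True, True, True, False], 8))
```

Under these assumptions the third element stays at -128 because Abs(-128) is truncated to 8 bits, and the fourth element keeps its old destination value because it is inactive.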

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
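
The three requirements listed above can be sketched as a simple check. This is a simplified model, not an architectural definition: the `Movprfx` and `PredicatedOp` records, the function `movprfx_pair_ok`, and the use of plain register numbers to stand for "architectural register state" are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Movprfx:
    zd: int                      # destination Z register number
    pg: Optional[int] = None     # governing predicate, None if unpredicated
    esize: Optional[int] = None  # element size in bits, None if unpredicated


@dataclass
class PredicatedOp:
    zd: int                          # destination Z register number
    pg: int                          # governing predicate register number
    esize: int                       # source element size in bits
    other_sources: Tuple[int, ...]   # Z registers read as other source operands


def movprfx_pair_ok(prefix: Movprfx, op: PredicatedOp) -> bool:
    """Return True if the pair meets the three listed requirements."""
    # 1. Unpredicated, or same governing predicate and element size.
    pred_ok = prefix.pg is None or (prefix.pg == op.pg and prefix.esize == op.esize)
    # 2. Same destination register.
    dest_ok = prefix.zd == op.zd
    # 3. Destination is not also another source operand (register-number check only).
    no_alias = op.zd not in op.other_sources
    return pred_ok and dest_ok and no_alias


# Hypothetical pair: MOVPRFX Z0, Z7 followed by ABS Z0.S, P1/M, Z2.S
print(movprfx_pair_ok(Movprfx(zd=0), PredicatedOp(zd=0, pg=1, esize=32, other_sources=(2,))))
```

A pair that fails this check falls into the UNPREDICTABLE case described above; a pair that passes is only shown to meet the listed requirements as modelled here.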

ADD (immediate)

Add immediate (unpredicated)

Add an unsigned immediate to each element of the source vector, and destructively place the results in the
corresponding elements of the source vector. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31-24            23-22  21-14            13  12-5  4-0
0 0 1 0 0 1 0 1  size   1 0 0 0 0 0 1 1  sh  imm8  Zdn

ADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

if !HaveSVE() then UNDEFINED;

if size:sh == '001' then UNDEFINED;
integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer imm = UInt(imm8);
if sh == '1' then imm = imm << 8;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    Elem[result, e, esize] = element1 + imm;

Z[dn] = result;
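
The decode and operation steps above can be sketched in Python as follows. This is an illustrative model under stated assumptions: the names `decode_add_imm` and `sve_add_imm` are hypothetical, the register is modelled as a list of unsigned element values, and the UNDEFINED case is represented by raising `ValueError`.

```python
def decode_add_imm(size, sh, imm8):
    """Model of the decode step: returns (esize, imm).

    size: 2-bit field value (0..3), sh: 1-bit field, imm8: 8-bit field.
    """
    if size == 0 and sh == 1:
        # size:sh == '001' is UNDEFINED: a byte element cannot hold imm8 << 8.
        raise ValueError("UNDEFINED encoding: size:sh == '001'")
    esize = 8 << size
    imm = imm8 << 8 if sh else imm8
    return esize, imm


def sve_add_imm(zdn, esize, imm):
    """Model of the operation: element-wise add, wrapping modulo 2**esize."""
    mask = (1 << esize) - 1
    return [(element + imm) & mask for element in zdn]


esize, imm = decode_add_imm(size=1, sh=1, imm8=0xFF)
print(esize, imm)                          # halfword elements, largest shifted immediate
print(sve_add_imm([0xFFFF, 1], 16, 256))   # first element wraps around
```

The sketch makes explicit that the addition is modular: there is no saturation and no flag update in the pseudocode above.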

Operational information

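This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.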
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

ADD (vectors, predicated)

Add vectors (predicated)

Add active elements of the second source vector to corresponding elements of the first source vector and destructively
place the results in the corresponding elements of the first source vector. Inactive elements in the destination vector
register remain unmodified.
31-24            23-22  21-13              12-10  9-5  4-0
0 0 0 0 0 1 0 0  size   0 0 0 0 0 0 0 0 0  Pg     Zm   Zdn

ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 + element2;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
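
A minimal Python sketch of the predicated, destructive add described by the pseudocode above. The name `sve_add_pred` and the list-based modelling of vector and predicate registers are assumptions for illustration only.

```python
def sve_add_pred(zdn, zm, pg, esize):
    """Toy model of ADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>.

    zdn, zm: lists of unsigned element values.
    pg:      list of booleans, True for an active element.
    """
    mask = (1 << esize) - 1
    return [
        (a + b) & mask if active else a   # inactive elements keep Zdn's value
        for a, b, active in zip(zdn, zm, pg)
    ]


print(sve_add_pred([250, 1, 2], [10, 1, 5], [True, False, True], 8))
```

In this model the first element wraps modulo 256, the second is left unchanged because its predicate element is false, and the third is a plain sum.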

Operational information

ADD (vectors, unpredicated)

Add vectors (unpredicated)

Add all elements of the second source vector to corresponding elements of the first source vector and place the results
in the corresponding elements of the destination vector. This instruction is unpredicated.
31-24            23-22  21  20-16  15-10        9-5  4-0
0 0 0 0 0 1 0 0  size   1   Zm     0 0 0 0 0 0  Zn   Zd

ADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = element1 + element2;

Z[d] = result;
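
To make the role of the Elem[] accessor concrete, the following sketch models each vector register as a single Python integer of VL bits and extracts or inserts elements as bit slices. The helper names `elem_get`, `elem_set` and `sve_add_unpred` are hypothetical, and the model is intended only as a reading aid for the pseudocode above.

```python
def elem_get(vec, e, esize):
    """Read element e (esize bits wide) from a register held as an integer."""
    return (vec >> (e * esize)) & ((1 << esize) - 1)


def elem_set(vec, e, esize, value):
    """Return vec with element e replaced by the low esize bits of value."""
    field = ((1 << esize) - 1) << (e * esize)
    return (vec & ~field) | ((value << (e * esize)) & field)


def sve_add_unpred(zn, zm, vl, esize):
    """Toy model of ADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T> for a VL-bit register."""
    result = 0
    for e in range(vl // esize):
        total = elem_get(zn, e, esize) + elem_get(zm, e, esize)
        result = elem_set(result, e, esize, total)  # carry out of the element is dropped
    return result


# VL = 128, two 64-bit elements: element 0 wraps to zero, element 1 becomes 3.
zn = (1 << 64) | (2**64 - 1)
zm = (2 << 64) | 1
print(hex(sve_add_unpred(zn, zm, 128, 64)))
```

The point of the sketch is that a carry out of one element never propagates into the next element.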

ADDPL

Add multiple of predicate register size to scalar register

Add the current predicate register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source
general-purpose register or current stack pointer and place the result in the 64-bit destination general-purpose
register or current stack pointer.
31-21                  20-16  15-11      10-5  4-0
0 0 0 0 0 1 0 0 0 1 1  Rn     0 1 0 1 0  imm6  Rd

ADDPL <Xd|SP>, <Xn|SP>, #<imm>

if !HaveSVE() then UNDEFINED;

integer n = UInt(Rn);
integer d = UInt(Rd);
integer imm = SInt(imm6);

Assembler Symbols

<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd"
field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is the signed immediate operand, in the range -32 to 31, encoded in the "imm6" field.

Operation

CheckSVEEnabled();
bits(64) operand1 = if n == 31 then SP[] else X[n];
bits(64) result = operand1 + (imm * (PL DIV 8));

if d == 31 then
    SP[] = result;
else
    X[d] = result;
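
The arithmetic above can be checked with a short Python sketch. It assumes that the predicate length PL in bits is one eighth of the vector length VL in bits, as elsewhere in SVE; the function name `addpl` and its parameters are hypothetical.

```python
MASK64 = (1 << 64) - 1


def addpl(xn, imm6, vl_bits):
    """Toy model of ADDPL: xn + SInt(imm6) * (PL DIV 8), modulo 2**64.

    xn:      64-bit source value.
    imm6:    raw 6-bit field value (0..63).
    vl_bits: current vector length in bits (assumed multiple of 128).
    """
    imm = imm6 - 64 if imm6 & 0x20 else imm6   # SInt(imm6)
    pl_bits = vl_bits // 8                     # one predicate bit per vector byte
    return (xn + imm * (pl_bits // 8)) & MASK64


# With VL = 256 a predicate register occupies 4 bytes, so imm = -1 subtracts 4.
print(hex(addpl(0x1000, 0b111111, 256)))
```

For example, under these assumptions a 256-bit vector length gives a 32-bit predicate register, so each unit of the immediate moves the address by 4 bytes.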

ADDVL

Add multiple of vector register size to scalar register

Add the current vector register size in bytes multiplied by an immediate in the range -32 to 31 to the 64-bit source
general-purpose register or current stack pointer, and place the result in the 64-bit destination general-purpose
register or current stack pointer.
31-21                  20-16  15-11      10-5  4-0
0 0 0 0 0 1 0 0 0 0 1  Rn     0 1 0 1 0  imm6  Rd

ADDVL <Xd|SP>, <Xn|SP>, #<imm>

if !HaveSVE() then UNDEFINED;

integer n = UInt(Rn);
integer d = UInt(Rd);
integer imm = SInt(imm6);

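Assembler Symbols

<Xd|SP> Is the 64-bit name of the destination general-purpose register or stack pointer, encoded in the "Rd" field.
<Xn|SP> Is the 64-bit name of the source general-purpose register or stack pointer, encoded in the "Rn" field.
<imm> Is the signed immediate operand, in the range -32 to 31, encoded in the "imm6" field.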

Operation

CheckSVEEnabled();
bits(64) operand1 = if n == 31 then SP[] else X[n];
bits(64) result = operand1 + (imm * (VL DIV 8));

if d == 31 then
    SP[] = result;
else
    X[d] = result;
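
As a rough illustration, the computation above is modelled below in Python. The function name `addvl` is hypothetical, and the stack-pointer scenario is only an assumed usage example (reserving space for a number of vector registers), not something stated by this instruction description.

```python
MASK64 = (1 << 64) - 1


def addvl(xn, imm, vl_bits):
    """Toy model of ADDVL: xn + imm * (VL DIV 8), modulo 2**64.

    xn:      64-bit source value (general-purpose register or SP).
    imm:     signed immediate, already decoded, in the range -32 to 31.
    vl_bits: current vector length in bits.
    """
    if not -32 <= imm <= 31:
        raise ValueError("imm must be in the range -32 to 31")
    return (xn + imm * (vl_bits // 8)) & MASK64


# Assumed scenario: move SP down by four vector registers at VL = 512 bits.
sp = 0x8000
print(hex(addvl(sp, -4, 512)))
```

Because the scale factor is read from the current vector length, the same instruction encoding produces a different byte offset on implementations with different VL.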

ADR

Compute vector address

Optionally sign or zero-extend the least significant 32 bits of each element from a vector of offsets or indices in the
second source vector, scale each index by 2, 4 or 8, add to a vector of base addresses from the first source vector, and
place the resulting addresses in the destination vector. This instruction is unpredicated.
It has encodings from 3 classes: Packed offsets, Unpacked 32-bit signed offsets and Unpacked 32-bit unsigned offsets.

Packed offsets

31-24            23  22  21  20-16  15-12    11-10  9-5  4-0
0 0 0 0 0 1 0 0  1   sz  1   Zm     1 0 1 0  msz    Zn   Zd

ADR <Zd>.<T>, [<Zn>.<T>, <Zm>.<T>{, <mod> <amount>}]

if !HaveSVE() then UNDEFINED;

integer esize = 32 << UInt(sz);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer osize = esize;
boolean unsigned = TRUE;
integer mbytes = 1 << UInt(msz);

Unpacked 32-bit signed offsets

31-21                  20-16  15-12    11-10  9-5  4-0
0 0 0 0 0 1 0 0 0 0 1  Zm     1 0 1 0  msz    Zn   Zd

ADR <Zd>.D, [<Zn>.D, <Zm>.D, SXTW{ <amount>}]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer osize = 32;
boolean unsigned = FALSE;
integer mbytes = 1 << UInt(msz);

Unpacked 32-bit unsigned offsets

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 Zm 1 0 1 0 msz Zn Zd

ADR <Zd>.D, [<Zn>.D, <Zm>.D, UXTW{ <amount>}]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer osize = 32;
boolean unsigned = TRUE;
integer mbytes = 1 << UInt(msz);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “sz”:

sz <T>
0 S
1 D

<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “msz”:

msz <mod>
00 [absent]
x1 LSL
10 LSL

<amount> Is the index shift amount, encoded in “msz”:

msz <amount>
00 [absent]
01 #1
10 #2
11 #3

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) base = Z[n];
bits(VL) offs = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) addr = Elem[base, e, esize];
    integer offset = Int(Elem[offs, e, esize]<osize-1:0>, unsigned);
    Elem[result, e, esize] = addr + (offset * mbytes);

Z[d] = result;
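
The per-element address computation can be sketched in Python as follows. The function name `adr` and the use of Python lists in place of vector registers are assumptions made for illustration; the parameters mirror the decode variables `esize`, `osize`, `unsigned` and `mbytes`.

```python
def adr(base, offs, esize, osize, unsigned, mbytes):
    """Model of the ADR element loop over lists of esize-bit integers."""
    elem_mask = (1 << esize) - 1
    result = []
    for addr, off in zip(base, offs):
        # Take the low osize bits, then interpret them as unsigned or signed.
        offset = off & ((1 << osize) - 1)
        if not unsigned and offset >> (osize - 1):
            offset -= 1 << osize
        result.append((addr + offset * mbytes) & elem_mask)
    return result

# SXTW form: the low 32 bits 0xFFFFFFFF are read as -1 and scaled by 4.
print([hex(a) for a in adr([0x1000], [0xFFFFFFFF], 64, 32, False, 4)])
```

The same inputs with `unsigned=True` model the UXTW form, where the offset is read as 4294967295 instead of -1.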

AND (immediate)

Bitwise AND with immediate (unpredicated)

Bitwise AND an immediate with each 64-bit element of the source vector, and destructively place the results in the
corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or
zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
This instruction is used by the pseudo-instruction BIC (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 0 0 0 0 imm13 Zdn

AND <Zdn>.<T>, <Zdn>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

integer dn = UInt(Zdn);
bits(64) imm;
(imm, -) = DecodeBitMasks(imm13<12>, imm13<5:0>, imm13<11:6>, TRUE);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “imm13<12>:imm13<5:0>”:

imm13<12> imm13<5:0> <T>

0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D

<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV 64;
bits(VL) operand = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(64) element1 = Elem[operand, e, 64];
    Elem[result, e, 64] = element1 AND imm;

Z[dn] = result;
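
The shared function DecodeBitMasks is not reproduced in this section. The Python sketch below shows how a 13-bit field of this kind can expand to a 64-bit mask. It is written from a general understanding of the A64 bitmask-immediate scheme and should be read as an assumption rather than as the architectural definition; the names `decode_bit_masks` and `and_imm_const` are illustrative.

```python
def decode_bit_masks(imm_n: int, imms: int, immr: int, width: int = 64) -> int:
    """Expand (N, imms, immr) into a replicated, rotated run of ones."""
    combined = (imm_n << 6) | (~imms & 0x3F)
    length = combined.bit_length() - 1          # position of the highest set bit
    if length < 1:
        raise ValueError("reserved encoding")
    levels = (1 << length) - 1
    if (imms & levels) == levels:
        raise ValueError("reserved encoding")   # would be an all-ones element
    s = imms & levels
    r = immr & levels
    esize = 1 << length
    emask = (1 << esize) - 1
    welem = (1 << (s + 1)) - 1                  # run of s+1 ones
    rotated = ((welem >> r) | (welem << (esize - r))) & emask
    mask = 0
    for i in range(width // esize):             # replicate across the 64 bits
        mask |= rotated << (i * esize)
    return mask

def and_imm_const(imm13: int) -> int:
    """Split imm13 as imm13<12>, imm13<5:0>, imm13<11:6>, as in the decode above."""
    return decode_bit_masks((imm13 >> 12) & 1, imm13 & 0x3F, (imm13 >> 6) & 0x3F)

print(hex(and_imm_const((1 << 6) | 0b110001)))
```

In the example, imm13<5:0> = 110001 selects 8-bit fields (the B row of the size table), each holding a run of two ones rotated right by one.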

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

AND (predicates)

Bitwise AND predicates

Bitwise AND active elements of the second source predicate with corresponding elements of the first source predicate
and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Does not set the condition flags.
This instruction is used by the alias MOV (predicate, predicated, zeroing).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S

AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = FALSE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.

Alias Conditions

Alias Is preferred when

MOV (predicate, predicated, zeroing) S == '0' && Pn == Pm

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 AND element2;
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;

AND (vectors, predicated)

Bitwise AND vectors (predicated)

Bitwise AND active elements of the second source vector with corresponding elements of the first source vector and
destructively place the results in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 0 0 0 0 Pg Zm Zdn

AND <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 AND element2;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

AND (vectors, unpredicated)

Bitwise AND vectors (unpredicated)

Bitwise AND all elements of the second source vector with corresponding elements of the first source vector and place
the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 Zm 0 0 1 1 0 0 Zn Zd

AND <Zd>.D, <Zn>.D, <Zm>.D

if !HaveSVE() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];

Z[d] = operand1 AND operand2;

ANDS

Bitwise AND predicates, setting the condition flags

Bitwise AND active elements of the second source predicate with corresponding elements of the first source predicate
and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result,
and the V flag to zero.
This instruction is used by the alias MOVS (predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S

ANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = TRUE;

Assembler Symbols

Alias Conditions

Alias Is preferred when

MOVS (predicated) S == '1' && Pn == Pm

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

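for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 AND element2;
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;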
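
The zeroing predicate AND and the FIRST (N), NONE (Z), !LAST (C) flag rule can be sketched in Python. This is a simplified model under stated assumptions: predicates are lists holding one boolean per element rather than PL-bit vectors, and the name `pred_and` and the flag dictionary are illustrative, not architectural.

```python
def pred_and(mask, op1, op2, setflags=False):
    """Return (result, flags); flags is None unless setflags is requested."""
    # Inactive elements are set to zero, active ones get op1 AND op2.
    result = [bool(g and a and b) for g, a, b in zip(mask, op1, op2)]
    flags = None
    if setflags:
        active = [r for g, r in zip(mask, result) if g]
        flags = {
            "N": bool(active) and active[0],         # first active element is true
            "Z": not any(active),                    # no active element is true
            "C": not (bool(active) and active[-1]),  # last active element is not true
            "V": False,
        }
    return result, flags

result, flags = pred_and([1, 1, 0, 1], [1, 0, 1, 1], [1, 1, 1, 0], setflags=True)
print(result, flags)
```

Calling the function with `setflags=False` models AND (predicates), and with `setflags=True` models ANDS.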

ANDV

Bitwise AND reduction to scalar

Bitwise AND horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register.
Inactive elements in the source vector are treated as all ones.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 0 0 0 1 Pg Zn Vd

ANDV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) result = Ones(esize);

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        result = result AND Elem[operand, e, esize];

V[d] = result;
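
The reduction starts from all ones, which is the identity for AND, so skipping inactive elements is equivalent to treating them as all ones. A minimal Python sketch, with the function name `andv` and list-based operands as illustrative assumptions:

```python
def andv(mask, elems, esize):
    """AND together the active esize-bit elements, starting from all ones."""
    result = (1 << esize) - 1
    for active, elem in zip(mask, elems):
        if active:
            result &= elem
    return result

print(hex(andv([1, 0, 1], [0xF0, 0x00, 0x3C], 8)))
```

With no active elements the function returns the all-ones value for the chosen element size.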

ASR (immediate, predicated)

Arithmetic shift right by immediate (predicated)

Shift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the
range 1 to number of bits per element. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 0 0 0 0 1 0 0 Pg tszl imm3 Zdn
L U

ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

bits(4) tsize = tszh:tszl;
integer esize;
case tsize of
    when '0000' UNDEFINED;
    when '0001' esize = 8;
    when '001x' esize = 16;
    when '01xx' esize = 32;
    when '1xxx' esize = 64;
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer shift = (2 * esize) - UInt(tsize:imm3);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “tszh:tszl”:

tszh tszl <T>

00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<const> Is the immediate shift amount, in the range 1 to number of bits per element, encoded in "tsz:imm3".
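
The decode expression `shift = (2 * esize) - UInt(tsize:imm3)` packs both the element size and the shift amount into seven bits. The following Python sketch round-trips that encoding; the helper names `encode_shift` and `decode_shift` are illustrative assumptions, not part of the architecture.

```python
def encode_shift(esize: int, shift: int):
    """Return (tsize, imm3) for a right shift of 1..esize on esize-bit elements."""
    if esize not in (8, 16, 32, 64) or not 1 <= shift <= esize:
        raise ValueError("unsupported element size or shift amount")
    value = 2 * esize - shift
    return value >> 3, value & 0b111

def decode_shift(tsize: int, imm3: int):
    """Return (esize, shift), mirroring the case statement in the decode."""
    if tsize == 0:
        raise ValueError("tsize '0000' is UNDEFINED")
    esize = 8 << (tsize.bit_length() - 1)   # highest set bit of tsize picks the size
    return esize, 2 * esize - ((tsize << 3) | imm3)

print(encode_shift(32, 5), decode_shift(*encode_shift(32, 5)))
```

The highest set bit of tsize identifies the element size, and the remaining bits, together with imm3, hold the shift amount counted down from twice the element size.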

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) mask = P[g];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = ASR(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

ASR (immediate, unpredicated)

Arithmetic shift right by immediate (unpredicated)

Shift right by immediate, preserving the sign bit, each element of the source vector, and place the results in the
corresponding elements of the destination vector. The immediate shift amount is an unsigned value in the range 1 to
number of bits per element. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 1 tszl imm3 1 0 0 1 0 0 Zn Zd
U

ASR <Zd>.<T>, <Zn>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

bits(4) tsize = tszh:tszl;
integer esize;
case tsize of
    when '0000' UNDEFINED;
    when '0001' esize = 8;
    when '001x' esize = 16;
    when '01xx' esize = 32;
    when '1xxx' esize = 64;
integer n = UInt(Zn);
integer d = UInt(Zd);
integer shift = (2 * esize) - UInt(tsize:imm3);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “tszh:tszl”:

tszh tszl <T>

00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<const> Is the immediate shift amount, in the range 1 to number of bits per element, encoded in "tsz:imm3".

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    Elem[result, e, esize] = ASR(element1, shift);

Z[d] = result;

ASR (vectors)

Arithmetic shift right by vector (predicated)

Shift right, preserving the sign bit, active elements of the first source vector by corresponding elements of the second
source vector and destructively place the results in the corresponding elements of the first source vector. The shift
amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element
size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 0 1 0 0 Pg Zm Zdn
R L U

ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        integer shift = Min(UInt(element2), esize);
        Elem[result, e, esize] = ASR(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
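
Because the shift amount is clamped with `Min(UInt(element2), esize)` rather than taken modulo the element size, any amount of esize or more fills the element with copies of the sign bit. A small Python sketch of one element, where the name `asr_element` is an illustrative assumption:

```python
def asr_element(value: int, shift_amount: int, esize: int) -> int:
    """Arithmetic shift right of one esize-bit element with a clamped shift."""
    mask = (1 << esize) - 1
    # Reinterpret the raw bits as a signed integer.
    signed = value - (1 << esize) if value >> (esize - 1) else value
    shift = min(shift_amount, esize)
    return (signed >> shift) & mask    # Python's >> is arithmetic for negative ints

print(hex(asr_element(0x80, 200, 8)), hex(asr_element(0x7F, 200, 8)))
```

A negative element shifted by 200 becomes all ones, and a non-negative element becomes zero.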

Operational information

ASR (wide elements, predicated)

Arithmetic shift right by 64-bit wide elements (predicated)

Shift right, preserving the sign bit, active elements of the first source vector by corresponding overlapping 64-bit
elements of the second source vector and destructively place the results in the corresponding elements of the first
source vector. The shift amount is a vector of unsigned 64-bit doubleword elements in which all bits are significant,
and not used modulo the destination element size. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 1 0 0 Pg Zm Zdn
R L U

ASR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 RESERVED

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
        integer shift = Min(UInt(element2), esize);
        Elem[result, e, esize] = ASR(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

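This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE: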
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and destination element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

ASR (wide elements, unpredicated)

Arithmetic shift right by 64-bit wide elements (unpredicated)

Shift right, preserving the sign bit, all elements of the first source vector by corresponding overlapping 64-bit
elements of the second source vector and place the results in the corresponding elements of the destination vector. The
shift amount is a vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo
the destination element size. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 0 0 Zn Zd
U

ASR <Zd>.<T>, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 RESERVED

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
    integer shift = Min(UInt(element2), esize);
    Elem[result, e, esize] = ASR(element1, shift);

Z[d] = result;
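
The index expression `(e * esize) DIV 64` means that every narrow element takes its shift amount from the 64-bit element that overlaps it. The Python sketch below models this with lists; the names `asr` and `asr_wide` are illustrative assumptions.

```python
def asr(value: int, shift: int, esize: int) -> int:
    """Arithmetic shift right of an esize-bit value held as a raw integer."""
    signed = value - (1 << esize) if value >> (esize - 1) else value
    return (signed >> shift) & ((1 << esize) - 1)

def asr_wide(op1, op2_dwords, esize):
    """op1: esize-bit elements; op2_dwords: the 64-bit elements of the same-length vector."""
    result = []
    for e, element1 in enumerate(op1):
        element2 = op2_dwords[(e * esize) // 64]   # overlapping doubleword
        result.append(asr(element1, min(element2, esize), esize))
    return result

# Two 32-bit elements share each doubleword shift amount.
print([hex(x) for x in asr_wide([0x80000000, 0x10, 0xFFFFFF00, 0x40], [4, 100], 32)])
```

The first two elements are shifted by 4, and the last two by the clamped amount 32.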

ASRD

Arithmetic shift right for divide by immediate (predicated)

Shift right by immediate, preserving the sign bit, each active element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The result rounds toward zero as in a signed division. The
immediate shift amount is an unsigned value in the range 1 to number of bits per element. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 0 1 0 0 1 0 0 Pg tszl imm3 Zdn
L U

ASRD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>

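if !HaveSVE() then UNDEFINED;

bits(4) tsize = tszh:tszl;
integer esize;
case tsize of
    when '0000' UNDEFINED;
    when '0001' esize = 8;
    when '001x' esize = 16;
    when '01xx' esize = 32;
    when '1xxx' esize = 64;
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer shift = (2 * esize) - UInt(tsize:imm3);
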
Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “tszh:tszl”:

tszh tszl <T>

00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element1 = SInt(Elem[operand1, e, esize]);
        if element1 < 0 then
            element1 = element1 + ((1 << shift) - 1);
        Elem[result, e, esize] = (element1 >> shift)<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
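
Adding `(1 << shift) - 1` to negative inputs before shifting is what turns the floor behavior of a plain arithmetic shift into rounding toward zero. A Python sketch of one element, with the name `asrd_element` as an illustrative assumption:

```python
def asrd_element(value: int, shift: int, esize: int) -> int:
    """Signed divide by 2**shift, rounding toward zero, on a raw esize-bit value."""
    mask = (1 << esize) - 1
    signed = value - (1 << esize) if value >> (esize - 1) else value
    if signed < 0:
        signed += (1 << shift) - 1     # bias so the shift rounds toward zero
    return (signed >> shift) & mask

# -7 / 2: ASRD gives -3 (0xFD), whereas a plain arithmetic shift gives -4 (0xFC).
print(hex(asrd_element(0xF9, 1, 8)), hex((-7 >> 1) & 0xFF))
```

Non-negative inputs are unaffected by the bias, so for them the result matches a plain arithmetic shift.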

Operational information

ASRR

Reversed arithmetic shift right by vector (predicated)

Reversed shift right, preserving the sign bit, active elements of the second source vector by corresponding elements of
the first source vector and destructively place the results in the corresponding elements of the first source vector. The
shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the
element size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 0 1 0 0 Pg Zm Zdn
R L U

ASRR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        integer shift = Min(UInt(element1), esize);
        Elem[result, e, esize] = ASR(element2, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

BFCVT

Floating-point down convert to BFloat16 format (predicated)

Convert to BFloat16 from single-precision in each active floating-point element of the source vector, and place the
results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
Since the result type is smaller than the input type, the results are zero-extended to fill each destination element.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.

SVE
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 0 1 0 1 0 1 0 1 Pg Zn Zd

BFCVT <Zd>.H, <Pg>/M, <Zn>.S

if !HaveSVE() || !HaveBF16Ext() then UNDEFINED;

integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV 32;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, 32) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, 32] == '1' then
        bits(32) element = Elem[operand, e, 32];
        Elem[result, 2*e, 16] = FPConvertBF(element, FPCR[]);
        Elem[result, 2*e+1, 16] = Zeros();

Z[d] = result;
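
The lane layout can be sketched in Python. FPConvertBF is not defined in this section, so the conversion helper below is an assumption: it rounds to nearest with ties to even and quiets NaNs, and it ignores the FPCR controls (rounding mode, flush-to-zero, default NaN) that the real function consults. The names `f32_to_bf16_rne` and `bfcvt` are illustrative.

```python
import struct

def f32_to_bf16_rne(x: float) -> int:
    """Assumed conversion: single-precision to BFloat16, round to nearest even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return ((bits >> 16) | 0x0040) & 0xFFFF   # NaN: keep it a (quiet) NaN
    bias = 0x7FFF + ((bits >> 16) & 1)            # ties go to the even result
    return ((bits + bias) >> 16) & 0xFFFF

def bfcvt(mask, src_f32, dest_h):
    """mask: one flag per 32-bit element; dest_h: two 16-bit lanes per element."""
    result = list(dest_h)
    for e, (active, x) in enumerate(zip(mask, src_f32)):
        if active:
            result[2 * e] = f32_to_bf16_rne(x)    # converted value in the low half
            result[2 * e + 1] = 0                 # zero-extend into the high half
    return result

print([hex(h) for h in bfcvt([1, 0], [1.0, 2.0], [0xAAAA] * 4)])
```

Under the same assumptions, BFCVTNT would write only lane `2*e+1` and leave lane `2*e` unchanged.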

Operational information

BFCVTNT

Floating-point down convert and narrow to BFloat16 (top, predicated)

Convert to BFloat16 from single-precision in each active floating-point element of the source vector, and place the
results in the odd-numbered 16-bit elements of the destination vector, leaving the even-numbered elements
unchanged. Inactive elements in the destination vector register remain unmodified.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.

SVE
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 0 0 1 0 1 0 1 0 1 Pg Zn Zd

BFCVTNT <Zd>.H, <Pg>/M, <Zn>.S

if !HaveSVE() || !HaveBF16Ext() then UNDEFINED;

integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV 32;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, 32) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, 32] == '1' then
        bits(32) element = Elem[operand, e, 32];
        Elem[result, 2*e+1, 16] = FPConvertBF(element, FPCR[]);

Z[d] = result;

BFDOT (indexed)

BFloat16 floating-point indexed dot product

Irrespective of the control bits in the FPCR, this instruction:

* Performs an unfused sum-of-products of each pair of adjacent BFloat16 elements in the first source vector with the
specified pair of elements in the second vector. The intermediate single-precision products are rounded before they
are summed, and the intermediate sum is rounded before accumulation into the single-precision destination element
that overlaps with the corresponding pair of BFloat16 elements in the first source vector.
* Uses the non-IEEE Round-to-Odd rounding mode, which forces bit 0 of an inexact result to 1, and rounds an overflow
to an appropriately signed Infinity.
* Does not modify the cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC, and IOC).
* Disables trapped floating-point exceptions, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE, and IOE) are
all zero.
* Flushes denormalized inputs and results to zero, as if FPCR.{FZ, FIZ} is {1, 1}.
* Only the Default NaN is generated, as if FPCR.DN had the value 1.
The BFloat16 pairs within the second source vector are specified using an immediate index which selects the same
BFloat16 pair position within each 128-bit vector segment. The index range is from 0 to 3.
This instruction is unpredicated.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.

SVE
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 1 1 i2 Zm 0 1 0 0 0 0 Zn Zda

BFDOT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]

if !HaveSVE() || !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
integer index = UInt(i2);

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index, in the range 0 to 3, encoded in the "i2" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV 32;
integer eltspersegment = 128 DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(16) elt1_a = Elem[operand1, 2 * e + 0, 16];
    bits(16) elt1_b = Elem[operand1, 2 * e + 1, 16];
    bits(16) elt2_a = Elem[operand2, 2 * s + 0, 16];
    bits(16) elt2_b = Elem[operand2, 2 * s + 1, 16];
    bits(32) sum = Elem[operand3, e, 32];

    sum = BFDotAdd(sum, elt1_a, elt1_b, elt2_a, elt2_b, FPCR[]);

    Elem[result, e, 32] = sum;

Z[da] = result;
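
The indexing in the loop above selects one BFloat16 pair per 128-bit segment of the second source and reuses it for all four 32-bit elements of that segment. The Python sketch below reproduces only this index selection and leaves out the arithmetic done by BFDotAdd; the function name `bfdot_indexed_pairs` and the `vl_bits` parameter are illustrative assumptions.

```python
def bfdot_indexed_pairs(vl_bits: int, index: int):
    """For each 32-bit element e, return the 16-bit lane pairs read from Zn and Zm."""
    if not 0 <= index <= 3:
        raise ValueError("index must be in the range 0 to 3")
    elements = vl_bits // 32
    elts_per_segment = 128 // 32
    pairs = []
    for e in range(elements):
        segment_base = e - (e % elts_per_segment)
        s = segment_base + index
        pairs.append(((2 * e, 2 * e + 1), (2 * s, 2 * s + 1)))
    return pairs

for zn_lanes, zm_lanes in bfdot_indexed_pairs(256, 2):
    print(zn_lanes, zm_lanes)
```

For a 256-bit vector and index 2, the first four elements all read Zm lanes 4 and 5, and the next four read lanes 12 and 13.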

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

BFDOT (vectors)

BFloat16 floating-point dot product

Irrespective of the control bits in the FPCR, this instruction:

* Performs an unfused sum-of-products of each pair of adjacent BFloat16 elements in the source vectors. The
intermediate single-precision products are rounded before they are summed, and the intermediate sum is rounded
before accumulation into the single-precision destination element that overlaps with the corresponding pair of
BFloat16 elements in the source vectors.
* Uses the non-IEEE Round-to-Odd rounding mode, which forces bit 0 of an inexact result to 1, and rounds an overflow
to an appropriately signed Infinity.
* Does not modify the cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC, and IOC).
* Disables trapped floating-point exceptions, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE, and IOE) are
all zero.
* Flushes denormalized inputs and results to zero, as if FPCR.{FZ, FIZ} is {1, 1}.
* Only the Default NaN is generated, as if FPCR.DN had the value 1.
This instruction is unpredicated.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.

SVE
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 1 1 Zm 1 0 0 0 0 0 Zn Zda

BFDOT <Zda>.S, <Zn>.H, <Zm>.H

if !HaveSVE() || !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    bits(16) elt1_a = Elem[operand1, 2 * e + 0, 16];
    bits(16) elt1_b = Elem[operand1, 2 * e + 1, 16];
    bits(16) elt2_a = Elem[operand2, 2 * e + 0, 16];
    bits(16) elt2_b = Elem[operand2, 2 * e + 1, 16];
    bits(32) sum = Elem[operand3, e, 32];

    sum = BFDotAdd(sum, elt1_a, elt1_b, elt2_a, elt2_b, FPCR[]);

    Elem[result, e, 32] = sum;

Z[da] = result;
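
The pairing of elements in the loop above can be illustrated with a minimal Python sketch. It is not the architectural definition: the function name `bfdot_sketch` is hypothetical, the inputs are assumed to be already-widened Python floats, and ordinary double-precision arithmetic stands in for BFDotAdd, so Round-to-Odd rounding, flushing to zero and Default NaN handling are not modelled.

```python
def bfdot_sketch(acc, zn, zm):
    """Illustrative only: acc holds one value per 32-bit element,
    zn and zm hold two (already widened) values per accumulator element."""
    if len(zn) != 2 * len(acc) or len(zm) != 2 * len(acc):
        raise ValueError("each accumulator element needs two source elements")
    result = []
    for e in range(len(acc)):
        # Elements 2e and 2e+1 of both sources feed destination element e.
        pair_sum = zn[2 * e] * zm[2 * e] + zn[2 * e + 1] * zm[2 * e + 1]
        result.append(acc[e] + pair_sum)
    return result
```

For example, `bfdot_sketch([10.0, 0.0], [1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 0.5, 0.5])` accumulates 1×2 + 2×2 into the first element and 3×0.5 + 4×0.5 into the second.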

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

BFMLALB (indexed)

BFloat16 floating-point multiply-add long to single-precision (bottom, indexed)

This BFloat16 floating-point multiply-add long instruction widens the even-numbered BFloat16 elements in the first
source vector and the indexed element from the corresponding 128-bit segment in the second source vector to single-
precision format and then destructively multiplies and adds these values without intermediate rounding to the single-
precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the first source
vector. This instruction is unpredicated.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.

SVE
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i3h Zm 0 1 0 0 i3l 0 Zn Zda
o2 op T

BFMLALB <Zda>.S, <Zn>.H, <Zm>.H[<imm>]

if !HaveSVE() || !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
integer index = UInt(i3h:i3l);

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.

Operation

CheckSVEEnabled();
integer elements = VL DIV 32;
integer eltspersegment = 128 DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = 2 * segmentbase + index;
    bits(32) element1 = Elem[operand1, 2 * e + 0, 16] : Zeros(16);
    bits(32) element2 = Elem[operand2, s, 16] : Zeros(16);
    bits(32) element3 = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(element3, element1, element2, FPCR[]);

Z[da] = result;
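
The index arithmetic in the loop above can be traced with a short Python sketch. The function name `bfmlalb_indexed_selection` and the tuple layout are hypothetical, introduced only to show which 16-bit elements are paired for a given vector length and index; no arithmetic on the element values is modelled.

```python
def bfmlalb_indexed_selection(vl_bits, index):
    """Return (e, zn_element, zm_element) for each 32-bit destination element.

    zn_element and zm_element are 16-bit element numbers within Zn and Zm.
    """
    if vl_bits % 128 != 0 or vl_bits <= 0:
        raise ValueError("vector length must be a positive multiple of 128")
    if not 0 <= index <= 7:
        raise ValueError("index must be in the range 0 to 7")
    elements = vl_bits // 32
    elts_per_segment = 128 // 32
    picks = []
    for e in range(elements):
        segment_base = e - (e % elts_per_segment)
        # A 128-bit segment holds eight 16-bit elements, hence 2 * segment_base.
        s = 2 * segment_base + index
        picks.append((e, 2 * e, s))
    return picks
```

With a 256-bit vector and index 3, destination elements 0 to 3 all use Zm element 3, and elements 4 to 7 all use Zm element 11.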

Operational information

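This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.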

BFMLALB (vectors)

BFloat16 floating-point multiply-add long to single-precision (bottom)

This BFloat16 floating-point multiply-add long instruction widens the even-numbered BFloat16 elements in the first
source vector and the corresponding elements in the second source vector to single-precision format and then
destructively multiplies and adds these values without intermediate rounding to the single-precision elements of the
destination vector that overlap with the corresponding BFloat16 elements in the source vectors. This instruction is
unpredicated.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.

SVE
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 Zm 1 0 0 0 0 0 Zn Zda
o2 op T

BFMLALB <Zda>.S, <Zn>.H, <Zm>.H

if !HaveSVE() || !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    bits(32) element1 = Elem[operand1, 2 * e + 0, 16] : Zeros(16);
    bits(32) element2 = Elem[operand2, 2 * e + 0, 16] : Zeros(16);
    bits(32) element3 = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(element3, element1, element2, FPCR[]);

Z[da] = result;
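
The widening step `Elem[...] : Zeros(16)` places the 16 BFloat16 bits in the upper half of a single-precision bit pattern. The Python sketch below illustrates this, using the hypothetical helper names `widen_bf16` and `bfmlalb_sketch`. It is an approximation: Python floats are double-precision, so the final rounding to single precision performed by BFMulAdd, and any FPCR-dependent behavior, are not modelled.

```python
import struct

def widen_bf16(bits16):
    """Interpret a 16-bit BFloat16 pattern as single precision (bits : Zeros(16))."""
    bits32 = (bits16 & 0xFFFF) << 16
    return struct.unpack("<f", struct.pack("<I", bits32))[0]

def bfmlalb_sketch(acc, zn_bits, zm_bits):
    """acc: one float per 32-bit element; zn_bits, zm_bits: BFloat16 bit patterns.

    Only the even-numbered (bottom) 16-bit elements are used.
    """
    return [
        acc[e] + widen_bf16(zn_bits[2 * e]) * widen_bf16(zm_bits[2 * e])
        for e in range(len(acc))
    ]
```

For instance, 0x4000 and 0x4040 are the BFloat16 encodings of 2.0 and 3.0, so `bfmlalb_sketch([1.0], [0x4000, 0x0000], [0x4040, 0x0000])` adds 6.0 to the accumulator.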

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

BFMLALT (indexed)

BFloat16 floating-point multiply-add long to single-precision (top, indexed)

This BFloat16 floating-point multiply-add long instruction widens the odd-numbered BFloat16 elements in the first
source vector and the indexed element from the corresponding 128-bit segment in the second source vector to single-
precision format and then destructively multiplies and adds these values without intermediate rounding to the single-
precision elements of the destination vector that overlap with the corresponding BFloat16 elements in the first source
vector. This instruction is unpredicated.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.

SVE
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i3h Zm 0 1 0 0 i3l 1 Zn Zda
o2 op T

BFMLALT <Zda>.S, <Zn>.H, <Zm>.H[<imm>]

if !HaveSVE() || !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
integer index = UInt(i3h:i3l);

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.

Operation

CheckSVEEnabled();
integer elements = VL DIV 32;
integer eltspersegment = 128 DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = 2 * segmentbase + index;
    bits(32) element1 = Elem[operand1, 2 * e + 1, 16] : Zeros(16);
    bits(32) element2 = Elem[operand2, s, 16] : Zeros(16);
    bits(32) element3 = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(element3, element1, element2, FPCR[]);

Z[da] = result;

Operational information

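This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.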

BFMLALT (vectors)

BFloat16 floating-point multiply-add long to single-precision (top)

This BFloat16 floating-point multiply-add long instruction widens the odd-numbered BFloat16 elements in the first
source vector and the corresponding elements in the second source vector to single-precision format and then
destructively multiplies and adds these values without intermediate rounding to the single-precision elements of the
destination vector that overlap with the corresponding BFloat16 elements in the source vectors. This instruction is
unpredicated.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.

SVE
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 Zm 1 0 0 0 0 1 Zn Zda
o2 op T

BFMLALT <Zda>.S, <Zn>.H, <Zm>.H

if !HaveSVE() || !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV 32;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    bits(32) element1 = Elem[operand1, 2 * e + 1, 16] : Zeros(16);
    bits(32) element2 = Elem[operand2, 2 * e + 1, 16] : Zeros(16);
    bits(32) element3 = Elem[operand3, e, 32];
    Elem[result, e, 32] = BFMulAdd(element3, element1, element2, FPCR[]);

Z[da] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

BFMMLA

BFloat16 floating-point matrix multiply-accumulate

Irrespective of the control bits in the FPCR, this instruction:

* Performs two unfused sums-of-products within each two pairs of adjacent BFloat16 elements while multiplying the
2×4 matrix of BFloat16 values held in each 128-bit segment of the first source vector by the 4×2 matrix of BFloat16
values in the corresponding segment of the second source vector. The intermediate single-precision products are
rounded before they are summed and the intermediate sum is rounded before accumulation into the 2×2 single-
precision matrix in the corresponding segment of the destination vector. This is equivalent to accumulating two 2-way
unfused dot products per destination element.
* Uses the non-IEEE Round-to-Odd rounding mode, which forces bit 0 of an inexact result to 1, and rounds an overflow
to an appropriately signed Infinity.
* Does not modify the cumulative FPSR exception bits (IDC, IXC, UFC, OFC, DZC, and IOC).
* Disables trapped floating-point exceptions, as if the FPCR trap enable bits (IDE, IXE, UFE, OFE, DZE, and IOE) are
all zero.
* Flushes denormalized inputs and results to zero, as if FPCR.{FZ, FIZ} is {1, 1}.
* Only the Default NaN is generated, as if FPCR.DN had the value 1.
This instruction is unpredicated and vector length agnostic.
ID_AA64ZFR0_EL1.BF16 indicates whether this instruction is implemented.

SVE
(FEAT_BF16)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 1 1 Zm 1 1 1 0 0 1 Zn Zda

BFMMLA <Zda>.S, <Zn>.H, <Zm>.H

if !HaveSVE() || !HaveBF16Ext() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

Operation

CheckSVEEnabled();
integer segments = VL DIV 128;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;
bits(128) op1, op2;
bits(128) res, addend;

for s = 0 to segments-1
    op1 = Elem[operand1, s, 128];
    op2 = Elem[operand2, s, 128];
    addend = Elem[operand3, s, 128];
    res = BFMatMulAdd(addend, op1, op2);
    Elem[result, s, 128] = res;

Z[da] = result;
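
The per-segment matrix operation can be sketched in Python as follows. BFMatMulAdd is not defined in this section, so the element layout used here is an assumption: the first source segment is taken to hold the 2×4 matrix row by row, the second source segment to hold the 4×2 matrix column by column, and the addend to hold the 2×2 matrix row by row. The function name `bfmmla_segment_sketch` is hypothetical, inputs are already-widened floats, and the Round-to-Odd rounding of products and sums is not modelled.

```python
def bfmmla_segment_sketch(addend, op1, op2):
    """One 128-bit segment, with values given as Python floats.

    addend: 4 values (2x2, row-major)
    op1:    8 values (2x4, row-major)
    op2:    8 values, column j of the 4x2 matrix in op2[4*j : 4*j + 4] (assumed)
    """
    res = list(addend)
    for i in range(2):
        for j in range(2):
            total = addend[2 * i + j]
            for k in range(2):
                # Two 2-way dot products are accumulated per destination element.
                total += (op1[4 * i + 2 * k] * op2[4 * j + 2 * k]
                          + op1[4 * i + 2 * k + 1] * op2[4 * j + 2 * k + 1])
            res[2 * i + j] = total
    return res
```

Choosing the second operand so that its columns are unit vectors makes the result easy to check by hand: each destination element then receives one element of the first matrix plus the addend.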

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

BIC (immediate)

Bitwise clear bits using immediate (unpredicated)

Bitwise clear bits using immediate with each 64-bit element of the source vector, and destructively place the results in
the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or
zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.

This is a pseudo-instruction of AND (immediate). This means:

• The encodings in this description are named to match the encodings of AND (immediate).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of AND (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 0 0 0 0 imm13 Zdn

BIC <Zdn>.<T>, <Zdn>.<T>, #<const>

is equivalent to

AND <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)
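
The equivalence relies on the two's complement identity that -c - 1 is the bitwise inverse of c. A minimal Python sketch on a single 64-bit element (the function names are hypothetical, and the restrictions on encodable bitmask immediates are not checked):

```python
MASK64 = (1 << 64) - 1

def bic_imm(x, const):
    """Clear in x every bit that is set in const (one 64-bit element)."""
    return x & ~const & MASK64

def and_imm_equivalent(x, const):
    """The same operation expressed as AND with #(-const - 1)."""
    inverted = (-const - 1) & MASK64
    return x & inverted
```

Both functions return the same value for any 64-bit `x` and `const`, which is why the assembler can emit the AND (immediate) encoding for a BIC (immediate) mnemonic.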

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “imm13<12>:imm13<5:0>”:

imm13<12> imm13<5:0> <T>

0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D

<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.

Operation

The description of AND (immediate) gives the operational pseudocode for this instruction.

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

BIC (predicates)

Bitwise clear predicates

Bitwise AND inverted active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S

BIC <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = FALSE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

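for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 AND (NOT element2);
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);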
P[d] = result;

BIC (vectors, predicated)

Bitwise clear vectors (predicated)

Bitwise AND inverted active elements of the second source vector with corresponding elements of the first source
vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in
the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 1 0 0 0 Pg Zm Zdn

BIC <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 AND (NOT element2);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
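
The merging behavior of the loop above can be shown with a small Python sketch. The function name `bic_predicated` is hypothetical, vectors are modelled as lists of unsigned integers, and the governing predicate as a list of booleans with one entry per element.

```python
def bic_predicated(zdn, zm, pg, esize=8):
    """Active elements get zdn AND NOT zm; inactive elements keep zdn."""
    mask = (1 << esize) - 1
    return [
        (a & ~b & mask) if active else a
        for a, b, active in zip(zdn, zm, pg)
    ]
```

In `bic_predicated([0xFF, 0xFF, 0x0F], [0x0F, 0x0F, 0x01], [True, False, True])` the middle element is inactive and therefore passes through unchanged.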

Operational information

BIC (vectors, unpredicated)

Bitwise clear vectors (unpredicated)

Bitwise AND inverted all elements of the second source vector with corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 Zm 0 0 1 1 0 0 Zn Zd

BIC <Zd>.D, <Zn>.D, <Zm>.D

if !HaveSVE() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

Operation

CheckSVEEnabled();
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];

Z[d] = operand1 AND (NOT operand2);

BICS

Bitwise clear predicates, setting the condition flags

Bitwise AND inverted active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S

BICS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = TRUE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

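for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 AND (NOT element2);
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);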
P[d] = result;
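
The flag setting described above (FIRST, NONE, !LAST, and V cleared) can be sketched in Python. PredTest itself is defined elsewhere in the architecture pseudocode, so this is a reading of the flag names in the description rather than the definition; the function name `pred_test` and the list-of-booleans representation are assumptions.

```python
def pred_test(mask, result):
    """Return (N, Z, C, V) for a predicate result under a governing mask.

    mask, result: lists of booleans, one entry per predicate element.
    """
    active = [r for m, r in zip(mask, result) if m]
    n = 1 if active and active[0] else 0    # FIRST: first active element is true
    z = 0 if any(active) else 1             # NONE: no active element is true
    c = 0 if active and active[-1] else 1   # !LAST: last active element is not true
    return (n, z, c, 0)
```

When no element is active, the sketch returns N = 0, Z = 1, C = 1, V = 0.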

BRKA

Break after first true condition

Sets destination predicate elements up to and including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to
zero, depending on whether merging or zeroing predication is selected. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 Pg 0 Pn M Pd
B S

BRKA <Pd>.B, <Pg>/<ZM>, <Pn>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer d = UInt(Pd);
boolean merging = (M == '1');
boolean setflags = FALSE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.

<ZM> Is the predication qualifier, encoded in “M”:

M <ZM>
0 Z
1 M

<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[n];
bits(PL) operand2 = P[d];
boolean break = FALSE;
bits(PL) result;

for e = 0 to elements-1
    boolean element = ElemP[operand, e, esize] == '1';
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = if !break then '1' else '0';
        break = break || element;
    elsif merging then
        ElemP[result, e, esize] = ElemP[operand2, e, esize];
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
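
A minimal Python sketch of the loop above, with predicates modelled as lists of booleans. The function name `brka` and the convention that `pd_old=None` selects zeroing predication are assumptions made for illustration.

```python
def brka(pg, pn, pd_old=None):
    """Break after the first active and true element of pn.

    pd_old=None models zeroing (/Z); a list models merging (/M).
    """
    result = []
    brk = False
    for e, (active, src) in enumerate(zip(pg, pn)):
        if active:
            result.append(not brk)   # written before the break state is updated
            brk = brk or src
        elif pd_old is not None:
            result.append(pd_old[e])
        else:
            result.append(False)
    return result
```

Because the destination element is written before `brk` is updated, the first true source element is itself included in the result.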

BRKAS

Break after first true condition, setting the condition flags

Sets destination predicate elements up to and including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 0 0 0 0 0 1 Pg 0 Pn 0 Pd
B S M

BRKAS <Pd>.B, <Pg>/Z, <Pn>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer d = UInt(Pd);
boolean merging = FALSE;
boolean setflags = TRUE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[n];
bits(PL) operand2 = P[d];
boolean break = FALSE;
bits(PL) result;

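for e = 0 to elements-1
    boolean element = ElemP[operand, e, esize] == '1';
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = if !break then '1' else '0';
        break = break || element;
    elsif merging then
        ElemP[result, e, esize] = ElemP[operand2, e, esize];
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);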
P[d] = result;

BRKB

Break before first true condition

Sets destination predicate elements up to but not including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register remain unmodified or are set to
zero, depending on whether merging or zeroing predication is selected. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 1 0 0 0 0 0 1 Pg 0 Pn M Pd
B S

BRKB <Pd>.B, <Pg>/<ZM>, <Pn>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer d = UInt(Pd);
boolean merging = (M == '1');
boolean setflags = FALSE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.

<ZM> Is the predication qualifier, encoded in “M”:

M <ZM>
0 Z
1 M

<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[n];
bits(PL) operand2 = P[d];
boolean break = FALSE;
bits(PL) result;

for e = 0 to elements-1
    boolean element = ElemP[operand, e, esize] == '1';
    if ElemP[mask, e, esize] == '1' then
        break = break || element;
        ElemP[result, e, esize] = if !break then '1' else '0';
    elsif merging then
        ElemP[result, e, esize] = ElemP[operand2, e, esize];
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
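
The only difference from a break-after operation is the order of the two statements in the active branch: here the break state is updated before the destination element is written. A Python sketch under the same modelling assumptions (lists of booleans, hypothetical function name `brkb`, `pd_old=None` for zeroing):

```python
def brkb(pg, pn, pd_old=None):
    """Break before the first active and true element of pn."""
    result = []
    brk = False
    for e, (active, src) in enumerate(zip(pg, pn)):
        if active:
            brk = brk or src         # updated before the element is written
            result.append(not brk)
        elif pd_old is not None:
            result.append(pd_old[e])
        else:
            result.append(False)
    return result
```

If the very first active source element is true, every active destination element is false.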

BRKBS

Break before first true condition, setting the condition flags

Sets destination predicate elements up to but not including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 1 0 0 0 0 0 1 Pg 0 Pn 0 Pd
B S M

BRKBS <Pd>.B, <Pg>/Z, <Pn>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer d = UInt(Pd);
boolean merging = FALSE;
boolean setflags = TRUE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[n];
bits(PL) operand2 = P[d];
boolean break = FALSE;
bits(PL) result;

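for e = 0 to elements-1
    boolean element = ElemP[operand, e, esize] == '1';
    if ElemP[mask, e, esize] == '1' then
        break = break || element;
        ElemP[result, e, esize] = if !break then '1' else '0';
    elsif merging then
        ElemP[result, e, esize] = ElemP[operand2, e, esize];
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);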
P[d] = result;

BRKN

Propagate break to next partition

If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
leaves the destination and second source predicate unchanged. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 0 1 Pg 0 Pn 0 Pdm
S

BRKN <Pdm>.B, <Pg>/Z, <Pn>.B, <Pdm>.B

if !HaveSVE() then UNDEFINED;

integer g = UInt(Pg);
integer n = UInt(Pn);
integer dm = UInt(Pdm);
boolean setflags = FALSE;

Assembler Symbols

<Pdm> Is the name of the second source and destination scalable predicate register, encoded in the "Pdm"
field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.

Operation

CheckSVEEnabled();
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[dm];
bits(PL) result;

if LastActive(mask, operand1, 8) == '1' then
    result = operand2;
else
    result = Zeros();

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(Ones(PL), result, 8);
P[dm] = result;
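
A short Python sketch of this operation, with predicates as lists of booleans. The helper `last_active` is a stand-in for the LastActive pseudocode function, which is defined elsewhere; the sketch assumes it yields false when no element is active. Both function names are hypothetical.

```python
def last_active(pg, pn):
    """Value of pn at the last active element of pg (False if none is active)."""
    for active, value in zip(reversed(pg), reversed(pn)):
        if active:
            return value
    return False

def brkn(pg, pn, pdm):
    """Keep pdm if the last active element of pn is true, else clear it."""
    if last_active(pg, pn):
        return list(pdm)
    return [False] * len(pdm)
```

Note that the governing predicate only selects which element of the first source is inspected; the second source is passed through or cleared as a whole.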

BRKNS

Propagate break to next partition, setting the condition flags

If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
leaves the destination and second source predicate unchanged. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags
based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0 1 Pg 0 Pn 0 Pdm
S

BRKNS <Pdm>.B, <Pg>/Z, <Pn>.B, <Pdm>.B

if !HaveSVE() then UNDEFINED;

integer g = UInt(Pg);
integer n = UInt(Pn);
integer dm = UInt(Pdm);
boolean setflags = TRUE;

Assembler Symbols

<Pdm> Is the name of the second source and destination scalable predicate register, encoded in the "Pdm"
field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.

Operation

CheckSVEEnabled();
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[dm];
bits(PL) result;

if LastActive(mask, operand1, 8) == '1' then
    result = operand2;
else
    result = Zeros();

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(Ones(PL), result, 8);
P[dm] = result;

BRKPA

Break after first true condition, propagating from previous partition

If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
sets destination predicate elements up to and including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Does not set the
condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 1 1 Pg 0 Pn 0 Pd
S B

BRKPA <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = FALSE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
boolean last = (LastActive(mask, operand1, 8) == '1');

for e = 0 to elements-1
    if ElemP[mask, e, 8] == '1' then
        ElemP[result, e, 8] = if last then '1' else '0';
        last = last && (ElemP[operand2, e, 8] == '0');
    else
        ElemP[result, e, 8] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
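
The propagating form seeds the loop with the state of the previous partition. A Python sketch, again with predicates as lists of booleans; `brkpa` and `last_active_value` are hypothetical names, and the helper assumes that LastActive yields false when no element is active.

```python
def last_active_value(pg, pn):
    """Value of pn at the last active element of pg (False if none is active)."""
    for active, value in zip(reversed(pg), reversed(pn)):
        if active:
            return value
    return False

def brkpa(pg, pn, pm):
    """Break after the first active true element of pm, unless pn already broke."""
    last = last_active_value(pg, pn)
    result = []
    for active, src in zip(pg, pm):
        if active:
            result.append(last)
            last = last and not src
        else:
            result.append(False)
    return result
```

If the last active element of the first source is false, `last` starts out false and the whole destination is cleared.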

BRKPAS

Break after first true condition, propagating from previous partition and setting the condition flags

If the last active element of the first source predicate is false then set the destination predicate to all-false. Otherwise
sets destination predicate elements up to and including the first active and true source element to true, then sets
subsequent elements to false. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 1 1 Pg 0 Pn 0 Pd
S B

BRKPAS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = TRUE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
boolean last = (LastActive(mask, operand1, 8) == '1');

for e = 0 to elements-1
    if ElemP[mask, e, 8] == '1' then
        ElemP[result, e, 8] = if last then '1' else '0';
        last = last && (ElemP[operand2, e, 8] == '0');
    else
        ElemP[result, e, 8] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
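
The flag setting described for BRKPAS can be sketched in Python. This is a minimal model, not the architectural
PredTest pseudocode (which is not reproduced in this section): it assumes predicates are lists of booleans with one
entry per element, and the helper name pred_test is hypothetical. It follows the FIRST (N), NONE (Z), !LAST (C)
wording of the description above.

```python
def pred_test(mask, result):
    """Return the N, Z, C, V flags for a predicate result under a governing mask.

    Assumption: mask and result are equal-length lists of booleans,
    one entry per element (a simplification of the packed register).
    """
    # Keep only the result bits at active (mask == True) positions.
    active = [r for m, r in zip(mask, result) if m]
    first = bool(active) and active[0]    # FIRST: first active element is true
    none = not any(active)                # NONE: no active element is true
    last = bool(active) and active[-1]    # LAST: last active element is true
    return {"N": first, "Z": none, "C": not last, "V": False}
```

With no active elements the model reports N clear, Z set, and C set, because there is neither a first nor a last
active element that is true.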


BRKPB

Break before first true condition, propagating from previous partition

If the last active element of the first source predicate is false, sets the destination predicate to all-false. Otherwise,
sets destination predicate elements up to but not including the first active and true element of the second source
predicate to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are
set to zero. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 1 1 Pg 0 Pn 1 Pd
S B

BRKPB <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = FALSE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
boolean last = (LastActive(mask, operand1, 8) == '1');

for e = 0 to elements-1
    if ElemP[mask, e, 8] == '1' then
        last = last && (ElemP[operand2, e, 8] == '0');
        ElemP[result, e, 8] = if last then '1' else '0';
    else
        ElemP[result, e, 8] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
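
BRKPA and BRKPB differ only in whether the break condition is applied after or before the result element is written.
The following Python sketch models both under stated assumptions: predicates are lists of booleans with one entry per
byte element, and the names last_active and brkp are hypothetical helpers introduced for this illustration.

```python
def last_active(mask, pred):
    """Value of pred at the highest-numbered active element, or False if none is active."""
    for m, p in zip(reversed(mask), reversed(pred)):
        if m:
            return p
    return False


def brkp(mask, operand1, operand2, before):
    """Model BRKPA (before=False) or BRKPB (before=True) on lists of booleans."""
    # Propagate from the previous partition: start "live" only if the
    # last active element of the first source predicate is true.
    last = last_active(mask, operand1)
    result = []
    for m, b in zip(mask, operand2):
        if not m:
            result.append(False)       # inactive elements are set to zero
        elif before:
            last = last and not b      # BRKPB: break before the true element
            result.append(last)
        else:
            result.append(last)        # BRKPA: include the true element
            last = last and not b
    return result
```

For an all-true mask, an all-true first source, and a second source of [False, True, False, True], the model gives
[True, True, False, False] for BRKPA and [True, False, False, False] for BRKPB.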


BRKPBS

Break before first true condition, propagating from previous partition and setting the condition flags

If the last active element of the first source predicate is false, sets the destination predicate to all-false. Otherwise,
sets destination predicate elements up to but not including the first active and true element of the second source
predicate to true, then sets subsequent elements to false. Inactive elements in the destination predicate register are
set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 1 1 Pg 0 Pn 1 Pd
S B

BRKPBS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = TRUE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;
boolean last = (LastActive(mask, operand1, 8) == '1');

for e = 0 to elements-1
    if ElemP[mask, e, 8] == '1' then
        last = last && (ElemP[operand2, e, 8] == '0');
        ElemP[result, e, 8] = if last then '1' else '0';
    else
        ElemP[result, e, 8] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;


CLASTA (scalar)

Conditionally extract element after last to general-purpose register

From the source vector register, extract the element after the last active element, or element zero if the last active
element is the final element, then zero-extend that element and destructively place it in the destination and first
source general-purpose register. If there are no active elements, destructively zero-extend the least significant
element-size bits of the destination and first source general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 0 1 0 1 Pg Zm Rdn
B

CLASTA <R><dn>, <Pg>, <R><dn>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Rdn);
integer m = UInt(Zm);
integer csize = if esize < 64 then 32 else 64;
boolean isBefore = FALSE;

Assembler Symbols

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<dn> Is the number [0-30] of the source and destination general-purpose register or the name ZR (31),
encoded in the "Rdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the source scalable vector register, encoded in the "Zm" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D
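
As an illustration of how the two tables above relate to the decode pseudocode, the following Python sketch derives
the element size, the result size, and the <R> and <T> specifiers from the 2-bit size field. The function name
decode_clasta_scalar and the returned tuple are hypothetical; the sketch also assumes that <R> follows csize, which
matches the tables (only size 11 selects X).

```python
def decode_clasta_scalar(size):
    """Return (esize, csize, R, T) for a 2-bit size field value (0..3)."""
    if not 0 <= size <= 3:
        raise ValueError("size is a 2-bit field")
    esize = 8 << size                       # integer esize = 8 << UInt(size)
    csize = 32 if esize < 64 else 64        # integer csize = if esize < 64 then 32 else 64
    width = "W" if csize == 32 else "X"     # <R> table
    element = "BHSD"[size]                  # <T> table
    return esize, csize, width, element
```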


Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = X[dn];
bits(VL) operand2 = Z[m];
bits(csize) result;
integer last = LastActiveElement(mask, esize);

if last < 0 then
    result = ZeroExtend(operand1);
else
    if !isBefore then
        last = last + 1;
        if last >= elements then last = 0;
    result = ZeroExtend(Elem[operand2, last, esize]);

X[dn] = result;
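
The selection logic above, including the wrap to element zero, can be sketched in Python. This is a simplified model
under stated assumptions: the governing predicate is a list of booleans, the source vector is a list of unsigned
integers, and the names last_active_element and clast_scalar are hypothetical. Setting is_before=True gives the
CLASTB behavior, in which the last active element itself is extracted.

```python
def last_active_element(mask):
    """Index of the highest-numbered active element, or -1 if none is active."""
    for e in range(len(mask) - 1, -1, -1):
        if mask[e]:
            return e
    return -1


def clast_scalar(mask, rdn, zm, esize, is_before):
    """Model CLASTA (is_before=False) or CLASTB (is_before=True), scalar form."""
    element_mask = (1 << esize) - 1
    last = last_active_element(mask)
    if last < 0:
        # No active elements: keep only the low esize bits of the register.
        return rdn & element_mask
    if not is_before:
        last += 1
        if last >= len(zm):
            last = 0                   # wrap to element zero
    # Python integers are unbounded, so masking models the zero-extension.
    return zm[last] & element_mask
```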


CLASTA (SIMD&FP scalar)

Conditionally extract element after last to SIMD&FP scalar register

From the source vector register, extract the element after the last active element, or element zero if the last active
element is the final element, then zero-extend that element and destructively place it in the destination and first
source SIMD & floating-point scalar register. If there are no active elements, destructively zero-extend the least
significant element-size bits of the destination and first source SIMD & floating-point scalar register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 1 0 1 0 0 Pg Zm Vdn
B

CLASTA <V><dn>, <Pg>, <V><dn>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Vdn);
integer m = UInt(Zm);
boolean isBefore = FALSE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<dn> Is the number [0-31] of the source and destination SIMD&FP register, encoded in the "Vdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the source scalable vector register, encoded in the "Zm" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D


Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = V[dn];
bits(VL) operand2 = Z[m];
bits(esize) result;
integer last = LastActiveElement(mask, esize);

if last < 0 then
    result = ZeroExtend(operand1);
else
    if !isBefore then
        last = last + 1;
        if last >= elements then last = 0;
    result = Elem[operand2, last, esize];

V[dn] = result;


CLASTA (vectors)

Conditionally extract element after last to vector register

From the second source vector register extract the element after the last active element, or if the last active element
is the final element extract element zero, and then replicate that element to destructively fill the destination and first
source vector.
If there are no active elements then leave the destination and source vector unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 Pg Zm Zdn
B

CLASTA <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean isBefore = FALSE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;
integer last = LastActiveElement(mask, esize);

if last < 0 then
    result = operand1;
else
    if !isBefore then
        last = last + 1;
        if last >= elements then last = 0;
    for e = 0 to elements-1
        Elem[result, e, esize] = Elem[operand2, last, esize];

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:


• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


CLASTB (scalar)

Conditionally extract last element to general-purpose register

From the source vector register, extract the last active element, then zero-extend that element and destructively
place it in the destination and first source general-purpose register. If there are no active elements, destructively
zero-extend the least significant element-size bits of the destination and first source general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 1 1 0 1 Pg Zm Rdn
B

CLASTB <R><dn>, <Pg>, <R><dn>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Rdn);
integer m = UInt(Zm);
integer csize = if esize < 64 then 32 else 64;
boolean isBefore = TRUE;

Assembler Symbols

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<dn> Is the number [0-30] of the source and destination general-purpose register or the name ZR (31),
encoded in the "Rdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the source scalable vector register, encoded in the "Zm" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D


Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = X[dn];
bits(VL) operand2 = Z[m];
bits(csize) result;
integer last = LastActiveElement(mask, esize);

if last < 0 then
    result = ZeroExtend(operand1);
else
    if !isBefore then
        last = last + 1;
        if last >= elements then last = 0;
    result = ZeroExtend(Elem[operand2, last, esize]);

X[dn] = result;


CLASTB (SIMD&FP scalar)

Conditionally extract last element to SIMD&FP scalar register

From the source vector register, extract the last active element, then zero-extend that element and destructively
place it in the destination and first source SIMD & floating-point scalar register. If there are no active elements,
destructively zero-extend the least significant element-size bits of the destination and first source SIMD &
floating-point scalar register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 Pg Zm Vdn
B

CLASTB <V><dn>, <Pg>, <V><dn>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Vdn);
integer m = UInt(Zm);
boolean isBefore = TRUE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<dn> Is the number [0-31] of the source and destination SIMD&FP register, encoded in the "Vdn" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the source scalable vector register, encoded in the "Zm" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D


Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = V[dn];
bits(VL) operand2 = Z[m];
bits(esize) result;
integer last = LastActiveElement(mask, esize);

if last < 0 then
    result = ZeroExtend(operand1);
else
    if !isBefore then
        last = last + 1;
        if last >= elements then last = 0;
    result = Elem[operand2, last, esize];

V[dn] = result;


CLASTB (vectors)

Conditionally extract last element to vector register

From the second source vector register extract the last active element, and then replicate that element to
destructively fill the destination and first source vector.
If there are no active elements then leave the destination and source vector unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 Pg Zm Zdn
B

CLASTB <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean isBefore = TRUE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;
integer last = LastActiveElement(mask, esize);

if last < 0 then
    result = operand1;
else
    if !isBefore then
        last = last + 1;
        if last >= elements then last = 0;
    for e = 0 to elements-1
        Elem[result, e, esize] = Elem[operand2, last, esize];

Z[dn] = result;
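
The vector form replicates the selected element across the whole destination instead of returning a scalar. A minimal
Python sketch follows, assuming the predicate is a list of booleans and the vectors are lists of integers; the name
clast_vector is hypothetical. Passing is_before=False models CLASTA (vectors), where the element after the last
active element is used.

```python
def clast_vector(mask, zdn, zm, is_before):
    """Model CLASTB (is_before=True) or CLASTA (is_before=False), vector form."""
    active = [e for e, m in enumerate(mask) if m]
    if not active:
        return list(zdn)                  # no active elements: leave Zdn unmodified
    last = active[-1]
    if not is_before:
        last = (last + 1) % len(zm)       # element after last, wrapping to zero
    return [zm[last]] * len(zdn)          # replicate into every element
```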

Operational information


CLS

Count leading sign bits (predicated)

Count leading sign bits in each active element of the source vector, and place the results in the corresponding
elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 1 0 1 Pg Zn Zd

CLS <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = CountLeadingSignBits(element)<esize-1:0>;

Z[d] = result;
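
A Python sketch of the merging behavior above is given below. The CountLeadingSignBits function is not defined in
this section, so the sketch assumes it counts the bits immediately below the sign bit that are equal to the sign bit,
giving a result between 0 and esize-1. The names count_leading_sign_bits and cls are hypothetical, and vectors are
modeled as lists of unsigned integers.

```python
def count_leading_sign_bits(x, esize):
    """Count the bits below the sign bit that match the sign bit (assumed definition)."""
    x &= (1 << esize) - 1
    sign = x >> (esize - 1)
    count = 0
    for bit in range(esize - 2, -1, -1):
        if (x >> bit) & 1 != sign:
            break
        count += 1
    return count


def cls(mask, zn, zd, esize):
    """Predicated CLS with merging: inactive elements keep their Zd value."""
    return [count_leading_sign_bits(n, esize) if m else d
            for m, n, d in zip(mask, zn, zd)]
```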

Operational information


CLZ

Count leading zero bits (predicated)

Count leading zero bits in each active element of the source vector, and place the results in the corresponding
elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 1 0 1 Pg Zn Zd

CLZ <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = CountLeadingZeroBits(element)<esize-1:0>;

Z[d] = result;
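
The same merging pattern applies to CLZ. The following Python sketch assumes vectors are lists of unsigned integers
and the predicate is a list of booleans; count_leading_zero_bits and clz are hypothetical names, and the count is
derived from Python's int.bit_length rather than from the architectural CountLeadingZeroBits function.

```python
def count_leading_zero_bits(x, esize):
    """Number of zero bits above the highest set bit within an esize-bit element."""
    x &= (1 << esize) - 1
    return esize - x.bit_length()


def clz(mask, zn, zd, esize):
    """Predicated CLZ with merging: inactive elements keep their Zd value."""
    return [count_leading_zero_bits(n, esize) if m else d
            for m, n, d in zip(mask, zn, zd)]
```

An all-zero element yields esize, so the result can need one more value than an esize-1 maximum would suggest; it
still fits in the element because esize itself is at most 64.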

Operational information


CMP<cc> (immediate)

Compare vector to immediate

Compare active integer elements in the source vector with an immediate, and place the boolean results of the
specified comparison in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result,
and the V flag to zero.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, HI, HS, LE, LO, LS, LT or NE.
It has encodings from 10 classes: Equal, Greater than, Greater than or equal, Higher, Higher or same, Less than,
Less than or equal, Lower, Lower or same and Not equal.

Equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 1 0 0 Pg Zn 0 Pd
ne

CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_EQ;
integer imm = SInt(imm5);
boolean unsigned = FALSE;

Greater than

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 0 Pg Zn 1 Pd
lt ne

CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_GT;
integer imm = SInt(imm5);
boolean unsigned = FALSE;

Greater than or equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 0 Pg Zn 0 Pd
lt ne


CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_GE;
integer imm = SInt(imm5);
boolean unsigned = FALSE;

Higher

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 0 Pg Zn 1 Pd
lt ne

CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_GT;
integer imm = UInt(imm7);
boolean unsigned = TRUE;

Higher or same

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 0 Pg Zn 0 Pd
lt ne

CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_GE;
integer imm = UInt(imm7);
boolean unsigned = TRUE;

Less than

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 1 Pg Zn 0 Pd
lt ne

CMPLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_LT;
integer imm = SInt(imm5);
boolean unsigned = FALSE;


Less than or equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 0 0 1 Pg Zn 1 Pd
lt ne

CMPLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_LE;
integer imm = SInt(imm5);
boolean unsigned = FALSE;

Lower

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 1 Pg Zn 0 Pd
lt ne

CMPLO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_LT;
integer imm = UInt(imm7);
boolean unsigned = TRUE;

Lower or same

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 1 Pg Zn 1 Pd
lt ne

CMPLS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_LE;
integer imm = UInt(imm7);
boolean unsigned = TRUE;

Not equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 1 0 0 Pg Zn 1 Pd
ne


CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_NE;
integer imm = SInt(imm5);
boolean unsigned = FALSE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<imm> For the equal, greater than, greater than or equal, less than, less than or equal and not equal variant: is
the signed immediate operand, in the range -16 to 15, encoded in the "imm5" field.
For the higher, higher or same, lower and lower or same variant: is the unsigned immediate operand, in
the range 0 to 127, encoded in the "imm7" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(PL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        boolean cond;
        case op of
            when Cmp_EQ cond = element1 == imm;
            when Cmp_NE cond = element1 != imm;
            when Cmp_GE cond = element1 >= imm;
            when Cmp_LT cond = element1 < imm;
            when Cmp_GT cond = element1 > imm;
            when Cmp_LE cond = element1 <= imm;
        ElemP[result, e, esize] = if cond then '1' else '0';
    else
        ElemP[result, e, esize] = '0';

PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);

P[d] = result;
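
The interaction between the condition name, the signed or unsigned interpretation, and the zeroing predicate can be
sketched in Python. This is an illustrative model only: the CONDITIONS table, to_int, and cmp_immediate are
hypothetical names, vectors are lists of raw element bit patterns, and the condition flags are not modeled.

```python
import operator

# Condition name -> (comparison, unsigned?), following the decode pseudocode above.
CONDITIONS = {
    "EQ": (operator.eq, False), "NE": (operator.ne, False),
    "GE": (operator.ge, False), "GT": (operator.gt, False),
    "LT": (operator.lt, False), "LE": (operator.le, False),
    "HS": (operator.ge, True), "HI": (operator.gt, True),
    "LO": (operator.lt, True), "LS": (operator.le, True),
}


def to_int(bits, esize, unsigned):
    """Interpret an esize-bit pattern as an unsigned or two's-complement integer."""
    bits &= (1 << esize) - 1
    if not unsigned and bits >> (esize - 1):
        bits -= 1 << esize
    return bits


def cmp_immediate(cc, mask, zn, imm, esize):
    """Return the destination predicate as a list of booleans."""
    compare, unsigned = CONDITIONS[cc]
    low, high = (0, 127) if unsigned else (-16, 15)   # imm7 and imm5 ranges
    if not low <= imm <= high:
        raise ValueError("immediate out of range for this condition")
    # Inactive elements are set to zero (False).
    return [m and compare(to_int(x, esize, unsigned), imm)
            for m, x in zip(mask, zn)]
```

For byte elements 0xFF, 0x01, 0x80, 0x7F compared against zero, GT treats 0xFF and 0x80 as negative and reports
false for them, whereas HI treats every nonzero pattern as greater than zero.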


CMP<cc> (vectors)

Compare vectors

Compare active integer elements in the first source vector with corresponding elements in the second source vector,
and place the boolean results of the specified comparison in the corresponding elements of the destination predicate.
Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, HI, HS or NE.
This instruction is used by the pseudo-instructions CMPLE (vectors), CMPLO (vectors), CMPLS (vectors), and CMPLT
(vectors).
It has encodings from 6 classes: Equal, Greater than, Greater than or equal, Higher, Higher or same and Not equal.

Equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 1 Pg Zn 0 Pd
ne

CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_EQ;
boolean unsigned = FALSE;

Greater than

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 1 Pd
ne

CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GT;
boolean unsigned = FALSE;

Greater than or equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 0 Pd
ne


CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GE;
boolean unsigned = FALSE;

Higher

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 0 Pg Zn 1 Pd
ne

CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GT;
boolean unsigned = TRUE;

Higher or same

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 0 Pg Zn 0 Pd
ne

CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GE;
boolean unsigned = TRUE;

Not equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 1 Pg Zn 1 Pd
ne

CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_NE;
boolean unsigned = FALSE;


Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(PL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        boolean cond;
        integer element2 = Int(Elem[operand2, e, esize], unsigned);
        case op of
            when Cmp_EQ cond = element1 == element2;
            when Cmp_NE cond = element1 != element2;
            when Cmp_GE cond = element1 >= element2;
            when Cmp_LT cond = element1 < element2;
            when Cmp_GT cond = element1 > element2;
            when Cmp_LE cond = element1 <= element2;
        ElemP[result, e, esize] = if cond then '1' else '0';
    else
        ElemP[result, e, esize] = '0';

PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);

P[d] = result;
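
Only six conditions are encoded for the vector form; the description above notes that CMPLE, CMPLO, CMPLS, and CMPLT
are pseudo-instructions built on it. The Python sketch below shows one way such aliases can be expressed, assuming
each is obtained by swapping the two source operands of the complementary encoded comparison. That mapping is stated
here as an assumption, since the alias pages are not part of this section, and ENCODED, ALIASES, to_int, and
cmp_vectors are hypothetical names.

```python
import operator

# Encoded condition -> (comparison, unsigned?), following the decode pseudocode above.
ENCODED = {
    "EQ": (operator.eq, False), "NE": (operator.ne, False),
    "GE": (operator.ge, False), "GT": (operator.gt, False),
    "HS": (operator.ge, True), "HI": (operator.gt, True),
}

# Assumed alias mapping: evaluate the encoded condition with operands swapped.
ALIASES = {"LE": "GE", "LT": "GT", "LS": "HS", "LO": "HI"}


def to_int(bits, esize, unsigned):
    """Interpret an esize-bit pattern as an unsigned or two's-complement integer."""
    bits &= (1 << esize) - 1
    if not unsigned and bits >> (esize - 1):
        bits -= 1 << esize
    return bits


def cmp_vectors(cc, mask, zn, zm, esize):
    """Return the destination predicate as a list of booleans."""
    if cc in ALIASES:
        cc, zn, zm = ALIASES[cc], zm, zn      # a < b is evaluated as b > a
    compare, unsigned = ENCODED[cc]
    return [m and compare(to_int(a, esize, unsigned), to_int(b, esize, unsigned))
            for m, a, b in zip(mask, zn, zm)]
```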


CMP<cc> (wide elements)

Compare vector to 64-bit wide elements

Compare active integer elements in the first source vector with overlapping 64-bit doubleword elements in the second
source vector, and place the boolean results of the specified comparison in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE
(Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, HI, HS, LE, LO, LS, LT or NE.
It has encodings from 10 classes: Equal, Greater than, Greater than or equal, Higher, Higher or same, Less than,
Less than or equal, Lower, Lower or same and Not equal.
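
The pairing of narrow elements with overlapping doublewords can be illustrated in Python. The Operation pseudocode
for this instruction selects the wide element with the index (e * esize) DIV 64; the sketch below applies that index
to a greater-than comparison. It assumes the first source is a list of esize-bit patterns and the second source is a
list of 64-bit patterns, and the names wide_index, to_int, and cmpgt_wide are hypothetical.

```python
def wide_index(e, esize):
    """Index of the 64-bit element that overlaps narrow element e."""
    return (e * esize) // 64


def to_int(bits, width, unsigned):
    """Interpret a width-bit pattern as an unsigned or two's-complement integer."""
    bits &= (1 << width) - 1
    if not unsigned and bits >> (width - 1):
        bits -= 1 << width
    return bits


def cmpgt_wide(mask, zn, zm_d, esize, unsigned=False):
    """Greater-than against wide elements; unsigned=True models the Higher class."""
    result = []
    for e, (m, x) in enumerate(zip(mask, zn)):
        wide = zm_d[wide_index(e, esize)]
        result.append(m and to_int(x, esize, unsigned) > to_int(wide, 64, unsigned))
    return result
```

With 16-bit elements, narrow elements 0 to 3 are compared against doubleword 0 and elements 4 to 7 against
doubleword 1.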

Equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 1 Pg Zn 0 Pd
ne

CMPEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_EQ;
boolean unsigned = FALSE;

Greater than

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 0 Pg Zn 1 Pd
U lt ne

CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GT;
boolean unsigned = FALSE;

Greater than or equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 0 Pg Zn 0 Pd
U lt ne


CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GE;
boolean unsigned = FALSE;

Higher

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 0 Pg Zn 1 Pd
U lt ne

CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GT;
boolean unsigned = TRUE;

Higher or same

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 0 Pg Zn 0 Pd
U lt ne

CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GE;
boolean unsigned = TRUE;

Less than

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 1 Pg Zn 0 Pd
U lt ne


CMPLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_LT;
boolean unsigned = FALSE;

Less than or equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 1 1 Pg Zn 1 Pd
U lt ne

CMPLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_LE;
boolean unsigned = FALSE;

Lower

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 1 Pg Zn 0 Pd
U lt ne

CMPLO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_LT;
boolean unsigned = TRUE;

Lower or same

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 1 1 Pg Zn 1 Pd
U lt ne


CMPLS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_LE;
boolean unsigned = TRUE;

Not equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 0 0 1 Pg Zn 1 Pd
ne

CMPNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_NE;
boolean unsigned = FALSE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 RESERVED


Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(PL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        boolean cond;
        integer element2 = Int(Elem[operand2, (e * esize) DIV 64, 64], unsigned);
        case op of
            when Cmp_EQ cond = element1 == element2;
            when Cmp_NE cond = element1 != element2;
            when Cmp_GE cond = element1 >= element2;
            when Cmp_LT cond = element1 < element2;
            when Cmp_GT cond = element1 > element2;
            when Cmp_LE cond = element1 <= element2;
        ElemP[result, e, esize] = if cond then '1' else '0';
    else
        ElemP[result, e, esize] = '0';

PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);

P[d] = result;
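
As an illustration of the wide-element indexing in the pseudocode above, the following Python sketch models the per-element comparison on plain lists. It is not Arm's reference model: the names `cmp_wide`, `to_int`, and `COMPARE` are hypothetical, vectors are represented as lists of raw element values, the governing predicate as a list of booleans, and the PredTest flag update is omitted.

```python
import operator

# Comparison selected by the decoded `op` value (hypothetical string keys).
COMPARE = {
    "EQ": operator.eq, "NE": operator.ne, "GE": operator.ge,
    "LT": operator.lt, "GT": operator.gt, "LE": operator.le,
}

def to_int(bits, width, unsigned):
    """Interpret the low `width` bits as an unsigned or two's-complement integer."""
    bits &= (1 << width) - 1
    if not unsigned and bits >> (width - 1):
        bits -= 1 << width
    return bits

def cmp_wide(op, unsigned, esize, mask, narrow, wide):
    """Compare each active narrow element with the 64-bit element that overlaps it."""
    result = []
    for e, active in enumerate(mask):
        if not active:
            result.append(False)  # inactive elements are set to zero
            continue
        element1 = to_int(narrow[e], esize, unsigned)
        element2 = to_int(wide[(e * esize) // 64], 64, unsigned)
        result.append(COMPARE[op](element1, element2))
    return result

narrow = [1, 0xFFFFFFFF, 5, 7]  # four 32-bit elements (a 128-bit vector)
wide = [0, 6]                   # the same vector length viewed as two 64-bit elements
print(cmp_wide("GT", False, 32, [True] * 4, narrow, wide))  # CMPGT: [True, False, False, True]
print(cmp_wide("GT", True, 32, [True] * 4, narrow, wide))   # CMPHI: [True, True, False, True]
```

With 32-bit elements, elements 0 and 1 are compared against 64-bit element 0 and elements 2 and 3 against 64-bit element 1, which is what `(e * esize) DIV 64` selects.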

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33


CMPLE (vectors)

Compare signed less than or equal to vector, setting the condition flags

Compare active signed integer elements in the first source vector being less than or equal to corresponding signed
elements in the second source vector, and place the boolean results of the comparison in the corresponding elements
of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.

This is a pseudo-instruction of CMP<cc> (vectors). This means:

• The encodings in this description are named to match the encodings of CMP<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm 1 0 0 Pg Zn 0 Pd
ne

CMPLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>

is equivalent to

CMPGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
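
The operand swap can be sketched as a small assembler-side rewrite. This is a hypothetical helper, not part of any real assembler: the function name `canonicalize` and the tuple layout are assumptions made for illustration. The alias table covers this pseudo-instruction and the CMPLO, CMPLS, and CMPLT pseudo-instructions, which follow the same pattern.

```python
# Pseudo-instruction mnemonic -> the CMP<cc> (vectors) mnemonic it assembles to.
ALIASES = {"CMPLE": "CMPGE", "CMPLO": "CMPHI", "CMPLS": "CMPHS", "CMPLT": "CMPGT"}

def canonicalize(mnemonic, pd, pg, first, second):
    """Rewrite a pseudo-instruction by swapping its two source vectors."""
    if mnemonic in ALIASES:
        # "first <= second" is the same test as "second >= first".
        return (ALIASES[mnemonic], pd, pg, second, first)
    return (mnemonic, pd, pg, first, second)

print(canonicalize("CMPLE", "p0.s", "p1/z", "z2.s", "z3.s"))
# ('CMPGE', 'p0.s', 'p1/z', 'z3.s', 'z2.s')
```

Because only the swapped form has an encoding, a disassembler would show the CMPGE spelling.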

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Zm> Is the name of the first source scalable vector register, encoded in the "Zm" field.
<Zn> Is the name of the second source scalable vector register, encoded in the "Zn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

Operation

The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.


CMPLO (vectors)

Compare unsigned lower than vector, setting the condition flags

Compare active unsigned integer elements in the first source vector being lower than corresponding unsigned
elements in the second source vector, and place the boolean results of the comparison in the corresponding elements
of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N),
NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.

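This is a pseudo-instruction of CMP<cc> (vectors). This means:

• The encodings in this description are named to match the encodings of CMP<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.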

CMPLO <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>

is equivalent to

CMPHI <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

Assembler Symbols

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

Operation

The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.


CMPLS (vectors)

Compare unsigned lower or same as vector, setting the condition flags

Compare active unsigned integer elements in the first source vector being lower than or same as corresponding
unsigned elements in the second source vector, and place the boolean results of the comparison in the corresponding
elements of the destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the
FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.

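This is a pseudo-instruction of CMP<cc> (vectors). This means:

• The encodings in this description are named to match the encodings of CMP<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.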

CMPLS <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>

is equivalent to

CMPHS <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

Assembler Symbols

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

Operation

The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.


CMPLT (vectors)

Compare signed less than vector, setting the condition flags

Compare active signed integer elements in the first source vector being less than corresponding signed elements in
the second source vector, and place the boolean results of the comparison in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE
(Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.

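This is a pseudo-instruction of CMP<cc> (vectors). This means:

• The encodings in this description are named to match the encodings of CMP<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.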

CMPLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>

is equivalent to

CMPGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

Assembler Symbols

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

Operation

The description of CMP<cc> (vectors) gives the operational pseudocode for this instruction.


CNOT

Logically invert boolean condition in vector (predicated)

Logically invert the boolean value in each active element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
Boolean TRUE is any non-zero value in a source, and one in a result element. Boolean FALSE is always zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 1 1 0 1 Pg Zn Zd

CNOT <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = ZeroExtend(IsZeroBit(element), esize);

Z[d] = result;
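
The merging behavior can be sketched in Python as follows. This is an illustrative model only: the function name `cnot` is hypothetical, vectors are lists of raw element values, and the predicate is a list of booleans.

```python
def cnot(esize, mask, zn, zd):
    """Active elements become 1 if the source element is zero, else 0; inactive ones keep zd."""
    emask = (1 << esize) - 1
    return [
        (1 if (src & emask) == 0 else 0) if active else old
        for active, src, old in zip(mask, zn, zd)
    ]

print(cnot(8, [True, True, False], [0, 0x80, 0], [9, 9, 9]))  # [1, 0, 9]
```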

Operational information


CNT

Count non-zero bits (predicated)

Count non-zero bits in each active element of the source vector, and place the results in the corresponding elements of
the destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 0 1 0 1 Pg Zn Zd

CNT <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = BitCount(element)<esize-1:0>;

Z[d] = result;
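
A minimal Python sketch of the same per-element population count, under the assumption that vectors are modelled as lists of raw element values and the predicate as a list of booleans. The function name `cnt_elements` is hypothetical.

```python
def cnt_elements(esize, mask, zn, zd):
    """Count the set bits of each active source element; inactive elements keep zd."""
    emask = (1 << esize) - 1
    return [
        bin(src & emask).count("1") if active else old
        for active, src, old in zip(mask, zn, zd)
    ]

print(cnt_elements(8, [True, True, False], [0xFF, 0x11, 0xFF], [7, 7, 7]))  # [8, 2, 7]
```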

Operational information


CNTB, CNTD, CNTH, CNTW

Set scalar to multiple of predicate constraint element count

Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then places the result in the scalar destination.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 4 classes: Byte, Doubleword, Halfword and Word

Byte

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>

CNTB <Xd>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer d = UInt(Rd);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Doubleword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>

CNTD <Xd>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer d = UInt(Rd);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Halfword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>

CNTH <Xd>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer d = UInt(Rd);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;


Word

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 0 0 0 pattern Rd
size<1>size<0>

CNTW <Xd>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer d = UInt(Rd);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Assembler Symbols

<Xd> Is the 64-bit name of the destination general-purpose register, encoded in the "Rd" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);

X[d] = (count * imm)<63:0>;
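
The count computation can be sketched in Python as below. This is a model under stated assumptions rather than the architecture's DecodePredCount function, which is not reproduced in this section: the names `pred_count` and `cnt_scalar` are hypothetical, the vector length `vl` is passed in bits, and a fixed count (VL1 to VL256) that exceeds the available elements is assumed to yield zero.

```python
# Fixed-count patterns VL1..VL256, keyed by the 5-bit pattern encoding.
FIXED = {
    0b00001: 1, 0b00010: 2, 0b00011: 3, 0b00100: 4, 0b00101: 5,
    0b00110: 6, 0b00111: 7, 0b01000: 8, 0b01001: 16, 0b01010: 32,
    0b01011: 64, 0b01100: 128, 0b01101: 256,
}

def pred_count(pattern, esize, vl):
    """Number of active elements implied by a predicate constraint."""
    elements = vl // esize
    if pattern == 0b00000:                       # POW2: largest power of two
        return 1 << (elements.bit_length() - 1) if elements else 0
    if pattern in FIXED:                         # VL1..VL256 (assumed zero if unavailable)
        return FIXED[pattern] if elements >= FIXED[pattern] else 0
    if pattern == 0b11101:                       # MUL4
        return elements - elements % 4
    if pattern == 0b11110:                       # MUL3
        return elements - elements % 3
    if pattern == 0b11111:                       # ALL
        return elements
    return 0                                     # unspecified encodings

def cnt_scalar(pattern, esize, vl, imm=1):
    """CNTB/CNTH/CNTW/CNTD result for esize 8/16/32/64 and multiplier imm (1..16)."""
    return (pred_count(pattern, esize, vl) * imm) & (2**64 - 1)

print(cnt_scalar(0b11111, 8, 256))          # CNTB with a 256-bit vector: 32
print(cnt_scalar(0b11111, 64, 256, imm=3))  # CNTD ALL, MUL #3: 12
print(cnt_scalar(0b01000, 64, 256))         # CNTD VL8: only 4 elements, so 0
```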


CNTP

Set scalar to count of true predicate elements

Counts the number of active and true elements in the source predicate and places the scalar result in the destination
general-purpose register. Inactive predicate elements are not counted.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 0 0 1 0 Pg 0 Pn Rd

CNTP <Xd>, <Pg>, <Pn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Pn);
integer d = UInt(Rd);

Assembler Symbols

<Xd> Is the 64-bit name of the destination general-purpose register, encoded in the "Rd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[n];
bits(64) sum = Zeros();

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' && ElemP[operand, e, esize] == '1' then
        sum = sum + 1;
X[d] = sum;
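
A short Python sketch of the same count, assuming both predicates are modelled as lists of booleans with one entry per element. The function name `cntp` is hypothetical.

```python
def cntp(mask, operand):
    """Count elements that are both active in the governing predicate and true in the source."""
    return sum(1 for active, value in zip(mask, operand) if active and value)

print(cntp([True, True, False, True], [True, False, True, True]))  # 2
```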


COMPACT

Shuffle active elements of vector to the right and fill with zero

Read the active elements from the source vector and pack them into the lowest-numbered elements of the destination
vector. Then set any remaining elements of the destination vector to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 1 1 0 0 Pg Zn Zd

COMPACT <Zd>.<T>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '0x' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Zeros();
integer x = 0;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand1, e, esize];
        Elem[result, x, esize] = element;
        x = x + 1;

Z[d] = result;
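
The packing step can be sketched in Python as follows, assuming the source vector is a list of element values and the predicate a list of booleans. The function name `compact` is hypothetical.

```python
def compact(mask, zn):
    """Pack active elements into the lowest-numbered positions and zero-fill the rest."""
    packed = [value for active, value in zip(mask, zn) if active]
    return packed + [0] * (len(zn) - len(packed))

print(compact([False, True, False, True], [10, 11, 12, 13]))  # [11, 13, 0, 0]
```

The relative order of the active elements is preserved, which makes the operation useful for filtering a vector by a predicate.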


CPY (immediate, merging)

Copy signed integer immediate to vector elements (merging)

Copy a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register remain unmodified.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
This instruction is used by the alias MOV (immediate, predicated, merging).
This instruction is used by the pseudo-instruction FMOV (zero, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 1 sh imm8 Zd
M

CPY <Zd>.<T>, <Pg>/M, #<imm>{, <shift>}

if !HaveSVE() then UNDEFINED;

if size:sh == '001' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer d = UInt(Zd);
boolean merging = TRUE;
integer imm = SInt(imm8);
if sh == '1' then imm = imm << 8;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8


Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) dest = Z[d];
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = imm<esize-1:0>;
    elsif merging then
        Elem[result, e, esize] = Elem[dest, e, esize];
    else
        Elem[result, e, esize] = Zeros();

Z[d] = result;
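
The immediate decode and the merging and zeroing behaviors can be sketched together in Python. This is an illustrative model, not the reference implementation: the names `decode_imm` and `cpy_imm` are hypothetical, `size`, `sh`, and `imm8` are passed as plain integers, and vectors are lists of raw element values.

```python
def decode_imm(size, sh, imm8):
    """Sign-extend imm8 and apply the optional LSL #8; byte elements cannot use the shift."""
    if size == 0b00 and sh == 1:
        raise ValueError("UNDEFINED: size:sh == '001'")
    imm = imm8 - 256 if imm8 & 0x80 else imm8
    return imm << 8 if sh else imm

def cpy_imm(esize, mask, imm, zd, merging):
    """Write imm to active elements; inactive ones keep zd (merging) or become zero."""
    emask = (1 << esize) - 1
    return [
        (imm & emask) if active else (old if merging else 0)
        for active, old in zip(mask, zd)
    ]

imm = decode_imm(0b01, 0, 0xFF)                       # -1 for halfword elements
print(cpy_imm(16, [True, False], imm, [5, 6], True))   # merging: [65535, 6]
print(cpy_imm(16, [True, False], imm, [5, 6], False))  # zeroing: [65535, 0]
```

The extreme shifted values, `decode_imm(0b01, 1, 0x80)` and `decode_imm(0b01, 1, 0x7F)`, give -32768 and 32512, matching the range stated in the description.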

Operational information


CPY (immediate, zeroing)

Copy signed integer immediate to vector elements (zeroing)

Copy a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register are set to zero.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
This instruction is used by the alias MOV (immediate, predicated, zeroing).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 0 sh imm8 Zd
M

CPY <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>}

if !HaveSVE() then UNDEFINED;

if size:sh == '001' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer d = UInt(Zd);
boolean merging = FALSE;
integer imm = SInt(imm8);
if sh == '1' then imm = imm << 8;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8


Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) dest = Z[d];
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = imm<esize-1:0>;
    elsif merging then
        Elem[result, e, esize] = Elem[dest, e, esize];
    else
        Elem[result, e, esize] = Zeros();

Z[d] = result;


CPY (scalar)

Copy general-purpose register to vector elements (predicated)

Copy the general-purpose scalar source register to each active element in the destination vector. Inactive elements in
the destination vector register remain unmodified.
This instruction is used by the alias MOV (scalar, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 0 1 0 1 Pg Rn Zd

CPY <Zd>.<T>, <Pg>/M, <R><n|SP>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Rn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<n|SP> Is the number [0-30] of the general-purpose source register or the name SP (31), encoded in the "Rn"
field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) operand1;
if AnyActiveElement(mask, esize) then
    if n == 31 then
        operand1 = SP[];
    else
        operand1 = X[n];
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = operand1<esize-1:0>;

Z[d] = result;


Operational information


CPY (SIMD&FP scalar)

Copy SIMD&FP scalar register to vector elements (predicated)

Copy the SIMD & floating-point scalar source register to each active element in the destination vector. Inactive
elements in the destination vector register remain unmodified.
This instruction is used by the alias MOV (SIMD&FP scalar, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 1 0 0 Pg Vn Zd

CPY <Zd>.<T>, <Pg>/M, <V><n>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Vn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<n> Is the number [0-31] of the source SIMD&FP register, encoded in the "Vn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = if AnyActiveElement(mask, esize) then V[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = operand1;

Z[d] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and source element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


CTERMEQ, CTERMNE

Compare and terminate loop

Detect termination conditions in serialized vector loops. Tests whether the comparison between the scalar source
operands holds true and if not tests the state of the !LAST condition flag (C) which indicates whether the previous flag-
setting predicate instruction selected the last element of the vector partition.
The Z and C condition flags are preserved by this instruction. The N and V condition flags are set as a pair to generate
one of the following conditions for a subsequent conditional instruction:
* GE (N=0 & V=0): continue loop (compare failed and last element not selected);
* LT (N=0 & V=1): terminate loop (last element selected);
* LT (N=1 & V=0): terminate loop (compare succeeded);
The scalar source operands are 32-bit or 64-bit general-purpose registers of the same size.
It has encodings from 2 classes: Equal and Not equal

Equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 sz 1 Rm 0 0 1 0 0 0 Rn 0 0 0 0 0
ne

CTERMEQ <R><n>, <R><m>

if !HaveSVE() then UNDEFINED;

integer esize = 32 << UInt(sz);
integer n = UInt(Rn);
integer m = UInt(Rm);
SVECmp op = Cmp_EQ;

Not equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 sz 1 Rm 0 0 1 0 0 0 Rn 1 0 0 0 0
ne

CTERMNE <R><n>, <R><m>

if !HaveSVE() then UNDEFINED;

integer esize = 32 << UInt(sz);
integer n = UInt(Rn);
integer m = UInt(Rm);
SVECmp op = Cmp_NE;

Assembler Symbols

<R> Is a width specifier, encoded in “sz”:

sz <R>
0 W
1 X

<n> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rn"
field.
<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.


Operation

CheckSVEEnabled();
bits(esize) operand1 = X[n];
bits(esize) operand2 = X[m];
integer element1 = UInt(operand1);
integer element2 = UInt(operand2);
boolean term;

case op of
    when Cmp_EQ term = element1 == element2;
    when Cmp_NE term = element1 != element2;
if term then
    PSTATE.N = '1';
    PSTATE.V = '0';
else
    PSTATE.N = '0';
    PSTATE.V = (NOT PSTATE.C);
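
The flag logic above can be sketched in Python to show how the N and V pair maps onto the GE and LT conditions. The names `cterm` and `condition` are hypothetical, register values are plain integers, and `c_flag` is the incoming C flag as 0 or 1.

```python
def cterm(op, rn, rm, esize, c_flag):
    """Return the (N, V) pair; Z and C are left unchanged by the instruction."""
    mask = (1 << esize) - 1
    element1, element2 = rn & mask, rm & mask
    term = (element1 == element2) if op == "EQ" else (element1 != element2)
    if term:
        return 1, 0           # compare succeeded
    return 0, 1 - c_flag      # V = NOT C, so V is 1 when the last element was selected

def condition(n, v):
    """GE holds when N == V; LT holds when N != V."""
    return "GE" if n == v else "LT"

print(condition(*cterm("EQ", 5, 6, 64, 1)))  # compare failed, last not selected: GE
print(condition(*cterm("EQ", 5, 6, 64, 0)))  # compare failed, last selected: LT
print(condition(*cterm("EQ", 5, 5, 64, 1)))  # compare succeeded: LT
```

A loop would typically follow the instruction with a branch on GE (or LT) to continue (or leave) the loop.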


DECB, DECD, DECH, DECW (scalar)

Decrement scalar by multiple of predicate constraint element count

Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 4 classes: Byte, Doubleword, Halfword and Word

Byte

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D

DECB <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Doubleword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D

DECD <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Halfword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D

DECH <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Word

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 0 0 1 pattern Rdn
size<1>size<0> D

DECW <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Assembler Symbols

<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.

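<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

pattern <pattern>
00000 POW2
00001 VL1
00010 VL2
00011 VL3
00100 VL4
00101 VL5
00110 VL6
00111 VL7
01000 VL8
01001 VL16
01010 VL32
01011 VL64
01100 VL128
01101 VL256
0111x #uimm5
101x1 #uimm5
10110 #uimm5
1x0x1 #uimm5
1x010 #uimm5
1xx00 #uimm5
11101 MUL4
11110 MUL3
11111 ALL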

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(64) operand1 = X[dn];

X[dn] = operand1 - (count * imm);
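
A minimal Python sketch of the scalar decrement, assuming the element count has already been derived from the pattern and is passed in as `count`. The function name `dec_scalar` is hypothetical. The result wraps modulo 2^64, as the 64-bit destination implies.

```python
MASK64 = (1 << 64) - 1

def dec_scalar(xdn, count, imm=1):
    """Subtract count * imm from a 64-bit register value, wrapping on underflow."""
    return (xdn - count * imm) & MASK64

print(dec_scalar(100, 32, imm=2))  # e.g. DECB x0, ALL, MUL #2 with 32 byte elements: 36
print(hex(dec_scalar(0, 4)))       # underflow wraps: 0xfffffffffffffffc
```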

DECD, DECH, DECW (vector)

Decrement vector by multiple of predicate constraint element count

Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 3 classes: Doubleword, Halfword and Word

Doubleword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D

DECD <Zdn>.D{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Halfword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D

DECH <Zdn>.H{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Word

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D

DECW <Zdn>.S{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;


Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    Elem[result, e, esize] = Elem[operand1, e, esize] - (count * imm);

Z[dn] = result;
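The operation can be modelled in a few lines of Python. This is a sketch only: the vector is represented as a list of unsigned integers, the name dec_vector is hypothetical, and count is assumed to have been derived already from the predicate constraint.

```python
def dec_vector(zdn, count, imm=1, esize=64):
    """Subtract count * imm from every element, wrapping to esize bits."""
    if not 1 <= imm <= 16:
        raise ValueError("imm must be in the range 1 to 16")
    mask = (1 << esize) - 1
    return [(x - count * imm) & mask for x in zdn]
```

Because the subtraction is performed modulo 2^esize, an element smaller than count * imm wraps around; no saturation takes place.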

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33


DECP (scalar)

Decrement scalar by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to decrement the scalar
destination.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 1 1 0 0 0 1 0 0 Pm Rdn
D

DECP <Xdn>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Rdn);

Assembler Symbols

<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) operand1 = X[dn];
bits(PL) operand2 = P[m];
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

X[dn] = operand1 - count;


DECP (vector)

Decrement vector by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to decrement all destination
vector elements.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 1 1 0 0 0 0 0 0 Pm Zdn
D

DECP <Zdn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Zdn);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

for e = 0 to elements-1
    Elem[result, e, esize] = Elem[operand1, e, esize] - count;

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


DUP (immediate)

Broadcast signed immediate to vector elements (unpredicated)

Unconditionally broadcast the signed integer immediate into each element of the destination vector. This instruction is
unpredicated.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
This instruction is used by the alias MOV (immediate, unpredicated).
This instruction is used by the pseudo-instruction FMOV (zero, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 0 1 1 sh imm8 Zd

DUP <Zd>.<T>, #<imm>{, <shift>}

if !HaveSVE() then UNDEFINED;

if size:sh == '001' then UNDEFINED;
integer esize = 8 << UInt(size);
integer d = UInt(Zd);
integer imm = SInt(imm8);
if sh == '1' then imm = imm << 8;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8

Operation

CheckSVEEnabled();
bits(VL) result = Replicate(imm<esize-1:0>);
Z[d] = result;
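The decode and broadcast steps can be sketched in Python as follows. The function name dup_immediate, the list representation of the vector and the default 128-bit vector length are assumptions for illustration.

```python
def dup_immediate(size, sh, imm8, vl=128):
    """Broadcast a sign-extended, optionally shifted 8-bit immediate."""
    if size == 0 and sh == 1:
        raise ValueError("size:sh == '001' is UNDEFINED")
    esize = 8 << size
    imm = imm8 - 256 if imm8 & 0x80 else imm8   # SInt(imm8)
    if sh:
        imm <<= 8                                # optional LSL #8
    element = imm & ((1 << esize) - 1)           # imm<esize-1:0>
    return [element] * (vl // esize)
```

For example, imm8 = 0xFF with the shift applied to halfword elements represents -256, so every element holds 0xFF00.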


DUP (indexed)

Broadcast indexed element to vector (unpredicated)

Unconditionally broadcast the indexed source vector element into each element of the destination vector. This
instruction is unpredicated.
The immediate element index is in the range of 0 to 63 (bytes), 31 (halfwords), 15 (words), 7 (doublewords) or 3
(quadwords). Selecting an element beyond the accessible vector length causes the destination vector to be set to zero.
This instruction is used by the alias MOV (SIMD&FP scalar, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 imm2 1 tsz 0 0 1 0 0 0 Zn Zd

DUP <Zd>.<T>, <Zn>.<T>[<imm>]

if !HaveSVE() then UNDEFINED;

bits(7) imm = imm2:tsz;
integer esize;
integer index;
case tsz of
    when '00000' UNDEFINED;
    when '10000' esize = 128; index = UInt(imm<6:5>);
    when 'x1000' esize = 64; index = UInt(imm<6:4>);
    when 'xx100' esize = 32; index = UInt(imm<6:3>);
    when 'xxx10' esize = 16; index = UInt(imm<6:2>);
    when 'xxxx1' esize = 8; index = UInt(imm<6:1>);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “tsz”:

tsz <T>
00000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
10000 Q

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<imm> Is the immediate index, in the range 0 to one less than the number of elements in 512 bits, encoded in
"imm2:tsz".

Alias Conditions

Alias Is preferred when

MOV (SIMD&FP scalar, unpredicated) BitCount(imm2:tsz) == 1
MOV (SIMD&FP scalar, unpredicated) BitCount(imm2:tsz) > 1

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) result;
bits(esize) element;

if index >= elements then
    element = Zeros();
else
    element = Elem[operand1, index, esize];
result = Replicate(element);

Z[d] = result;
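The way the element size and index share the "imm2:tsz" field can be illustrated with a short Python sketch. The lowest set bit of tsz selects the element size and the bits above it form the index. The names decode_dup_index and dup_indexed are hypothetical, and the vector is modelled as a list of elements.

```python
def decode_dup_index(imm2, tsz):
    """Return (esize, index) from the imm2 and tsz fields."""
    if tsz == 0:
        raise ValueError("tsz == '00000' is UNDEFINED")
    imm = (imm2 << 5) | tsz                  # bits(7) imm = imm2:tsz
    shift = (tsz & -tsz).bit_length() - 1    # position of lowest set bit
    return 8 << shift, imm >> (shift + 1)


def dup_indexed(zn, index):
    """Broadcast zn[index], or zeros if index is beyond the vector."""
    if index >= len(zn):
        return [0] * len(zn)
    return [zn[index]] * len(zn)
```

The maximum index for byte elements is reached with all seven bits set, which decodes to esize 8 and index 63.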


DUP (scalar)

Broadcast general-purpose register to vector elements (unpredicated)

Unconditionally broadcast the general-purpose scalar source register into each element of the destination vector. This
instruction is unpredicated.
This instruction is used by the alias MOV (scalar, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 0 0 1 1 1 0 Rn Zd

DUP <Zd>.<T>, <R><n|SP>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Rn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<n|SP> Is the number [0-30] of the general-purpose source register or the name SP (31), encoded in the "Rn"
field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) operand;
if n == 31 then
    operand = SP[];
else
    operand = X[n];
bits(VL) result;

for e = 0 to elements-1
    Elem[result, e, esize] = operand<esize-1:0>;

Z[d] = result;


DUPM

Broadcast logical bitmask immediate to vector (unpredicated)

Unconditionally broadcast the logical bitmask immediate into each element of the destination vector. This instruction
is unpredicated. The immediate is a 64-bit value consisting of a single run of ones or zeros repeating every 2, 4, 8, 16,
32 or 64 bits.
This instruction is used by the alias MOV.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 1 0 0 0 0 imm13 Zd

DUPM <Zd>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer d = UInt(Zd);
bits(esize) imm;
(imm, -) = DecodeBitMasks(imm13<12>, imm13<5:0>, imm13<11:6>, TRUE);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “imm13<12>:imm13<5:0>”:

imm13<12> imm13<5:0> <T>

0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D

<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.

Alias Conditions

Alias Is preferred when

MOV SVEMoveMaskPreferred(imm13)

Operation

CheckSVEEnabled();
bits(VL) result = Replicate(imm);
Z[d] = result;
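The shared function DecodeBitMasks is not defined in this section. The Python sketch below shows one way the 13-bit field could be expanded into the 64-bit mask, under the assumption that it follows the usual logical-immediate scheme: an element size derived from imm13<12> and the inverted imm13<5:0>, a run of ones, and a right rotation given by imm13<11:6>. The name decode_bit_mask is hypothetical.

```python
def decode_bit_mask(imm13):
    """Expand a 13-bit logical bitmask immediate to a 64-bit value."""
    n = (imm13 >> 12) & 1          # imm13<12>
    immr = (imm13 >> 6) & 0x3F     # imm13<11:6>, rotation
    imms = imm13 & 0x3F            # imm13<5:0>, size and run length
    # Highest set bit of n:NOT(imms) gives log2 of the element size.
    length = ((n << 6) | (~imms & 0x3F)).bit_length() - 1
    if length < 1:
        raise ValueError("reserved encoding")
    levels = (1 << length) - 1
    if (imms & levels) == levels:
        raise ValueError("reserved encoding")
    s = imms & levels
    r = immr & levels
    esize = 1 << length
    welem = (1 << (s + 1)) - 1     # run of s + 1 ones
    emask = (1 << esize) - 1
    rotated = ((welem >> r) | (welem << (esize - r))) & emask
    result = 0
    for i in range(0, 64, esize):  # replicate the element to 64 bits
        result |= rotated << i
    return result
```

An imm13<5:0> value of 110011 with no rotation selects 8-bit fields holding a run of four ones, giving 0x0F0F0F0F0F0F0F0F.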


EON

Bitwise exclusive OR with inverted immediate (unpredicated)

Bitwise exclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of
ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.

This is a pseudo-instruction of EOR (immediate). This means:

• The encodings in this description are named to match the encodings of EOR (immediate).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of EOR (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 1 0 0 0 0 imm13 Zdn

EON <Zdn>.<T>, <Zdn>.<T>, #<const>

is equivalent to

EOR <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)
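The equivalence relies on the two's complement identity -x - 1 = NOT(x). A small Python sketch, with hypothetical function names and 64-bit elements held in a list, makes this concrete:

```python
MASK64 = (1 << 64) - 1


def eor_imm(elements, const):
    """Exclusive OR each 64-bit element with const."""
    return [(x ^ const) & MASK64 for x in elements]


def eon_imm(elements, const):
    """Exclusive OR each element with the inverted constant."""
    # -const - 1 is the bitwise inverse of const in two's complement.
    return eor_imm(elements, (-const - 1) & MASK64)
```

Since the inverse of a repeating run of ones is itself a repeating run of ones, the inverted constant is always encodable whenever the original constant is.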

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “imm13<12>:imm13<5:0>”:

imm13<12> imm13<5:0> <T>

0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D

<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.

Operation

The description of EOR (immediate) gives the operational pseudocode for this instruction.

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


EOR (immediate)

Bitwise exclusive OR with immediate (unpredicated)

Bitwise exclusive OR an immediate with each 64-bit element of the source vector, and destructively place the results in
the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or
zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
This instruction is used by the pseudo-instruction EON.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 1 0 0 0 0 imm13 Zdn

EOR <Zdn>.<T>, <Zdn>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

integer dn = UInt(Zdn);
bits(64) imm;
(imm, -) = DecodeBitMasks(imm13<12>, imm13<5:0>, imm13<11:6>, TRUE);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “imm13<12>:imm13<5:0>”:

imm13<12> imm13<5:0> <T>

0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D

<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV 64;
bits(VL) operand = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(64) element1 = Elem[operand, e, 64];
    Elem[result, e, 64] = element1 EOR imm;

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


EOR (predicates)

Bitwise exclusive OR predicates

Bitwise exclusive OR active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Does not set the condition flags.
This instruction is used by the alias NOT (predicate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S

EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = FALSE;

Assembler Symbols

Alias Conditions

Alias Is preferred when

NOT (predicate) Pm == Pg

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
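bits(PL) result;

for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 EOR element2;
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);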
P[d] = result;
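The zeroing behavior described for this instruction, and the reason the NOT (predicate) alias applies when Pm == Pg, can be sketched in Python. Predicates are modelled as lists of 0/1 flags, one per byte element, and the function name is hypothetical.

```python
def eor_predicates(pg, pn, pm):
    """XOR active elements of pn and pm; inactive elements become 0."""
    return [(a ^ b) if g else 0 for g, a, b in zip(pg, pn, pm)]
```

When the second source is the governing predicate itself, every active element is exclusive-ORed with 1, so the result is the inverse of Pn within the active elements and zero elsewhere.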


EOR (vectors, predicated)

Bitwise exclusive OR vectors (predicated)

Bitwise exclusive OR active elements of the second source vector with corresponding elements of the first source
vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in
the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 0 0 0 Pg Zm Zdn

EOR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 EOR element2;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information


EOR (vectors, unpredicated)

Bitwise exclusive OR vectors (unpredicated)

Bitwise exclusive OR all elements of the second source vector with corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 Zm 0 0 1 1 0 0 Zn Zd

EOR <Zd>.D, <Zn>.D, <Zm>.D

if !HaveSVE() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

Operation

CheckSVEEnabled();
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];

Z[d] = operand1 EOR operand2;


EORS

Bitwise exclusive OR predicates, setting the condition flags

Bitwise exclusive OR active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
This instruction is used by the alias NOTS.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S

EORS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = TRUE;

Assembler Symbols

Alias Conditions

Alias Is preferred when

NOTS Pm == Pg

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
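bits(PL) result;

for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 EOR element2;
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);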
P[d] = result;
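PredTest is not defined in this section. Based on the FIRST, NONE and !LAST flag names in the description, the flag computation could be sketched in Python as below. The treatment of the case with no active elements (N = 0, Z = 1, C = 1) is an assumption, as is the function name pred_test.

```python
def pred_test(mask, result):
    """Return (N, Z, C, V) for a predicate result under a mask."""
    active = [r for g, r in zip(mask, result) if g]
    n = active[0] if active else 0            # FIRST active element is true
    z = 0 if any(active) else 1               # NONE of the active elements is true
    c = 0 if (active and active[-1]) else 1   # !LAST: last active element is not true
    return n, z, c, 0                         # V is always zero
```

A loop that uses the result as its continuation predicate can therefore branch directly on the flags without a separate predicate test instruction.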


EORV

Bitwise exclusive OR reduction to scalar

Bitwise exclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 0 0 1 Pg Zn Vd

EORV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) result = Zeros(esize);

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        result = result EOR Elem[operand, e, esize];

V[d] = result;
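The horizontal reduction amounts to a fold over the active elements with zero as the starting value. A Python sketch, using a hypothetical function name and lists for the predicate and vector:

```python
from functools import reduce
from operator import xor


def eorv(pg, zn):
    """XOR together the active elements of zn; inactive ones count as 0."""
    return reduce(xor, (x for g, x in zip(pg, zn) if g), 0)
```

Skipping an inactive element is the same as treating it as zero, because zero is the identity for exclusive OR.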


EXT

Extract vector from pair of vectors

Copy the indexed byte up to the last byte of the first source vector to the bottom of the result vector, then fill the
remainder of the result starting from the first byte of the second source vector. The result is placed destructively in the
first source vector. This instruction is unpredicated.
An index that is greater than or equal to the vector length in bytes is treated as zero, leaving the destination and first
source vector unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 1 imm8h 0 0 0 imm8l Zm Zdn

EXT <Zdn>.B, <Zdn>.B, <Zm>.B, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Zdn);
integer m = UInt(Zm);
integer position = UInt(imm8h:imm8l);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<imm> Is the unsigned immediate operand, in the range 0 to 255, encoded in the "imm8h:imm8l" fields.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;

if position >= elements then
    position = 0;

position = position << 3;

bits(VL*2) concat = operand2 : operand1;
result = concat<(position+VL)-1:position>;

Z[dn] = result;
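At byte granularity the extraction is a window over the concatenation of the two sources. The Python sketch below models each vector as a bytes object whose index 0 is the least significant byte; the function name ext is used for illustration only.

```python
def ext(zdn: bytes, zm: bytes, position: int) -> bytes:
    """Take len(zdn) bytes starting at position from zdn followed by zm."""
    n = len(zdn)
    if position >= n:
        position = 0          # out-of-range index behaves as zero
    return (zdn + zm)[position:position + n]
```

With 16-byte vectors and a position of 3, the result holds bytes 3 to 15 of the first source followed by bytes 0 to 2 of the second.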

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


FABD

Floating-point absolute difference (predicated)

Compute the absolute difference of active floating-point elements of the second source vector and corresponding
floating-point elements of the first source vector and destructively place the result in the corresponding elements of
the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 0 0 0 1 0 0 Pg Zm Zdn

FABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPAbs(FPSub(element1, element2, FPCR[]));
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


FABS

Floating-point absolute value (predicated)

Take the absolute value of each active floating-point element of the source vector, and place the results in the
corresponding elements of the destination vector. This clears the sign bit and cannot signal a floating-point exception.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 1 0 0 1 0 1 Pg Zn Zd

FABS <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = FPAbs(element);

Z[d] = result;
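Because the operation only clears the sign bit, it can be modelled on raw bit patterns without any floating-point arithmetic. The Python sketch below assumes single-precision elements held as 32-bit integers; the function names are hypothetical.

```python
def fabs_bits32(bits):
    """Clear the sign bit of a single-precision bit pattern."""
    return bits & 0x7FFFFFFF


def fabs_predicated(pg, zn, zd):
    """Active elements take |zn|; inactive elements keep the old zd."""
    return [fabs_bits32(n) if g else d for g, n, d in zip(pg, zn, zd)]
```

The same masking applies to a NaN input, whose payload is left untouched, which is consistent with the statement that no floating-point exception can be signalled.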

Operational information


FAC<cc>

Floating-point absolute compare vectors

Compare active absolute values of floating-point elements in the first source vector with corresponding absolute
values of elements in the second source vector, and place the boolean results of the specified comparison in the
corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to
zero. Does not set the condition flags.
The <cc> symbol specifies one of the standard ARM condition codes: GE, GT, LE, or LT.
This instruction is used by the pseudo-instructions FACLE and FACLT.
It has encodings from 2 classes: Greater than and Greater than or equal.

Greater than

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 1 Pg Zn 1 Pd

FACGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GT;

Greater than or equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 0 Pg Zn 1 Pd

FACGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

P[d] = result;


FACLE

Floating-point absolute compare less than or equal

Compare active absolute values of floating-point elements in the first source vector being less than or equal to
corresponding absolute values of elements in the second source vector, and place the boolean results of the
comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate
register are set to zero. Does not set the condition flags.

This is a pseudo-instruction of FAC<cc>. This means:

• The encodings in this description are named to match the encodings of FAC<cc>.
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of FAC<cc> gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 0 Pg Zn 1 Pd

FACLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>

is equivalent to

FACGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>
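The operand swap can be illustrated with a Python sketch that models elements as Python floats and predicates as lists of 0/1 flags. The function names are hypothetical, and floating-point exception signalling controlled by FPCR is not modelled.

```python
def facge(pg, zn, zm):
    """Per-element |zn| >= |zm| for active elements, else 0."""
    return [int(g and abs(a) >= abs(b)) for g, a, b in zip(pg, zn, zm)]


def facle(pg, zm, zn):
    """|zm| <= |zn|, expressed as FACGE with the sources swapped."""
    return facge(pg, zn, zm)
```

A comparison involving a NaN is false in either direction, so swapping the operands preserves the result for unordered inputs as well.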

Assembler Symbols

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

Operation

The description of FAC<cc> gives the operational pseudocode for this instruction.


FACLT

Floating-point absolute compare less than

Compare active absolute values of floating-point elements in the first source vector being less than corresponding
absolute values of elements in the second source vector, and place the boolean results of the comparison in the
corresponding elements of the destination predicate. Inactive elements in the destination predicate register are set to
zero. Does not set the condition flags.

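This is a pseudo-instruction of FAC<cc>. This means:

• The encodings in this description are named to match the encodings of FAC<cc>.
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of FAC<cc> gives the operational pseudocode for this instruction.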

FACLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>

is equivalent to

FACGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

Assembler Symbols

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

Operation

The description of FAC<cc> gives the operational pseudocode for this instruction.


FADD (immediate)

Floating-point add immediate (predicated)

Add an immediate to each active floating-point element of the source vector, and destructively place the results in the
corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only. Inactive elements
in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 0 0 1 0 0 Pg 0 0 0 0 i1 Zdn

FADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
bits(esize) imm = if i1 == '0' then FPPointFive('0') else FPOne('0');

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<const> Is the floating-point immediate value, encoded in “i1”:

i1 <const>
0 #0.5
1 #1.0

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPAdd(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;
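As a rough illustration of the merging predication in the pseudocode above, the following sketch models the instruction on a Python list of doubles. The function name `fadd_imm` and the list-based register model are assumptions made for the example; rounding mode and exception behaviour controlled by FPCR are not modelled.

```python
def fadd_imm(pg, zdn, i1):
    """Add 0.5 (i1 == 0) or 1.0 (i1 == 1) to each active lane; keep inactive lanes."""
    imm = 1.0 if i1 else 0.5
    return [x + imm if p else x for p, x in zip(pg, zdn)]

zdn = [1.0, 2.0, 3.0, 4.0]
pg = [True, False, True, False]

half_added = fadd_imm(pg, zdn, i1=0)  # active lanes 0 and 2 gain 0.5
one_added = fadd_imm(pg, zdn, i1=1)   # active lanes 0 and 2 gain 1.0
```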

Operational information


FADD (vectors, predicated)

Floating-point add vector (predicated)

Add active floating-point elements of the second source vector to corresponding floating-point elements of the first
source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 0 1 0 0 Pg Zm Zdn

FADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPAdd(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information


• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


FADD (vectors, unpredicated)

Floating-point add vector (unpredicated)

Add all floating-point elements of the second source vector to corresponding elements of the first source vector and
place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 0 0 Zn Zd

FADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPAdd(element1, element2, FPCR[]);

Z[d] = result;


FADDA

Floating-point add strictly-ordered reduction, accumulating in scalar

Add the SIMD&FP scalar source and all active lanes of the vector source using floating-point addition, and
destructively place the result in the SIMD&FP scalar source register. Vector elements are processed strictly in order
from low to high, with the scalar source providing the initial value. Inactive elements in the source vector are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 0 0 0 0 1 Pg Zm Vdn

FADDA <V><dn>, <Pg>, <V><dn>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Vdn);
integer m = UInt(Zm);

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 D

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(esize) operand1 = V[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(esize) result = operand1;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand2, e, esize];
        result = FPAdd(result, element, FPCR[]);

V[dn] = result;
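The strict low-to-high ordering matters because floating-point addition is not associative. The sketch below models the loop above on Python doubles; the name `fadda` and the list representation are illustrative assumptions, and FPCR-controlled behaviour is not modelled.

```python
def fadda(init, pg, zm):
    """Accumulate active lanes into the scalar strictly from lane 0 upwards."""
    acc = init
    for p, x in zip(pg, zm):
        if p:
            acc = acc + x
    return acc

zm = [1e16, 1.0, -1e16, 1.0]

# In double precision 1e16 + 1.0 rounds back to 1e16, so the first 1.0 is
# absorbed; the final 1.0 survives because the large terms have cancelled.
ordered = fadda(0.0, [True] * 4, zm)

# Inactive lanes are skipped, and with no active lanes the scalar is returned.
partial = fadda(0.0, [True, True, True, False], zm)
untouched = fadda(5.0, [False] * 4, zm)
```

A reduction that groups the same four values differently can give a different answer, which is why this instruction is the one to use when results must match a sequential scalar loop.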


FADDV

Floating-point add recursive reduction to scalar

Floating-point add horizontally over all lanes of a vector using a recursive pairwise reduction, and place the result in
the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as +0.0.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 0 0 0 1 Pg Zn Vd

FADDV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 D

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

CheckSVEEnabled();
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) identity = FPZero('0');

V[d] = ReducePredicated(ReduceOp_FADD, operand, mask, identity);
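The definition of ReducePredicated is not part of this description, so the following is only a sketch of what a recursive pairwise reduction could look like. It assumes that inactive lanes are replaced by the identity (+0.0), that the lane list is padded with the identity up to a power of two, and that each level adds the result of the low half to the result of the high half. The names `reduce_pairwise` and `faddv` are hypothetical.

```python
def reduce_pairwise(values):
    """Recursively add the low half to the high half (length is a power of two)."""
    if len(values) == 1:
        return values[0]
    half = len(values) // 2
    lo = reduce_pairwise(values[:half])
    hi = reduce_pairwise(values[half:])
    return lo + hi

def faddv(pg, zn):
    """Treat inactive lanes as +0.0, pad to a power of two, then reduce pairwise."""
    lanes = [x if p else 0.0 for p, x in zip(pg, zn)]
    size = 1
    while size < len(lanes):
        size *= 2
    lanes += [0.0] * (size - len(lanes))
    return reduce_pairwise(lanes)

zn = [1e16, 1.0, -1e16, 1.0]

# Pairwise grouping computes (1e16 + 1.0) + (-1e16 + 1.0); both 1.0 terms are
# absorbed by rounding in double precision before the large values cancel.
pairwise = faddv([True] * 4, zn)

masked = faddv([True, False, True, False], [1.0, 99.0, 2.0, 99.0])
six_lanes = faddv([True] * 6, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

Under these assumptions the same input summed strictly in lane order would give 1.0 rather than 0.0, so the two reduction styles are not interchangeable when bit-exact results are required.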


FCADD

Floating-point complex add with rotate (predicated)

Add the real and imaginary components of the active floating-point complex numbers from the first source vector to
the complex numbers from the second source vector which have first been rotated by 90 or 270 degrees in the
direction from the positive real axis towards the positive imaginary axis, when considered in polar representation,
equivalent to multiplying the complex numbers in the second source vector by ± J beforehand. Destructively place the
results in the corresponding elements of the first source vector. Inactive elements in the destination vector register
remain unmodified.
Each complex number is represented in a vector register as an even/odd pair of elements with the real part in the
even-numbered element and the imaginary part in the odd-numbered element.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 0 0 0 0 0 rot 1 0 0 Pg Zm Zdn

FCADD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>, <const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean sub_i = (rot == '0');
boolean sub_r = (rot == '1');

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

<const> Is the const specifier, encoded in “rot”:

rot <const>
0 #90
1 #270


Operation

CheckSVEEnabled();
integer pairs = VL DIV (2 * esize);
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for p = 0 to pairs-1
    acc_r = Elem[operand1, 2 * p + 0, esize];
    acc_i = Elem[operand1, 2 * p + 1, esize];
    if ElemP[mask, 2 * p + 0, esize] == '1' then
        elt2_i = Elem[operand2, 2 * p + 1, esize];
        if sub_i then elt2_i = FPNeg(elt2_i);
        acc_r = FPAdd(acc_r, elt2_i, FPCR[]);
    if ElemP[mask, 2 * p + 1, esize] == '1' then
        elt2_r = Elem[operand2, 2 * p + 0, esize];
        if sub_r then elt2_r = FPNeg(elt2_r);
        acc_i = FPAdd(acc_i, elt2_r, FPCR[]);
    Elem[result, 2 * p + 0, esize] = acc_r;
    Elem[result, 2 * p + 1, esize] = acc_i;

Z[dn] = result;
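To see that the pseudocode above computes a + j·b for #90 and a − j·b for #270, the sketch below mirrors it on Python lists laid out as [re0, im0, re1, im1, ...]. The function name `fcadd` and the use of an integer `rot` argument in degrees are assumptions for the example; FPCR effects are not modelled.

```python
def fcadd(pg, zdn, zm, rot):
    """Model of the loop above: rot is 90 or 270, pg has one flag per element."""
    sub_i = (rot == 90)    # corresponds to rot == '0'
    sub_r = (rot == 270)   # corresponds to rot == '1'
    out = list(zdn)
    for p in range(len(zdn) // 2):
        re, im = 2 * p, 2 * p + 1
        if pg[re]:
            b_i = -zm[im] if sub_i else zm[im]
            out[re] = zdn[re] + b_i
        if pg[im]:
            b_r = -zm[re] if sub_r else zm[re]
            out[im] = zdn[im] + b_r
    return out

a = [1.0, 2.0, 3.0, 4.0]   # (1+2j), (3+4j)
b = [5.0, 6.0, 7.0, 8.0]   # (5+6j), (7+8j)
pg = [True] * 4

r90 = fcadd(pg, a, b, 90)
r270 = fcadd(pg, a, b, 270)

# Cross-check the first pair against Python's built-in complex arithmetic.
check90 = complex(1, 2) + 1j * complex(5, 6)
check270 = complex(1, 2) - 1j * complex(5, 6)
```

Note that the real and imaginary elements of a pair are governed by separate predicate bits, so a predicate can update one component of a complex number and leave the other unchanged.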

Operational information


FCM<cc> (vectors)

Floating-point compare vectors

Compare active floating-point elements in the first source vector with corresponding elements in the second source
vector, and place the boolean results of the specified comparison in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, or NE, with the addition of UO for
an unordered comparison.
This instruction is used by the pseudo-instructions FCMLE (vectors) and FCMLT (vectors).
It has encodings from 5 classes: Equal, Greater than, Greater than or equal, Not equal and Unordered

Equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 1 Pg Zn 0 Pd
cmph cmpl

FCMEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_EQ;

Greater than

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 1 Pd
cmph cmpl

FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GT;

Greater than or equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 0 Pd
cmph cmpl


FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_GE;

Not equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 1 Pg Zn 1 Pd
cmph cmpl

FCMNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_NE;

Unordered

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 1 1 0 Pg Zn 0 Pd

FCMUO <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Pd);
SVECmp op = Cmp_UN;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D


Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(PL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        boolean res;
        case op of
            when Cmp_EQ res = FPCompareEQ(element1, element2, FPCR[]);
            when Cmp_GE res = FPCompareGE(element1, element2, FPCR[]);
            when Cmp_GT res = FPCompareGT(element1, element2, FPCR[]);
            when Cmp_UN res = FPCompareUN(element1, element2, FPCR[]);
            when Cmp_NE res = FPCompareNE(element1, element2, FPCR[]);
            when Cmp_LT res = FPCompareGT(element2, element1, FPCR[]);
            when Cmp_LE res = FPCompareGE(element2, element1, FPCR[]);
        ElemP[result, e, esize] = if res then '1' else '0';
    else
        ElemP[result, e, esize] = '0';

P[d] = result;
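The behaviour of the five comparison classes, particularly when a NaN is involved, can be sketched with ordinary Python comparisons. This assumes that the FPCompare helpers follow IEEE 754 ordering rules (EQ, GE and GT are false and NE is true when either operand is a NaN), which is how Python's operators on floats behave; floating-point exception signalling is not modelled. The table `COMPARE` and the function `fcm` are hypothetical names.

```python
import math
import operator

# One entry per <cc> value; UO is true when the operands are unordered.
COMPARE = {
    "EQ": operator.eq,
    "GE": operator.ge,
    "GT": operator.gt,
    "NE": operator.ne,
    "UO": lambda a, b: math.isnan(a) or math.isnan(b),
}

def fcm(cc, pg, zn, zm):
    """Per-lane comparison under a zeroing predicate; inactive lanes give False."""
    compare = COMPARE[cc]
    return [bool(p) and compare(a, b) for p, a, b in zip(pg, zn, zm)]

zn = [1.0, math.nan, -0.0, 2.0]
zm = [1.0, 1.0, 0.0, 3.0]
pg = [True, True, True, False]   # lane 3 is inactive

results = {cc: fcm(cc, pg, zn, zm) for cc in COMPARE}
```

In this model lane 1 (the NaN) is set only for NE and UO, lane 2 shows that -0.0 compares equal to +0.0, and lane 3 is always clear because it is inactive.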


FCM<cc> (zero)

Floating-point compare vector with zero

Compare active floating-point elements in the source vector with zero, and place the boolean results of the specified
comparison in the corresponding elements of the destination predicate. Inactive elements in the destination predicate
register are set to zero. Does not set the condition flags.
The <cc> symbol specifies one of the standard ARM condition codes: EQ, GE, GT, LE, LT, or NE.
It has encodings from 6 classes: Equal, Greater than, Greater than or equal, Less than, Less than or equal and Not equal

Equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 1 0 0 0 1 Pg Zn 0 Pd
eq lt ne

FCMEQ <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_EQ;

Greater than

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 0 0 0 1 Pg Zn 1 Pd
eq lt ne

FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_GT;

Greater than or equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 0 0 0 1 Pg Zn 0 Pd
eq lt ne

FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_GE;


Less than

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 1 0 0 1 Pg Zn 0 Pd
eq lt ne

FCMLT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_LT;

Less than or equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 0 1 0 0 1 Pg Zn 1 Pd
eq lt ne

FCMLE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_LE;

Not equal

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 0 1 1 0 0 1 Pg Zn 0 Pd
eq lt ne

FCMNE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, #0.0

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Pd);
SVECmp op = Cmp_NE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.


Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(PL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        boolean res;
        case op of
            when Cmp_EQ res = FPCompareEQ(element, 0<esize-1:0>, FPCR[]);
            when Cmp_GE res = FPCompareGE(element, 0<esize-1:0>, FPCR[]);
            when Cmp_GT res = FPCompareGT(element, 0<esize-1:0>, FPCR[]);
            when Cmp_NE res = FPCompareNE(element, 0<esize-1:0>, FPCR[]);
            when Cmp_LT res = FPCompareGT(0<esize-1:0>, element, FPCR[]);
            when Cmp_LE res = FPCompareGE(0<esize-1:0>, element, FPCR[]);
        ElemP[result, e, esize] = if res then '1' else '0';
    else
        ElemP[result, e, esize] = '0';

P[d] = result;


FCMLA (indexed)

Floating-point complex multiply-add by indexed values with rotate

Multiply the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of
the floating-point complex numbers in each 128-bit segment of the first source vector by the specified complex number
in the corresponding second source vector segment, rotated by 0, 90, 180 or 270 degrees in the direction from the
positive real axis towards the positive imaginary axis, when considered in polar representation.
Then destructively add the products to the corresponding components of the complex numbers in the addend and
destination vector, without intermediate rounding.
These transformations permit the creation of a variety of multiply-add and multiply-subtract operations on complex
numbers by combining two of these instructions with the same vector operands but with rotations that are 90 degrees
apart.
Each complex number is represented in a vector register as an even/odd pair of elements with the real part in the
even-numbered element and the imaginary part in the odd-numbered element.
The complex numbers within the second source vector are specified using an immediate index which selects the same
complex number position within each 128-bit vector segment. The index range is from 0 to one less than the number
of complex numbers per 128-bit segment, encoded in 1 to 2 bits depending on the size of the complex number. This
instruction is unpredicated.
It has encodings from 2 classes: Half-precision and Single-precision

Half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 1 rot Zn Zda
size<1>size<0>

FCMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>], <const>

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer index = UInt(i2);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
integer sel_a = UInt(rot<0>);
integer sel_b = UInt(NOT(rot<0>));
boolean neg_i = (rot<1> == '1');
boolean neg_r = (rot<0> != rot<1>);

Single-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 1 rot Zn Zda
size<1>size<0>

FCMLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>], <const>

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer index = UInt(i1);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
integer sel_a = UInt(rot<0>);
integer sel_b = UInt(NOT(rot<0>));
boolean neg_i = (rot<1> == '1');
boolean neg_r = (rot<0> != rot<1>);


Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> For the half-precision variant: is the name of the second source scalable vector register Z0-Z7, encoded
in the "Zm" field.
For the single-precision variant: is the name of the second source scalable vector register Z0-Z15,
encoded in the "Zm" field.
<imm> For the half-precision variant: is the index of a Real and Imaginary pair, in the range 0 to 3, encoded in
the "i2" field.
For the single-precision variant: is the index of a Real and Imaginary pair, in the range 0 to 1, encoded
in the "i1" field.

<const> Is the const specifier, encoded in “rot”:

rot <const>
00 #0
01 #90
10 #180
11 #270

Operation

CheckSVEEnabled();
integer pairs = VL DIV (2 * esize);
integer pairspersegment = 128 DIV (2 * esize);
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for p = 0 to pairs-1
    segmentbase = p - (p MOD pairspersegment);
    s = segmentbase + index;
    addend_r = Elem[operand3, 2 * p + 0, esize];
    addend_i = Elem[operand3, 2 * p + 1, esize];
    elt1_a = Elem[operand1, 2 * p + sel_a, esize];
    elt2_a = Elem[operand2, 2 * s + sel_a, esize];
    elt2_b = Elem[operand2, 2 * s + sel_b, esize];
    if neg_r then elt2_a = FPNeg(elt2_a);
    if neg_i then elt2_b = FPNeg(elt2_b);
    addend_r = FPMulAdd(addend_r, elt1_a, elt2_a, FPCR[]);
    addend_i = FPMulAdd(addend_i, elt1_a, elt2_b, FPCR[]);
    Elem[result, 2 * p + 0, esize] = addend_r;
    Elem[result, 2 * p + 1, esize] = addend_i;

Z[da] = result;
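The sketch below mirrors the pseudocode above to show how two FCMLA instructions with rotations #0 and #90 combine into a full complex multiply-accumulate, and how the index selects the same complex position in every 128-bit segment. It is a simplified model: the name `fcmla_indexed` is hypothetical, registers are Python lists laid out as [re0, im0, re1, im1, ...], and the multiply-add is written as an unfused `a + b * c`, which only matches the fused FPMulAdd when no rounding occurs (as with the small integer values used here).

```python
def fcmla_indexed(zda, zn, zm, index, rot, esize=32):
    """Model of the loop above; rot is 0, 90, 180 or 270 degrees."""
    rot_bits = rot // 90
    rot0, rot1 = rot_bits & 1, (rot_bits >> 1) & 1
    sel_a, sel_b = rot0, 1 - rot0
    neg_i = (rot1 == 1)
    neg_r = (rot0 != rot1)
    pairs_per_segment = 128 // (2 * esize)
    out = list(zda)
    for p in range(len(zda) // 2):
        s = p - (p % pairs_per_segment) + index   # indexed pair in this segment
        elt1_a = zn[2 * p + sel_a]
        elt2_a = zm[2 * s + sel_a]
        elt2_b = zm[2 * s + sel_b]
        if neg_r:
            elt2_a = -elt2_a
        if neg_i:
            elt2_b = -elt2_b
        out[2 * p + 0] = zda[2 * p + 0] + elt1_a * elt2_a
        out[2 * p + 1] = zda[2 * p + 1] + elt1_a * elt2_b
    return out

# A 256-bit vector of 32-bit elements: four complex pairs, two per segment.
zn = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
zm = [9.0, 9.0, 2.0, 1.0, 9.0, 9.0, 0.0, 1.0]   # index 1 picks (2+1j) and (0+1j)
acc = [0.0] * 8

step1 = fcmla_indexed(acc, zn, zm, index=1, rot=0)
step2 = fcmla_indexed(step1, zn, zm, index=1, rot=90)
```

After both steps each pair holds the product of the corresponding complex number in `zn` with the indexed complex number from its segment of `zm`, for example (1+2j)·(2+1j) = 5j in the first pair.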

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


FCMLA (vectors)

Floating-point complex multiply-add with rotate (predicated)

Multiply the duplicated real components for rotations 0 and 180, or imaginary components for rotations 90 and 270, of
the floating-point complex numbers in the first source vector by the corresponding complex number in the second
source vector rotated by 0, 90, 180 or 270 degrees in the direction from the positive real axis towards the positive
imaginary axis, when considered in polar representation.
Then destructively add the products to the corresponding components of the complex numbers in the addend and
destination vector, without intermediate rounding.
These transformations permit the creation of a variety of multiply-add and multiply-subtract operations on complex
numbers by combining two of these instructions with the same vector operands but with rotations that are 90 degrees
apart.
Each complex number is represented in a vector register as an even/odd pair of elements with the real part in the
even-numbered element and the imaginary part in the odd-numbered element. Inactive elements in the destination
vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 0 Zm 0 rot Pg Zn Zda

FCMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>, <const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
integer sel_a = UInt(rot<0>);
integer sel_b = UInt(NOT(rot<0>));
boolean neg_i = (rot<1> == '1');
boolean neg_r = (rot<0> != rot<1>);

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<const> Is the const specifier, encoded in “rot”:

rot <const>
00 #0
01 #90
10 #180
11 #270


Operation

CheckSVEEnabled();
integer pairs = VL DIV (2 * esize);
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = Z[da];
bits(VL) result;

for p = 0 to pairs-1
    addend_r = Elem[operand3, 2 * p + 0, esize];
    addend_i = Elem[operand3, 2 * p + 1, esize];
    if ElemP[mask, 2 * p + 0, esize] == '1' then
        bits(esize) elt1_a = Elem[operand1, 2 * p + sel_a, esize];
        bits(esize) elt2_a = Elem[operand2, 2 * p + sel_a, esize];
        if neg_r then elt2_a = FPNeg(elt2_a);
        addend_r = FPMulAdd(addend_r, elt1_a, elt2_a, FPCR[]);
    if ElemP[mask, 2 * p + 1, esize] == '1' then
        bits(esize) elt1_a = Elem[operand1, 2 * p + sel_a, esize];
        bits(esize) elt2_b = Elem[operand2, 2 * p + sel_b, esize];
        if neg_i then elt2_b = FPNeg(elt2_b);
        addend_i = FPMulAdd(addend_i, elt1_a, elt2_b, FPCR[]);
    Elem[result, 2 * p + 0, esize] = addend_r;
    Elem[result, 2 * p + 1, esize] = addend_i;

Z[da] = result;

Operational information


FCMLE (vectors)

Floating-point compare less than or equal to vector

Compare active floating-point elements in the first source vector being less than or equal to corresponding elements in
the second source vector, and place the boolean results of the comparison in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the
condition flags.

This is a pseudo-instruction of FCM<cc> (vectors). This means:

• The encodings in this description are named to match the encodings of FCM<cc> (vectors).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 1 0 Pg Zn 0 Pd
cmph cmpl

FCMLE <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>

is equivalent to

FCMGE <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

Assembler Symbols

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

Operation

The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.


FCMLT (vectors)

Floating-point compare less than vector

Compare active floating-point elements in the first source vector being less than corresponding elements in the second
source vector, and place the boolean results of the comparison in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.

This is a pseudo-instruction of FCM<cc> (vectors). This means:

FCMLT <Pd>.<T>, <Pg>/Z, <Zm>.<T>, <Zn>.<T>

is equivalent to

FCMGT <Pd>.<T>, <Pg>/Z, <Zn>.<T>, <Zm>.<T>

Assembler Symbols

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

Operation

The description of FCM<cc> (vectors) gives the operational pseudocode for this instruction.


FCPY

Copy 8-bit floating-point immediate to vector elements (predicated)

Copy a floating-point immediate into each active element in the destination vector. Inactive elements in the destination
vector register remain unmodified.
This instruction is used by the alias FMOV (immediate, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 1 1 0 imm8 Zd

FCPY <Zd>.<T>, <Pg>/M, #<const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer d = UInt(Zd);
bits(esize) imm = VFPExpandImm(imm8);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<const> Is a floating-point immediate value expressible as ±n÷16×2^r, where n and r are integers such that 16
≤ n ≤ 31 and -3 ≤ r ≤ 4, i.e. a normalized binary floating-point encoding with 1 sign bit, 3-bit exponent,
and 4-bit fractional part, encoded in the "imm8" field.
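The set of values that satisfy the formula for <const> is small enough to enumerate. The sketch below does so directly from the stated ranges of n and r; the function name `fp8_immediates` is hypothetical, and the mapping from each value to its "imm8" bit pattern is not shown because it is not given in this description.

```python
def fp8_immediates():
    """All values of the form +/- n/16 * 2**r with 16 <= n <= 31 and -3 <= r <= 4."""
    values = set()
    for sign in (1.0, -1.0):
        for n in range(16, 32):
            for r in range(-3, 5):
                values.add(sign * n / 16 * 2.0 ** r)
    return values

values = fp8_immediates()
smallest = min(abs(v) for v in values)
largest = max(values)
```

The enumeration yields 256 distinct values with magnitudes from 0.125 to 31.0. Zero is not among them, so a value such as 0.0 or 0.1 cannot be written as the immediate of this instruction.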

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = imm;

Z[d] = result;

Operational information


FCVT

Floating-point convert precision (predicated)

Convert the size and precision of each active floating-point element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
Because the input and result types differ in size, the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type, the upper bits of each source element are ignored.
When the result is the smaller type, the results are zero-extended to fill each destination element.
It has encodings from 6 classes: Half-precision to single-precision, Half-precision to double-precision,
Single-precision to half-precision, Single-precision to double-precision, Double-precision to half-precision and
Double-precision to single-precision
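The unpacked layout described above can be illustrated for a single active 32-bit element. This is a sketch using Python's `struct` module, with each element given as an unsigned integer bit pattern; the helper names are hypothetical. It does not model FPCR rounding modes or exceptions, and `struct` raises `OverflowError` for values too large for half precision, so the narrowing helper is only meaningful for values that are representable in half precision.

```python
import struct

def fcvt_h_to_s(container32):
    """Widen: read the half-precision value from the low 16 bits of a 32-bit element."""
    half_bits = container32 & 0xFFFF          # upper 16 bits of the source are ignored
    value = struct.unpack("<e", struct.pack("<H", half_bits))[0]
    return struct.unpack("<I", struct.pack("<f", value))[0]

def fcvt_s_to_h(container32):
    """Narrow: convert a single-precision element and zero-extend the 16-bit result."""
    value = struct.unpack("<f", struct.pack("<I", container32))[0]
    half_bits = struct.unpack("<H", struct.pack("<e", value))[0]
    return half_bits                          # bits 31:16 of the result are zero

widened = fcvt_h_to_s(0xDEAD3C00)    # low half 0x3C00 is 1.0 in half precision
narrowed = fcvt_s_to_h(0x3F800000)   # 1.0 in single precision
```

The widening example shows that arbitrary data in the upper half of the source element (0xDEAD here) has no effect on the result.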

Half-precision to single-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 1 1 0 1 Pg Zn Zd

FCVT <Zd>.S, <Pg>/M, <Zn>.H

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 16;
integer d_esize = 32;

Half-precision to double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 1 1 0 1 Pg Zn Zd

FCVT <Zd>.D, <Pg>/M, <Zn>.H

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 16;
integer d_esize = 64;

Single-precision to half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 0 1 0 0 0 1 0 1 Pg Zn Zd

FCVT <Zd>.H, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 16;

Single-precision to double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 1 1 1 0 1 Pg Zn Zd

FCVT <Zd>.D, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 64;

Double-precision to half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 0 0 1 0 1 Pg Zn Zd

FCVT <Zd>.H, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 16;

Double-precision to single-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 0 1 0 1 0 1 0 1 Pg Zn Zd

FCVT <Zd>.S, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 32;

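Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.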

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        bits(d_esize) res = FPConvertSVE(element<s_esize-1:0>, FPCR[]);
        Elem[result, e, esize] = ZeroExtend(res);

Z[d] = result;
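
The unpacked element layout used by this operation can be modelled in a few lines. The following is a minimal Python sketch, not the architectural pseudocode: the names fcvt, bits_to_float and float_to_bits are hypothetical, vectors are plain lists of container-sized integers, and Python's struct conversions (round to nearest even) stand in for FPConvertSVE. FPCR rounding modes, exceptions and NaN processing are not modelled, and struct raises OverflowError where the hardware would return infinity.

```python
import struct

# struct format codes for IEEE 754 half, single and double precision
_FLOAT_FMT = {16: "<e", 32: "<f", 64: "<d"}
_UINT_FMT = {16: "<H", 32: "<I", 64: "<Q"}


def bits_to_float(bits, size):
    """Reinterpret a size-bit pattern as a floating-point value."""
    return struct.unpack(_FLOAT_FMT[size], struct.pack(_UINT_FMT[size], bits))[0]


def float_to_bits(value, size):
    """Encode a value in size-bit floating-point and return the bit pattern."""
    return struct.unpack(_UINT_FMT[size], struct.pack(_FLOAT_FMT[size], value))[0]


def fcvt(zd, pg, zn, s_esize, d_esize):
    """Model of the merging, predicated convert on lists of container values.

    Each container is max(s_esize, d_esize) bits wide, as in the encodings above.
    """
    src_mask = (1 << s_esize) - 1
    result = list(zd)                      # inactive elements keep Zd's value
    for e, active in enumerate(pg):
        if active:
            # Upper bits of the source container are ignored.
            value = bits_to_float(zn[e] & src_mask, s_esize)
            # A Python int has no stray upper bits, so this is zero-extended.
            result[e] = float_to_bits(value, d_esize)
    return result
```

For example, converting single-precision 1.5 (0x3FC00000) to half precision leaves 0x3E00 in the low 16 bits of a 32-bit container with the upper 16 bits clear.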

Operational information

FCVTZS

Floating-point convert to signed integer, rounding toward zero (predicated)

Convert to the signed integer nearer to zero from each active floating-point element of the source vector, and place the
results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
If the input and result types have a different size the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type the upper bits of each source element are ignored.
When the result is the smaller type the results are sign-extended to fill each destination element.
It has encodings from 7 classes: Half-precision to 16-bit, Half-precision to 32-bit, Half-precision to 64-bit,
Single-precision to 32-bit, Single-precision to 64-bit, Double-precision to 32-bit and Double-precision to 64-bit.

Half-precision to 16-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 0 1 0 1 0 1 Pg Zn Zd
int_U

FCVTZS <Zd>.H, <Pg>/M, <Zn>.H

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 16;
integer d_esize = 16;
boolean unsigned = FALSE;
FPRounding rounding = FPRounding_ZERO;

Half-precision to 32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 0 0 1 0 1 Pg Zn Zd
int_U

FCVTZS <Zd>.S, <Pg>/M, <Zn>.H

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 16;
integer d_esize = 32;
boolean unsigned = FALSE;
FPRounding rounding = FPRounding_ZERO;

Half-precision to 64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 1 0 1 0 1 Pg Zn Zd
int_U

FCVTZS <Zd>.D, <Pg>/M, <Zn>.H

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 16;
integer d_esize = 64;
boolean unsigned = FALSE;
FPRounding rounding = FPRounding_ZERO;

Single-precision to 32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 1 1 0 0 1 0 1 Pg Zn Zd
int_U

FCVTZS <Zd>.S, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 32;
boolean unsigned = FALSE;
FPRounding rounding = FPRounding_ZERO;

Single-precision to 64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 0 1 0 1 Pg Zn Zd
int_U

FCVTZS <Zd>.D, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 64;
boolean unsigned = FALSE;
FPRounding rounding = FPRounding_ZERO;

Double-precision to 32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 0 0 0 1 0 1 Pg Zn Zd
int_U

FCVTZS <Zd>.S, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 32;
boolean unsigned = FALSE;
FPRounding rounding = FPRounding_ZERO;

Double-precision to 64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 1 0 1 0 1 Pg Zn Zd
int_U

FCVTZS <Zd>.D, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 64;
boolean unsigned = FALSE;
FPRounding rounding = FPRounding_ZERO;

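Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.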

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        bits(d_esize) res = FPToFixed(element<s_esize-1:0>, 0, unsigned, FPCR[], rounding);
        Elem[result, e, esize] = Extend(res, unsigned);

Z[d] = result;
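
The per-element behaviour of a round-toward-zero conversion followed by Extend can be sketched as below. This is an illustrative Python model under stated assumptions, not the definition of FPToFixed: the function names are hypothetical, the input is already a Python float, a NaN input is taken to produce 0, out-of-range inputs are taken to saturate, and floating-point exception flags are not modelled.

```python
import math


def fp_to_int_rz(value, width, unsigned):
    """Truncate toward zero, then saturate to the width-bit integer range."""
    if math.isnan(value):
        return 0                                   # assumed NaN result
    if unsigned:
        lo, hi = 0, (1 << width) - 1
    else:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if math.isinf(value):
        return hi if value > 0 else lo
    return max(lo, min(hi, math.trunc(value)))


def to_container(value, d_esize, esize, unsigned):
    """Place a d_esize-bit result in an esize-bit container (sign- or zero-extended)."""
    res = value & ((1 << d_esize) - 1)
    if not unsigned and res >> (d_esize - 1):
        # Negative signed result: fill the upper container bits with ones.
        res |= ((1 << esize) - 1) ^ ((1 << d_esize) - 1)
    return res
```

With unsigned set to FALSE, as in every encoding of this instruction, a double-precision -2.7 converted to 32 bits gives -2, stored in the 64-bit container as 0xFFFFFFFFFFFFFFFE.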

Operational information

FCVTZU

Floating-point convert to unsigned integer, rounding toward zero (predicated)

Convert to the unsigned integer nearer to zero from each active floating-point element of the source vector, and place
the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
If the input and result types have a different size the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type the upper bits of each source element are ignored.
When the result is the smaller type the results are zero-extended to fill each destination element.
It has encodings from 7 classes: Half-precision to 16-bit, Half-precision to 32-bit, Half-precision to 64-bit,
Single-precision to 32-bit, Single-precision to 64-bit, Double-precision to 32-bit and Double-precision to 64-bit.

Half-precision to 16-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 0 1 1 1 0 1 Pg Zn Zd
int_U

FCVTZU <Zd>.H, <Pg>/M, <Zn>.H

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 16;
integer d_esize = 16;
boolean unsigned = TRUE;
FPRounding rounding = FPRounding_ZERO;

Half-precision to 32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 0 1 1 0 1 Pg Zn Zd
int_U

FCVTZU <Zd>.S, <Pg>/M, <Zn>.H

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 16;
integer d_esize = 32;
boolean unsigned = TRUE;
FPRounding rounding = FPRounding_ZERO;

Half-precision to 64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 1 1 1 1 1 0 1 Pg Zn Zd
int_U

FCVTZU <Zd>.D, <Pg>/M, <Zn>.H

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 16;
integer d_esize = 64;
boolean unsigned = TRUE;
FPRounding rounding = FPRounding_ZERO;

Single-precision to 32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 1 1 0 1 1 0 1 Pg Zn Zd
int_U

FCVTZU <Zd>.S, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 32;
boolean unsigned = TRUE;
FPRounding rounding = FPRounding_ZERO;

Single-precision to 64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 1 Pg Zn Zd
int_U

FCVTZU <Zd>.D, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 64;
boolean unsigned = TRUE;
FPRounding rounding = FPRounding_ZERO;

Double-precision to 32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 0 0 1 1 0 1 Pg Zn Zd
int_U

FCVTZU <Zd>.S, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 32;
boolean unsigned = TRUE;
FPRounding rounding = FPRounding_ZERO;

Double-precision to 64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 1 1 1 1 1 0 1 Pg Zn Zd
int_U

FCVTZU <Zd>.D, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 64;
boolean unsigned = TRUE;
FPRounding rounding = FPRounding_ZERO;

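Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.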

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
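bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        bits(d_esize) res = FPToFixed(element<s_esize-1:0>, 0, unsigned, FPCR[], rounding);
        Elem[result, e, esize] = Extend(res, unsigned);

Z[d] = result;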

Operational information

FDIV

Floating-point divide by vector (predicated)

Divide active floating-point elements of the first source vector by corresponding floating-point elements of the second
source vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 1 1 0 0 Pg Zm Zdn

FDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPDiv(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FDIVR

Floating-point reversed divide by vector (predicated)

Reversed divide active floating-point elements of the second source vector by corresponding floating-point elements of
the first source vector and destructively place the quotient in the corresponding elements of the first source vector.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 0 1 0 0 Pg Zm Zdn

FDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPDiv(element2, element1, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;
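
FDIV and FDIVR differ only in which operand is the dividend, while both write the result back to Zdn. The following minimal Python sketch illustrates that, assuming vectors are lists of Python floats and the predicate is a list of booleans; predicated_div and its reversed_operands parameter are hypothetical names. Python raises ZeroDivisionError where FPDiv would return an infinity or NaN, so division by zero is outside the scope of this model.

```python
def predicated_div(zdn, pg, zm, reversed_operands=False):
    """Merging predicated divide: FDIV when reversed_operands is False, FDIVR when True."""
    result = []
    for e, active in enumerate(pg):
        if active:
            if reversed_operands:
                dividend, divisor = zm[e], zdn[e]   # FDIVR: Zm / Zdn
            else:
                dividend, divisor = zdn[e], zm[e]   # FDIV:  Zdn / Zm
            result.append(dividend / divisor)
        else:
            result.append(zdn[e])                   # inactive: Zdn unchanged
    return result
```

FDIVR is useful when the value to be overwritten is the divisor, since the destination must be the first source register in both forms.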

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FDUP

Broadcast 8-bit floating-point immediate to vector elements (unpredicated)

Unconditionally broadcast the floating-point immediate into each element of the destination vector. This instruction is
unpredicated.
This instruction is used by the alias FMOV (immediate, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 1 1 1 0 imm8 Zd

FDUP <Zd>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer d = UInt(Zd);
bits(esize) imm = VFPExpandImm(imm8);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<const> Is a floating-point immediate value expressible as ±n÷16×2^r, where n and r are integers such that 16
≤ n ≤ 31 and -3 ≤ r ≤ 4, i.e. a normalized binary floating-point encoding with 1 sign bit, 3-bit exponent,
and 4-bit fractional part, encoded in the "imm8" field.
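
The ±n÷16×2^r form can be made concrete by decoding an imm8 value by hand. The sketch below is an illustration in Python, not the VFPExpandImm pseudocode: decode_fp_imm8 is a hypothetical name, and the mapping of the 3-bit exponent field to r (bit 6 inverted, giving r in the range -3 to 4) is stated here as an assumption about how the 8-bit encoding is expanded.

```python
def decode_fp_imm8(imm8):
    """Return the value of an 8-bit floating-point immediate as a Python float."""
    sign = -1.0 if imm8 & 0x80 else 1.0          # bit 7: sign
    # bits 6:4: exponent field; inverting bit 6 and subtracting 3 gives r in [-3, 4]
    r = (((imm8 >> 4) & 0x7) ^ 0x4) - 3
    n = 16 + (imm8 & 0xF)                        # bits 3:0: fraction, so 16 <= n <= 31
    return sign * n / 16 * 2.0 ** r
```

Under this mapping 0x70 decodes to 1.0 and 0x78 to 1.5, the smallest magnitude is 0.125 and the largest is 31.0. Zero is not representable.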

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) result;

for e = 0 to elements-1
    Elem[result, e, esize] = imm;

Z[d] = result;

FEXPA

Floating-point exponential accelerator

The FEXPA instruction accelerates the polynomial series calculation of the EXP(X) function.
The double-precision variant copies the low 52 bits of an entry from a hard-wired table of 64-bit coefficients, indexed
by the low 6 bits of each element of the source vector, and prepends to that the next 11 bits of the source element
(src<16:6>), setting the sign bit to zero.
The single-precision variant copies the low 23 bits of an entry from a hard-wired table of 32-bit coefficients, indexed by
the low 6 bits of each element of the source vector, and prepends to that the next 8 bits of the source element
(src<13:6>), setting the sign bit to zero.
The half-precision variant copies the low 10 bits of an entry from a hard-wired table of 16-bit coefficients, indexed by the
low 5 bits of each element of the source vector, and prepends to that the next 5 bits of the source element (src<9:5>),
setting the sign bit to zero.
A coefficient table entry with index m holds the floating-point value 2^(m/64), or for the half-precision variant 2^(m/32).
This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 0 0 0 0 0 1 0 1 1 1 0 Zn Zd

FEXPA <Zd>.<T>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand = Z[n];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPExpA(element);

Z[d] = result;
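
The bit assembly described for the double-precision variant can be sketched as follows. This is a rough Python model rather than the FPExpA pseudocode: fexpa_double and bits_to_double are hypothetical names, and the coefficient is computed as 2.0 ** (m / 64) at run time, which is an assumption that may differ in the last bit from the hard-wired table.

```python
import struct


def bits_to_double(bits):
    """Reinterpret a 64-bit pattern as a double-precision value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def fexpa_double(src):
    """Build the 64-bit result for one source element."""
    m = src & 0x3F                        # low 6 bits index the coefficient table
    exponent = (src >> 6) & 0x7FF         # src<16:6> becomes the exponent field
    # Stand-in for the table entry: the encoding of 2^(m/64), which lies in [1, 2).
    coeff = struct.unpack("<Q", struct.pack("<d", 2.0 ** (m / 64)))[0]
    fraction = coeff & ((1 << 52) - 1)    # keep only the low 52 bits
    return (exponent << 52) | fraction    # bit 63 (sign) is always zero
```

A source element of (1023 << 6) | 32 therefore yields approximately 2^(32/64), the square root of 2, because 1023 is the biased exponent of 1.0.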

FMAD

Floating-point fused multiply-add vectors (predicated), writing multiplicand [Zdn = Za + Zdn * Zm]

Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of
the third (addend) vector without intermediate rounding. Destructively place the results in the destination and first
source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 0 0 Pg Zm Zdn
N op

FMAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
integer a = UInt(Za);
boolean op1_neg = FALSE;
boolean op3_neg = FALSE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Za> Is the name of the third source scalable vector register, encoded in the "Za" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = if AnyActiveElement(mask, esize) then Z[a] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        bits(esize) element3 = Elem[operand3, e, esize];

        if op1_neg then element1 = FPNeg(element1);
        if op3_neg then element3 = FPNeg(element3);
        Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
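
The effect of omitting the intermediate rounding can be seen by comparing a fused result with a separately rounded multiply and add. The Python sketch below is illustrative only: fused_mul_add and fmad are hypothetical names, exact rational arithmetic from the fractions module stands in for FPMulAdd, and only finite double-precision inputs with round-to-nearest are considered (the sign of zero, NaNs and infinities are not modelled).

```python
from fractions import Fraction


def fused_mul_add(addend, op1, op2):
    """Compute addend + op1 * op2 exactly, then round once to double precision."""
    return float(Fraction(addend) + Fraction(op1) * Fraction(op2))


def fmad(zdn, pg, zm, za):
    """Merging predicated model of Zdn = Za + Zdn * Zm on lists of floats."""
    return [
        fused_mul_add(za[e], zdn[e], zm[e]) if pg[e] else zdn[e]
        for e in range(len(zdn))
    ]
```

With x = 1 + 2^-27 and an addend of -(1 + 2^-26), the fused form returns 2^-54, whereas evaluating x * x + addend with two roundings returns 0.0 because the low-order term of the product is lost before the addition.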

Operational information

FMAX (immediate)

Floating-point maximum with immediate (predicated)

Determine the maximum of an immediate and each active floating-point element of the source vector, and
destructively place the results in the corresponding elements of the source vector. The immediate may take the value
+0.0 or +1.0 only. If the element value is NaN then the result is NaN. Inactive elements in the destination vector
register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 1 0 1 0 0 Pg 0 0 0 0 i1 Zdn

FMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
bits(esize) imm = if i1 == '0' then Zeros() else FPOne('0');

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<const> Is the floating-point immediate value, encoded in “i1”:

i1 <const>
0 #0.0
1 #1.0

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPMax(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

FMAX (vectors)

Floating-point maximum (predicated)

Determine the maximum of active floating-point elements of the second source vector and corresponding floating-point
elements of the first source vector and destructively place the results in the corresponding elements of the first source
vector. If either element value is NaN then the result is NaN. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 0 1 0 0 Pg Zm Zdn

FMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMax(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FMAXNM (immediate)

Floating-point maximum number with immediate (predicated)

Determine the maximum number value of an immediate and each active floating-point element of the source vector,
and destructively place the results in the corresponding elements of the source vector. The immediate may take the
value +0.0 or +1.0 only. If the element value is a quiet NaN, then the result is the immediate. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 0 0 1 0 0 Pg 0 0 0 0 i1 Zdn

FMAXNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
bits(esize) imm = if i1 == '0' then Zeros() else FPOne('0');

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<const> Is the floating-point immediate value, encoded in “i1”:

i1 <const>
0 #0.0
1 #1.0

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPMaxNum(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

FMAXNM (vectors)

Floating-point maximum number (predicated)

Determine the maximum number value of active floating-point elements of the second source vector and
corresponding floating-point elements of the first source vector and destructively place the results in the
corresponding elements of the first source vector. If one element value is numeric and the other is a quiet NaN, then the result is the numeric value.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 0 1 0 0 Pg Zm Zdn

FMAXNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMaxNum(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;
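
The practical difference between the FPMax and FPMaxNum element operations is their treatment of NaN operands. The following Python sketch contrasts the two under simplifying assumptions: fp_max and fp_maxnum are hypothetical names, every Python NaN is treated as a quiet NaN (signalling NaNs, which FPMaxNum handles differently, cannot be distinguished here), and FPCR controls are ignored.

```python
import math


def fp_max(a, b):
    """Maximum in which any NaN operand makes the result NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b == 0.0:
        # +0.0 is treated as greater than -0.0.
        return a if math.copysign(1.0, a) > 0 else b
    return a if a > b else b


def fp_maxnum(a, b):
    """Maximum number: a single quiet NaN operand is ignored in favour of the number."""
    if math.isnan(a) and not math.isnan(b):
        return b
    if math.isnan(b) and not math.isnan(a):
        return a
    return fp_max(a, b)
```

Only when both operands are NaN does fp_maxnum return NaN, which is why the maximum-number form suits data in which NaN marks a missing value.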

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FMAXNMV

Floating-point maximum number recursive reduction to scalar

Floating-point maximum number horizontally over all lanes of a vector using a recursive pairwise reduction, and place
the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as the default
NaN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 0 0 0 1 Pg Zn Vd

FMAXNMV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 D

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

CheckSVEEnabled();
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) identity = FPDefaultNaN();

V[d] = ReducePredicated(ReduceOp_FMAXNUM, operand, mask, identity);

FMAXV

Floating-point maximum recursive reduction to scalar

Floating-point maximum horizontally over all lanes of a vector using a recursive pairwise reduction, and place the
result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as -Infinity.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 0 0 0 1 Pg Zn Vd

FMAXV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 D

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

CheckSVEEnabled();
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) identity = FPInfinity('1');

V[d] = ReducePredicated(ReduceOp_FMAX, operand, mask, identity);
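
A predicated pairwise reduction of this kind can be sketched as below. It is a simplified Python model, not the ReducePredicated pseudocode: the function names are hypothetical, the vector is a non-empty list of Python floats, inactive lanes are replaced by the identity value (-Infinity for FMAXV), and padding the lane count to a power of two with the identity is an assumption about how the pairwise tree is formed.

```python
import math


def fp_max(a, b):
    """Maximum that propagates NaN (the sign of zero is not modelled)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return a if a > b else b


def reduce_pairwise(op, values):
    """Recursively combine the lower and upper halves of a power-of-two list."""
    if len(values) == 1:
        return values[0]
    half = len(values) // 2
    return op(reduce_pairwise(op, values[:half]), reduce_pairwise(op, values[half:]))


def fmaxv(zn, pg):
    """Model of the FMAXV reduction over the active lanes of zn."""
    identity = -math.inf
    lanes = [x if active else identity for x, active in zip(zn, pg)]
    while len(lanes) & (len(lanes) - 1):   # pad to a power of two (assumption)
        lanes.append(identity)
    return reduce_pairwise(fp_max, lanes)
```

Because -Infinity never exceeds an active value, inactive lanes cannot affect the result, and a vector with no active lanes reduces to -Infinity.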

FMIN (immediate)

Floating-point minimum with immediate (predicated)

Determine the minimum of an immediate and each active floating-point element of the source vector, and destructively
place the results in the corresponding elements of the source vector. The immediate may take the value +0.0 or +1.0
only. If the element value is NaN then the result is NaN. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 1 1 1 0 0 Pg 0 0 0 0 i1 Zdn

FMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
bits(esize) imm = if i1 == '0' then Zeros() else FPOne('0');

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<const> Is the floating-point immediate value, encoded in “i1”:

i1 <const>
0 #0.0
1 #1.0
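As an illustration of how the fields in the encoding diagram combine, the following Python sketch assembles the 32-bit instruction word for FMIN (immediate). The function name encode_fmin_imm and its argument names are hypothetical; the bit positions are read from the diagram above (fixed bits 31:24, size in 23:22, fixed bits 21:13, Pg in 12:10, zeros in 9:6, i1 in bit 5, Zdn in 4:0).

```python
def encode_fmin_imm(size: int, pg: int, i1: int, zdn: int) -> int:
    """Build the instruction word for FMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>."""
    if size not in (1, 2, 3):
        raise ValueError("size '00' is RESERVED; use 1 (H), 2 (S) or 3 (D)")
    if not 0 <= pg <= 7:
        raise ValueError("Pg must be P0-P7")
    if i1 not in (0, 1):
        raise ValueError("i1 selects #0.0 (0) or #1.0 (1)")
    if not 0 <= zdn <= 31:
        raise ValueError("Zdn must be Z0-Z31")

    word = 0b01100101 << 24    # bits 31:24, fixed
    word |= size << 22         # bits 23:22, element size
    word |= 0b011111100 << 13  # bits 21:13, fixed
    word |= pg << 10           # bits 12:10, governing predicate
    # bits 9:6 are zero
    word |= i1 << 5            # bit 5, immediate selector
    word |= zdn                # bits 4:0, source and destination register
    return word
```

For example, encode_fmin_imm(2, 0, 0, 0) corresponds to FMIN Z0.S, P0/M, Z0.S, #0.0.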

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPMin(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

FMIN (vectors)

Floating-point minimum (predicated)

Determine the minimum of active floating-point elements of the second source vector and corresponding floating-point
elements of the first source vector and destructively place the results in the corresponding elements of the first source
vector. If either element value is NaN then the result is NaN. Inactive elements in the destination
vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 1 1 0 0 Pg Zm Zdn

FMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMin(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FMINNM (immediate)

Floating-point minimum number with immediate (predicated)

Determine the minimum number value of an immediate and each active floating-point element of the source vector,
and destructively place the results in the corresponding elements of the source vector. The immediate may take the
value +0.0 or +1.0 only. If one element value is numeric and the other is a quiet NaN, then the result is the numeric
value. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 1 0 1 1 0 0 Pg 0 0 0 0 i1 Zdn

FMINNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
bits(esize) imm = if i1 == '0' then Zeros() else FPOne('0');

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<const> Is the floating-point immediate value, encoded in “i1”:

i1 <const>
0 #0.0
1 #1.0

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPMinNum(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;
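The difference between the NaN handling of FMIN and FMINNM can be illustrated with the following Python sketch. It is a simplified model under stated assumptions: fp_min, fp_min_num and predicated_min_imm are hypothetical names, elements are Python double-precision floats, every NaN is treated as a quiet NaN, and FPCR controls and the ordering of signed zeros are not modelled.

```python
import math


def fp_min(a: float, b: float) -> float:
    """FMIN-style minimum: a NaN in either operand gives NaN."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    return min(a, b)


def fp_min_num(a: float, b: float) -> float:
    """FMINNM-style minimum: a single quiet NaN operand is ignored."""
    if math.isnan(a):
        return b  # also returns NaN when both operands are NaN
    if math.isnan(b):
        return a
    return min(a, b)


def predicated_min_imm(zdn, pg, imm, op):
    """Apply op(element, imm) to active lanes; inactive lanes are unchanged."""
    return [op(x, imm) if p else x for x, p in zip(zdn, pg)]
```

Applied to the lanes [2.0, NaN, -3.0] with all lanes active and the immediate #1.0, the FMIN model keeps the NaN in the second lane while the FMINNM model replaces it with 1.0.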

Operational information

FMINNM (vectors)

Floating-point minimum number (predicated)

Determine the minimum number value of active floating-point elements of the second source vector and corresponding
floating-point elements of the first source vector and destructively place the results in the corresponding elements of
the first source vector. If one element value is numeric and the other is a quiet NaN, then the result is the numeric
value. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 1 1 0 0 Pg Zm Zdn

FMINNM <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMinNum(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FMINNMV

Floating-point minimum number recursive reduction to scalar

Floating-point minimum number horizontally over all lanes of a vector using a recursive pairwise reduction, and place
the result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as the default
NaN.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 1 0 0 1 Pg Zn Vd

FMINNMV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 D

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

CheckSVEEnabled();
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) identity = FPDefaultNaN();

V[d] = ReducePredicated(ReduceOp_FMINNUM, operand, mask, identity);
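The choice of the default NaN as the identity value can be illustrated with the following Python sketch: because a minimum-number operation ignores a single quiet NaN operand, NaN lanes do not affect the reduction. The names fp_min_num, reduce_pairwise and fminnmv are hypothetical, lanes are Python double-precision floats, all NaNs are treated as quiet, FPCR effects are ignored, and padding to a power-of-two lane count is an assumption.

```python
import math


def fp_min_num(a: float, b: float) -> float:
    """Minimum number: a single NaN operand is ignored."""
    if math.isnan(a):
        return b  # NaN only if both operands are NaN
    if math.isnan(b):
        return a
    return min(a, b)


def reduce_pairwise(op, values):
    """Reduce a power-of-two-length list by recursively combining halves."""
    if len(values) == 1:
        return values[0]
    half = len(values) // 2
    return op(reduce_pairwise(op, values[:half]),
              reduce_pairwise(op, values[half:]))


def fminnmv(lanes, active):
    """Model of FMINNMV: inactive lanes are treated as NaN."""
    n = 1
    while n < len(lanes):
        n *= 2  # assumed padding to the next power of two
    padded = [
        lanes[i] if i < len(lanes) and active[i] else math.nan
        for i in range(n)
    ]
    return reduce_pairwise(fp_min_num, padded)
```

In this model the result is NaN only when no active lane holds a number.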

FMINV

Floating-point minimum recursive reduction to scalar

Floating-point minimum horizontally over all lanes of a vector using a recursive pairwise reduction, and place the
result in the SIMD&FP scalar destination register. Inactive elements in the source vector are treated as +Infinity.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 1 0 0 1 Pg Zn Vd

FMINV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 RESERVED
01 H
10 S
11 D

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

CheckSVEEnabled();
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) identity = FPInfinity('0');

V[d] = ReducePredicated(ReduceOp_FMIN, operand, mask, identity);

FMLA (indexed)

Floating-point fused multiply-add by indexed elements (Zda = Zda + Zn * Zm[indexed])

Multiply all floating-point elements within each 128-bit segment of the first source vector by the specified element in
the corresponding second source vector segment. The products are then destructively added without intermediate
rounding to the corresponding elements of the addend and destination vector.
The elements within the second source vector are specified using an immediate index which selects the same element
position within each 128-bit vector segment. The index range is from 0 to one less than the number of elements per
128-bit segment, encoded in 1 to 3 bits depending on the size of the element. This instruction is unpredicated.
It has encodings from 3 classes: Half-precision, Single-precision and Double-precision.

Half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 i3h 1 i3l Zm 0 0 0 0 0 0 Zn Zda
op

FMLA <Zda>.H, <Zn>.H, <Zm>.H[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer index = UInt(i3h:i3l);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_neg = FALSE;
boolean op3_neg = FALSE;

Single-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> op

FMLA <Zda>.S, <Zn>.S, <Zm>.S[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer index = UInt(i2);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_neg = FALSE;
boolean op3_neg = FALSE;

Double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> op

FMLA <Zda>.D, <Zn>.D, <Zm>.D[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer index = UInt(i1);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_neg = FALSE;
boolean op3_neg = FALSE;

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> For the half-precision and single-precision variant: is the name of the second source scalable vector
register Z0-Z7, encoded in the "Zm" field.
For the double-precision variant: is the name of the second source scalable vector register Z0-Z15,
encoded in the "Zm" field.
<imm> For the half-precision variant: is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.
For the single-precision variant: is the immediate index, in the range 0 to 3, encoded in the "i2" field.
For the double-precision variant: is the immediate index, in the range 0 to 1, encoded in the "i1" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result = Z[da];

for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, s, esize];
    bits(esize) element3 = Elem[result, e, esize];
    if op1_neg then element1 = FPNeg(element1);
    if op3_neg then element3 = FPNeg(element3);
    Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);

Z[da] = result;
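The per-segment element selection and the single rounding of the fused operation can be sketched in Python as follows. This is an illustrative model only: fused_mul_add and fmla_indexed are hypothetical names, vectors are lists of Python floats, the esize argument is used only for the index arithmetic, rounding is always to double precision, inputs are assumed finite, and FPCR effects and the sign of a zero result are not modelled.

```python
from fractions import Fraction


def fused_mul_add(addend: float, a: float, b: float) -> float:
    """Compute addend + a*b exactly, then round once (finite inputs only)."""
    return float(Fraction(addend) + Fraction(a) * Fraction(b))


def fmla_indexed(zda, zn, zm, index, esize=64, op1_neg=False):
    """Model of the indexed multiply-add loop; op1_neg=True gives FMLS."""
    elts_per_segment = 128 // esize
    assert 0 <= index < elts_per_segment
    result = list(zda)
    for e in range(len(zda)):
        # The same element position is selected within each 128-bit segment.
        segment_base = e - (e % elts_per_segment)
        s = segment_base + index
        element1 = -zn[e] if op1_neg else zn[e]
        result[e] = fused_mul_add(result[e], element1, zm[s])
    return result
```

With eight 32-bit lanes and index 1, lanes 0 to 3 are multiplied by zm[1] and lanes 4 to 7 by zm[5]. The single rounding is observable: for a = 1 + 2^-30, fused_mul_add(-(1 + 2^-29), a, a) yields 2^-60, whereas rounding the product before the addition would yield 0.0.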

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FMLA (vectors)

Floating-point fused multiply-add vectors (predicated), writing addend [Zda = Zda + Zn * Zm]

Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of
the third source (addend) vector without intermediate rounding. Destructively place the results in the destination and
third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 0 0 Pg Zn Zda
N op

FMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_neg = FALSE;
boolean op3_neg = FALSE;

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

        if op1_neg then element1 = FPNeg(element1);
        if op3_neg then element3 = FPNeg(element3);
        Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = Elem[operand3, e, esize];

Z[da] = result;

Operational information

FMLS (indexed)

Floating-point fused multiply-subtract by indexed elements (Zda = Zda + -Zn * Zm[indexed])

Multiply all floating-point elements within each 128-bit segment of the first source vector by the specified element in
the corresponding second source vector segment. The products are then destructively subtracted without intermediate
rounding from the corresponding elements of the addend and destination vector.
The elements within the second source vector are specified using an immediate index which selects the same element
position within each 128-bit vector segment. The index range is from 0 to one less than the number of elements per
128-bit segment, encoded in 1 to 3 bits depending on the size of the element. This instruction is unpredicated.
It has encodings from 3 classes: Half-precision, Single-precision and Double-precision.

Half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 i3h 1 i3l Zm 0 0 0 0 0 1 Zn Zda
op

FMLS <Zda>.H, <Zn>.H, <Zm>.H[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer index = UInt(i3h:i3l);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_neg = TRUE;
boolean op3_neg = FALSE;

Single-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> op

FMLS <Zda>.S, <Zn>.S, <Zm>.S[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer index = UInt(i2);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_neg = TRUE;
boolean op3_neg = FALSE;

Double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> op

FMLS <Zda>.D, <Zn>.D, <Zm>.D[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer index = UInt(i1);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_neg = TRUE;
boolean op3_neg = FALSE;

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> For the half-precision and single-precision variant: is the name of the second source scalable vector
register Z0-Z7, encoded in the "Zm" field.
For the double-precision variant: is the name of the second source scalable vector register Z0-Z15,
encoded in the "Zm" field.
<imm> For the half-precision variant: is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.
For the single-precision variant: is the immediate index, in the range 0 to 3, encoded in the "i2" field.
For the double-precision variant: is the immediate index, in the range 0 to 1, encoded in the "i1" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
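bits(VL) result = Z[da];

for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, s, esize];
    bits(esize) element3 = Elem[result, e, esize];
    if op1_neg then element1 = FPNeg(element1);
    if op3_neg then element3 = FPNeg(element3);
    Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);

Z[da] = result;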

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FMLS (vectors)

Floating-point fused multiply-subtract vectors (predicated), writing addend [Zda = Zda + -Zn * Zm]

Multiply the corresponding active floating-point elements of the first and second source vectors and subtract from
elements of the third source (addend) vector without intermediate rounding. Destructively place the results in the
destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 0 1 Pg Zn Zda
N op

FMLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_neg = TRUE;
boolean op3_neg = FALSE;

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

        if op1_neg then element1 = FPNeg(element1);
        if op3_neg then element3 = FPNeg(element3);
        Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = Elem[operand3, e, esize];

Z[da] = result;

Operational information

FMMLA

Floating-point matrix multiply-accumulate

The floating-point matrix multiply-accumulate instruction supports single-precision and double-precision data types in
a 2×2 matrix contained in segments of 128 or 256 bits, respectively. It multiplies the 2×2 matrix in each segment of
the first source vector by the 2×2 matrix in the corresponding segment of the second source vector. The resulting 2×2
matrix product is then destructively added to the matrix accumulator held in the corresponding segment of the addend
and destination vector. This is equivalent to performing a 2-way dot product per destination element. This instruction
is unpredicated. The single-precision variant is vector length agnostic. The double-precision variant requires that the
current vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the
trailing bits are set to zero.
ID_AA64ZFR0_EL1.F32MM indicates whether the single-precision variant is implemented.
ID_AA64ZFR0_EL1.F64MM indicates whether the double-precision variant is implemented.
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element
(FEAT_F32MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 Zm 1 1 1 0 0 1 Zn Zda

FMMLA <Zda>.S, <Zn>.S, <Zm>.S

if !HaveSVEFP32MatMulExt() then UNDEFINED;

integer esize = 32;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

64-bit element
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 Zm 1 1 1 0 0 1 Zn Zda

FMMLA <Zda>.D, <Zn>.D, <Zm>.D

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer esize = 64;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

Operation

CheckSVEEnabled();
if VL < esize * 4 then UNDEFINED;
integer segments = VL DIV (4 * esize);
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result = Zeros();
bits(4*esize) op1, op2;
bits(4*esize) res, addend;

for s = 0 to segments-1
    op1 = Elem[operand1, s, 4*esize];
    op2 = Elem[operand2, s, 4*esize];
    addend = Elem[operand3, s, 4*esize];
    res = FPMatMulAdd(addend, op1, op2, esize, FPCR[]);
    Elem[result, s, 4*esize] = res;

Z[da] = result;
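The per-segment structure of the operation can be sketched in Python as follows. FPMatMulAdd() is defined outside this section, so the element layout used here is an assumption: each segment is taken to hold a 2×2 matrix in row-major order, and each destination element is taken to be the accumulator plus the 2-way dot product of a row of the first source matrix with a row of the second. The names fp_mat_mul_add and fmmla are hypothetical, elements are Python floats, and the rounding behavior and FPCR effects of the architectural operation are not modelled.

```python
def fp_mat_mul_add(addend, op1, op2):
    """Accumulate one 2x2 product; each argument is a 4-element list."""
    result = [0.0] * 4
    for i in range(2):
        for j in range(2):
            # Assumed layout: row i of op1 dotted with row j of op2.
            dot = op1[2 * i] * op2[2 * j] + op1[2 * i + 1] * op2[2 * j + 1]
            result[2 * i + j] = addend[2 * i + j] + dot
    return result


def fmmla(zda, zn, zm):
    """Model of FMMLA over whole vectors given as lists of elements."""
    assert len(zda) == len(zn) == len(zm)
    assert len(zda) >= 4, "vector must hold at least one 4-element segment"
    segments = len(zda) // 4
    # The result starts as zeros, so elements beyond the last whole
    # segment are written back as zero.
    result = [0.0] * len(zda)
    for s in range(segments):
        lo, hi = 4 * s, 4 * s + 4
        result[lo:hi] = fp_mat_mul_add(zda[lo:hi], zn[lo:hi], zm[lo:hi])
    return result
```

The initial zero-filled result mirrors the pseudocode above, in which result is initialized with Zeros() and only whole segments are written.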

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FMOV (immediate, predicated)

Move 8-bit floating-point immediate to vector elements (predicated)

Move a floating-point immediate into each active element in the destination vector. Inactive elements in the
destination vector register remain unmodified.

This is an alias of FCPY. This means:

• The encodings in this description are named to match the encodings of FCPY.
• The description of FCPY gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 1 1 0 imm8 Zd

FMOV <Zd>.<T>, <Pg>/M, #<const>

is equivalent to

FCPY <Zd>.<T>, <Pg>/M, #<const>

and is always the preferred disassembly.

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

The description of FCPY gives the operational pseudocode for this instruction.

Operational information

FMOV (immediate, unpredicated)

Move 8-bit floating-point immediate to vector elements (unpredicated)

Unconditionally broadcast the floating-point immediate into each element of the destination vector. This instruction is
unpredicated.

This is an alias of FDUP. This means:

• The encodings in this description are named to match the encodings of FDUP.
• The description of FDUP gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 1 1 1 0 imm8 Zd

FMOV <Zd>.<T>, #<const>

is equivalent to

FDUP <Zd>.<T>, #<const>

and is always the preferred disassembly.

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

The description of FDUP gives the operational pseudocode for this instruction.

FMOV (zero, predicated)

Move floating-point +0.0 to vector elements (predicated)

Move floating-point constant +0.0 to each active element in the destination vector. Inactive elements in the
destination vector register remain unmodified.

This is a pseudo-instruction of CPY (immediate, merging). This means:

• The encodings in this description are named to match the encodings of CPY (immediate, merging).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 1 0 0 0 0 0 0 0 0 0 Zd
M sh imm8

FMOV <Zd>.<T>, <Pg>/M, #0.0

is equivalent to

CPY <Zd>.<T>, <Pg>/M, #0

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.

Operation

The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.

Operational information

FMOV (zero, unpredicated)

Move floating-point +0.0 to vector elements (unpredicated)

Unconditionally broadcast the floating-point constant +0.0 into each element of the destination vector. This instruction
is unpredicated.

This is a pseudo-instruction of DUP (immediate). This means:

• The encodings in this description are named to match the encodings of DUP (immediate).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of DUP (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 Zd
sh imm8

FMOV <Zd>.<T>, #0.0

is equivalent to

DUP <Zd>.<T>, #0

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

The description of DUP (immediate) gives the operational pseudocode for this instruction.
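The zero forms of FMOV can be expressed through integer immediate moves because +0.0 is encoded as all-zero bits in the half-precision, single-precision and double-precision formats. The following Python sketch checks this with the standard struct module; the helper name fp_bits is hypothetical.

```python
import struct


def fp_bits(value: float, fmt: str) -> int:
    """Return the raw encoding of value for fmt 'e' (H), 'f' (S) or 'd' (D)."""
    int_fmt = {"e": "H", "f": "I", "d": "Q"}[fmt]
    return struct.unpack("<" + int_fmt, struct.pack("<" + fmt, value))[0]


# +0.0 is all-zero bits at every element size.
zero_encodings = [fp_bits(0.0, fmt) for fmt in "efd"]
```

By contrast, -0.0 has its sign bit set, so it cannot be produced by an integer immediate of #0.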

FMSB

Floating-point fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = Za + -Zdn * Zm]

Multiply the corresponding active floating-point elements of the first and second source vectors and subtract from
elements of the third (addend) vector without intermediate rounding. Destructively place the results in the destination
and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 0 1 Pg Zm Zdn
N op

FMSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
integer a = UInt(Za);
boolean op1_neg = TRUE;
boolean op3_neg = FALSE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

        if op1_neg then element1 = FPNeg(element1);
        if op3_neg then element3 = FPNeg(element3);
        Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
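The difference between writing the multiplicand (FMSB) and writing the addend (FMLS) can be illustrated with the following Python sketch of the formulas Zdn = Za + -Zdn * Zm and Zda = Zda + -Zn * Zm. It is a simplified model: the names fused, fmsb and fmls are hypothetical, vectors are lists of Python floats rounded to double precision, inputs are assumed finite, and FPCR effects and the sign of a zero result are not modelled.

```python
from fractions import Fraction


def fused(addend: float, a: float, b: float) -> float:
    """Compute addend + a*b exactly, then round once (finite inputs only)."""
    return float(Fraction(addend) + Fraction(a) * Fraction(b))


def fmsb(zdn, pg, zm, za):
    """Zdn = Za + -Zdn * Zm; inactive lanes keep the old multiplicand."""
    return [fused(a, -dn, m) if p else dn
            for dn, p, m, a in zip(zdn, pg, zm, za)]


def fmls(zda, pg, zn, zm):
    """Zda = Zda + -Zn * Zm; inactive lanes keep the old addend."""
    return [fused(da, -n, m) if p else da
            for da, p, n, m in zip(zda, pg, zn, zm)]
```

Both forms compute the same value in active lanes. They differ in which source register is overwritten, and therefore in which value an inactive lane retains.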

Operational information

FMUL (immediate)

Floating-point multiply by immediate (predicated)

Multiply by an immediate each active floating-point element of the source vector, and destructively place the results in
the corresponding elements of the source vector. The immediate may take the value +0.5 or +2.0 only. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 1 0 1 0 0 Pg 0 0 0 0 i1 Zdn

FMUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
bits(esize) imm = if i1 == '0' then FPPointFive('0') else FPTwo('0');

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<const> Is the floating-point immediate value, encoded in “i1”:

i1 <const>
0 #0.5
1 #2.0

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPMul(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;
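The immediate selection and merging predication can be sketched in Python as follows. This is an illustrative model: the name fmul_imm is hypothetical, elements are Python double-precision floats, and FPCR effects are not modelled.

```python
def fmul_imm(zdn, pg, i1):
    """Model of FMUL (immediate): i1 selects #0.5 (0) or #2.0 (1)."""
    if i1 not in (0, 1):
        raise ValueError("i1 is a single bit")
    imm = 0.5 if i1 == 0 else 2.0
    # Active lanes are scaled; inactive lanes keep their previous value.
    return [x * imm if p else x for x, p in zip(zdn, pg)]
```

For example, fmul_imm([3.0, 3.0, 5.0], [True, False, True], 0) halves the first and third lanes and leaves the second unchanged.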

Operational information

FMUL (indexed)

Floating-point multiply by indexed elements

Multiply all floating-point elements within each 128-bit segment of the first source vector by the specified element in
the corresponding second source vector segment. The results are placed in the corresponding elements of the
destination vector.
The elements within the second source vector are specified using an immediate index which selects the same element
position within each 128-bit vector segment. The index range is from 0 to one less than the number of elements per
128-bit segment, encoded in 1 to 3 bits depending on the size of the element. This instruction is unpredicated.
It has encodings from 3 classes: Half-precision, Single-precision and Double-precision.

Half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 i3h 1 i3l Zm 0 0 1 0 0 0 Zn Zd

FMUL <Zd>.H, <Zn>.H, <Zm>.H[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer index = UInt(i3h:i3l);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Single-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 0 1 i2 Zm 0 0 1 0 0 0 Zn Zd
Bits 23 and 22 are size<1> and size<0>.

FMUL <Zd>.S, <Zn>.S, <Zm>.S[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer index = UInt(i2);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 1 1 i1 Zm 0 0 1 0 0 0 Zn Zd
Bits 23 and 22 are size<1> and size<0>.

FMUL <Zd>.D, <Zn>.D, <Zm>.D[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer index = UInt(i1);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.

<Zm> For the half-precision and single-precision variant: is the name of the second source scalable vector
register Z0-Z7, encoded in the "Zm" field.
For the double-precision variant: is the name of the second source scalable vector register Z0-Z15,
encoded in the "Zm" field.
<imm> For the half-precision variant: is the immediate index, in the range 0 to 7, encoded in the "i3h:i3l" fields.
For the single-precision variant: is the immediate index, in the range 0 to 3, encoded in the "i2" field.
For the double-precision variant: is the immediate index, in the range 0 to 1, encoded in the "i1" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
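bits(VL) result;

for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, s, esize];
    Elem[result, e, esize] = FPMul(element1, element2, FPCR[]);
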
Z[d] = result;
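The per-segment element selection described for the indexed form can be sketched in Python as follows. This is an illustration under assumptions: vectors are modelled as lists of Python floats, and the name `fmul_indexed` and its parameters are invented for the example.

```python
def fmul_indexed(zn, zm, index, esize):
    """Multiply every element of zn by the indexed element of the
    matching 128-bit segment of zm."""
    elts_per_segment = 128 // esize
    if not 0 <= index < elts_per_segment:
        raise ValueError("index out of range for this element size")
    result = []
    for e, x in enumerate(zn):
        segment_base = e - (e % elts_per_segment)  # first element of e's segment
        result.append(x * zm[segment_base + index])
    return result

# A 256-bit vector of 64-bit elements has two segments of two elements each.
out = fmul_indexed([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], index=1, esize=64)
```

With index 1, the first segment is scaled by 20.0 and the second by 40.0.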

FMUL (vectors, predicated)

Floating-point multiply vectors (predicated)

Multiply active floating-point elements of the first source vector by corresponding floating-point elements of the
second source vector and destructively place the results in the corresponding elements of the first source vector.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 0 1 0 0 Pg Zm Zdn

FMUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMul(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FMUL (vectors, unpredicated)

Floating-point multiply vectors (unpredicated)

Multiply all elements of the first source vector by corresponding floating-point elements of the second source vector
and place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 1 0 Zn Zd

FMUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPMul(element1, element2, FPCR[]);

Z[d] = result;

FMULX

Floating-point multiply-extended vectors (predicated)

Multiply active floating-point elements of the first source vector by corresponding floating-point elements of the
second source vector except that ∞×0.0 gives 2.0 instead of NaN, and destructively place the results in the
corresponding elements of the first source vector. Inactive elements in the destination vector register remain
unmodified.
The instruction can be used with FRECPX to safely convert arbitrary elements in mathematical vector space to "unit
vectors" or "direction vectors" of length 1.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 0 1 0 1 0 0 Pg Zm Zdn

FMULX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPMulX(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;
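The one case where FMULX differs from an ordinary multiply can be illustrated with a scalar Python sketch. The function name `fmulx` is hypothetical. Giving the 2.0 result the exclusive-OR of the operand signs reflects the editor's understanding of FPMulX and is not stated in the description above, so treat that detail as an assumption.

```python
import math

def fmulx(a: float, b: float) -> float:
    """Multiply, except that infinity times zero gives 2.0 rather than NaN."""
    inf_times_zero = (math.isinf(a) and b == 0.0) or (a == 0.0 and math.isinf(b))
    if inf_times_zero:
        # Assumed: the sign of the 2.0 result is the XOR of the operand signs.
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(2.0, sign)
    return a * b

special = fmulx(float("inf"), 0.0)
ordinary = fmulx(3.0, 4.0)
```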

Operational information

FNEG

Floating-point negate (predicated)

Negate each active floating-point element of the source vector, and place the results in the corresponding elements of
the destination vector. This inverts the sign bit and cannot signal a floating-point exception. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 1 0 1 1 0 1 Pg Zn Zd

FNEG <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = FPNeg(element);

Z[d] = result;
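Because negation only inverts the sign bit, it can be modelled as a single exclusive-OR on the raw element bits. The sketch below assumes 32-bit elements; the helper names `fneg32`, `f32_bits` and `bits_f32` are invented for the example.

```python
import struct

def fneg32(bits: int) -> int:
    """Flip the sign bit of a 32-bit element; all other bits are untouched."""
    return bits ^ 0x80000000

def f32_bits(x: float) -> int:
    """Raw IEEE 754 single-precision bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Float value of a raw single-precision bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

negated = bits_f32(fneg32(f32_bits(1.5)))
nan_negated = fneg32(0x7FC00000)  # a quiet NaN keeps its payload
```

Operating on the bit pattern rather than the value shows why no floating-point exception can arise: NaNs are not inspected, only their sign bit changes.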

Operational information

FNMAD

Floating-point negated fused multiply-add vectors (predicated), writing multiplicand [Zdn = -Za + -Zdn * Zm]

Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of
the third (addend) vector without intermediate rounding. Destructively place the negated results in the destination
and first source (multiplicand) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 1 0 Pg Zm Zdn
Bit 14 is N and bit 13 is op.

FNMAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

if op1_neg then element1 = FPNeg(element1);

if op3_neg then element3 = FPNeg(element3);
Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
else
Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

FNMLA

Floating-point negated fused multiply-add vectors (predicated), writing addend [Zda = -Zda + -Zn * Zm]

Multiply the corresponding active floating-point elements of the first and second source vectors and add to elements of
the third source (addend) vector without intermediate rounding. Destructively place the negated results in the
destination and third source (addend) vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 1 0 Pg Zn Zda
Bit 14 is N and bit 13 is op.

FNMLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

if op1_neg then element1 = FPNeg(element1);

if op3_neg then element3 = FPNeg(element3);
Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
else
Elem[result, e, esize] = Elem[operand3, e, esize];

Z[da] = result;

Operational information

FNMLS

Floating-point negated fused multiply-subtract vectors (predicated), writing addend [Zda = -Zda + Zn * Zm]

Multiply the corresponding active floating-point elements of the first and second source vectors and subtract from
elements of the third source (addend) vector without intermediate rounding. Destructively place the negated results in
the destination and third source (addend) vector. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Zm 0 1 1 Pg Zn Zda
Bit 14 is N and bit 13 is op.

FNMLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

if op1_neg then element1 = FPNeg(element1);

if op3_neg then element3 = FPNeg(element3);
Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
else
Elem[result, e, esize] = Elem[operand3, e, esize];

Z[da] = result;

Operational information

FNMSB

Floating-point negated fused multiply-subtract vectors (predicated), writing multiplicand [Zdn = -Za + Zdn * Zm]

Multiply the corresponding active floating-point elements of the first and second source vectors and subtract from
elements of the third (addend) vector without intermediate rounding. Destructively place the negated results in the
destination and first source (multiplicand) vector. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 1 Za 1 1 1 Pg Zm Zdn
Bit 14 is N and bit 13 is op.

FNMSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

Operation

if op1_neg then element1 = FPNeg(element1);

if op3_neg then element3 = FPNeg(element3);
Elem[result, e, esize] = FPMulAdd(element3, element1, element2, FPCR[]);
else
Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
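The four negated fused forms (FNMAD, FNMLA, FNMLS and FNMSB) differ only in which operand is overwritten and whether the multiplicand is negated, as their bracketed formulas state. The scalar Python sketch below models them for finite, non-zero results only. It assumes a helper `fused` (a name invented here) that evaluates the sum exactly with `fractions.Fraction` and rounds once, which stands in for "without intermediate rounding"; NaN, infinity and signed-zero handling are deliberately left out.

```python
from fractions import Fraction

def fused(addend: float, x: float, y: float) -> float:
    """addend + x*y evaluated exactly, then rounded once (finite inputs only)."""
    return float(Fraction(addend) + Fraction(x) * Fraction(y))

def fnmad(zdn, zm, za):  # Zdn = -Za + -Zdn * Zm
    return fused(-za, -zdn, zm)

def fnmsb(zdn, zm, za):  # Zdn = -Za + Zdn * Zm
    return fused(-za, zdn, zm)

def fnmla(zda, zn, zm):  # Zda = -Zda + -Zn * Zm
    return fused(-zda, -zn, zm)

def fnmls(zda, zn, zm):  # Zda = -Zda + Zn * Zm
    return fused(-zda, zn, zm)

# A case where single rounding matters: x*x = 1 + 2**-29 + 2**-60 exactly.
x = 1.0 + 2.0 ** -30
fused_result = fnmls(1.0 + 2.0 ** -29, x, x)
unfused_result = -(1.0 + 2.0 ** -29) + x * x
```

In the last two lines the unfused expression loses the 2**-60 term when the product is rounded, while the fused form keeps it.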

Operational information

FRECPE

Floating-point reciprocal estimate (unpredicated)

Find the approximate reciprocal of each floating-point element of the source vector, and place the results in the
corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 1 0 0 0 1 1 0 0 Zn Zd

FRECPE <Zd>.<T>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand = Z[n];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRecipEstimate(element, FPCR[]);

Z[d] = result;

FRECPS

Floating-point reciprocal step (unpredicated)

Multiply corresponding floating-point elements of the first and second source vectors, subtract the products from 2.0
without intermediate rounding and place the results in the corresponding elements of the destination vector. This
instruction is unpredicated.
This instruction can be used to perform a single Newton-Raphson iteration for calculating the reciprocal of a vector of
floating-point values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 1 1 0 Zn Zd

FRECPS <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPRecipStepFused(element1, element2);

Z[d] = result;
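The Newton-Raphson use of the reciprocal step can be sketched in scalar Python. The names `frecps` and `reciprocal` are hypothetical, the starting estimate is supplied by hand rather than by FRECPE, and the step is computed with ordinary (unfused) Python arithmetic, so this only approximates the instruction's rounding behaviour.

```python
def frecps(a: float, b: float) -> float:
    """Reciprocal step: 2.0 - a*b (unfused in this sketch)."""
    return 2.0 - a * b

def reciprocal(d: float, estimate: float, steps: int = 4) -> float:
    """Refine an estimate of 1/d; each iteration is x = x * (2 - d*x)."""
    x = estimate
    for _ in range(steps):
        x = x * frecps(d, x)
    return x

approx = reciprocal(3.0, 0.3)  # 0.3 is a deliberately crude first guess
```

Each iteration roughly squares the relative error, which is why a low-precision estimate followed by a few steps is usually sufficient.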

FRECPX

Floating-point reciprocal exponent (predicated)

Invert the exponent and zero the fractional part of each active floating-point element of the source vector, and place
the results in the corresponding elements of the destination vector. Inactive elements in the destination vector register
remain unmodified.
The result of this instruction can be used with FMULX to convert arbitrary elements in mathematical vector space to
"unit vectors" or "direction vectors" of length 1.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 0 1 0 1 Pg Zn Zd

FRECPX <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = FPRecpX(element, FPCR[]);

Z[d] = result;

Operational information

FRINT<r>

Floating-point round to integral value (predicated)

Round each active floating-point element of the source vector to an integral floating-point value using the specified
rounding option, and place the results in the corresponding elements of the destination vector. Inactive elements in
the destination vector register remain unmodified.
The <r> symbol specifies one of the following rounding options: N (to nearest, with ties to even), A (to nearest, with
ties away from zero), M (toward minus Infinity), P (toward plus Infinity), Z (toward zero), I (current FPCR rounding
mode), or X (current FPCR rounding mode, signalling inexact).
It has encodings from 7 classes: Current mode, Current mode signalling inexact, Nearest with ties to away, Nearest
with ties to even, Toward zero, Toward minus infinity and Toward plus infinity.

Current mode

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 1 1 0 1 Pg Zn Zd

FRINTI <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean exact = FALSE;
FPRounding rounding = FPRoundingMode(FPCR[]);

Current mode signalling inexact

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 1 0 1 0 1 Pg Zn Zd

FRINTX <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean exact = TRUE;
FPRounding rounding = FPRoundingMode(FPCR[]);

Nearest with ties to away

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 1 0 0 1 0 1 Pg Zn Zd

FRINTA <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean exact = FALSE;
FPRounding rounding = FPRounding_TIEAWAY;

Nearest with ties to even

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 0 1 0 1 Pg Zn Zd

FRINTN <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean exact = FALSE;
FPRounding rounding = FPRounding_TIEEVEN;

Toward zero

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 1 1 0 1 Pg Zn Zd

FRINTZ <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean exact = FALSE;
FPRounding rounding = FPRounding_ZERO;

Toward minus infinity

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 0 1 0 1 Pg Zn Zd

FRINTM <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean exact = FALSE;
FPRounding rounding = FPRounding_NEGINF;

Toward plus infinity

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 1 1 0 1 Pg Zn Zd

FRINTP <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean exact = FALSE;
FPRounding rounding = FPRounding_POSINF;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = FPRoundInt(element, FPCR[], rounding, exact);

Z[d] = result;
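The five explicit rounding options can be compared with a scalar Python sketch. The function name `frint` is hypothetical. The I and X options are omitted because they depend on the current FPCR rounding mode, and exception signalling is not modelled.

```python
import math

def frint(x: float, mode: str) -> float:
    """Round x to an integral float; mode is one of N, A, M, P, Z."""
    if math.isnan(x) or math.isinf(x):
        return x
    if mode == "N":                  # to nearest, ties to even
        r = round(x)
    elif mode == "A":                # to nearest, ties away from zero
        r = math.trunc(x)
        if abs(x - r) >= 0.5:
            r += 1 if x > 0 else -1
    elif mode == "M":                # toward minus infinity
        r = math.floor(x)
    elif mode == "P":                # toward plus infinity
        r = math.ceil(x)
    elif mode == "Z":                # toward zero
        r = math.trunc(x)
    else:
        raise ValueError("unsupported rounding option: " + mode)
    return math.copysign(float(r), x)  # keep the sign, including for zero

rounded = {mode: frint(-2.5, mode) for mode in "NAMPZ"}
```

The tie value -2.5 separates the options: N and A disagree on the tie, while M, P and Z ignore it.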

Operational information

FRSQRTE

Floating-point reciprocal square root estimate (unpredicated)

Find the approximate reciprocal square root of each floating-point element of the source vector, and place the
results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 1 1 0 0 1 1 0 0 Zn Zd

FRSQRTE <Zd>.<T>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand = Z[n];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element = Elem[operand, e, esize];
    Elem[result, e, esize] = FPRSqrtEstimate(element, FPCR[]);

Z[d] = result;

FRSQRTS

Floating-point reciprocal square root step (unpredicated)

Multiply corresponding floating-point elements of the first and second source vectors, subtract the products from 3.0
and divide the results by 2.0 without any intermediate rounding and place the results in the corresponding elements of
the destination vector. This instruction is unpredicated.
This instruction can be used to perform a single Newton-Raphson iteration for calculating the reciprocal square root of
a vector of floating-point values.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 1 1 1 Zn Zd

FRSQRTS <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPRSqrtStepFused(element1, element2);

Z[d] = result;
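The Newton-Raphson use of the reciprocal square root step can be sketched in scalar Python. The names `frsqrts` and `rsqrt` are hypothetical, the starting estimate is supplied by hand rather than by FRSQRTE, and the arithmetic is unfused, so rounding may differ from the instruction.

```python
def frsqrts(a: float, b: float) -> float:
    """Reciprocal square root step: (3.0 - a*b) / 2.0 (unfused in this sketch)."""
    return (3.0 - a * b) / 2.0

def rsqrt(d: float, estimate: float, steps: int = 6) -> float:
    """Refine an estimate of 1/sqrt(d); each iteration is x = x * (3 - d*x*x) / 2."""
    x = estimate
    for _ in range(steps):
        x = x * frsqrts(d * x, x)
    return x

approx = rsqrt(4.0, 0.4)  # 0.4 is a deliberately crude guess for 0.5
```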

FSCALE

Floating-point adjust exponent by vector (predicated)

Multiply the active floating-point elements of the first source vector by 2.0 to the power of the signed integer values in
the corresponding elements of the second source vector and destructively place the results in the corresponding
elements of the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 0 0 1 1 0 0 Pg Zm Zdn

FSCALE <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        integer element2 = SInt(Elem[operand2, e, esize]);
        Elem[result, e, esize] = FPScale(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;
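Scaling by a power of two corresponds closely to `math.ldexp` in Python, which gives a compact way to picture the operation. In this sketch the name `fscale` and the list-based vector model are assumptions, and overflow is not modelled faithfully: `math.ldexp` raises `OverflowError` where hardware would follow the FPCR rules.

```python
import math

def fscale(zdn, pg, zm):
    """Multiply active elements of zdn by 2.0**n, where n is the signed
    integer in the matching element of zm; inactive elements pass through."""
    return [math.ldexp(x, n) if active else x
            for x, n, active in zip(zdn, zm, pg)]

scaled = fscale([1.5, 1.5, 8.0], [True, False, True], [3, 3, -2])
```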

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FSQRT

Floating-point square root (predicated)

Calculate the square root of each active floating-point element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 1 1 0 1 1 0 1 Pg Zn Zd

FSQRT <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = FPSqrt(element, FPCR[]);

Z[d] = result;

Operational information

FSUB (immediate)

Floating-point subtract immediate (predicated)

Subtract an immediate from each active floating-point element of the source vector, and destructively place the results
in the corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 0 1 1 0 0 Pg 0 0 0 0 i1 Zdn

FSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
bits(esize) imm = if i1 == '0' then FPPointFive('0') else FPOne('0');

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<const> Is the floating-point immediate value, encoded in “i1”:

i1 <const>
0 #0.5
1 #1.0

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPSub(element1, imm, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;
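
As an illustration of the merging behavior in the Operation above, the following Python sketch models the instruction on plain lists. The function name fsub_imm_merging and the list representation of the vector and predicate are assumptions made for this example. Python floats are double precision, so the sketch corresponds to <T> = D, and FPCR-controlled rounding and exception behavior are not modelled.

```python
def fsub_imm_merging(zdn, pg, const):
    """Model FSUB <Zdn>.D, <Pg>/M, <Zdn>.D, #<const> on Python lists.

    zdn:   list of floats, one per vector element
    pg:    list of bools, True where the element is active
    const: 0.5 or 1.0, the only immediates the encoding allows
    """
    if const not in (0.5, 1.0):
        raise ValueError("immediate must be 0.5 or 1.0")
    # Active elements are updated; inactive elements keep their old value.
    return [x - const if active else x for x, active in zip(zdn, pg)]


result = fsub_imm_merging([2.0, 4.0, 8.0, 16.0], [True, False, True, False], 0.5)
print(result)  # [1.5, 4.0, 7.5, 16.0]
```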

Operational information

FSUB (vectors, predicated)

Floating-point subtract vectors (predicated)

Subtract active floating-point elements of the second source vector from corresponding floating-point elements of the
first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 0 1 1 0 0 Pg Zm Zdn

FSUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPSub(element1, element2, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FSUB (vectors, unpredicated)

Floating-point subtract vectors (unpredicated)

Subtract all floating-point elements of the second source vector from corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 0 1 Zn Zd

FSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPSub(element1, element2, FPCR[]);

Z[d] = result;

FSUBR (immediate)

Floating-point reversed subtract from immediate (predicated)

Subtract each active floating-point element of the source vector from an immediate (a reversed subtract), and destructively place
the results in the corresponding elements of the source vector. The immediate may take the value +0.5 or +1.0 only.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 0 1 1 1 0 0 Pg 0 0 0 0 i1 Zdn

FSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <const>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
bits(esize) imm = if i1 == '0' then FPPointFive('0') else FPOne('0');

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<const> Is the floating-point immediate value, encoded in “i1”:

i1 <const>
0 #0.5
1 #1.0

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = FPSub(imm, element1, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;

Operational information

FSUBR (vectors)

Floating-point reversed subtract vectors (predicated)

Subtract active floating-point elements of the first source vector from the corresponding floating-point elements
of the second source vector (a reversed subtract) and destructively place the results in the corresponding elements of the first source
vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 0 0 1 1 1 0 0 Pg Zm Zdn

FSUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element2 = Elem[operand2, e, esize];
        Elem[result, e, esize] = FPSub(element2, element1, FPCR[]);
    else
        Elem[result, e, esize] = element1;

Z[dn] = result;
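
The only difference from the predicated FSUB is the operand order passed to FPSub. The Python sketch below contrasts the two on plain lists. The function names fsub_pred and fsubr_pred and the list representation are assumptions for illustration, and FPCR effects are not modelled.

```python
def fsub_pred(zdn, pg, zm):
    """FSUB: active elements become zdn[e] - zm[e]; inactive ones are kept."""
    return [a - b if p else a for a, b, p in zip(zdn, zm, pg)]


def fsubr_pred(zdn, pg, zm):
    """FSUBR: active elements become zm[e] - zdn[e]; inactive ones are kept."""
    return [b - a if p else a for a, b, p in zip(zdn, zm, pg)]


zdn = [10.0, 20.0, 30.0]
zm = [1.0, 2.0, 3.0]
pg = [True, True, False]
print(fsub_pred(zdn, pg, zm))   # [9.0, 18.0, 30.0]
print(fsubr_pred(zdn, pg, zm))  # [-9.0, -18.0, 30.0]
```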

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

FTMAD

Floating-point trigonometric multiply-add coefficient

The FTMAD instruction calculates the series terms for either SIN(X) or COS(X), where the argument X has been adjusted
to be in the range -π/4 < X ≤ π/4.
To calculate the series terms of SIN(X) and COS(X) the initial source operands of FTMAD should be zero in the first source
vector and X² in the second source vector. The FTMAD instruction is then executed eight times to calculate the sum of
eight series terms, which gives a result of sufficient precision.
The FTMAD instruction multiplies each element of the first source vector by the absolute value of the corresponding
element of the second source vector and performs a fused addition of each product with a value obtained from a table
of hard-wired coefficients, and places the results destructively in the first source vector.
The coefficients are different for SIN(X) and COS(X), and are selected by a combination of the sign bit in the second
source element and an immediate index in the range 0 to 7.
This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 0 imm3 1 0 0 0 0 0 Zm Zdn

FTMAD <Zdn>.<T>, <Zdn>.<T>, <Zm>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
integer imm = UInt(imm3);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<imm> Is the unsigned immediate operand, in the range 0 to 7, encoded in the "imm3" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPTrigMAdd(imm, element1, element2, FPCR[]);

Z[dn] = result;
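
The eight-step accumulation described above can be sketched for a single element in Python. Several things here are assumptions rather than architectural definitions: the hard-wired coefficient table is not listed in this section, so plain Taylor series coefficients stand in for it; a clear sign bit is taken to select the sine table and a set sign bit the cosine table; and the immediate index is stepped from 7 down to 0. Python also rounds the product and the sum separately, so the multiply-add is not fused.

```python
import math

# Assumption: Taylor coefficients stand in for the hard-wired table.
# Entry k multiplies X^(2k).
SIN_COEFF = [(-1) ** k / math.factorial(2 * k + 1) for k in range(8)]  # sin(X)/X
COS_COEFF = [(-1) ** k / math.factorial(2 * k) for k in range(8)]      # cos(X)


def ftmad(acc, x2, imm):
    """One FTMAD step on one element: acc * |x2| + coefficient[imm]."""
    negative = math.copysign(1.0, x2) < 0   # sign bit of the second operand
    table = COS_COEFF if negative else SIN_COEFF
    return acc * abs(x2) + table[imm]


def series(x2):
    """Run eight FTMAD steps starting from zero, highest index first."""
    acc = 0.0
    for imm in range(7, -1, -1):
        acc = ftmad(acc, x2, imm)
    return acc


x = 0.5
print(series(x * x) * x)   # close to math.sin(0.5)
print(series(-(x * x)))    # close to math.cos(0.5)
```

Running the steps from the highest index down is Horner's rule for a polynomial in X², which is why each step needs only one multiply-add.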

Operational information

FTSMUL

Floating-point trigonometric starting value

The FTSMUL instruction calculates the initial value for the FTMAD instruction. The instruction squares each element in
the first source vector and then sets the sign bit to a copy of bit 0 of the corresponding element in the second source
register, and places the results in the destination vector. This instruction is unpredicated.
To compute SIN(X) or COS(X) the instruction is executed with elements of the first source vector set to X, adjusted to be
in the range -π/4 < X ≤ π/4.
The elements of the second source vector hold the corresponding quadrant number Q as an integer, not a
floating-point value. The value Q satisfies the relationship (2Q-1) × π/4 < X ≤ (2Q+1) × π/4.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 0 1 1 Zn Zd

FTSMUL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPTrigSMul(element1, element2, FPCR[]);

Z[d] = result;
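
A single-element Python sketch of the behavior described above: square the first operand, then take the sign from bit 0 of the integer quadrant number. The function name ftsmul is an assumption for this example, and NaN handling and FPCR effects are not modelled.

```python
import math


def ftsmul(x, q):
    """Square x, then set the sign of the result from bit 0 of integer q."""
    return math.copysign(x * x, -1.0 if q & 1 else 1.0)


print(ftsmul(0.5, 0))   # 0.25
print(ftsmul(0.5, 1))   # -0.25
print(ftsmul(-0.5, 2))  # 0.25
```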

FTSSEL

Floating-point trigonometric select coefficient

The FTSSEL instruction selects the coefficient for the final multiplication in the polynomial series approximation. The
instruction places the value 1.0 or a copy of the first source vector element in the destination element, depending on
bit 0 of the quadrant number Q held in the corresponding element of the second source vector. The sign bit of the
destination element is copied from bit 1 of the corresponding value of Q. This instruction is unpredicated.
To compute SIN(X) or COS(X) the instruction is executed with elements of the first source vector set to X, adjusted to be
in the range -π/4 < X ≤ π/4.
The elements of the second source vector hold the corresponding quadrant number Q as an integer, not a
floating-point value. The value Q satisfies the relationship (2Q-1) × π/4 < X ≤ (2Q+1) × π/4.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 1 1 0 0 Zn Zd

FTSSEL <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = FPTrigSSel(element1, element2);

Z[d] = result;
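
The selection can be sketched for one element in Python. Two details are assumptions of this sketch, because FPTrigSSel is not defined in this section: bit 0 set is taken to select 1.0 (and bit 0 clear to select X), and bit 1 set is modelled as negating the selected value. For a non-negative X, negation gives the same result as copying bit 1 into the sign bit.

```python
def ftssel(x, q):
    """Select x or 1.0 from bit 0 of q, then apply the sign from bit 1."""
    selected = 1.0 if q & 1 else x   # assumption: bit 0 set selects 1.0
    return -selected if q & 2 else selected


for q in range(4):
    print(q, ftssel(0.5, q))
# 0 0.5
# 1 1.0
# 2 -0.5
# 3 -1.0
```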

INCB, INCD, INCH, INCW (scalar)

Increment scalar by multiple of predicate constraint element count

Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 4 classes: Byte, Doubleword, Halfword and Word

Byte

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D

INCB <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Doubleword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D

INCD <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Halfword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D

INCH <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Word

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 0 0 0 pattern Rdn
size<1>size<0> D

INCW <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Assembler Symbols

<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(64) operand1 = X[dn];

X[dn] = operand1 + (count * imm);
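
As a rough model of the Operation above, the Python sketch below handles only the default ALL pattern, for which the element count is taken to be VL DIV esize. DecodePredCount is not defined in this section, so other patterns are left out. The function name inc_scalar and the vl_bits parameter are assumptions for this example.

```python
MASK64 = (1 << 64) - 1


def inc_scalar(xdn, vl_bits, esize, imm=1):
    """Model INCB/INCH/INCW/INCD <Xdn>, ALL, MUL #imm.

    vl_bits: vector length in bits
    esize:   8, 16, 32 or 64 for the B, H, W and D forms
    imm:     multiplier in the range 1 to 16
    """
    if not 1 <= imm <= 16:
        raise ValueError("multiplier must be in the range 1 to 16")
    count = vl_bits // esize            # ALL pattern: every element counts
    return (xdn + count * imm) & MASK64  # result is a 64-bit register value


print(inc_scalar(0, 256, 32))           # 8
print(inc_scalar(100, 128, 8, imm=2))   # 132
```

A typical use is advancing a loop counter or byte offset by one vector's worth of elements without knowing the vector length at compile time.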

INCD, INCH, INCW (vector)

Increment vector by multiple of predicate constraint element count

Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate
in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 3 classes: Doubleword, Halfword and Word

Doubleword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D

INCD <Zdn>.D{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Halfword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D

INCH <Zdn>.H{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Word

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D

INCW <Zdn>.S{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    Elem[result, e, esize] = Elem[operand1, e, esize] + (count * imm);

Z[dn] = result;
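
The per-element form can be sketched the same way in Python. As an assumption of this example, only the ALL pattern is handled (count taken as VL DIV esize), the vector is a list of unsigned integers, and each sum wraps at the element size. The name inc_vector is hypothetical.

```python
def inc_vector(zdn, vl_bits, esize, imm=1):
    """Model INCH/INCW/INCD <Zdn>.<T>, ALL, MUL #imm on a list of integers."""
    elements = vl_bits // esize
    if len(zdn) != elements:
        raise ValueError("zdn must hold VL DIV esize elements")
    count = elements                    # ALL pattern only
    mask = (1 << esize) - 1             # each element wraps at esize bits
    return [(x + count * imm) & mask for x in zdn]


print(inc_vector([0, 1, 2, 3], 128, 32))  # [4, 5, 6, 7]
```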

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

INCP (scalar)

Increment scalar by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to increment the scalar
destination.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 0 1 0 0 0 1 0 0 Pm Rdn
D

INCP <Xdn>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Rdn);

Assembler Symbols

<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) operand1 = X[dn];
bits(PL) operand2 = P[m];
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

X[dn] = operand1 + count;
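
The counting loop above can be sketched in Python with the predicate held as an integer of VL/8 bits. The sketch assumes that ElemP reads the predicate bit at position e * (esize DIV 8), which is not spelled out in this section. The function name incp_scalar is hypothetical.

```python
MASK64 = (1 << 64) - 1


def incp_scalar(xdn, pm, vl_bits, esize):
    """Model INCP <Xdn>, <Pm>.<T> with the predicate pm as an integer."""
    elements = vl_bits // esize
    step = esize // 8                   # assumed predicate bits per element
    count = sum((pm >> (e * step)) & 1 for e in range(elements))
    return (xdn + count) & MASK64


# 128-bit vector, all sixteen predicate bits set.
print(incp_scalar(0, 0xFFFF, 128, 8))   # 16
print(incp_scalar(0, 0xFFFF, 128, 32))  # 4
```

The same predicate gives a different count for each element size, because only one predicate bit per element is examined.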

INCP (vector)

Increment vector by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to increment all destination
vector elements.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 0 0 1 0 0 0 0 0 0 Pm Zdn
D

INCP <Zdn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Zdn);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

for e = 0 to elements-1
    Elem[result, e, esize] = Elem[operand1, e, esize] + count;

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

INDEX (immediate, scalar)

Create index starting from immediate and incremented by general-purpose register

Populates the destination vector by setting the first element to the first signed immediate integer operand and
monotonically incrementing the value by the second signed scalar integer operand for each subsequent element. The
scalar source operand is a general-purpose register in which only the least significant bits corresponding to the vector
element size are used and any remaining bits are ignored. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Rm 0 1 0 0 1 0 imm5 Zd

INDEX <Zd>.<T>, #<imm>, <R><m>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Rm);
integer d = UInt(Zd);
integer imm = SInt(imm5);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is the signed immediate operand, in the range -16 to 15, encoded in the "imm5" field.

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(esize) operand2 = X[m];
integer element2 = SInt(operand2);
bits(VL) result;

for e = 0 to elements-1
    integer index = imm + e * element2;
    Elem[result, e, esize] = index<esize-1:0>;

Z[d] = result;

INDEX (immediates)

Create index starting from and incremented by immediate

Populates the destination vector by setting the first element to the first signed immediate integer operand and
monotonically incrementing the value by the second signed immediate integer operand for each subsequent element.
This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 imm5b 0 1 0 0 0 0 imm5 Zd

INDEX <Zd>.<T>, #<imm1>, #<imm2>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer d = UInt(Zd);
integer imm1 = SInt(imm5);
integer imm2 = SInt(imm5b);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm1> Is the first signed immediate operand, in the range -16 to 15, encoded in the "imm5" field.
<imm2> Is the second signed immediate operand, in the range -16 to 15, encoded in the "imm5b" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) result;

for e = 0 to elements-1
    integer index = imm1 + e * imm2;
    Elem[result, e, esize] = index<esize-1:0>;

Z[d] = result;
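
The Operation above amounts to an arithmetic progression truncated to the element size. A minimal Python sketch, assuming the vector is returned as a list of unsigned element values (the name index_immediates and the vl_bits parameter are hypothetical):

```python
def index_immediates(imm1, imm2, vl_bits, esize):
    """Model INDEX <Zd>.<T>, #<imm1>, #<imm2>; returns unsigned elements."""
    for imm in (imm1, imm2):
        if not -16 <= imm <= 15:
            raise ValueError("immediates must be in the range -16 to 15")
    mask = (1 << esize) - 1             # keep index<esize-1:0>
    return [(imm1 + e * imm2) & mask for e in range(vl_bits // esize)]


print(index_immediates(0, 1, 128, 32))   # [0, 1, 2, 3]
print(index_immediates(3, -1, 128, 32))  # [3, 2, 1, 0]
print(index_immediates(-2, 1, 128, 32))  # [4294967294, 4294967295, 0, 1]
```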

INDEX (scalar, immediate)

Create index starting from general-purpose register and incremented by immediate

Populates the destination vector by setting the first element to the first signed scalar integer operand and
monotonically incrementing the value by the second signed immediate integer operand for each subsequent element.
The scalar source operand is a general-purpose register in which only the least significant bits corresponding to the
vector element size are used and any remaining bits are ignored. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 imm5 0 1 0 0 0 1 Rn Zd

INDEX <Zd>.<T>, <R><n>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Rn);
integer d = UInt(Zd);
integer imm = SInt(imm5);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<n> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rn"
field.
<imm> Is the signed immediate operand, in the range -16 to 15, encoded in the "imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(esize) operand1 = X[n];
integer element1 = SInt(operand1);
bits(VL) result;

for e = 0 to elements-1
    integer index = element1 + e * imm;
    Elem[result, e, esize] = index<esize-1:0>;

Z[d] = result;

INDEX (scalars)

Create index starting from and incremented by general-purpose register

Populates the destination vector by setting the first element to the first signed scalar integer operand and
monotonically incrementing the value by the second signed scalar integer operand for each subsequent element. The
scalar source operands are general-purpose registers in which only the least significant bits corresponding to the
vector element size are used and any remaining bits are ignored. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Rm 0 1 0 0 1 1 Rn Zd

INDEX <Zd>.<T>, <R><n>, <R><m>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(esize) operand1 = X[n];
integer element1 = SInt(operand1);
bits(esize) operand2 = X[m];
integer element2 = SInt(operand2);
bits(VL) result;

for e = 0 to elements-1
    integer index = element1 + e * element2;
    Elem[result, e, esize] = index<esize-1:0>;

Z[d] = result;
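
The point that only the low esize bits of each register are used, and are read as signed values, can be seen in a short Python sketch. The helper names sint and index_scalars are assumptions for this example, and the result is returned as a list of unsigned element values.

```python
def sint(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def index_scalars(xn, xm, vl_bits, esize):
    """Model INDEX <Zd>.<T>, <R><n>, <R><m>; returns unsigned elements."""
    start = sint(xn, esize)             # upper register bits are ignored
    step = sint(xm, esize)
    mask = (1 << esize) - 1
    return [(start + e * step) & mask for e in range(vl_bits // esize)]


# Halfword elements: 0x1FFFF is read as -1 and 0x10002 as 2.
print(index_scalars(0x1FFFF, 0x10002, 128, 16))
# [65535, 1, 3, 5, 7, 9, 11, 13]
```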

INSR (scalar)

Insert general-purpose register in shifted vector

Shift the destination vector left by one element, and then place a copy of the least-significant bits of the general-
purpose register in element 0 of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 0 0 0 0 1 1 1 0 Rm Zdn

INSR <Zdn>.<T>, <R><m>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer m = UInt(Rm);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<m> Is the number [0-30] of the source general-purpose register or the name ZR (31), encoded in the "Rm"
field.

Operation

CheckSVEEnabled();
bits(VL) dest = Z[dn];
bits(esize) src = X[m];
Z[dn] = dest<(VL-esize)-1:0> : src;
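
The concatenation in the last line of the Operation can be mirrored with Python integers, treating the vector as a VL-bit value with element 0 in the least significant bits. The function names insr and elements are assumptions for this sketch.

```python
def insr(zdn, src, vl_bits, esize):
    """Model INSR: drop the top element, shift left, insert src at element 0."""
    kept = zdn & ((1 << (vl_bits - esize)) - 1)   # dest<(VL-esize)-1:0>
    return (kept << esize) | (src & ((1 << esize) - 1))


def elements(z, vl_bits, esize):
    """Unpack a vector integer into a list, element 0 first."""
    mask = (1 << esize) - 1
    return [(z >> (e * esize)) & mask for e in range(vl_bits // esize)]


z = 0x00000004_00000003_00000002_00000001   # elements [1, 2, 3, 4]
z = insr(z, 0x1_0000_00AA, 128, 32)         # only the low 32 bits are inserted
print(elements(z, 128, 32))                 # [170, 1, 2, 3]
```

Repeated INSR instructions can therefore build a vector one element at a time, with the first value inserted ending up in the highest-numbered element.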

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

INSR (SIMD&FP scalar)

Insert SIMD&FP scalar register in shifted vector

Shift the destination vector left by one element, and then place a copy of the SIMD&FP scalar register in element 0 of
the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 1 0 0 0 0 1 1 1 0 Vm Zdn

INSR <Zdn>.<T>, <V><m>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer m = UInt(Vm);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<m> Is the number [0-31] of the source SIMD&FP register, encoded in the "Vm" field.

Operation

CheckSVEEnabled();
bits(VL) dest = Z[dn];
bits(esize) src = V[m];
Z[dn] = dest<(VL-esize)-1:0> : src;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

LASTA (scalar)

Extract element after last to general-purpose register

If there is an active element then extract the element after the last active element modulo the number of elements
from the final source vector register. If there are no active elements, extract element zero. Then zero-extend and place
the extracted element in the destination general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 0 1 0 1 Pg Zn Rd
B

LASTA <R><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer rsize = if esize < 64 then 32 else 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Rd);
boolean isBefore = FALSE;

Assembler Symbols

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<d> Is the number [0-30] of the destination general-purpose register or the name ZR (31), encoded in the
"Rd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
bits(rsize) result;
integer last = LastActiveElement(mask, esize);

if isBefore then
    if last < 0 then last = elements - 1;
else
    last = last + 1;
    if last >= elements then last = 0;
result = ZeroExtend(Elem[operand, last, esize]);

X[d] = result;
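
The element selection controlled by isBefore can be sketched in Python as follows. This is a simplified model under stated assumptions: the predicate is represented as one boolean per element rather than as a bits(PL) value, and the helper names `last_active_element` and `select_element` are hypothetical.

```python
def last_active_element(mask):
    """Return the index of the highest-numbered active element, or -1 if none."""
    for e in range(len(mask) - 1, -1, -1):
        if mask[e]:
            return e
    return -1


def select_element(mask, operand, is_before):
    """Pick the element as the pseudocode does.

    is_before=False models the 'element after last' behavior (LASTA),
    is_before=True models the 'last element' behavior (isBefore = TRUE).
    """
    elements = len(operand)
    last = last_active_element(mask)
    if is_before:
        if last < 0:
            last = elements - 1      # no active elements: highest element
    else:
        last += 1
        if last >= elements:
            last = 0                 # wrap around modulo the element count
    return operand[last]
```

With no active elements, `is_before=False` yields element zero because `last` starts at -1 and is incremented to 0.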

LASTA (SIMD&FP scalar)

Extract element after last to SIMD&FP scalar register

If there is an active element then extract the element after the last active element modulo the number of elements
from the final source vector register. If there are no active elements, extract element zero. Then place the extracted
element in the destination SIMD&FP scalar register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 1 0 1 0 0 Pg Zn Vd
B

LASTA <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);
boolean isBefore = FALSE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
integer last = LastActiveElement(mask, esize);

if isBefore then
    if last < 0 then last = elements - 1;
else
    last = last + 1;
    if last >= elements then last = 0;
V[d] = Elem[operand, last, esize];

LASTB (scalar)

Extract last element to general-purpose register

If there is an active element then extract the last active element from the final source vector register. If there are no
active elements, extract the highest-numbered element. Then zero-extend and place the extracted element in the
destination general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 1 1 0 1 Pg Zn Rd
B

LASTB <R><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer rsize = if esize < 64 then 32 else 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Rd);
boolean isBefore = TRUE;

Assembler Symbols

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<d> Is the number [0-30] of the destination general-purpose register or the name ZR (31), encoded in the
"Rd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
bits(rsize) result;
integer last = LastActiveElement(mask, esize);

if isBefore then
    if last < 0 then last = elements - 1;
else
    last = last + 1;
    if last >= elements then last = 0;
result = ZeroExtend(Elem[operand, last, esize]);

X[d] = result;

LASTB (SIMD&FP scalar)

Extract last element to SIMD&FP scalar register

If there is an active element then extract the last active element from the final source vector register. If there are no
active elements, extract the highest-numbered element. Then place the extracted element in the destination SIMD&FP
register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 1 1 1 0 0 Pg Zn Vd
B

LASTB <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);
boolean isBefore = TRUE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
integer last = LastActiveElement(mask, esize);

if isBefore then
    if last < 0 then last = elements - 1;
else
    last = last + 1;
    if last >= elements then last = 0;
V[d] = Elem[operand, last, esize];

LD1B (scalar plus immediate)

Contiguous load unsigned bytes to vector (immediate index)

Contiguous load of unsigned bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
It has encodings from 4 classes: 8-bit element, 16-bit element, 32-bit element and 64-bit element.

8-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();

Z[t] = result;
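
The effect of the "MUL VL" immediate on the element addresses can be sketched in Python. This is a minimal model under stated assumptions, not the architectural definition: memory is a bytes object, the predicate is one boolean per element, data is read little-endian, only the unsigned case is modelled, and the function name `ld1_scalar_plus_imm` is hypothetical.

```python
def ld1_scalar_plus_imm(memory, base, imm, mask, vl, esize, msize):
    """Sketch of an unsigned contiguous load with a 'MUL VL' immediate.

    memory -- bytes-like object standing in for memory (assumption)
    base   -- byte address taken from the base register
    imm    -- signed immediate in the range -8 to 7
    mask   -- one boolean per element (simplified predicate)
    vl     -- vector length in bits; esize and msize as in the decode pseudocode
    """
    if not -8 <= imm <= 7:
        raise ValueError("imm must be in the range -8 to 7")
    elements = vl // esize
    mbytes = msize // 8
    result = []
    for e in range(elements):
        if mask[e]:
            # The immediate counts whole vectors of memory elements,
            # irrespective of which elements are active.
            eoff = imm * elements + e
            addr = base + eoff * mbytes
            # Little-endian byte order is an assumption of this sketch.
            result.append(int.from_bytes(memory[addr:addr + mbytes], "little"))
        else:
            result.append(0)  # inactive elements are set to zero
    return result
```

With a 128-bit vector and 32-bit elements there are four elements, so an immediate of 1 advances the address by four memory elements before element 0 is read.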

LD1B (scalar plus scalar)

Contiguous load unsigned bytes to vector (scalar index)

Contiguous load of unsigned bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements will not cause a read from Device memory or
signal a fault, and are set to zero in the destination vector.
It has encodings from 4 classes: 8-bit element, 16-bit element, 32-bit element and 64-bit element.

8-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;
integer msize = 8;
boolean unsigned = TRUE;

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 8;
boolean unsigned = TRUE;

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = TRUE;

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = TRUE;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();

Z[t] = result;

LD1B (scalar plus vector)

Gather load unsigned bytes to vector (vector index)

Gather load of unsigned bytes to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally sign or zero-extended from 32 to 64 bits. Inactive
elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 3 classes: 32-bit unpacked unscaled offset, 32-bit unscaled offset and 64-bit unscaled offset.

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
integer offs_size = 32;
boolean unsigned = TRUE;
boolean offs_unsigned = xs == '0';
integer scale = 0;

32-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 1 0 Pg Rn Zt
U ff

LD1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
integer offs_size = 32;
boolean unsigned = TRUE;
boolean offs_unsigned = xs == '0';
integer scale = 0;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
integer offs_size = 64;
boolean unsigned = TRUE;
boolean offs_unsigned = TRUE;
integer scale = 0;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();

Z[t] = result;
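
How the UXTW and SXTW modifiers change the gather addresses can be sketched in Python. This is an illustrative model only: the helper names `extend_offset` and `gather_addresses` are hypothetical, the predicate is one boolean per element, and inactive elements are reported as None because they generate no access.

```python
def extend_offset(raw, offs_size, offs_unsigned):
    """Interpret the low offs_size bits of raw as unsigned (UXTW) or signed (SXTW)."""
    value = raw & ((1 << offs_size) - 1)
    if not offs_unsigned and value >> (offs_size - 1):
        value -= 1 << offs_size      # sign bit set: take the two's complement value
    return value


def gather_addresses(base, offsets, mask, offs_size, offs_unsigned, scale=0):
    """Return the 64-bit address for each active element, None for inactive ones."""
    addrs = []
    for raw, active in zip(offsets, mask):
        if active:
            off = extend_offset(raw, offs_size, offs_unsigned)
            addrs.append((base + (off << scale)) & 0xFFFFFFFFFFFFFFFF)
        else:
            addrs.append(None)
    return addrs
```

The same 32-bit pattern 0xFFFFFFFF therefore addresses one byte below the base with SXTW, and almost 4GB above it with UXTW.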

LD1B (vector plus immediate)

Gather load unsigned bytes to vector (immediate index)

Gather load of unsigned bytes to active elements of a vector register from memory addresses generated by a vector
base plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a read from Device
memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then
    base = Z[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();

Z[t] = result;

LD1D (scalar plus immediate)

Contiguous load doublewords to vector (immediate index)

Contiguous load of doublewords to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

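Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.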

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

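if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();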

Z[t] = result;

LD1D (scalar plus scalar)

Contiguous load doublewords to vector (scalar index)

Contiguous load of doublewords to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 8 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
boolean unsigned = TRUE;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

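if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();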

Z[t] = result;

LD1D (scalar plus vector)

Gather load doublewords to vector (vector index)

Gather load of doublewords to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 8. Inactive elements will not cause a read from Device memory or signal faults, and are set to
zero in the destination vector.
It has encodings from 4 classes: 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset, 64-bit scaled offset
and 64-bit unscaled offset.

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 1 Zm 0 1 0 Pg Rn Zt
U ff

LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #3]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
integer offs_size = 32;
boolean unsigned = TRUE;
boolean offs_unsigned = xs == '0';
integer scale = 3;

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

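if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
integer offs_size = 32;
boolean unsigned = TRUE;
boolean offs_unsigned = xs == '0';
integer scale = 0;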

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 1 Zm 1 1 0 Pg Rn Zt
U ff

LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #3]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
integer offs_size = 64;
boolean unsigned = TRUE;
boolean offs_unsigned = TRUE;
integer scale = 3;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

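if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
integer offs_size = 64;
boolean unsigned = TRUE;
boolean offs_unsigned = TRUE;
integer scale = 0;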

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

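if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();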

Z[t] = result;

LD1D (vector plus immediate)

Gather load doublewords to vector (immediate index)

Gather load of doublewords to active elements of a vector register from memory addresses generated by a vector base
plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements will not cause a read from
Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 248, defaulting to 0,
encoded in the "imm5" field.
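
The relationship between the assembler byte offset and the 5-bit field can be sketched in Python. The decode sets offset = UInt(imm5) and the access size is msize DIV 8 bytes, so for doublewords the byte offset is the field value times 8. The helper names `encode_imm5` and `decode_imm5` are hypothetical and exist only for this illustration.

```python
def encode_imm5(byte_offset, msize):
    """Convert an assembler byte offset to the imm5 field value.

    msize is the memory element size in bits (64 for doublewords).
    Raises ValueError if the offset cannot be encoded.
    """
    mbytes = msize // 8
    if byte_offset % mbytes != 0:
        raise ValueError("offset must be a multiple of %d" % mbytes)
    imm5 = byte_offset // mbytes
    if not 0 <= imm5 <= 31:
        raise ValueError("offset out of range")
    return imm5


def decode_imm5(imm5, msize):
    """Return the byte offset added to each base element: offset * mbytes."""
    return imm5 * (msize // 8)
```

For doublewords the largest encodable offset is 31 * 8 = 248, which matches the stated range.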

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

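if AnyActiveElement(mask, esize) then
    base = Z[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();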

Z[t] = result;

LD1H (scalar plus immediate)

Contiguous load unsigned halfwords to vector (immediate index)

Contiguous load of unsigned halfwords to elements of a vector register from the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device
memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element.

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

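Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.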

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

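if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        data = Mem[addr, mbytes, AccType_SVE];
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();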

Z[t] = result;

LD1H (scalar plus scalar)

Contiguous load unsigned halfwords to vector (scalar index)

Contiguous load of unsigned halfwords to elements of a vector register from the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 2 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element.

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 16;
boolean unsigned = TRUE;

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = TRUE;

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = TRUE;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = result;

LD1H (scalar plus vector)

Gather load unsigned halfwords to vector (vector index)

Gather load of unsigned halfwords to active elements of a vector register from memory addresses generated by a
64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and
then optionally multiplied by 2. Inactive elements will not cause a read from Device memory or signal faults, and are
set to zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset
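
The extend-then-scale step in the description can be illustrated with a short Python sketch. The function gather_address and its parameter names are hypothetical; they borrow the decode names offs_size, offs_unsigned and scale used by the encodings of this instruction, and scale = 0 for the unscaled forms is an assumption drawn from the phrase "optionally multiplied by 2".

```python
MASK64 = (1 << 64) - 1  # addresses are 64-bit values


def gather_address(base, offset_elem, offs_size, offs_unsigned, scale):
    """Address of one active element (illustrative model, hypothetical names).

    base          -- scalar base address from Xn or SP
    offset_elem   -- raw element value read from Zm
    offs_size     -- 32 or 64: how many low bits of the element form the offset
    offs_unsigned -- True for UXTW (zero-extend), False for SXTW (sign-extend)
    scale         -- 1 for the scaled forms (multiply by 2), 0 assumed otherwise
    """
    low = offset_elem & ((1 << offs_size) - 1)  # keep only the low offs_size bits
    if not offs_unsigned and (low >> (offs_size - 1)) & 1:
        low -= 1 << offs_size  # sign-extend a negative offset
    return (base + (low << scale)) & MASK64
```

For example, the 32-bit offset 0xFFFFFFFF is a large positive index under UXTW but is -1 under SXTW, so with scaling it moves the address back by one halfword.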

32-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 1 0 Pg Rn Zt
U ff

LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
integer offs_size = 32;
boolean unsigned = TRUE;
boolean offs_unsigned = xs == '0';
integer scale = 1;

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 1 0 Pg Rn Zt
U ff

LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
integer offs_size = 32;
boolean unsigned = TRUE;
boolean offs_unsigned = xs == '0';
integer scale = 1;

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

32-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 1 0 Pg Rn Zt
U ff

LD1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 1 0 Pg Rn Zt
U ff

LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
integer offs_size = 64;
boolean unsigned = TRUE;
boolean offs_unsigned = TRUE;
integer scale = 1;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

Z[t] = result;

LD1H (vector plus immediate)

Gather load unsigned halfwords to vector (immediate index)

Gather load of unsigned halfwords to active elements of a vector register from memory addresses generated by a
vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a
read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
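
As a rough illustration of the vector-base addressing described above, the Python sketch below computes one address per element. The function vector_imm_addresses is hypothetical, the predicate is modelled as a list of booleans, and None marks an inactive element for which no access is made.

```python
MASK64 = (1 << 64) - 1  # addresses are 64-bit values


def vector_imm_addresses(base_elems, imm, mask):
    """Per-element addresses for LD1H (vector plus immediate); hypothetical model.

    base_elems -- element values of Zn, already zero-extended to 64 bits
    imm        -- byte offset: a multiple of 2 in the range 0 to 62
    mask       -- one boolean per element (the governing predicate)
    """
    if imm % 2 != 0 or not 0 <= imm <= 62:
        raise ValueError("imm must be a multiple of 2 in the range 0 to 62")
    # Each active element supplies its own base; inactive elements are skipped.
    return [(b + imm) & MASK64 if active else None
            for b, active in zip(base_elems, mask)]
```

Unlike the contiguous forms, the immediate is added to every element's base unchanged; the element number does not enter the address.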

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then
    base = Z[n];

Z[t] = result;

LD1RB

Load and broadcast unsigned byte to vector

Load a single unsigned byte from a memory address generated by a 64-bit scalar base address plus an immediate
offset which is in the range 0 to 63.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
It has encodings from 4 classes: 8-bit element, 16-bit element, 32-bit element and 64-bit element

8-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 1 imm6 1 0 0 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = UInt(imm6);

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 1 imm6 1 0 1 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = UInt(imm6);

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 1 imm6 1 1 0 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = UInt(imm6);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 1 imm6 1 1 1 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = UInt(imm6);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 63, defaulting to 0, encoded in the
"imm6" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    bits(64) addr = base + offset * mbytes;
    data = Mem[addr, mbytes, AccType_SVE];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();

Z[t] = result;
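
The same operation can be paraphrased as a small Python model. This is only a sketch: the function ld1rb, the bytes object standing in for memory, and the list-of-booleans predicate are assumptions made for illustration.

```python
def ld1rb(mem, base, imm, mask):
    """Illustrative model of LD1RB; all names are hypothetical.

    mem  -- bytes object standing in for memory
    base -- byte address taken from Xn or SP
    imm  -- unsigned byte offset in the range 0 to 63
    mask -- one boolean per destination element (the governing predicate)
    """
    if not 0 <= imm <= 63:
        raise ValueError("imm must be in the range 0 to 63")
    if not any(mask):
        # With no active element the load is skipped entirely.
        return [0] * len(mask)
    data = mem[base + imm]  # a single unsigned byte
    # Broadcast to active elements; zero the inactive ones.
    return [data if active else 0 for active in mask]
```

Because the memory read is guarded by the predicate test, calling the model with an all-false mask succeeds even when the address is not backed by any data.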

LD1RD

Load and broadcast doubleword to vector

Load a single doubleword from a memory address generated by a 64-bit scalar base address plus an immediate offset
which is a multiple of 8 in the range 0 to 504.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 1 1 1 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
boolean unsigned = TRUE;
integer offset = UInt(imm6);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 504, defaulting to 0,
encoded in the "imm6" field.
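
The relationship between the assembler byte offset and the 6-bit field can be sketched as follows. The helper encode_imm6 is hypothetical; it assumes that the field holds the byte offset divided by the memory element size in bytes, which matches the stated ranges for LD1RB (0 to 63), LD1RH (0 to 126) and LD1RD (0 to 504).

```python
def encode_imm6(byte_offset, msize):
    """Return the imm6 field value for a broadcast-load byte offset.

    byte_offset -- the <imm> written in assembly, in bytes
    msize       -- memory element size in bits (8, 16, 32 or 64)
    Hypothetical helper for illustration; not part of any Arm tool.
    """
    mbytes = msize // 8
    quotient, remainder = divmod(byte_offset, mbytes)
    if remainder != 0 or not 0 <= quotient <= 63:
        raise ValueError(
            f"offset must be a multiple of {mbytes} in the range 0 to {63 * mbytes}"
        )
    return quotient
```

For LD1RD the largest encodable offset is therefore 63 * 8 = 504 bytes, and an offset such as 12 is rejected because it is not a multiple of 8.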

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

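if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    bits(64) addr = base + offset * mbytes;
    data = Mem[addr, mbytes, AccType_SVE];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();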

Z[t] = result;

LD1RH

Load and broadcast unsigned halfword to vector

Load a single unsigned halfword from a memory address generated by a 64-bit scalar base address plus an immediate
offset which is a multiple of 2 in the range 0 to 126.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 1 imm6 1 0 1 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = UInt(imm6);

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 1 imm6 1 1 0 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = UInt(imm6);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 1 imm6 1 1 1 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = UInt(imm6);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 126, defaulting to 0,
encoded in the "imm6" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

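if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    bits(64) addr = base + offset * mbytes;
    data = Mem[addr, mbytes, AccType_SVE];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();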

Z[t] = result;

LD1ROB (scalar plus immediate)

Contiguous load and replicate thirty-two bytes (immediate index)

Load thirty-two contiguous bytes to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first thirty-two predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.
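
The replication rule in the description can be illustrated with a short Python sketch. The function replicate_octaword is hypothetical, the vector is modelled as a bytes object whose byte 0 holds the least significant bits, and the UNDEFINED case for short vectors is modelled as an exception.

```python
def replicate_octaword(segment, vl_bits):
    """Fill a vl_bits-wide vector with copies of a 256-bit segment.

    segment -- the 32 loaded bytes (inactive elements already zeroed)
    vl_bits -- current vector length in bits
    Hypothetical helper; byte 0 of the result is the least significant byte.
    """
    if len(segment) != 32:
        raise ValueError("segment must be exactly 32 bytes (256 bits)")
    if vl_bits < 256:
        raise ValueError("the instruction requires a vector length of at least 256 bits")
    copies = vl_bits // 256
    # Bits left over when vl_bits is not a multiple of 256 are set to zero.
    padding = vl_bits // 8 - 32 * copies
    return segment * copies + bytes(padding)
```

With a 512-bit vector the segment appears twice. With a 384-bit vector it appears once and the top 128 bits are zero.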

SVE
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz

LD1ROB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 32 in the range -256 to 224, defaulting to 0,
encoded in the "imm4" field.
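
As a sketch of how the 4-bit field maps onto the stated byte range, the hypothetical helper below decodes imm4 as a signed value and scales it by 32 bytes (one 256-bit segment).

```python
def ld1ro_imm4_to_bytes(imm4):
    """Decode the imm4 field of an LD1RO* (scalar plus immediate) encoding.

    imm4 -- raw 4-bit field value, 0 to 15
    Returns the byte offset, a multiple of 32 from -256 to 224.
    Hypothetical helper for illustration.
    """
    if not 0 <= imm4 <= 15:
        raise ValueError("imm4 is a 4-bit field")
    signed = imm4 - 16 if imm4 & 0b1000 else imm4  # two's-complement SInt(imm4)
    return signed * 32  # one step per 32-byte segment
```

The field value 0b1000 gives the most negative offset, -256, and 0b0111 gives the most positive, 224.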

Operation

CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(256) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();

Z[t] = ZeroExtend(Replicate(result, VL DIV 256), VL);

LD1ROB (scalar plus scalar)

Contiguous load and replicate thirty-two bytes (scalar index)

Load thirty-two contiguous bytes to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and scalar index which is added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first thirty-two predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.

SVE
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz

LD1ROB { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVEFP64MatMulExt() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(64) offset;
bits(256) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();

Z[t] = ZeroExtend(Replicate(result, VL DIV 256), VL);

LD1ROD (scalar plus immediate)

Contiguous load and replicate four doublewords (immediate index)

Load four contiguous doublewords to elements of a 256-bit (octaword) vector from the memory address generated by
a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first four predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.

SVE
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz

LD1ROD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 32 in the range -256 to 224, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(256) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

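if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();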

Z[t] = ZeroExtend(Replicate(result, VL DIV 256), VL);

LD1ROD (scalar plus scalar)

Contiguous load and replicate four doublewords (scalar index)

Load four contiguous doublewords to elements of a 256-bit (octaword) vector from the memory address generated by
a 64-bit scalar base address and scalar index which is multiplied by 8 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first four predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.

SVE
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz

LD1ROD { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVEFP64MatMulExt() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = ZeroExtend(Replicate(result, VL DIV 256), VL);

LD1ROH (scalar plus immediate)

Contiguous load and replicate sixteen halfwords (immediate index)

Load sixteen contiguous halfwords to elements of a 256-bit (octaword) vector from the memory address generated by
a 64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first sixteen predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.

SVE
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz

LD1ROH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 32 in the range -256 to 224, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(256) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

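if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();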

Z[t] = ZeroExtend(Replicate(result, VL DIV 256), VL);

LD1ROH (scalar plus scalar)

Contiguous load and replicate sixteen halfwords (scalar index)

Load sixteen contiguous halfwords to elements of a 256-bit (octaword) vector from the memory address generated by
a 64-bit scalar base address and scalar index which is multiplied by 2 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first sixteen predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.

SVE
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz

LD1ROH { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVEFP64MatMulExt() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = ZeroExtend(Replicate(result, VL DIV 256), VL);

LD1ROW (scalar plus immediate)

Contiguous load and replicate eight words (immediate index)

Load eight contiguous words to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and immediate index that is a multiple of 32 in the range -256 to +224 added to the base
address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first eight predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.

SVE
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz

LD1ROW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 32 in the range -256 to 224, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
if VL < 256 then UNDEFINED;
integer elements = 256 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low bits only
bits(256) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

Z[t] = ZeroExtend(Replicate(result, VL DIV 256), VL);
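The final replicate-and-zero-extend step can be modelled in a few lines of Python. This is only an illustrative sketch: the function name `replicate_octaword` and the representation of a register as a little-endian byte string are assumptions made for the example, not part of the architecture pseudocode.

```python
def replicate_octaword(segment: bytes, vl_bits: int) -> bytes:
    """Model Z[t] = ZeroExtend(Replicate(result, VL DIV 256), VL).

    `segment` is the 32-byte (256-bit) loaded result, lowest byte first.
    """
    if len(segment) != 32:
        raise ValueError("segment must be exactly 256 bits (32 bytes)")
    if vl_bits < 256:
        raise ValueError("the instruction is UNDEFINED when VL < 256")
    copies = vl_bits // 256
    replicated = segment * copies
    # Bits above the last whole 256-bit copy are set to zero.
    return replicated + bytes(vl_bits // 8 - len(replicated))
```

With a 384-bit vector length, for example, the result holds one copy of the segment followed by 128 zero bits.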

LD1ROW (scalar plus scalar)

Contiguous load and replicate eight words (scalar index)

Load eight contiguous words to elements of a 256-bit (octaword) vector from the memory address generated by a
64-bit scalar base address and scalar index which is multiplied by 4 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero.
The resulting 256-bit vector is then replicated to fill the destination vector. The instruction requires that the current
vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then the trailing
bits in the destination vector are set to zero.
Only the first eight predicate elements are used and higher numbered predicate elements are ignored.
ID_AA64ZFR0_EL1.F64MM indicates whether this instruction is implemented.

SVE
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz

LD1ROW { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVEFP64MatMulExt() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = ZeroExtend(Replicate(result, VL DIV 256), VL);

LD1RQB (scalar plus immediate)

Contiguous load and replicate sixteen bytes (immediate index)

Load sixteen contiguous bytes to elements of a short, 128-bit (quadword) vector from the memory address generated
by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the
base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first sixteen predicate elements are used and
higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz

LD1RQB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 16 in the range -128 to 112, defaulting to 0,
encoded in the "imm4" field.
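As an illustration of how the fields in the encoding diagram combine, the following Python sketch packs an LD1RQB (scalar plus immediate) instruction word. The helper name `encode_ld1rqb_imm` and its argument names are hypothetical; the fixed bits and field positions are taken from the diagram above.

```python
def encode_ld1rqb_imm(zt: int, pg: int, rn: int, byte_offset: int = 0) -> int:
    """Pack LD1RQB { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>}] into 32 bits."""
    if byte_offset % 16 != 0 or not -128 <= byte_offset <= 112:
        raise ValueError("offset must be a multiple of 16 in -128..112")
    if not (0 <= zt <= 31 and 0 <= pg <= 7 and 0 <= rn <= 31):
        raise ValueError("register number out of range")
    imm4 = (byte_offset // 16) & 0xF   # signed 4-bit field, two's complement
    return (
        (0b1010010 << 25)   # bits 31:25, fixed
        | (0b00 << 23)      # msz
        | (0b00 << 21)      # ssz
        | (imm4 << 16)      # bit 20 is 0, imm4 in bits 19:16
        | (0b001 << 13)     # bits 15:13, fixed
        | (pg << 10)
        | (rn << 5)
        | zt
    )
```

Here `rn == 31` selects the stack pointer, matching the `<Xn|SP>` operand.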

Operation

CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(128) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (offset * 16) + (e * mbytes);
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVE];
    else
        Elem[result, e, esize] = Zeros();

Z[t] = Replicate(result, VL DIV 128);
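The operation above can be sketched as a small Python model. It is a simplification under stated assumptions: memory is a flat `bytes` object, the predicate is given as one boolean per element rather than as a raw predicate register, and the function name `ld1rq` is hypothetical. Faults, tag checking and stack-pointer alignment checks are not modelled.

```python
from typing import Sequence


def ld1rq(memory: bytes, base: int, imm4: int, pred: Sequence[bool],
          esize: int, vl_bits: int) -> bytes:
    """Sketch of LD1RQ{B,H,W,D} (scalar plus immediate).

    `imm4` is the decoded signed field (-8..7); the byte offset is imm4 * 16.
    `pred` holds one flag per element of the 128-bit segment.
    """
    mbytes = esize // 8
    elements = 128 // esize
    segment = bytearray(16)            # inactive elements stay zero
    for e in range(elements):
        if pred[e]:
            addr = base + imm4 * 16 + e * mbytes
            chunk = memory[addr:addr + mbytes]
            if addr < 0 or len(chunk) != mbytes:
                raise IndexError("access outside the modelled memory")
            segment[e * mbytes:(e + 1) * mbytes] = chunk
    # Replicate the quadword to fill the whole vector.
    return bytes(segment) * (vl_bits // 128)
```

Only active elements touch memory, which mirrors the rule that inactive elements do not cause a read or signal a fault.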

LD1RQB (scalar plus scalar)

Contiguous load and replicate sixteen bytes (scalar index)

Load sixteen contiguous bytes to elements of a short, 128-bit (quadword) vector from the memory address generated
by a 64-bit scalar base address and scalar index which is added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first sixteen predicate elements are used and
higher numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz

LD1RQB { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(64) offset;
bits(128) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = Replicate(result, VL DIV 128);

LD1RQD (scalar plus immediate)

Contiguous load and replicate two doublewords (immediate index)

Load two contiguous doublewords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112
added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first two predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz

LD1RQD { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 16 in the range -128 to 112, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(128) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

Z[t] = Replicate(result, VL DIV 128);

LD1RQD (scalar plus scalar)

Contiguous load and replicate two doublewords (scalar index)

Load two contiguous doublewords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and scalar index which is multiplied by 8 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first two predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz

LD1RQD { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
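For the scalar-index forms, the index register is scaled by the element size in bytes (LSL #3 for doublewords) before being added to the base. A minimal Python sketch of that address calculation follows; the name `scalar_index_address` is an assumption made for the example, and 64-bit wrap-around is modelled with a mask.

```python
MASK64 = (1 << 64) - 1


def scalar_index_address(base: int, xm: int, esize: int) -> int:
    """Start address for [<Xn|SP>, <Xm>, LSL #log2(esize / 8)]."""
    mbytes = esize // 8
    # Register arithmetic is modulo 2**64, so a large <Xm> acts as a
    # negative index.
    return (base + xm * mbytes) & MASK64
```

For example, with `base = 0x1000`, `xm = 3` and 64-bit elements, the first element is read from `0x1018`.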

Operation

CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(64) offset;
bits(128) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = Replicate(result, VL DIV 128);

LD1RQH (scalar plus immediate)

Contiguous load and replicate eight halfwords (immediate index)

Load eight contiguous halfwords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112
added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first eight predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz

LD1RQH { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 16 in the range -128 to 112, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(128) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

Z[t] = Replicate(result, VL DIV 128);

LD1RQH (scalar plus scalar)

Contiguous load and replicate eight halfwords (scalar index)

Load eight contiguous halfwords to elements of a short, 128-bit (quadword) vector from the memory address
generated by a 64-bit scalar base address and scalar index which is multiplied by 2 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first eight predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz

LD1RQH { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(64) offset;
bits(128) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = Replicate(result, VL DIV 128);

LD1RQW (scalar plus immediate)

Contiguous load and replicate four words (immediate index)

Load four contiguous words to elements of a short, 128-bit (quadword) vector from the memory address generated by
a 64-bit scalar base address and immediate index that is a multiple of 16 in the range -128 to +112 added to the base
address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first four predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 0 imm4 0 0 1 Pg Rn Zt
msz<1>msz<0> ssz

LD1RQW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate byte offset, a multiple of 16 in the range -128 to 112, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(128) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

Z[t] = Replicate(result, VL DIV 128);

LD1RQW (scalar plus scalar)

Contiguous load and replicate four words (scalar index)

Load four contiguous words to elements of a short, 128-bit (quadword) vector from the memory address generated by
a 64-bit scalar base address and scalar index which is multiplied by 4 and added to the base address.
Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero. The resulting short
vector is then replicated to fill the long destination vector. Only the first four predicate elements are used and higher
numbered predicate elements are ignored.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 0 0 0 Pg Rn Zt
msz<1>msz<0> ssz

LD1RQW { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = 128 DIV esize;
bits(64) base;
bits(PL) mask = P[g]; // low 16 bits only
bits(64) offset;
bits(128) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = Replicate(result, VL DIV 128);

LD1RSB

Load and broadcast signed byte to vector

Load a single signed byte from a memory address generated by a 64-bit scalar base address plus an immediate offset
which is in the range 0 to 63.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element.

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 1 1 0 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RSB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = UInt(imm6);

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 1 0 1 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RSB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = UInt(imm6);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 1 0 0 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RSB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = UInt(imm6);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 63, defaulting to 0, encoded in the
"imm6" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();

Z[t] = result;
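The broadcast loop above can be illustrated with a short Python sketch. The function name `broadcast_extend` is hypothetical, the predicate is simplified to one boolean per element, and each element is returned as an unsigned integer holding its `esize`-bit pattern.

```python
from typing import List, Sequence


def broadcast_extend(data: int, msize: int, esize: int, unsigned: bool,
                     pred: Sequence[bool]) -> List[int]:
    """Extend an msize-bit value to esize bits and copy it to active elements."""
    data &= (1 << msize) - 1
    if not unsigned and data >> (msize - 1):
        data -= 1 << msize                 # top bit set: treat as negative
    value = data & ((1 << esize) - 1)      # two's-complement pattern in esize bits
    # Inactive elements are set to zero.
    return [value if active else 0 for active in pred]
```

Loading the byte 0x80 with `unsigned = FALSE` into 16-bit elements therefore yields 0xFF80 in each active element.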

LD1RSH

Load and broadcast signed halfword to vector

Load a single signed halfword from a memory address generated by a 64-bit scalar base address plus an immediate
offset which is a multiple of 2 in the range 0 to 126.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 1 imm6 1 0 1 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RSH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = FALSE;
integer offset = UInt(imm6);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 1 imm6 1 0 0 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RSH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = FALSE;
integer offset = UInt(imm6);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 126, defaulting to 0,
encoded in the "imm6" field.
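Because the byte offset must be a multiple of the memory element size, the "imm6" field holds the offset divided by that size. The Python sketch below shows the conversion an assembler might perform; the helper name `encode_ld1r_imm6` is an assumption made for illustration.

```python
def encode_ld1r_imm6(byte_offset: int, msize: int) -> int:
    """Convert an LD1R* byte offset into its unsigned 6-bit field value."""
    mbytes = msize // 8
    if byte_offset % mbytes != 0:
        raise ValueError("offset must be a multiple of %d" % mbytes)
    imm6 = byte_offset // mbytes
    if not 0 <= imm6 <= 63:
        raise ValueError("offset out of range for a 6-bit field")
    return imm6
```

For halfword loads (`msize = 16`) the largest offset, 126, encodes as 63.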

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();

Z[t] = result;

LD1RSW

Load and broadcast signed word to vector

Load a single signed word from a memory address generated by a 64-bit scalar base address plus an immediate offset
which is a multiple of 4 in the range 0 to 252.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 1 imm6 1 0 0 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RSW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = FALSE;
integer offset = UInt(imm6);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 252, defaulting to 0,
encoded in the "imm6" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();

Z[t] = result;

LD1RW

Load and broadcast unsigned word to vector

Load a single unsigned word from a memory address generated by a 64-bit scalar base address plus an immediate
offset which is a multiple of 4 in the range 0 to 252.
Broadcast the loaded data into all active elements of the destination vector, setting the inactive elements to zero. If all
elements are inactive then the instruction will not perform a read from Device memory or cause a data abort.
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 1 imm6 1 1 0 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RW { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 32;
boolean unsigned = TRUE;
integer offset = UInt(imm6);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 1 imm6 1 1 1 Pg Rn Zt
dtypeh<1>dtypeh<0> dtypel<1>dtypel<0>

LD1RW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = TRUE;
integer offset = UInt(imm6);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 252, defaulting to 0,
encoded in the "imm6" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Extend(data, esize, unsigned);
    else
        Elem[result, e, esize] = Zeros();

Z[t] = result;

LD1SB (scalar plus immediate)

Contiguous load signed bytes to vector (immediate index)

Contiguous load of signed bytes to elements of a vector register from the memory address generated by a 64-bit scalar
base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element.

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = SInt(imm4);
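
The three decode blocks above differ only in `esize`, which is selected by the `dtype` bits. A minimal Python sketch of that decode follows. The field positions are read off the flattened encoding diagrams and assume a 4-bit `dtype` at bits 24:21, a 3-bit `Pg` at bits 12:10, and 5-bit `Rn` and `Zt` fields; the function name and the dictionary result are hypothetical.

```python
# dtype values taken from the three encoding diagrams above.
ESIZE_BY_DTYPE = {0b1110: 16, 0b1101: 32, 0b1100: 64}


def decode_ld1sb_imm(insn: int) -> dict:
    """Split a 32-bit LD1SB (scalar plus immediate) encoding into its fields."""
    fixed_ok = (((insn >> 25) & 0x7F) == 0b1010010
                and ((insn >> 20) & 0x1) == 0
                and ((insn >> 13) & 0x7) == 0b101)
    dtype = (insn >> 21) & 0xF
    if not fixed_ok or dtype not in ESIZE_BY_DTYPE:
        raise ValueError("not an LD1SB (scalar plus immediate) encoding")
    imm4 = (insn >> 16) & 0xF
    return {
        "t": insn & 0x1F,                              # UInt(Zt)
        "n": (insn >> 5) & 0x1F,                       # UInt(Rn)
        "g": (insn >> 10) & 0x7,                       # UInt(Pg)
        "esize": ESIZE_BY_DTYPE[dtype],
        "msize": 8,
        "unsigned": False,
        "offset": imm4 - 16 if imm4 & 0x8 else imm4,   # SInt(imm4)
    }
```

The last entry shows why the immediate range is -8 to 7: `imm4` is a 4-bit field interpreted as a signed value.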

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];

Z[t] = result;
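
The stack-pointer handling in the pseudocode above can be summarised with a short Python sketch. The parameter `check_when_none_active` is a hypothetical stand-in for the value returned by `ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE)`, which is an implementation choice; the function name is likewise an assumption.

```python
def sp_alignment_checked(n: int, any_active: bool,
                         check_when_none_active: bool) -> bool:
    """Return True if this model would call CheckSPAlignment()."""
    if n != 31:
        return False                  # base register is X[n], not SP
    if not any_active:
        # No active elements: whether the check happens is
        # CONSTRAINED UNPREDICTABLE, so the implementation decides.
        return check_when_none_active
    return True                       # SP base with at least one active element
```

Software that uses SP as the base should therefore keep it aligned even when the governing predicate might be all false.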

LD1SB (scalar plus scalar)

Contiguous load signed bytes to vector (scalar index)

Contiguous load of signed bytes to elements of a vector register from the memory address generated by a 64-bit scalar
base and scalar index which is added to the base address. After each element access the index value is incremented,
but the index register is not updated. Inactive elements will not cause a read from Device memory or signal a fault,
and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 8;
boolean unsigned = FALSE;

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = FALSE;

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = FALSE;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = X[m];

Z[t] = result;

LD1SB (scalar plus vector)

Gather load signed bytes to vector (vector index)

Gather load of signed bytes to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally sign or zero-extended from 32 to 64 bits. Inactive
elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 3 classes: 32-bit unpacked unscaled offset, 32-bit unscaled offset and 64-bit unscaled offset
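
As a rough illustration of the gather form, the following Python sketch computes one address per active element from the scalar base and the corresponding offset element. The function names, the `bytes` memory model and the list-based vector are assumptions for the example, and addresses are not wrapped to 64 bits, which is a simplification.

```python
from typing import List, Sequence


def extend(value: int, bits: int, unsigned: bool) -> int:
    """Zero- or sign-extend the low `bits` bits of value to a Python int."""
    value &= (1 << bits) - 1
    if not unsigned and value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def ld1sb_gather(memory: bytes, base: int, offsets: Sequence[int],
                 offs_size: int, offs_unsigned: bool,
                 active: Sequence[bool]) -> List[int]:
    """Model LD1SB (scalar plus vector) with an unscaled offset."""
    result = []
    for off, is_active in zip(offsets, active):
        if is_active:
            # UXTW or SXTW is applied to the low offs_size bits of the offset.
            addr = base + extend(off, offs_size, offs_unsigned)
            result.append(extend(memory[addr], 8, unsigned=False))
        else:
            result.append(0)          # inactive: no read, element zeroed
    return result
```

Passing `offs_size=32, offs_unsigned=False` corresponds to the SXTW modifier, so an offset element of `0xFFFFFFFF` addresses the byte just below the base.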

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 0 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
integer offs_size = 32;
boolean unsigned = FALSE;
boolean offs_unsigned = xs == '0';
integer scale = 0;

32-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 0 0 Pg Rn Zt
U ff

LD1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
integer offs_size = 32;
boolean unsigned = FALSE;
boolean offs_unsigned = xs == '0';
integer scale = 0;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 0 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
integer offs_size = 64;
boolean unsigned = FALSE;
boolean offs_unsigned = TRUE;
integer scale = 0;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = Z[m];

Z[t] = result;

LD1SB (vector plus immediate)

Gather load signed bytes to vector (immediate index)

Gather load of signed bytes to active elements of a vector register from memory addresses generated by a vector base
plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a read from Device memory
or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1SB { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1SB { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then

base = Z[n];

Z[t] = result;

LD1SH (scalar plus immediate)

Contiguous load signed halfwords to vector (immediate index)

Contiguous load of signed halfwords to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
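
Because each halfword is widened into a larger element, the vector's in-memory size is smaller than the register size. The worked arithmetic below is a sketch under that reading of the description; the function name is hypothetical.

```python
def mul_vl_byte_offset(imm: int, vl_bits: int, esize: int, msize: int) -> int:
    """Byte offset added to the base for '#imm, MUL VL'."""
    if not -8 <= imm <= 7:
        raise ValueError("imm must be in the range -8 to 7")
    elements = vl_bits // esize       # elements held in the vector register
    mbytes = msize // 8               # bytes read from memory per element
    return imm * elements * mbytes    # imm times the in-memory vector size
```

For a 256-bit vector of 32-bit elements loaded from halfwords, the in-memory size is 8 elements times 2 bytes, so `#1, MUL VL` advances the address by 16 bytes rather than 32.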

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];

Z[t] = result;

LD1SH (scalar plus scalar)

Contiguous load signed halfwords to vector (scalar index)

Contiguous load of signed halfwords to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 2 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
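
The scalar-index form can be modelled in a few lines of Python. This is a sketch under stated assumptions: little-endian data, a `bytes` object standing in for memory, and a hypothetical function name. The index is scaled by the memory element size (2 for halfwords, matching `LSL #1`), and the address then steps by one memory element per vector element.

```python
from typing import List, Sequence


def ld1_scalar_scalar(memory: bytes, base: int, index: int, vl_bits: int,
                      esize: int, msize: int, active: Sequence[bool],
                      unsigned: bool = False) -> List[int]:
    """Model a contiguous scalar-plus-scalar load (LD1SH when msize == 16)."""
    elements = vl_bits // esize
    mbytes = msize // 8
    addr = base + index * mbytes          # index scaled by bytes per memory element
    result = []
    for e in range(elements):
        if active[e]:
            chunk = memory[addr:addr + mbytes]
            # Assumes little-endian data; signed loads sign-extend.
            result.append(int.from_bytes(chunk, "little", signed=not unsigned))
        else:
            result.append(0)              # inactive: no read, element zeroed
        addr += mbytes                    # next element; the index register is unchanged
    return result
```

Only the local `addr` advances, which mirrors the statement that the index register itself is not updated.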

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = FALSE;

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = FALSE;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = X[m];

Z[t] = result;

LD1SH (scalar plus vector)

Gather load signed halfwords to vector (vector index)

Gather load of signed halfwords to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 2. Inactive elements will not cause a read from Device memory or signal faults, and are set to
zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset

32-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 0 0 Pg Rn Zt
U ff

LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
integer offs_size = 32;
boolean unsigned = FALSE;
boolean offs_unsigned = xs == '0';
integer scale = 1;

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 0 0 Pg Rn Zt
U ff

LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
integer offs_size = 32;
boolean unsigned = FALSE;
boolean offs_unsigned = xs == '0';
integer scale = 1;

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 0 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

32-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 0 0 Pg Rn Zt
U ff

LD1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 0 0 Pg Rn Zt
U ff

LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
integer offs_size = 64;
boolean unsigned = FALSE;
boolean offs_unsigned = TRUE;
integer scale = 1;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 0 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW
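
To make the effect of `<mod>` and the optional shift concrete, the following Python sketch computes the byte offset that one offset element contributes. It assumes the extension is applied before the shift, as the description states; the function name is hypothetical, and `offs_unsigned` and `scale` mirror the decode variables above.

```python
def gather_byte_offset(raw: int, offs_size: int,
                       offs_unsigned: bool, scale: int) -> int:
    """Return the byte offset added to the scalar base for one element."""
    raw &= (1 << offs_size) - 1                   # keep only the low offs_size bits
    if not offs_unsigned and raw >> (offs_size - 1):
        raw -= 1 << offs_size                     # SXTW: sign-extend
    return raw << scale                           # scale = 1 multiplies by 2
```

With `xs` = 0 (UXTW) an offset element of `0xFFFFFFFF` yields a large positive offset, whereas with `xs` = 1 (SXTW) the same bits yield -2 once scaled by 2.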

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = Z[m];

Z[t] = result;

LD1SH (vector plus immediate)

Gather load signed halfwords to vector (immediate index)

Gather load of signed halfwords to active elements of a vector register from memory addresses generated by a vector
base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements will not cause a read
from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1SH { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = FALSE;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1SH { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = FALSE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.
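
The relationship between `<imm>` and the 5-bit field can be sketched as follows. This assumes the encoded field holds the immediate divided by the memory element size in bytes, which is consistent with the stated range (31 times 2 is 62); the function names are hypothetical.

```python
def encode_imm5(imm: int, msize: int) -> int:
    """Convert the assembler byte offset into the imm5 field value."""
    mbytes = msize // 8
    if imm % mbytes != 0 or not 0 <= imm // mbytes <= 31:
        raise ValueError("imm must be a multiple of %d in the range 0 to %d"
                         % (mbytes, 31 * mbytes))
    return imm // mbytes


def element_address(zn_element: int, imm5: int, msize: int) -> int:
    """Address accessed for one active element of the base vector."""
    return zn_element + imm5 * (msize // 8)
```

For example, `#62` with halfword accesses encodes as `imm5 = 31`, and a base element of `0x1000` then addresses `0x103E`.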

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then

base = Z[n];

Z[t] = result;

LD1SW (scalar plus immediate)

Contiguous load signed words to vector (immediate index)

Contiguous load of signed words to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];

Z[t] = result;

LD1SW (scalar plus scalar)

Contiguous load signed words to vector (scalar index)

Contiguous load of signed words to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 4 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = FALSE;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = X[m];

Z[t] = result;

LD1SW (scalar plus vector)

Gather load signed words to vector (vector index)

Gather load of signed words to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 4. Inactive elements will not cause a read from Device memory or signal faults, and are set to
zero in the destination vector.
It has encodings from 4 classes: 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset, 64-bit scaled offset
and 64-bit unscaled offset

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 0 0 Pg Rn Zt
U ff

LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
integer offs_size = 32;
boolean unsigned = FALSE;
boolean offs_unsigned = xs == '0';
integer scale = 2;

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 0 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 0 0 Pg Rn Zt
U ff

LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
integer offs_size = 64;
boolean unsigned = FALSE;
boolean offs_unsigned = TRUE;
integer scale = 2;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 0 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = Z[m];

Z[t] = result;

LD1SW (vector plus immediate)

Gather load signed words to vector (immediate index)

Gather load of signed words to active elements of a vector register from memory addresses generated by a vector base
plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not cause a read from
Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 0 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1SW { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = FALSE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then

base = Z[n];

Z[t] = result;

LD1W (scalar plus immediate)

Contiguous load unsigned words to vector (immediate index)

Contiguous load of unsigned words to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element
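
The only difference between LD1W and LD1SW in the decode blocks is the `unsigned` flag, which controls how a 32-bit memory value is widened into the destination element. A minimal Python sketch of that widening follows; the function name is hypothetical and the result is returned as an `esize`-bit pattern.

```python
def extend_to_element(data: int, msize: int, esize: int, unsigned: bool) -> int:
    """Return the esize-bit pattern produced by extending an msize-bit value."""
    data &= (1 << msize) - 1
    if not unsigned and data >> (msize - 1):
        # Signed load: replicate the sign bit into the upper esize - msize bits.
        data |= ((1 << esize) - 1) & ~((1 << msize) - 1)
    return data
```

For the 32-bit element class `msize` equals `esize`, so the extension has no effect; for the 64-bit element class the upper 32 bits are zero for LD1W and copies of the sign bit for LD1SW.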

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 32;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 0 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];

Z[t] = result;

LD1W (scalar plus scalar)

Contiguous load unsigned words to vector (scalar index)

Contiguous load of unsigned words to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 4 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 32;
boolean unsigned = TRUE;

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 Rm 0 1 0 Pg Rn Zt
dtype<3:1>dtype<0>

LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = TRUE;
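
The two encoding classes differ only in `esize`, which changes how many words a single instruction can load. The following sketch, assuming a hypothetical helper name and a vector length given in bits, derives the element count from `VL DIV esize` and the largest number of memory bytes touched when every element is active.

```python
def ld1w_footprint(vl_bits, esize):
    """Return (elements, max_bytes_read) for an LD1W with the given element size."""
    msize = 32                     # memory element size for word loads
    elements = vl_bits // esize    # mirrors 'VL DIV esize' in the Operation text
    return elements, elements * (msize // 8)
```

With a 256-bit vector, the 32-bit element form covers 8 words (32 bytes), while the 64-bit element form covers 4 words (16 bytes), each widened into a 64-bit element.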

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = result;

LD1W (scalar plus vector)

Gather load unsigned words to vector (vector index)

Gather load of unsigned words to active elements of a vector register from memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 4. Inactive elements will not cause a read from Device memory or signal faults, and are set to
zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset
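
As a rough illustration of the gather addressing described above, the sketch below computes one address per active element. The function names and the list-based vector and predicate representations are assumptions for the example; `offs_size`, `offs_unsigned` and `scale` follow the names used in the decode text.

```python
def extend_offset(raw, offs_size, offs_unsigned):
    """Keep the low offs_size bits of an index element, then zero- or sign-extend."""
    value = raw & ((1 << offs_size) - 1)
    if not offs_unsigned and value >> (offs_size - 1):
        value -= 1 << offs_size  # interpret the top bit as a sign bit (SXTW)
    return value


def gather_addresses(base, offsets, pred, offs_size=32, offs_unsigned=True, scale=0):
    """Return a 64-bit address per active element, or None for inactive elements."""
    addrs = []
    for raw, active in zip(offsets, pred):
        if active:
            off = extend_offset(raw, offs_size, offs_unsigned)
            addrs.append((base + (off << scale)) & (2**64 - 1))
        else:
            addrs.append(None)  # inactive elements generate no access
    return addrs
```

With `scale=2` and a sign-extended index of `0xFFFFFFFF`, the element address is `base - 4`; the same index treated as unsigned gives `base + 0x3FFFFFFFC`.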

32-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 1 Zm 0 1 0 Pg Rn Zt
U ff

LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #2]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 32;
integer offs_size = 32;
boolean unsigned = TRUE;
boolean offs_unsigned = xs == '0';
integer scale = 2;

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 1 0 Pg Rn Zt
U ff

LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
integer offs_size = 32;
boolean unsigned = TRUE;
boolean offs_unsigned = xs == '0';
integer scale = 2;

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 1 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

32-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 0 Zm 0 1 0 Pg Rn Zt
U ff

LD1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 1 0 Pg Rn Zt
U ff

LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
integer offs_size = 64;
boolean unsigned = TRUE;
boolean offs_unsigned = TRUE;
integer scale = 2;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 1 0 Pg Rn Zt
msz<1>msz<0> U ff

LD1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

Z[t] = result;

LD1W (vector plus immediate)

Gather load unsigned words to vector (immediate index)

Gather load of unsigned words to active elements of a vector register from memory addresses generated by a vector
base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements will not cause a read
from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 32;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 1 0 Pg Zn Zt
msz<1>msz<0> U ff

LD1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.
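
The constraint on `<imm>` can be checked mechanically. The helper below is a hypothetical assembler-side sketch, assuming the byte offset is stored in "imm5" divided by the 4-byte word size; it is not taken from any real assembler.

```python
def encode_ld1w_imm5(imm=0):
    """Map a byte offset (multiple of 4 in 0..124) to a 5-bit field value."""
    if imm % 4 != 0 or not 0 <= imm <= 124:
        raise ValueError("offset must be a multiple of 4 in the range 0 to 124")
    return imm // 4
```

An omitted offset encodes as 0, and the largest offset, 124, encodes as 31.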

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(msize) data;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then
    base = Z[n];

Z[t] = result;

LD2B (scalar plus immediate)

Contiguous load two-byte structures to two vectors (immediate index)

Contiguous load two-byte structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that
is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

LD2B { <Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer offset = SInt(imm4);
integer nreg = 2;
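
The decode step above can be mirrored by extracting fields from a 32-bit instruction word. The sketch assumes the field positions implied by the encoding diagram (fixed bits in 31:20 and 15:13, imm4 in 19:16, Pg in 12:10, Rn in 9:5, Zt in 4:0); the function name and the dictionary result are illustrative choices.

```python
def decode_ld2b_imm(word):
    """Return decoded fields for LD2B (scalar plus immediate), or None if no match."""
    if (word >> 20) & 0xFFF != 0b101001000010 or (word >> 13) & 0x7 != 0b111:
        return None
    imm4 = (word >> 16) & 0xF
    return {
        "t": word & 0x1F,                             # UInt(Zt)
        "n": (word >> 5) & 0x1F,                      # UInt(Rn)
        "g": (word >> 10) & 0x7,                      # UInt(Pg)
        "offset": imm4 - 16 if imm4 & 0x8 else imm4,  # SInt(imm4)
        "esize": 8,
        "nreg": 2,
    }
```

Note that `offset` is the raw signed field value; the assembler-level `<imm>` is twice this value.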

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.
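
The range of `<imm>` follows from a signed 4-bit field scaled by the number of registers. A minimal sketch, assuming hypothetical helper names and that the field holds `<imm>` divided by `nreg` in two's complement:

```python
def encode_imm4(imm, nreg):
    """Encode a vector offset that is a multiple of nreg into a 4-bit field."""
    if imm % nreg != 0:
        raise ValueError("offset must be a multiple of the register count")
    scaled = imm // nreg
    if not -8 <= scaled <= 7:
        raise ValueError("offset out of range for a signed 4-bit field")
    return scaled & 0xF  # two's complement in 4 bits


def decode_imm4(field):
    """Inverse of the field encoding, matching SInt(imm4) in the decode text."""
    return field - 16 if field & 0x8 else field
```

For two-register loads this yields the stated range of -16 to 14; with `nreg = 3` the same field covers -24 to 21.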

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
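
The element loop above de-interleaves two-byte structures into two vectors. A small Python model of that loop follows; the `memory` bytes object and the boolean-list predicate are assumptions for the example, and alignment checks, tag checks and faults are deliberately ignored.

```python
def ld2b_scalar_plus_imm(memory, base, offset, pred):
    """Model the LD2B element loop for one predicate vector.

    offset is the signed imm4 value (SInt(imm4)), not the assembler <imm>.
    """
    elements = len(pred)  # stands in for VL DIV esize with esize = 8
    nreg, mbytes = 2, 1
    values = [[0] * elements for _ in range(nreg)]  # inactive elements stay zero
    for e in range(elements):
        for r in range(nreg):
            if pred[e]:
                eoff = offset * elements * nreg + e * nreg + r
                values[r][e] = memory[base + eoff * mbytes]
    return values
```

With four elements and `offset = -1`, the load begins 8 bytes below `base`; even-numbered bytes land in the first vector and odd-numbered bytes in the second.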

LD2B (scalar plus scalar)

Contiguous load two-byte structures to two vectors (scalar index)

Contiguous load two-byte structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register which is added to the base address. After each
structure access the index value is incremented by two. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>

LD2B { <Zt1>.B, <Zt2>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;
integer nreg = 2;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, AccType_SVE];
        else
            Elem[values[r], e, esize] = Zeros();

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD2D (scalar plus immediate)

Contiguous load two-doubleword structures to two vectors (immediate index)

Contiguous load two-doubleword structures, each to the same element number in two vector registers from the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to
14 that is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

LD2D { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer offset = SInt(imm4);
integer nreg = 2;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD2D (scalar plus scalar)

Contiguous load two-doubleword structures to two vectors (scalar index)

Contiguous load two-doubleword structures, each to the same element number in two vector registers from the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by two. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector
registers.
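
To make the scaled scalar index concrete, the following sketch lists the address of each structure member. It assumes the element offset is formed as index + e * nreg + r and then multiplied by the element size in bytes, as in the byte form of this instruction family; the function name and the list representations are hypothetical.

```python
def ldn_scalar_index_addresses(base, index, pred, nreg=2, esize=64):
    """Return, per element, the address of each structure member (None if inactive)."""
    mbytes = esize // 8  # 8 for doublewords, matching LSL #3
    addrs = []
    for e, active in enumerate(pred):
        row = []
        for r in range(nreg):
            # The index steps by nreg per structure; the register itself is unchanged.
            row.append(base + (index + e * nreg + r) * mbytes if active else None)
        addrs.append(row)
    return addrs
```

With `base = 0x1000` and `index = 2`, the first structure occupies addresses 0x1010 and 0x1018.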
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>

LD2D { <Zt1>.D, <Zt2>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer nreg = 2;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD2H (scalar plus immediate)

Contiguous load two-halfword structures to two vectors (immediate index)

Contiguous load two-halfword structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that
is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

LD2H { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer offset = SInt(imm4);
integer nreg = 2;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD2H (scalar plus scalar)

Contiguous load two-halfword structures to two vectors (scalar index)

Contiguous load two-halfword structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by two. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>

LD2H { <Zt1>.H, <Zt2>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer nreg = 2;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD2W (scalar plus immediate)

Contiguous load two-word structures to two vectors (immediate index)

Contiguous load two-word structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that
is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive words in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

LD2W { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer offset = SInt(imm4);
integer nreg = 2;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD2W (scalar plus scalar)

Contiguous load two-word structures to two vectors (scalar index)

Contiguous load two-word structures, each to the same element number in two vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by two. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive words in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the two destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>

LD2W { <Zt1>.S, <Zt2>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer nreg = 2;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD3B (scalar plus immediate)

Contiguous load three-byte structures to three vectors (immediate index)

Contiguous load three-byte structures, each to the same element number in three vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that
is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination vector
registers.
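
Because the immediate is scaled by the vector's in-memory size, the byte displacement depends on the implemented vector length. The sketch below works this out for a three-register load, assuming the element-offset term offset * elements * nreg used by the structure loads in this family; the helper name and the `vl_bits` parameter are illustrative.

```python
def ld3_imm_byte_offset(imm, vl_bits, esize=8):
    """Byte displacement from the base for a given <imm> and vector length."""
    nreg = 3
    if imm % nreg != 0 or not -24 <= imm <= 21:
        raise ValueError("offset must be a multiple of 3 in the range -24 to 21")
    offset = imm // nreg           # value carried in imm4
    elements = vl_bits // esize    # VL DIV esize
    mbytes = esize // 8
    return offset * elements * nreg * mbytes
```

The result always equals `imm` multiplied by the vector length in bytes, so `#3, MUL VL` skips exactly one group of three vectors.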
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

LD3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer offset = SInt(imm4);
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD3B (scalar plus scalar)

Contiguous load three-byte structures to three vectors (scalar index)

Contiguous load three-byte structures, each to the same element number in three vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register which is added to the base address. After each
structure access the index value is incremented by three. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination vector
registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>

LD3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
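
The "modulo 32" wording means the register list wraps from Z31 back to Z0. A one-line sketch, with a hypothetical helper name, shows which registers a given "Zt" value selects:

```python
def structure_registers(t, nreg):
    """Names of the destination registers selected by field value t."""
    return [f"Z{(t + r) % 32}" for r in range(nreg)]
```

For instance, `t = 31` with three registers selects Z31, Z0 and Z1.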

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD3D (scalar plus immediate)

Contiguous load three-doubleword structures to three vectors (immediate index)

Contiguous load three-doubleword structures, each to the same element number in three vector registers from the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to
21 that is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read
from Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination
vector registers.
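
The immediate-index address arithmetic described above can be sketched in Python. This is an illustration rather than Arm pseudocode: the function name `ld3d_imm_base_address` and the parameter `vl_bits` (the current vector length in bits) are assumptions introduced for the example.

```python
def ld3d_imm_base_address(base: int, imm: int, vl_bits: int) -> int:
    """Address of the first structure for LD3D with an immediate index.

    imm is the assembler-level immediate (a multiple of 3, -24 to 21).
    It counts whole vectors, so it is scaled by the vector size in bytes.
    """
    if imm % 3 != 0 or not -24 <= imm <= 21:
        raise ValueError("imm must be a multiple of 3 in the range -24 to 21")
    vector_bytes = vl_bits // 8
    # Wrap to 64 bits, as a bits(64) address would.
    return (base + imm * vector_bytes) & 0xFFFF_FFFF_FFFF_FFFF
```

With a 256-bit vector length, `#3, MUL VL` advances the address by 96 bytes, which is one group of three full vectors.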

Bits   31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1010010  11 (msz)  10     0   imm4   111    Pg     Rn   Zt

LD3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer offset = SInt(imm4);
integer nreg = 3;
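
The relationship between the assembler immediate and the 4-bit field can be sketched as follows. The helper names `sint4`, `imm_from_imm4` and `imm4_from_imm` are hypothetical; the only facts used are that `offset = SInt(imm4)` and that the assembler immediate is a multiple of `nreg` spanning -24 to 21 for three registers.

```python
def sint4(imm4: int) -> int:
    """Interpret a 4-bit field as a signed integer, like SInt(imm4)."""
    if not 0 <= imm4 <= 0xF:
        raise ValueError("imm4 must fit in 4 bits")
    return imm4 - 16 if imm4 & 0x8 else imm4


def imm_from_imm4(imm4: int, nreg: int = 3) -> int:
    """Assembler immediate implied by the encoded field."""
    return sint4(imm4) * nreg


def imm4_from_imm(imm: int, nreg: int = 3) -> int:
    """Encoded 4-bit field for an assembler immediate."""
    if imm % nreg != 0 or not -8 <= imm // nreg <= 7:
        raise ValueError("immediate not encodable")
    return (imm // nreg) & 0xF
```

The same helpers cover the four-register forms by passing `nreg=4`, which gives the range -32 to 28.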

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
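
The per-element transfer that fills `values` can be modelled from the prose description of this instruction. The sketch below is an assumption-laden illustration, not the manual's pseudocode: memory is a flat `bytes` buffer, data is read little-endian, the predicate is a list of booleans, and the name `ld3d_imm_model` is hypothetical.

```python
def ld3d_imm_model(mem: bytes, base: int, offset: int, active: list) -> list:
    """Model LD3D (scalar plus immediate) on a flat little-endian buffer.

    offset is the decoded SInt(imm4) value; active has one flag per element.
    """
    nreg, mbytes = 3, 8
    elements = len(active)
    values = [[0] * elements for _ in range(nreg)]
    start = base + offset * elements * nreg * mbytes
    for e in range(elements):
        for r in range(nreg):
            if active[e]:
                addr = start + (e * nreg + r) * mbytes
                values[r][e] = int.from_bytes(mem[addr:addr + mbytes], "little")
            # Inactive elements are not read and stay zero.
    return values
```

Each returned list stands for one destination vector, so `values[0]` holds the first doubleword of every structure, `values[1]` the second, and `values[2]` the third.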

LD3D (scalar plus scalar)

Contiguous load three-doubleword structures to three vectors (scalar index)

Contiguous load three-doubleword structures, each to the same element number in three vector registers from the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by three. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read
from Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination
vector registers.
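
The scalar-index addressing described above can be sketched as a generator of per-element addresses. The name `ld3d_scalar_addresses` and the plain-integer treatment of the index are assumptions for illustration; the shift by 3 corresponds to the `LSL #3` scaling for doublewords.

```python
def ld3d_scalar_addresses(base: int, index: int, elements: int):
    """Yield (element, register, address) for LD3D (scalar plus scalar).

    The index counts doublewords, so it is shifted left by 3 (LSL #3).
    It steps by three per structure; the Xm register itself is unchanged.
    """
    mask64 = 0xFFFF_FFFF_FFFF_FFFF
    for e in range(elements):
        for r in range(3):
            yield e, r, (base + ((index + e * 3 + r) << 3)) & mask64
```

For a base of 0x1000 and an index of 2, the first structure starts at 0x1010 and the second at 0x1028.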

Bits   31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
Value  1010010  11 (msz)  10     Rm     110    Pg     Rn   Zt

LD3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD3H (scalar plus immediate)

Contiguous load three-halfword structures to three vectors (immediate index)

Contiguous load three-halfword structures, each to the same element number in three vector registers from the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to
21 that is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read
from Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination
vector registers.

Bits   31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1010010  01 (msz)  10     0   imm4   111    Pg     Rn   Zt

LD3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]
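
As an illustration of how the fields above combine into a 32-bit instruction word, the following sketch assembles the scalar plus immediate form. The function name `encode_ldn_imm` and the lookup tables `MSZ` and `OPC` are hypothetical; the bit positions and values are taken from the encoding diagrams of the three-register and four-register loads in this section.

```python
MSZ = {"B": 0b00, "H": 0b01, "W": 0b10, "D": 0b11}   # bits 24-23
OPC = {3: 0b10, 4: 0b11}                              # bits 22-21


def encode_ldn_imm(size: str, nreg: int, zt: int, pg: int, rn: int,
                   imm: int = 0) -> int:
    """Build the 32-bit word for LD3x/LD4x (scalar plus immediate)."""
    if imm % nreg != 0 or not -8 <= imm // nreg <= 7:
        raise ValueError("immediate not encodable")
    if not (0 <= zt <= 31 and 0 <= pg <= 7 and 0 <= rn <= 31):
        raise ValueError("register field out of range")
    imm4 = (imm // nreg) & 0xF
    # Bit 20 is zero in this form; Rn == 31 selects the stack pointer.
    return (0b1010010 << 25 | MSZ[size] << 23 | OPC[nreg] << 21
            | imm4 << 16 | 0b111 << 13 | pg << 10 | rn << 5 | zt)
```

For example, `encode_ldn_imm("H", 3, zt=0, pg=0, rn=0)` corresponds to `LD3H { Z0.H, Z1.H, Z2.H }, P0/Z, [X0]`.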

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer offset = SInt(imm4);
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD3H (scalar plus scalar)

Contiguous load three-halfword structures to three vectors (scalar index)

Contiguous load three-halfword structures, each to the same element number in three vector registers from the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by three. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read
from Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination
vector registers.

Bits   31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
Value  1010010  01 (msz)  10     Rm     110    Pg     Rn   Zt

LD3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer nreg = 3;
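
The decode steps above can be mirrored by a small field extractor for the scalar plus scalar form. This is a sketch under stated assumptions: the name `decode_ldn_scalar` and the returned dictionary layout are hypothetical, only the LD3 and LD4 opcodes shown in this section are handled, and `UNDEFINED` is modelled as a Python exception.

```python
def decode_ldn_scalar(word: int) -> dict:
    """Split an LD3x/LD4x (scalar plus scalar) word into its fields."""
    if (word >> 25) & 0x7F != 0b1010010 or (word >> 13) & 0x7 != 0b110:
        raise ValueError("not an LDn (scalar plus scalar) encoding")
    opc = (word >> 21) & 0x3
    if opc not in (0b10, 0b11):
        raise ValueError("only LD3 and LD4 are handled in this sketch")
    rm = (word >> 16) & 0x1F
    if rm == 0b11111:
        raise ValueError("UNDEFINED: Rm == '11111'")
    msz = (word >> 23) & 0x3
    return {
        "nreg": opc + 1,
        "esize": 8 << msz,          # 8, 16, 32 or 64 bits
        "t": word & 0x1F,
        "n": (word >> 5) & 0x1F,
        "g": (word >> 10) & 0x7,
        "m": rm,
    }
```

The word 0xA4C1C000 decodes to `LD3H { Z0.H, Z1.H, Z2.H }, P0/Z, [X0, X1, LSL #1]`.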

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD3W (scalar plus immediate)

Contiguous load three-word structures to three vectors (immediate index)

Contiguous load three-word structures, each to the same element number in three vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive words in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination vector
registers.

Bits   31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1010010  10 (msz)  10     0   imm4   111    Pg     Rn   Zt

LD3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer offset = SInt(imm4);
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
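
The "plus 1 modulo 32" and "plus 2 modulo 32" rules for the register list can be illustrated with a short helper. The name `transfer_registers` is hypothetical.

```python
def transfer_registers(zt: int, nreg: int = 3) -> list:
    """Names of the consecutive Z registers, wrapping from Z31 to Z0."""
    if not 0 <= zt <= 31:
        raise ValueError("Zt is a 5-bit field")
    return [f"Z{(zt + r) % 32}" for r in range(nreg)]
```

A first register of Z30 therefore yields the list Z30, Z31, Z0.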

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
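
`AnyActiveElement` is defined elsewhere in the manual. The sketch below shows one way to model it, assuming the usual SVE predicate layout in which a predicate holds one bit per vector byte and the lowest bit of each element's group decides whether that element is active. The names `elem_active` and `any_active_element`, and the explicit `vl_bits` parameter, are assumptions for illustration.

```python
def elem_active(mask: int, e: int, esize: int) -> bool:
    """True if element e is active; mask holds one predicate bit per byte."""
    psize = esize // 8          # predicate bits per element
    return ((mask >> (e * psize)) & 1) == 1


def any_active_element(mask: int, esize: int, vl_bits: int) -> bool:
    """True if at least one element of the vector is active."""
    elements = vl_bits // esize
    return any(elem_active(mask, e, esize) for e in range(elements))
```

For 32-bit elements only predicate bits 0, 4, 8 and so on are significant, so a mask of 0x000E leaves every word element inactive.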

LD3W (scalar plus scalar)

Contiguous load three-word structures to three vectors (scalar index)

Contiguous load three-word structures, each to the same element number in three vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by three. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive words in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the three destination vector
registers.

Bits   31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
Value  1010010  10 (msz)  10     Rm     110    Pg     Rn   Zt

LD3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD4B (scalar plus immediate)

Contiguous load four-byte structures to four vectors (immediate index)

Contiguous load four-byte structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.

Bits   31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1010010  00 (msz)  11     0   imm4   111    Pg     Rn   Zt

LD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer offset = SInt(imm4);
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD4B (scalar plus scalar)

Contiguous load four-byte structures to four vectors (scalar index)

Contiguous load four-byte structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register that is added to the base address. After each
structure access the index value is incremented by four. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive bytes in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.

Bits   31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
Value  1010010  00 (msz)  11     Rm     110    Pg     Rn   Zt

LD4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
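
The stack pointer handling in the conditional above can be summarised as a small decision function. The name `sp_alignment_checked` is hypothetical, and the boolean `check_when_none_active` stands in for the implementation's choice returned by `ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE)`.

```python
def sp_alignment_checked(n: int, any_active: bool,
                         check_when_none_active: bool) -> bool:
    """Whether CheckSPAlignment() runs for a given base register and predicate."""
    if n != 31:
        return False            # base is X[n], not SP
    if any_active:
        return True             # SP is read as the base address
    # No active elements: the check is an implementation choice.
    return check_when_none_active
```

Software that wants the same behaviour on every implementation can keep SP aligned whenever it is used as the base, whether or not any element is active.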

LD4D (scalar plus immediate)

Contiguous load four-doubleword structures to four vectors (immediate index)

Contiguous load four-doubleword structures, each to the same element number in four vector registers from the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to
28 that is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector
registers.

Bits   31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1010010  11 (msz)  11     0   imm4   111    Pg     Rn   Zt

LD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer offset = SInt(imm4);
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];
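
The quantities `elements` and `mbytes` in the pseudocode above depend only on the vector length and element size. The following sketch computes them for LD4D, along with the number of bytes spanned when every element is active. The name `ld4d_footprint` and the returned dictionary are hypothetical; the sketch assumes that SVE vector lengths are multiples of 128 bits up to 2048.

```python
def ld4d_footprint(vl_bits: int) -> dict:
    """Element count and memory span for one LD4D at a given vector length."""
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("vector length must be a multiple of 128, up to 2048")
    esize, nreg = 64, 4
    elements = vl_bits // esize          # integer elements = VL DIV esize
    mbytes = esize // 8                  # constant integer mbytes = esize DIV 8
    return {
        "elements": elements,
        "mbytes": mbytes,
        "bytes_spanned": elements * nreg * mbytes,
    }
```

At a 512-bit vector length, one LD4D covers eight structures of 32 bytes each, or 256 bytes in total.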

LD4D (scalar plus scalar)

Contiguous load four-doubleword structures to four vectors (scalar index)

Contiguous load four-doubleword structures, each to the same element number in four vector registers from the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by four. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive doublewords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector
registers.

Bits   31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
Value  1010010  11 (msz)  11     Rm     110    Pg     Rn   Zt

LD4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD4H (scalar plus immediate)

Contiguous load four-halfword structures to four vectors (immediate index)

Contiguous load four-halfword structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector
registers.

Bits   31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1010010  01 (msz)  11     0   imm4   111    Pg     Rn   Zt

LD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer offset = SInt(imm4);
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD4H (scalar plus scalar)

Contiguous load four-halfword structures to four vectors (scalar index)

Contiguous load four-halfword structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by four. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive halfwords in memory which make up each structure. Inactive elements will not cause a read from
Device memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector
registers.

Bits   31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
Value  1010010  01 (msz)  11     Rm     110    Pg     Rn   Zt

LD4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]
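
The assembler syntax above can be produced from decoded field values with a short formatter. The name `format_ld4h_scalar` is hypothetical, and the sketch assumes that register number 31 in the base position denotes SP, as the `<Xn|SP>` symbol indicates.

```python
def format_ld4h_scalar(zt: int, pg: int, xn: int, xm: int) -> str:
    """Render LD4H (scalar plus scalar) in the syntax shown above."""
    if not (0 <= zt <= 31 and 0 <= pg <= 7 and 0 <= xn <= 31):
        raise ValueError("register field out of range")
    if not 0 <= xm <= 30:
        raise ValueError("Rm == 31 is UNDEFINED for this form")
    regs = ", ".join(f"Z{(zt + r) % 32}.H" for r in range(4))
    base = "SP" if xn == 31 else f"X{xn}"
    # Halfword elements are 2 bytes, so the index is scaled by LSL #1.
    return f"LD4H {{ {regs} }}, P{pg}/Z, [{base}, X{xm}, LSL #1]"
```

The shift amount is fixed by the element size: 1 for halfwords, 2 for words, and 3 for doublewords.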

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD4W (scalar plus immediate)

Contiguous load four-word structures to four vectors (immediate index)

Contiguous load four-word structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive words in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.

Bits   31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1010010  10 (msz)  11     0   imm4   111    Pg     Rn   Zt

LD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer offset = SInt(imm4);
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.
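
As an informal illustration of the addressing described above, the following Python sketch lists the byte address of every word that LD4W (scalar plus immediate) would read. It is not the architectural pseudocode: the function name ld4w_imm_addresses, the use of the assembler-level immediate as an argument, and the omission of 64-bit address wrap-around are assumptions made for the example.

```python
def ld4w_imm_addresses(base, imm, vl_bits):
    """Return addrs[r][e]: byte address loaded into element e of register r.

    Assumptions (not taken verbatim from the specification): `imm` is the
    assembler-level immediate, and each structure is four consecutive
    32-bit words in memory.
    """
    if imm % 4 != 0 or not -32 <= imm <= 28:
        raise ValueError("imm must be a multiple of 4 in the range -32 to 28")
    nreg = 4
    mbytes = 4                      # esize DIV 8 with esize = 32
    elements = vl_bits // 32        # VL DIV esize
    vector_bytes = vl_bits // 8     # in-memory size of one vector
    start = base + imm * vector_bytes
    # Word r of structure e goes to element e of destination register r.
    return [[start + (e * nreg + r) * mbytes for e in range(elements)]
            for r in range(nreg)]
```

With a 128-bit vector length, an immediate of 4 moves the starting address forward by 64 bytes, which is exactly the amount of memory one LD4W covers at that vector length.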

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LD4W (scalar plus scalar)

Contiguous load four-word structures to four vectors (scalar index)

Contiguous load four-word structures, each to the same element number in four vector registers from the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by four. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive words in memory which make up each structure. Inactive elements will not cause a read from Device
memory or signal a fault, and the corresponding element is set to zero in each of the four destination vector registers.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>

LD4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
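
The de-interleaving and zeroing predication described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the architectural pseudocode: memory is a bytes-like object indexed from address 0, data is little-endian, the list `active` stands in for the governing predicate, and the function name ld4w_scalar_index is hypothetical.

```python
import struct

def ld4w_scalar_index(memory, base, xm, active, vl_bits):
    """Model LD4W { Zt1-Zt4 }, Pg/Z, [Xn, Xm, LSL #2] on a bytes-like `memory`.

    `active[e]` stands in for the governing predicate bit of element e.
    Returns four lists of 32-bit values; inactive elements are zero and
    their memory is never read. Little-endian data is an assumption here.
    """
    nreg, mbytes = 4, 4
    elements = vl_bits // 32
    values = [[0] * elements for _ in range(nreg)]
    for e in range(elements):
        if not active[e]:
            continue                      # zeroing predication, no access
        for r in range(nreg):
            # The index advances by four per structure and is scaled by
            # the word size; Xm itself is left unchanged.
            addr = base + (xm + e * nreg + r) * mbytes
            (values[r][e],) = struct.unpack_from("<I", memory, addr)
    return values
```

One predicate element governs all four words of a structure, so an inactive element leaves a zero in the same lane of each of the four result lists.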

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for r = 0 to nreg-1
    Z[(t+r) MOD 32] = values[r];

LDFF1B (scalar plus scalar)

Contiguous load first-fault unsigned bytes to vector (scalar index)

Contiguous load with first-faulting behavior of unsigned bytes to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements will not cause a
read from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 4 classes: 8-bit element, 16-bit element, 32-bit element and 64-bit element

8-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;
integer msize = 8;
boolean unsigned = TRUE;

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 8;
boolean unsigned = TRUE;

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = TRUE;

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = TRUE;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
bits(64) offset;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_CNOTFIRST];
    else
        (data, fault) = (Zeros(msize), FALSE);

    // FFR elements set to FALSE following a suppressed access/fault
    faulted = faulted || fault;
    if faulted then
        ElemFFR[e, esize] = '0';

    // Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE
    unknown = unknown || ElemFFR[e, esize] == '0';
    if unknown then
        if !fault && ConstrainUnpredictableBool(Unpredictable_SVELDNFDATA) then
            Elem[result, e, esize] = Extend(data, esize, unsigned);
        elsif ConstrainUnpredictableBool(Unpredictable_SVELDNFZERO) then
            Elem[result, e, esize] = Zeros();
        else // merge
            Elem[result, e, esize] = Elem[orig, e, esize];
    else
        Elem[result, e, esize] = Extend(data, esize, unsigned);

Z[t] = result;
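
The first-fault behavior in the Operation pseudocode above can be approximated by the following Python sketch. It rests on several assumptions that are not part of the specification: the callback read_byte is a hypothetical stand-in for a memory access that raises MemoryError on a fault, the predicate and FFR are plain lists of booleans, and elements whose value is CONSTRAINED UNPREDICTABLE are simply set to zero (one of the outcomes the pseudocode permits).

```python
def ldff1b_contiguous(read_byte, base, xm, active, ffr, esize_bits, vl_bits):
    """Sketch of LDFF1B { Zt }, Pg/Z, [Xn, Xm] with first-fault behaviour.

    `read_byte(addr)` is a hypothetical callback returning an int, or raising
    MemoryError when the access would fault. `ffr` is a list of booleans that
    is updated in place. Elements at or after a FALSE FFR element are
    CONSTRAINED UNPREDICTABLE in the architecture; this model picks zero.
    """
    elements = vl_bits // esize_bits
    result = [0] * elements
    first, faulted, unknown = True, False, False
    for e in range(elements):
        data, fault = 0, False
        if active[e]:
            addr = base + (xm + e)          # mbytes == 1 for byte loads
            if first:
                data = read_byte(addr)      # a fault here reaches the caller
                first = False
            else:
                try:
                    data = read_byte(addr)
                except MemoryError:
                    fault = True            # suppressed, recorded in FFR
        faulted = faulted or fault
        if faulted:
            ffr[e] = False
        unknown = unknown or not ffr[e]
        result[e] = 0 if unknown else data  # unsigned bytes: no sign extension
    return result
```

Software normally inspects the FFR after such a load to find how many leading elements are valid, then retries from the first element whose FFR bit is FALSE.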

LDFF1B (scalar plus vector)

Gather load first-fault unsigned bytes to vector (vector index)

Gather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally sign or zero-extended
from 32 to 64 bits. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in
the destination vector.
It has encodings from 3 classes: 32-bit unpacked unscaled offset, 32-bit unscaled offset and 64-bit unscaled offset

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

32-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 1 1 Pg Rn Zt
U ff

LDFF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW
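
The effect of the index extend specifier on the generated address can be shown with a short Python sketch. This is an illustration only: the function name gather_address and the convention that xs=None selects the 64-bit offset form are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1

def gather_address(base, index_elem, xs=None, scale=0):
    """Address of one gather element: base + (extended index << scale).

    xs=0 selects UXTW and xs=1 selects SXTW (both use the low 32 bits of
    the index element); xs=None models the 64-bit offset form, which uses
    the element unchanged. `scale` is 0 for the unscaled byte forms.
    """
    if xs is None:
        off = index_elem & MASK64
    else:
        off = index_elem & 0xFFFFFFFF
        if xs == 1 and off & 0x80000000:
            off -= 1 << 32              # sign-extend from 32 bits
    return (base + (off << scale)) & MASK64
```

For an index element holding 0xFFFFFFFF, UXTW yields an offset of 4294967295 bytes, whereas SXTW yields an offset of -1, so the two forms address very different locations from the same register contents.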

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);

    // FFR elements set to FALSE following a suppressed access/fault
    faulted = faulted || fault;
    if faulted then
        ElemFFR[e, esize] = '0';

    // Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1B (vector plus immediate)

Gather load first-fault unsigned bytes to vector (immediate index)

Gather load with first-faulting behavior of unsigned bytes to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will
not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1B { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1B { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean first = TRUE;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then
    base = Z[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        if first then
            // Mem[] will not return if a fault is detected for the first active element
            data = Mem[addr, mbytes, AccType_SVE];
            first = FALSE;
        else
            // MemNF[] will return fault=TRUE if access is not performed for any reason
            (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);

    // FFR elements set to FALSE following a suppressed access/fault
    faulted = faulted || fault;
    if faulted then
        ElemFFR[e, esize] = '0';

    // Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1D (scalar plus scalar)

Contiguous load first-fault doublewords to vector (scalar index)

Contiguous load with first-faulting behavior of doublewords to elements of a vector register from the memory address
generated by a 64-bit scalar base and scalar index which is multiplied by 8 and added to the base address. After each
element access the index value is incremented, but the index register is not updated. Inactive elements will not
cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #3}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
boolean unsigned = TRUE;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1D (scalar plus vector)

Gather load first-fault doublewords to vector (vector index)

Gather load with first-faulting behavior of doublewords to active elements of a vector register from memory addresses
generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32
to 64 bits and then optionally multiplied by 8. Inactive elements will not cause a read from Device memory or signal
faults, and are set to zero in the destination vector.
It has encodings from 4 classes: 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset, 64-bit scaled offset
and 64-bit unscaled offset

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 1 Zm 0 1 1 Pg Rn Zt
U ff

LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #3]

if !HaveSVE() then UNDEFINED;

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 1 Zm 1 1 1 Pg Rn Zt
U ff

LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #3]

if !HaveSVE() then UNDEFINED;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1D (vector plus immediate)

Gather load first-fault doublewords to vector (immediate index)

Gather load with first-faulting behavior of doublewords to active elements of a vector register from memory addresses
generated by a vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements
will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 1 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1D { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 248, defaulting to 0,
encoded in the "imm5" field.
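
The relationship between the assembler byte offset and the "imm5" field can be sketched in Python as follows. The sketch assumes that the encoded field is scaled by the memory element size in bytes, as the stated range (a multiple of 8 from 0 to 248) implies; the function names encode_imm5 and element_address are hypothetical.

```python
def encode_imm5(imm, msize_bits):
    """Map the assembler byte offset to the 5-bit "imm5" field."""
    mbytes = msize_bits // 8
    if imm % mbytes != 0 or not 0 <= imm <= 31 * mbytes:
        raise ValueError(
            f"imm must be a multiple of {mbytes} in the range 0 to {31 * mbytes}")
    return imm // mbytes

def element_address(base_elem, imm5, msize_bits, esize_bits=64):
    """Address for one active element: zero-extended base plus scaled offset."""
    base = base_elem & ((1 << esize_bits) - 1)      # ZeroExtend to 64 bits
    return (base + imm5 * (msize_bits // 8)) & ((1 << 64) - 1)
```

For doubleword loads the largest encodable offset is therefore 31 * 8 = 248 bytes, and an offset such as 4 cannot be encoded.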

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then
    base = Z[n];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1H (scalar plus scalar)

Contiguous load first-fault unsigned halfwords to vector (scalar index)

Contiguous load with first-faulting behavior of unsigned halfwords to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is multiplied by 2 and added to the base address.
After each element access the index value is incremented, but the index register is not updated. Inactive elements will
not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 16;
boolean unsigned = TRUE;

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = TRUE;

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = TRUE;
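
The encoding classes of the scalar-plus-scalar first-fault loads shown so far (LDFF1B, LDFF1D and LDFF1H) differ only in the four dtype bits. The Python sketch below decodes them from a 32-bit instruction word. It is an illustration under assumptions: dtype<3:0> is read from the diagrams as bits 24:21, the function and table names are hypothetical, and dtype values belonging to other instructions are treated as no match.

```python
# dtype<3:0> (read here as instruction bits 24:21) -> (mnemonic, msize, esize)
DTYPE = {
    0b0000: ("LDFF1B", 8, 8),
    0b0001: ("LDFF1B", 8, 16),
    0b0010: ("LDFF1B", 8, 32),
    0b0011: ("LDFF1B", 8, 64),
    0b0101: ("LDFF1H", 16, 16),
    0b0110: ("LDFF1H", 16, 32),
    0b0111: ("LDFF1H", 16, 64),
    0b1111: ("LDFF1D", 64, 64),
}

def decode_ldff1_scalar_scalar(insn):
    """Decode a 32-bit word into (mnemonic, msize, esize, Rm, Pg, Rn, Zt).

    Returns None when the fixed bits or dtype do not match an entry above.
    """
    # Fixed bits taken from the diagrams: 31:25 == 1010010, 15:13 == 011
    if (insn >> 25) != 0b1010010 or ((insn >> 13) & 0b111) != 0b011:
        return None
    entry = DTYPE.get((insn >> 21) & 0xF)
    if entry is None:
        return None
    mnemonic, msize, esize = entry
    rm = (insn >> 16) & 0x1F
    pg = (insn >> 10) & 0x7
    rn = (insn >> 5) & 0x1F
    zt = insn & 0x1F
    return (mnemonic, msize, esize, rm, pg, rn, zt)
```

For example, the word 0xA4C26020 has dtype 0110, so it decodes as the 32-bit element class of LDFF1H with Rm = 2, Pg = 0, Rn = 1 and Zt = 0.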

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1H (scalar plus vector)

Gather load first-fault unsigned halfwords to vector (vector index)

Gather load with first-faulting behavior of unsigned halfwords to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-
extended from 32 to 64 bits and then optionally multiplied by 2. Inactive elements will not cause a read from Device
memory or signal faults, and are set to zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset

32-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 1 1 Pg Rn Zt
U ff

LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]

if !HaveSVE() then UNDEFINED;

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 1 1 Pg Rn Zt
U ff

LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]

if !HaveSVE() then UNDEFINED;

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

32-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 1 1 Pg Rn Zt
U ff

LDFF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 1 1 Pg Rn Zt
U ff

LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]

if !HaveSVE() then UNDEFINED;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1H (vector plus immediate)

Gather load first-fault unsigned halfwords to vector (immediate index)

Gather load with first-faulting behavior of unsigned halfwords to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive
elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1H { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1H { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then
    base = Z[n];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1SB (scalar plus scalar)

Contiguous load first-fault signed bytes to vector (scalar index)

Contiguous load with first-faulting behavior of signed bytes to elements of a vector register from the memory address
generated by a 64-bit scalar base and scalar index which is added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 8;
boolean unsigned = FALSE;

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = FALSE;

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = FALSE;
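
The three encoding diagrams above differ only in bits 31:21. The Python sketch below packs the fields into a 32-bit instruction word, with field positions read off those diagrams (Rm in 20:16, Pg in 12:10, Rn in 9:5, Zt in 4:0). The function name and its argument order are hypothetical, and the default `rm=31` assumes that XZR is encoded as register number 31.

```python
# Fixed bits 31:21 for each element size, as shown in the encoding diagrams.
OPCODE_31_21 = {"H": 0b10100101110, "S": 0b10100101101, "D": 0b10100101100}

def encode_ldff1sb_scalar_scalar(size, zt, pg, rn, rm=31):
    """Pack LDFF1SB (scalar plus scalar) fields into a 32-bit word.

    rm defaults to 31, the assumed encoding of XZR for the optional <Xm>.
    """
    if size not in OPCODE_31_21:
        raise ValueError("size must be 'H', 'S' or 'D'")
    if not (0 <= zt < 32 and 0 <= rn < 32 and 0 <= rm < 32):
        raise ValueError("Zt, Rn and Rm are 5-bit fields")
    if not 0 <= pg < 8:
        raise ValueError("Pg is a 3-bit field (P0-P7)")
    return ((OPCODE_31_21[size] << 21) | (rm << 16) | (0b011 << 13)
            | (pg << 10) | (rn << 5) | zt)

print(hex(encode_ldff1sb_scalar_scalar("H", zt=0, pg=0, rn=0, rm=0)))  # 0xa5c06000
```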

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = X[m];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;
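
The `faulted` flag in the pseudocode above is sticky: once one element access has been suppressed, the FFR element for that position and for every later position is cleared. A minimal Python model of just that step is sketched below. The function name `update_ffr` and the list-of-booleans representation are assumptions made for illustration, and the handling of the first active element and of the loaded data is not modelled.

```python
def update_ffr(ffr, faults):
    """Clear FFR from the first suppressed access onwards.

    ffr    -- list of booleans, one per element (the incoming FFR)
    faults -- list of booleans, True where the element's access was suppressed
    """
    faulted = False
    new_ffr = list(ffr)
    for e, fault in enumerate(faults):
        faulted = faulted or fault   # sticky: stays True once set
        if faulted:
            new_ffr[e] = False
    return new_ffr

print(update_ffr([True] * 6, [False, False, True, False, False, False]))
# [True, True, False, False, False, False]
```

Software would typically read the FFR after the load and retry from the first cleared element.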

LDFF1SB (scalar plus vector)

Gather load first-fault signed bytes to vector (vector index)

Gather load with first-faulting behavior of signed bytes to active elements of a vector register from memory addresses
generated by a 64-bit scalar base plus vector index. The index values are optionally sign or zero-extended from 32 to
64 bits. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the
destination vector.
It has encodings from 3 classes: 32-bit unpacked unscaled offset, 32-bit unscaled offset and 64-bit unscaled offset

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 0 Zm 0 0 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

32-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 0 Zm 0 0 1 Pg Rn Zt
U ff

LDFF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = Z[m];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1SB (vector plus immediate)

Gather load first-fault signed bytes to vector (immediate index)

Gather load with first-faulting behavior of signed bytes to active elements of a vector register from memory addresses
generated by a vector base plus immediate index. The index is in the range 0 to 31. Inactive elements will not cause a
read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1SB { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1SB { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then

base = Z[n];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1SH (scalar plus scalar)

Contiguous load first-fault signed halfwords to vector (scalar index)

Contiguous load with first-faulting behavior of signed halfwords to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is multiplied by 2 and added to the base address.
After each element access the index value is incremented, but the index register is not updated. Inactive elements will
not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = FALSE;

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #1}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = FALSE;
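
The decode fields `esize`, `msize` and `unsigned` above determine how each value read from memory is widened to the element size. The Python sketch below shows that widening, assuming ordinary two's-complement sign extension when `unsigned` is FALSE. The helper name `extend` is chosen for illustration and is not taken from this excerpt.

```python
def extend(data, msize, esize, unsigned):
    """Widen an msize-bit memory value to an esize-bit element value."""
    data &= (1 << msize) - 1
    if not unsigned and data & (1 << (msize - 1)):
        data -= 1 << msize              # top bit set: treat as negative
    return data & ((1 << esize) - 1)    # two's-complement bit pattern

print(hex(extend(0x8001, 16, 32, unsigned=False)))  # 0xffff8001
print(hex(extend(0x8001, 16, 32, unsigned=True)))   # 0x8001
```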

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = X[m];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1SH (scalar plus vector)

Gather load first-fault signed halfwords to vector (vector index)

Gather load with first-faulting behavior of signed halfwords to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-
extended from 32 to 64 bits and then optionally multiplied by 2. Inactive elements will not cause a read from Device
memory or signal faults, and are set to zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset

32-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 1 Zm 0 0 1 Pg Rn Zt
U ff

LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #1]

if !HaveSVE() then UNDEFINED;

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 1 Zm 0 0 1 Pg Rn Zt
U ff

LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #1]

if !HaveSVE() then UNDEFINED;

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 xs 0 Zm 0 0 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

32-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 xs 0 Zm 0 0 1 Pg Rn Zt
U ff

LDFF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 1 Zm 1 0 1 Pg Rn Zt
U ff

LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #1]

if !HaveSVE() then UNDEFINED;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 1 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW
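
To make the effect of the extend specifier concrete, the Python sketch below computes gather addresses from a scalar base and a list of vector offsets. It assumes that only the low 32 bits of each offset are used when UXTW or SXTW is given, and that the optional `#1` shift multiplies the extended offset by 2. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1

def extend32(value, mod):
    """Extend the low 32 bits of value to 64 bits per the <mod> specifier."""
    low = value & 0xFFFFFFFF
    if mod == "UXTW":
        return low
    if mod == "SXTW":
        return low - (1 << 32) if low & 0x80000000 else low
    raise ValueError("mod must be 'UXTW' or 'SXTW'")

def gather_addresses(base, offsets, mod, scale=0):
    """Return base + (extended offset << scale) per element, wrapped to 64 bits."""
    return [(base + (extend32(o, mod) << scale)) & MASK64 for o in offsets]

# The same offset bits give very different addresses under the two specifiers.
print([hex(a) for a in gather_addresses(0x2000, [0xFFFFFFFF], "SXTW", scale=1)])
print([hex(a) for a in gather_addresses(0x2000, [0xFFFFFFFF], "UXTW", scale=1)])
```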

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = Z[m];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1SH (vector plus immediate)

Gather load first-fault signed halfwords to vector (immediate index)

Gather load with first-faulting behavior of signed halfwords to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive
elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 1 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1SH { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = FALSE;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 1 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1SH { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = FALSE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.
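
The stated range (a multiple of 2 from 0 to 62) is consistent with the byte offset being the 5-bit field value scaled by the memory element size. The Python sketch below converts between the two under that assumption. The helper names `encode_imm5` and `decode_imm5` are illustrative only.

```python
def encode_imm5(imm, mbytes):
    """Convert a byte offset into the 5-bit imm5 field.

    mbytes is the memory element size in bytes (2 for halfwords).
    """
    if imm % mbytes != 0:
        raise ValueError(f"offset must be a multiple of {mbytes}")
    imm5 = imm // mbytes
    if not 0 <= imm5 <= 31:
        raise ValueError(f"offset must be in the range 0 to {31 * mbytes}")
    return imm5

def decode_imm5(imm5, mbytes):
    """Recover the byte offset from the imm5 field."""
    return imm5 * mbytes

print(encode_imm5(62, 2))  # 31
```

The same helpers cover the byte and word forms by passing `mbytes=1` or `mbytes=4`.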

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then

base = Z[n];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1SW (scalar plus scalar)

Contiguous load first-fault signed words to vector (scalar index)

Contiguous load with first-faulting behavior of signed words to elements of a vector register from the memory address
generated by a 64-bit scalar base and scalar index which is multiplied by 4 and added to the base address. After each
element access the index value is incremented, but the index register is not updated. Inactive elements will not
cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = FALSE;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = X[m];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1SW (scalar plus vector)

Gather load first-fault signed words to vector (vector index)

Gather load with first-faulting behavior of signed words to active elements of a vector register from memory addresses
generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32
to 64 bits and then optionally multiplied by 4. Inactive elements will not cause a read from Device memory or signal
faults, and are set to zero in the destination vector.
It has encodings from 4 classes: 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset, 64-bit scaled offset
and 64-bit unscaled offset

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 0 1 Pg Rn Zt
U ff

LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]

if !HaveSVE() then UNDEFINED;

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 0 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 0 1 Pg Rn Zt
U ff

LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]

if !HaveSVE() then UNDEFINED;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 0 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = Z[m];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1SW (vector plus immediate)

Gather load first-fault signed words to vector (immediate index)

Gather load with first-faulting behavior of signed words to active elements of a vector register from memory addresses
generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements
will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 0 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1SW { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = FALSE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then

base = Z[n];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1W (scalar plus scalar)

Contiguous load first-fault unsigned words to vector (scalar index)

Contiguous load with first-faulting behavior of unsigned words to elements of a vector register from the memory
address generated by a 64-bit scalar base and scalar index which is multiplied by 4 and added to the base address.
After each element access the index value is incremented, but the index register is not updated. Inactive elements will
not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 32;
boolean unsigned = TRUE;

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 Rm 0 1 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, <Xm>, LSL #2}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = TRUE;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the optional 64-bit name of the general-purpose offset register, defaulting to XZR, encoded in the
"Rm" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = X[m];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1W (scalar plus vector)

Gather load first-fault unsigned words to vector (vector index)

Gather load with first-faulting behavior of unsigned words to active elements of a vector register from memory
addresses generated by a 64-bit scalar base plus vector index. The index values are optionally first sign or zero-
extended from 32 to 64 bits and then optionally multiplied by 4. Inactive elements will not cause a read from Device
memory or signal faults, and are set to zero in the destination vector.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset

32-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 1 Zm 0 1 1 Pg Rn Zt
U ff

LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod> #2]

if !HaveSVE() then UNDEFINED;

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 1 Zm 0 1 1 Pg Rn Zt
U ff

LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod> #2]

if !HaveSVE() then UNDEFINED;

32-bit unpacked unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 xs 0 Zm 0 1 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

32-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 xs 0 Zm 0 1 1 Pg Rn Zt
U ff

LDFF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 1 Zm 1 1 1 Pg Rn Zt
U ff

LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D, LSL #2]

if !HaveSVE() then UNDEFINED;

64-bit unscaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 1 0 Zm 1 1 1 Pg Rn Zt
msz<1>msz<0> U ff

LDFF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
CheckSPAlignment();
else
if n == 31 then CheckSPAlignment();
base = if n == 31 then SP[] else X[n];
offset = Z[m];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDFF1W (vector plus immediate)

Gather load first-fault unsigned words to vector (immediate index)

Gather load with first-faulting behavior of unsigned words to active elements of a vector register from memory
addresses generated by a vector base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive
elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1W { <Zt>.S }, <Pg>/Z, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 32;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 1 0 0 1 imm5 1 1 1 Pg Zn Zt
msz<1>msz<0> U ff

LDFF1W { <Zt>.D }, <Pg>/Z, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = TRUE;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.

Operation

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then

base = Z[n];

// FFR elements set to FALSE following a suppressed access/fault

faulted = faulted || fault;
if faulted then
ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDNF1B

Contiguous load non-fault unsigned bytes to vector (immediate index)

Contiguous load with non-faulting behavior of unsigned bytes to elements of a vector register from the memory
address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 4 classes: 8-bit element, 16-bit element, 32-bit element and 64-bit element
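
The "MUL VL" immediate can be easier to follow with numbers. The Python sketch below lists the address of each element, following the `eoff = (offset * elements) + e` and `addr = base + eoff * mbytes` steps of the Operation pseudocode for this instruction. The function name and the choice to pass the vector length in bits are assumptions made for illustration.

```python
MASK64 = (1 << 64) - 1

def ldnf1_addresses(base, imm, vl_bits, esize, msize):
    """Return the per-element byte addresses for an LDNF1-style access.

    imm is the signed immediate (-8 to 7); vl_bits is the vector length in bits.
    """
    if not -8 <= imm <= 7:
        raise ValueError("imm must be in the range -8 to 7")
    elements = vl_bits // esize      # elements per vector
    mbytes = msize // 8              # bytes read from memory per element
    return [(base + (imm * elements + e) * mbytes) & MASK64
            for e in range(elements)]

# 128-bit vector, bytes loaded into 32-bit elements: each immediate step is 4 bytes.
print([hex(a) for a in ldnf1_addresses(0x1000, 1, 128, esize=32, msize=8)])
```

With `esize=32` and `msize=8`, the vector occupies only 4 bytes in memory, so an immediate of 1 moves the window by 4 bytes rather than by the full 16-byte register width.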

8-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1B { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1B { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1B { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
bits(VL) orig = Z[t];
bits(msize) data;
constant integer mbytes = msize DIV 8;
boolean fault = FALSE;
boolean faulted = FALSE;
boolean unknown = FALSE;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        // MemNF[] will return fault=TRUE if access is not performed for any reason
        (data, fault) = MemNF[addr, mbytes, AccType_NONFAULT];
    else
        (data, fault) = (Zeros(msize), FALSE);

    // FFR elements set to FALSE following a suppressed access/fault
    faulted = faulted || fault;
    if faulted then
        ElemFFR[e, esize] = '0';

    // Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;
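
The sticky `faulted` flag in the loop above can be sketched in Python as follows. This is a minimal model under stated assumptions: `update_ffr`, `active`, and `access_fails` are hypothetical names, the FFR is represented as one boolean per element, and `access_fails[e]` stands in for `MemNF[]` returning `fault=TRUE`. It only models the FFR update, not the loaded data.

```python
def update_ffr(ffr, active, access_fails):
    """Clear FFR elements from the first suppressed access onwards."""
    faulted = False
    out = list(ffr)
    for e in range(len(ffr)):
        # Inactive elements perform no access, so they never report a fault.
        fault = active[e] and access_fails[e]
        faulted = faulted or fault
        if faulted:
            out[e] = False
    return out

# Assumed example: element 1 is active and its access is suppressed.
new_ffr = update_ffr([True] * 4,
                     [True, True, False, True],
                     [False, True, False, False])
```

Once one active element's access is suppressed, every later element's FFR bit is cleared as well, including inactive ones, because `faulted` is never reset inside the loop.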

LDNF1D

Contiguous load non-fault doublewords to vector (immediate index)

Contiguous load with non-faulting behavior of doublewords to elements of a vector register from the memory address
generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

Assembler Symbols

Operation

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDNF1H

Contiguous load non-fault unsigned halfwords to vector (immediate index)

Contiguous load with non-faulting behavior of unsigned halfwords to elements of a vector register from the memory
address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1H { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1H { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

Assembler Symbols

Operation

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDNF1SB

Contiguous load non-fault signed bytes to vector (immediate index)

Contiguous load with non-faulting behavior of signed bytes to elements of a vector register from the memory address
generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 3 classes: 16-bit element, 32-bit element and 64-bit element
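
The difference between the signed and unsigned byte loads can be illustrated with a short Python sketch. It assumes that each `msize`-bit value read from memory is widened to `esize` bits according to the `unsigned` flag set during decode; the helper name `extend` and its argument order are hypothetical and chosen for this example only.

```python
def extend(data, msize, esize, unsigned):
    """Widen an msize-bit value to esize bits, zero- or sign-extending."""
    data &= (1 << msize) - 1
    if not unsigned and (data >> (msize - 1)) & 1:
        # Top bit set and signed: interpret as a negative two's complement value.
        data -= 1 << msize
    return data & ((1 << esize) - 1)

# Assumed example: the byte 0x80 loaded into a 16-bit element.
signed_elem = extend(0x80, 8, 16, unsigned=False)
unsigned_elem = extend(0x80, 8, 16, unsigned=True)
```

With `unsigned = FALSE`, as in the decode for this instruction, the upper bits of the element replicate the sign bit of the loaded byte.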

16-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1SB { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1SB { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1SB { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

Assembler Symbols

Operation

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDNF1SH

Contiguous load non-fault signed halfwords to vector (immediate index)

Contiguous load with non-faulting behavior of signed halfwords to elements of a vector register from the memory
address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1SH { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1SH { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

Assembler Symbols

Operation

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDNF1SW

Contiguous load non-fault signed words to vector (immediate index)

Contiguous load with non-faulting behavior of signed words to elements of a vector register from the memory address
generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1SW { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = FALSE;
integer offset = SInt(imm4);

Assembler Symbols

Operation

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDNF1W

Contiguous load non-fault unsigned words to vector (immediate index)

Contiguous load with non-faulting behavior of unsigned words to elements of a vector register from the memory
address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's
in-memory size, irrespective of predication, and added to the base address. Inactive elements will not cause a read
from Device memory or signal a fault, and are set to zero in the destination vector.
It has encodings from 2 classes: 32-bit element and 64-bit element

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 0 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 32;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 1 1 1 imm4 1 0 1 Pg Rn Zt
dtype<3:1>dtype<0>

LDNF1W { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
boolean unsigned = TRUE;
integer offset = SInt(imm4);

Assembler Symbols

Operation

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

// FFR elements set to FALSE following a suppressed access/fault
faulted = faulted || fault;
if faulted then
    ElemFFR[e, esize] = '0';

// Value becomes CONSTRAINED UNPREDICTABLE after an FFR element is FALSE

Z[t] = result;

LDNT1B (scalar plus immediate)

Contiguous load non-temporal bytes to vector (immediate index)

Contiguous load non-temporal of bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements will not cause a read from Device memory or signal
a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

LDNT1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer offset = SInt(imm4);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

Z[t] = result;

LDNT1B (scalar plus scalar)

Contiguous load non-temporal bytes to vector (scalar index)

Contiguous load non-temporal of bytes to elements of a vector register from the memory address generated by a 64-bit
scalar base and scalar index which is added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements will not cause a read from Device memory or
signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 0 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>

LDNT1B { <Zt>.B }, <Pg>/Z, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Elem[result, e, esize] = Mem[addr, mbytes, AccType_SVESTREAM];
    else
        Elem[result, e, esize] = Zeros();

Z[t] = result;
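
The element loop above can be modelled in Python as a rough sketch. The names `ldnt1_scalar_index`, `memory`, and `pred` are hypothetical: memory is assumed to be a dictionary from byte address to byte value, the predicate is one boolean per element, and data is assumed to be little-endian. The non-temporal hint, alignment checks, and faults are not modelled.

```python
MASK64 = (1 << 64) - 1

def ldnt1_scalar_index(memory, base, index, pred, esize):
    """Load one element per active predicate entry; inactive elements are zero."""
    mbytes = esize // 8                      # mbytes = esize DIV 8
    result = []
    for e, active in enumerate(pred):
        if active:
            # addr = base + (UInt(offset) + e) * mbytes
            addr = (base + (index + e) * mbytes) & MASK64
            raw = bytes(memory[(addr + i) & MASK64] for i in range(mbytes))
            result.append(int.from_bytes(raw, "little"))
        else:
            result.append(0)
    return result

# Assumed example: 16 bytes of memory at 0x100 holding the values 0..15.
example_memory = {0x100 + i: i for i in range(16)}
loaded = ldnt1_scalar_index(example_memory, 0x100, 2, [True, False, True, True], 8)
```

The index register value is read once and only the local element offset advances, which matches the statement that the index register itself is not updated.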

LDNT1D (scalar plus immediate)

Contiguous load non-temporal doublewords to vector (immediate index)

Contiguous load non-temporal of doublewords to elements of a vector register from the memory address generated by
a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device
memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

LDNT1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer offset = SInt(imm4);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

Z[t] = result;

LDNT1D (scalar plus scalar)

Contiguous load non-temporal doublewords to vector (scalar index)

Contiguous load non-temporal of doublewords to elements of a vector register from the memory address generated by
a 64-bit scalar base and scalar index which is multiplied by 8 and added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements will not cause a
read from Device memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 1 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>

LDNT1D { <Zt>.D }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = result;

LDNT1H (scalar plus immediate)

Contiguous load non-temporal halfwords to vector (immediate index)

Contiguous load non-temporal of halfwords to elements of a vector register from the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device
memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

LDNT1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer offset = SInt(imm4);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

Z[t] = result;

LDNT1H (scalar plus scalar)

Contiguous load non-temporal halfwords to vector (scalar index)

Contiguous load non-temporal of halfwords to elements of a vector register from the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 2 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 0 1 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>

LDNT1H { <Zt>.H }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = result;

LDNT1W (scalar plus immediate)

Contiguous load non-temporal words to vector (immediate index)

Contiguous load non-temporal of words to elements of a vector register from the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements will not cause a read from Device
memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

LDNT1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer offset = SInt(imm4);

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

Z[t] = result;

LDNT1W (scalar plus scalar)

Contiguous load non-temporal words to vector (scalar index)

Contiguous load non-temporal of words to elements of a vector register from the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 4 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements will not cause a read from
Device memory or signal a fault, and are set to zero in the destination vector.
A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 1 0 0 1 0 1 0 0 0 Rm 1 1 0 Pg Rn Zt
msz<1>msz<0>

LDNT1W { <Zt>.S }, <Pg>/Z, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(PL) mask = P[g];
bits(VL) result;
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

Z[t] = result;

LDR (predicate)

Load predicate register

Load a predicate register from a memory address generated by a 64-bit scalar base, plus an immediate offset in the
range -256 to 255 which is multiplied by the current predicate register size in bytes. This instruction is unpredicated.
The load is performed as contiguous byte accesses, each containing 8 consecutive predicate bits in ascending element
order, with no endian conversion and no guarantee of single-copy atomicity larger than a byte. However, if alignment
is checked, then a general-purpose base register must be aligned to 2 bytes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 imm9h 0 0 0 imm9l Rn 0 Pt

LDR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Pt);
integer n = UInt(Rn);
integer imm = SInt(imm9h:imm9l);

Assembler Symbols

<Pt> Is the name of the destination scalable predicate register, encoded in the "Pt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -256 to 255, defaulting to 0, encoded in the
"imm9h:imm9l" fields.

Operation

CheckSVEEnabled();
integer elements = PL DIV 8;
bits(64) base;
integer offset = imm * elements;
bits(PL) result;

if n == 31 then
    CheckSPAlignment();
    if HaveMTEExt() then SetTagCheckedInstruction(FALSE);
    base = SP[];
else
    if HaveMTEExt() then SetTagCheckedInstruction(TRUE);
    base = X[n];

boolean aligned = AArch64.CheckAlignment(base + offset, 2, AccType_SVE, FALSE);

for e = 0 to elements-1
    Elem[result, e, 8] = AArch64.MemSingle[base + offset, 1, AccType_SVE, aligned];
    offset = offset + 1;

P[t] = result;
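
The immediate decode and address calculation for this instruction can be sketched in Python as follows. The function names are hypothetical, and the field widths (6 bits for `imm9h`, 3 bits for `imm9l`) are an assumption read off the encoding diagram above. The sketch computes only the starting byte address and does not model alignment checking or tag checking.

```python
MASK64 = (1 << 64) - 1

def sint(value, width):
    """Interpret a width-bit field as a two's complement signed integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value

def ldr_predicate_imm(imm9h, imm9l):
    # imm = SInt(imm9h:imm9l), with imm9h assumed 6 bits and imm9l 3 bits
    return sint(((imm9h & 0x3F) << 3) | (imm9l & 0x7), 9)

def ldr_predicate_address(base, imm, pl_bits):
    # elements = PL DIV 8; offset = imm * elements
    return (base + imm * (pl_bits // 8)) & MASK64

# Assumed example: PL = 32 bits, so each predicate register occupies 4 bytes.
addr = ldr_predicate_address(0x1000, ldr_predicate_imm(0b000000, 0b010), 32)
```

Because the offset is scaled by the predicate size in bytes, the same immediate addresses consecutive predicate-sized slots regardless of the implemented length.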

LDR (vector)

Load vector register

Load a vector register from a memory address generated by a 64-bit scalar base, plus an immediate offset in the range
-256 to 255 which is multiplied by the current vector register size in bytes. This instruction is unpredicated.
The load is performed as contiguous byte accesses, with no endian conversion and no guarantee of single-copy
atomicity larger than a byte. However, if alignment is checked, then the base register must be aligned to 16 bytes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 imm9h 0 1 0 imm9l Rn Zt

LDR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer imm = SInt(imm9h:imm9l);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -256 to 255, defaulting to 0, encoded in the
"imm9h:imm9l" fields.

Operation

CheckSVEEnabled();
integer elements = VL DIV 8;
bits(64) base;
integer offset = imm * elements;
bits(VL) result;

if n == 31 then
    CheckSPAlignment();
    if HaveMTEExt() then SetTagCheckedInstruction(FALSE);
    base = SP[];
else
    if HaveMTEExt() then SetTagCheckedInstruction(TRUE);
    base = X[n];

boolean aligned = AArch64.CheckAlignment(base + offset, 16, AccType_SVE, FALSE);

for e = 0 to elements-1
    Elem[result, e, 8] = AArch64.MemSingle[base + offset, 1, AccType_SVE, aligned];
    offset = offset + 1;

Z[t] = result;
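
The address arithmetic in the Operation above scales the immediate by the vector length in bytes. A minimal Python sketch of that calculation follows; the function name is hypothetical, and the vector length check assumes the SVE rule that VL is a multiple of 128 bits up to 2048.

```python
from typing import Tuple


def ldr_vector_address_range(base: int, imm: int, vl_bits: int) -> Tuple[int, int]:
    """Return (first_byte, one_past_last_byte) read by LDR <Zt>, [base, #imm, MUL VL]."""
    if not -256 <= imm <= 255:
        raise ValueError("imm must be in the range -256 to 255")
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("VL must be a multiple of 128 bits, from 128 to 2048")
    vl_bytes = vl_bits // 8                          # elements = VL DIV 8
    start = (base + imm * vl_bytes) & (2**64 - 1)    # 64-bit address arithmetic
    return start, start + vl_bytes
```

For example, with a 256-bit vector length an immediate of 1 addresses the 32 bytes that start 32 bytes above the base.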

LSL (immediate, predicated)

Logical shift left by immediate (predicated)

Shift left by immediate each active element of the source vector, and destructively place the results in the
corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 0 to
number of bits per element minus 1. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 0 0 1 1 1 0 0 Pg tszl imm3 Zdn
L U

LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “tszh:tszl”:

tszh tszl <T>

00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<const> Is the immediate shift amount, in the range 0 to number of bits per element minus 1, encoded in
"tsz:imm3".

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) mask = P[g];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = LSL(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
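
As a hedged illustration, the Python sketch below models how "tsz:imm3" could map to an element size and shift amount, and how the predicated shift behaves. The relation shift = UInt(tsz:imm3) - esize is an assumption that is consistent with the size table and the stated range of 0 to esize minus 1; it is not shown in the text above. The function names are hypothetical, and vectors are modelled as Python lists of unsigned integers.

```python
from typing import List, Tuple


def decode_lsl_imm(tsz: int, imm3: int) -> Tuple[int, int]:
    """Map the 4-bit tszh:tszl value and imm3 to (esize, shift)."""
    if tsz == 0:
        raise ValueError("tszh:tszl == 0000 is RESERVED")
    # The position of the highest set bit of tsz selects B, H, S or D.
    esize = 8 << (tsz.bit_length() - 1)
    shift = ((tsz << 3) | imm3) - esize   # assumed decode relation
    return esize, shift


def lsl_imm_predicated(zdn: List[int], pg: List[bool], esize: int, shift: int) -> List[int]:
    """Shift active elements left; inactive elements keep their old value."""
    mask = (1 << esize) - 1
    return [(x << shift) & mask if active else x for x, active in zip(zdn, pg)]
```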

Operational information

LSL (immediate, unpredicated)

Logical shift left by immediate (unpredicated)

Shift left by immediate each element of the source vector, and place the results in the corresponding elements of the
destination vector. The immediate shift amount is an unsigned value in the range 0 to number of bits per element
minus 1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 1 tszl imm3 1 0 0 1 1 1 Zn Zd

LSL <Zd>.<T>, <Zn>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “tszh:tszl”:

tszh tszl <T>

00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<const> Is the immediate shift amount, in the range 0 to number of bits per element minus 1, encoded in
"tsz:imm3".

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    Elem[result, e, esize] = LSL(element1, shift);

Z[d] = result;

LSL (vectors)

Logical shift left by vector (predicated)

Shift left active elements of the first source vector by corresponding elements of the second source vector and
destructively place the results in the corresponding elements of the first source vector. The shift amount operand is a
vector of unsigned elements in which all bits are significant, and not used modulo the element size. Inactive elements
in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 1 1 0 0 Pg Zm Zdn
R L U

LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        integer shift = Min(UInt(element2), esize);
        Elem[result, e, esize] = LSL(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
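
Because the shift amount is clamped with Min rather than taken modulo the element size, any amount of esize or more clears the element. A small Python model of the loop above illustrates this; the function name is hypothetical and vectors are modelled as lists of unsigned integers.

```python
from typing import List


def lsl_vectors(zdn: List[int], zm: List[int], pg: List[bool], esize: int) -> List[int]:
    """Model of LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>."""
    mask = (1 << esize) - 1
    result = []
    for value, amount, active in zip(zdn, zm, pg):
        if active:
            shift = min(amount, esize)            # Min(UInt(element2), esize)
            result.append((value << shift) & mask)
        else:
            result.append(value)                  # inactive: unmodified
    return result
```

With 8-bit elements, a shift amount of 8 or 255 both produce zero, whereas a modulo-8 interpretation would have left the element unshifted or shifted it by 7.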

Operational information

LSL (wide elements, predicated)

Logical shift left by 64-bit wide elements (predicated)

Shift left active elements of the first source vector by corresponding overlapping 64-bit elements of the second source
vector and destructively place the results in the corresponding elements of the first source vector. The shift amount is
a vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo the destination
element size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 1 1 1 0 0 Pg Zm Zdn
R L U

LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 RESERVED

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
        integer shift = Min(UInt(element2), esize);
        Elem[result, e, esize] = LSL(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
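
The index expression (e * esize) DIV 64 means that every narrow element lying within the same 64-bit lane shares one shift amount. The Python sketch below models this; the function name and the list-based vector representation are illustrative assumptions.

```python
from typing import List


def lsl_wide_predicated(zdn: List[int], zm_d: List[int], pg: List[bool], esize: int) -> List[int]:
    """Model of LSL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D for esize in (8, 16, 32).

    zdn holds esize-bit elements; zm_d holds the 64-bit elements of Zm.
    """
    if esize not in (8, 16, 32):
        raise ValueError("size == '11' (D) is RESERVED for the wide form")
    mask = (1 << esize) - 1
    result = []
    for e, (value, active) in enumerate(zip(zdn, pg)):
        if active:
            amount = zm_d[(e * esize) // 64]      # overlapping 64-bit element
            shift = min(amount, esize)
            result.append((value << shift) & mask)
        else:
            result.append(value)
    return result
```

With 32-bit elements, elements 0 and 1 use the first doubleword of Zm and elements 2 and 3 use the second.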

Operational information

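This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of either or both instructions is
UNPREDICTABLE: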
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and destination element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

LSL (wide elements, unpredicated)

Logical shift left by 64-bit wide elements (unpredicated)

Shift left all elements of the first source vector by corresponding overlapping 64-bit elements of the second source
vector and place the results in the corresponding elements of the destination vector. The shift amount is a vector of
unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo the destination element
size. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 1 1 Zn Zd

LSL <Zd>.<T>, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 RESERVED

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
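bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
    integer shift = Min(UInt(element2), esize);
    Elem[result, e, esize] = LSL(element1, shift);

Z[d] = result;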

LSLR

Reversed logical shift left by vector (predicated)

Reversed shift left active elements of the second source vector by corresponding elements of the first source vector
and destructively place the results in the corresponding elements of the first source vector. The shift amount operand
is a vector of unsigned elements in which all bits are significant, and not used modulo the element size. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 1 1 0 0 Pg Zm Zdn
R L U

LSLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        integer shift = Min(UInt(element1), esize);
        Elem[result, e, esize] = LSL(element2, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
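
The reversed form swaps the operand roles: Zdn supplies the shift amount and Zm supplies the value, while inactive elements still keep the old Zdn value. A minimal Python model, with a hypothetical function name and list-based vectors:

```python
from typing import List


def lslr(zdn: List[int], zm: List[int], pg: List[bool], esize: int) -> List[int]:
    """Model of LSLR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>."""
    mask = (1 << esize) - 1
    result = []
    for amount, value, active in zip(zdn, zm, pg):
        if active:
            shift = min(amount, esize)            # Min(UInt(element1), esize)
            result.append((value << shift) & mask)
        else:
            result.append(amount)                 # inactive: old Zdn element
    return result
```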

Operational information

LSR (immediate, predicated)

Logical shift right by immediate (predicated)

Shift right by immediate, inserting zeroes, each active element of the source vector, and destructively place the results
in the corresponding elements of the source vector. The immediate shift amount is an unsigned value in the range 1 to
number of bits per element. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 0 0 0 1 1 0 0 Pg tszl imm3 Zdn
L U

LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “tszh:tszl”:

tszh tszl <T>

00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<const> Is the immediate shift amount, in the range 1 to number of bits per element, encoded in "tsz:imm3".

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) mask = P[g];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = LSR(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

LSR (immediate, unpredicated)

Logical shift right by immediate (unpredicated)

Shift right by immediate, inserting zeroes, each element of the source vector, and place the results in the
corresponding elements of the destination vector. The immediate shift amount is an unsigned value in the range 1 to
number of bits per element. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 1 tszl imm3 1 0 0 1 0 1 Zn Zd
U

LSR <Zd>.<T>, <Zn>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “tszh:tszl”:

tszh tszl <T>

00 00 RESERVED
00 01 B
00 1x H
01 xx S
1x xx D

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.
<const> Is the immediate shift amount, in the range 1 to number of bits per element, encoded in "tsz:imm3".

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    Elem[result, e, esize] = LSR(element1, shift);

Z[d] = result;
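
Right shifts use the range 1 to esize, so the immediate cannot be stored as a plain offset from the element size. The Python sketch below assumes the relation shift = 2 * esize - UInt(tsz:imm3), which is consistent with the size table and the stated range but is not shown in the text above. The function names are hypothetical.

```python
from typing import Tuple


def encode_lsr_imm(esize: int, shift: int) -> Tuple[int, int]:
    """Return (tsz, imm3) for a right shift, where tsz is the 4-bit tszh:tszl."""
    if esize not in (8, 16, 32, 64):
        raise ValueError("esize must be 8, 16, 32 or 64")
    if not 1 <= shift <= esize:
        raise ValueError("shift must be in the range 1 to esize")
    value = 2 * esize - shift          # assumed encoding relation
    return value >> 3, value & 0b111


def decode_lsr_imm(tsz: int, imm3: int) -> Tuple[int, int]:
    """Return (esize, shift) from tsz:imm3."""
    if tsz == 0:
        raise ValueError("tszh:tszl == 0000 is RESERVED")
    esize = 8 << (tsz.bit_length() - 1)
    return esize, 2 * esize - ((tsz << 3) | imm3)
```

Under this assumption a shift by the full element width, which always yields zero, encodes with imm3 equal to 0 and only the size-selecting bit of tsz set.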

LSR (vectors)

Logical shift right by vector (predicated)

Shift right, inserting zeroes, active elements of the first source vector by corresponding elements of the second source
vector and destructively place the results in the corresponding elements of the first source vector. The shift amount
operand is a vector of unsigned elements in which all bits are significant, and not used modulo the element size.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 1 1 0 0 Pg Zm Zdn
R L U

LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        integer shift = Min(UInt(element2), esize);
        Elem[result, e, esize] = LSR(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

LSR (wide elements, predicated)

Logical shift right by 64-bit wide elements (predicated)

Shift right, inserting zeroes, active elements of the first source vector by corresponding overlapping 64-bit elements of
the second source vector and destructively place the results in the corresponding elements of the first source vector.
The shift amount is a vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used
modulo the destination element size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 1 1 0 0 Pg Zm Zdn
R L U

LSR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 RESERVED

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
        integer shift = Min(UInt(element2), esize);
        Elem[result, e, esize] = LSR(element1, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

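This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of either or both instructions is
UNPREDICTABLE: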
• The MOVPRFX instruction must be unpredicated, or be predicated using the same governing predicate register
and destination element size as this instruction.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

LSR (wide elements, unpredicated)

Logical shift right by 64-bit wide elements (unpredicated)

Shift right, inserting zeroes, all elements of the first source vector by corresponding overlapping 64-bit elements of the
second source vector and place the results in the corresponding elements of the destination vector. The shift amount is a
vector of unsigned 64-bit doubleword elements in which all bits are significant, and not used modulo the destination
element size. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 0 1 Zn Zd
U

LSR <Zd>.<T>, <Zn>.<T>, <Zm>.D

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 RESERVED

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
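bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(64) element2 = Elem[operand2, (e * esize) DIV 64, 64];
    integer shift = Min(UInt(element2), esize);
    Elem[result, e, esize] = LSR(element1, shift);

Z[d] = result;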

LSRR

Reversed logical shift right by vector (predicated)

Reversed shift right, inserting zeroes, active elements of the second source vector by corresponding elements of the
first source vector and destructively place the results in the corresponding elements of the first source vector. The
shift amount operand is a vector of unsigned elements in which all bits are significant, and not used modulo the
element size. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 1 1 0 0 Pg Zm Zdn
R L U

LSRR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element1 = Elem[operand1, e, esize];
        bits(esize) element2 = Elem[operand2, e, esize];
        integer shift = Min(UInt(element1), esize);
        Elem[result, e, esize] = LSR(element2, shift);
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

MAD

Multiply-add vectors (predicated), writing multiplicand [Zdn = Za + Zdn * Zm]

Multiply the corresponding active elements of the first and second source vectors and add to elements of the third
(addend) vector. Destructively place the results in the destination and first source (multiplicand) vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 1 1 0 Pg Za Zdn
op

MAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
integer a = UInt(Za);
boolean sub_op = FALSE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.
<Za> Is the name of the third source scalable vector register, encoded in the "Za" field.

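Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) operand3 = if AnyActiveElement(mask, esize) then Z[a] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element1 = UInt(Elem[operand1, e, esize]);
        integer element2 = UInt(Elem[operand2, e, esize]);
        integer product = element1 * element2;
        if sub_op then
            Elem[result, e, esize] = Elem[operand3, e, esize] - product;
        else
            Elem[result, e, esize] = Elem[operand3, e, esize] + product;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];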

Z[dn] = result;
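
The per-element arithmetic of MAD can be modelled in a few lines of Python. This is a sketch only: the function name is hypothetical, vectors are lists of unsigned integers, and the result is truncated to esize bits to mirror assignment into an esize-bit element.

```python
from typing import List


def mad(zdn: List[int], zm: List[int], za: List[int], pg: List[bool], esize: int) -> List[int]:
    """Model of MAD <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>: Zdn = Za + Zdn * Zm."""
    mask = (1 << esize) - 1
    result = []
    for multiplicand, multiplier, addend, active in zip(zdn, zm, za, pg):
        if active:
            result.append((addend + multiplicand * multiplier) & mask)
        else:
            result.append(multiplicand)           # inactive: old Zdn element
    return result
```

The low esize bits of the result are the same whether the inputs are interpreted as signed or unsigned, so the model uses unsigned values throughout.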

Operational information

MLA

Multiply-add vectors (predicated), writing addend [Zda = Zda + Zn * Zm]

Multiply the corresponding active elements of the first and second source vectors and add to elements of the third
source (addend) vector. Destructively place the results in the destination and third source (addend) vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 0 1 0 Pg Zn Zda
op

MLA <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean sub_op = FALSE;

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

Z[da] = result;
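
A minimal Python model of the behaviour described for this instruction, Zda = Zda + Zn * Zm on active elements, is sketched below. The function name and the sub_op parameter (mirroring the decode variable of the same name, which selects subtraction for MLS) are illustrative, and vectors are modelled as lists of unsigned integers.

```python
from typing import List


def mla_mls(zda: List[int], zn: List[int], zm: List[int], pg: List[bool],
            esize: int, sub_op: bool = False) -> List[int]:
    """Accumulate (or subtract, when sub_op is True) Zn * Zm into Zda."""
    mask = (1 << esize) - 1
    result = []
    for addend, a, b, active in zip(zda, zn, zm, pg):
        if active:
            product = a * b
            value = addend - product if sub_op else addend + product
            result.append(value & mask)           # truncate to esize bits
        else:
            result.append(addend)                 # inactive: old Zda element
    return result
```

Unlike MAD, the destination here is the addend, so inactive elements retain the old addend rather than the old multiplicand.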

Operational information

MLS

Multiply-subtract vectors (predicated), writing addend [Zda = Zda - Zn * Zm]

Multiply the corresponding active elements of the first and second source vectors and subtract from elements of the
third source (addend) vector. Destructively place the results in the destination and third source (addend) vector.
Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 0 1 1 Pg Zn Zda
op

MLS <Zda>.<T>, <Pg>/M, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean sub_op = TRUE;

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

Z[da] = result;

Operational information

MOV

Move logical bitmask immediate to vector (unpredicated)

This is an alias of DUPM. This means:

• The encodings in this description are named to match the encodings of DUPM.
• The description of DUPM gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 1 0 0 0 0 imm13 Zd

MOV <Zd>.<T>, #<const>

is equivalent to

DUPM <Zd>.<T>, #<const>

and is the preferred disassembly when SVEMoveMaskPreferred(imm13).

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “imm13<12>:imm13<5:0>”:

imm13<12> imm13<5:0> <T>

0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D

<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.
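
The size specifier table above can be expressed as a small lookup. The Python sketch below returns the <T> suffix for a given imm13 value, or None for the RESERVED rows; the function name is hypothetical.

```python
from typing import Optional


def dupm_size_specifier(imm13: int) -> Optional[str]:
    """Return 'B', 'H', 'S' or 'D' from imm13<12>:imm13<5:0>, or None if RESERVED."""
    n = (imm13 >> 12) & 1
    low = imm13 & 0x3F
    if n == 1:
        return "D"                      # 1 xxxxxx
    if low & 0b100000 == 0:
        return "S"                      # 0 0xxxxx
    if low & 0b110000 == 0b100000:
        return "H"                      # 0 10xxxx
    if low in (0b111110, 0b111111):
        return None                     # RESERVED
    return "B"                          # 0 110xxx, 0 1110xx, 0 11110x
```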

Operation

The description of DUPM gives the operational pseudocode for this instruction.

MOV

Move predicate (unpredicated)

Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated.
Does not set the condition flags.

This is an alias of ORR (predicates). This means:

• The encodings in this description are named to match the encodings of ORR (predicates).
• The description of ORR (predicates) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S

MOV <Pd>.B, <Pn>.B

is equivalent to

ORR <Pd>.B, <Pn>/Z, <Pn>.B, <Pn>.B

and is the preferred disassembly when S == '0' && Pn == Pm && Pm == Pg.

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.

Operation

The description of ORR (predicates) gives the operational pseudocode for this instruction.

MOV (immediate, predicated, merging)

Move signed integer immediate to vector elements (merging)

Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register remain unmodified.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".

This is an alias of CPY (immediate, merging). This means:

• The encodings in this description are named to match the encodings of CPY (immediate, merging).
• The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 1 sh imm8 Zd
M

MOV <Zd>.<T>, <Pg>/M, #<imm>{, <shift>}

is equivalent to

CPY <Zd>.<T>, <Pg>/M, #<imm>{, <shift>}

and is always the preferred disassembly.

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8
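
To illustrate the immediate rules described above, the following Python sketch chooses the "imm8" and "sh" field values for a requested immediate. The function name is hypothetical, and the sketch always prefers the unshifted form when the value fits in 8 bits, so it never produces the "#0, LSL #8" encoding.

```python
from typing import Tuple


def encode_mov_imm(value: int, esize: int) -> Tuple[int, int]:
    """Return (imm8, sh) for a signed immediate and an element size in bits."""
    if esize not in (8, 16, 32, 64):
        raise ValueError("esize must be 8, 16, 32 or 64")
    if -128 <= value <= 127:
        return value & 0xFF, 0                    # LSL #0
    if esize >= 16 and value % 256 == 0 and -32768 <= value <= 32512:
        return (value // 256) & 0xFF, 1           # LSL #8
    raise ValueError("immediate is not encodable")
```

For example, 256 is only encodable for element sizes of 16 bits or more, where it becomes imm8 = 1 with LSL #8.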

Operation

The description of CPY (immediate, merging) gives the operational pseudocode for this instruction.

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

MOV (immediate, predicated, zeroing)

Move signed integer immediate to vector elements (zeroing)

Move a signed integer immediate to each active element in the destination vector. Inactive elements in the destination
vector register are set to zero.
The immediate operand is a signed value in the range -128 to +127, and for element widths of 16 bits or higher it may
also be a signed multiple of 256 in the range -32768 to +32512 (excluding 0).
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<simm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value, unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
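
The immediate rule described above can be sketched in Python. This is an illustrative model only: the helper names encode_mov_imm and decode_mov_imm are hypothetical and are not part of the architecture pseudocode, and the sketch assumes that the unshifted form is chosen whenever the value fits in 8 bits.

```python
def encode_mov_imm(value, esize):
    """Return (imm8, sh) for a signed immediate, or None if it is not encodable.

    Hypothetical helper: esize is the element width in bits (8, 16, 32 or 64).
    """
    if -128 <= value <= 127:
        return value & 0xFF, 0          # plain signed 8-bit value, LSL #0
    if esize >= 16 and value % 256 == 0 and -32768 <= value <= 32512:
        return (value >> 8) & 0xFF, 1   # signed multiple of 256, LSL #8
    return None


def decode_mov_imm(imm8, sh):
    """Recover the signed value from the imm8 and sh fields."""
    signed = imm8 - 256 if imm8 & 0x80 else imm8
    return signed << 8 if sh else signed
```

Under these assumptions a value such as 256 is rejected for byte elements but maps to imm8 = 1 with LSL #8 for wider elements.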

This is an alias of CPY (immediate, zeroing). This means:

• The encodings in this description are named to match the encodings of CPY (immediate, zeroing).
• The description of CPY (immediate, zeroing) gives the operational pseudocode for this instruction.
Bits    31-24     23-22  21-20  19-16  15  14   13  12-5  4-0
Field   00000101  size   01     Pg     0   M=0  sh  imm8  Zd

MOV <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>}

is equivalent to

CPY <Zd>.<T>, <Pg>/Z, #<imm>{, <shift>}

and is always the preferred disassembly.

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8

Operation

The description of CPY (immediate, zeroing) gives the operational pseudocode for this instruction.

MOV (immediate, unpredicated)

Move signed immediate to vector elements (unpredicated)

This is an alias of DUP (immediate). This means:

• The encodings in this description are named to match the encodings of DUP (immediate).
• The description of DUP (immediate) gives the operational pseudocode for this instruction.
Bits    31-24     23-22  21-14     13  12-5  4-0
Field   00100101  size   11100011  sh  imm8  Zd

MOV <Zd>.<T>, #<imm>{, <shift>}

is equivalent to

DUP <Zd>.<T>, #<imm>{, <shift>}

and is always the preferred disassembly.

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is a signed immediate in the range -128 to 127, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8

Operation

The description of DUP (immediate) gives the operational pseudocode for this instruction.

MOV (predicate, predicated, merging)

Move predicates (merging)

Read active elements from the source predicate and place in the corresponding elements of the destination predicate.
Inactive elements in the destination predicate register remain unmodified. Does not set the condition flags.

This is an alias of SEL (predicates). This means:

• The encodings in this description are named to match the encodings of SEL (predicates).
• The description of SEL (predicates) gives the operational pseudocode for this instruction.
Bits    31-24     23  22   21-20  19-16  15-14  13-10  9  8-5  4  3-0
Field   00100101  0   S=0  00     Pm     01     Pg     1  Pn   1  Pd

MOV <Pd>.B, <Pg>/M, <Pn>.B

is equivalent to

SEL <Pd>.B, <Pg>, <Pn>.B, <Pd>.B

and is the preferred disassembly when Pd == Pm.

Assembler Symbols

Operation

The description of SEL (predicates) gives the operational pseudocode for this instruction.

MOV (predicate, predicated, zeroing)

Move predicates (zeroing)

Read active elements from the source predicate and place in the corresponding elements of the destination predicate.
Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.

This is an alias of AND (predicates). This means:

• The encodings in this description are named to match the encodings of AND (predicates).
• The description of AND (predicates) gives the operational pseudocode for this instruction.
Bits    31-24     23  22   21-20  19-16  15-14  13-10  9  8-5  4  3-0
Field   00100101  0   S=0  00     Pm     01     Pg     0  Pn   0  Pd

MOV <Pd>.B, <Pg>/Z, <Pn>.B

is equivalent to

AND <Pd>.B, <Pg>/Z, <Pn>.B, <Pn>.B

and is the preferred disassembly when S == '0' && Pn == Pm.

Assembler Symbols

Operation

The description of AND (predicates) gives the operational pseudocode for this instruction.

MOV (scalar, predicated)

Move general-purpose register to vector elements (predicated)

Move the general-purpose scalar source register to each active element in the destination vector. Inactive elements in
the destination vector register remain unmodified.

This is an alias of CPY (scalar). This means:

• The encodings in this description are named to match the encodings of CPY (scalar).
• The description of CPY (scalar) gives the operational pseudocode for this instruction.
Bits    31-24     23-22  21-13      12-10  9-5  4-0
Field   00000101  size   101000101  Pg     Rn   Zd

MOV <Zd>.<T>, <Pg>/M, <R><n|SP>

is equivalent to

CPY <Zd>.<T>, <Pg>/M, <R><n|SP>

and is always the preferred disassembly.

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<n|SP> Is the number [0-30] of the general-purpose source register or the name SP (31), encoded in the "Rn"
field.

Operation

The description of CPY (scalar) gives the operational pseudocode for this instruction.

Operational information


MOV (scalar, unpredicated)

Move general-purpose register to vector elements (unpredicated)

Unconditionally broadcast the general-purpose scalar source register into each element of the destination vector. This
instruction is unpredicated.

This is an alias of DUP (scalar). This means:

• The encodings in this description are named to match the encodings of DUP (scalar).
• The description of DUP (scalar) gives the operational pseudocode for this instruction.
Bits    31-24     23-22  21-10         9-5  4-0
Field   00000101  size   100000001110  Rn   Zd

MOV <Zd>.<T>, <R><n|SP>

is equivalent to

DUP <Zd>.<T>, <R><n|SP>

and is always the preferred disassembly.

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<R> Is a width specifier, encoded in “size”:

size <R>
01 W
x0 W
11 X

<n|SP> Is the number [0-30] of the general-purpose source register or the name SP (31), encoded in the "Rn"
field.

Operation

The description of DUP (scalar) gives the operational pseudocode for this instruction.


MOV (SIMD&FP scalar, predicated)

Move SIMD&FP scalar register to vector elements (predicated)

Move the SIMD & floating-point scalar source register to each active element in the destination vector. Inactive
elements in the destination vector register remain unmodified.

This is an alias of CPY (SIMD&FP scalar). This means:

• The encodings in this description are named to match the encodings of CPY (SIMD&FP scalar).
• The description of CPY (SIMD&FP scalar) gives the operational pseudocode for this instruction.
Bits    31-24     23-22  21-13      12-10  9-5  4-0
Field   00000101  size   100000100  Pg     Vn   Zd

MOV <Zd>.<T>, <Pg>/M, <V><n>

is equivalent to

CPY <Zd>.<T>, <Pg>/M, <V><n>

and is always the preferred disassembly.

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<n> Is the number [0-31] of the source SIMD&FP register, encoded in the "Vn" field.

Operation

The description of CPY (SIMD&FP scalar) gives the operational pseudocode for this instruction.

Operational information

MOV (SIMD&FP scalar, unpredicated)

Move indexed element or SIMD&FP scalar to vector (unpredicated)

Unconditionally broadcast the SIMD&FP scalar into each element of the destination vector. This instruction is
unpredicated.

This is an alias of DUP (indexed). This means:

• The encodings in this description are named to match the encodings of DUP (indexed).
• The description of DUP (indexed) gives the operational pseudocode for this instruction.
Bits    31-24     23-22  21  20-16  15-10   9-5  4-0
Field   00000101  imm2   1   tsz    001000  Zn   Zd

MOV <Zd>.<T>, <Zn>.<T>[<imm>]

is equivalent to

DUP <Zd>.<T>, <Zn>.<T>[<imm>]

and is the preferred disassembly when BitCount(imm2:tsz) > 1.

MOV <Zd>.<T>, <V><n>

is equivalent to

DUP <Zd>.<T>, <Zn>.<T>[0]

and is the preferred disassembly when BitCount(imm2:tsz) == 1.
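
The BitCount(imm2:tsz) conditions above can be illustrated with a short Python sketch. It assumes, based on the decode of DUP (indexed), which is described elsewhere, that the lowest set bit of tsz selects the element size and that the bits of imm2:tsz above it form the element index. The function names are hypothetical.

```python
def decode_tsz(imm2, tsz):
    """Return (size suffix, element index), or None for the reserved tsz value."""
    if tsz == 0:
        return None                          # tsz == 00000 is RESERVED
    imm = (imm2 << 5) | tsz                  # the 7-bit value imm2:tsz
    low = (tsz & -tsz).bit_length()          # 1-based position of lowest set bit
    return "BHSDQ"[low - 1], imm >> low      # bits above the marker form the index


def preferred_mov_syntax(imm2, tsz):
    """Pick the MOV form from BitCount(imm2:tsz); None for the reserved value."""
    if tsz == 0:
        return None
    bitcount = bin((imm2 << 5) | tsz).count("1")
    return "scalar" if bitcount == 1 else "indexed"
```

With this model a single set bit always corresponds to index 0, which is why that case can be disassembled using the scalar register form.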

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “tsz”:

tsz <T>
00000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
10000 Q

<V> Is a width specifier, encoded in “tsz”:

tsz <V>
00000 RESERVED
xxxx1 B
xxx10 H
xx100 S
x1000 D
10000 Q

<n> Is the number [0-31] of the source SIMD&FP register, encoded in the "Zn" field.

Operation

The description of DUP (indexed) gives the operational pseudocode for this instruction.

MOV (vector, predicated)

Move vector elements (predicated)

Move elements from the source vector to the corresponding elements of the destination vector. Inactive elements in
the destination vector register remain unmodified.

This is an alias of SEL (vectors). This means:

• The encodings in this description are named to match the encodings of SEL (vectors).
• The description of SEL (vectors) gives the operational pseudocode for this instruction.
Bits    31-24     23-22  21  20-16  15-14  13-10  9-5  4-0
Field   00000101  size   1   Zm     11     Pg     Zn   Zd

MOV <Zd>.<T>, <Pg>/M, <Zn>.<T>

is equivalent to

SEL <Zd>.<T>, <Pg>, <Zn>.<T>, <Zd>.<T>

and is the preferred disassembly when Zd == Zm.
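
As an informal illustration of why this alias holds, the following Python sketch models SEL on lists of elements and shows that passing the destination as the second source gives a merging move. The function names and the list-based register model are assumptions made for this example only.

```python
def sel(pg, zn, zm):
    """Per element: take zn where the predicate is active, otherwise zm."""
    return [n if active else m for active, n, m in zip(pg, zn, zm)]


def mov_vector_predicated(zd, pg, zn):
    """MOV Zd.T, Pg/M, Zn.T modelled as SEL Zd.T, Pg, Zn.T, Zd.T."""
    return sel(pg, zn, zd)
```

Inactive elements come from the old destination value, so they are left unmodified.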

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.

Operation

The description of SEL (vectors) gives the operational pseudocode for this instruction.


MOV (vector, unpredicated)

Move vector register (unpredicated)

Move vector register. This instruction is unpredicated.

This is an alias of ORR (vectors, unpredicated). This means:

• The encodings in this description are named to match the encodings of ORR (vectors, unpredicated).
• The description of ORR (vectors, unpredicated) gives the operational pseudocode for this instruction.
Bits    31-24     23-21  20-16  15-10   9-5  4-0
Field   00000100  011    Zm     001100  Zn   Zd

MOV <Zd>.D, <Zn>.D

is equivalent to

ORR <Zd>.D, <Zn>.D, <Zn>.D

and is the preferred disassembly when Zn == Zm.

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.

Operation

The description of ORR (vectors, unpredicated) gives the operational pseudocode for this instruction.


MOVPRFX (predicated)

Move prefix (predicated)

The predicated MOVPRFX instruction is a hint to hardware that the instruction may be combined with the destructive
instruction which follows it in program order to create a single constructive operation. Since it is a hint, it is also
permitted to be implemented as a discrete vector copy, and the result of executing the pair of instructions is identical
with or without combining. The choice of combined versus discrete operation may vary dynamically.
Unless the combination of a constructive operation with merging predication is specifically required, it is strongly
recommended that for performance reasons software should prefer to use the zeroing form of predicated MOVPRFX or
the unpredicated MOVPRFX instruction.
Although the operation of the instruction is defined as a simple predicated vector copy, it is required that the prefixed
instruction at PC+4 must be an SVE destructive binary or ternary instruction encoding, or a unary operation with
merging predication, but excluding other MOVPRFX instructions. The prefixed instruction must specify the same
predicate register, and have the same maximum element size (ignoring a fixed 64-bit "wide vector" operand), and the
same destination vector as the MOVPRFX instruction. The prefixed instruction must not use the destination register in
any other operand position, even if they have different names but refer to the same architectural register state. Any
other use is UNPREDICTABLE.
Bits    31-24     23-22  21-17  16  15-13  12-10  9-5  4-0
Field   00000100  size   01000  M   001    Pg     Zn   Zd

MOVPRFX <Zd>.<T>, <Pg>/<ZM>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean merging = (M == '1');

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.

<ZM> Is the predication qualifier, encoded in “M”:

M <ZM>
0 Z
1 M

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) dest = Z[d];
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand1, e, esize];
        Elem[result, e, esize] = element;
    elsif merging then
        Elem[result, e, esize] = Elem[dest, e, esize];
    else
        Elem[result, e, esize] = Zeros();

Z[d] = result;
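
The difference between the merging and zeroing forms can be seen in a small Python model of the copy defined above. It is a sketch only: registers are modelled as lists of element values and the predicate as one boolean per element, and the function name is hypothetical.

```python
def movprfx_predicated(zd, zn, pg, merging):
    """Model the discrete-copy behaviour of predicated MOVPRFX.

    Active elements are copied from zn. Inactive elements keep the old
    destination value when merging, and are set to zero otherwise.
    """
    result = []
    for dest, src, active in zip(zd, zn, pg):
        if active:
            result.append(src)
        elif merging:
            result.append(dest)
        else:
            result.append(0)
    return result
```

This models only the architectural result of the copy. It does not model the constraints on the prefixed instruction that follows.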


MOVPRFX (unpredicated)

Move prefix (unpredicated)

The unpredicated MOVPRFX instruction is a hint to hardware that the instruction may be combined with the destructive
instruction which follows it in program order to create a single constructive operation. Since it is a hint, it is also
permitted to be implemented as a discrete vector copy, and the result of executing the pair of instructions is identical
with or without combining. The choice of combined versus discrete operation may vary dynamically.
Although the operation of the instruction is defined as a simple unpredicated vector copy, it is required that the
prefixed instruction at PC+4 must be an SVE destructive binary or ternary instruction encoding, or a unary operation
with merging predication, but excluding other MOVPRFX instructions. The prefixed instruction must specify the same
destination vector as the MOVPRFX instruction. The prefixed instruction must not use the destination register in any
other operand position, even if they have different names but refer to the same architectural register state. Any other
use is UNPREDICTABLE.
Bits    31-24     23-22  21-16   15-10   9-5  4-0
Field   00000100  00     100000  101111  Zn   Zd

MOVPRFX <Zd>, <Zn>

if !HaveSVE() then UNDEFINED;

integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
bits(VL) result = Z[n];
Z[d] = result;


MOVS (predicated)

Move predicates (zeroing), setting the condition flags

This is an alias of ANDS. This means:

• The encodings in this description are named to match the encodings of ANDS.
• The description of ANDS gives the operational pseudocode for this instruction.
Bits    31-24     23  22   21-20  19-16  15-14  13-10  9  8-5  4  3-0
Field   00100101  0   S=1  00     Pm     01     Pg     0  Pn   0  Pd

MOVS <Pd>.B, <Pg>/Z, <Pn>.B

is equivalent to

ANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pn>.B

and is the preferred disassembly when S == '1' && Pn == Pm.

Assembler Symbols

Operation

The description of ANDS gives the operational pseudocode for this instruction.


MOVS (unpredicated)

Move predicate (unpredicated), setting the condition flags

Read all elements from the source predicate and place in the destination predicate. This instruction is unpredicated.
Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
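
The flag setting described above can be modelled informally in Python. The sketch assumes the usual SVE meaning of the flags: N is the value of the first active element, Z is set when no active element is true, and C is the inverse of the last active element. The function names are hypothetical, and predicates are modelled as lists of booleans.

```python
def pred_test(mask, result):
    """Return (N, Z, C, V) for a predicate result under a governing mask."""
    active = [r for m, r in zip(mask, result) if m]
    n = int(bool(active) and active[0])          # FIRST active element is true
    z = int(not any(active))                     # NONE of the active elements is true
    c = int(not (bool(active) and active[-1]))   # LAST active element is not true
    return n, z, c, 0                            # V is always zero


def movs_unpredicated(pn):
    """MOVS Pd.B, Pn.B: the source acts as both operand and governing predicate."""
    pd = list(pn)
    return pd, pred_test(pn, pd)
```

Because the source is also the governing predicate, every active element of the result is true, so N and C depend only on whether the source has any true element.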

This is an alias of ORRS. This means:

• The encodings in this description are named to match the encodings of ORRS.
• The description of ORRS gives the operational pseudocode for this instruction.
Bits    31-24     23  22   21-20  19-16  15-14  13-10  9  8-5  4  3-0
Field   00100101  1   S=1  00     Pm     01     Pg     0  Pn   0  Pd

MOVS <Pd>.B, <Pn>.B

is equivalent to

ORRS <Pd>.B, <Pn>/Z, <Pn>.B, <Pn>.B

and is the preferred disassembly when S == '1' && Pn == Pm && Pm == Pg.

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.

Operation

The description of ORRS gives the operational pseudocode for this instruction.


MSB

Multiply-subtract vectors (predicated), writing multiplicand [Zdn = Za - Zdn * Zm]

Multiply the corresponding active elements of the first and second source vectors and subtract from elements of the
third (addend) vector. Destructively place the results in the destination and first source (multiplicand) vector. Inactive
elements in the destination vector register remain unmodified.
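
The operation summarized as Zdn = Za - Zdn * Zm can be sketched in Python as follows. This is a simplified model that assumes results wrap modulo 2^esize. Registers are modelled as lists of unsigned element values, and the function name is hypothetical.

```python
def msb(zdn, zm, za, pg, esize):
    """Model MSB: active elements become (za - zdn * zm) truncated to esize bits."""
    mask = (1 << esize) - 1
    result = []
    for dn, m, a, active in zip(zdn, zm, za, pg):
        if active:
            result.append((a - dn * m) & mask)
        else:
            result.append(dn)        # inactive elements remain unmodified
    return result
```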
Bits    31-24     23-22  21  20-16  15-14  13    12-10  9-5  4-0
Field   00000100  size   0   Zm     11     op=1  Pg     Za   Zdn

MSB <Zdn>.<T>, <Pg>/M, <Zm>.<T>, <Za>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
integer a = UInt(Za);
boolean sub_op = TRUE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

Z[dn] = result;

Operational information


MUL (immediate)

Multiply by immediate (unpredicated)

Multiply each element of the source vector by an immediate, and destructively place the results in the corresponding
elements of the source vector. The immediate is a signed 8-bit value in the range -128 to +127, inclusive. This
instruction is unpredicated.
Bits    31-24     23-22  21-13      12-5  4-0
Field   00100101  size   110000110  imm8  Zdn

MUL <Zdn>.<T>, <Zdn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer imm = SInt(imm8);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is the signed immediate operand, in the range -128 to 127, encoded in the "imm8" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = SInt(Elem[operand1, e, esize]);
    Elem[result, e, esize] = (element1 * imm)<esize-1:0>;

Z[dn] = result;
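
The pseudocode above can be mirrored by a short Python sketch. Elements are modelled as unsigned integers of esize bits, interpreted as signed before the multiply and truncated afterwards. The function name and the list-based register model are assumptions made for illustration.

```python
def mul_imm(zdn, imm, esize):
    """Model MUL (immediate): signed multiply, keeping the low esize bits."""
    if not -128 <= imm <= 127:
        raise ValueError("immediate must be in the range -128 to 127")
    mask = (1 << esize) - 1

    def sint(x):
        # Interpret an esize-bit pattern as a two's complement integer.
        return x - (1 << esize) if x >> (esize - 1) else x

    return [(sint(x) * imm) & mask for x in zdn]
```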

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


MUL (vectors)

Multiply vectors (predicated)

Multiply active elements of the first source vector by corresponding elements of the second source vector and
destructively place the results in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
Bits    31-24     23-22  21-18  17   16   15-13  12-10  9-5  4-0
Field   00000100  size   0100   H=0  U=0  000    Pg     Zm   Zdn

MUL <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = UInt(Elem[operand1, e, esize]);
    integer element2 = UInt(Elem[operand2, e, esize]);
    if ElemP[mask, e, esize] == '1' then
        integer product = element1 * element2;
        Elem[result, e, esize] = product<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


NAND

Bitwise NAND predicates

Bitwise NAND active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Does not set the condition flags.
Bits    31-24     23  22   21-20  19-16  15-14  13-10  9  8-5  4  3-0
Field   00100101  1   S=0  00     Pm     01     Pg     1  Pn   1  Pd

NAND <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = FALSE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = NOT(element1 AND element2);
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;
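
A minimal Python model of the element loop above, with predicates represented as lists of booleans, might look as follows. The function name is hypothetical, and flag setting is omitted because this instruction does not set the condition flags.

```python
def nand_pred(pn, pm, pg):
    """Model NAND on predicates with zeroing predication."""
    return [(not (n and m)) if active else False
            for n, m, active in zip(pn, pm, pg)]
```

Note that an inactive element is always false in the result, even where the NAND of the two source elements would be true.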


NANDS

Bitwise NAND predicates, setting the condition flags

Bitwise NAND active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
Bits    31-24     23  22   21-20  19-16  15-14  13-10  9  8-5  4  3-0
Field   00100101  1   S=1  00     Pm     01     Pg     1  Pn   1  Pd

NANDS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = TRUE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

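for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = NOT(element1 AND element2);
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;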


NEG

Negate (predicated)

Negate the signed integer value in each active element of the source vector, and place the results in the corresponding
elements of the destination vector. Inactive elements in the destination vector register remain unmodified.
Bits    31-24     23-22  21-13      12-10  9-5  4-0
Field   00000100  size   010111101  Pg     Zn   Zd

NEG <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = SInt(Elem[operand, e, esize]);
        element = -element;
        Elem[result, e, esize] = element<esize-1:0>;

Z[d] = result;
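
The following Python sketch models the pseudocode above on lists of unsigned element values. It is illustrative only and the function name is hypothetical. It also shows a consequence of truncating to esize bits: negating the most negative value yields the same bit pattern.

```python
def neg(zd, zn, pg, esize):
    """Model NEG: negate active elements of zn, leave inactive zd elements alone."""
    mask = (1 << esize) - 1

    def sint(x):
        # Interpret an esize-bit pattern as a two's complement integer.
        return x - (1 << esize) if x >> (esize - 1) else x

    return [(-sint(src)) & mask if active else dest
            for dest, src, active in zip(zd, zn, pg)]
```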

Operational information


NOR

Bitwise NOR predicates

Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate
and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Does not set the condition flags.
Bits    31-24     23  22   21-20  19-16  15-14  13-10  9  8-5  4  3-0
Field   00100101  1   S=0  00     Pm     01     Pg     1  Pn   0  Pd

NOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = FALSE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

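for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = NOT(element1 OR element2);
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;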


NORS

Bitwise NOR predicates, setting the condition flags

Bitwise NOR active elements of the second source predicate with corresponding elements of the first source predicate
and place the results in the corresponding elements of the destination predicate. Inactive elements in the destination
predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result,
and the V flag to zero.
Bits    31-24     23  22   21-20  19-16  15-14  13-10  9  8-5  4  3-0
Field   00100101  1   S=1  00     Pm     01     Pg     1  Pn   0  Pd

NORS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = TRUE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

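for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = NOT(element1 OR element2);
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;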


NOT (predicate)

Bitwise invert predicate

Bitwise invert each active element of the source predicate, and place the results in the corresponding elements of the
destination predicate. Inactive elements in the destination predicate register are set to zero. Does not set the
condition flags.

This is an alias of EOR (predicates). This means:

• The encodings in this description are named to match the encodings of EOR (predicates).
• The description of EOR (predicates) gives the operational pseudocode for this instruction.
Bits    31-24     23  22   21-20  19-16  15-14  13-10  9  8-5  4  3-0
Field   00100101  0   S=0  00     Pm     01     Pg     1  Pn   0  Pd

NOT <Pd>.B, <Pg>/Z, <Pn>.B

is equivalent to

EOR <Pd>.B, <Pg>/Z, <Pn>.B, <Pg>.B

and is the preferred disassembly when Pm == Pg.
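
The alias works because, for an active element, the governing bit is 1 and XOR with 1 is an inversion, while inactive elements are zeroed in both forms. The following Python sketch checks this exhaustively for short predicates. It assumes one 0/1 list entry per predicate element; eor_pred, not_pred and alias_holds are hypothetical names used only for illustration.

```python
from itertools import product


def eor_pred(pg, pn, pm):
    """EOR (predicates) with zeroing: inactive elements become 0."""
    return [(n ^ m) if g else 0 for g, n, m in zip(pg, pn, pm)]


def not_pred(pg, pn):
    """NOT (predicate): invert active elements, zero inactive ones."""
    return [(1 - n) if g else 0 for g, n in zip(pg, pn)]


def alias_holds(width=4):
    """Check EOR Pd, Pg/Z, Pn, Pg against NOT Pd, Pg/Z, Pn for every input."""
    bits = list(product([0, 1], repeat=width))
    return all(eor_pred(pg, pn, pg) == not_pred(pg, pn)
               for pg in bits for pn in bits)
```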

Assembler Symbols

Operation

The description of EOR (predicates) gives the operational pseudocode for this instruction.

NOT (vector)

Bitwise invert vector (predicated)

Bitwise invert each active element of the source vector, and place the results in the corresponding elements of the
destination vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 1 1 0 1 0 1 Pg Zn Zd

NOT <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = NOT element;

Z[d] = result;
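
Merging predication (/M) means that the result starts as a copy of the destination and only active elements are overwritten. A minimal Python sketch of that behaviour follows. It assumes vectors are lists of unsigned integers, one per element, and the predicate is a list of 0/1 values; not_vector is a hypothetical name.

```python
def not_vector(zd, pg, zn, esize):
    """Model of NOT (vector): invert active elements of zn, keep zd elsewhere."""
    mask = (1 << esize) - 1
    return [(~n & mask) if g else d for d, g, n in zip(zd, pg, zn)]
```

For example, with esize 8, destination [0x11, 0x22, 0x33], predicate [1, 0, 1] and source [0x00, 0xFF, 0x0F], the middle element keeps its old value 0x22.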

Operational information

NOTS

Bitwise invert predicate, setting the condition flags

This is an alias of EORS. This means:

• The encodings in this description are named to match the encodings of EORS.
• The description of EORS gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 0 Pm 0 1 Pg 1 Pn 0 Pd
S

NOTS <Pd>.B, <Pg>/Z, <Pn>.B

is equivalent to

EORS <Pd>.B, <Pg>/Z, <Pn>.B, <Pg>.B

and is the preferred disassembly when Pm == Pg.

Assembler Symbols

Operation

The description of EORS gives the operational pseudocode for this instruction.

ORN (immediate)

Bitwise inclusive OR with inverted immediate (unpredicated)

Bitwise inclusive OR an inverted immediate with each 64-bit element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of
ones or zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.

This is a pseudo-instruction of ORR (immediate). This means:

• The encodings in this description are named to match the encodings of ORR (immediate).
• The assembler syntax is used only for assembly, and is not used on disassembly.
• The description of ORR (immediate) gives the operational pseudocode for this instruction.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 0 0 0 0 imm13 Zdn

ORN <Zdn>.<T>, <Zdn>.<T>, #<const>

is equivalent to

ORR <Zdn>.<T>, <Zdn>.<T>, #(-<const> - 1)
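
In two's complement arithmetic, -<const> - 1 is the bitwise inverse of <const>, which is why the assembler can emit ORR with the inverted constant. The Python sketch below illustrates the rewrite. It assumes elements of width esize held as unsigned integers and does not check that the inverted value is itself an encodable bitmask immediate; inverted_const and orn_imm are hypothetical names.

```python
def inverted_const(const, esize):
    """Return (-const - 1) truncated to esize bits, i.e. the bitwise NOT."""
    mask = (1 << esize) - 1
    return (-const - 1) & mask


def orn_imm(elements, const, esize):
    """Model ORN (immediate) as ORR with the inverted constant."""
    imm = inverted_const(const, esize)
    return [x | imm for x in elements]
```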

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “imm13<12>:imm13<5:0>”:

imm13<12> imm13<5:0> <T>

0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D

<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.

Operation

The description of ORR (immediate) gives the operational pseudocode for this instruction.

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

ORN (predicates)

Bitwise inclusive OR inverted predicate

Bitwise inclusive OR inverted active elements of the second source predicate with corresponding elements of the first
source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in
the destination predicate register are set to zero. Does not set the condition flags.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S

ORN <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = FALSE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
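bits(PL) result;

for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 OR (NOT element2);
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;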

ORNS

Bitwise inclusive OR inverted predicate, setting the condition flags

Bitwise inclusive OR inverted active elements of the second source predicate with corresponding elements of the first
source predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in
the destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 0 Pn 1 Pd
S

ORNS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = TRUE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
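bits(PL) result;

for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 OR (NOT element2);
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;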

ORR (immediate)

Bitwise inclusive OR with immediate (unpredicated)

Bitwise inclusive OR an immediate with each 64-bit element of the source vector, and destructively place the results in
the corresponding elements of the source vector. The immediate is a 64-bit value consisting of a single run of ones or
zeros repeating every 2, 4, 8, 16, 32 or 64 bits. This instruction is unpredicated.
This instruction is used by the pseudo-instruction ORN (immediate).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 0 0 0 0 imm13 Zdn

ORR <Zdn>.<T>, <Zdn>.<T>, #<const>

if !HaveSVE() then UNDEFINED;

integer dn = UInt(Zdn);
bits(64) imm;
(imm, -) = DecodeBitMasks(imm13<12>, imm13<5:0>, imm13<11:6>, TRUE);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “imm13<12>:imm13<5:0>”:

imm13<12> imm13<5:0> <T>

0 0xxxxx S
0 10xxxx H
0 110xxx B
0 1110xx B
0 11110x B
0 111110 RESERVED
0 111111 RESERVED
1 xxxxxx D

<const> Is a 64, 32, 16 or 8-bit bitmask consisting of replicated 2, 4, 8, 16, 32 or 64 bit fields, each field
containing a rotated run of non-zero bits, encoded in the "imm13" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV 64;
bits(VL) operand = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(64) element1 = Elem[operand, e, 64];
    Elem[result, e, 64] = element1 OR imm;

Z[dn] = result;
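
The 64-bit value imm comes from DecodeBitMasks applied to imm13<12> (N), imm13<5:0> (imms) and imm13<11:6> (immr). The Python sketch below models only the mask that the decode above uses, under the assumption that the usual A64 logical-immediate rules apply: the repeat width is taken from the highest set bit of N:NOT(imms), a run of S+1 ones is rotated right by R, and the pattern is replicated to 64 bits. decode_bit_mask is a hypothetical name, and the sketch is not a substitute for the architectural pseudocode.

```python
def decode_bit_mask(imm13):
    """Return (repeat width, 64-bit mask) for a 13-bit bitmask immediate.

    Raises ValueError for encodings this model treats as reserved.
    """
    n = (imm13 >> 12) & 1
    immr = (imm13 >> 6) & 0x3F
    imms = imm13 & 0x3F
    # The repeat width comes from the highest set bit of N:NOT(imms).
    combined = (n << 6) | (~imms & 0x3F)
    length = combined.bit_length() - 1
    if length < 1:
        raise ValueError("reserved encoding")
    esize = 1 << length
    levels = esize - 1
    s = imms & levels
    r = immr & levels
    if s == levels:
        raise ValueError("reserved encoding (run would fill the element)")
    welem = (1 << (s + 1)) - 1                    # run of s+1 ones
    welem = ((welem >> r) | (welem << (esize - r))) & ((1 << esize) - 1)  # ROR by r
    mask = 0
    for i in range(64 // esize):                  # replicate across 64 bits
        mask |= welem << (i * esize)
    return esize, mask
```

The returned width is the repeat period of the pattern (2 to 64 bits). The <T> table above maps the 2-bit, 4-bit and 8-bit periods to B.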

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

ORR (predicates)

Bitwise inclusive OR predicates

Bitwise inclusive OR active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Does not set the condition flags.
This instruction is used by the alias MOV.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 0 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S

ORR <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = FALSE;

Assembler Symbols

Alias Conditions

Alias Is preferred when

MOV S == '0' && Pn == Pm && Pm == Pg

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
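bits(PL) result;

for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 OR element2;
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;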

ORR (vectors, predicated)

Bitwise inclusive OR vectors (predicated)

Bitwise inclusive OR active elements of the second source vector with corresponding elements of the first source
vector and destructively place the results in the corresponding elements of the first source vector. Inactive elements in
the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 0 0 0 Pg Zm Zdn

ORR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 OR element2;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

ORR (vectors, unpredicated)

Bitwise inclusive OR vectors (unpredicated)

Bitwise inclusive OR all elements of the second source vector with corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. This instruction is unpredicated.
This instruction is used by the alias MOV (vector, unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 Zm 0 0 1 1 0 0 Zn Zd

ORR <Zd>.D, <Zn>.D, <Zm>.D

if !HaveSVE() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

Alias Conditions

Alias Is preferred when

MOV (vector, unpredicated) Zn == Zm

Operation

CheckSVEEnabled();
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];

Z[d] = operand1 OR operand2;

ORRS

Bitwise inclusive OR predicates, setting the condition flags

Bitwise inclusive OR active elements of the second source predicate with corresponding elements of the first source
predicate and place the results in the corresponding elements of the destination predicate. Inactive elements in the
destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the
predicate result, and the V flag to zero.
This instruction is used by the alias MOVS (unpredicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 1 1 0 0 Pm 0 1 Pg 0 Pn 0 Pd
S

ORRS <Pd>.B, <Pg>/Z, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
boolean setflags = TRUE;

Assembler Symbols

Alias Conditions

Alias Is preferred when

MOVS (unpredicated) S == '1' && Pn == Pm && Pm == Pg

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
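bits(PL) result;

for e = 0 to elements-1
    bit element1 = ElemP[operand1, e, esize];
    bit element2 = ElemP[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = element1 OR element2;
    else
        ElemP[result, e, esize] = '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
P[d] = result;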

ORV

Bitwise inclusive OR reduction to scalar

Bitwise inclusive OR horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 0 0 0 0 1 Pg Zn Vd

ORV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(esize) result = Zeros(esize);

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        result = result OR Elem[operand, e, esize];

V[d] = result;
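
Because inactive elements are treated as zero, and zero is the identity for OR, the reduction can simply skip them. A short Python sketch, assuming one unsigned integer per vector element and one 0/1 value per predicate element (orv is a hypothetical name):

```python
from functools import reduce


def orv(pg, zn):
    """Model of ORV: OR together the active elements, starting from zero."""
    active = (n for g, n in zip(pg, zn) if g)
    return reduce(lambda acc, x: acc | x, active, 0)
```

With no active elements the model returns 0, matching the Zeros(esize) initial value in the pseudocode.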

PFALSE

Set all predicate elements to false

Set all elements in the destination predicate to false.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 1 1 0 0 1 0 0 0 0 0 0 Pd
S

PFALSE <Pd>.B

if !HaveSVE() then UNDEFINED;

integer d = UInt(Pd);

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

Operation

CheckSVEEnabled();
P[d] = Zeros(PL);

PFIRST

Set the first active predicate element to true

Sets the first active element in the destination predicate to true, otherwise elements from the source predicate are
passed through unchanged. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and
the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 1 1 0 0 0 0 0 Pg 0 Pdn
S

PFIRST <Pdn>.B, <Pg>, <Pdn>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer dn = UInt(Pdn);

Assembler Symbols

<Pdn> Is the name of the source and destination scalable predicate register, encoded in the "Pdn" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) result = P[dn];
integer first = -1;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' && first == -1 then
        first = e;

if first >= 0 then
    ElemP[result, first, esize] = '1';

PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);

P[dn] = result;
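
The operation above amounts to finding the lowest-numbered active element and setting that one bit in a copy of the source predicate. A minimal Python sketch follows, assuming one 0/1 list entry per predicate element and omitting the flag update; pfirst is a hypothetical name.

```python
def pfirst(pg, pdn):
    """Model of PFIRST: set the first active element, pass the rest through."""
    result = list(pdn)
    for e, g in enumerate(pg):
        if g:
            result[e] = 1
            break
    return result
```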

PNEXT

Find next active predicate

An instruction used to construct a loop which iterates over all active elements in a predicate. If all source predicate
elements are false it sets the first active predicate element in the destination predicate to true. Otherwise it
determines the next active predicate element following the last true source predicate element, and if one is found sets
the corresponding destination predicate element to true. All other destination predicate elements are set to false. Sets
the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 1 1 0 0 1 1 1 0 0 0 1 0 Pg 0 Pdn

PNEXT <Pdn>.<T>, <Pg>, <Pdn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Pdn);

Assembler Symbols

<Pdn> Is the name of the source and destination scalable predicate register, encoded in the "Pdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand = P[dn];
bits(PL) result;

integer next = LastActiveElement(operand, esize) + 1;

while next < elements && (ElemP[mask, next, esize] == '0') do
    next = next + 1;

result = Zeros();
if next < elements then
    ElemP[result, next, esize] = '1';

PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);

P[dn] = result;
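
The loop construction described for PNEXT can be sketched in Python as follows. The model assumes one 0/1 list entry per predicate element, starts from an all-false predicate (as PFALSE would produce), and stops when the result has no true element, which corresponds to the NONE (Z) flag being set. pnext and active_indices are hypothetical names.

```python
def pnext(pg, pdn):
    """Model of PNEXT: select the next active element after the last true one."""
    # Index of the last true source element, or -1 when there is none.
    last = max((e for e, b in enumerate(pdn) if b), default=-1)
    nxt = last + 1
    while nxt < len(pg) and not pg[nxt]:
        nxt += 1
    result = [0] * len(pg)
    if nxt < len(pg):
        result[nxt] = 1
    return result


def active_indices(pg):
    """Visit active elements in order, one PNEXT step per iteration."""
    cur = [0] * len(pg)          # all-false starting predicate
    seen = []
    while True:
        cur = pnext(pg, cur)
        if not any(cur):         # no element selected: the loop terminates
            return seen
        seen.append(cur.index(1))
```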

PRFB (scalar plus immediate)

Contiguous prefetch bytes (immediate index)

Contiguous prefetch of byte elements from the memory address generated by a 64-bit scalar base and immediate
index in the range -32 to 31 which is multiplied by the vector's in-memory size, irrespective of predication, and added
to the base address.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 0 0 Pg Rn 0 prfop
msz<1>msz<0>

PRFB <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 0;
integer offset = SInt(imm6);

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -32 to 31, defaulting to 0, encoded in the
"imm6" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
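
The address arithmetic above can be checked with a small Python model. It assumes the caller supplies elements (VL DIV esize) and scale exactly as the decode sets them, represents the predicate as a list of 0/1 values, and wraps addresses to 64 bits; contiguous_prefetch_addrs is a hypothetical name. Since elements << scale is the vector size in bytes, the immediate steps the base by whole vectors.

```python
MASK64 = (1 << 64) - 1


def contiguous_prefetch_addrs(base, imm, pg, elements, scale):
    """Return the addresses hinted for each active element."""
    addrs = []
    for e in range(elements):
        if pg[e]:
            eoff = imm * elements + e          # element index from the base
            addrs.append((base + (eoff << scale)) & MASK64)
    return addrs
```

For a 128-bit vector, PRFB has elements = 16 and scale = 0, so #1, MUL VL starts 16 bytes past the base.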

PRFB (scalar plus scalar)

Contiguous prefetch bytes (scalar index)

Contiguous prefetch of byte elements from the memory address generated by a 64-bit scalar base and scalar index
which is added to the base address. After each element prefetch the index value is incremented, but the index register
is not updated.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 0 Rm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>

PRFB <prfop>, <Pg>, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 0;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop” as listed in the <prfop> table for PRFB (scalar plus immediate).

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(64) offset;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);

PRFB (scalar plus vector)

Gather prefetch bytes (scalar plus vector)

Gather prefetch of bytes from the active memory addresses generated by a 64-bit scalar base plus vector index. The
index values are optionally sign or zero-extended from 32 to 64 bits. Inactive addresses are not prefetched from
memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 3 classes: 32-bit scaled offset, 32-bit unpacked scaled offset and 64-bit scaled offset.
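
The three-part <prfop> hint described above maps directly onto the bits of the 4-bit field: bit 3 selects PLD or PST, bits 2:1 select the cache level, and bit 0 selects KEEP or STRM. The Python sketch below decodes a field value into its mnemonic. It is an illustration only; decode_prfop is a hypothetical name, and values matching x11x are printed as the unnamed #uimm4 form.

```python
def decode_prfop(prfop):
    """Return the assembler name for a 4-bit prfop field value."""
    if (prfop & 0b0110) == 0b0110:
        return f"#{prfop}"                     # x11x: no named hint
    kind = "PST" if prfop & 0b1000 else "PLD"  # access type
    level = ((prfop >> 1) & 0b11) + 1          # target cache level 1 to 3
    policy = "STRM" if prfop & 1 else "KEEP"   # temporality
    return f"{kind}L{level}{policy}"
```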

32-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 xs 1 Zm 0 0 0 Pg Rn 0 prfop
msz<1>msz<0>

PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer offs_size = 32;
boolean offs_unsigned = (xs == '0');
integer scale = 0;

32-bit unpacked scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 xs 1 Zm 0 0 0 Pg Rn 0 prfop
msz<1>msz<0>

PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer offs_size = 32;
boolean offs_unsigned = (xs == '0');
integer scale = 0;

64-bit scaled offset

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 1 1 Zm 1 0 0 Pg Rn 0 prfop
msz<1>msz<0>

PRFB <prfop>, <Pg>, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer offs_size = 64;
boolean offs_unsigned = TRUE;
integer scale = 0;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop” as listed in the <prfop> table for PRFB (scalar plus immediate).

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
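
The gather address computation can be modelled as below. The sketch assumes each vector element is an unsigned Python integer, takes offs_size, the signedness and scale as set by the decode, and wraps addresses to 64 bits. gather_prefetch_addrs is a hypothetical name.

```python
MASK64 = (1 << 64) - 1


def gather_prefetch_addrs(base, offsets, pg, offs_size, unsigned, scale):
    """Return one hinted address per active element of the offset vector."""
    addrs = []
    for g, elem in zip(pg, offsets):
        if not g:
            continue
        off = elem & ((1 << offs_size) - 1)       # low offs_size bits of the element
        if not unsigned and off >> (offs_size - 1):
            off -= 1 << offs_size                 # sign-extend (SXTW case)
        addrs.append((base + (off << scale)) & MASK64)
    return addrs
```

With a 32-bit offset of 0xFFFFFFFF, SXTW yields base - 1 while UXTW yields base + 0xFFFFFFFF.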

PRFB (vector plus immediate)

Gather prefetch bytes (vector plus immediate)

Gather prefetch of bytes from the active memory addresses generated by a vector base plus immediate index. The
index is in the range 0 to 31. Inactive addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 0 0 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>

PRFB <prfop>, <Pg>, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 0;
integer offset = UInt(imm5);

64-bit element

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 0 0 0 1 0 0 0 0 0 imm5 1 1 1 Pg Zn 0 prfop
msz<1>msz<0>

PRFB <prfop>, <Pg>, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 0;
integer offset = UInt(imm5);

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop” as listed in the <prfop> table for PRFB (scalar plus immediate).

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;

if AnyActiveElement(mask, esize) then
    base = Z[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + (offset << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);

PRFD (scalar plus immediate)

Contiguous prefetch doublewords (immediate index)

Contiguous prefetch of doubleword elements from the memory address generated by a 64-bit scalar base and
immediate index in the range -32 to 31 which is multiplied by the vector's in-memory size, irrespective of predication,
and added to the base address.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 1 imm6 0 1 1 Pg Rn 0 prfop
msz<1>msz<0>

PRFD <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 3;
integer offset = SInt(imm6);

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop” as listed in the <prfop> table for PRFB (scalar plus immediate).

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);

PRFD (scalar plus scalar)

Contiguous prefetch doublewords (scalar index)

Contiguous prefetch of doubleword elements from the memory address generated by a 64-bit scalar base and scalar
index which is multiplied by 8 and added to the base address. After each element prefetch the index value is
incremented, but the index register is not updated.
The predicate may be used to suppress prefetches from unwanted addresses.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 0 0 0 0 1 0 1 1 0 0 Rm 1 1 0 Pg Rn 0 prfop
msz<1>msz<0>

PRFD <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 3;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop” as listed in the <prfop> table for PRFB (scalar plus immediate).

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(64) offset;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
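
The scalar-index form can be sketched the same way. The Python model below is illustrative only: `prf_scalar_index_addresses` and its parameters are hypothetical names, the predicate is assumed to be a list of VL/8 bits, and the `esize` parameter is an addition so that the same sketch covers halfword (16), word (32) and doubleword (64) elements.

```python
MASK64 = (1 << 64) - 1

def prf_scalar_index_addresses(base, index, vl_bits, pred_bits, esize=64):
    """List the addresses hinted by the scalar plus scalar prefetch form.

    index is the value of Xm; it is read once and never written back.
    """
    scale = {8: 0, 16: 1, 32: 2, 64: 3}[esize]  # log2 of the element size in bytes
    elements = vl_bits // esize
    addrs = []
    for e in range(elements):
        # ElemP is modelled as the predicate bit of the element's lowest byte.
        if pred_bits[e * (esize // 8)] == 1:
            eoff = (index & MASK64) + e  # UInt(offset) + e
            addrs.append((base + (eoff << scale)) & MASK64)
    return addrs
```

The index is incremented per element only inside the loop, which matches the statement that the index register is not updated.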

PRFD (scalar plus vector)

Gather prefetch doublewords (scalar plus vector)

Gather prefetch of doublewords from the active memory addresses generated by a 64-bit scalar base plus vector
index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then multiplied by 8. Inactive
addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 3 classes: 32-bit scaled offset, 32-bit unpacked scaled offset and 64-bit scaled offset.

32-bit scaled offset

Bits   31-25    24-23  22  21  20-16  15  14-13   12-10  9-5  4  3-0
Field  1000010  00     xs  1   Zm     0   msz=11  Pg     Rn   0  prfop

PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #3]

if !HaveSVE() then UNDEFINED;

32-bit unpacked scaled offset

Bits   31-25    24-23  22  21  20-16  15  14-13   12-10  9-5  4  3-0
Field  1100010  00     xs  1   Zm     0   msz=11  Pg     Rn   0  prfop

PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #3]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

Bits   31-25    24-23  22-21  20-16  15  14-13   12-10  9-5  4  3-0
Field  1100010  00     11     Zm     1   msz=11  Pg     Rn   0  prfop

PRFD <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #3]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer offs_size = 64;
boolean offs_unsigned = TRUE;
integer scale = 3;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW
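
The index extension selected by <mod> can be illustrated with a short Python sketch. It follows the prose description (optional sign or zero extension from 32 to 64 bits, then scaling) and the decode values shown for the 64-bit scaled offset form; the function names, and the use of the string "LSL" to select the unextended 64-bit case, are assumptions made for the example.

```python
MASK64 = (1 << 64) - 1

def extend_index(raw, mod):
    """Return the integer index for one vector element.

    raw -- the 64-bit container holding the index
    mod -- "UXTW", "SXTW", or "LSL" (full 64-bit unsigned index)
    """
    if mod == "UXTW":
        return raw & 0xFFFFFFFF
    if mod == "SXTW":
        low = raw & 0xFFFFFFFF
        return low - (1 << 32) if low & 0x80000000 else low
    if mod == "LSL":
        return raw & MASK64
    raise ValueError("unknown modifier: " + mod)

def gather_address(base, raw_index, mod, scale=3):
    """Scalar base plus scaled vector index; scale=3 multiplies by 8."""
    return (base + (extend_index(raw_index, mod) << scale)) & MASK64
```

The same 32-bit pattern 0xFFFFFFFF therefore addresses 8 bytes below the base with SXTW, and almost 32 GiB above it with UXTW.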

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

PRFD (vector plus immediate)

Gather prefetch doublewords (vector plus immediate)

Gather prefetch of doublewords from the active memory addresses generated by a vector base plus immediate index.
The index is a multiple of 8 in the range 0 to 248. Inactive addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4  3-0
Field  1000010  msz=11  00     imm5   111    Pg     Zn   0  prfop

PRFD <prfop>, <Pg>, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

64-bit element

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4  3-0
Field  1100010  msz=11  00     imm5   111    Pg     Zn   0  prfop

PRFD <prfop>, <Pg>, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 248, defaulting to 0,
encoded in the "imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;

if AnyActiveElement(mask, esize) then
    base = Z[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + (offset << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);
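
A minimal Python sketch of the vector-base form follows. It assumes the base vector is given as a list of element values and the predicate as a list of VL/8 bits; `prf_vec_imm_addresses` and the `msize_bytes` parameter (8 for PRFD) are hypothetical names. The byte offset is taken as written in the assembler syntax, which equals `offset << scale` in the pseudocode.

```python
MASK64 = (1 << 64) - 1

def prf_vec_imm_addresses(base_elems, imm_bytes, pred_bits, esize=64, msize_bytes=8):
    """List the addresses hinted by the vector plus immediate prefetch form.

    base_elems  -- one integer per element of Zn
    imm_bytes   -- byte offset, a multiple of msize_bytes up to 31 * msize_bytes
    esize       -- 32 for <Zn>.S, 64 for <Zn>.D
    """
    if imm_bytes % msize_bytes != 0 or not 0 <= imm_bytes <= 31 * msize_bytes:
        raise ValueError("immediate not encodable in imm5")
    addrs = []
    for e, elem in enumerate(base_elems):
        # ElemP is modelled as the predicate bit of the element's lowest byte.
        if pred_bits[e * (esize // 8)] == 1:
            zext = elem & ((1 << esize) - 1)  # ZeroExtend(Elem[base, e, esize], 64)
            addrs.append((zext + imm_bytes) & MASK64)
    return addrs
```

For the 32-bit element form each base address is zero-extended, so only the low 4 GiB of the address space is reachable from a `.S` base vector.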

PRFH (scalar plus immediate)

Contiguous prefetch halfwords (immediate index)

Contiguous prefetch of halfword elements from the memory address generated by a 64-bit scalar base and immediate
index in the range -32 to 31 which is multiplied by the vector's in-memory size, irrespective of predication, and added
to the base address.
The predicate may be used to suppress prefetches from unwanted addresses.
Bits   31-22       21-16  15  14-13   12-10  9-5  4  3-0
Field  1000010111  imm6   0   msz=01  Pg     Rn   0  prfop

PRFH <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 1;
integer offset = SInt(imm6);

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);

PRFH (scalar plus scalar)

Contiguous prefetch halfwords (scalar index)

Contiguous prefetch of halfword elements from the memory address generated by a 64-bit scalar base and scalar
index which is multiplied by 2 and added to the base address. After each element prefetch the index value is
incremented, but the index register is not updated.
The predicate may be used to suppress prefetches from unwanted addresses.
Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4  3-0
Field  1000010  msz=01  00     Rm     110    Pg     Rn   0  prfop

PRFH <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer esize = 16;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 1;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(64) offset;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);

PRFH (scalar plus vector)

Gather prefetch halfwords (scalar plus vector)

Gather prefetch of halfwords from the active memory addresses generated by a 64-bit scalar base plus vector index.
The index values are optionally first sign or zero-extended from 32 to 64 bits and then multiplied by 2. Inactive
addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 3 classes: 32-bit scaled offset, 32-bit unpacked scaled offset and 64-bit scaled offset.

32-bit scaled offset

Bits   31-25    24-23  22  21  20-16  15  14-13   12-10  9-5  4  3-0
Field  1000010  00     xs  1   Zm     0   msz=01  Pg     Rn   0  prfop

PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #1]

if !HaveSVE() then UNDEFINED;

32-bit unpacked scaled offset

Bits   31-25    24-23  22  21  20-16  15  14-13   12-10  9-5  4  3-0
Field  1100010  00     xs  1   Zm     0   msz=01  Pg     Rn   0  prfop

PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #1]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

Bits   31-25    24-23  22-21  20-16  15  14-13   12-10  9-5  4  3-0
Field  1100010  00     11     Zm     1   msz=01  Pg     Rn   0  prfop

PRFH <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #1]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer offs_size = 64;
boolean offs_unsigned = TRUE;
integer scale = 1;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

PRFH (vector plus immediate)

Gather prefetch halfwords (vector plus immediate)

Gather prefetch of halfwords from the active memory addresses generated by a vector base plus immediate index. The
index is a multiple of 2 in the range 0 to 62. Inactive addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4  3-0
Field  1000010  msz=01  00     imm5   111    Pg     Zn   0  prfop

PRFH <prfop>, <Pg>, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

64-bit element

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4  3-0
Field  1100010  msz=01  00     imm5   111    Pg     Zn   0  prfop

PRFH <prfop>, <Pg>, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;

if AnyActiveElement(mask, esize) then
    base = Z[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + (offset << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);

PRFW (scalar plus immediate)

Contiguous prefetch words (immediate index)

Contiguous prefetch of word elements from the memory address generated by a 64-bit scalar base and immediate
index in the range -32 to 31 which is multiplied by the vector's in-memory size, irrespective of predication, and added
to the base address.
The predicate may be used to suppress prefetches from unwanted addresses.
Bits   31-22       21-16  15  14-13   12-10  9-5  4  3-0
Field  1000010111  imm6   0   msz=10  Pg     Rn   0  prfop

PRFW <prfop>, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 2;
integer offset = SInt(imm6);

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);

PRFW (scalar plus scalar)

Contiguous prefetch words (scalar index)

Contiguous prefetch of word elements from the memory address generated by a 64-bit scalar base and scalar index
which is multiplied by 4 and added to the base address. After each element prefetch the index value is incremented,
but the index register is not updated.
The predicate may be used to suppress prefetches from unwanted addresses.
Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4  3-0
Field  1000010  msz=10  00     Rm     110    Pg     Rn   0  prfop

PRFW <prfop>, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer scale = 2;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(64) offset;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = UInt(offset) + e;
        bits(64) addr = base + (eoff << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);

PRFW (scalar plus vector)

Gather prefetch words (scalar plus vector)

Gather prefetch of words from the active memory addresses generated by a 64-bit scalar base plus vector index. The
index values are optionally first sign or zero-extended from 32 to 64 bits and then multiplied by 4. Inactive addresses
are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 3 classes: 32-bit scaled offset, 32-bit unpacked scaled offset and 64-bit scaled offset.

32-bit scaled offset

Bits   31-25    24-23  22  21  20-16  15  14-13   12-10  9-5  4  3-0
Field  1000010  00     xs  1   Zm     0   msz=10  Pg     Rn   0  prfop

PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #2]

if !HaveSVE() then UNDEFINED;

32-bit unpacked scaled offset

Bits   31-25    24-23  22  21  20-16  15  14-13   12-10  9-5  4  3-0
Field  1100010  00     xs  1   Zm     0   msz=10  Pg     Rn   0  prfop

PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #2]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

Bits   31-25    24-23  22-21  20-16  15  14-13   12-10  9-5  4  3-0
Field  1100010  00     11     Zm     1   msz=10  Pg     Rn   0  prfop

PRFW <prfop>, <Pg>, [<Xn|SP>, <Zm>.D, LSL #2]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer level = UInt(prfop<2:1>);
boolean stream = (prfop<0> == '1');
pref_hint = if prfop<3> == '0' then Prefetch_READ else Prefetch_WRITE;
integer offs_size = 64;
boolean offs_unsigned = TRUE;
integer scale = 2;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(64) base;
bits(VL) offset;

if AnyActiveElement(mask, esize) then
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];

PRFW (vector plus immediate)

Gather prefetch words (vector plus immediate)

Gather prefetch of words from the active memory addresses generated by a vector base plus immediate index. The
index is a multiple of 4 in the range 0 to 124. Inactive addresses are not prefetched from memory.
The <prfop> symbol specifies the prefetch hint as a combination of three options: access type PLD for load or PST for
store; target cache level L1, L2 or L3; temporality (KEEP for temporal or STRM for non-temporal).
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4  3-0
Field  1000010  msz=10  00     imm5   111    Pg     Zn   0  prfop

PRFW <prfop>, <Pg>, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

64-bit element

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4  3-0
Field  1100010  msz=10  00     imm5   111    Pg     Zn   0  prfop

PRFW <prfop>, <Pg>, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<prfop> Is the prefetch operation specifier, encoded in “prfop”:

prfop <prfop>
0000 PLDL1KEEP
0001 PLDL1STRM
0010 PLDL2KEEP
0011 PLDL2STRM
0100 PLDL3KEEP
0101 PLDL3STRM
x11x #uimm4
1000 PSTL1KEEP
1001 PSTL1STRM
1010 PSTL2KEEP
1011 PSTL2STRM
1100 PSTL3KEEP
1101 PSTL3STRM
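
The table follows directly from the decode pseudocode: bit 3 selects load or store, bits 2:1 give the cache level, and bit 0 selects streaming. The Python sketch below reproduces the mapping; the function name `prfop_name` is hypothetical.

```python
def prfop_name(prfop):
    """Map a 4-bit prfop value to its assembler name, per the table above."""
    if not 0 <= prfop <= 15:
        raise ValueError("prfop is a 4-bit field")
    level = (prfop >> 1) & 0b11           # UInt(prfop<2:1>)
    if level == 3:                         # x11x rows have no name
        return "#" + str(prfop)
    kind = "PST" if prfop & 0b1000 else "PLD"    # prfop<3>
    policy = "STRM" if prfop & 0b0001 else "KEEP"  # prfop<0>
    return kind + "L" + str(level + 1) + policy
```

Encodings with both level bits set have no mnemonic and are written as a plain immediate, matching the `#uimm4` row.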

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;

if AnyActiveElement(mask, esize) then
    base = Z[n];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + (offset << scale);
        Hint_Prefetch(addr, pref_hint, level, stream);

PTEST

Set condition flags for predicate

Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate source register, and the V flag to zero.
Bits   31-23      22   21-14     13-10  9  8-5  4-0
Field  001001010  S=1  01000011  Pg     0  Pn   00000

PTEST <Pg>, <Pn>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);

Assembler Symbols

<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.

Operation

CheckSVEEnabled();
bits(PL) mask = P[g];
bits(PL) result = P[n];

PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);
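
PredTest is not defined in this section. The Python sketch below assumes the behaviour implied by the flag names: N is the result bit of the first active element, Z is set when no active element is true, C is the inverse of the result bit of the last active element, and V is zero. The function name `pred_test` and the list-of-bits predicate representation are hypothetical.

```python
def pred_test(mask, result, esize):
    """Return (N, Z, C, V) for a governing mask and a predicate result.

    mask, result -- lists of predicate bits, one per vector byte
    esize        -- element size in bits; PTEST uses 8
    """
    step = esize // 8
    # Result bits of the elements that the governing predicate marks active.
    active = [result[i] for i in range(0, len(mask), step) if mask[i] == 1]
    n = active[0] if active else 0                  # FIRST
    z = 0 if any(active) else 1                     # NONE
    c = 0 if (active and active[-1] == 1) else 1    # !LAST
    return (n, z, c, 0)
```

An all-false governing predicate yields N=0, Z=1, C=1 under this model, the same flags as an all-false source predicate.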

PTRUE

Initialise predicate from named constraint

Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to
false otherwise. If the constraint specifies more elements than are available at the current vector length then all
elements of the destination predicate are set to false.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception. Does not set the condition flags.
Bits   31-24     23-22  21-17  16   15-10   9-5      4  3-0
Field  00100101  size   01100  S=0  111000  pattern  0  Pd

PTRUE <Pd>.<T>{, <pattern>}

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer d = UInt(Pd);
boolean setflags = FALSE;
bits(5) pat = pattern;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(PL) result;

for e = 0 to elements-1
    ElemP[result, e, esize] = if e < count then '1' else '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(result, result, esize);
P[d] = result;
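
The effect of the named constraints can be sketched in Python. DecodePredCount is not defined in this section, so the model below works from the constraint names in the description rather than from the 5-bit encoding. The set of fixed-count names (VL1 to VL8, then powers of two up to VL256) is an assumption about which names exist, and `pred_count` and `ptrue` are hypothetical function names.

```python
# Assumed set of fixed-count constraint names.
FIXED = {"VL" + str(n): n for n in (1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256)}

def pred_count(pattern, elements):
    """Number of leading elements set to true for a named constraint."""
    if pattern == "ALL":
        return elements
    if pattern == "POW2":
        return 1 << (elements.bit_length() - 1)   # largest power of two <= elements
    if pattern == "MUL3":
        return elements - elements % 3
    if pattern == "MUL4":
        return elements - elements % 4
    n = FIXED.get(pattern, 0)                     # unknown names give an empty predicate
    return n if n <= elements else 0              # too large for this VL: all false

def ptrue(pattern, vl_bits, esize):
    """Build the destination predicate as a list of vl_bits/8 bits."""
    elements = vl_bits // esize
    count = pred_count(pattern, elements)
    step = esize // 8
    result = [0] * (vl_bits // 8)
    for e in range(elements):
        result[e * step] = 1 if e < count else 0
    return result
```

Only the lowest predicate bit of each element is set, so for halfword elements every second bit of the result is zero.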

PTRUES

Initialise predicate from named constraint and set the condition flags

Set elements of the destination predicate to true if the element number satisfies the named predicate constraint, or to
false otherwise. If the constraint specifies more elements than are available at the current vector length then all
elements of the destination predicate are set to false.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception. Sets the FIRST (N), NONE (Z), !LAST (C) condition flags based on the predicate result,
and the V flag to zero.
Bits   31-24     23-22  21-17  16   15-10   9-5      4  3-0
Field  00100101  size   01100  S=1  111000  pattern  0  Pd

PTRUES <Pd>.<T>{, <pattern>}

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer d = UInt(Pd);
boolean setflags = TRUE;
bits(5) pat = pattern;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(PL) result;

for e = 0 to elements-1
    ElemP[result, e, esize] = if e < count then '1' else '0';

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(result, result, esize);
P[d] = result;

PUNPKHI, PUNPKLO

Unpack and widen half of predicate

Unpack elements from the lowest or highest half of the source predicate and place in elements of twice their size
within the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: High half and Low half.

High half

Bits   31-17            16   15-9     8-5  4  3-0
Field  000001010011000  H=1  0100000  Pn   0  Pd

PUNPKHI <Pd>.H, <Pn>.B

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer n = UInt(Pn);
integer d = UInt(Pd);
boolean hi = TRUE;

Low half

Bits   31-17            16   15-9     8-5  4  3-0
Field  000001010011000  H=0  0100000  Pn   0  Pd

PUNPKLO <Pd>.H, <Pn>.B

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer n = UInt(Pn);
integer d = UInt(Pd);
boolean hi = FALSE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) operand = P[n];
bits(PL) result;

for e = 0 to elements-1
    ElemP[result, e, esize] = ElemP[operand, if hi then e + elements else e, esize DIV 2];

P[d] = result;
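
The widening can be pictured with a small Python model. It assumes a predicate is a list of PL bits, one per vector byte, so that a byte element occupies one bit and a halfword element occupies two; `punpk` is a hypothetical name.

```python
def punpk(operand, hi):
    """Model PUNPKHI (hi=True) or PUNPKLO (hi=False) on a list of predicate bits."""
    elements = len(operand) // 2          # VL DIV 16 equals PL DIV 2
    result = [0] * len(operand)
    for e in range(elements):
        src = e + elements if hi else e   # byte element taken from the source
        result[2 * e] = operand[src]      # lowest bit of the halfword element
        # result[2 * e + 1] stays 0: upper bit of a widened element is clear
    return result
```

Each source bit lands at twice its index within the selected half, which is what lets the result govern an instruction operating on elements of twice the size.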

RBIT

Reverse bits (predicated)

Reverse bits in each active element of the source vector, and place the results in the corresponding elements of the
destination vector. Inactive elements in the destination vector register remain unmodified.
Bits   31-24     23-22  21-13      12-10  9-5  4-0
Field  00000101  size   100111100  Pg     Zn   Zd

RBIT <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = BitReverse(element);

Z[d] = result;
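
A minimal Python sketch of the merging behaviour follows. Vectors are modelled as lists of element values and the predicate as one boolean per element; these representations and the names `bit_reverse` and `rbit` are assumptions for illustration.

```python
def bit_reverse(value, esize):
    """Reverse the order of the low esize bits of value."""
    out = 0
    for i in range(esize):
        if value & (1 << i):
            out |= 1 << (esize - 1 - i)
    return out

def rbit(zd, pg, zn, esize):
    """RBIT <Zd>.<T>, <Pg>/M, <Zn>.<T>: inactive elements keep their old Zd value."""
    return [bit_reverse(n, esize) if active else d
            for d, active, n in zip(zd, pg, zn)]
```

For example, reversing 0x01 in an 8-bit element gives 0x80, while an inactive element is passed through from the destination unchanged.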

Operational information

RDFFR (predicated)

Return predicate of successfully loaded elements

Read the first-fault register (FFR) and place active elements in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Does not set the condition flags.
Bits   31-23      22   21-9           8-5  4  3-0
Field  001001010  S=0  0110001111000  Pg   0  Pd

RDFFR <Pd>.B, <Pg>/Z

if !HaveSVE() then UNDEFINED;

integer g = UInt(Pg);
integer d = UInt(Pd);
boolean setflags = FALSE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.

Operation

CheckSVEEnabled();
bits(PL) mask = P[g];
bits(PL) ffr = FFR[];
bits(PL) result = ffr AND mask;

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, 8);
P[d] = result;
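
The zeroing read of the FFR can be sketched in Python as below. The predicate and FFR are modelled as lists of bits at byte granularity, and the flag computation assumes the FIRST, NONE and !LAST meanings given in the descriptions of the flag-setting forms; `rdffr` is a hypothetical name and `setflags=True` corresponds to RDFFRS.

```python
def rdffr(ffr, mask, setflags=False):
    """Return (result, flags); flags is None unless setflags is True."""
    result = [f & m for f, m in zip(ffr, mask)]   # ffr AND mask
    flags = None
    if setflags:
        # Result bits of the elements the governing predicate marks active.
        active = [r for r, m in zip(result, mask) if m == 1]
        n = active[0] if active else 0                 # FIRST
        z = 0 if any(active) else 1                    # NONE
        c = 0 if (active and active[-1] == 1) else 1   # !LAST
        flags = (n, z, c, 0)
    return result, flags
```

After a first-faulting load, a clear bit in the result marks an element that was not loaded, so C=1 (last active element not loaded) indicates that part of the vector still needs to be retried.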

RDFFR (unpredicated)

Read the first-fault register

Read the first-fault register (FFR) and place in the destination predicate without predication.
Bits   31-23      22   21-4                3-0
Field  001001010  S=0  011001111100000000  Pd

RDFFR <Pd>.B

if !HaveSVE() then UNDEFINED;

integer d = UInt(Pd);

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

Operation

CheckSVEEnabled();
bits(PL) ffr = FFR[];
P[d] = ffr;

RDFFRS

Return predicate of successfully loaded elements, setting the condition flags

Read the first-fault register (FFR) and place active elements in the corresponding elements of the destination
predicate. Inactive elements in the destination predicate register are set to zero. Sets the FIRST (N), NONE (Z), !LAST (C)
condition flags based on the predicate result, and the V flag to zero.
Bits   31-23      22   21-9           8-5  4  3-0
Field  001001010  S=1  0110001111000  Pg   0  Pd

RDFFRS <Pd>.B, <Pg>/Z

if !HaveSVE() then UNDEFINED;

integer g = UInt(Pg);
integer d = UInt(Pd);
boolean setflags = TRUE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.

Operation

CheckSVEEnabled();
bits(PL) mask = P[g];
bits(PL) ffr = FFR[];
bits(PL) result = ffr AND mask;

if setflags then
    PSTATE.<N,Z,C,V> = PredTest(mask, result, 8);
P[d] = result;
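
As an illustration only, the AND-and-test behavior above can be sketched in Python. Predicates are modeled as lists of booleans, one per byte element, which is an assumption of this sketch. The names `pred_test` and `rdffrs` are hypothetical, and `pred_test` follows the FIRST (N), NONE (Z), !LAST (C) description in the text rather than reproducing the architectural PredTest pseudocode.

```python
def pred_test(mask, result):
    """Return (N, Z, C, V) for a predicate result under a governing mask.

    N: the first active element is true.
    Z: no active element is true.
    C: the last active element is not true.
    V: always zero.
    """
    active = [r for m, r in zip(mask, result) if m]
    n = int(bool(active) and active[0])
    z = int(not any(active))
    c = int(not (bool(active) and active[-1]))
    return n, z, c, 0


def rdffrs(ffr, pg):
    """Model RDFFRS: AND the FFR with the governing predicate, then test."""
    result = [f and g for f, g in zip(ffr, pg)]
    return result, pred_test(pg, result)


# Two of three active elements loaded successfully.
print(rdffrs([True, True, False, False], [True, True, True, False]))
```

In this model, an all-false governing predicate gives N=0, Z=1, C=1, because there is no active element to report as first or last.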

RDVL

Read multiple of vector register size to scalar register

Multiply the current vector register size in bytes by an immediate in the range -32 to 31 and place the result in the
64-bit destination general-purpose register.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 1 1 1 1 0 1 0 1 0 imm6 Rd

RDVL <Xd>, #<imm>

if !HaveSVE() then UNDEFINED;

integer d = UInt(Rd);
integer imm = SInt(imm6);

Assembler Symbols

<Xd> Is the 64-bit name of the destination general-purpose register, encoded in the "Rd" field.
<imm> Is the signed immediate operand, in the range -32 to 31, encoded in the "imm6" field.

Operation

CheckSVEEnabled();
integer len = imm * (VL DIV 8);
X[d] = len<63:0>;
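
A minimal Python sketch of the computation above, assuming the vector length is passed in as a bit count. The function name `rdvl` and its `vl_bits` parameter are hypothetical and chosen for this example only.

```python
def rdvl(imm, vl_bits):
    """Model RDVL: imm times the vector length in bytes, as a 64-bit value."""
    if not -32 <= imm <= 31:
        raise ValueError("imm must be in the range -32 to 31")
    length = imm * (vl_bits // 8)
    # Keep bits <63:0>, so negative multiples appear in two's complement form.
    return length & 0xFFFFFFFFFFFFFFFF


print(rdvl(1, 256))        # one 256-bit vector, in bytes
print(hex(rdvl(-1, 128)))  # minus one 128-bit vector, as a 64-bit pattern
```

A negative immediate is useful when the result is added to an address to step backwards by whole vectors.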

REV (predicate)

Reverse all elements in a predicate

Reverse the order of all elements in the source predicate and place in the destination predicate. This instruction is
unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 1 0 0 0 1 0 0 0 0 0 Pn 0 Pd

REV <Pd>.<T>, <Pn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Pn);
integer d = UInt(Pd);

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.

Operation

CheckSVEEnabled();
bits(PL) operand = P[n];
bits(PL) result = Reverse(operand, esize DIV 8);
P[d] = result;

REV (vector)

Reverse all elements in a vector (unpredicated)

Reverse the order of all elements in the source vector and place in the destination vector. This instruction is
unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 1 0 0 0 0 0 1 1 1 0 Zn Zd

REV <Zd>.<T>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
bits(VL) operand = Z[n];
bits(VL) result = Reverse(operand, esize);
Z[d] = result;
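
As an illustration only, the element reversal used by REV can be modeled in Python by treating a register as an unsigned integer. The helper name `reverse_chunks` and the integer representation are assumptions of this sketch, not part of the architecture pseudocode.

```python
def reverse_chunks(value, total_bits, chunk_bits):
    """Reverse the order of chunk_bits-wide chunks in a total_bits-wide value."""
    if total_bits % chunk_bits != 0:
        raise ValueError("total_bits must be a multiple of chunk_bits")
    chunk_mask = (1 << chunk_bits) - 1
    result = 0
    for i in range(total_bits // chunk_bits):
        # Take chunk i, counting from the least significant end, and
        # shift it in so that it ends up at the mirrored position.
        chunk = (value >> (i * chunk_bits)) & chunk_mask
        result = (result << chunk_bits) | chunk
    return result


# REV on a 64-bit slice holding 16-bit elements 1, 2, 3, 4 (low to high).
print(hex(reverse_chunks(0x0004000300020001, 64, 16)))
```

For the vector form the chunk width is the element size. For the predicate form the same helper applies with a chunk width of the element size divided by 8, since a predicate holds one bit per vector byte.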

REVB, REVH, REVW

Reverse bytes / halfwords / words within elements (predicated)

Reverse the order of 8-bit bytes, 16-bit halfwords or 32-bit words within each active element of the source vector, and
place the results in the corresponding elements of the destination vector. Inactive elements in the destination vector
register remain unmodified.
It has encodings from 3 classes: Byte, Halfword and Word

Byte

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 0 0 1 0 0 Pg Zn Zd

REVB <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer swsize = 8;

Halfword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 0 1 1 0 0 Pg Zn Zd

REVH <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size != '1x' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer swsize = 16;

Word

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 1 0 1 0 0 Pg Zn Zd

REVW <Zd>.D, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer swsize = 32;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> For the byte variant: is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

For the halfword variant: is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = Reverse(element, swsize);

Z[d] = result;
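
The merging behavior above can be sketched in Python as follows. This is an illustrative model under stated assumptions: vectors are lists of unsigned element values, the predicate is a list of booleans, and the function name `rev_within_elements` is hypothetical.

```python
def rev_within_elements(zd, zn, pg, esize, swsize):
    """Model REVB/REVH/REVW on lists of unsigned element values.

    swsize is 8 (REVB), 16 (REVH) or 32 (REVW) and must be smaller
    than esize, matching the reserved size encodings.
    """
    if swsize >= esize:
        raise ValueError("swsize must be smaller than esize")
    chunk_mask = (1 << swsize) - 1
    result = list(zd)  # inactive elements keep the destination value
    for e, active in enumerate(pg):
        if active:
            reversed_elem = 0
            for i in range(esize // swsize):
                chunk = (zn[e] >> (i * swsize)) & chunk_mask
                reversed_elem = (reversed_elem << swsize) | chunk
            result[e] = reversed_elem
    return result


# REVB on 32-bit elements: only the first element is active.
print([hex(x) for x in rev_within_elements(
    [0, 0], [0x11223344, 0xAABBCCDD], [True, False], 32, 8)])
```

With byte swapping on 32-bit elements this acts as an endianness conversion for the active elements only.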

Operational information

SABD

Signed absolute difference (predicated)

Compute the absolute difference between signed integer values in active elements of the second source vector and
corresponding elements of the first source vector and destructively place the difference in the corresponding elements
of the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 1 0 0 0 0 0 Pg Zm Zdn
U

SABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer absdiff = Abs(element1 - element2);
        Elem[result, e, esize] = absdiff<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
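
A minimal Python sketch of the operation above, assuming vectors are lists of unsigned element bit patterns and the predicate is a list of booleans. The names `to_signed` and `sabd` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sabd(zdn, zm, pg, esize):
    """Model SABD: signed absolute difference, merging inactive elements."""
    mask = (1 << esize) - 1
    out = []
    for a, b, active in zip(zdn, zm, pg):
        if active:
            diff = abs(to_signed(a, esize) - to_signed(b, esize))
            out.append(diff & mask)  # absdiff<esize-1:0>
        else:
            out.append(a)
    return out


# -128 and 127 differ by 255, which still fits in the 8-bit result pattern.
print(sabd([0x80, 5, 9], [0x7F, 10, 1], [True, True, False], 8))
```

Because the difference is computed on unbounded integers before truncation, the largest possible difference does not overflow; it is simply stored as an unsigned-looking bit pattern.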

Operational information

SADDV

Signed add reduction to scalar

Signed add horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register.
Narrow elements are first sign-extended to 64 bits. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 0 0 0 1 Pg Zn Vd
U

SADDV <Dd>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);

Assembler Symbols

<Dd> Is the 64-bit name of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 RESERVED

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = Z[n];
integer sum = 0;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = SInt(Elem[operand, e, esize]);
        sum = sum + element;

V[d] = sum<63:0>;
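
The reduction above can be sketched in Python as follows, assuming a list-of-elements model for the source vector and a list of booleans for the predicate. The names `to_signed` and `saddv` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saddv(zn, pg, esize):
    """Model SADDV: sign-extend active elements, sum them, keep 64 bits."""
    total = 0
    for value, active in zip(zn, pg):
        if active:
            total += to_signed(value, esize)
    return total & 0xFFFFFFFFFFFFFFFF  # sum<63:0>


# Two active 8-bit elements holding -1 each; the inactive one is skipped.
print(hex(saddv([0xFF, 0xFF, 0x7F], [True, True, False], 8)))
```

Skipping inactive elements has the same effect as treating them as zero, as the description states.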

SCVTF

Signed integer convert to floating-point (predicated)

Convert to floating-point from the signed integer in each active element of the source vector, and place the results in
the corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
If the input and result types have a different size the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type the upper bits of each source element are ignored.
When the result is the smaller type the results are zero-extended to fill each destination element.
It has encodings from 7 classes: 16-bit to half-precision, 32-bit to half-precision, 32-bit to single-precision, 32-bit to
double-precision, 64-bit to half-precision, 64-bit to single-precision and 64-bit to double-precision

16-bit to half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 0 1 0 1 Pg Zn Zd
int_U

SCVTF <Zd>.H, <Pg>/M, <Zn>.H

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 16;
integer d_esize = 16;
boolean unsigned = FALSE;
FPRounding rounding = FPRoundingMode(FPCR[]);

32-bit to half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 0 1 0 1 Pg Zn Zd
int_U

SCVTF <Zd>.H, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 16;
boolean unsigned = FALSE;
FPRounding rounding = FPRoundingMode(FPCR[]);

32-bit to single-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 0 1 0 0 1 0 1 Pg Zn Zd
int_U

SCVTF <Zd>.S, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 32;
boolean unsigned = FALSE;
FPRounding rounding = FPRoundingMode(FPCR[]);

32-bit to double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 0 0 0 1 0 1 Pg Zn Zd
int_U

SCVTF <Zd>.D, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 64;
boolean unsigned = FALSE;
FPRounding rounding = FPRoundingMode(FPCR[]);

64-bit to half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 0 1 0 1 Pg Zn Zd
int_U

SCVTF <Zd>.H, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 16;
boolean unsigned = FALSE;
FPRounding rounding = FPRoundingMode(FPCR[]);

64-bit to single-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 0 0 1 0 1 Pg Zn Zd
int_U

SCVTF <Zd>.S, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 32;
boolean unsigned = FALSE;
FPRounding rounding = FPRoundingMode(FPCR[]);

64-bit to double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 1 0 1 0 1 Pg Zn Zd
int_U

SCVTF <Zd>.D, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 64;
boolean unsigned = FALSE;
FPRounding rounding = FPRoundingMode(FPCR[]);

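Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.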

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        bits(d_esize) fpval = FixedToFP(element<s_esize-1:0>, 0, unsigned, FPCR[], rounding);
        Elem[result, e, esize] = ZeroExtend(fpval);

Z[d] = result;

Operational information

SDIV

Signed divide (predicated)

Signed divide active elements of the first source vector by corresponding elements of the second source vector and
destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 0 0 0 0 Pg Zm Zdn
R U

SDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '0x' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer element2 = Int(Elem[operand2, e, esize], unsigned);
        integer quotient;
        if element2 == 0 then
            quotient = 0;
        else
            quotient = RoundTowardsZero(Real(element1) / Real(element2));
        Elem[result, e, esize] = quotient<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
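
As an illustration of the edge cases in the pseudocode above, the following Python sketch models the division on lists of unsigned element bit patterns. The names `to_signed` and `sdiv` and the list-based representation are assumptions of this sketch.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sdiv(zdn, zm, pg, esize):
    """Model SDIV: signed divide, rounding toward zero, merging inactive elements."""
    mask = (1 << esize) - 1
    out = []
    for a, b, active in zip(zdn, zm, pg):
        if not active:
            out.append(a)
            continue
        n, d = to_signed(a, esize), to_signed(b, esize)
        if d == 0:
            q = 0  # division by zero yields zero
        else:
            q = abs(n) // abs(d)  # magnitude, rounded toward zero
            if (n < 0) != (d < 0):
                q = -q
        out.append(q & mask)  # quotient<esize-1:0>
    return out


# -7 / 2, 7 / 0, most-negative / -1, and one inactive element.
print([hex(x) for x in sdiv(
    [0xFFFFFFF9, 7, 0x80000000, 42],
    [2, 0, 0xFFFFFFFF, 5],
    [True, True, True, False], 32)])
```

Python's `//` rounds toward negative infinity, so the sketch divides magnitudes and restores the sign to match RoundTowardsZero. Dividing the most negative value by -1 produces a quotient that does not fit, and truncation returns the most negative value again.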

Operational information

SDIVR

Signed reversed divide (predicated)

Signed reversed divide active elements of the second source vector by corresponding elements of the first source
vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements
in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 0 0 0 0 Pg Zm Zdn
R U

SDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '0x' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer element2 = Int(Elem[operand2, e, esize], unsigned);
        integer quotient;
        if element1 == 0 then
            quotient = 0;
        else
            quotient = RoundTowardsZero(Real(element2) / Real(element1));
        Elem[result, e, esize] = quotient<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

SDOT (indexed)

Signed integer indexed dot product

The signed integer indexed dot product instruction computes the dot product of a group of four signed 8-bit or 16-bit
integer values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four signed 8-bit
or 16-bit integer values in an indexed 32-bit or 64-bit element of the second source vector, and then destructively adds
the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
The groups within the second source vector are specified using an immediate index which selects the same group
position within each 128-bit vector segment. The index range is from 0 to one less than the number of groups per
128-bit segment, encoded in 1 to 2 bits depending on the size of the group. This instruction is unpredicated.
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> U

SDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer index = UInt(i2);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 0 Zn Zda
size<1>size<0> U

SDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer index = UInt(i1);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> For the 32-bit variant: is the name of the second source scalable vector register Z0-Z7, encoded in the
"Zm" field.
For the 64-bit variant: is the name of the second source scalable vector register Z0-Z15, encoded in the
"Zm" field.
<imm> For the 32-bit variant: is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit
vector segment, in the range 0 to 3, encoded in the "i2" field.
For the 64-bit variant: is the immediate index of a quadtuplet of four 16-bit elements within each 128-bit
vector segment, in the range 0 to 1, encoded in the "i1" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = SInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = SInt(Elem[operand2, 4 * s + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;

Z[da] = result;
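
The indexing scheme above, where one group of four values per 128-bit segment is reused for every accumulator in that segment, can be sketched in Python. This is an illustrative model: vectors are lists of unsigned bit patterns, and the names `to_signed` and `sdot_indexed` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sdot_indexed(zda, zn, zm, index, esize):
    """Model SDOT (indexed).

    zda holds esize-bit accumulators; zn and zm hold (esize/4)-bit values,
    four per accumulator. index selects a group within each 128-bit segment.
    """
    sub = esize // 4
    per_segment = 128 // esize
    if not 0 <= index < per_segment:
        raise ValueError("index out of range for this element size")
    mask = (1 << esize) - 1
    out = []
    for e, acc in enumerate(zda):
        s = e - (e % per_segment) + index  # indexed group in this segment
        for i in range(4):
            acc += to_signed(zn[4 * e + i], sub) * to_signed(zm[4 * s + i], sub)
        out.append(acc & mask)  # wrap to the accumulator width
    return out


zn = list(range(1, 17))                        # sixteen 8-bit values
zm = [1] * 4 + [2] * 4 + [0xFF] * 4 + [0] * 4  # four groups; 0xFF is -1
print(sdot_indexed([0, 0, 0, 0], zn, zm, 1, 32))
```

Here index 1 selects the group of four 2s, so every accumulator receives twice the sum of its own four source bytes.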

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SDOT (vectors)

Signed integer dot product

The signed integer dot product instruction computes the dot product of a group of four signed 8-bit or 16-bit integer
values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four signed 8-bit or
16-bit integer values in the corresponding 32-bit or 64-bit element of the second source vector, and then destructively
adds the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 0 0 0 0 0 Zn Zda
U

SDOT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

if !HaveSVE() then UNDEFINED;

if size == '0x' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<T> Is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.

<Tb> Is the size specifier, encoded in “size<0>”:

size<0> <Tb>
0 B
1 H

<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = SInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = SInt(Elem[operand2, 4 * e + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;

Z[da] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SEL (predicates)

Conditionally select elements from two predicates

Read active elements from the first source predicate and inactive elements from the second source predicate and
place in the corresponding elements of the destination predicate. Does not set the condition flags.
This instruction is used by the alias MOV (predicate, predicated, merging).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 0 0 Pm 0 1 Pg 1 Pn 1 Pd
S

SEL <Pd>.B, <Pg>, <Pn>.B, <Pm>.B

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer g = UInt(Pg);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);

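Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.
<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.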

Alias Conditions

Alias Is preferred when

MOV (predicate, predicated, merging) Pd == Pm

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
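bits(PL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        ElemP[result, e, esize] = ElemP[operand1, e, esize];
    else
        ElemP[result, e, esize] = ElemP[operand2, e, esize];

P[d] = result;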

SEL (vectors)

Conditionally select elements from two vectors

Read active elements from the first source vector and inactive elements from the second source vector and place in
the corresponding elements of the destination vector.
This instruction is used by the alias MOV (vector, predicated).
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 1 1 Pg Zn Zd

SEL <Zd>.<T>, <Pg>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register, encoded in the "Pg" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Alias Conditions

Alias Is preferred when

MOV (vector, predicated) Zd == Zm

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) operand2 = if AnyActiveElement(NOT(mask), esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = Elem[operand1, e, esize];
    else
        Elem[result, e, esize] = Elem[operand2, e, esize];

Z[d] = result;
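
A minimal Python sketch of the selection above, assuming vectors are lists of element values and the predicate is a list of booleans with one entry per element. The function name `sel` is hypothetical.

```python
def sel(pg, zn, zm):
    """Model SEL: take zn where the predicate is true, zm elsewhere."""
    return [n if active else m for active, n, m in zip(pg, zn, zm)]


zd = [10, 20, 30, 40]
zn = [1, 2, 3, 4]
# Passing the destination as the second source models the MOV alias case,
# where Zd == Zm and inactive elements keep their previous value.
print(sel([True, False, True, False], zn, zd))
```

When the second source is also the destination, the selection behaves as a merging move, which matches the alias condition listed above.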

SETFFR

Initialise the first-fault register to all true

Initialise the first-fault register (FFR) to all true prior to a sequence of first-fault or non-fault loads. This instruction is
unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

SETFFR

if !HaveSVE() then UNDEFINED;

Operation

CheckSVEEnabled();
FFR[] = Ones(PL);

SMAX (immediate)

Signed maximum with immediate (unpredicated)

Determine the signed maximum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a signed 8-bit value in the range -128 to
+127, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 1 0 imm8 Zdn
U

SMAX <Zdn>.<T>, <Zdn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
boolean unsigned = FALSE;
integer imm = Int(imm8, unsigned);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is the signed immediate operand, in the range -128 to 127, encoded in the "imm8" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    Elem[result, e, esize] = Max(element1, imm)<esize-1:0>;

Z[dn] = result;
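
As an illustration only, the unpredicated maximum above can be modeled in Python on a list of unsigned element bit patterns. The names `to_signed` and `smax_imm` are hypothetical.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def smax_imm(zdn, imm, esize):
    """Model SMAX (immediate): signed maximum of each element and imm."""
    if not -128 <= imm <= 127:
        raise ValueError("imm must be in the range -128 to 127")
    mask = (1 << esize) - 1
    return [max(to_signed(v, esize), imm) & mask for v in zdn]


# Clamp negative 8-bit values to zero, leaving non-negative values alone.
print(smax_imm([0xFF, 0x05, 0x7F], 0, 8))
```

With an immediate of zero this acts as a lower clamp at zero for signed data. The immediate is compared at its signed value regardless of element size.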

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SMAX (vectors)

Signed maximum vectors (predicated)

Determine the signed maximum of active elements of the second source vector and corresponding elements of the first
source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 0 0 0 0 Pg Zm Zdn
U

SMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer maximum = Max(element1, element2);
        Elem[result, e, esize] = maximum<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

SMAXV

Signed maximum reduction to scalar

Signed maximum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as the minimum signed integer for the element size.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 0 0 0 1 Pg Zn Vd
U

SMAXV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);
boolean unsigned = FALSE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
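11 D

<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.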

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
integer maximum = if unsigned then 0 else -(2^(esize-1));

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = Int(Elem[operand, e, esize], unsigned);
        maximum = Max(maximum, element);

V[d] = maximum<esize-1:0>;
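
The reduction above can be modelled informally in Python. This is a minimal sketch, not the architectural definition: the function name `smaxv` and the use of plain lists for the vector lanes and the governing predicate are assumptions made for illustration.

```python
def smaxv(elements, active, esize):
    """Model of SMAXV: signed maximum across the active lanes.

    elements: signed lane values (hypothetical list representation of Zn).
    active:   one boolean per lane, standing in for the predicate Pg.
    Returns the esize-bit pattern that would be written to the scalar.
    """
    # Inactive lanes behave as the minimum signed integer, so start there.
    maximum = -(1 << (esize - 1))
    for value, is_active in zip(elements, active):
        if is_active:
            maximum = max(maximum, value)
    # Equivalent of maximum<esize-1:0>: keep the low esize bits.
    return maximum & ((1 << esize) - 1)
```

With an all-false predicate the sketch returns the bit pattern of the minimum signed integer, which matches the initial value of `maximum` in the pseudocode.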

SMIN (immediate)

Signed minimum with immediate (unpredicated)

Determine the signed minimum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is a signed 8-bit value in the range -128 to
+127, inclusive. This instruction is unpredicated.
Bits   31-24     23-22  21-17  16     15-13  12-5  4-0
Field  00100101  size   10101  0 (U)  110    imm8  Zdn

SMIN <Zdn>.<T>, <Zdn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
boolean unsigned = FALSE;
integer imm = Int(imm8, unsigned);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is the signed immediate operand, in the range -128 to 127, encoded in the "imm8" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    Elem[result, e, esize] = Min(element1, imm)<esize-1:0>;

Z[dn] = result;
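
As a rough illustration of how the 8-bit immediate is sign-interpreted before the comparison, the following Python sketch works on raw lane bit patterns. The names `to_signed` and `smin_imm` are hypothetical helpers, and the list of integers is an assumed stand-in for the vector register.

```python
def to_signed(bits, width):
    """Interpret an unsigned bit pattern as a two's-complement integer."""
    return bits - (1 << width) if bits & (1 << (width - 1)) else bits


def smin_imm(lanes, imm8, esize):
    """Model of SMIN (immediate) on a list of esize-bit lane patterns."""
    imm = to_signed(imm8 & 0xFF, 8)        # Int(imm8, unsigned=FALSE)
    mask = (1 << esize) - 1
    result = []
    for lane in lanes:
        element = to_signed(lane & mask, esize)
        result.append(min(element, imm) & mask)  # Min(...)<esize-1:0>
    return result
```

For example, with byte lanes and `imm8 = 0xFE` (that is, -2), a lane holding 5 becomes `0xFE`, while a lane holding `0xF0` (-16) is left unchanged.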

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SMIN (vectors)

Signed minimum vectors (predicated)

Determine the signed minimum of active elements of the second source vector and corresponding elements of the first
source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
Bits   31-24     23-22  21-17  16     15-13  12-10  9-5  4-0
Field  00000100  size   00101  0 (U)  000    Pg     Zm   Zdn

SMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer minimum = Min(element1, element2);
        Elem[result, e, esize] = minimum<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
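
The merging behavior of the /M predicate can be sketched in a few lines of Python. The function name `smin_predicated` and the list-based operands are assumptions for illustration; lanes are taken to be already-signed integers.

```python
def smin_predicated(zdn, zm, pg):
    """Model of SMIN (vectors): active lanes take the signed minimum,
    inactive lanes keep the value of the first source (merging)."""
    return [min(a, b) if active else a
            for a, b, active in zip(zdn, zm, pg)]
```

In `smin_predicated([4, -1, 9], [2, 5, -3], [True, True, False])` the last lane is inactive, so it keeps 9 even though the second source holds a smaller value.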

Operational information

SMINV

Signed minimum reduction to scalar

Determine the signed minimum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar
destination register. Inactive elements in the source vector are treated as the maximum signed integer for the element size.
Bits   31-24     23-22  21-17  16     15-13  12-10  9-5  4-0
Field  00000100  size   00101  0 (U)  001    Pg     Zn   Vd

SMINV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);
boolean unsigned = FALSE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
integer minimum = if unsigned then (2^esize - 1) else (2^(esize-1) - 1);

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = Int(Elem[operand, e, esize], unsigned);
        minimum = Min(minimum, element);

V[d] = minimum<esize-1:0>;

SMMLA

Signed integer matrix multiply-accumulate

The signed integer matrix multiply-accumulate instruction multiplies the 2×8 matrix of signed 8-bit integer values
held in each 128-bit segment of the first source vector by the 8×2 matrix of signed 8-bit integer values in the
corresponding segment of the second source vector. The resulting 2×2 widened 32-bit integer matrix product is then
destructively added to the 32-bit integer matrix accumulator held in the corresponding segment of the addend and
destination vector. This is equivalent to performing an 8-way dot product per destination element.
This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.

SVE
(FEAT_I8MM)

Bits   31-24     23-22     21  20-16  15-10   9-5  4-0
Field  01000101  00 (uns)  0   Zm     100110  Zn   Zda

SMMLA <Zda>.S, <Zn>.B, <Zm>.B

if !HaveSVE() || !HaveInt8MatMulExt() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_unsigned = FALSE;
boolean op2_unsigned = FALSE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer segments = VL DIV 128;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result = Zeros();
bits(128) op1, op2;
bits(128) res, addend;

for s = 0 to segments-1
    op1 = Elem[operand1, s, 128];
    op2 = Elem[operand2, s, 128];
    addend = Elem[operand3, s, 128];
    res = MatMulAdd(addend, op1, op2, op1_unsigned, op2_unsigned);
    Elem[result, s, 128] = res;

Z[da] = result;
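
MatMulAdd is not defined in this section, so the following Python sketch of one 128-bit segment rests on stated assumptions: the first operand is taken to hold two rows of eight signed bytes, the second operand two columns of eight signed bytes, and the accumulator four 32-bit values in row-major order. The function name `smmla_segment` is hypothetical.

```python
def smmla_segment(acc, a, b):
    """One 128-bit segment of a signed 8-bit matrix multiply-accumulate.

    acc: 4 signed 32-bit accumulators (2x2, row-major, assumed layout).
    a:   16 signed bytes, rows 0 and 1 of the 2x8 matrix.
    b:   16 signed bytes, columns 0 and 1 of the 8x2 matrix.
    """
    out = []
    for i in range(2):
        for j in range(2):
            # Each destination element is an 8-way dot product.
            dot = sum(a[8 * i + k] * b[8 * j + k] for k in range(8))
            total = (acc[2 * i + j] + dot) & 0xFFFFFFFF  # wrap to 32 bits
            if total & 0x80000000:                       # back to signed
                total -= 1 << 32
            out.append(total)
    return out
```

Under these assumptions, a full vector is processed by applying the same function independently to each 128-bit segment, as the loop over `segments` in the pseudocode does.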

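Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.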

SMULH

Signed multiply returning high half (predicated)

Widening multiply signed integer values in active elements of the first source vector by corresponding elements of the
second source vector and destructively place the high half of the result in the corresponding elements of the first
source vector. Inactive elements in the destination vector register remain unmodified.
Bits   31-24     23-22  21-18  17     16     15-13  12-10  9-5  4-0
Field  00000100  size   0100   1 (H)  0 (U)  000    Pg     Zm   Zdn

SMULH <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer product = (element1 * element2) >> esize;
        Elem[result, e, esize] = product<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
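
The per-element arithmetic for an active lane can be checked with a short Python sketch. The function name `smulh` is an assumption; it models a single element only and leaves out predication.

```python
def smulh(a, b, esize):
    """High half of the signed product of two esize-bit integers.

    a, b are signed Python integers within the esize-bit range.
    Returns the esize-bit pattern of the high half.
    """
    # Python integers are unbounded and >> floors toward negative
    # infinity, so this behaves as an arithmetic shift of the
    # double-width product.
    product = (a * b) >> esize
    return product & ((1 << esize) - 1)
```

For byte elements, `smulh(-128, 127, 8)` yields `0xC0`: the 16-bit product -16256 is `0xC080`, and its upper byte is `0xC0`.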

Operational information

SPLICE

Splice two vectors under predicate control

Copy the first active to last active elements (inclusive) from the first source vector to the lowest-numbered elements of
the result. Then set any remaining elements of the result to a copy of the lowest-numbered elements from the second
source vector. The result is placed destructively in the first source vector.
Bits   31-24     23-22  21-13      12-10  9-5  4-0
Field  00000101  size   101100100  Pg     Zm   Zdn

SPLICE <Zdn>.<T>, <Pg>, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = Z[m];
bits(VL) result;
integer x = 0;
boolean active = FALSE;
integer lastnum = LastActiveElement(mask, esize);

if lastnum >= 0 then
    for e = 0 to lastnum
        active = active || ElemP[mask, e, esize] == '1';
        if active then
            Elem[result, x, esize] = Elem[operand1, e, esize];
            x = x + 1;

elements = (elements - x) - 1;
for e = 0 to elements
    Elem[result, x, esize] = Elem[operand2, e, esize];
    x = x + 1;

Z[dn] = result;
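
The splice can be expressed compactly with list slicing. The following Python sketch is an informal model; the name `splice` and the list-based vector and predicate operands are assumptions for illustration.

```python
def splice(zdn, zm, pg):
    """Model of SPLICE on equal-length lists of lanes.

    Takes the lanes of zdn from the first active to the last active
    element (inclusive, including inactive lanes in between), then
    fills the rest of the result from the lowest lanes of zm.
    """
    active = [i for i, flag in enumerate(pg) if flag]
    head = zdn[active[0]:active[-1] + 1] if active else []
    return head + zm[:len(zdn) - len(head)]
```

When no predicate element is active the head is empty, so the result is simply a copy of the second source, which agrees with the pseudocode when `lastnum` is negative.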

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SQADD (immediate)

Signed saturating add immediate (unpredicated)

Signed saturating add of an unsigned immediate to each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
signed integer range -2^(N-1) to (2^(N-1))-1. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
Bits   31-24     23-22  21-17  16     15-14  13  12-5  4-0
Field  00100101  size   10010  0 (U)  11     sh  imm8  Zdn

SQADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

if !HaveSVE() then UNDEFINED;

if size:sh == '001' then UNDEFINED;
integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer imm = UInt(imm8);
if sh == '1' then imm = imm << 8;
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + imm, esize, unsigned);

Z[dn] = result;
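
To make the immediate decoding and the saturation concrete, here is a minimal Python sketch. The function name `sqadd_imm` is hypothetical, lanes are assumed to be signed Python integers, and the UNDEFINED case for byte elements with a shift is modelled as a `ValueError`.

```python
def sqadd_imm(lanes, imm8, sh, esize):
    """Model of SQADD (immediate): add an unsigned immediate to every
    lane and clamp to the signed esize-bit range."""
    if esize == 8 and sh == 1:
        # Corresponds to size:sh == '001' being UNDEFINED.
        raise ValueError("LSL #8 is not available for byte elements")
    imm = (imm8 & 0xFF) << 8 if sh else (imm8 & 0xFF)
    lo = -(1 << (esize - 1))
    hi = (1 << (esize - 1)) - 1
    return [max(lo, min(hi, lane + imm)) for lane in lanes]
```

Because the immediate is never negative, only the upper bound can be reached in practice; the lower clamp is kept so the helper mirrors a general signed saturation.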

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SQADD (vectors)

Signed saturating add vectors (unpredicated)

Signed saturating add all elements of the second source vector to corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. Each result element is saturated to the
N-bit element's signed integer range -2^(N-1) to (2^(N-1))-1. This instruction is unpredicated.
Bits   31-24     23-22  21  20-16  15-11  10     9-5  4-0
Field  00000100  size   1   Zm     00010  0 (U)  Zn   Zd

SQADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
boolean unsigned = FALSE;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + element2, esize, unsigned);

Z[d] = result;

SQDECB

Signed saturating decrement scalar by multiple of 8-bit predicate constraint element count

Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

Bits   31-24     23-22      21  20      19-16  15-12  11     10     9-5      4-0
Field  00000100  00 (size)  1   0 (sf)  imm4   1111   1 (D)  0 (U)  pattern  Rdn

SQDECB <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 32;

64-bit

Bits   31-24     23-22      21  20      19-16  15-12  11     10     9-5      4-0
Field  00000100  00 (size)  1   1 (sf)  imm4   1111   1 (D)  0 (U)  pattern  Rdn

SQDECB <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 64;

Assembler Symbols

<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);
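
The difference between the 32-bit and 64-bit forms is easiest to see in a small model. In this Python sketch the element count is passed in directly as `count`, because DecodePredCount is not defined in this section; the function name `sqdec_scalar` and its parameters are assumptions for illustration.

```python
def sqdec_scalar(xdn, count, imm, ssize):
    """Model of a scalar signed saturating decrement.

    xdn:   current 64-bit register value, as an unsigned integer.
    count: element count implied by the pattern (supplied by caller).
    imm:   multiplier in the range 1 to 16.
    ssize: 32 for the <Wdn> source form, 64 for the <Xdn>-only form.
    Returns the new 64-bit register value as an unsigned integer.
    """
    value = xdn & ((1 << ssize) - 1)      # read only the low ssize bits
    if value >> (ssize - 1):              # reinterpret as signed
        value -= 1 << ssize
    lo = -(1 << (ssize - 1))
    hi = (1 << (ssize - 1)) - 1
    result = max(lo, min(hi, value - count * imm))
    return result & 0xFFFFFFFFFFFFFFFF    # sign-extend to 64 bits
```

In the 32-bit form the upper half of the source register is ignored, and a result that saturates at -2^31 is written back as `0xFFFFFFFF80000000`.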

SQDECD (scalar)

Signed saturating decrement scalar by multiple of 64-bit predicate constraint element count

Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

Bits   31-24     23-22      21  20      19-16  15-12  11     10     9-5      4-0
Field  00000100  11 (size)  1   0 (sf)  imm4   1111   1 (D)  0 (U)  pattern  Rdn

SQDECD <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 32;

64-bit

Bits   31-24     23-22      21  20      19-16  15-12  11     10     9-5      4-0
Field  00000100  11 (size)  1   1 (sf)  imm4   1111   1 (D)  0 (U)  pattern  Rdn

SQDECD <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 64;

Assembler Symbols

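<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”: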

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

SQDECD (vector)

Signed saturating decrement vector by multiple of 64-bit predicate constraint element count

Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 64-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
Bits   31-24     23-22      21-20  19-16  15-12  11     10     9-5      4-0
Field  00000100  11 (size)  10     imm4   1100   1 (D)  0 (U)  pattern  Zdn

SQDECD <Zdn>.D{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);

Z[dn] = result;
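
The named predicate constraints and their effect on the vector form can be sketched in Python. DecodePredCount is not defined in this section, so `pred_count` below is a hypothetical reading of the constraint list that works on pattern names rather than encodings; the set of fixed `VLn` names it accepts is likewise an assumption. `sqdecd_vector` takes a list of signed 64-bit lanes.

```python
FIXED_COUNTS = {1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256}  # assumed VLn set


def pred_count(pattern, elements):
    """Element count implied by a named constraint for a given lane count."""
    if pattern == "ALL":
        return elements
    if pattern == "POW2":   # largest power of two not exceeding elements
        return 1 << (elements.bit_length() - 1) if elements else 0
    if pattern == "MUL4":   # largest multiple of four
        return elements - elements % 4
    if pattern == "MUL3":   # largest multiple of three
        return elements - elements % 3
    if pattern.startswith("VL") and pattern[2:].isdigit():
        n = int(pattern[2:])
        return n if n in FIXED_COUNTS and n <= elements else 0
    return 0                # unrecognized constraint: zero element count


def sqdecd_vector(lanes, pattern="ALL", imm=1):
    """Decrement every 64-bit lane by count * imm with signed saturation."""
    step = pred_count(pattern, len(lanes)) * imm
    lo, hi = -(1 << 63), (1 << 63) - 1
    return [max(lo, min(hi, lane - step)) for lane in lanes]
```

Every lane is decremented by the same amount, since the count depends only on the constraint and the number of lanes, not on the lane contents.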

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SQDECH (scalar)

Signed saturating decrement scalar by multiple of 16-bit predicate constraint element count

Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

Bits   31-24     23-22      21  20      19-16  15-12  11     10     9-5      4-0
Field  00000100  01 (size)  1   0 (sf)  imm4   1111   1 (D)  0 (U)  pattern  Rdn

SQDECH <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 32;

64-bit

Bits   31-24     23-22      21  20      19-16  15-12  11     10     9-5      4-0
Field  00000100  01 (size)  1   1 (sf)  imm4   1111   1 (D)  0 (U)  pattern  Rdn

SQDECH <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 64;

Assembler Symbols

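<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”: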

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

SQDECH (vector)

Signed saturating decrement vector by multiple of 16-bit predicate constraint element count

Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 16-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
Bits   31-24     23-22      21-20  19-16  15-12  11     10     9-5      4-0
Field  00000100  01 (size)  10     imm4   1100   1 (D)  0 (U)  pattern  Zdn

SQDECH <Zdn>.H{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SQDECP (scalar)

Signed saturating decrement scalar by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to decrement the scalar
destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated
result is then sign-extended to 64 bits.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

Bits   31-24     23-22  21-18  17     16     15-11  10      9  8-5  4-0
Field  00100101  size   1010   1 (D)  0 (U)  10001  0 (sf)  0  Pm   Rdn

SQDECP <Xdn>, <Pm>.<T>, <Wdn>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Rdn);
boolean unsigned = FALSE;
integer ssize = 32;

64-bit

Bits   31-24     23-22  21-18  17     16     15-11  10      9  8-5  4-0
Field  00100101  size   1010   1 (D)  0 (U)  10001  1 (sf)  0  Pm   Rdn

SQDECP <Xdn>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Rdn);
boolean unsigned = FALSE;
integer ssize = 64;

Assembler Symbols

<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(ssize) operand1 = X[dn];
bits(PL) operand2 = P[m];
bits(ssize) result;
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

integer element = Int(operand1, unsigned);

(result, -) = SatQ(element - count, ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);
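
The counting step depends on how predicate bits map to elements. The Python sketch below assumes that the predicate holds one bit per vector byte and that an element is governed by the lowest of its predicate bits; ElemP is not defined in this section, so that mapping is stated here as an assumption. The function name `sqdecp_scalar` and the `vl` parameter are hypothetical.

```python
def sqdecp_scalar(xdn, pred, vl, esize, ssize):
    """Model of SQDECP (scalar).

    xdn:   current 64-bit register value, as an unsigned integer.
    pred:  predicate as an integer bit mask, one bit per vector byte.
    vl:    vector length in bits (stand-in for VL).
    esize: element size in bits (8, 16, 32 or 64).
    ssize: 32 for the <Wdn> source form, 64 for the <Xdn>-only form.
    """
    stride = esize // 8                        # predicate bits per element
    count = sum((pred >> (e * stride)) & 1 for e in range(vl // esize))

    value = xdn & ((1 << ssize) - 1)           # read the low ssize bits
    if value >> (ssize - 1):                   # reinterpret as signed
        value -= 1 << ssize

    lo = -(1 << (ssize - 1))
    hi = (1 << (ssize - 1)) - 1
    result = max(lo, min(hi, value - count))   # signed saturation
    return result & 0xFFFFFFFFFFFFFFFF         # sign-extend to 64 bits
```

With 32-bit elements only predicate bits 0, 4, 8 and 12 of a 128-bit vector are examined, so a stray bit at position 1 does not affect the count.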

SQDECP (vector)

Signed saturating decrement vector by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to decrement all destination
vector elements. The results are saturated to the element signed integer range.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
Bits   31-24     23-22  21-18  17     16     15-9     8-5  4-0
Field  00100101  size   1010   1 (D)  0 (U)  1000000  Pm   Zdn

SQDECP <Zdn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Zdn);
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

for e = 0 to elements-1
    integer element = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element - count, esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SQDECW (scalar)

Signed saturating decrement scalar by multiple of 32-bit predicate constraint element count

Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
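The constraint rules listed above can be sketched in Python as follows. This is an illustrative model, not the architecture's DecodePredCount pseudocode: the function name, the use of constraint names rather than 5-bit encodings, and the rule that a fixed count larger than the available element count yields zero are assumptions based on the shared pseudocode, which is not reproduced in this section.

```python
# Fixed-count constraint names and the element counts they request.
FIXED_COUNTS = {f"VL{n}": n for n in (1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256)}


def floor_pow2(n):
    """Largest power of two not exceeding n (0 when n is 0)."""
    return 0 if n <= 0 else 1 << (n.bit_length() - 1)


def pred_count(pattern, esize, vl):
    """Element count implied by a named constraint for a vector of vl bits."""
    elements = vl // esize
    if pattern == "POW2":
        return floor_pow2(elements)
    if pattern in FIXED_COUNTS:
        n = FIXED_COUNTS[pattern]
        # Assumed rule: a fixed count that does not fit gives zero elements.
        return n if elements >= n else 0
    if pattern == "MUL4":
        return elements - elements % 4
    if pattern == "MUL3":
        return elements - elements % 3
    if pattern == "ALL":
        return elements
    # Unspecified constraint: zero element count, no exception.
    return 0


# A 384-bit vector holds twelve 32-bit elements.
for name in ("POW2", "VL8", "VL16", "MUL3", "MUL4", "ALL"):
    print(name, pred_count(name, 32, 384))
```

For a 384-bit vector and 32-bit elements the sketch yields 8 for POW2, 8 for VL8, 0 for VL16, and 12 for MUL3, MUL4 and ALL.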
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U

SQDECW <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 1 0 pattern Rdn
size<1>size<0> sf D U

SQDECW <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 64;
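As an illustration of the two encoding classes, the following Python sketch extracts the fields of a SQDECW (scalar) instruction word using the bit positions shown in the encoding diagrams above. The function name and the returned dictionary are hypothetical, and the sketch does not model the HaveSVE() check.

```python
# Bits 31-21 and 15-10 are fixed for SQDECW (scalar); bit 20 is sf.
FIXED_MASK = 0xFFE0FC00
FIXED_VALUE = 0x04A0F800


def decode_sqdecw_scalar(word):
    """Return the decoded fields, or None if word is not SQDECW (scalar)."""
    if word & FIXED_MASK != FIXED_VALUE:
        return None
    sf = (word >> 20) & 1
    return {
        "ssize": 64 if sf else 32,          # 32-bit or 64-bit class
        "imm": ((word >> 16) & 0xF) + 1,    # UInt(imm4) + 1, so 1 to 16
        "pattern": (word >> 5) & 0x1F,      # 5-bit constraint encoding
        "dn": word & 0x1F,                  # Rdn register number
        "esize": 32,
    }


print(decode_sqdecw_scalar(0x04A0FBE0))
print(decode_sqdecw_scalar(0x04BFF805))
```

The first word decodes as the 32-bit class with pattern '11111', multiplier 1 and register 0; the second as the 64-bit class with pattern '00000', multiplier 16 and register 5.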

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);
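The scalar operation can be modeled with the short Python sketch below. It is an illustration only: the function names are hypothetical, the register is represented as an unsigned 64-bit Python integer, and the element count is passed in directly instead of being derived from the pattern.

```python
MASK64 = (1 << 64) - 1


def sat_signed(value, bits):
    """Clamp value to the signed range of a bits-wide integer."""
    return max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, value))


def sqdecw_scalar(x, count, imm, ssize):
    """Return the new 64-bit register value after SQDECW (scalar).

    x:     current 64-bit register contents (unsigned representation).
    count: element count implied by the constraint.
    imm:   multiplier, 1 to 16.
    ssize: 32 for the 32-bit class, 64 for the 64-bit class.
    """
    operand = x & ((1 << ssize) - 1)        # X[dn] read as bits(ssize)
    if operand >> (ssize - 1):              # Int(operand, unsigned = FALSE)
        operand -= 1 << ssize
    result = sat_signed(operand - count * imm, ssize)
    # Masking a negative Python int to 64 bits gives its two's complement
    # form, which is the sign extension of the ssize-bit result.
    return result & MASK64


print(hex(sqdecw_scalar(0x80000002, 4, 1, 32)))  # prints 0xffffffff80000000
print(hex(sqdecw_scalar(0x80000002, 4, 1, 64)))  # prints 0x7ffffffe
```

The same register value saturates in the 32-bit class, where it is read as a negative number close to the minimum, but not in the 64-bit class, where it is a positive number.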

SQDECW (vector)

Signed saturating decrement vector by multiple of 32-bit predicate constraint element count

Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 32-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 1 0 pattern Zdn
size<1>size<0> D U

SQDECW <Zdn>.S{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SQINCB

Signed saturating increment scalar by multiple of 8-bit predicate constraint element count

Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U

SQINCB <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U

SQINCB <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 + (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

SQINCD (scalar)

Signed saturating increment scalar by multiple of 64-bit predicate constraint element count

Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U

SQINCD <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U

SQINCD <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 + (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

SQINCD (vector)

Signed saturating increment vector by multiple of 64-bit predicate constraint element count

Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 64-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D U

SQINCD <Zdn>.D{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);

Z[dn] = result;
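The per-element loop can be illustrated with a Python sketch that keeps the whole vector register in a single integer, so that the Elem[] accesses appear as explicit shifts and masks. The function names and the integer representation of the register are assumptions for illustration; the element count is supplied by the caller.

```python
def sat_signed(value, bits):
    """Clamp value to the signed range of a bits-wide integer."""
    return max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, value))


def sqincd_vector(z, count, imm, vl):
    """Illustrative model of SQINCD (vector) on a vl-bit register value z."""
    esize = 64
    mask = (1 << esize) - 1
    result = 0
    for e in range(vl // esize):
        raw = (z >> (e * esize)) & mask                     # Elem[operand1, e, esize]
        signed = raw - (1 << esize) if raw >> (esize - 1) else raw
        sat = sat_signed(signed + count * imm, esize)
        result |= (sat & mask) << (e * esize)               # Elem[result, e, esize]
    return result


# 128-bit vector: element 0 is two below the maximum, element 1 holds 1.
z = (1 << 64) | 0x7FFFFFFFFFFFFFFE
print(hex(sqincd_vector(z, count=2, imm=1, vl=128)))
# prints 0x37fffffffffffffff
```

Element 0 saturates at the largest 64-bit signed value, and element 1 becomes 3.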

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.
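The three MOVPRFX requirements above can be expressed as a simple check. The sketch below is a hypothetical model for illustration: the Insn record, its field names, and the convention that other_sources excludes the destructive source and destination operand are all assumptions, not part of the architecture or of any assembler.

```python
from dataclasses import dataclass


@dataclass
class Insn:
    mnemonic: str
    dest: str                   # destination register name, for example "z0"
    other_sources: tuple = ()   # source registers other than the destination
    predicated: bool = False


def movprfx_pair_is_predictable(prefix, insn):
    """True when the prefix meets all three requirements for insn."""
    return (
        prefix.mnemonic == "MOVPRFX"
        and not prefix.predicated               # must be unpredicated
        and prefix.dest == insn.dest            # same destination register
        and insn.dest not in insn.other_sources # no overlap with other sources
    )


prefix = Insn("MOVPRFX", "z0", ("z1",))
print(movprfx_pair_is_predictable(prefix, Insn("SQINCD", "z0")))  # prints True
print(movprfx_pair_is_predictable(prefix, Insn("SQINCD", "z2")))  # prints False
```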

SQINCH (scalar)

Signed saturating increment scalar by multiple of 16-bit predicate constraint element count

Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U

SQINCH <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U

SQINCH <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 + (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

SQINCH (vector)

Signed saturating increment vector by multiple of 16-bit predicate constraint element count

Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 16-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D U

SQINCH <Zdn>.H{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SQINCP (scalar)

Signed saturating increment scalar by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to increment the scalar
destination. The result is saturated to the source general-purpose register's signed integer range. A 32-bit saturated
result is then sign-extended to 64 bits.
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 0 1 0 0 Pm Rdn
D U sf

SQINCP <Xdn>, <Pm>.<T>, <Wdn>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Rdn);
boolean unsigned = FALSE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 0 1 1 0 Pm Rdn
D U sf

SQINCP <Xdn>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Rdn);
boolean unsigned = FALSE;
integer ssize = 64;

Assembler Symbols

<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(ssize) operand1 = X[dn];
bits(PL) operand2 = P[m];
bits(ssize) result;
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

integer element = Int(operand1, unsigned);

(result, -) = SatQ(element + count, ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);
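The following Python sketch illustrates the predicate count and the scalar update. It assumes the usual SVE predicate layout, in which a predicate register holds one bit per vector byte and the flag for element e is the bit at position e * (esize DIV 8); that layout comes from the shared ElemP pseudocode, which is not reproduced in this section. The function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def sat_signed(value, bits):
    """Clamp value to the signed range of a bits-wide integer."""
    return max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, value))


def count_true(pred_bits, vl, esize):
    """Count true elements of a predicate held as an integer of vl/8 bits."""
    stride = esize // 8  # assumed: one predicate bit per vector byte
    return sum((pred_bits >> (e * stride)) & 1 for e in range(vl // esize))


def sqincp_scalar(x, pred_bits, vl, esize, ssize):
    """Return the new 64-bit register value after SQINCP (scalar)."""
    operand = x & ((1 << ssize) - 1)
    if operand >> (ssize - 1):
        operand -= 1 << ssize
    result = sat_signed(operand + count_true(pred_bits, vl, esize), ssize)
    return result & MASK64  # sign extension of the ssize-bit result


# 128-bit vector, 32-bit elements: elements 0, 2 and 3 are true.
# Bit 1 is also set but does not correspond to the start of an element.
pred = 0b0001_0001_0000_0011
print(count_true(pred, 128, 32))                           # prints 3
print(hex(sqincp_scalar(0x7FFFFFFE, pred, 128, 32, 32)))   # prints 0x7fffffff
print(hex(sqincp_scalar(0x7FFFFFFE, pred, 128, 32, 64)))   # prints 0x80000001
```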

SQINCP (vector)

Signed saturating increment vector by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to increment all destination
vector elements. The results are saturated to the element signed integer range.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 0 1 0 0 0 0 0 0 Pm Zdn
D U

SQINCP <Zdn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Zdn);
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

for e = 0 to elements-1
    integer element = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element + count, esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SQINCW (scalar)

Signed saturating increment scalar by multiple of 32-bit predicate constraint element count

Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the source general-purpose register's signed integer range. A 32-bit saturated result is then sign-
extended to 64 bits.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U

SQINCW <Xdn>, <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 0 0 pattern Rdn
size<1>size<0> sf D U

SQINCW <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 + (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

SQINCW (vector)

Signed saturating increment vector by multiple of 32-bit predicate constraint element count

Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 32-bit signed integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 0 0 pattern Zdn
size<1>size<0> D U

SQINCW <Zdn>.S{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SQSUB (immediate)

Signed saturating subtract immediate (unpredicated)

Signed saturating subtract of an unsigned immediate from each element of the source vector, and destructively place
the results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
signed integer range -2^(N-1) to 2^(N-1) - 1. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
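The immediate rules above can be sketched in Python as a pair of helper functions. They are hypothetical illustrations of how an assembler might map a plain value onto the "imm8" and "sh" fields and how the decode recovers it; they are not taken from any real toolchain. A plain value of 0 is always mapped to the unshifted form, so the "#0, LSL #8" form can only be requested explicitly.

```python
def encode_sqsub_imm(value, esize):
    """Map an immediate value to (imm8, sh) for the given element size."""
    if 0 <= value <= 255:
        return value, 0
    # Shifted form: a positive multiple of 256, only for 16-bit or wider elements.
    if esize >= 16 and value % 256 == 0 and 256 <= value <= 65280:
        return value >> 8, 1
    raise ValueError(f"immediate {value} is not encodable for {esize}-bit elements")


def decode_sqsub_imm(imm8, sh, size):
    """Recover the immediate from the encoded fields (size is the 2-bit field value)."""
    if size == 0 and sh == 1:
        # size:sh == '001' is UNDEFINED: no shifted form for byte elements.
        raise ValueError("UNDEFINED encoding")
    return imm8 << 8 if sh else imm8


print(encode_sqsub_imm(255, 8))      # prints (255, 0)
print(encode_sqsub_imm(256, 16))     # prints (1, 1)
print(decode_sqsub_imm(255, 1, 1))   # prints 65280
```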
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 1 0 1 1 sh imm8 Zdn
U

SQSUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

if !HaveSVE() then UNDEFINED;

if size:sh == '001' then UNDEFINED;
integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer imm = UInt(imm8);
if sh == '1' then imm = imm << 8;
boolean unsigned = FALSE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - imm, esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

SQSUB (vectors)

Signed saturating subtract vectors (unpredicated)

Signed saturating subtract all elements of the second source vector from corresponding elements of the first source
vector and place the results in the corresponding elements of the destination vector. Each result element is saturated
to the N-bit element's signed integer range -2^(N-1) to 2^(N-1) - 1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 1 0 Zn Zd
U

SQSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
boolean unsigned = FALSE;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
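bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - element2, esize, unsigned);
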
Z[d] = result;
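The element-wise behavior described for this instruction can be illustrated with a minimal Python sketch. The function names and the list representation of the source vectors are assumptions for illustration only.

```python
def sat_signed(value, bits):
    """Clamp value to the signed range -2^(bits-1) to 2^(bits-1) - 1."""
    return max(-(1 << (bits - 1)), min((1 << (bits - 1)) - 1, value))


def sqsub_vectors(zn, zm, esize):
    """Subtract each element of zm from the matching element of zn, with saturation."""
    return [sat_signed(a - b, esize) for a, b in zip(zn, zm)]


print(sqsub_vectors([-128, 127, 10], [1, -1, 3], 8))
# prints [-128, 127, 7]
```

The first two byte elements would leave the range -128 to 127 and are clamped; the third is an ordinary subtraction.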

ST1B (scalar plus immediate)

Contiguous store bytes from vector (immediate index)

Contiguous store of bytes from elements of a vector register to the memory address generated by a 64-bit scalar base
and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements are not written to memory.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 0 size 0 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

ST1B { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8 << UInt(size);
integer msize = 8;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
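
The address arithmetic of the store loop can be followed with a small software model. This is a sketch only. It assumes a flat bytearray standing in for memory, a Python list of booleans standing in for the predicate, and ignores alignment checks, MTE tag checking and 64-bit address wrap-around. The function name st1b_scalar_plus_imm is hypothetical.

```python
def st1b_scalar_plus_imm(mem, base, imm, z, pred, esize, vl):
    """Model the store loop of ST1B (scalar plus immediate).

    mem: bytearray standing in for memory; base: value of Xn or SP;
    imm: signed immediate in the range -8 to 7; z: element values of Zt;
    pred: one boolean per element; esize: element size in bits;
    vl: vector length in bits.
    """
    if not -8 <= imm <= 7:
        raise ValueError("imm must be in the range -8 to 7")
    elements = vl // esize
    mbytes = 1  # msize is 8 bits for ST1B
    for e in range(elements):
        if pred[e]:
            # The immediate advances by whole vectors of in-memory data,
            # regardless of how many elements are active.
            eoff = imm * elements + e
            addr = base + eoff * mbytes
            mem[addr] = z[e] & 0xFF  # only the low byte of the element is stored
    return mem
```

For a 128-bit vector of 32-bit elements, imm = 1 selects the four bytes immediately after the first vector's worth of data, so elements 0 to 3 map to base+4 to base+7, and an inactive element leaves its byte untouched.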

ST1B (scalar plus scalar)

Contiguous store bytes from vector (scalar index)

Contiguous store of bytes from elements of a vector register to the memory address generated by a 64-bit scalar base
and scalar index which is added to the base address. After each element access the index value is incremented, but the
index register is not updated. Inactive elements are not written to memory.
31-23      22-21  20-16  15-13  12-10  9-5  4-0
111001000  size   Rm     010    Pg     Rn   Zt

ST1B { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8 << UInt(size);
integer msize = 8;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

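if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;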
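
The scalar-index address sequence can be sketched in the same way. The model below assumes that base and offset hold the values of the Xn|SP and Xm registers, uses a dict keyed by address as a stand-in for memory, and wraps addresses to 64 bits. The names are illustrative and none of this is Arm-provided code.

```python
MASK64 = (1 << 64) - 1

def st1b_scalar_plus_scalar(mem, base, index, z, pred):
    """Model the store loop of ST1B (scalar plus scalar).

    mem: dict mapping address to byte value; base: value of Xn or SP;
    index: unsigned value of Xm; z: element values of Zt;
    pred: one boolean per element.
    """
    mbytes = 1  # msize is 8 bits for ST1B
    for e, active in enumerate(pred):
        if active:
            # The index advances by one per element; Xm itself is not written back.
            addr = (base + (index + e) * mbytes) & MASK64
            mem[addr] = z[e] & 0xFF
    return mem
```

With base = 0x1000 and index = 3, element 0 is stored at 0x1003, element 1 at 0x1004, and so on; inactive elements create no entry.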

ST1B (scalar plus vector)

Scatter store bytes from a vector (vector index)

Scatter store of bytes from the active elements of a vector register to the memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally sign or zero-extended from 32 to 64 bits. Inactive
elements are not written to memory.
It has encodings from 3 classes: 32-bit unpacked unscaled offset, 32-bit unscaled offset and 64-bit unscaled offset.

32-bit unpacked unscaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  00 (msz)  00     Zm     1   xs  0   Pg     Rn   Zt

ST1B { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
integer offs_size = 32;
boolean offs_unsigned = xs == '0';
integer scale = 0;

32-bit unscaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  00 (msz)  10     Zm     1   xs  0   Pg     Rn   Zt

ST1B { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
integer offs_size = 32;
boolean offs_unsigned = xs == '0';
integer scale = 0;

64-bit unscaled offset

31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  00 (msz)  00     Zm     101    Pg     Rn   Zt

ST1B { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
integer offs_size = 64;
boolean offs_unsigned = TRUE;
integer scale = 0;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

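if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = Z[m];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer off = Int(Elem[offset, e, esize]<offs_size-1:0>, offs_unsigned);
        bits(64) addr = base + (off << scale);
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;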
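
The effect of the UXTW and SXTW modifiers on the generated addresses is easiest to see with concrete values. The following sketch models the scatter loop under stated assumptions: a dict stands in for memory, offsets holds the raw element values of Zm, and addresses wrap to 64 bits. The helper names extend_offset and st1b_scatter are hypothetical.

```python
MASK64 = (1 << 64) - 1

def extend_offset(raw, offs_size, unsigned):
    """Mirror Int(x<offs_size-1:0>, unsigned): keep the low bits, then extend."""
    value = raw & ((1 << offs_size) - 1)
    if not unsigned and (value >> (offs_size - 1)) & 1:
        value -= 1 << offs_size  # sign bit set: interpret as negative
    return value

def st1b_scatter(mem, base, offsets, z, pred, offs_size=32, unsigned=True, scale=0):
    """Model the store loop of ST1B (scalar plus vector)."""
    for e, active in enumerate(pred):
        if active:
            off = extend_offset(offsets[e], offs_size, unsigned)
            addr = (base + (off << scale)) & MASK64
            mem[addr] = z[e] & 0xFF  # msize is 8 bits for ST1B
    return mem
```

An offset element whose low 32 bits are 0xFFFFFFFF addresses base - 1 with SXTW (unsigned=False) but base + 0xFFFFFFFF with UXTW (unsigned=True).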

ST1B (vector plus immediate)

Scatter store bytes from a vector (immediate index)

Scatter store of bytes from the active elements of a vector register to the memory addresses generated by a vector
base plus immediate index. The index is in the range 0 to 31. Inactive elements are not written to memory.
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element

31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  00 (msz)  11     imm5   101    Pg     Zn   Zt

ST1B { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 8;
integer offset = UInt(imm5);

64-bit element

31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  00 (msz)  10     imm5   101    Pg     Zn   Zt

ST1B { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 8;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, in the range 0 to 31, defaulting to 0, encoded in the
"imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then
    base = Z[n];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;

ST1D (scalar plus immediate)

Contiguous store doublewords from vector (immediate index)

Contiguous store of doublewords from elements of a vector register to the memory address generated by a 64-bit
scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements are not written to memory.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  11 (msz)  size   0   imm4   111    Pg     Rn   Zt

ST1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

if size != '11' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8 << UInt(size);
integer msize = 64;
integer offset = SInt(imm4);

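Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.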

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    src = Z[t];

ST1D (scalar plus scalar)

Contiguous store doublewords from vector (scalar index)

Contiguous store of doublewords from elements of a vector register to the memory address generated by a 64-bit
scalar base and scalar index which is multiplied by 8 and added to the base address. After each element access the
index value is incremented, but the index register is not updated. Inactive elements are not written to memory.
31-21        20-16  15-13  12-10  9-5  4-0
11100101111  Rm     010    Pg     Rn   Zt

ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

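if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;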

ST1D (scalar plus vector)

Scatter store doublewords from a vector (vector index)

Scatter store of doublewords from the active elements of a vector register to the memory addresses generated by a
64-bit scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and
then optionally multiplied by 8. Inactive elements are not written to memory.
It has encodings from 4 classes: 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset, 64-bit scaled offset
and 64-bit unscaled offset.

32-bit unpacked scaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  11 (msz)  01     Zm     1   xs  0   Pg     Rn   Zt

ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #3]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
integer offs_size = 32;
boolean offs_unsigned = xs == '0';
integer scale = 3;

32-bit unpacked unscaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  11 (msz)  00     Zm     1   xs  0   Pg     Rn   Zt

ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  11 (msz)  01     Zm     101    Pg     Rn   Zt

ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #3]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
integer offs_size = 64;
boolean offs_unsigned = TRUE;
integer scale = 3;

64-bit unscaled offset

31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  11 (msz)  00     Zm     101    Pg     Rn   Zt

ST1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

ST1D (vector plus immediate)

Scatter store doublewords from a vector (immediate index)

Scatter store of doublewords from the active elements of a vector register to the memory addresses generated by a
vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements are not written
to memory.
31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  11 (msz)  10     imm5   101    Pg     Zn   Zt

ST1D { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 64;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 8 in the range 0 to 248, defaulting to 0,
encoded in the "imm5" field.
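
The relationship between the assembler byte offset and the imm5 field follows from the Operation pseudocode, which computes the address as offset * mbytes with offset = UInt(imm5). A minimal sketch of that conversion is shown below. The helper name byte_offset_to_imm5 is hypothetical, and the msize argument is the memory element size in bits (64 for ST1D).

```python
def byte_offset_to_imm5(byte_offset, msize):
    """Convert an assembler byte offset into the 5-bit imm5 field value.

    byte_offset: the #<imm> value written in assembly.
    msize: memory element size in bits (8, 16, 32 or 64).
    """
    mbytes = msize // 8
    if byte_offset % mbytes != 0:
        raise ValueError(f"offset must be a multiple of {mbytes}")
    imm5 = byte_offset // mbytes
    if not 0 <= imm5 <= 31:
        raise ValueError(f"offset must be in the range 0 to {31 * mbytes}")
    return imm5
```

For ST1D the largest encodable offset is 248, which maps to imm5 = 31; an offset of 4 is rejected because it is not a multiple of 8.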

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then
    base = Z[n];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;

ST1H (scalar plus immediate)

Contiguous store halfwords from vector (immediate index)

Contiguous store of halfwords from elements of a vector register to the memory address generated by a 64-bit scalar
base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements are not written to memory.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  01 (msz)  size   0   imm4   111    Pg     Rn   Zt

ST1H { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8 << UInt(size);
integer msize = 16;
integer offset = SInt(imm4);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    src = Z[t];

ST1H (scalar plus scalar)

Contiguous store halfwords from vector (scalar index)

Contiguous store of halfwords from elements of a vector register to the memory address generated by a 64-bit scalar
base and scalar index which is multiplied by 2 and added to the base address. After each element access the index
value is incremented, but the index register is not updated. Inactive elements are not written to memory.
31-23      22-21  20-16  15-13  12-10  9-5  4-0
111001001  size   Rm     010    Pg     Rn   Zt

ST1H { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8 << UInt(size);
integer msize = 16;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

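if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;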

ST1H (scalar plus vector)

Scatter store halfwords from a vector (vector index)

Scatter store of halfwords from the active elements of a vector register to the memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 2. Inactive elements are not written to memory.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset.

32-bit scaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  01 (msz)  11     Zm     1   xs  0   Pg     Rn   Zt

ST1H { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #1]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
integer offs_size = 32;
boolean offs_unsigned = xs == '0';
integer scale = 1;

32-bit unpacked scaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  01 (msz)  01     Zm     1   xs  0   Pg     Rn   Zt

ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #1]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
integer offs_size = 32;
boolean offs_unsigned = xs == '0';
integer scale = 1;

32-bit unpacked unscaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  01 (msz)  00     Zm     1   xs  0   Pg     Rn   Zt

ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

32-bit unscaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  01 (msz)  10     Zm     1   xs  0   Pg     Rn   Zt

ST1H { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  01 (msz)  01     Zm     101    Pg     Rn   Zt

ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #1]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
integer offs_size = 64;
boolean offs_unsigned = TRUE;
integer scale = 1;

64-bit unscaled offset

31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  01 (msz)  00     Zm     101    Pg     Rn   Zt

ST1H { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in “xs”:

xs <mod>
0 UXTW
1 SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

ST1H (vector plus immediate)

Scatter store halfwords from a vector (immediate index)

Scatter store of halfwords from the active elements of a vector register to the memory addresses generated by a
vector base plus immediate index. The index is a multiple of 2 in the range 0 to 62. Inactive elements are not written
to memory.
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element

31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  01 (msz)  11     imm5   101    Pg     Zn   Zt

ST1H { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 16;
integer offset = UInt(imm5);

64-bit element

31-25    24-23     22-21  20-16  15-13  12-10  9-5  4-0
1110010  01 (msz)  10     imm5   101    Pg     Zn   Zt

ST1H { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 16;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 2 in the range 0 to 62, defaulting to 0,
encoded in the "imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then
    base = Z[n];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
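
A vector-base scatter can be modelled in a few lines. The sketch below is illustrative only. It assumes a dict standing in for memory, base elements supplied as already zero-extended Python integers, and little-endian data accesses (the pseudocode leaves byte order to Mem[]). The function name st1h_vector_plus_imm is hypothetical.

```python
MASK64 = (1 << 64) - 1

def st1h_vector_plus_imm(mem, bases, imm5, z, pred):
    """Model the store loop of ST1H (vector plus immediate).

    mem: dict mapping address to byte value; bases: element values of Zn;
    imm5: the encoded field value (byte offset divided by 2);
    z: element values of Zt; pred: one boolean per element.
    """
    mbytes = 2  # msize is 16 bits for ST1H
    for e, active in enumerate(pred):
        if active:
            addr = (bases[e] + imm5 * mbytes) & MASK64
            value = z[e] & 0xFFFF            # low halfword of the element
            mem[addr] = value & 0xFF         # assumed little-endian: low byte first
            mem[(addr + 1) & MASK64] = value >> 8
    return mem
```

With a base element of 0x2000 and imm5 = 3 (written as #6 in assembly), the halfword 0x1234 is stored as bytes 0x34 and 0x12 at 0x2006 and 0x2007.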

ST1W (scalar plus immediate)

Contiguous store words from vector (immediate index)

Contiguous store of words from elements of a vector register to the memory address generated by a 64-bit scalar base
and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of
predication, and added to the base address. Inactive elements are not written to memory.
31-25    24-23     22-21  20  19-16  15-13  12-10  9-5  4-0
1110010  10 (msz)  size   0   imm4   111    Pg     Rn   Zt

ST1W { <Zt>.<T> }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

if size != '1x' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8 << UInt(size);
integer msize = 32;
integer offset = SInt(imm4);
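
The size checks in the scalar plus immediate decode blocks of ST1B, ST1H, ST1W and ST1D follow one pattern: ST1H rejects size == '00', ST1W rejects anything other than '1x', and ST1D rejects anything other than '11'. This appears to amount to requiring that the vector element is at least as wide as the memory element (esize >= msize). The sketch below expresses that reading; the rule is an inference from the four decode blocks, not a statement from the specification, and the names are hypothetical.

```python
class UndefinedEncoding(Exception):
    """Raised where the decode pseudocode would take the UNDEFINED path."""

# Memory element size in bits for each contiguous store mnemonic.
MSIZE = {"ST1B": 8, "ST1H": 16, "ST1W": 32, "ST1D": 64}

def decode_esize(mnemonic, size):
    """Return esize for a 2-bit size field, or raise UndefinedEncoding."""
    if not 0 <= size <= 3:
        raise ValueError("size is a 2-bit field")
    esize = 8 << size
    if esize < MSIZE[mnemonic]:
        # Inferred rule: the element must be able to hold the memory element.
        raise UndefinedEncoding(f"{mnemonic} with size={size:02b}")
    return esize
```

For example, ST1W accepts size values 2 and 3 (S and D elements) and treats 0 and 1 as UNDEFINED, matching the '1x' check in the decode block above.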

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.

<T> Is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the
"imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    src = Z[t];

ST1W (scalar plus scalar)

Contiguous store words from vector (scalar index)

Contiguous store of words from elements of a vector register to the memory address generated by a 64-bit scalar base
and scalar index which is multiplied by 4 and added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements are not written to memory.
31-23      22-21  20-16  15-13  12-10  9-5  4-0
111001010  size   Rm     010    Pg     Rn   Zt

ST1W { <Zt>.<T> }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if size != '1x' then UNDEFINED;
if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8 << UInt(size);
integer msize = 32;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.

<T> Is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

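if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;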

ST1W (scalar plus vector)

Scatter store words from a vector (vector index)

Scatter store of words from the active elements of a vector register to the memory addresses generated by a 64-bit
scalar base plus vector index. The index values are optionally first sign or zero-extended from 32 to 64 bits and then
optionally multiplied by 4. Inactive elements are not written to memory.
It has encodings from 6 classes: 32-bit scaled offset, 32-bit unpacked scaled offset, 32-bit unpacked unscaled offset,
32-bit unscaled offset, 64-bit scaled offset and 64-bit unscaled offset.

32-bit scaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  10 (msz)  11     Zm     1   xs  0   Pg     Rn   Zt

ST1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod> #2]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 32;
integer offs_size = 32;
boolean offs_unsigned = xs == '0';
integer scale = 2;

32-bit unpacked scaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  10 (msz)  01     Zm     1   xs  0   Pg     Rn   Zt

ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod> #2]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
integer offs_size = 32;
boolean offs_unsigned = xs == '0';
integer scale = 2;

32-bit unpacked unscaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  10 (msz)  00     Zm     1   xs  0   Pg     Rn   Zt

ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, <mod>]

if !HaveSVE() then UNDEFINED;

32-bit unscaled offset

31-25    24-23     22-21  20-16  15  14  13  12-10  9-5  4-0
1110010  10 (msz)  10     Zm     1   xs  0   Pg     Rn   Zt

ST1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Zm>.S, <mod>]

if !HaveSVE() then UNDEFINED;

64-bit scaled offset

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4-0
Value  1110010  msz=10  01     Zm     101    Pg     Rn   Zt

ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D, LSL #2]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Zm);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
integer offs_size = 64;
boolean offs_unsigned = TRUE;
integer scale = 2;

64-bit unscaled offset

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4-0
Value  1110010  msz=10  00     Zm     101    Pg     Rn   Zt

ST1W { <Zt>.D }, <Pg>, [<Xn|SP>, <Zm>.D]

if !HaveSVE() then UNDEFINED;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Zm> Is the name of the offset scalable vector register, encoded in the "Zm" field.

<mod> Is the index extend and shift specifier, encoded in "xs":

xs   <mod>
0    UXTW
1    SXTW

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(VL) offset;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then

Internal version only: isa v33.16decrel, AdvSIMD v29.05, pseudocode v2021-12_rel, sve v2021-12 ; Build timestamp: 2021-12-15T12:33

ST1W (vector plus immediate)

Scatter store words from a vector (immediate index)

Scatter store of words from the active elements of a vector register to the memory addresses generated by a vector
base plus immediate index. The index is a multiple of 4 in the range 0 to 124. Inactive elements are not written to
memory.
It has encodings from 2 classes: 32-bit element and 64-bit element.

32-bit element

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4-0
Value  1110010  msz=10  11     imm5   101    Pg     Zn   Zt

ST1W { <Zt>.S }, <Pg>, [<Zn>.S{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 32;
integer msize = 32;
integer offset = UInt(imm5);

64-bit element

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4-0
Value  1110010  msz=10  10     imm5   101    Pg     Zn   Zt

ST1W { <Zt>.D }, <Pg>, [<Zn>.D{, #<imm>}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Zn);
integer g = UInt(Pg);
integer esize = 64;
integer msize = 32;
integer offset = UInt(imm5);

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the base scalable vector register, encoded in the "Zn" field.
<imm> Is the optional unsigned immediate byte offset, a multiple of 4 in the range 0 to 124, defaulting to 0,
encoded in the "imm5" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) base;
bits(VL) src;
constant integer mbytes = msize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if AnyActiveElement(mask, esize) then
    base = Z[n];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = ZeroExtend(Elem[base, e, esize], 64) + offset * mbytes;
        Mem[addr, mbytes, AccType_SVE] = Elem[src, e, esize]<msize-1:0>;
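
The Operation pseudocode above can be modelled in a few lines of Python. This is a minimal sketch rather than a
reference model: memory is a hypothetical dict from byte address to byte value, the vectors are plain lists of unsigned
element values, bytes are written in little-endian order (an assumption made for the example), and the function name is
invented for illustration.

```python
MASK64 = (1 << 64) - 1


def st1w_vector_plus_imm(memory, base_elems, src_elems, active, imm=0, msize=32):
    """Scatter the low msize bits of each active source element to memory.

    memory     : dict mapping byte address -> byte value (hypothetical model)
    base_elems : unsigned element values of the base vector Zn
    src_elems  : unsigned element values of the source vector Zt
    active     : one boolean per element, standing in for the predicate Pg
    imm        : assembler byte offset, a multiple of 4 in the range 0 to 124
    """
    mbytes = msize // 8
    if imm % mbytes != 0 or not 0 <= imm <= 124:
        raise ValueError("imm must be a multiple of 4 in the range 0 to 124")
    offset = imm // mbytes  # the value carried in imm5
    for base, src, is_active in zip(base_elems, src_elems, active):
        if not is_active:
            continue  # inactive elements are not written to memory
        addr = (base + offset * mbytes) & MASK64
        data = src & ((1 << msize) - 1)  # Elem[src, e, esize]<msize-1:0>
        for i in range(mbytes):  # little-endian byte order assumed
            memory[(addr + i) & MASK64] = (data >> (8 * i)) & 0xFF
    return memory
```

For 64-bit elements only the low word of each source element is stored, which is why the sketch masks the data to msize
bits before writing it.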

ST2B (scalar plus immediate)

Contiguous store two-byte structures from two vectors (immediate index)

Contiguous store two-byte structures, each from the same element number in two vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that
is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.

Bits   31-25    24-23   22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1110010  msz=00  01     1   imm4   111    Pg     Rn   Zt

ST2B { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer offset = SInt(imm4);
integer nreg = 2;

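Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.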

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];
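
The interleaving performed by the loop above may be easier to follow in executable form. The Python sketch below mirrors
the element-offset arithmetic of the pseudocode under several assumptions: memory is a hypothetical dict from byte
address to byte value, each vector is a short list of unsigned element values (so the vector length is whatever the
caller passes in), the assembler immediate divided by the number of registers gives the signed imm4 value, and bytes are
written in little-endian order. The function name is invented for illustration.

```python
MASK64 = (1 << 64) - 1


def store_structs_scalar_plus_imm(memory, base, regs, active, imm=0, esize=8):
    """Model an interleaved structure store with an immediate MUL VL offset.

    memory : dict mapping byte address -> byte value (hypothetical model)
    regs   : list of nreg vectors, each a list of unsigned element values
    active : one boolean per element, standing in for the predicate Pg
    imm    : assembler immediate, a multiple of nreg
    """
    nreg = len(regs)
    elements = len(regs[0])
    mbytes = esize // 8
    if imm % nreg != 0:
        raise ValueError("imm must be a multiple of the number of registers")
    offset = imm // nreg  # signed value carried in imm4 (assumption, see lead-in)
    for e in range(elements):
        for r in range(nreg):
            if active[e]:
                eoff = (offset * elements * nreg) + (e * nreg) + r
                addr = (base + eoff * mbytes) & MASK64
                data = regs[r][e] & ((1 << esize) - 1)
                for i in range(mbytes):  # little-endian byte order assumed
                    memory[(addr + i) & MASK64] = (data >> (8 * i)) & 0xFF
    return memory
```

Because the offset term does not depend on the predicate, an inactive structure leaves a gap in memory rather than
shifting later structures down.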

ST2B (scalar plus scalar)

Contiguous store two-byte structures from two vectors (scalar index)

Contiguous store two-byte structures, each from the same element number in two vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register that is added to the base address. After each
structure access the index value is incremented by two. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4-0
Value  1110010  msz=00  01     Rm     011    Pg     Rn   Zt

ST2B { <Zt1>.B, <Zt2>.B }, <Pg>, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;
integer nreg = 2;
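
As a worked example of the encoding diagram and decode rules above, the Python sketch below packs the fields of a
scalar plus scalar structure store into a 32-bit instruction word. The function name and the nreg parameter are
hypothetical. The mapping of bits 22-21 to the register count (01 for two registers, 10 for three) is read off the
ST2 and ST3 diagrams in this section rather than taken from a named field, so treat it as an assumption.

```python
def encode_store_structs_scalar_plus_scalar(msz, nreg, zt, pg, rn, rm):
    """Pack the fields of an ST2x/ST3x (scalar plus scalar) store into a word.

    msz  : element size selector in bits 24-23 (0=B, 1=H, 2=W, 3=D)
    nreg : number of vector registers stored (2 or 3 in this sketch)
    """
    if nreg not in (2, 3):
        raise ValueError("this sketch only covers the ST2 and ST3 forms")
    if rm == 31:
        raise ValueError("Rm == '11111' is UNDEFINED for this encoding")
    if not (0 <= msz <= 3 and 0 <= zt <= 31 and 0 <= pg <= 7
            and 0 <= rn <= 31 and 0 <= rm <= 30):
        raise ValueError("field out of range")
    opc = nreg - 1  # bits 22-21: 01 for ST2, 10 for ST3 (assumption, see lead-in)
    return ((0b1110010 << 25) | (msz << 23) | (opc << 21) | (rm << 16)
            | (0b011 << 13) | (pg << 10) | (rn << 5) | zt)
```

With every register field set to zero the two-register byte form yields 0xE4206000, which is simply the fixed bits of
the diagram.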

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];

ST2D (scalar plus immediate)

Contiguous store two-doubleword structures from two vectors (immediate index)

Contiguous store two-doubleword structures, each from the same element number in two vector registers to the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to
14 that is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive doublewords in memory which make up each structure. Inactive structures are not written to memory.

Bits   31-25    24-23   22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1110010  msz=11  01     1   imm4   111    Pg     Rn   Zt

ST2D { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer offset = SInt(imm4);
integer nreg = 2;

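Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.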

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

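if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];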

ST2D (scalar plus scalar)

Contiguous store two-doubleword structures from two vectors (scalar index)

Contiguous store two-doubleword structures, each from the same element number in two vector registers to the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by two. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive doublewords in memory which make up each structure. Inactive structures are not written to memory.
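
To make the index scaling concrete, the Python sketch below lists the address written for each register of each active
structure. It assumes the element-offset formula shown in the ST2B (scalar plus scalar) Operation pseudocode,
eoff = index + (e * nreg) + r followed by a multiplication by the element size in bytes, applied here with 64-bit
elements. The function name is hypothetical.

```python
MASK64 = (1 << 64) - 1


def struct_store_addresses(base, index, active, esize=64, nreg=2):
    """Return (element, register, address) for every access of active structures.

    base   : value of the scalar base register
    index  : unsigned value of the scalar index register Xm (not yet scaled)
    active : one boolean per element, standing in for the predicate Pg
    """
    mbytes = esize // 8
    addresses = []
    for e, is_active in enumerate(active):
        if not is_active:
            continue  # inactive structures are not written to memory
        for r in range(nreg):
            eoff = index + (e * nreg) + r
            # Multiplying by mbytes is the LSL #3 scaling for doublewords.
            addresses.append((e, r, (base + eoff * mbytes) & MASK64))
    return addresses
```

An index of 3 with doubleword elements therefore starts 24 bytes above the base, and each structure occupies the next
16 bytes.
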

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4-0
Value  1110010  msz=11  01     Rm     011    Pg     Rn   Zt

ST2D { <Zt1>.D, <Zt2>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer nreg = 2;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

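if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];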

ST2H (scalar plus immediate)

Contiguous store two-halfword structures from two vectors (immediate index)

Contiguous store two-halfword structures, each from the same element number in two vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that
is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.

Bits   31-25    24-23   22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1110010  msz=01  01     1   imm4   111    Pg     Rn   Zt

ST2H { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer offset = SInt(imm4);
integer nreg = 2;

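Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.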

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

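if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];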

ST2H (scalar plus scalar)

Contiguous store two-halfword structures from two vectors (scalar index)

Contiguous store two-halfword structures, each from the same element number in two vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by two. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4-0
Value  1110010  msz=01  01     Rm     011    Pg     Rn   Zt

ST2H { <Zt1>.H, <Zt2>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer nreg = 2;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

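if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];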

ST2W (scalar plus immediate)

Contiguous store two-word structures from two vectors (immediate index)

Contiguous store two-word structures, each from the same element number in two vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 2 in the range -16 to 14 that
is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive words in memory which make up each structure. Inactive structures are not written to memory.
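
The way a predicate element governs a whole structure can be sketched in Python as follows. The sketch assumes the
layout used by the architecture's shared ElemP helper, in which a predicate holds one bit per vector byte and the bit
for element e of size esize sits at position e * (esize DIV 8). The predicate is modelled as a plain integer and the
function names are hypothetical.

```python
def elem_p(mask, e, esize):
    """Return the predicate bit (0 or 1) that governs element e of size esize bits."""
    return (mask >> (e * (esize // 8))) & 1


def any_active_element(mask, esize, vl):
    """Return True if any element of size esize is active in a vector of vl bits."""
    return any(elem_p(mask, e, esize) for e in range(vl // esize))
```

For word elements only every fourth predicate bit is consulted, so a mask whose set bits all fall between those
positions counts as having no active elements.
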

Bits   31-25    24-23   22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1110010  msz=10  01     1   imm4   111    Pg     Rn   Zt

ST2W { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer offset = SInt(imm4);
integer nreg = 2;

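Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 2 in the range -16 to 14, defaulting to 0,
encoded in the "imm4" field.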

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

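if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];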

ST2W (scalar plus scalar)

Contiguous store two-word structures from two vectors (scalar index)

Contiguous store two-word structures, each from the same element number in two vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by two. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the two vector registers, or equivalently to the
two consecutive words in memory which make up each structure. Inactive structures are not written to memory.

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4-0
Value  1110010  msz=10  01     Rm     011    Pg     Rn   Zt

ST2W { <Zt1>.S, <Zt2>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer nreg = 2;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..1] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

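if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];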

ST3B (scalar plus immediate)

Contiguous store three-byte structures from three vectors (immediate index)

Contiguous store three-byte structures, each from the same element number in three vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that
is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.

Bits   31-25    24-23   22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1110010  msz=00  10     1   imm4   111    Pg     Rn   Zt

ST3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer offset = SInt(imm4);
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
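
The relationship between the assembler immediate and the 4-bit field can be sketched in Python. The sketch assumes that
the field holds the immediate divided by the number of registers as a signed 4-bit value, which is consistent with the
stated range of -24 to 21 in steps of 3 and with the decode statement offset = SInt(imm4). Both function names are
hypothetical.

```python
def encode_imm4(imm, nreg=3):
    """Convert an assembler immediate (a multiple of nreg) to the 4-bit field value."""
    if imm % nreg != 0 or not -8 * nreg <= imm <= 7 * nreg:
        raise ValueError("imm must be a multiple of nreg within the encodable range")
    return (imm // nreg) & 0xF  # two's complement in 4 bits


def decode_imm4(imm4, nreg=3):
    """Convert the 4-bit field value back to the assembler immediate."""
    signed = imm4 - 16 if imm4 & 0x8 else imm4  # SInt(imm4)
    return signed * nreg
```

Passing nreg=2 gives the corresponding mapping for the two-register forms, whose range is -16 to 14.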

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

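if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];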

ST3B (scalar plus scalar)

Contiguous store three-byte structures from three vectors (scalar index)

Contiguous store three-byte structures, each from the same element number in three vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register that is added to the base address. After each
structure access the index value is incremented by three. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4-0
Value  1110010  msz=00  10     Rm     011    Pg     Rn   Zt

ST3B { <Zt1>.B, <Zt2>.B, <Zt3>.B }, <Pg>, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
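
Since the second and third registers are derived from "Zt" modulo 32, a register list can wrap from Z31 back to Z0. The
Python sketch below expands the "Zt" field into the register names involved. The function name and the output format
are hypothetical and chosen only for illustration.

```python
def register_list(zt, nreg, suffix="b"):
    """Return the names of the nreg consecutive vector registers starting at Zt."""
    if not 0 <= zt <= 31:
        raise ValueError("Zt must be in the range 0 to 31")
    return [f"z{(zt + r) % 32}.{suffix}" for r in range(nreg)]
```

For example, a three-register store whose "Zt" field is 31 transfers z31, z0 and z1 in that order.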

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

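if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];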

ST3D (scalar plus immediate)

Contiguous store three-doubleword structures from three vectors (immediate index)

Contiguous store three-doubleword structures, each from the same element number in three vector registers to the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to
21 that is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.

Bits   31-25    24-23   22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1110010  msz=11  10     1   imm4   111    Pg     Rn   Zt

ST3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer offset = SInt(imm4);
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

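if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = (offset * elements * nreg) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];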

ST3D (scalar plus scalar)

Contiguous store three-doubleword structures from three vectors (scalar index)

Contiguous store three-doubleword structures, each from the same element number in three vector registers to the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by three. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.

Bits   31-25    24-23   22-21  20-16  15-13  12-10  9-5  4-0
Value  1110010  msz=11  10     Rm     011    Pg     Rn   Zt

ST3D { <Zt1>.D, <Zt2>.D, <Zt3>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

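if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

for e = 0 to elements-1
    for r = 0 to nreg-1
        if ElemP[mask, e, esize] == '1' then
            integer eoff = UInt(offset) + (e * nreg) + r;
            bits(64) addr = base + eoff * mbytes;
            Mem[addr, mbytes, AccType_SVE] = Elem[values[r], e, esize];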

ST3H (scalar plus immediate)

Contiguous store three-halfword structures from three vectors (immediate index)

Contiguous store three-halfword structures, each from the same element number in three vector registers to the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to
21 that is multiplied by the vector's in-memory size, irrespective of predication.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive halfwords in memory which make up each structure. Inactive structures are not written to
memory.

Bits   31-25    24-23   22-21  20  19-16  15-13  12-10  9-5  4-0
Value  1110010  msz=01  10     1   imm4   111    Pg     Rn   Zt

ST3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer offset = SInt(imm4);
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.
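
A small sketch of how an assembler might map `<imm>` onto the 4-bit field, assuming that "imm4" holds the immediate divided by the number of registers. That assumption is what the stated range (-24 to 21 in steps of 3) and the signed 4-bit decode imply; the helper name `encode_imm4` is hypothetical.

```python
def encode_imm4(imm, nreg):
    """Map an assembler-level #imm (a multiple of nreg) to the 4-bit imm4 field."""
    if imm % nreg != 0:
        raise ValueError(f"immediate must be a multiple of {nreg}")
    scaled = imm // nreg
    if not -8 <= scaled <= 7:
        raise ValueError(f"immediate must be in the range {-8 * nreg} to {7 * nreg}")
    return scaled & 0xF  # two's-complement 4-bit field


print(encode_imm4(-24, 3), encode_imm4(21, 3), encode_imm4(0, 3))
```

The same helper covers the four-register forms by passing `nreg=4`, where the accepted range becomes -32 to 28.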

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

ST3H (scalar plus scalar)

Contiguous store three-halfword structures from three vectors (scalar index)

Contiguous store three-halfword structures, each from the same element number in three vector registers to the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by three. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive halfwords in memory which make up each structure. Inactive structures are not written to
memory.
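
The scalar-index addressing described above can be sketched as follows. This is an illustrative model only: the function name `st3h_scalar_addresses` is hypothetical, and the formula is derived from the prose (index scaled by the halfword size, advancing by three per structure) rather than copied from the architecture pseudocode.

```python
def st3h_scalar_addresses(base, index, elements):
    """List the byte address of every halfword for structures 0..elements-1."""
    mbytes = 2  # halfword size in bytes, matching the LSL #1 scaling
    nreg = 3
    addrs = []
    for e in range(elements):
        for r in range(nreg):
            # The working index advances by three per structure; the index
            # register itself is never written back.
            addrs.append((base + (index + e * nreg + r) * mbytes) & (2**64 - 1))
    return addrs


print([hex(a) for a in st3h_scalar_addresses(0x1000, 4, 2)])
```

In a real execution only the addresses belonging to active structures would be accessed; the sketch lists all of them to show the stride.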
31-25    24-23  22-21  20-16  15-13  12-10  9-5  4-0
1110010  01     10     Rm     011    Pg     Rn   Zt
         msz

ST3H { <Zt1>.H, <Zt2>.H, <Zt3>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

ST3W (scalar plus immediate)

Contiguous store three-word structures from three vectors (immediate index)

Contiguous store three-word structures, each from the same element number in three vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 3 in the range -24 to 21 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23  22-20  19-16  15-13  12-10  9-5  4-0
1110010  10     101    imm4   111    Pg     Rn   Zt
         msz

ST3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer offset = SInt(imm4);
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 3 in the range -24 to 21, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

ST3W (scalar plus scalar)

Contiguous store three-word structures from three vectors (scalar index)

Contiguous store three-word structures, each from the same element number in three vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by three. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the three vector registers, or equivalently to
the three consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23  22-21  20-16  15-13  12-10  9-5  4-0
1110010  10     10     Rm     011    Pg     Rn   Zt
         msz

ST3W { <Zt1>.S, <Zt2>.S, <Zt3>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer nreg = 3;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..2] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];
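
The register reads in the loop above wrap modulo 32, so a list that starts near Z31 continues at Z0. A minimal sketch of that numbering, using a hypothetical helper name `st_register_list`:

```python
def st_register_list(zt, nreg):
    """Return the names of the nreg consecutive vector registers starting at Zt."""
    if not 0 <= zt <= 31:
        raise ValueError("Zt must be in the range 0 to 31")
    return [f"Z{(zt + r) % 32}" for r in range(nreg)]


print(st_register_list(30, 3))
```

For `Zt = 30` and three registers this yields Z30, Z31 and Z0, which matches the "plus 1 modulo 32" and "plus 2 modulo 32" wording in the Assembler Symbols.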

ST4B (scalar plus immediate)

Contiguous store four-byte structures from four vectors (immediate index)

Contiguous store four-byte structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
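
To make the "multiplied by the vector's in-memory size" wording concrete, the sketch below computes the byte displacement that a given `#<imm>, MUL VL` adds to the base for the four-register forms. It assumes that the in-memory size of one vector is VL/8 bytes and that VL is a multiple of 128 bits between 128 and 2048; the function name `st4_imm_byte_offset` is hypothetical.

```python
def st4_imm_byte_offset(imm, vl_bits):
    """Byte displacement added to the base by '#imm, MUL VL' in a four-register store."""
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("VL must be a multiple of 128 bits in the range 128 to 2048")
    if imm % 4 != 0 or not -32 <= imm <= 28:
        raise ValueError("immediate must be a multiple of 4 in the range -32 to 28")
    vector_bytes = vl_bits // 8  # in-memory size of one vector
    return imm * vector_bytes    # independent of the predicate


print(st4_imm_byte_offset(4, 256))
```

With a 256-bit vector length, `#4, MUL VL` skips exactly one full group of four vectors, that is 128 bytes.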
31-25    24-23  22-20  19-16  15-13  12-10  9-5  4-0
1110010  00     111    imm4   111    Pg     Rn   Zt
         msz

ST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer offset = SInt(imm4);
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

ST4B (scalar plus scalar)

Contiguous store four-byte structures from four vectors (scalar index)

Contiguous store four-byte structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register which is added to the base address. After each
structure access the index value is incremented by four. The index register is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive bytes in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23  22-21  20-16  15-13  12-10  9-5  4-0
1110010  00     11     Rm     011    Pg     Rn   Zt
         msz

ST4B { <Zt1>.B, <Zt2>.B, <Zt3>.B, <Zt4>.B }, <Pg>, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

ST4D (scalar plus immediate)

Contiguous store four-doubleword structures from four vectors (immediate index)

Contiguous store four-doubleword structures, each from the same element number in four vector registers to the
memory address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to
28 that is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.
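
The predication rule above can be illustrated with a small model of the predicate register. The sketch assumes the usual SVE layout of one predicate bit per vector byte, with an element governed by the lowest-numbered bit of its group; the helpers `elem_p` and `any_active_element` are illustrative stand-ins for the pseudocode functions `ElemP` and `AnyActiveElement`, whose definitions are not reproduced in this section.

```python
def elem_p(mask, e, esize):
    """Predicate bit governing element e, with the predicate held as an integer."""
    return (mask >> (e * (esize // 8))) & 1


def any_active_element(mask, esize, vl_bits):
    """True if at least one element of the given size is active."""
    elements = vl_bits // esize
    return any(elem_p(mask, e, esize) for e in range(elements))


# Doubleword elements (esize = 64) are governed by predicate bits 0, 8, 16, ...
print(elem_p(0x101, 0, 64), elem_p(0x101, 1, 64))
print(any_active_element(0x02, 64, 128))
```

In the second call only predicate bit 1 is set, which does not govern any doubleword element, so no structure would be stored.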
31-25    24-23  22-20  19-16  15-13  12-10  9-5  4-0
1110010  11     111    imm4   111    Pg     Rn   Zt
         msz

ST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer offset = SInt(imm4);
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

ST4D (scalar plus scalar)

Contiguous store four-doubleword structures from four vectors (scalar index)

Contiguous store four-doubleword structures, each from the same element number in four vector registers to the
memory address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL
option) and added to the base address. After each structure access the index value is incremented by four. The index
register is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive doublewords in memory which make up each structure. Inactive structures are not written to
memory.
31-25    24-23  22-21  20-16  15-13  12-10  9-5  4-0
1110010  11     11     Rm     011    Pg     Rn   Zt
         msz

ST4D { <Zt1>.D, <Zt2>.D, <Zt3>.D, <Zt4>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

ST4H (scalar plus immediate)

Contiguous store four-halfword structures from four vectors (immediate index)

Contiguous store four-halfword structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23  22-20  19-16  15-13  12-10  9-5  4-0
1110010  01     111    imm4   111    Pg     Rn   Zt
         msz

ST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer offset = SInt(imm4);
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

ST4H (scalar plus scalar)

Contiguous store four-halfword structures from four vectors (scalar index)

Contiguous store four-halfword structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by four. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive halfwords in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23  22-21  20-16  15-13  12-10  9-5  4-0
1110010  01     11     Rm     011    Pg     Rn   Zt
         msz

ST4H { <Zt1>.H, <Zt2>.H, <Zt3>.H, <Zt4>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

ST4W (scalar plus immediate)

Contiguous store four-word structures from four vectors (immediate index)

Contiguous store four-word structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and an immediate index which is a multiple of 4 in the range -32 to 28 that
is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23  22-20  19-16  15-13  12-10  9-5  4-0
1110010  10     111    imm4   111    Pg     Rn   Zt
         msz

ST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer offset = SInt(imm4);
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, a multiple of 4 in the range -32 to 28, defaulting to 0,
encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

ST4W (scalar plus scalar)

Contiguous store four-word structures from four vectors (scalar index)

Contiguous store four-word structures, each from the same element number in four vector registers to the memory
address generated by a 64-bit scalar base and a 64-bit scalar index register scaled by the element size (LSL option)
and added to the base address. After each structure access the index value is incremented by four. The index register
is not updated by the instruction.
Each predicate element applies to the same element number in each of the four vector registers, or equivalently to the
four consecutive words in memory which make up each structure. Inactive structures are not written to memory.
31-25    24-23  22-21  20-16  15-13  12-10  9-5  4-0
1110010  10     11     Rm     011    Pg     Rn   Zt
         msz

ST4W { <Zt1>.S, <Zt2>.S, <Zt3>.S, <Zt4>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]
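
The encoding diagrams for the stores in this part of the document differ only in a handful of bits, which the following sketch uses to tell them apart. It is a partial classifier, not a full decoder: it recognises only the bit patterns shown in this section's diagrams, the name `opc` for bits 22-21 is an assumption, and `classify_store` is a hypothetical helper.

```python
SIZE_SUFFIX = {0b00: "B", 0b01: "H", 0b10: "W", 0b11: "D"}  # msz, bits 24-23
STEM = {0b00: "STNT1", 0b10: "ST3", 0b11: "ST4"}            # bits 22-21, values seen here


def classify_store(word):
    """Return 'MNEMONIC (form)' for the encodings shown in this section, else None."""
    if ((word >> 25) & 0x7F) != 0b1110010:
        return None
    msz = (word >> 23) & 0b11
    opc = (word >> 21) & 0b11
    low = (word >> 13) & 0b111
    if opc not in STEM:
        return None
    if low == 0b111 and ((word >> 20) & 1) == 1:
        form = "scalar plus immediate"
    elif low == 0b011 and ((word >> 16) & 0x1F) != 0b11111:
        # Rm == '11111' is UNDEFINED for the scalar plus scalar forms
        form = "scalar plus scalar"
    else:
        return None
    return f"{STEM[opc]}{SIZE_SUFFIX[msz]} ({form})"


print(classify_store(0xE570E000))
print(classify_store(0xE4016000))
```

The two example words have all register fields zero apart from `Rm = 1` in the second, so only the fixed bits drive the result.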

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;
integer nreg = 4;

Assembler Symbols

<Zt1> Is the name of the first scalable vector register to be transferred, encoded in the "Zt" field.
<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" plus 1 modulo 32.
<Zt3> Is the name of the third scalable vector register to be transferred, encoded as "Zt" plus 2 modulo 32.
<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" plus 3 modulo 32.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(PL) mask = P[g];
bits(64) offset;
constant integer mbytes = esize DIV 8;
array [0..3] of bits(VL) values;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];

    for r = 0 to nreg-1
        values[r] = Z[(t+r) MOD 32];

STNT1B (scalar plus immediate)

Contiguous store non-temporal bytes from vector (immediate index)

Contiguous non-temporal store of bytes from elements of a vector register to the memory address generated by a
64-bit scalar base and an immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31-25    24-23  22-20  19-16  15-13  12-10  9-5  4-0
1110010  00     001    imm4   111    Pg     Rn   Zt
         msz

STNT1B { <Zt>.B }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 8;
integer offset = SInt(imm4);

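Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the "imm4" field.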

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
constant integer mbytes = esize DIV 8;
bits(VL) src;
bits(PL) mask = P[g];

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];
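
The element loop above translates almost line for line into Python. The sketch below is a functional model only, under several assumptions: the vector and predicate are passed as plain integers, memory is a dictionary keyed by byte address, the stack-pointer alignment and tag checks are omitted, and the non-temporal hint has no effect. The name `stnt1b_imm` is hypothetical.

```python
def stnt1b_imm(mem, base, offset, src, mask, vl_bits):
    """Model the STNT1B (scalar plus immediate) element loop on a dict-based memory."""
    esize = 8
    elements = vl_bits // esize
    mbytes = esize // 8
    for e in range(elements):
        if (mask >> (e * mbytes)) & 1:                    # ElemP[mask, e, esize] == '1'
            eoff = offset * elements + e
            addr = (base + eoff * mbytes) & (2**64 - 1)   # 64-bit address arithmetic
            mem[addr] = (src >> (e * esize)) & 0xFF       # Elem[src, e, esize]
    return mem


src = int.from_bytes(bytes(range(16)), "little")  # element e holds the value e
print(stnt1b_imm({}, 0x2000, 1, src, 0b101, 128))
```

With a 128-bit vector and an immediate of 1 the store starts 16 bytes past the base, and only the bytes for elements 0 and 2 are written.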

STNT1B (scalar plus scalar)

Contiguous store non-temporal bytes from vector (scalar index)

Contiguous non-temporal store of bytes from elements of a vector register to the memory address generated by a
64-bit scalar base and a scalar index which is added to the base address. After each element access the index value is
incremented, but the index register is not updated. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31-25    24-23  22-21  20-16  15-13  12-10  9-5  4-0
1110010  00     00     Rm     011    Pg     Rn   Zt
         msz

STNT1B { <Zt>.B }, <Pg>, [<Xn|SP>, <Xm>]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 8;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(VL) src;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

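if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];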
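
The per-element address arithmetic above can be modelled outside the architecture. The following Python sketch is illustrative only: the function name `stnt1_scalar_store`, the `bytearray` standing in for memory, and the list-based vector and predicate are assumptions, and the non-temporal hint, address wrap-around and faults are not modelled. Multi-byte elements are written little-endian, which is also an assumption.

```python
def stnt1_scalar_store(memory, base, index, src, active, esize=8):
    """Store the active elements of src at base + (index + e) * mbytes."""
    mbytes = esize // 8
    for e, value in enumerate(src):
        if not active[e]:
            continue  # inactive elements are not written to memory
        addr = base + (index + e) * mbytes
        memory[addr:addr + mbytes] = value.to_bytes(mbytes, "little")
    return memory


mem = stnt1_scalar_store(bytearray(16), base=4, index=2,
                         src=[0xAA, 0xBB, 0xCC, 0xDD],
                         active=[True, False, True, True])
print(mem[6:10].hex())  # aa00ccdd
```

The index is scaled by the element size in bytes, so for byte elements it is added to the base unchanged, and the byte that corresponds to the inactive element keeps its previous value.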


STNT1D (scalar plus immediate)

Contiguous store non-temporal doublewords from vector (immediate index)

Contiguous store non-temporal of doublewords from elements of a vector register to the memory address generated by
a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

STNT1D { <Zt>.D }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 64;
integer offset = SInt(imm4);

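Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the "imm4" field.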

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
constant integer mbytes = esize DIV 8;
bits(VL) src;
bits(PL) mask = P[g];

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];
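
As an informal illustration of the "MUL VL" offset, the Python sketch below lists the addresses that the loop above would access. It is not part of the architecture: the function name `stnt1_imm_addresses` and its parameters are assumptions, the predicate is given as one boolean per element, and no memory is actually written.

```python
def stnt1_imm_addresses(base, imm, vl_bits, esize, active):
    """Return (element, address) pairs for the active elements."""
    if not -8 <= imm <= 7:
        raise ValueError("imm must be in the range -8 to 7")
    elements = vl_bits // esize
    mbytes = esize // 8
    pairs = []
    for e in range(elements):
        if active[e]:
            # The immediate counts whole vectors, irrespective of predication.
            eoff = imm * elements + e
            pairs.append((e, (base + eoff * mbytes) % 2**64))
    return pairs


# VL = 128 bits with doubleword elements: one vector occupies 16 bytes.
for element, address in stnt1_imm_addresses(0x1000, 1, 128, 64, [True, True]):
    print(element, hex(address))
```

With a 128-bit vector length, an immediate of 1 places element 0 at base + 16 and element 1 at base + 24.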

STNT1D (scalar plus scalar)

Contiguous store non-temporal doublewords from vector (scalar index)

Contiguous store non-temporal of doublewords from elements of a vector register to the memory address generated by
a 64-bit scalar base and scalar index which is multiplied by 8 and added to the base address. After each element
access the index value is incremented, but the index register is not updated. Inactive elements are not written to
memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>

STNT1D { <Zt>.D }, <Pg>, [<Xn|SP>, <Xm>, LSL #3]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 64;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(VL) src;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

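if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];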


STNT1H (scalar plus immediate)

Contiguous store non-temporal halfwords from vector (immediate index)

Contiguous store non-temporal of halfwords from elements of a vector register to the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

STNT1H { <Zt>.H }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 16;
integer offset = SInt(imm4);

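Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the "imm4" field.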

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
constant integer mbytes = esize DIV 8;
bits(VL) src;
bits(PL) mask = P[g];

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];

STNT1H (scalar plus scalar)

Contiguous store non-temporal halfwords from vector (scalar index)

Contiguous store non-temporal of halfwords from elements of a vector register to the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 2 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 0 1 0 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>

STNT1H { <Zt>.H }, <Pg>, [<Xn|SP>, <Xm>, LSL #1]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 16;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(VL) src;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

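if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];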


STNT1W (scalar plus immediate)

Contiguous store non-temporal words from vector (immediate index)

Contiguous store non-temporal of words from elements of a vector register to the memory address generated by a
64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size,
irrespective of predication, and added to the base address. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 0 1 imm4 1 1 1 Pg Rn Zt
msz<1>msz<0>

STNT1W { <Zt>.S }, <Pg>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer g = UInt(Pg);
integer esize = 32;
integer offset = SInt(imm4);

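Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -8 to 7, defaulting to 0, encoded in the "imm4" field.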

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
constant integer mbytes = esize DIV 8;
bits(VL) src;
bits(PL) mask = P[g];

if HaveMTEExt() then SetTagCheckedInstruction(n != 31);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer eoff = (offset * elements) + e;
        bits(64) addr = base + eoff * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];

STNT1W (scalar plus scalar)

Contiguous store non-temporal words from vector (scalar index)

Contiguous store non-temporal of words from elements of a vector register to the memory address generated by a
64-bit scalar base and scalar index which is multiplied by 4 and added to the base address. After each element access
the index value is incremented, but the index register is not updated. Inactive elements are not written to memory.
A non-temporal store is a hint to the system that this data is unlikely to be referenced again soon.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 0 0 0 Rm 0 1 1 Pg Rn Zt
msz<1>msz<0>

STNT1W { <Zt>.S }, <Pg>, [<Xn|SP>, <Xm>, LSL #2]

if !HaveSVE() then UNDEFINED;

if Rm == '11111' then UNDEFINED;
integer t = UInt(Zt);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt(Pg);
integer esize = 32;

Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(64) base;
bits(64) offset;
bits(VL) src;
bits(PL) mask = P[g];
constant integer mbytes = esize DIV 8;

if HaveMTEExt() then SetTagCheckedInstruction(TRUE);

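if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n];
    offset = X[m];
    src = Z[t];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(64) addr = base + (UInt(offset) + e) * mbytes;
        Mem[addr, mbytes, AccType_SVESTREAM] = Elem[src, e, esize];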


STR (predicate)

Store predicate register

Store a predicate register to a memory address generated by a 64-bit scalar base, plus an immediate offset in the
range -256 to 255 which is multiplied by the current predicate register size in bytes. This instruction is unpredicated.
The store is performed as contiguous byte accesses, each containing 8 consecutive predicate bits in ascending element
order, with no endian conversion and no guarantee of single-copy atomicity larger than a byte. However, if alignment
is checked, then a general-purpose base register must be aligned to 2 bytes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 imm9h 0 0 0 imm9l Rn 0 Pt

STR <Pt>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Pt);
integer n = UInt(Rn);
integer imm = SInt(imm9h:imm9l);

Assembler Symbols

<Pt> Is the name of the scalable predicate transfer register, encoded in the "Pt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -256 to 255, defaulting to 0, encoded in the
"imm9h:imm9l" fields.

Operation

CheckSVEEnabled();
integer elements = PL DIV 8;
bits(PL) src;
bits(64) base;
integer offset = imm * elements;

if n == 31 then
    CheckSPAlignment();
    if HaveMTEExt() then SetTagCheckedInstruction(FALSE);
    base = SP[];
else
    if HaveMTEExt() then SetTagCheckedInstruction(TRUE);
    base = X[n];

src = P[t];
boolean aligned = AArch64.CheckAlignment(base + offset, 2, AccType_SVE, TRUE);
for e = 0 to elements-1
    AArch64.MemSingle[base + offset, 1, AccType_SVE, aligned] = Elem[src, e, 8];
    offset = offset + 1;
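
The byte packing described above, with 8 consecutive predicate bits per byte in ascending order, can be sketched in Python as follows. This is an informal model, not architectural pseudocode: the function name `str_predicate_writes` and the representation of the predicate as a list of 0/1 integers are assumptions, and alignment checking and tag checking are not modelled.

```python
def str_predicate_writes(pred_bits, base, imm):
    """Return (address, byte) pairs for storing a predicate register.

    pred_bits holds PL bits as 0/1 integers, lowest-numbered bit first.
    """
    if len(pred_bits) % 8 != 0:
        raise ValueError("PL must be a multiple of 8 bits")
    if not -256 <= imm <= 255:
        raise ValueError("imm must be in the range -256 to 255")
    nbytes = len(pred_bits) // 8          # "elements" in the pseudocode
    offset = imm * nbytes                 # scaled by the predicate size in bytes
    writes = []
    for e in range(nbytes):
        chunk = pred_bits[8 * e:8 * e + 8]
        byte = sum(bit << i for i, bit in enumerate(chunk))
        writes.append((base + offset + e, byte))
    return writes


# PL = 16 bits (VL = 128 bits); an immediate of 2 skips two predicate-sized slots.
bits = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
print(str_predicate_writes(bits, 0x100, 2))
```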


STR (vector)

Store vector register

Store a vector register to a memory address generated by a 64-bit scalar base, plus an immediate offset in the range
-256 to 255 which is multiplied by the current vector register size in bytes. This instruction is unpredicated.
The store is performed as contiguous byte accesses, with no endian conversion and no guarantee of single-copy
atomicity larger than a byte. However, if alignment is checked, then the base register must be aligned to 16 bytes.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1 1 1 0 0 1 0 1 1 0 imm9h 0 1 0 imm9l Rn Zt

STR <Zt>, [<Xn|SP>{, #<imm>, MUL VL}]

if !HaveSVE() then UNDEFINED;

integer t = UInt(Zt);
integer n = UInt(Rn);
integer imm = SInt(imm9h:imm9l);

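Assembler Symbols

<Zt> Is the name of the scalable vector register to be transferred, encoded in the "Zt" field.
<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.
<imm> Is the optional signed immediate vector offset, in the range -256 to 255, defaulting to 0, encoded in the
"imm9h:imm9l" fields.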

Operation

CheckSVEEnabled();
integer elements = VL DIV 8;
bits(VL) src;
bits(64) base;
integer offset = imm * elements;

if n == 31 then
    CheckSPAlignment();
    if HaveMTEExt() then SetTagCheckedInstruction(FALSE);
    base = SP[];
else
    if HaveMTEExt() then SetTagCheckedInstruction(TRUE);
    base = X[n];

src = Z[t];
boolean aligned = AArch64.CheckAlignment(base + offset, 16, AccType_SVE, TRUE);
for e = 0 to elements-1
    AArch64.MemSingle[base + offset, 1, AccType_SVE, aligned] = Elem[src, e, 8];
    offset = offset + 1;
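
The description states the alignment requirement in terms of the base register, whereas the pseudocode checks base + offset. The two agree because the offset is a multiple of the vector size in bytes, which is itself a multiple of 16 when the vector length is a multiple of 128 bits. The Python sketch below illustrates this under that assumption; the function name `str_vector_target` is hypothetical and the sketch does not model the store itself.

```python
def str_vector_target(base, imm, vl_bits):
    """Return (first byte address, 16-byte aligned?) for STR (vector)."""
    if vl_bits % 128 != 0:
        raise ValueError("this model assumes VL is a multiple of 128 bits")
    if not -256 <= imm <= 255:
        raise ValueError("imm must be in the range -256 to 255")
    nbytes = vl_bits // 8                 # "elements" in the pseudocode
    addr = (base + imm * nbytes) % 2**64  # offset = imm * vector size in bytes
    return addr, addr % 16 == 0


print(str_vector_target(0x1000, -1, 256))
print(str_vector_target(0x1008, 3, 128))
```

In this model the alignment result depends only on the base address, whatever immediate is used.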


SUB (immediate)

Subtract immediate (unpredicated)

Subtract an unsigned immediate from each element of the source vector, and destructively place the results in the
corresponding elements of the source vector. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value, unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 0 1 1 1 sh imm8 Zdn

SUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

if !HaveSVE() then UNDEFINED;

if size:sh == '001' then UNDEFINED;
integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer imm = UInt(imm8);
if sh == '1' then imm = imm << 8;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    Elem[result, e, esize] = element1 - imm;

Z[dn] = result;
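
The combined effect of the decode and operation pseudocode can be sketched in Python as shown below. This is an informal model under stated assumptions: the function name `sve_sub_imm` is hypothetical, the vector is a list of unsigned element values, and the UNDEFINED encoding (byte elements with the shift option) is reported as a `ValueError`.

```python
def sve_sub_imm(zdn, esize, imm8, shift=False):
    """Subtract an 8-bit immediate, optionally shifted left by 8, from each element."""
    if esize not in (8, 16, 32, 64):
        raise ValueError("esize must be 8, 16, 32 or 64")
    if not 0 <= imm8 <= 255:
        raise ValueError("imm8 must be in the range 0 to 255")
    if esize == 8 and shift:
        raise ValueError("LSL #8 is not available for byte elements")
    imm = imm8 << 8 if shift else imm8
    mask = (1 << esize) - 1
    # The subtraction wraps modulo 2**esize, as for bits(esize) arithmetic.
    return [(x - imm) & mask for x in zdn]


print(sve_sub_imm([0, 1], 8, 1))                    # [255, 0]
print(sve_sub_imm([0, 5, 300], 16, 1, shift=True))  # [65280, 65285, 44]
```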

Operational information


SUB (vectors, predicated)

Subtract vectors (predicated)

Subtract active elements of the second source vector from corresponding elements of the first source vector and
destructively place the results in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 1 0 0 0 Pg Zm Zdn

SUB <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element1 - element2;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
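
The merging behaviour of the governing predicate can be illustrated with a short Python sketch. It is a model only: the function name `sve_sub_predicated` is hypothetical, vectors are lists of unsigned element values, and the predicate is one boolean per element.

```python
def sve_sub_predicated(zdn, zm, pg, esize):
    """Subtract zm from zdn in active elements; inactive elements keep zdn."""
    mask = (1 << esize) - 1
    return [(a - b) & mask if active else a
            for a, b, active in zip(zdn, zm, pg)]


print(sve_sub_predicated([10, 20, 30], [1, 2, 3], [True, False, True], 8))
# [9, 20, 27]
```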

Operational information


SUB (vectors, unpredicated)

Subtract vectors (unpredicated)

Subtract all elements of the second source vector from corresponding elements of the first source vector and place the
results in the corresponding elements of the destination vector. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 0 0 1 Zn Zd

SUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    Elem[result, e, esize] = element1 - element2;

Z[d] = result;


SUBR (immediate)

Reversed subtract from immediate (unpredicated)

Subtract each element of the source vector from an unsigned immediate, and destructively place the results
in the corresponding elements of the source vector. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value, unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 0 1 1 1 1 sh imm8 Zdn

SUBR <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

if !HaveSVE() then UNDEFINED;

if size:sh == '001' then UNDEFINED;
integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer imm = UInt(imm8);
if sh == '1' then imm = imm << 8;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = UInt(Elem[operand1, e, esize]);
    Elem[result, e, esize] = (imm - element1)<esize-1:0>;

Z[dn] = result;
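
The reversed operand order is the only difference from an ordinary immediate subtract: the element is subtracted from the immediate and the result is truncated to the element size. A minimal Python sketch, assuming a hypothetical function name `sve_subr_imm` and a list of unsigned element values:

```python
def sve_subr_imm(zdn, esize, imm8, shift=False):
    """Compute imm - element for each element, truncated to esize bits."""
    if not 0 <= imm8 <= 255:
        raise ValueError("imm8 must be in the range 0 to 255")
    if esize == 8 and shift:
        raise ValueError("LSL #8 is not available for byte elements")
    imm = imm8 << 8 if shift else imm8
    mask = (1 << esize) - 1
    return [(imm - x) & mask for x in zdn]


print(sve_subr_imm([1, 10], 8, 5))  # [4, 251]
```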

Operational information


SUBR (vectors)

Reversed subtract vectors (predicated)

Subtract active elements of the first source vector from corresponding elements of the second source vector
and destructively place the results in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 1 1 0 0 0 Pg Zm Zdn

SUBR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    bits(esize) element1 = Elem[operand1, e, esize];
    bits(esize) element2 = Elem[operand2, e, esize];
    if ElemP[mask, e, esize] == '1' then
        Elem[result, e, esize] = element2 - element1;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information


SUDOT

Signed by unsigned integer indexed dot product

The signed by unsigned integer indexed dot product instruction computes the dot product of a group of four signed
8-bit integer values held in each 32-bit element of the first source vector multiplied by a group of four unsigned 8-bit
integer values in an indexed 32-bit element of the second source vector, and then destructively adds the widened dot
product to the corresponding 32-bit element of the destination vector.
The groups within the second source vector are specified using an immediate index which selects the same group
position within each 128-bit vector segment. The index range is from 0 to 3. This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.

SVE
(FEAT_I8MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 1 1 1 Zn Zda
size<1>size<0> U

SUDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]

if !HaveSVE() || !HaveInt8MatMulExt() then UNDEFINED;

integer esize = 32;
integer index = UInt(i2);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit vector segment, in the
range 0 to 3, encoded in the "i2" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = SInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = UInt(Elem[operand2, 4 * s + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;

Z[da] = result;
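
The indexed selection within each 128-bit segment, and the mixed signedness of the two sources, can be followed in the Python sketch below. It is an informal model and not architectural pseudocode: the function name `sudot_indexed` is hypothetical, `zda` is a list of unsigned 32-bit accumulator values, and `zn` and `zm` are lists of byte values in the range 0 to 255 with four bytes per accumulator.

```python
def sudot_indexed(zda, zn, zm, index):
    """Signed-by-unsigned indexed dot product accumulated into 32-bit elements."""
    if not 0 <= index <= 3:
        raise ValueError("index must be in the range 0 to 3")
    elts_per_segment = 128 // 32
    result = []
    for e, acc in enumerate(zda):
        # Select the indexed 32-bit group inside the segment that contains e.
        s = e - (e % elts_per_segment) + index
        for i in range(4):
            a = zn[4 * e + i]
            signed_a = a - 256 if a >= 128 else a   # SInt of an 8-bit value
            unsigned_b = zm[4 * s + i]              # UInt of an 8-bit value
            acc += signed_a * unsigned_b
        result.append(acc & 0xFFFFFFFF)             # wrap to 32 bits
    return result


zn = [1, 2, 3, 4, 0xFF, 0, 0, 0] + [0] * 8
zm = [9, 9, 9, 9, 1, 2, 3, 200] + [0] * 8
print(sudot_indexed([10, 0, 0, 0], zn, zm, 1))  # [824, 4294967295, 0, 0]
```

In the example, the byte 0xFF in the first source is treated as -1, while the byte 200 in the second source keeps its unsigned value.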

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.


SUNPKHI, SUNPKLO

Signed unpack and extend half of vector

Unpack elements from the lowest or highest half of the source vector and then sign-extend them to place in elements
of twice their size within the destination vector. This instruction is unpredicated.
It has encodings from 2 classes: High half and Low half

High half

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 1 0 0 1 1 1 0 Zn Zd
U H

SUNPKHI <Zd>.<T>, <Zn>.<Tb>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean unsigned = FALSE;
boolean hi = TRUE;

Low half

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 0 0 0 1 1 1 0 Zn Zd
U H

SUNPKLO <Zd>.<T>, <Zn>.<Tb>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean unsigned = FALSE;
boolean hi = FALSE;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

<Tb> Is the size specifier, encoded in “size”:

size <Tb>
00 RESERVED
01 B
10 H
11 S

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer hsize = esize DIV 2;
bits(VL) operand = Z[n];
bits(VL) result;

for e = 0 to elements-1
    bits(hsize) element = if hi then Elem[operand, e + elements, hsize] else Elem[operand, e, hsize];
    Elem[result, e, esize] = Extend(element, esize, unsigned);

Z[d] = result;
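
A small Python model may help to show which half of the source is read and how each element is widened. The function name `sunpk` and the list-based vector are assumptions made for illustration; element values are held as unsigned integers of the stated width.

```python
def sunpk(zn, hsize, hi):
    """Sign-extend the high or low half of zn to elements of twice the size."""
    half = len(zn) // 2                  # "elements" in the pseudocode
    source = zn[half:] if hi else zn[:half]
    esize = 2 * hsize
    wide_mask = (1 << esize) - 1
    sign_bit = 1 << (hsize - 1)
    # (v ^ sign_bit) - sign_bit sign-extends an hsize-bit value.
    return [((v ^ sign_bit) - sign_bit) & wide_mask for v in source]


src = [0x01, 0x80, 0x7F, 0xFF]
print([hex(v) for v in sunpk(src, 8, hi=False)])  # ['0x1', '0xff80']
print([hex(v) for v in sunpk(src, 8, hi=True)])   # ['0x7f', '0xffff']
```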


SXTB, SXTH, SXTW

Signed byte / halfword / word extend (predicated)

Sign-extend the least-significant sub-element of each active element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
It has encodings from 3 classes: Byte, Halfword and Word

Byte

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 0 1 0 1 Pg Zn Zd
U

SXTB <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer s_esize = 8;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean unsigned = FALSE;

Halfword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 0 1 0 1 Pg Zn Zd
U

SXTH <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size != '1x' then UNDEFINED;
integer esize = 8 << UInt(size);
integer s_esize = 16;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean unsigned = FALSE;

Word

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 0 1 0 1 Pg Zn Zd
U

SXTW <Zd>.D, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer s_esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean unsigned = FALSE;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> For the byte variant: is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

For the halfword variant: is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = Extend(element<s_esize-1:0>, esize, unsigned);

Z[d] = result;
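
The three variants differ only in the width of the sub-element that is extended. The Python sketch below models this with a single function; the name `sve_sxt`, the list-based vectors and the boolean predicate are assumptions made for illustration, with `s_esize` set to 8, 16 or 32 for SXTB, SXTH and SXTW respectively.

```python
def sve_sxt(zd, zn, pg, esize, s_esize):
    """Sign-extend the low s_esize bits of each active element of zn."""
    if s_esize >= esize:
        raise ValueError("the sub-element must be narrower than the element")
    sub_mask = (1 << s_esize) - 1
    sign_bit = 1 << (s_esize - 1)
    wide_mask = (1 << esize) - 1
    result = list(zd)                    # inactive elements keep the old Zd value
    for e, active in enumerate(pg):
        if active:
            low = zn[e] & sub_mask
            result[e] = ((low ^ sign_bit) - sign_bit) & wide_mask
    return result


# SXTB on halfword elements: only the low byte of each active element is used.
out = sve_sxt([0, 0, 0], [0x1280, 0x007F, 0xFFFF], [True, True, False], 16, 8)
print([hex(v) for v in out])  # ['0xff80', '0x7f', '0x0']
```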

Operational information


TBL

Programmable table lookup in single vector table

Reads each element of the second source (index) vector and uses its value to select an indexed element from the first
source (table) vector, and places the indexed table element in the destination vector element corresponding to the
index vector element. If an index value is greater than or equal to the number of vector elements then it places zero in
the corresponding destination vector element.
Since the index values can select any element in a vector this operation is not naturally vector length agnostic.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 0 1 1 0 0 Zn Zd

TBL <Zd>.<T>, { <Zn>.<T> }, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

for e = 0 to elements-1
    integer idx = UInt(Elem[operand2, e, esize]);
    Elem[result, e, esize] = if idx < elements then Elem[operand1, idx, esize] else Zeros();

Z[d] = result;
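
As an illustration of the lookup rule, the following Python sketch models the loop above. It is not part of the specification: the function name `tbl` and the list-based register model are assumptions, and element values are taken to be already within the element size.

```python
def tbl(table, indices):
    """Model of the TBL loop: an out-of-range index selects zero.

    table:   elements of the first source (table) vector.
    indices: elements of the second source (index) vector, as unsigned ints.
    """
    elements = len(table)
    return [table[idx] if idx < elements else 0 for idx in indices]


# Four-element vectors; the index 4 is out of range and so yields zero.
print(tbl([10, 11, 12, 13], [3, 0, 4, 1]))
```

Because `elements` depends on the vector length, the same index vector can give different results on implementations with different vector lengths, which is the sense in which the operation is not vector length agnostic.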

TRN1, TRN2 (predicates)

Interleave even or odd elements from two predicates

Interleave alternating even or odd-numbered elements from the first and second source predicates and place in
elements of the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: Even and Odd

Even

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 1 0 0 0 Pn 0 Pd
H

TRN1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
integer part = 0;

Odd

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 1 0 1 0 Pn 0 Pd
H

TRN2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
integer part = 1;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.

Operation

CheckSVEEnabled();
integer pairs = VL DIV (esize * 2);
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

for p = 0 to pairs-1
    Elem[result, 2*p+0, esize DIV 8] = Elem[operand1, 2*p+part, esize DIV 8];
    Elem[result, 2*p+1, esize DIV 8] = Elem[operand2, 2*p+part, esize DIV 8];

P[d] = result;

TRN1, TRN2 (vectors)

Interleave even or odd elements from two vectors

Interleave alternating even or odd-numbered elements from the first and second source vectors and place in elements
of the destination vector. This instruction is unpredicated. The 128-bit element variant of this instruction requires that
the current vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then
the trailing bits are set to zero.
ID_AA64ZFR0_EL1.F64MM indicates whether the 128-bit element variant of the instruction is implemented.
It has encodings from 4 classes: Even, Even (quadwords), Odd and Odd (quadwords)

Even

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 1 0 0 Zn Zd
H

TRN1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 0;

Even (quadwords)
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 1 1 0 Zn Zd
H

TRN1 <Zd>.Q, <Zn>.Q, <Zm>.Q

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer esize = 128;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 0;

Odd

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 1 0 1 Zn Zd
H

TRN2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 1;

Odd (quadwords)
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 1 1 1 Zn Zd
H

TRN2 <Zd>.Q, <Zn>.Q, <Zm>.Q

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer esize = 128;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 1;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
if VL < esize * 2 then UNDEFINED;
integer pairs = VL DIV (esize * 2);
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result = Zeros();

for p = 0 to pairs-1
    Elem[result, 2*p+0, esize] = Elem[operand1, 2*p+part, esize];
    Elem[result, 2*p+1, esize] = Elem[operand2, 2*p+part, esize];

Z[d] = result;
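
The interleaving performed by the loop above can be sketched in Python as follows. This is an illustrative model only: the function name `trn` and the representation of a vector as a list of elements are assumptions. `part` is 0 for TRN1 and 1 for TRN2, as in the decode pseudocode.

```python
def trn(op1, op2, part):
    """Model of TRN1 (part=0) and TRN2 (part=1) on element lists.

    A list of odd length models the quadword case in which the vector
    length is not a multiple of 256 bits: the trailing element stays zero.
    """
    pairs = len(op1) // 2
    result = [0] * len(op1)               # result = Zeros()
    for p in range(pairs):
        result[2 * p] = op1[2 * p + part]
        result[2 * p + 1] = op2[2 * p + part]
    return result


a = [0, 1, 2, 3]
b = [10, 11, 12, 13]
print(trn(a, b, 0))   # even-numbered elements of a and b, interleaved
print(trn(a, b, 1))   # odd-numbered elements of a and b, interleaved
```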

UABD

Unsigned absolute difference (predicated)

Compute the absolute difference between unsigned integer values in active elements of the second source vector and
corresponding elements of the first source vector and destructively place the difference in the corresponding elements
of the first source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 1 0 1 0 0 0 Pg Zm Zdn
U

UABD <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
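bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer absdiff = Abs(element1 - element2);
        Elem[result, e, esize] = absdiff<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
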
Z[dn] = result;
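
The behaviour described for UABD, an absolute difference in active elements with inactive elements left unchanged, can be modelled with the following Python sketch. It is an illustration under stated assumptions, not the architectural definition: the function name `uabd`, the list-based register model and the boolean predicate list are all hypothetical.

```python
def uabd(zdn, zm, pg, esize=8):
    """Predicated unsigned absolute difference with merging.

    zdn, zm: lists of element values (treated as unsigned esize-bit integers).
    pg:      list of booleans, one per element (True = active).
    """
    elem_mask = (1 << esize) - 1
    result = []
    for first, second, active in zip(zdn, zm, pg):
        if active:
            result.append(abs((first & elem_mask) - (second & elem_mask)))
        else:
            result.append(first)          # inactive: destination unchanged
    return result


print(uabd([5, 200, 7], [9, 100, 1], [True, True, False]))
```

Because both inputs are unsigned and of the same width, the absolute difference always fits in the element, so no truncation step is needed in the model.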

Operational information

UADDV

Unsigned add reduction to scalar

Unsigned add horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination register.
Narrow elements are first zero-extended to 64 bits. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 0 1 0 0 1 Pg Zn Vd
U

UADDV <Dd>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);

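Assembler Symbols

<Dd> Is the 64-bit name of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.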

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
integer sum = 0;

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = UInt(Elem[operand, e, esize]);
        sum = sum + element;

V[d] = sum<63:0>;
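
A small Python model makes the widening behaviour concrete. This is a sketch, not the architectural definition: the function name `uaddv` and the list-based inputs are assumptions. The sum is accumulated as an unbounded integer and only the low 64 bits are returned, matching `sum<63:0>` above.

```python
def uaddv(zn, pg, esize):
    """Unsigned add reduction over active elements, truncated to 64 bits.

    zn: list of element values; pg: list of booleans (True = active).
    """
    elem_mask = (1 << esize) - 1
    total = 0
    for element, active in zip(zn, pg):
        if active:                        # inactive elements contribute zero
            total += element & elem_mask
    return total & 0xFFFFFFFFFFFFFFFF     # sum<63:0>


# Two active byte elements of 255 sum to 510, which does not fit in a byte.
print(uaddv([255, 255, 1], [True, True, False], 8))
```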

UCVTF

Unsigned integer convert to floating-point (predicated)

Convert to floating-point from the unsigned integer in each active element of the source vector, and place the results
in the corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
If the input and result types have a different size the smaller type is held unpacked in the least significant bits of
elements of the larger size. When the input is the smaller type the upper bits of each source element are ignored.
When the result is the smaller type the results are zero-extended to fill each destination element.
It has encodings from 7 classes: 16-bit to half-precision, 32-bit to half-precision, 32-bit to single-precision, 32-bit to
double-precision, 64-bit to half-precision, 64-bit to single-precision and 64-bit to double-precision

16-bit to half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 1 Pg Zn Zd
int_U

UCVTF <Zd>.H, <Pg>/M, <Zn>.H

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 16;
integer d_esize = 16;
boolean unsigned = TRUE;
FPRounding rounding = FPRoundingMode(FPCR[]);

32-bit to half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 0 1 1 0 1 Pg Zn Zd
int_U

UCVTF <Zd>.H, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 16;
boolean unsigned = TRUE;
FPRounding rounding = FPRoundingMode(FPCR[]);

32-bit to single-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 0 0 1 0 1 0 1 1 0 1 Pg Zn Zd
int_U

UCVTF <Zd>.S, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 32;
boolean unsigned = TRUE;
FPRounding rounding = FPRoundingMode(FPCR[]);

32-bit to double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 0 0 1 1 0 1 Pg Zn Zd
int_U

UCVTF <Zd>.D, <Pg>/M, <Zn>.S

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 32;
integer d_esize = 64;
boolean unsigned = TRUE;
FPRounding rounding = FPRoundingMode(FPCR[]);

64-bit to half-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 0 1 0 1 0 1 1 1 1 0 1 Pg Zn Zd
int_U

UCVTF <Zd>.H, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 16;
boolean unsigned = TRUE;
FPRounding rounding = FPRoundingMode(FPCR[]);

64-bit to single-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 0 1 1 0 1 Pg Zn Zd
int_U

UCVTF <Zd>.S, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 32;
boolean unsigned = TRUE;
FPRounding rounding = FPRoundingMode(FPCR[]);

64-bit to double-precision

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 1 1 0 1 0 1 1 1 1 0 1 Pg Zn Zd
int_U

UCVTF <Zd>.D, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
integer s_esize = 64;
integer d_esize = 64;
boolean unsigned = TRUE;
FPRounding rounding = FPRoundingMode(FPCR[]);

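Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.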

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

Z[d] = result;
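
The unpacked layout described for UCVTF can be illustrated with the Python sketch below. It is an approximation under stated assumptions, not the architectural conversion: the function name `ucvtf` and the list-based register model are hypothetical, the rounding is whatever Python's `struct` module applies (round to nearest on typical hosts) rather than the mode selected by FPCR, and half-precision results that overflow raise `OverflowError` in `struct` instead of following the architected behaviour. The model is exact only when the source value is representable in the destination format.

```python
import struct

# struct format characters for IEEE half, single and double precision.
_FP_FORMAT = {16: "<e", 32: "<f", 64: "<d"}


def ucvtf(zn, zd, pg, s_esize, d_esize):
    """Convert the low s_esize bits of each active element to floating-point.

    Elements are max(s_esize, d_esize) bits wide. Upper source bits are
    ignored, and a narrower result is zero-extended to fill the element.
    """
    src_mask = (1 << s_esize) - 1
    result = []
    for src, old, active in zip(zn, zd, pg):
        if active:
            value = float(src & src_mask)             # upper bits ignored
            raw = struct.pack(_FP_FORMAT[d_esize], value)
            result.append(int.from_bytes(raw, "little"))  # zero-extended
        else:
            result.append(old)                        # inactive: unchanged
    return result


# 32-bit to double-precision: bits 63:32 of the source element are ignored.
print([hex(x) for x in ucvtf([0xDEADBEEF00000002], [0], [True], 32, 64)])
```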

Operational information

UDIV

Unsigned divide (predicated)

Unsigned divide active elements of the first source vector by corresponding elements of the second source vector and
destructively place the quotient in the corresponding elements of the first source vector. Inactive elements in the
destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 1 0 0 0 Pg Zm Zdn
R U

UDIV <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '0x' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
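bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer element2 = Int(Elem[operand2, e, esize], unsigned);
        integer quotient;
        if element2 == 0 then
            quotient = 0;
        else
            quotient = RoundTowardsZero(Real(element1) / Real(element2));
        Elem[result, e, esize] = quotient<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];
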
Z[dn] = result;

Operational information

UDIVR

Unsigned reversed divide (predicated)

Unsigned reversed divide active elements of the second source vector by corresponding elements of the first source
vector and destructively place the quotient in the corresponding elements of the first source vector. Inactive elements
in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 1 1 0 0 0 Pg Zm Zdn
R U

UDIVR <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '0x' then UNDEFINED;
integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer element2 = Int(Elem[operand2, e, esize], unsigned);
        integer quotient;
        if element1 == 0 then
            quotient = 0;
        else
            quotient = RoundTowardsZero(Real(element2) / Real(element1));
        Elem[result, e, esize] = quotient<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
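
The reversed operand order and the divide-by-zero rule are easy to misread, so the loop above is modelled here in Python. This is an illustrative sketch: the function name `udivr` and the list-based register model are assumptions. For unsigned operands, Python's floor division gives the same result as rounding towards zero.

```python
def udivr(zdn, zm, pg):
    """Predicated unsigned reversed divide: Zm element / Zdn element.

    zdn, zm: lists of unsigned element values.
    pg:      list of booleans, one per element (True = active).
    """
    result = []
    for element1, element2, active in zip(zdn, zm, pg):
        if not active:
            result.append(element1)       # inactive: destination unchanged
        elif element1 == 0:
            result.append(0)              # division by zero yields zero
        else:
            result.append(element2 // element1)
    return result


print(udivr([4, 0, 3], [20, 9, 100], [True, True, False]))
```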

Operational information

UDOT (indexed)

Unsigned integer indexed dot product

The unsigned integer indexed dot product instruction computes the dot product of a group of four unsigned 8-bit or
16-bit integer values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four
unsigned 8-bit or 16-bit integer values in an indexed 32-bit or 64-bit element of the second source vector, and then
destructively adds the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
The groups within the second source vector are specified using an immediate index which selects the same group
position within each 128-bit vector segment. The index range is from 0 to one less than the number of groups per
128-bit segment, encoded in 1 to 2 bits depending on the size of the group. This instruction is unpredicated.
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> U

UDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer index = UInt(i2);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 1 1 i1 Zm 0 0 0 0 0 1 Zn Zda
size<1>size<0> U

UDOT <Zda>.D, <Zn>.H, <Zm>.H[<imm>]

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer index = UInt(i1);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> For the 32-bit variant: is the name of the second source scalable vector register Z0-Z7, encoded in the
"Zm" field.
For the 64-bit variant: is the name of the second source scalable vector register Z0-Z15, encoded in the
"Zm" field.
<imm> For the 32-bit variant: is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit
vector segment, in the range 0 to 3, encoded in the "i2" field.
For the 64-bit variant: is the immediate index of a quadtuplet of four 16-bit elements within each 128-bit
vector segment, in the range 0 to 1, encoded in the "i1" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = UInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = UInt(Elem[operand2, 4 * s + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;

Z[da] = result;
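
The segment-relative indexing in the loop above can be followed more easily in a small Python model. This is a sketch under stated assumptions rather than the architectural definition: the function name `udot_indexed` is hypothetical, the accumulator vector is given as a list of `esize`-bit elements, and the two sources are given as lists of narrow (`esize DIV 4`-bit) unsigned elements.

```python
def udot_indexed(zda, zn, zm, index, esize=32):
    """Model of the indexed unsigned dot product.

    zda:    list of esize-bit accumulator elements.
    zn, zm: lists of unsigned narrow elements, four per accumulator element.
    index:  group position within each 128-bit segment of zm.
    """
    elts_per_segment = 128 // esize
    elem_mask = (1 << esize) - 1
    result = []
    for e, acc in enumerate(zda):
        segment_base = e - (e % elts_per_segment)
        s = segment_base + index          # selected group in this segment
        for i in range(4):
            acc += zn[4 * e + i] * zm[4 * s + i]
        result.append(acc & elem_mask)    # wrap to the element size
    return result


# One 128-bit segment: 16 byte elements, group 1 of zm holds four 2s.
zn = list(range(16))
zm = [1] * 4 + [2] * 4 + [0] * 8
print(udot_indexed([0, 0, 0, 0], zn, zm, index=1))
```

Every accumulator element in a segment is combined with the same group of the second source, which is what distinguishes this form from the vectors form of the instruction.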

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UDOT (vectors)

Unsigned integer dot product

The unsigned integer dot product instruction computes the dot product of a group of four unsigned 8-bit or 16-bit
integer values held in each 32-bit or 64-bit element of the first source vector multiplied by a group of four unsigned
8-bit or 16-bit integer values in the corresponding 32-bit or 64-bit element of the second source vector, and then
destructively adds the widened dot product to the corresponding 32-bit or 64-bit element of the destination vector.
This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 0 0 0 0 1 Zn Zda
U

UDOT <Zda>.<T>, <Zn>.<Tb>, <Zm>.<Tb>

if !HaveSVE() then UNDEFINED;

if size == '0x' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.

<T> Is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.

<Tb> Is the size specifier, encoded in “size<0>”:

size<0> <Tb>
0 B
1 H

<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = UInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = UInt(Elem[operand2, 4 * e + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;

Z[da] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UMAX (immediate)

Unsigned maximum with immediate (unpredicated)

Determine the unsigned maximum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is an unsigned 8-bit value in the range 0 to
255, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 1 0 imm8 Zdn
U

UMAX <Zdn>.<T>, <Zdn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
boolean unsigned = TRUE;
integer imm = Int(imm8, unsigned);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is the unsigned immediate operand, in the range 0 to 255, encoded in the "imm8" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    Elem[result, e, esize] = Max(element1, imm)<esize-1:0>;

Z[dn] = result;
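
In effect the instruction clamps every element from below by the immediate. The following Python sketch shows this; it is illustrative only, and the function name `umax_imm` and the list-based register model are assumptions.

```python
def umax_imm(zdn, imm):
    """Unpredicated unsigned maximum of each element and an 8-bit immediate.

    zdn: list of unsigned element values.
    imm: immediate in the range 0 to 255.
    """
    if not 0 <= imm <= 255:
        raise ValueError("imm must be in the range 0 to 255")
    return [max(element, imm) for element in zdn]


# Elements below 128 are raised to 128; larger elements are unchanged.
print(umax_imm([0, 100, 300], 128))
```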

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UMAX (vectors)

Unsigned maximum vectors (predicated)

Determine the unsigned maximum of active elements of the second source vector and corresponding elements of the
first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 1 0 0 0 Pg Zm Zdn
U

UMAX <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer maximum = Max(element1, element2);
        Elem[result, e, esize] = maximum<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

UMAXV

Unsigned maximum reduction to scalar

Unsigned maximum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 0 1 0 0 1 Pg Zn Vd
U

UMAXV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);
boolean unsigned = TRUE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

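<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

<T> Is the size specifier, encoded in “size”: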

size <T>
00 B
01 H
10 S
11 D

Operation

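CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
integer maximum = if unsigned then 0 else -(2^(esize-1));

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = Int(Elem[operand, e, esize], unsigned);
        maximum = Max(maximum, element);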

V[d] = maximum<esize-1:0>;

UMIN (immediate)

Unsigned minimum with immediate (unpredicated)

Determine the unsigned minimum of an immediate and each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. The immediate is an unsigned 8-bit value in the range 0 to
255, inclusive. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 1 0 imm8 Zdn
U

UMIN <Zdn>.<T>, <Zdn>.<T>, #<imm>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
boolean unsigned = TRUE;
integer imm = Int(imm8, unsigned);

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is the unsigned immediate operand, in the range 0 to 255, encoded in the "imm8" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    Elem[result, e, esize] = Min(element1, imm)<esize-1:0>;

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UMIN (vectors)

Unsigned minimum vectors (predicated)

Determine the unsigned minimum of active elements of the second source vector and corresponding elements of the
first source vector and destructively place the results in the corresponding elements of the first source vector. Inactive
elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 1 1 0 0 0 Pg Zm Zdn
U

UMIN <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer minimum = Min(element1, element2);
        Elem[result, e, esize] = minimum<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;

Operational information

UMINV

Unsigned minimum reduction to scalar

Unsigned minimum horizontally across all lanes of a vector, and place the result in the SIMD&FP scalar destination
register. Inactive elements in the source vector are treated as the maximum unsigned integer for the element size.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 1 1 0 0 1 Pg Zn Vd
U

UMINV <V><d>, <Pg>, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Vd);
boolean unsigned = TRUE;

Assembler Symbols

<V> Is a width specifier, encoded in “size”:

size <V>
00 B
01 H
10 S
11 D

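<d> Is the number [0-31] of the destination SIMD&FP register, encoded in the "Vd" field.
<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

<T> Is the size specifier, encoded in “size”: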

size <T>
00 B
01 H
10 S
11 D

Operation

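CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
integer minimum = if unsigned then (2^esize - 1) else (2^(esize-1) - 1);

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        integer element = Int(Elem[operand, e, esize], unsigned);
        minimum = Min(minimum, element);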

V[d] = minimum<esize-1:0>;
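
The treatment of inactive elements determines the result when no element is active. The Python sketch below models this; it is not the architectural definition, and the function name `uminv` and the list-based inputs are assumptions. The starting value is the maximum unsigned integer for the element size, as stated in the description.

```python
def uminv(zn, pg, esize):
    """Unsigned minimum reduction over active elements.

    zn: list of unsigned element values; pg: list of booleans (True = active).
    Inactive elements behave as the largest esize-bit unsigned value.
    """
    minimum = (1 << esize) - 1
    for element, active in zip(zn, pg):
        if active:
            minimum = min(minimum, element)
    return minimum


print(uminv([7, 3, 9], [True, False, True], 8))    # the inactive 3 is ignored
print(uminv([7, 3], [False, False], 8))            # no active elements
```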

UMMLA

Unsigned integer matrix multiply-accumulate

The unsigned integer matrix multiply-accumulate instruction multiplies the 2×8 matrix of unsigned 8-bit integer
values held in each 128-bit segment of the first source vector by the 8×2 matrix of unsigned 8-bit integer values in the
corresponding segment of the second source vector. The resulting 2×2 widened 32-bit integer matrix product is then
destructively added to the 32-bit integer matrix accumulator held in the corresponding segment of the addend and
destination vector. This is equivalent to performing an 8-way dot product per destination element.
This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.

SVE
(FEAT_I8MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 1 1 0 Zm 1 0 0 1 1 0 Zn Zda
uns<1>uns<0>

UMMLA <Zda>.S, <Zn>.B, <Zm>.B

if !HaveSVE() || !HaveInt8MatMulExt() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_unsigned = TRUE;
boolean op2_unsigned = TRUE;

Assembler Symbols

Operation

CheckSVEEnabled();
integer segments = VL DIV 128;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result = Zeros();
bits(128) op1, op2;
bits(128) res, addend;

Z[da] = result;
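
The per-segment arithmetic described at the start of this section can be sketched in Python. The function name `ummla_segment` is hypothetical, and the byte layout is an assumption for illustration: each 128-bit source segment is modeled as two rows of eight bytes, with the second operand's 8×2 matrix held as its transpose, so every destination element is an 8-way dot product of one row from each source.

```python
def ummla_segment(acc, op1, op2):
    """Model one 128-bit segment of UMMLA.

    acc: four 32-bit accumulators, a 2x2 matrix in row-major order.
    op1, op2: sixteen unsigned bytes each, modeled as two rows of eight
    (assumed layout; op2 holds the 8x2 matrix transposed).
    """
    out = []
    for i in range(2):
        for j in range(2):
            dot = sum(op1[8 * i + k] * op2[8 * j + k] for k in range(8))
            out.append((acc[2 * i + j] + dot) & 0xFFFFFFFF)  # keep the low 32 bits
    return out


op1 = list(range(1, 17))   # rows: 1..8 and 9..16
op2 = [1] * 8 + [2] * 8    # rows: all ones and all twos
print(ummla_segment([10, 20, 30, 40], op1, op2))  # [46, 92, 130, 240]
```

The accumulation wraps modulo 2^32 in this sketch because only the low 32 bits of each sum are kept.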

Operational information

• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UMULH

Unsigned multiply returning high half (predicated)

Widening multiply unsigned integer values in active elements of the first source vector by corresponding elements of
the second source vector and destructively place the high half of the result in the corresponding elements of the first
source vector. Inactive elements in the destination vector register remain unmodified.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 1 0 0 0 Pg Zm Zdn
H U

UMULH <Zdn>.<T>, <Pg>/M, <Zdn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer g = UInt(Pg);
integer dn = UInt(Zdn);
integer m = UInt(Zm);
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the first source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand1 = Z[dn];
bits(VL) operand2 = if AnyActiveElement(mask, esize) then Z[m] else Zeros();
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    if ElemP[mask, e, esize] == '1' then
        integer product = (element1 * element2) >> esize;
        Elem[result, e, esize] = product<esize-1:0>;
    else
        Elem[result, e, esize] = Elem[operand1, e, esize];

Z[dn] = result;
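
A short Python sketch of the high-half multiply may help when checking expected values by hand. The name `umulh` and the list-based register model are assumptions for illustration only.

```python
def umulh(zdn, zm, pg, esize):
    """Model UMULH: high esize bits of the 2*esize-bit unsigned product.

    Inactive lanes keep the original zdn value.
    """
    mask = (1 << esize) - 1
    return [((a * b) >> esize) & mask if active else a
            for a, b, active in zip(zdn, zm, pg)]


print(umulh([200, 16, 255], [200, 16, 255], [True, True, False], 8))  # [156, 1, 255]
```

For 8-bit elements, 200 × 200 = 40000 = 0x9C40, so the high byte is 0x9C (156); the third lane is inactive and is left unchanged.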

Operational information

UQADD (immediate)

Unsigned saturating add immediate (unpredicated)

Unsigned saturating add of an unsigned immediate to each element of the source vector, and destructively place the
results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
unsigned integer range 0 to (2^N)-1. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 0 1 1 1 sh imm8 Zdn
U

UQADD <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

if !HaveSVE() then UNDEFINED;

if size:sh == '001' then UNDEFINED;
integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer imm = UInt(imm8);
if sh == '1' then imm = imm << 8;
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + imm, esize, unsigned);

Z[dn] = result;
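
The immediate decoding and the saturation step can be sketched together in Python. The function name `uqadd_imm` is hypothetical, and raising `ValueError` is only a stand-in for the UNDEFINED case (byte elements combined with the shifted form).

```python
def uqadd_imm(zdn, imm8, sh, esize):
    """Model UQADD (immediate) on a list of unsigned lane values.

    imm8: 8-bit immediate; sh: 0 for LSL #0, 1 for LSL #8.
    """
    if esize == 8 and sh == 1:
        # size:sh == '001' is UNDEFINED in the decode pseudocode
        raise ValueError("shifted immediate is not available for byte elements")
    imm = imm8 << 8 if sh else imm8
    max_value = (1 << esize) - 1          # upper bound of the unsigned range
    return [min(x + imm, max_value) for x in zdn]


print(uqadd_imm([250, 10], 10, 0, 8))       # [255, 20]
print(uqadd_imm([0xFF00, 1], 1, 1, 16))     # [65535, 257]
```

Because both the element and the immediate are unsigned, the sum can only exceed the upper bound, so a single `min` is enough to model the saturation here.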

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UQADD (vectors)

Unsigned saturating add vectors (unpredicated)

Unsigned saturating add all elements of the second source vector to corresponding elements of the first source vector
and place the results in the corresponding elements of the destination vector. Each result element is saturated to the
N-bit element's unsigned integer range 0 to (2^N)-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 0 1 Zn Zd
U

UQADD <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
boolean unsigned = TRUE;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result;

Z[d] = result;

UQDECB

Unsigned saturating decrement scalar by multiple of 8-bit predicate constraint element count

Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U

UQDECB <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U

UQDECB <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 64;

Assembler Symbols

<Wdn> Is the 32-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.
<Xdn> Is the 64-bit name of the source and destination general-purpose register, encoded in the "Rdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);
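
The following Python sketch models the scalar form end to end, under stated assumptions. The names `pred_count` and `uqdec_scalar` are hypothetical; `pred_count` stands in for DecodePredCount and selects the constraint by its assembler name rather than by its 5-bit encoding, and a fixed count that exceeds the available elements is assumed to give zero, following the description of out of range constraints above.

```python
FIXED_COUNTS = {1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256}  # VL1 to VL256


def pred_count(pattern, vl_bits, esize):
    """Element count implied by a named predicate constraint."""
    elements = vl_bits // esize
    if pattern == "ALL":
        return elements
    if pattern == "POW2":
        return 1 << (elements.bit_length() - 1) if elements > 0 else 0
    if pattern == "MUL3":
        return elements - elements % 3
    if pattern == "MUL4":
        return elements - elements % 4
    if pattern.startswith("VL") and pattern[2:].isdigit():
        n = int(pattern[2:])
        if n in FIXED_COUNTS:
            return n if n <= elements else 0  # assumed: too few elements gives zero
    return 0  # unspecified constraint: zero element count


def uqdec_scalar(value, pattern, imm, vl_bits, esize, ssize):
    """Saturating decrement of an ssize-bit unsigned scalar (ssize is 32 or 64)."""
    operand = value & ((1 << ssize) - 1)   # the Wdn form reads only the low 32 bits
    return max(operand - pred_count(pattern, vl_bits, esize) * imm, 0)


# A 256-bit vector holds 32 byte elements.
print(uqdec_scalar(100, "ALL", 2, 256, 8, 64))  # 36
print(uqdec_scalar(10, "ALL", 1, 256, 8, 64))   # 0
```

The returned integer is non-negative and fits in `ssize` bits, so it already represents the zero-extended 64-bit value written back to the register.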

UQDECD (scalar)

Unsigned saturating decrement scalar by multiple of 64-bit predicate constraint element count

Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U

UQDECD <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U

UQDECD <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

UQDECD (vector)

Unsigned saturating decrement vector by multiple of 64-bit predicate constraint element count

Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 64-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 1 1 pattern Zdn
size<1>size<0> D U

UQDECD <Zdn>.D{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);

Z[dn] = result;
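
For the vector form, every element is decremented by the same amount and saturated independently. A minimal Python sketch, assuming the element count has already been derived from the predicate constraint; the name `uqdec_vector` and the list model are hypothetical:

```python
def uqdec_vector(zdn, count, imm):
    """Subtract count * imm from every unsigned lane, saturating at zero."""
    step = count * imm
    return [max(x - step, 0) for x in zdn]


# A 256-bit vector holds four 64-bit elements, so ALL gives count = 4.
print(uqdec_vector([100, 8, 3, 0], count=4, imm=2))  # [92, 0, 0, 0]
```

An unsigned decrement can only underflow, so the lower bound of zero is the only saturation point that matters in this sketch.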

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UQDECH (scalar)

Unsigned saturating decrement scalar by multiple of 16-bit predicate constraint element count

Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U

UQDECH <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U

UQDECH <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

UQDECH (vector)

Unsigned saturating decrement vector by multiple of 16-bit predicate constraint element count

Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 16-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 1 1 pattern Zdn
size<1>size<0> D U

UQDECH <Zdn>.H{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UQDECP (scalar)

Unsigned saturating decrement scalar by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to decrement the scalar
destination. The result is saturated to the general-purpose register's unsigned integer range.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 0 1 0 0 Pm Rdn
D U sf

UQDECP <Wdn>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Rdn);
boolean unsigned = TRUE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 0 1 1 0 Pm Rdn
D U sf

UQDECP <Xdn>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Rdn);
boolean unsigned = TRUE;
integer ssize = 64;

Assembler Symbols

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(ssize) operand1 = X[dn];
bits(PL) operand2 = P[m];
bits(ssize) result;
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

integer element = Int(operand1, unsigned);

(result, -) = SatQ(element - count, ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);
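
The counting loop and the saturating subtract can be sketched in Python. The names `count_true` and `uqdecp_scalar` are hypothetical. The sketch assumes the usual SVE predicate layout, in which a predicate register holds one bit per vector byte and the flag for element e is the lowest of the bits belonging to that element.

```python
def count_true(pred_bits, esize):
    """Count true elements in a predicate given as one 0/1 flag per vector byte."""
    stride = esize // 8   # predicate bits per element (assumed layout)
    return sum(pred_bits[i] for i in range(0, len(pred_bits), stride))


def uqdecp_scalar(value, pred_bits, esize, ssize):
    """Decrement an ssize-bit unsigned scalar by the true-element count, saturating at zero."""
    operand = value & ((1 << ssize) - 1)
    return max(operand - count_true(pred_bits, esize), 0)


# 128-bit vector: 16 predicate bits, viewed here as four 32-bit elements.
pred = [1, 0, 0, 0,  1, 1, 0, 0,  0, 0, 0, 0,  1, 0, 0, 0]
print(count_true(pred, 32))             # 3
print(uqdecp_scalar(10, pred, 32, 64))  # 7
print(uqdecp_scalar(2, pred, 32, 32))   # 0
```

The same predicate read with byte elements counts four true elements, which illustrates why the size specifier on the predicate operand affects the result.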

UQDECP (vector)

Unsigned saturating decrement vector by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to decrement all destination
vector elements. The results are saturated to the element unsigned integer range.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 1 1 1 0 0 0 0 0 0 Pm Zdn
D U

UQDECP <Zdn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Zdn);
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

for e = 0 to elements-1
    integer element = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element - count, esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UQDECW (scalar)

Unsigned saturating decrement scalar by multiple of 32-bit predicate constraint element count

Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U

UQDECW <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 1 1 pattern Rdn
size<1>size<0> sf D U

UQDECW <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 - (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

UQDECW (vector)

Unsigned saturating decrement vector by multiple of 32-bit predicate constraint element count

Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to decrement all destination vector elements. The
results are saturated to the 32-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 1 1 pattern Zdn
size<1>size<0> D U

UQDECW <Zdn>.S{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - (count * imm), esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UQINCB

Unsigned saturating increment scalar by multiple of 8-bit predicate constraint element count

Determines the number of active 8-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 0 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U

UQINCB <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 0 1 1 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U

UQINCB <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 8;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 + (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);
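
The difference between the 32-bit and 64-bit encodings shows up at the upper saturation bound. A minimal Python sketch, assuming the element count has already been derived from the predicate constraint; the name `uqinc_scalar` is hypothetical:

```python
def uqinc_scalar(value, count, imm, ssize):
    """Saturating increment of an ssize-bit unsigned scalar (ssize is 32 or 64)."""
    limit = (1 << ssize) - 1
    operand = value & limit            # the Wdn form reads only the low 32 bits
    return min(operand + count * imm, limit)


# A 256-bit vector holds 32 byte elements, so ALL gives count = 32.
print(hex(uqinc_scalar(0xFFFFFFF0, 32, 1, 32)))  # 0xffffffff
print(hex(uqinc_scalar(0xFFFFFFF0, 32, 1, 64)))  # 0x100000010
```

The same starting value saturates in the 32-bit form but not in the 64-bit form, and in both cases the value written to the register is zero-extended to 64 bits.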

UQINCD (scalar)

Unsigned saturating increment scalar by multiple of 64-bit predicate constraint element count

Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than
an Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit.

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U

UQINCD <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 1 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U

UQINCD <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 + (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);
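
The scalar operation above can be modelled in a few lines of Python. This is a minimal sketch, not the architectural definition: the names pred_count, uqinc_scalar, vl_bits and FIXED_COUNTS are hypothetical, constraints are passed by name rather than by their 5-bit encoding, and the rule that a fixed count the vector cannot hold yields zero is an assumption about DecodePredCount, which is not shown on this page.

```python
def floor_pow2(n: int) -> int:
    """Return the largest power of two that does not exceed n (0 if n < 1)."""
    return 0 if n < 1 else 1 << (n.bit_length() - 1)


# Fixed-count constraints VL1..VL256 mapped to the element count they request.
FIXED_COUNTS = {f"VL{n}": n for n in (1, 2, 3, 4, 5, 6, 7, 8, 16, 32, 64, 128, 256)}


def pred_count(pattern: str, vl_bits: int, esize: int) -> int:
    """Element count implied by a named constraint (hypothetical model)."""
    elements = vl_bits // esize
    if pattern == "ALL":
        return elements
    if pattern == "POW2":
        return floor_pow2(elements)
    if pattern == "MUL3":
        return elements - elements % 3
    if pattern == "MUL4":
        return elements - elements % 4
    if pattern in FIXED_COUNTS:
        wanted = FIXED_COUNTS[pattern]
        # Assumption: a fixed count that the vector cannot hold yields zero.
        return wanted if elements >= wanted else 0
    return 0  # unspecified constraint: zero element count


def uqinc_scalar(value, pattern="ALL", imm=1, *, vl_bits, esize, ssize):
    """Unsigned saturating increment of an ssize-bit scalar."""
    if not 1 <= imm <= 16:
        raise ValueError("imm must be in the range 1 to 16")
    limit = (1 << ssize) - 1
    operand = value & limit  # only the low ssize bits of the register are read
    total = operand + pred_count(pattern, vl_bits, esize) * imm
    return min(total, limit)  # saturate, then zero-extend to 64 bits


# 256-bit vectors hold four 64-bit elements, so ALL with MUL #2 adds 8.
print(uqinc_scalar(10, "ALL", 2, vl_bits=256, esize=64, ssize=64))  # prints 18
```

With ssize set to 32 the model saturates at 0xFFFFFFFF and the upper half of the 64-bit register reads as zero, which mirrors the Extend(result, 64, unsigned) step.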

UQINCD (vector)

Unsigned saturating increment vector by multiple of 64-bit predicate constraint element count

Determines the number of active 64-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 64-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 1 1 0 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D U

UQINCD <Zdn>.D{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 64;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UQINCH (scalar)

Unsigned saturating increment scalar by multiple of 16-bit predicate constraint element count

Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U

UQINCH <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 1 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U

UQINCH <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 + (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

UQINCH (vector)

Unsigned saturating increment vector by multiple of 16-bit predicate constraint element count

Determines the number of active 16-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 16-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 1 1 0 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D U

UQINCH <Zdn>.H{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 16;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);

Z[dn] = result;
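
The vector form applies the same increment to every element and saturates each one independently. The following Python sketch illustrates this under stated assumptions: the vector is modelled as a list of unsigned integers, the element count is passed in directly instead of being derived from a pattern, and the name uqinc_vector is hypothetical.

```python
def uqinc_vector(elements, count, imm=1, *, esize):
    """Add count * imm to every element, saturating at the esize-bit maximum."""
    if not 1 <= imm <= 16:
        raise ValueError("imm must be in the range 1 to 16")
    limit = (1 << esize) - 1
    step = count * imm
    # Each lane saturates on its own; no carry crosses a lane boundary.
    return [min((value & limit) + step, limit) for value in elements]


# A 128-bit vector holds eight 16-bit lanes, so the ALL constraint gives count = 8.
lanes = [0, 100, 65530, 65535, 1, 2, 3, 4]
print(uqinc_vector(lanes, count=8, esize=16))
# prints [8, 108, 65535, 65535, 9, 10, 11, 12]
```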

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UQINCP (scalar)

Unsigned saturating increment scalar by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to increment the scalar
destination. The result is saturated to the general-purpose register's unsigned integer range.
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 0 1 0 0 Pm Rdn
D U sf

UQINCP <Wdn>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Rdn);
boolean unsigned = TRUE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 0 1 1 0 Pm Rdn
D U sf

UQINCP <Xdn>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Rdn);
boolean unsigned = TRUE;
integer ssize = 64;

Assembler Symbols

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(ssize) operand1 = X[dn];
bits(PL) operand2 = P[m];
bits(ssize) result;
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

integer element = Int(operand1, unsigned);

(result, -) = SatQ(element + count, ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);
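
As an illustration of the counting loop above, the Python sketch below models the predicate as an integer bit mask. It assumes that a predicate holds one bit per vector byte and that ElemP reads the bit at position e * (esize DIV 8); that layout is not defined on this page. The names count_true and uqincp_scalar are hypothetical.

```python
def count_true(pred: int, vl_bits: int, esize: int) -> int:
    """Count true predicate elements of width esize in a predicate bit mask."""
    stride = esize // 8  # assumed: one predicate bit per vector byte
    elements = vl_bits // esize
    # Only the lowest bit of each element's group of predicate bits is examined.
    return sum((pred >> (e * stride)) & 1 for e in range(elements))


def uqincp_scalar(value: int, pred: int, *, vl_bits: int, esize: int, ssize: int) -> int:
    """Unsigned saturating increment of an ssize-bit scalar by the true count."""
    limit = (1 << ssize) - 1
    return min((value & limit) + count_true(pred, vl_bits, esize), limit)


# 128-bit vector, 32-bit elements: predicate bits 0, 4, 8 and 12 are significant.
pred = 0x1011  # elements 0, 1 and 3 are true
print(uqincp_scalar(5, pred, vl_bits=128, esize=32, ssize=64))  # prints 8
```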

UQINCP (vector)

Unsigned saturating increment vector by count of true predicate elements

Counts the number of true elements in the source predicate and then uses the result to increment all destination
vector elements. The results are saturated to the element unsigned integer range.
The predicate size specifier may be omitted in assembler source code, but this is deprecated and will be prohibited in
a future release of the architecture.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 0 1 1 0 0 0 0 0 0 Pm Zdn
D U

UQINCP <Zdn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer m = UInt(Pm);
integer dn = UInt(Zdn);
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Pm> Is the name of the source scalable predicate register, encoded in the "Pm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(PL) operand2 = P[m];
bits(VL) result;
integer count = 0;

for e = 0 to elements-1
    if ElemP[operand2, e, esize] == '1' then
        count = count + 1;

for e = 0 to elements-1
    integer element = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element + count, esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UQINCW (scalar)

Unsigned saturating increment scalar by multiple of 32-bit predicate constraint element count

Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination. The result is
saturated to the general-purpose register's unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
It has encodings from 2 classes: 32-bit and 64-bit

32-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U

UQINCW <Wdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 32;

64-bit

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 1 imm4 1 1 1 1 0 1 pattern Rdn
size<1>size<0> sf D U

UQINCW <Xdn>{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Rdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;
integer ssize = 64;

Assembler Symbols

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer count = DecodePredCount(pat, esize);
bits(ssize) operand1 = X[dn];
bits(ssize) result;

integer element1 = Int(operand1, unsigned);

(result, -) = SatQ(element1 + (count * imm), ssize, unsigned);
X[dn] = Extend(result, 64, unsigned);

UQINCW (vector)

Unsigned saturating increment vector by multiple of 32-bit predicate constraint element count

Determines the number of active 32-bit elements implied by the named predicate constraint, multiplies that by an
immediate in the range 1 to 16 inclusive, and then uses the result to increment all destination vector elements. The
results are saturated to the 32-bit unsigned integer range.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out-of-range constraint encodings generate an empty predicate or zero element count rather than an
Undefined Instruction exception.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 0 1 0 imm4 1 1 0 0 0 1 pattern Zdn
size<1>size<0> D U

UQINCW <Zdn>.S{, <pattern>{, MUL #<imm>}}

if !HaveSVE() then UNDEFINED;

integer esize = 32;
integer dn = UInt(Zdn);
bits(5) pat = pattern;
integer imm = UInt(imm4) + 1;
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<pattern> Is the optional pattern specifier, defaulting to ALL, encoded in “pattern”:

<imm> Is the immediate multiplier, in the range 1 to 16, defaulting to 1, encoded in the "imm4" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer count = DecodePredCount(pat, esize);
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 + (count * imm), esize, unsigned);

Z[dn] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UQSUB (immediate)

Unsigned saturating subtract immediate (unpredicated)

Unsigned saturating subtract an unsigned immediate from each element of the source vector, and destructively place
the results in the corresponding elements of the source vector. Each result element is saturated to the N-bit element's
unsigned integer range 0 to (2^N)-1. This instruction is unpredicated.
The immediate is an unsigned value in the range 0 to 255, and for element widths of 16 bits or higher it may also be a
positive multiple of 256 in the range 256 to 65280.
The immediate is encoded in 8 bits with an optional left shift by 8. The preferred disassembly when the shift option is
specified is "#<uimm8>, LSL #8". However, an assembler and disassembler may also allow use of the shifted 16-bit
value unless the immediate is 0 and the shift amount is 8, which must be unambiguously described as "#0, LSL #8".
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 1 1 1 1 1 sh imm8 Zdn
U

UQSUB <Zdn>.<T>, <Zdn>.<T>, #<imm>{, <shift>}

if !HaveSVE() then UNDEFINED;

if size:sh == '001' then UNDEFINED;
integer esize = 8 << UInt(size);
integer dn = UInt(Zdn);
integer imm = UInt(imm8);
if sh == '1' then imm = imm << 8;
boolean unsigned = TRUE;

Assembler Symbols

<Zdn> Is the name of the source and destination scalable vector register, encoded in the "Zdn" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<imm> Is an unsigned immediate in the range 0 to 255, encoded in the "imm8" field.

<shift> Is the optional left shift to apply to the immediate, defaulting to LSL #0 and encoded in “sh”:

sh <shift>
0 LSL #0
1 LSL #8

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[dn];
bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - imm, esize, unsigned);

Z[dn] = result;
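
The decode and operation steps above can be sketched in Python as follows. This is an illustrative model only: the vector is a list of unsigned integers, and the function names decode_uqsub_imm and uqsub_imm are hypothetical.

```python
def decode_uqsub_imm(size: int, sh: int, imm8: int):
    """Return (esize, imm) for the given encoding fields."""
    if size == 0 and sh == 1:
        # size:sh == '001': a shifted immediate does not fit a byte element.
        raise ValueError("UNDEFINED encoding")
    esize = 8 << size
    imm = (imm8 & 0xFF) << 8 if sh else imm8 & 0xFF
    return esize, imm


def uqsub_imm(elements, imm: int, esize: int):
    """Subtract imm from every element, saturating at zero."""
    limit = (1 << esize) - 1
    return [max((value & limit) - imm, 0) for value in elements]


esize, imm = decode_uqsub_imm(size=1, sh=1, imm8=1)  # halfword elements, #1, LSL #8
print(esize, imm)                                    # prints 16 256
print(uqsub_imm([5, 300, 65535], imm, esize))        # prints [0, 44, 65279]
```

Because the operands are unsigned and the immediate is never negative, only the lower bound of the range can be reached, so clamping at zero is sufficient in this model.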

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

UQSUB (vectors)

Unsigned saturating subtract vectors (unpredicated)

Unsigned saturating subtract all elements of the second source vector from corresponding elements of the first source
vector and place the results in the corresponding elements of the destination vector. Each result element is saturated
to the N-bit element's unsigned integer range 0 to (2^N)-1. This instruction is unpredicated.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 1 1 1 Zn Zd
U

UQSUB <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
boolean unsigned = TRUE;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
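bits(VL) result;

for e = 0 to elements-1
    integer element1 = Int(Elem[operand1, e, esize], unsigned);
    integer element2 = Int(Elem[operand2, e, esize], unsigned);
    (Elem[result, e, esize], -) = SatQ(element1 - element2, esize, unsigned);

Z[d] = result;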

USDOT (indexed)

Unsigned by signed integer indexed dot product

The unsigned by signed integer indexed dot product instruction computes the dot product of a group of four unsigned
8-bit integer values held in each 32-bit element of the first source vector multiplied by a group of four signed 8-bit
integer values in an indexed 32-bit element of the second source vector, and then destructively adds the widened dot
product to the corresponding 32-bit element of the destination vector.
The groups within the second source vector are specified using an immediate index which selects the same group
position within each 128-bit vector segment. The index range is from 0 to 3. This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.

SVE
(FEAT_I8MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 1 i2 Zm 0 0 0 1 1 0 Zn Zda
size<1>size<0> U

USDOT <Zda>.S, <Zn>.B, <Zm>.B[<imm>]

if !HaveSVE() || !HaveInt8MatMulExt() then UNDEFINED;

integer esize = 32;
integer index = UInt(i2);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register Z0-Z7, encoded in the "Zm" field.
<imm> Is the immediate index of a quadtuplet of four 8-bit elements within each 128-bit vector segment, in the
range 0 to 3, encoded in the "i2" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer eltspersegment = 128 DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    integer segmentbase = e - (e MOD eltspersegment);
    integer s = segmentbase + index;
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = UInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = SInt(Elem[operand2, 4 * s + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;

Z[da] = result;
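
The indexed selection is easier to follow with concrete values. The Python sketch below mirrors the loop above under stated assumptions: the byte vectors are lists of integers in the range 0 to 255 in element order, the accumulator is a list of 32-bit values, and the names to_signed8 and usdot_indexed are hypothetical.

```python
def to_signed8(byte: int) -> int:
    """Interpret an 8-bit value as a two's complement signed integer."""
    return byte - 256 if byte & 0x80 else byte


def usdot_indexed(zda, zn, zm, index: int):
    """Accumulate unsigned-by-signed 4-way dot products into 32-bit elements."""
    if not 0 <= index <= 3:
        raise ValueError("index must be in the range 0 to 3")
    elts_per_segment = 128 // 32
    result = []
    for e, acc in enumerate(zda):
        # Every element of a 128-bit segment uses the same group of zm bytes.
        s = e - (e % elts_per_segment) + index
        for i in range(4):
            acc += (zn[4 * e + i] & 0xFF) * to_signed8(zm[4 * s + i] & 0xFF)
        result.append(acc & 0xFFFFFFFF)  # accumulation wraps modulo 2^32
    return result


zn = [1, 2, 3, 4, 255, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
zm = [0, 0, 0, 0, 1, 0xFF, 2, 0xFE, 0, 0, 0, 0, 0, 0, 0, 0]  # group 1 is 1, -1, 2, -2
print(usdot_indexed([0, 10, 0xFFFFFFFF, 0], zn, zm, index=1))
# prints [4294967293, 265, 4294967295, 0]
```

The first result element is -3 stored as a 32-bit pattern, which shows that the accumulation is not saturating.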

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

USDOT (vectors)

Unsigned by signed integer dot product

The unsigned by signed integer dot product instruction computes the dot product of a group of four unsigned 8-bit
integer values held in each 32-bit element of the first source vector multiplied by a group of four signed 8-bit integer
values in the corresponding 32-bit element of the second source vector, and then destructively adds the widened dot
product to the corresponding 32-bit element of the destination vector.
This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.

SVE
(FEAT_I8MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 1 0 0 Zm 0 1 1 1 1 0 Zn Zda
size<1>size<0>

USDOT <Zda>.S, <Zn>.B, <Zm>.B

if !HaveSVE() || !HaveInt8MatMulExt() then UNDEFINED;

integer esize = 32;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);

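Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.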

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result;

for e = 0 to elements-1
    bits(esize) res = Elem[operand3, e, esize];
    for i = 0 to 3
        integer element1 = UInt(Elem[operand1, 4 * e + i, esize DIV 4]);
        integer element2 = SInt(Elem[operand2, 4 * e + i, esize DIV 4]);
        res = res + element1 * element2;
    Elem[result, e, esize] = res;

Z[da] = result;

Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.

USMMLA

Unsigned by signed integer matrix multiply-accumulate

The unsigned by signed integer matrix multiply-accumulate instruction multiplies the 2×8 matrix of unsigned 8-bit
integer values held in each 128-bit segment of the first source vector by the 8×2 matrix of signed 8-bit integer values
in the corresponding segment of the second source vector. The resulting 2×2 widened 32-bit integer matrix product is
then destructively added to the 32-bit integer matrix accumulator held in the corresponding segment of the addend
and destination vector. This is equivalent to performing an 8-way dot product per destination element.
This instruction is unpredicated.
ID_AA64ZFR0_EL1.I8MM indicates whether this instruction is implemented.

SVE
(FEAT_I8MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 1 0 0 Zm 1 0 0 1 1 0 Zn Zda
uns<1>uns<0>

USMMLA <Zda>.S, <Zn>.B, <Zm>.B

if !HaveSVE() || !HaveInt8MatMulExt() then UNDEFINED;

integer n = UInt(Zn);
integer m = UInt(Zm);
integer da = UInt(Zda);
boolean op1_unsigned = TRUE;
boolean op2_unsigned = FALSE;

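Assembler Symbols

<Zda> Is the name of the third source and destination scalable vector register, encoded in the "Zda" field.
<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.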

Operation

CheckSVEEnabled();
integer segments = VL DIV 128;
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) operand3 = Z[da];
bits(VL) result = Zeros();
bits(128) op1, op2;
bits(128) res, addend;

Z[da] = result;
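
The per-segment matrix product described for this instruction can be illustrated with a small Python model. This is a sketch under assumptions that this page does not confirm: the 2×8 matrix is taken to be stored row by row in the first source segment, each column of the 8×2 matrix is taken to occupy eight consecutive bytes of the second source segment, and the 2×2 accumulator is taken to be stored row by row. The names to_signed8 and usmmla_segment are hypothetical.

```python
def to_signed8(byte: int) -> int:
    """Interpret an 8-bit value as a two's complement signed integer."""
    return byte - 256 if byte & 0x80 else byte


def usmmla_segment(acc, a, b):
    """Multiply-accumulate for one 128-bit segment.

    acc: four 32-bit values, the 2x2 accumulator (assumed row-major).
    a:   sixteen unsigned bytes, the 2x8 matrix (assumed row-major).
    b:   sixteen signed bytes, the 8x2 matrix (assumed one column per 8 bytes).
    """
    out = []
    for i in range(2):          # row of the result
        for j in range(2):      # column of the result
            total = acc[2 * i + j]
            for k in range(8):  # 8-way dot product per destination element
                total += (a[8 * i + k] & 0xFF) * to_signed8(b[8 * j + k] & 0xFF)
            out.append(total & 0xFFFFFFFF)
    return out


a = [1] * 8 + [2] * 8        # rows of ones and twos
b = [1] * 8 + [0xFF] * 8     # a column of +1 and a column of -1
print(usmmla_segment([0, 0, 100, 100], a, b))
# prints [8, 4294967288, 116, 84]
```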

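Operational information

This instruction might be immediately preceded in program order by a MOVPRFX instruction. The MOVPRFX instruction
must conform to all of the following requirements, otherwise the behavior of the MOVPRFX and this instruction is
UNPREDICTABLE:
• The MOVPRFX instruction must be unpredicated.
• The MOVPRFX instruction must specify the same destination register as this instruction.
• The destination register must not refer to architectural register state referenced by any other source operand
register of this instruction.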

UUNPKHI, UUNPKLO

Unsigned unpack and extend half of vector

Unpack elements from the lowest or highest half of the source vector and then zero-extend them to place in elements
of twice their size within the destination vector. This instruction is unpredicated.
It has encodings from 2 classes: High half and Low half

High half

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 1 1 0 0 1 1 1 0 Zn Zd
U H

UUNPKHI <Zd>.<T>, <Zn>.<Tb>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean unsigned = TRUE;
boolean hi = TRUE;

Low half

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 1 0 0 0 1 1 1 0 Zn Zd
U H

UUNPKLO <Zd>.<T>, <Zn>.<Tb>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean unsigned = TRUE;
boolean hi = FALSE;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

<Tb> Is the size specifier, encoded in “size”:

size <Tb>
00 RESERVED
01 B
10 H
11 S

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
integer hsize = esize DIV 2;
bits(VL) operand = Z[n];
bits(VL) result;

for e = 0 to elements-1
    bits(hsize) element = if hi then Elem[operand, e + elements, hsize] else Elem[operand, e, hsize];
    Elem[result, e, esize] = Extend(element, esize, unsigned);

Z[d] = result;
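
A minimal Python sketch of the unpack step follows. It assumes the source vector is given as a list of narrow elements in element order; the name uunpk is hypothetical.

```python
def uunpk(source, hsize: int, hi: bool):
    """Zero-extend the low or high half of the narrow source elements."""
    elements = len(source) // 2     # number of wide result elements
    mask = (1 << hsize) - 1
    start = elements if hi else 0   # the high half starts at index 'elements'
    # Zero-extension keeps the value unchanged; the top bit is never a sign bit.
    return [source[start + e] & mask for e in range(elements)]


narrow = [1, 2, 3, 0xFF, 5, 6, 7, 0x80]
print(uunpk(narrow, 8, hi=False))  # prints [1, 2, 3, 255]
print(uunpk(narrow, 8, hi=True))   # prints [5, 6, 7, 128]
```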

UXTB, UXTH, UXTW

Unsigned byte / halfword / word extend (predicated)

Zero-extend the least-significant sub-element of each active element of the source vector, and place the results in the
corresponding elements of the destination vector. Inactive elements in the destination vector register remain
unmodified.
It has encodings from 3 classes: Byte, Halfword and Word

Byte

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 0 1 1 0 1 Pg Zn Zd
U

UXTB <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size == '00' then UNDEFINED;
integer esize = 8 << UInt(size);
integer s_esize = 8;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean unsigned = TRUE;

Halfword

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 1 1 1 0 1 Pg Zn Zd
U

UXTH <Zd>.<T>, <Pg>/M, <Zn>.<T>

if !HaveSVE() then UNDEFINED;

if size != '1x' then UNDEFINED;
integer esize = 8 << UInt(size);
integer s_esize = 16;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean unsigned = TRUE;

Word

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 0 1 1 0 1 Pg Zn Zd
U

UXTW <Zd>.D, <Pg>/M, <Zn>.D

if !HaveSVE() then UNDEFINED;

if size != '11' then UNDEFINED;
integer esize = 8 << UInt(size);
integer s_esize = 32;
integer g = UInt(Pg);
integer n = UInt(Zn);
integer d = UInt(Zd);
boolean unsigned = TRUE;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> For the byte variant: is the size specifier, encoded in “size”:

size <T>
00 RESERVED
01 H
10 S
11 D

For the halfword variant: is the size specifier, encoded in “size<0>”:

size<0> <T>
0 S
1 D

<Pg> Is the name of the governing scalable predicate register P0-P7, encoded in the "Pg" field.
<Zn> Is the name of the source scalable vector register, encoded in the "Zn" field.

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = P[g];
bits(VL) operand = if AnyActiveElement(mask, esize) then Z[n] else Zeros();
bits(VL) result = Z[d];

for e = 0 to elements-1
    if ElemP[mask, e, esize] == '1' then
        bits(esize) element = Elem[operand, e, esize];
        Elem[result, e, esize] = Extend(element<s_esize-1:0>, esize, unsigned);

Z[d] = result;
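
The predicated zero-extension above can be sketched in Python. This is an illustrative model only, not the architectural definition: the function name `uxt` and the representation of vectors as lists of unsigned integers (one per element) and of the predicate as a list of booleans are assumptions made for this example.

```python
def uxt(zn, zd, pg, esize, s_esize):
    """Hypothetical model of UXTB/UXTH/UXTW on lists of element values.

    zn, zd: lists of unsigned integers, one per esize-bit element.
    pg:     list of booleans, one per element (True means active).
    """
    # The decode pseudocode only allows a sub-element narrower than the element.
    if s_esize >= esize:
        raise ValueError("sub-element must be narrower than the element")
    low_bits = (1 << s_esize) - 1
    # Active elements take the zero-extended low bits of the source;
    # inactive elements keep the previous destination value.
    return [(src & low_bits) if active else old
            for src, old, active in zip(zn, zd, pg)]


zn = [0x1234FFFF, 0xDEADBEEF, 0x00000080, 0xFFFFFF7F]
zd = [0x11111111, 0x22222222, 0x33333333, 0x44444444]
pg = [True, False, True, True]
result = uxt(zn, zd, pg, esize=32, s_esize=8)  # models UXTB on .S elements
```

In this sketch the second element is inactive, so it retains the old destination value, while the other elements keep only their low byte.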

Operational information

UZP1, UZP2 (predicates)

Concatenate even or odd elements from two predicates

Concatenate adjacent even or odd-numbered elements from the first and second source predicates and place in
elements of the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: Even and Odd

Even

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 1 0 0 Pn 0 Pd
H

UZP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
integer part = 0;

Odd

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 1 1 0 Pn 0 Pd
H

UZP2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
integer part = 1;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.

Operation

CheckSVEEnabled();
integer pairs = VL DIV (esize * 2);
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

for p = 0 to pairs - 1
    Elem[result, p, esize DIV 8] = Elem[operand1, 2*p+part, esize DIV 8];

for p = 0 to pairs - 1
    Elem[result, pairs+p, esize DIV 8] = Elem[operand2, 2*p+part, esize DIV 8];

P[d] = result;

UZP1, UZP2 (vectors)

Concatenate even or odd elements from two vectors

Concatenate adjacent even or odd-numbered elements from the first and second source vectors and place in elements
of the destination vector. This instruction is unpredicated. The 128-bit element variant of this instruction requires that
the current vector length is at least 256 bits, and if the current vector length is not an integer multiple of 256 bits then
the trailing bits are set to zero.
ID_AA64ZFR0_EL1.F64MM indicates whether the 128-bit element variant of the instruction is implemented.
It has encodings from 4 classes: Even, Even (quadwords), Odd and Odd (quadwords)

Even

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 1 0 Zn Zd
H

UZP1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 0;

Even (quadwords)
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 1 0 Zn Zd
H

UZP1 <Zd>.Q, <Zn>.Q, <Zm>.Q

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer esize = 128;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 0;

Odd

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 1 1 Zn Zd
H

UZP2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 1;

Odd (quadwords)
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 1 1 Zn Zd
H

UZP2 <Zd>.Q, <Zn>.Q, <Zm>.Q

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer esize = 128;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 1;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
if VL < esize * 2 then UNDEFINED;
integer pairs = VL DIV (esize * 2);
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result = Zeros();

for p = 0 to pairs - 1
    Elem[result, p, esize] = Elem[operand1, 2*p+part, esize];

for p = 0 to pairs - 1
    Elem[result, pairs+p, esize] = Elem[operand2, 2*p+part, esize];

Z[d] = result;
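
As an informal illustration of the two loops above, the following Python sketch models UZP1 and UZP2 on lists of element values. The function name `uzp` and the list representation are assumptions for this example, and the sketch assumes an even number of elements, so it does not model the zeroed trailing bits of the 128-bit element variant.

```python
def uzp(zn, zm, part):
    """Hypothetical model: part=0 behaves like UZP1, part=1 like UZP2."""
    if len(zn) != len(zm) or len(zn) % 2 != 0:
        raise ValueError("expected two equal-length vectors with an even element count")
    # The low half of the result holds elements 2*p+part of the first source,
    # the high half holds elements 2*p+part of the second source.
    return zn[part::2] + zm[part::2]


even = uzp([0, 1, 2, 3], [10, 11, 12, 13], part=0)
odd = uzp([0, 1, 2, 3], [10, 11, 12, 13], part=1)
```

Here `even` collects elements 0 and 2 of each source and `odd` collects elements 1 and 3, with the first source always occupying the low half of the result.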

WHILELE

While incrementing signed scalar less than or equal to scalar

Generate a predicate that, starting from the lowest-numbered element, is true while the incrementing value of the first,
signed scalar operand is less than or equal to the second scalar operand, and false thereafter up to the highest-numbered
element.
If the second scalar operand is equal to the maximum signed integer value then a condition which includes an equality
test can never fail and the result will be an all-true predicate.
The full width of the scalar operands is significant for the purposes of comparison, and the full width first operand is
incremented by one for each destination predicate element, irrespective of the predicate result element size. The first
general-purpose source register is not itself updated.
The predicate result is placed in the predicate destination register. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 0 1 Rn 1 Pd
U lt eq

WHILELE <Pd>.<T>, <R><n>, <R><m>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer rsize = 32 << UInt(sf);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Pd);
boolean unsigned = FALSE;
SVECmp op = Cmp_LE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<R> Is a width specifier, encoded in “sf”:

sf <R>
0 W
1 X

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = Ones(PL);
bits(rsize) operand1 = X[n];
bits(rsize) operand2 = X[m];
bits(PL) result;
boolean last = TRUE;

for e = 0 to elements-1
    boolean cond;
    case op of
        when Cmp_LT cond = (Int(operand1, unsigned) < Int(operand2, unsigned));
        when Cmp_LE cond = (Int(operand1, unsigned) <= Int(operand2, unsigned));

    last = last && cond;

    ElemP[result, e, esize] = if last then '1' else '0';
    operand1 = operand1 + 1;

PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);

P[d] = result;
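
The loop above, including the wrap-around of the incremented first operand, can be modelled with a short Python sketch. The names `while_cmp` and `pred_flags` are hypothetical, the predicate is represented as a list of booleans (one per element), and the flag computation assumes an all-true governing mask, as in the pseudocode.

```python
def while_cmp(op1, op2, elements, rsize, unsigned, inclusive):
    """Hypothetical model of the WHILELT/WHILELE/WHILELO/WHILELS loop."""
    modulus = 1 << rsize

    def as_int(bits):
        # Interpret an rsize-bit pattern as unsigned or two's complement signed.
        if unsigned or bits < (1 << (rsize - 1)):
            return bits
        return bits - modulus

    op1 %= modulus
    op2 %= modulus
    result = []
    last = True
    for _ in range(elements):
        a, b = as_int(op1), as_int(op2)
        cond = (a <= b) if inclusive else (a < b)
        last = last and cond           # once false, stays false
        result.append(last)
        op1 = (op1 + 1) % modulus      # full-width increment wraps around
    return result


def pred_flags(result):
    """N = FIRST, Z = NONE, C = !LAST, V = 0, assuming all elements are active."""
    return (result[0], not any(result), not result[-1], False)


# WHILELE-like case: 5, 6, 7 satisfy "<= 7", 8 does not.
partial = while_cmp(5, 7, elements=4, rsize=32, unsigned=False, inclusive=True)
# Second operand is the maximum signed value: the comparison can never fail,
# even after the first operand wraps to the most negative value.
full = while_cmp(0x7FFFFFFE, 0x7FFFFFFF, elements=4, rsize=32,
                 unsigned=False, inclusive=True)
```

The second call illustrates the all-true result described above for a second operand equal to the maximum signed integer value.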

WHILELO

While incrementing unsigned scalar lower than scalar

Generate a predicate that, starting from the lowest-numbered element, is true while the incrementing value of the first,
unsigned scalar operand is lower than the second scalar operand, and false thereafter up to the highest-numbered
element.
The full width of the scalar operands is significant for the purposes of comparison, and the full width first operand is
incremented by one for each destination predicate element, irrespective of the predicate result element size. The first
general-purpose source register is not itself updated.
The predicate result is placed in the predicate destination register. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 1 1 Rn 0 Pd
U lt eq

WHILELO <Pd>.<T>, <R><n>, <R><m>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer rsize = 32 << UInt(sf);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Pd);
boolean unsigned = TRUE;
SVECmp op = Cmp_LT;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<R> Is a width specifier, encoded in “sf”:

sf <R>
0 W
1 X

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = Ones(PL);
bits(rsize) operand1 = X[n];
bits(rsize) operand2 = X[m];
bits(PL) result;
boolean last = TRUE;

for e = 0 to elements-1
    boolean cond;
    case op of
        when Cmp_LT cond = (Int(operand1, unsigned) < Int(operand2, unsigned));
        when Cmp_LE cond = (Int(operand1, unsigned) <= Int(operand2, unsigned));

    last = last && cond;

    ElemP[result, e, esize] = if last then '1' else '0';
    operand1 = operand1 + 1;

PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);

P[d] = result;
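
One way a predicate of this kind might be used is to control a loop over an array whose length is not a multiple of the vector length. The Python sketch below is an assumption-laden illustration of that idea, not something defined by this instruction description: `whilelo`, `add_one`, and `elements_per_vector` are hypothetical names, and the loop condition stands in for a branch on the FIRST (N) flag.

```python
def whilelo(index, limit, elements, rsize=64):
    """Hypothetical model of WHILELO producing a list of booleans."""
    modulus = 1 << rsize
    index %= modulus
    limit %= modulus
    pred = []
    last = True
    for _ in range(elements):
        last = last and index < limit   # unsigned "lower than"
        pred.append(last)
        index = (index + 1) % modulus
    return pred


def add_one(data, elements_per_vector):
    """Add 1 to every item, processing one simulated vector per iteration."""
    out = list(data)
    i = 0
    pred = whilelo(i, len(data), elements_per_vector)
    while pred[0]:                      # stands in for testing the FIRST (N) flag
        for lane, active in enumerate(pred):
            if active:                  # inactive lanes fall past the end of the array
                out[i + lane] += 1
        i += elements_per_vector
        pred = whilelo(i, len(data), elements_per_vector)
    return out


updated = add_one([1, 2, 3, 4, 5, 6, 7], elements_per_vector=4)
```

In the final iteration of this sketch only three of the four lanes are active, so the loop needs no separate scalar tail.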

WHILELS

While incrementing unsigned scalar lower or same as scalar

Generate a predicate that, starting from the lowest-numbered element, is true while the incrementing value of the first,
unsigned scalar operand is lower or same as the second scalar operand, and false thereafter up to the highest-numbered
element.
If the second scalar operand is equal to the maximum unsigned integer value then a condition which includes an
equality test can never fail and the result will be an all-true predicate.
The full width of the scalar operands is significant for the purposes of comparison, and the full width first operand is
incremented by one for each destination predicate element, irrespective of the predicate result element size. The first
general-purpose source register is not itself updated.
The predicate result is placed in the predicate destination register. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 1 1 Rn 1 Pd
U lt eq

WHILELS <Pd>.<T>, <R><n>, <R><m>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer rsize = 32 << UInt(sf);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Pd);
boolean unsigned = TRUE;
SVECmp op = Cmp_LE;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<R> Is a width specifier, encoded in “sf”:

sf <R>
0 W
1 X

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = Ones(PL);
bits(rsize) operand1 = X[n];
bits(rsize) operand2 = X[m];
bits(PL) result;
boolean last = TRUE;

for e = 0 to elements-1
    boolean cond;
    case op of
        when Cmp_LT cond = (Int(operand1, unsigned) < Int(operand2, unsigned));
        when Cmp_LE cond = (Int(operand1, unsigned) <= Int(operand2, unsigned));

    last = last && cond;

    ElemP[result, e, esize] = if last then '1' else '0';
    operand1 = operand1 + 1;

PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);

P[d] = result;

WHILELT

While incrementing signed scalar less than scalar

Generate a predicate that, starting from the lowest-numbered element, is true while the incrementing value of the first,
signed scalar operand is less than the second scalar operand, and false thereafter up to the highest-numbered element.
The full width of the scalar operands is significant for the purposes of comparison, and the full width first operand is
incremented by one for each destination predicate element, irrespective of the predicate result element size. The first
general-purpose source register is not itself updated.
The predicate result is placed in the predicate destination register. Sets the FIRST (N), NONE (Z), !LAST (C) condition
flags based on the predicate result, and the V flag to zero.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf 0 1 Rn 0 Pd
U lt eq

WHILELT <Pd>.<T>, <R><n>, <R><m>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer rsize = 32 << UInt(sf);
integer n = UInt(Rn);
integer m = UInt(Rm);
integer d = UInt(Pd);
boolean unsigned = FALSE;
SVECmp op = Cmp_LT;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<R> Is a width specifier, encoded in “sf”:

sf <R>
0 W
1 X

Operation

CheckSVEEnabled();
integer elements = VL DIV esize;
bits(PL) mask = Ones(PL);
bits(rsize) operand1 = X[n];
bits(rsize) operand2 = X[m];
bits(PL) result;
boolean last = TRUE;

for e = 0 to elements-1
    boolean cond;
    case op of
        when Cmp_LT cond = (Int(operand1, unsigned) < Int(operand2, unsigned));
        when Cmp_LE cond = (Int(operand1, unsigned) <= Int(operand2, unsigned));

    last = last && cond;

    ElemP[result, e, esize] = if last then '1' else '0';
    operand1 = operand1 + 1;

PSTATE.<N,Z,C,V> = PredTest(mask, result, esize);

P[d] = result;

WRFFR

Write the first-fault register

Read the source predicate register and place in the first-fault register (FFR). This instruction is intended to restore a
saved FFR and is not recommended for general use by applications.
This instruction requires that the source predicate contains a MONOTONIC predicate value, in which starting from bit 0
there are zero or more 1 bits, followed only by 0 bits in any remaining bit positions. If the source is not a monotonic
predicate value, then the resulting value in the FFR will be UNPREDICTABLE. It is not possible to generate a
non-monotonic value in FFR when using SETFFR followed by first-fault or non-fault loads.
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 0 1 0 0 1 0 0 0 Pn 0 0 0 0 0

WRFFR <Pn>.B

if !HaveSVE() then UNDEFINED;

integer n = UInt(Pn);

Assembler Symbols

<Pn> Is the name of the source scalable predicate register, encoded in the "Pn" field.

Operation

CheckSVEEnabled();
bits(PL) operand = P[n];

hsb = HighestSetBit(operand);
if hsb < 0 || IsOnes(operand<hsb:0>) then
    FFR[] = operand;
else // not a monotonic predicate
    FFR[] = bits(PL) UNKNOWN;
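
The monotonic test in the pseudocode above can be illustrated with a small Python sketch. The predicate is assumed to be held as a non-negative integer bit pattern, and the names `highest_set_bit` and `is_monotonic` are hypothetical stand-ins chosen for this example.

```python
def highest_set_bit(bits):
    """Index of the most significant 1 bit, or -1 when no bit is set."""
    return bits.bit_length() - 1


def is_monotonic(pred_bits):
    """True when the value is zero or more 1 bits from bit 0, then only 0 bits."""
    hsb = highest_set_bit(pred_bits)
    # Either no bit is set, or every bit from hsb down to 0 is 1.
    return hsb < 0 or pred_bits == (1 << (hsb + 1)) - 1


checks = [is_monotonic(v) for v in (0b0000, 0b0111, 0b0101, 0b1000)]
```

For non-negative integers the same property can also be written as `pred_bits & (pred_bits + 1) == 0`, which holds only for values of the form 2^k - 1.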

ZIP1, ZIP2 (predicates)

Interleave elements from two half predicates

Interleave alternating elements from the lowest or highest halves of the first and second source predicates and place
in elements of the destination predicate. This instruction is unpredicated.
It has encodings from 2 classes: High halves and Low halves

High halves

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 0 1 0 Pn 0 Pd
H

ZIP2 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
integer part = 1;

Low halves

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 0 0 0 0 Pn 0 Pd
H

ZIP1 <Pd>.<T>, <Pn>.<T>, <Pm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Pn);
integer m = UInt(Pm);
integer d = UInt(Pd);
integer part = 0;

Assembler Symbols

<Pd> Is the name of the destination scalable predicate register, encoded in the "Pd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Pn> Is the name of the first source scalable predicate register, encoded in the "Pn" field.
<Pm> Is the name of the second source scalable predicate register, encoded in the "Pm" field.

Operation

CheckSVEEnabled();
integer pairs = VL DIV (esize * 2);
bits(PL) operand1 = P[n];
bits(PL) operand2 = P[m];
bits(PL) result;

integer base = part * pairs;

for p = 0 to pairs-1
    Elem[result, 2*p+0, esize DIV 8] = Elem[operand1, base+p, esize DIV 8];
    Elem[result, 2*p+1, esize DIV 8] = Elem[operand2, base+p, esize DIV 8];

P[d] = result;

ZIP1, ZIP2 (vectors)

Interleave elements from two half vectors

Interleave alternating elements from the lowest or highest halves of the first and second source vectors and place in
elements of the destination vector. This instruction is unpredicated. The 128-bit element variant of this instruction
requires that the current vector length is at least 256 bits, and if the current vector length is not an integer multiple of
256 bits then the trailing bits are set to zero.
ID_AA64ZFR0_EL1.F64MM indicates whether the 128-bit element variant of the instruction is implemented.
It has encodings from 4 classes: High halves, High halves (quadwords), Low halves and Low halves (quadwords)

High halves

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 0 1 Zn Zd
H

ZIP2 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 1;

High halves (quadwords)
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 0 1 Zn Zd
H

ZIP2 <Zd>.Q, <Zn>.Q, <Zm>.Q

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer esize = 128;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 1;

Low halves

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 0 0 0 Zn Zd
H

ZIP1 <Zd>.<T>, <Zn>.<T>, <Zm>.<T>

if !HaveSVE() then UNDEFINED;

integer esize = 8 << UInt(size);
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 0;

Low halves (quadwords)
(FEAT_F64MM)

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 0 0 0 Zn Zd
H

ZIP1 <Zd>.Q, <Zn>.Q, <Zm>.Q

if !HaveSVEFP64MatMulExt() then UNDEFINED;

integer esize = 128;
integer n = UInt(Zn);
integer m = UInt(Zm);
integer d = UInt(Zd);
integer part = 0;

Assembler Symbols

<Zd> Is the name of the destination scalable vector register, encoded in the "Zd" field.

<T> Is the size specifier, encoded in “size”:

size <T>
00 B
01 H
10 S
11 D

<Zn> Is the name of the first source scalable vector register, encoded in the "Zn" field.
<Zm> Is the name of the second source scalable vector register, encoded in the "Zm" field.

Operation

CheckSVEEnabled();
if VL < esize * 2 then UNDEFINED;
integer pairs = VL DIV (esize * 2);
bits(VL) operand1 = Z[n];
bits(VL) operand2 = Z[m];
bits(VL) result = Zeros();

integer base = part * pairs;

for p = 0 to pairs-1
    Elem[result, 2*p+0, esize] = Elem[operand1, base+p, esize];
    Elem[result, 2*p+1, esize] = Elem[operand2, base+p, esize];

Z[d] = result;
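
The interleaving loop above can be sketched in Python as follows. As before, this is only an illustrative model: `zip_elems` is a hypothetical name, vectors are lists of element values, and an even element count is assumed, so the zeroed trailing bits of the 128-bit element variant are not modelled.

```python
def zip_elems(zn, zm, part):
    """Hypothetical model: part=0 behaves like ZIP1, part=1 like ZIP2."""
    if len(zn) != len(zm) or len(zn) % 2 != 0:
        raise ValueError("expected two equal-length vectors with an even element count")
    pairs = len(zn) // 2
    base = part * pairs                # start of the low or high half
    result = []
    for p in range(pairs):
        result.append(zn[base + p])    # even result positions come from the first source
        result.append(zm[base + p])    # odd result positions come from the second source
    return result


low = zip_elems([0, 1, 2, 3], [10, 11, 12, 13], part=0)
high = zip_elems([0, 1, 2, 3], [10, 11, 12, 13], part=1)
```

In this sketch `low` interleaves the lower halves of the two sources and `high` interleaves the upper halves.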

Top-level encodings for A64

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0

Decode fields
Instruction details
op0
0000 Reserved
0001 UNALLOCATED
0010 SVE encodings
0011 UNALLOCATED
100x Data Processing -- Immediate
101x Branches, Exception Generating and System instructions
x1x0 Loads and Stores
x101 Data Processing -- Register
x111 Data Processing -- Scalar Floating-Point and Advanced SIMD
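
To show how a pattern table like the one above can be applied, here is a minimal Python sketch that classifies a 32-bit instruction word. It assumes that op0 occupies bits [28:25], which the flattened diagram here does not show explicitly; the names `TOP_LEVEL`, `matches`, and `top_level_group` are hypothetical.

```python
# (pattern, group) rows copied from the op0 table; 'x' is a don't-care bit.
TOP_LEVEL = [
    ("0000", "Reserved"),
    ("0001", "UNALLOCATED"),
    ("0010", "SVE encodings"),
    ("0011", "UNALLOCATED"),
    ("100x", "Data Processing -- Immediate"),
    ("101x", "Branches, Exception Generating and System instructions"),
    ("x1x0", "Loads and Stores"),
    ("x101", "Data Processing -- Register"),
    ("x111", "Data Processing -- Scalar Floating-Point and Advanced SIMD"),
]


def matches(pattern, bits):
    """True when every non-'x' position of pattern equals the same position in bits."""
    return all(p in ("x", b) for p, b in zip(pattern, bits))


def top_level_group(instr):
    """Return the top-level group name for a 32-bit instruction word."""
    op0 = format((instr >> 25) & 0xF, "04b")   # assumed field position: bits [28:25]
    for pattern, name in TOP_LEVEL:
        if matches(pattern, op0):
            return name
    raise AssertionError("the table is expected to cover all 16 op0 values")


group_nop = top_level_group(0xD503201F)   # NOP
group_add = top_level_group(0x8B020020)   # ADD X0, X1, X2
```

The nine rows are mutually exclusive and together cover all sixteen op0 values, so the order in which the sketch tries them does not matter.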

Reserved

These instructions are under the top-level.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0 0000 op1

Decode fields
Instruction details
op0 op1
000 000000000 UDF
!= 000000000 UNALLOCATED
!= 000 UNALLOCATED

SVE encodings

These instructions are under the top-level.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
op0 0010 op1 op2 op3

Decode fields
Instruction details
op0 op1 op2 op3
000 0x 0xxxx x1xxxx SVE Integer Multiply-Add - Predicated
000 0x 0xxxx 000xxx SVE Integer Binary Arithmetic - Predicated
000 0x 0xxxx 001xxx SVE Integer Reduction
000 0x 0xxxx 100xxx SVE Bitwise Shift - Predicated
000 0x 0xxxx 101xxx SVE Integer Unary Arithmetic - Predicated
000 0x 1xxxx 000xxx SVE integer add/subtract vectors (unpredicated)
000 0x 1xxxx 001xxx SVE Bitwise Logical - Unpredicated
000 0x 1xxxx 0100xx SVE Index Generation
000 0x 1xxxx 0101xx SVE Stack Allocation
000 0x 1xxxx 011xxx UNALLOCATED
000 0x 1xxxx 100xxx SVE Bitwise Shift - Unpredicated
000 0x 1xxxx 1010xx SVE address generation
000 0x 1xxxx 1011xx SVE Integer Misc - Unpredicated
000 0x 1xxxx 11xxxx SVE Element Count
000 1x 00xxx SVE Bitwise Immediate
000 1x 01xxx SVE Integer Wide Immediate - Predicated
000 1x 1xxxx 001000 DUP (indexed)
000 1x 1xxxx 001001 UNALLOCATED
000 1x 1xxxx 00101x UNALLOCATED
000 1x 1xxxx 0011x1 UNALLOCATED
000 1x 1xxxx 001100 TBL
000 1x 1xxxx 001110 SVE Permute Vector - Unpredicated
000 1x 1xxxx 010xxx SVE Permute Predicate
000 1x 1xxxx 011xxx SVE permute vector elements
000 1x 1xxxx 10xxxx SVE Permute Vector - Predicated
000 1x 1xxxx 11xxxx SEL (vectors)
000 10 1xxxx 000xxx SVE Permute Vector - Extract
000 11 1xxxx 000xxx SVE Permute Vector - Segments
001 0x 0xxxx SVE Integer Compare - Vectors
001 0x 1xxxx SVE integer compare with unsigned immediate
001 1x 0xxxx x0xxxx SVE integer compare with signed immediate
001 1x 00xxx 01xxxx SVE predicate logical operations
001 1x 00xxx 11xxxx SVE Propagate Break
001 1x 01xxx 01xxxx SVE Partition Break
001 1x 01xxx 11xxxx SVE Predicate Misc
001 1x 1xxxx 00xxxx SVE Integer Compare - Scalars
001 1x 1xxxx 01xxxx UNALLOCATED
001 1x 1xxxx 11xxxx SVE Integer Wide Immediate - Unpredicated
001 1x 100xx 10xxxx SVE Predicate Count
001 1x 101xx 1000xx SVE Inc/Dec by Predicate Count
001 1x 101xx 1001xx SVE Write FFR
001 1x 101xx 101xxx UNALLOCATED
001 1x 11xxx 10xxxx UNALLOCATED
010 0x 0xxxx 0xxxxx SVE Integer Multiply-Add - Unpredicated
010 0x 0xxxx 1xxxxx UNALLOCATED
010 0x 1xxxx SVE Multiply - Indexed
010 1x 0xxxx 0xxxxx UNALLOCATED
010 1x 0xxxx 10xxxx SVE Misc
010 1x 0xxxx 11xxxx UNALLOCATED
010 1x 1xxxx UNALLOCATED
011 0x 0xxxx 0xxxxx FCMLA (vectors)
011 0x 00x1x 1xxxxx UNALLOCATED
011 0x 00000 100xxx FCADD
011 0x 00000 101xxx UNALLOCATED
011 0x 00000 11xxxx UNALLOCATED
011 0x 00001 1xxxxx UNALLOCATED
011 0x 0010x 100xxx UNALLOCATED
011 0x 0010x 101xxx SVE floating-point convert precision odd elements
011 0x 0010x 11xxxx UNALLOCATED
011 0x 01xxx 1xxxxx UNALLOCATED
011 0x 1xxxx x0x01x UNALLOCATED
011 0x 1xxxx 00000x SVE floating-point multiply-add (indexed)
011 0x 1xxxx 0001xx SVE floating-point complex multiply-add (indexed)
011 0x 1xxxx 001000 SVE floating-point multiply (indexed)
011 0x 1xxxx 001001 UNALLOCATED
011 0x 1xxxx 0011xx UNALLOCATED
011 0x 1xxxx 01x0xx SVE Floating Point Widening Multiply-Add - Indexed
011 0x 1xxxx 01x1xx UNALLOCATED
011 0x 1xxxx 10x00x SVE Floating Point Widening Multiply-Add
011 0x 1xxxx 10x1xx UNALLOCATED
011 0x 1xxxx 110xxx UNALLOCATED
011 0x 1xxxx 111000 UNALLOCATED
011 0x 1xxxx 111001 SVE floating point matrix multiply accumulate
011 0x 1xxxx 11101x UNALLOCATED
011 0x 1xxxx 1111xx UNALLOCATED
011 1x 0xxxx x1xxxx SVE floating-point compare vectors
011 1x 0xxxx 000xxx SVE floating-point arithmetic (unpredicated)
011 1x 0xxxx 100xxx SVE Floating Point Arithmetic - Predicated
011 1x 0xxxx 101xxx SVE Floating Point Unary Operations - Predicated
011 1x 000xx 001xxx SVE floating-point recursive reduction
011 1x 001xx 0010xx UNALLOCATED
011 1x 001xx 0011xx SVE Floating Point Unary Operations - Unpredicated
011 1x 010xx 001xxx SVE Floating Point Compare - with Zero
011 1x 011xx 001xxx SVE Floating Point Accumulating Reduction
011 1x 1xxxx SVE Floating Point Multiply-Add
100 SVE Memory - 32-bit Gather and Unsized Contiguous
101 SVE Memory - Contiguous Load
110 SVE Memory - 64-bit Gather
111 0x0xxx SVE Memory - Contiguous Store and Unsized Contiguous
111 0x1xxx SVE Memory - Non-temporal and Multi-register Store
111 1x0xxx SVE Memory - Scatter with Optional Sign Extend
111 101xxx SVE Memory - Scatter
111 111xxx SVE Memory - Contiguous Store with Immediate Offset

SVE Integer Multiply-Add - Predicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 1

Decode fields
Instruction details
op0
0 SVE integer multiply-accumulate writing addend (predicated)
1 SVE integer multiply-add writing multiplicand (predicated)

SVE integer multiply-accumulate writing addend (predicated)

These instructions are under SVE Integer Multiply-Add - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 0 1 op Pg Zn Zda

Decode fields
Instruction Details
op
0 MLA
1 MLS

SVE integer multiply-add writing multiplicand (predicated)

These instructions are under SVE Integer Multiply-Add - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 Zm 1 1 op Pg Za Zdn

Decode fields
Instruction Details
op
0 MAD
1 MSB

SVE Integer Binary Arithmetic - Predicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 000

Decode fields
Instruction details
op0
00x SVE integer add/subtract vectors (predicated)
01x SVE integer min/max/difference (predicated)
100 SVE integer multiply vectors (predicated)
101 SVE integer divide vectors (predicated)
11x SVE bitwise logical operations (predicated)

SVE integer add/subtract vectors (predicated)

These instructions are under SVE Integer Binary Arithmetic - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 opc 0 0 0 Pg Zm Zdn

Decode fields
Instruction Details
opc
000 ADD (vectors, predicated)
001 SUB (vectors, predicated)
010 UNALLOCATED
011 SUBR (vectors)
1xx UNALLOCATED

SVE integer min/max/difference (predicated)

These instructions are under SVE Integer Binary Arithmetic - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 opc U 0 0 0 Pg Zm Zdn

Decode fields
Instruction Details
opc U
00 0 SMAX (vectors)
00 1 UMAX (vectors)
01 0 SMIN (vectors)
01 1 UMIN (vectors)
10 0 SABD
10 1 UABD
11 x UNALLOCATED

SVE integer multiply vectors (predicated)

These instructions are under SVE Integer Binary Arithmetic - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 0 H U 0 0 0 Pg Zm Zdn

Decode fields
Instruction Details
H U
0 0 MUL (vectors)
0 1 UNALLOCATED
1 0 SMULH
1 1 UMULH

SVE integer divide vectors (predicated)

These instructions are under SVE Integer Binary Arithmetic - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 1 R U 0 0 0 Pg Zm Zdn

Decode fields
Instruction Details
R U
0 0 SDIV
0 1 UDIV
1 0 SDIVR
1 1 UDIVR

SVE bitwise logical operations (predicated)

These instructions are under SVE Integer Binary Arithmetic - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 opc 0 0 0 Pg Zm Zdn

Decode fields
Instruction Details
opc
000 ORR (vectors, predicated)
001 EOR (vectors, predicated)
010 AND (vectors, predicated)
011 BIC (vectors, predicated)
1xx UNALLOCATED

SVE Integer Reduction

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 001

Decode fields
Instruction details
op0
000 SVE integer add reduction (predicated)
010 SVE integer min/max reduction (predicated)
0x1 UNALLOCATED
10x SVE constructive prefix (predicated)
110 SVE bitwise logical reduction (predicated)
111 UNALLOCATED

SVE integer add reduction (predicated)

These instructions are under SVE Integer Reduction.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 0 0 op U 0 0 1 Pg Zn Vd

Decode fields
Instruction Details
op U
0 0 SADDV
0 1 UADDV
1 x UNALLOCATED

SVE integer min/max reduction (predicated)

These instructions are under SVE Integer Reduction.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 0 1 0 op U 0 0 1 Pg Zn Vd

Decode fields
Instruction Details
op U
0 0 SMAXV
0 1 UMAXV
1 0 SMINV
1 1 UMINV

SVE constructive prefix (predicated)

These instructions are under SVE Integer Reduction.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 opc M 0 0 1 Pg Zn Zd

Decode fields
Instruction Details
opc
00 MOVPRFX (predicated)
01 UNALLOCATED
1x UNALLOCATED

SVE bitwise logical reduction (predicated)

These instructions are under SVE Integer Reduction.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 0 opc 0 0 1 Pg Zn Vd

Decode fields
Instruction Details
opc
00 ORV
01 EORV
10 ANDV
11 UNALLOCATED

SVE Bitwise Shift - Predicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 100

Decode fields
Instruction details
op0
0x SVE bitwise shift by immediate (predicated)
10 SVE bitwise shift by vector (predicated)
11 SVE bitwise shift by wide elements (predicated)

SVE bitwise shift by immediate (predicated)

These instructions are under SVE Bitwise Shift - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 0 0 opc L U 1 0 0 Pg tszl imm3 Zdn

Decode fields
Instruction Details
opc L U
00 0 0 ASR (immediate, predicated)
00 0 1 LSR (immediate, predicated)
00 1 0 UNALLOCATED
00 1 1 LSL (immediate, predicated)
01 0 0 ASRD
01 0 1 UNALLOCATED
01 1 UNALLOCATED
1x UNALLOCATED

SVE bitwise shift by vector (predicated)

These instructions are under SVE Bitwise Shift - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 R L U 1 0 0 Pg Zm Zdn

Decode fields
Instruction Details
R L U
x 1 0 UNALLOCATED
0 0 0 ASR (vectors)
0 0 1 LSR (vectors)
0 1 1 LSL (vectors)
1 0 0 ASRR
1 0 1 LSRR
1 1 1 LSLR

SVE bitwise shift by wide elements (predicated)

These instructions are under SVE Bitwise Shift - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 R L U 1 0 0 Pg Zm Zdn

Decode fields
Instruction Details
R L U
0 0 0 ASR (wide elements, predicated)
0 0 1 LSR (wide elements, predicated)
0 1 0 UNALLOCATED
0 1 1 LSL (wide elements, predicated)
1 UNALLOCATED

SVE Integer Unary Arithmetic - Predicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 0 op0 101

Decode fields
Instruction details
op0
0x UNALLOCATED
10 SVE integer unary operations (predicated)
11 SVE bitwise unary operations (predicated)

SVE integer unary operations (predicated)

These instructions are under SVE Integer Unary Arithmetic - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 0 opc 1 0 1 Pg Zn Zd

Decode fields
Instruction Details
opc
000 SXTB, SXTH, SXTW — SXTB
001 UXTB, UXTH, UXTW — UXTB
010 SXTB, SXTH, SXTW — SXTH
011 UXTB, UXTH, UXTW — UXTH
100 SXTB, SXTH, SXTW — SXTW
101 UXTB, UXTH, UXTW — UXTW
110 ABS
111 NEG

SVE bitwise unary operations (predicated)

These instructions are under SVE Integer Unary Arithmetic - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 0 1 1 opc 1 0 1 Pg Zn Zd

Decode fields
Instruction Details
opc
000 CLS
001 CLZ
010 CNT
011 CNOT
100 FABS
101 FNEG
110 NOT (vector)
111 UNALLOCATED

SVE integer add/subtract vectors (unpredicated)

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 0 0 0 opc Zn Zd

Decode fields
Instruction Details
opc
000 ADD (vectors, unpredicated)
001 SUB (vectors, unpredicated)
01x UNALLOCATED
100 SQADD (vectors)
101 UQADD (vectors)
110 SQSUB (vectors)
111 UQSUB (vectors)

SVE Bitwise Logical - Unpredicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 001 op0 op1

Decode fields
Instruction details
op0 op1
0 UNALLOCATED
1 00 SVE bitwise logical operations (unpredicated)
1 != 00 UNALLOCATED

SVE bitwise logical operations (unpredicated)

These instructions are under SVE Bitwise Logical - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 opc 1 Zm 0 0 1 1 0 0 Zn Zd

Decode fields
Instruction Details
opc
00 AND (vectors, unpredicated)
01 ORR (vectors, unpredicated)
10 EOR (vectors, unpredicated)
11 BIC (vectors, unpredicated)

SVE Index Generation

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 0100 op0

Decode fields
Instruction details
op0
00 INDEX (immediates)
01 INDEX (scalar, immediate)
10 INDEX (immediate, scalar)
11 INDEX (scalars)

SVE Stack Allocation

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 op0 1 0101 op1

Decode fields
Instruction details
op0 op1
0 0 SVE stack frame adjustment
1 0 SVE stack frame size
x 1 UNALLOCATED

SVE stack frame adjustment

These instructions are under SVE Stack Allocation.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 0 op 1 Rn 0 1 0 1 0 imm6 Rd

Decode fields
Instruction Details
op
0 ADDVL
1 ADDPL
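
As an illustration of the layout above, the sketch below extracts the fields of a stack frame adjustment instruction. It assumes that imm6 (bits 10:5) is a signed 6-bit value, as described on the ADDVL and ADDPL instruction pages rather than in this table. The helper names are hypothetical.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_stack_frame_adjust(insn):
    """Return (mnemonic, Rd, Rn, imm), or None if the fixed bits do not match.

    Fixed bits taken from the diagram: 31:24 = 00000100, 23 = 0, 21 = 1,
    15:11 = 01010. Bit 22 is the op field.
    """
    if (insn & 0xFFA0F800) != 0x04205000:
        return None
    op = (insn >> 22) & 1
    mnemonic = "ADDPL" if op else "ADDVL"
    rd = insn & 0x1F
    rn = (insn >> 16) & 0x1F
    imm = sign_extend((insn >> 5) & 0x3F, 6)
    return (mnemonic, rd, rn, imm)
```

Register number 31 is returned as a plain integer here; how it is interpreted is left to the instruction description.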

SVE stack frame size

These instructions are under SVE Stack Allocation.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 1 op 1 opc2 0 1 0 1 0 imm6 Rd

Decode fields
Instruction Details
op opc2
0 0xxxx UNALLOCATED
0 10xxx UNALLOCATED
0 110xx UNALLOCATED
0 1110x UNALLOCATED
0 11110 UNALLOCATED
0 11111 RDVL
1 UNALLOCATED

SVE Bitwise Shift - Unpredicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 100 op0

Decode fields
Instruction details
op0
0 SVE bitwise shift by wide elements (unpredicated)
1 SVE bitwise shift by immediate (unpredicated)

SVE bitwise shift by wide elements (unpredicated)

These instructions are under SVE Bitwise Shift - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 0 0 opc Zn Zd

Decode fields
Instruction Details
opc
00 ASR (wide elements, unpredicated)
01 LSR (wide elements, unpredicated)
10 UNALLOCATED
11 LSL (wide elements, unpredicated)

SVE bitwise shift by immediate (unpredicated)

These instructions are under SVE Bitwise Shift - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 tszh 1 tszl imm3 1 0 0 1 opc Zn Zd

Decode fields
Instruction Details
opc
00 ASR (immediate, unpredicated)
01 LSR (immediate, unpredicated)
10 UNALLOCATED
11 LSL (immediate, unpredicated)

SVE address generation

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 opc 1 Zm 1 0 1 0 msz Zn Zd

Decode fields
Instruction Details
opc
00 ADR — Unpacked 32-bit signed offsets
01 ADR — Unpacked 32-bit unsigned offsets
1x ADR — Packed offsets

SVE Integer Misc - Unpredicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 1011 op0

Decode fields
Instruction details
op0
0x SVE floating-point trig select coefficient
10 SVE floating-point exponential accelerator
11 SVE constructive prefix (unpredicated)

SVE floating-point trig select coefficient

These instructions are under SVE Integer Misc - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 Zm 1 0 1 1 0 op Zn Zd

Decode fields
Instruction Details
op
0 FTSSEL
1 UNALLOCATED

SVE floating-point exponential accelerator

These instructions are under SVE Integer Misc - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 opc 1 0 1 1 1 0 Zn Zd

Decode fields
Instruction Details
opc
00000 FEXPA
00001 UNALLOCATED
0001x UNALLOCATED
001xx UNALLOCATED
01xxx UNALLOCATED
1xxxx UNALLOCATED

SVE constructive prefix (unpredicated)

These instructions are under SVE Integer Misc - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 opc 1 opc2 1 0 1 1 1 1 Zn Zd

Decode fields
Instruction Details
opc opc2
00 00000 MOVPRFX (unpredicated)
00 00001 UNALLOCATED
00 0001x UNALLOCATED
00 001xx UNALLOCATED
00 01xxx UNALLOCATED
00 1xxxx UNALLOCATED
01 UNALLOCATED
1x UNALLOCATED

SVE Element Count

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000100 1 op0 11 op1

Decode fields
Instruction details
op0 op1
0 00x SVE saturating inc/dec vector by element count
0 100 SVE element count
0 101 UNALLOCATED
1 000 SVE inc/dec vector by element count
1 100 SVE inc/dec register by element count
1 x01 UNALLOCATED
x 01x UNALLOCATED
x 11x SVE saturating inc/dec register by element count

SVE saturating inc/dec vector by element count

These instructions are under SVE Element Count.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 0 imm4 1 1 0 0 D U pattern Zdn

Decode fields
Instruction Details
size D U
00 UNALLOCATED
01 0 0 SQINCH (vector)
01 0 1 UQINCH (vector)
01 1 0 SQDECH (vector)
01 1 1 UQDECH (vector)
10 0 0 SQINCW (vector)
10 0 1 UQINCW (vector)
10 1 0 SQDECW (vector)
10 1 1 UQDECW (vector)
11 0 0 SQINCD (vector)
11 0 1 UQINCD (vector)
11 1 0 SQDECD (vector)
11 1 1 UQDECD (vector)

SVE element count

These instructions are under SVE Element Count.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 0 imm4 1 1 1 0 0 op pattern Rd

Decode fields
Instruction Details
size op
xx 1 UNALLOCATED
00 0 CNTB, CNTD, CNTH, CNTW — CNTB
01 0 CNTB, CNTD, CNTH, CNTW — CNTH
10 0 CNTB, CNTD, CNTH, CNTW — CNTW
11 0 CNTB, CNTD, CNTH, CNTW — CNTD

SVE inc/dec vector by element count

These instructions are under SVE Element Count.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 1 imm4 1 1 0 0 0 D pattern Zdn

Decode fields
Instruction Details
size D
00 UNALLOCATED
01 0 INCD, INCH, INCW (vector) — INCH
01 1 DECD, DECH, DECW (vector) — DECH
10 0 INCD, INCH, INCW (vector) — INCW
10 1 DECD, DECH, DECW (vector) — DECW
11 0 INCD, INCH, INCW (vector) — INCD
11 1 DECD, DECH, DECW (vector) — DECD

SVE inc/dec register by element count

These instructions are under SVE Element Count.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 1 imm4 1 1 1 0 0 D pattern Rdn

Decode fields
Instruction Details
size D
00 0 INCB, INCD, INCH, INCW (scalar) — INCB
00 1 DECB, DECD, DECH, DECW (scalar) — DECB
01 0 INCB, INCD, INCH, INCW (scalar) — INCH
01 1 DECB, DECD, DECH, DECW (scalar) — DECH
10 0 INCB, INCD, INCH, INCW (scalar) — INCW
10 1 DECB, DECD, DECH, DECW (scalar) — DECW
11 0 INCB, INCD, INCH, INCW (scalar) — INCD
11 1 DECB, DECD, DECH, DECW (scalar) — DECD

SVE saturating inc/dec register by element count

These instructions are under SVE Element Count.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 0 size 1 sf imm4 1 1 1 1 D U pattern Rdn

Decode fields
Instruction Details
size sf D U
00 0 0 0 SQINCB — 32-bit
00 0 0 1 UQINCB — 32-bit
00 0 1 0 SQDECB — 32-bit
00 0 1 1 UQDECB — 32-bit
00 1 0 0 SQINCB — 64-bit
00 1 0 1 UQINCB — 64-bit
00 1 1 0 SQDECB — 64-bit
00 1 1 1 UQDECB — 64-bit
01 0 0 0 SQINCH (scalar) — 32-bit
01 0 0 1 UQINCH (scalar) — 32-bit
01 0 1 0 SQDECH (scalar) — 32-bit
01 0 1 1 UQDECH (scalar) — 32-bit
01 1 0 0 SQINCH (scalar) — 64-bit
01 1 0 1 UQINCH (scalar) — 64-bit
01 1 1 0 SQDECH (scalar) — 64-bit
01 1 1 1 UQDECH (scalar) — 64-bit
10 0 0 0 SQINCW (scalar) — 32-bit
10 0 0 1 UQINCW (scalar) — 32-bit
10 0 1 0 SQDECW (scalar) — 32-bit
10 0 1 1 UQDECW (scalar) — 32-bit
10 1 0 0 SQINCW (scalar) — 64-bit
10 1 0 1 UQINCW (scalar) — 64-bit
10 1 1 0 SQDECW (scalar) — 64-bit
10 1 1 1 UQDECW (scalar) — 64-bit
11 0 0 0 SQINCD (scalar) — 32-bit
11 0 0 1 UQINCD (scalar) — 32-bit
11 0 1 0 SQDECD (scalar) — 32-bit
11 0 1 1 UQDECD (scalar) — 32-bit
11 1 0 0 SQINCD (scalar) — 64-bit
11 1 0 1 UQINCD (scalar) — 64-bit
11 1 1 0 SQDECD (scalar) — 64-bit
11 1 1 1 UQDECD (scalar) — 64-bit
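
The 32 rows above follow a regular pattern, so the mnemonic can be composed from the fields instead of being looked up row by row. The sketch below assumes the bit positions implied by the diagram (size in bits 23:22, sf in bit 20, D in bit 11, U in bit 10). The function name is hypothetical, and the "(scalar)" qualifier used in the table is omitted from the returned name.

```python
def decode_sat_incdec_register(insn):
    """Return (mnemonic, width), or None if the fixed bits do not match.

    Fixed bits taken from the diagram: 31:24 = 00000100, 21 = 1,
    15:12 = 1111.
    """
    if (insn & 0xFF20F000) != 0x0420F000:
        return None
    size = (insn >> 22) & 0x3
    sf = (insn >> 20) & 1
    d = (insn >> 11) & 1
    u = (insn >> 10) & 1
    mnemonic = (
        ("UQ" if u else "SQ")        # U selects unsigned or signed saturation
        + ("DEC" if d else "INC")    # D selects decrement or increment
        + "BHWD"[size]               # size selects the element size suffix
    )
    width = "64-bit" if sf else "32-bit"
    return (mnemonic, width)
```

For example, size = 11, sf = 1, D = 1, U = 1 composes to UQDECD with the 64-bit variant, which matches the last row of the table.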

SVE Bitwise Immediate

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 op0 00 op1

Decode fields
Instruction details
op0 op1
11 00 DUPM
!= 11 00 SVE bitwise logical with immediate (unpredicated)
!= 00 UNALLOCATED

SVE bitwise logical with immediate (unpredicated)

These instructions are under SVE Bitwise Immediate.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 opc 0 0 0 0 imm13 Zdn

The following constraint also applies to this encoding: opc != 11

Decode fields
Instruction Details
opc
00 ORR (immediate)
01 EOR (immediate)
10 AND (immediate)

SVE Integer Wide Immediate - Predicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 01 op0

Decode fields
Instruction details
op0
0xx SVE copy integer immediate (predicated)
10x UNALLOCATED
110 FCPY
111 UNALLOCATED

SVE copy integer immediate (predicated)

These instructions are under SVE Integer Wide Immediate - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 0 1 Pg 0 M sh imm8 Zd

Decode fields
Instruction Details
M
0 CPY (immediate, zeroing)
1 CPY (immediate, merging)

SVE Permute Vector - Unpredicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 1 op0 op1 001110

Decode fields
Instruction details
op0 op1
00 000 DUP (scalar)
00 100 INSR (scalar)
00 x10 UNALLOCATED
00 xx1 UNALLOCATED
01 UNALLOCATED
10 0xx SVE unpack vector elements
10 100 INSR (SIMD&FP scalar)
10 110 UNALLOCATED
10 1x1 UNALLOCATED
11 000 REV (vector)
11 != 000 UNALLOCATED

SVE unpack vector elements

These instructions are under SVE Permute Vector - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 U H 0 0 1 1 1 0 Zn Zd

Decode fields
Instruction Details
U H
0 0 SUNPKHI, SUNPKLO — SUNPKLO
0 1 SUNPKHI, SUNPKLO — SUNPKHI
1 0 UUNPKHI, UUNPKLO — UUNPKLO
1 1 UUNPKHI, UUNPKLO — UUNPKHI

SVE Permute Predicate

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 op0 1 op1 010 op2 op3

Decode fields
Instruction details
op0 op1 op2 op3
00 1000x 0000 0 SVE unpack predicate elements
01 1000x 0000 0 UNALLOCATED
10 1000x 0000 0 UNALLOCATED
11 1000x 0000 0 UNALLOCATED
0xxxx xxx0 0 SVE permute predicate elements
0xxxx xxx1 0 UNALLOCATED
10100 0000 0 REV (predicate)
10101 0000 0 UNALLOCATED
10x0x 1000 0 UNALLOCATED
10x0x x100 0 UNALLOCATED
10x0x xx10 0 UNALLOCATED
10x0x xxx1 0 UNALLOCATED
10x1x 0 UNALLOCATED
11xxx 0 UNALLOCATED
1 UNALLOCATED

SVE unpack predicate elements

These instructions are under SVE Permute Predicate.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 H 0 1 0 0 0 0 0 Pn 0 Pd

Decode fields
Instruction Details
H
0 PUNPKHI, PUNPKLO — PUNPKLO

1 PUNPKHI, PUNPKLO — PUNPKHI

SVE permute predicate elements

These instructions are under SVE Permute Predicate.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 Pm 0 1 0 opc H 0 Pn 0 Pd

Decode fields
Instruction Details
opc H
00 0 ZIP1, ZIP2 (predicates) — ZIP1
00 1 ZIP1, ZIP2 (predicates) — ZIP2
01 0 UZP1, UZP2 (predicates) — UZP1
01 1 UZP1, UZP2 (predicates) — UZP2
10 0 TRN1, TRN2 (predicates) — TRN1
10 1 TRN1, TRN2 (predicates) — TRN2
11 UNALLOCATED

SVE permute vector elements

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 Zm 0 1 1 opc Zn Zd

Decode fields
Instruction Details
opc
000 ZIP1, ZIP2 (vectors) — ZIP1
001 ZIP1, ZIP2 (vectors) — ZIP2
010 UZP1, UZP2 (vectors) — UZP1
011 UZP1, UZP2 (vectors) — UZP2
100 TRN1, TRN2 (vectors) — TRN1
101 TRN1, TRN2 (vectors) — TRN2
11x UNALLOCATED

SVE Permute Vector - Predicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00000101 1 op0 op1 op2 10 op3

Decode fields
Instruction details
op0 op1 op2 op3
0 000 0 0 CPY (SIMD&FP scalar)
0 000 1 0 COMPACT
0 000 1 SVE extract element to general register
0 001 0 SVE extract element to SIMD&FP scalar register
0 01x 0 SVE reverse within elements
0 01x 1 UNALLOCATED
0 100 0 1 CPY (scalar)
0 100 1 1 UNALLOCATED
0 100 0 SVE conditionally broadcast element to vector
0 101 0 SVE conditionally extract element to SIMD&FP scalar
0 110 0 0 SPLICE
0 110 0 1 UNALLOCATED
0 110 1 UNALLOCATED
0 111 0 0 UNALLOCATED
0 111 0 1 UNALLOCATED
0 111 1 UNALLOCATED
0 x01 1 UNALLOCATED
1 000 0 UNALLOCATED
1 000 1 SVE conditionally extract element to general register
1 != 000 UNALLOCATED

SVE extract element to general register

These instructions are under SVE Permute Vector - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 0 B 1 0 1 Pg Zn Rd

Decode fields
Instruction Details
B
0 LASTA (scalar)
1 LASTB (scalar)

SVE extract element to SIMD&FP scalar register

These instructions are under SVE Permute Vector - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 0 1 B 1 0 0 Pg Zn Vd

Decode fields
Instruction Details
B
0 LASTA (SIMD&FP scalar)
1 LASTB (SIMD&FP scalar)

SVE reverse within elements

These instructions are under SVE Permute Vector - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 0 1 opc 1 0 0 Pg Zn Zd

Decode fields
Instruction Details
opc
00 REVB, REVH, REVW — REVB
01 REVB, REVH, REVW — REVH
10 REVB, REVH, REVW — REVW
11 RBIT

SVE conditionally broadcast element to vector

These instructions are under SVE Permute Vector - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 0 B 1 0 0 Pg Zm Zdn

Decode fields
Instruction Details
B
0 CLASTA (vectors)
1 CLASTB (vectors)

SVE conditionally extract element to SIMD&FP scalar

These instructions are under SVE Permute Vector - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 0 1 0 1 B 1 0 0 Pg Zm Vdn

Decode fields
Instruction Details
B
0 CLASTA (SIMD&FP scalar)
1 CLASTB (SIMD&FP scalar)

SVE conditionally extract element to general register

These instructions are under SVE Permute Vector - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 size 1 1 0 0 0 B 1 0 1 Pg Zm Rdn

Decode fields
Instruction Details
B
0 CLASTA (scalar)
1 CLASTB (scalar)

SVE Permute Vector - Extract

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
000001010 op0 1 000

Decode fields
Instruction details
op0
0 EXT
1 UNALLOCATED

SVE Permute Vector - Segments

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
000001011 op0 1 000

Decode fields
Instruction details
op0
0 SVE permute vector segments
1 UNALLOCATED

SVE permute vector segments

These instructions are under SVE Permute Vector - Segments.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 0 0 0 1 0 1 1 0 1 Zm 0 0 0 opc H Zn Zd

Decode fields
Instruction Details Feature
opc H
00 0 ZIP1, ZIP2 (vectors) — ZIP1 FEAT_F64MM
00 1 ZIP1, ZIP2 (vectors) — ZIP2 FEAT_F64MM
01 0 UZP1, UZP2 (vectors) — UZP1 FEAT_F64MM
01 1 UZP1, UZP2 (vectors) — UZP2 FEAT_F64MM
10 UNALLOCATED -
11 0 TRN1, TRN2 (vectors) — TRN1 FEAT_F64MM
11 1 TRN1, TRN2 (vectors) — TRN2 FEAT_F64MM

SVE Integer Compare - Vectors

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100100 0 op0

Decode fields
Instruction details
op0
0 SVE integer compare vectors
1 SVE integer compare with wide elements

SVE integer compare vectors

These instructions are under SVE Integer Compare - Vectors.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm op 0 o2 Pg Zn ne Pd

Decode fields
Instruction Details
op o2 ne
0 0 0 CMP<cc> (vectors) — CMPHS
0 0 1 CMP<cc> (vectors) — CMPHI
0 1 0 CMP<cc> (wide elements) — CMPEQ
0 1 1 CMP<cc> (wide elements) — CMPNE
1 0 0 CMP<cc> (vectors) — CMPGE
1 0 1 CMP<cc> (vectors) — CMPGT
1 1 0 CMP<cc> (vectors) — CMPEQ
1 1 1 CMP<cc> (vectors) — CMPNE

SVE integer compare with wide elements

These instructions are under SVE Integer Compare - Vectors.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 0 Zm U 1 lt Pg Zn ne Pd

Decode fields
Instruction Details
U lt ne
0 0 0 CMP<cc> (wide elements) — CMPGE
0 0 1 CMP<cc> (wide elements) — CMPGT
0 1 0 CMP<cc> (wide elements) — CMPLT
0 1 1 CMP<cc> (wide elements) — CMPLE
1 0 0 CMP<cc> (wide elements) — CMPHS
1 0 1 CMP<cc> (wide elements) — CMPHI
1 1 0 CMP<cc> (wide elements) — CMPLO
1 1 1 CMP<cc> (wide elements) — CMPLS

SVE integer compare with unsigned immediate

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 0 size 1 imm7 lt Pg Zn ne Pd

Decode fields
Instruction Details
lt ne
0 0 CMP<cc> (immediate) — CMPHS
0 1 CMP<cc> (immediate) — CMPHI
1 0 CMP<cc> (immediate) — CMPLO
1 1 CMP<cc> (immediate) — CMPLS

SVE integer compare with signed immediate

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 imm5 op 0 o2 Pg Zn ne Pd

Decode fields
Instruction Details
op o2 ne
0 0 0 CMP<cc> (immediate) — CMPGE
0 0 1 CMP<cc> (immediate) — CMPGT
0 1 0 CMP<cc> (immediate) — CMPLT
0 1 1 CMP<cc> (immediate) — CMPLE
1 0 0 CMP<cc> (immediate) — CMPEQ
1 0 1 CMP<cc> (immediate) — CMPNE
1 1 UNALLOCATED
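
A minimal sketch of how this table could drive a decoder is shown below. It assumes the bit positions implied by the diagram (imm5 in bits 20:16, op in bit 15, o2 in bit 13, ne in bit 4) and that imm5 is a signed 5-bit value, as described on the CMP<cc> (immediate) instruction page. The names are hypothetical.

```python
# Keys are the concatenation op:o2:ne from the table above.
CMP_SIGNED_IMM_NAMES = {
    0b000: "CMPGE",
    0b001: "CMPGT",
    0b010: "CMPLT",
    0b011: "CMPLE",
    0b100: "CMPEQ",
    0b101: "CMPNE",
}


def decode_cmp_signed_imm(insn):
    """Return (name, imm), or None if the fixed bits do not match.

    Fixed bits taken from the diagram: 31:24 = 00100101, 21 = 0, 14 = 0.
    """
    if (insn & 0xFF204000) != 0x25000000:
        return None
    op = (insn >> 15) & 1
    o2 = (insn >> 13) & 1
    ne = (insn >> 4) & 1
    key = (op << 2) | (o2 << 1) | ne
    imm5 = (insn >> 16) & 0x1F
    imm = imm5 - 32 if imm5 & 0x10 else imm5  # two's complement, 5 bits
    return (CMP_SIGNED_IMM_NAMES.get(key, "UNALLOCATED"), imm)
```

The two op = 1, o2 = 1 combinations fall through to "UNALLOCATED", mirroring the last row of the table.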

SVE predicate logical operations

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 0 Pm 0 1 Pg o2 Pn o3 Pd

Decode fields
Instruction Details
op S o2 o3
0 0 0 0 AND (predicates)
0 0 0 1 BIC (predicates)
0 0 1 0 EOR (predicates)
0 0 1 1 SEL (predicates)
0 1 0 0 ANDS
0 1 0 1 BICS
0 1 1 0 EORS
0 1 1 1 UNALLOCATED
1 0 0 0 ORR (predicates)
1 0 0 1 ORN (predicates)
1 0 1 0 NOR
1 0 1 1 NAND
1 1 0 0 ORRS
1 1 0 1 ORNS
1 1 1 0 NORS
1 1 1 1 NANDS
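
Because all four decode fields are single bits, the table above maps directly onto a 16-entry list indexed by op:S:o2:o3. The sketch below assumes the bit positions implied by the diagram (op in bit 23, S in bit 22, o2 in bit 9, o3 in bit 4). The names are hypothetical, and the "(predicates)" qualifier is omitted from the returned strings.

```python
# Index is the 4-bit value op:S:o2:o3, in the same order as the table rows.
PRED_LOGICAL_NAMES = [
    "AND", "BIC", "EOR", "SEL",
    "ANDS", "BICS", "EORS", "UNALLOCATED",
    "ORR", "ORN", "NOR", "NAND",
    "ORRS", "ORNS", "NORS", "NANDS",
]


def decode_predicate_logical(insn):
    """Return the table entry, or None if the fixed bits do not match.

    Fixed bits taken from the diagram: 31:24 = 00100101, 21:20 = 00,
    15:14 = 01.
    """
    if (insn & 0xFF30C000) != 0x25004000:
        return None
    op = (insn >> 23) & 1
    s = (insn >> 22) & 1
    o2 = (insn >> 9) & 1
    o3 = (insn >> 4) & 1
    return PRED_LOGICAL_NAMES[(op << 3) | (s << 2) | (o2 << 1) | o3]
```

Keeping the list in table order makes it easy to compare the code against the rows by eye.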

SVE Propagate Break

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 00 11 op0

Decode fields
Instruction details
op0
0 SVE propagate break from previous partition
1 UNALLOCATED

SVE propagate break from previous partition

These instructions are under SVE Propagate Break.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 0 Pm 1 1 Pg 0 Pn B Pd

Decode fields
Instruction Details
op S B
0 0 0 BRKPA
0 0 1 BRKPB
0 1 0 BRKPAS
0 1 1 BRKPBS
1 UNALLOCATED

SVE Partition Break

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 op0 01 op1 01 op2 op3

Decode fields
Instruction details
op0 op1 op2 op3
0 1000 0 0 SVE propagate break to next partition
0 1000 0 1 UNALLOCATED
0 x000 1 UNALLOCATED
0 x1xx UNALLOCATED
0 xx1x UNALLOCATED
0 xxx1 UNALLOCATED
1 0000 1 UNALLOCATED
1 != 0000 UNALLOCATED
0000 0 SVE partition break condition

SVE propagate break to next partition

These instructions are under SVE Partition Break.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 0 S 0 1 1 0 0 0 0 1 Pg 0 Pn 0 Pdm

Decode fields
Instruction Details
S
0 BRKN
1 BRKNS

SVE partition break condition

These instructions are under SVE Partition Break.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 B S 0 1 0 0 0 0 0 1 Pg 0 Pn M Pd

Decode fields
Instruction Details
B S M
x 1 1 UNALLOCATED
0 0 x BRKA
0 1 0 BRKAS
1 0 x BRKB
1 1 0 BRKBS

SVE Predicate Misc

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 01 op0 11 op1 op2 op3 op4

Decode fields
Instruction details
op0 op1 op2 op3 op4
0000 x0 0 SVE predicate test
0100 x0 0 UNALLOCATED
0x10 x0 0 UNALLOCATED
0xx1 x0 0 UNALLOCATED
0xxx x1 0 UNALLOCATED
1000 000 00 0 SVE predicate first active
1000 000 != 00 0 UNALLOCATED
1000 100 10 0000 0 SVE predicate zero
1000 100 10 != 0000 0 UNALLOCATED
1000 110 00 0 SVE predicate read from FFR (predicated)
1001 000 0x 0 UNALLOCATED
1001 000 10 0 PNEXT
1001 000 11 0 UNALLOCATED
1001 100 10 0 UNALLOCATED
1001 110 00 0000 0 SVE predicate read from FFR (unpredicated)
1001 110 00 != 0000 0 UNALLOCATED
100x 010 0 UNALLOCATED
100x 100 0x 0 SVE predicate initialize
100x 100 11 0 UNALLOCATED
100x 110 != 00 0 UNALLOCATED
100x xx1 0 UNALLOCATED
110x 0 UNALLOCATED
1x1x 0 UNALLOCATED
1 UNALLOCATED

SVE predicate test

These instructions are under SVE Predicate Misc.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 0 0 0 0 1 1 Pg 0 Pn 0 opc2

Decode fields
Instruction Details
op S opc2
0 0 UNALLOCATED
0 1 0000 PTEST
0 1 0001 UNALLOCATED
0 1 001x UNALLOCATED
0 1 01xx UNALLOCATED
0 1 1xxx UNALLOCATED
1 UNALLOCATED

SVE predicate first active

These instructions are under SVE Predicate Misc.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 0 1 1 0 0 0 0 0 Pg 0 Pdn

Decode fields
Instruction Details
op S
0 0 UNALLOCATED
0 1 PFIRST
1 UNALLOCATED

SVE predicate zero

These instructions are under SVE Predicate Misc.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 0 1 1 1 0 0 1 0 0 0 0 0 0 Pd

Decode fields
Instruction Details
op S
0 0 PFALSE
0 1 UNALLOCATED
1 UNALLOCATED

SVE predicate read from FFR (predicated)

These instructions are under SVE Predicate Misc.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 0 1 1 1 1 0 0 0 Pg 0 Pd

Decode fields
Instruction Details
op S
0 0 RDFFR (predicated)
0 1 RDFFRS
1 UNALLOCATED

SVE predicate read from FFR (unpredicated)

These instructions are under SVE Predicate Misc.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op S 0 1 1 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 Pd

Decode fields
Instruction Details
op S
0 0 RDFFR (unpredicated)
0 1 UNALLOCATED
1 UNALLOCATED

SVE predicate initialize

These instructions are under SVE Predicate Misc.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 0 1 1 0 0 S 1 1 1 0 0 0 pattern 0 Pd

Decode fields
Instruction Details
S
0 PTRUE
1 PTRUES

SVE Integer Compare - Scalars

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 1 00 op0 op1 op2

Decode fields
Instruction details
op0 op1 op2
0 SVE integer compare scalar count and limit
1 000 0000 SVE conditionally terminate scalars
1 000 != 0000 UNALLOCATED
1 != 000 UNALLOCATED

SVE integer compare scalar count and limit

These instructions are under SVE Integer Compare - Scalars.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 Rm 0 0 0 sf U lt Rn eq Pd

Decode fields
Instruction Details
U lt eq
x 0 x UNALLOCATED
0 1 0 WHILELT
0 1 1 WHILELE
1 1 0 WHILELO
1 1 1 WHILELS

SVE conditionally terminate scalars

These instructions are under SVE Integer Compare - Scalars.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 op sz 1 Rm 0 0 1 0 0 0 Rn ne 0 0 0 0

Decode fields
Instruction Details
op ne
0 UNALLOCATED
1 0 CTERMEQ, CTERMNE — CTERMEQ
1 1 CTERMEQ, CTERMNE — CTERMNE

SVE Integer Wide Immediate - Unpredicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 1 op0 op1 11

Decode fields
Instruction details
op0 op1
00 SVE integer add/subtract immediate (unpredicated)
01 SVE integer min/max immediate (unpredicated)
10 SVE integer multiply immediate (unpredicated)
11 0 SVE broadcast integer immediate (unpredicated)
11 1 SVE broadcast floating-point immediate (unpredicated)

SVE integer add/subtract immediate (unpredicated)

These instructions are under SVE Integer Wide Immediate - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 opc 1 1 sh imm8 Zdn

Decode fields
Instruction Details
opc
000 ADD (immediate)
001 SUB (immediate)
010 UNALLOCATED
011 SUBR (immediate)
100 SQADD (immediate)
101 UQADD (immediate)
110 SQSUB (immediate)
111 UQSUB (immediate)

SVE integer min/max immediate (unpredicated)

These instructions are under SVE Integer Wide Immediate - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 opc 1 1 o2 imm8 Zdn

Decode fields
Instruction Details
opc o2
0xx 1 UNALLOCATED
000 0 SMAX (immediate)
001 0 UMAX (immediate)
010 0 SMIN (immediate)
011 0 UMIN (immediate)
1xx UNALLOCATED

SVE integer multiply immediate (unpredicated)

These instructions are under SVE Integer Wide Immediate - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 0 opc 1 1 o2 imm8 Zdn

Decode fields
Instruction Details
opc o2
000 0 MUL (immediate)
000 1 UNALLOCATED
001 UNALLOCATED
01x UNALLOCATED
1xx UNALLOCATED

SVE broadcast integer immediate (unpredicated)

These instructions are under SVE Integer Wide Immediate - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 opc 0 1 1 sh imm8 Zd

Decode fields
Instruction Details
opc
00 DUP (immediate)
01 UNALLOCATED
1x UNALLOCATED

SVE broadcast floating-point immediate (unpredicated)

These instructions are under SVE Integer Wide Immediate - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 1 1 opc 1 1 1 o2 imm8 Zd

Decode fields
Instruction Details
opc o2
00 0 FDUP
00 1 UNALLOCATED
01 UNALLOCATED
1x UNALLOCATED

SVE Predicate Count

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 100 10 op0

Decode fields
Instruction details
op0
0 SVE predicate count
1 UNALLOCATED

SVE predicate count

These instructions are under SVE Predicate Count.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 0 opc 1 0 Pg 0 Pn Rd

Decode fields
Instruction Details
opc
000 CNTP
001 UNALLOCATED
01x UNALLOCATED
1xx UNALLOCATED

SVE Inc/Dec by Predicate Count

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 101 op0 1000 op1

Decode fields
Instruction details
op0 op1
0 0 SVE saturating inc/dec vector by predicate count
0 1 SVE saturating inc/dec register by predicate count
1 0 SVE inc/dec vector by predicate count
1 1 SVE inc/dec register by predicate count
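
The four-way split above can be sketched as a lookup on `op0` and `op1`. This is an illustrative sketch only: the function name is hypothetical, and the positions used for `op0` (bit 18) and `op1` (bit 11) are inferred from the per-class diagrams in the sections that follow, since the group diagram does not show bit numbers for each field.

```python
_INC_DEC_CLASSES = {
    (0, 0): "SVE saturating inc/dec vector by predicate count",
    (0, 1): "SVE saturating inc/dec register by predicate count",
    (1, 0): "SVE inc/dec vector by predicate count",
    (1, 1): "SVE inc/dec register by predicate count",
}

def classify_sve_inc_dec_by_pred_count(word):
    """Return the encoding class name, or None if the word is outside this group."""
    # Fixed bits: [31:24]=00100101, [21:19]=101, [15:12]=1000.
    if (word & 0xFF38F000) != 0x25288000:
        return None
    op0 = (word >> 18) & 1   # assumed position, see lead-in
    op1 = (word >> 11) & 1   # assumed position, see lead-in
    return _INC_DEC_CLASSES[(op0, op1)]
```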

SVE saturating inc/dec vector by predicate count

These instructions are under SVE Inc/Dec by Predicate Count.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 D U 1 0 0 0 0 opc Pm Zdn

Decode fields
Instruction Details
D U opc
    01 UNALLOCATED
    1x UNALLOCATED
0 0 00 SQINCP (vector)
0 1 00 UQINCP (vector)
1 0 00 SQDECP (vector)
1 1 00 UQDECP (vector)

SVE saturating inc/dec register by predicate count

These instructions are under SVE Inc/Dec by Predicate Count.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 0 D U 1 0 0 0 1 sf op Pm Rdn

Decode fields
Instruction Details
D U sf op
      1 UNALLOCATED
0 0 0 0 SQINCP (scalar) — 32-bit
0 0 1 0 SQINCP (scalar) — 64-bit
0 1 0 0 UQINCP (scalar) — 32-bit
0 1 1 0 UQINCP (scalar) — 64-bit
1 0 0 0 SQDECP (scalar) — 32-bit
1 0 1 0 SQDECP (scalar) — 64-bit
1 1 0 0 UQDECP (scalar) — 32-bit
1 1 1 0 UQDECP (scalar) — 64-bit
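
As a hedged illustration of the table above, the `D`, `U`, `sf`, and `op` bits can be mapped to a mnemonic and operand width as follows. The helper name and return shape are hypothetical; only the bit positions and names come from the diagram and table.

```python
_SAT_NAMES = ("SQINCP", "UQINCP", "SQDECP", "UQDECP")  # indexed by D:U

def decode_sve_sat_inc_dec_scalar(word):
    """Return (mnemonic, width), "UNALLOCATED", or None if outside this class."""
    # Fixed bits: [31:24]=00100101, [21:18]=1010, [15:11]=10001.
    if (word & 0xFF3CF800) != 0x25288800:
        return None
    d = (word >> 17) & 1
    u = (word >> 16) & 1
    sf = (word >> 10) & 1
    op = (word >> 9) & 1
    if op == 1:
        return "UNALLOCATED"
    return (_SAT_NAMES[(d << 1) | u] + " (scalar)", 64 if sf else 32)
```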

SVE inc/dec vector by predicate count

These instructions are under SVE Inc/Dec by Predicate Count.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 op D 1 0 0 0 0 opc2 Pm Zdn

Decode fields
Instruction Details
op D opc2
0   01 UNALLOCATED
0   1x UNALLOCATED
0 0 00 INCP (vector)
0 1 00 DECP (vector)
1 UNALLOCATED

SVE inc/dec register by predicate count

These instructions are under SVE Inc/Dec by Predicate Count.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 size 1 0 1 1 op D 1 0 0 0 1 opc2 Pm Rdn

Decode fields
Instruction Details
op D opc2
0   01 UNALLOCATED
0   1x UNALLOCATED
0 0 00 INCP (scalar)
0 1 00 DECP (scalar)
1 UNALLOCATED

SVE Write FFR

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
00100101 101 op0 op1 1001 op2 op3 op4

Decode fields
Instruction details
op0 op1    op2    op3  op4
0   00     000         00000    SVE FFR write from predicate
1   00     000    0000 00000    SVE FFR initialise
1   00     000    1xxx 00000    UNALLOCATED
1   00     000    x1xx 00000    UNALLOCATED
1   00     000    xx1x 00000    UNALLOCATED
1   00     000    xxx1 00000    UNALLOCATED
    00     000         != 00000 UNALLOCATED
    00     != 000               UNALLOCATED
    != 00                       UNALLOCATED

SVE FFR write from predicate

These instructions are under SVE Write FFR.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 opc 1 0 1 0 0 0 1 0 0 1 0 0 0 Pn 0 0 0 0 0

Decode fields
Instruction Details
opc
00 WRFFR
01 UNALLOCATED
1x UNALLOCATED

SVE FFR initialise

These instructions are under SVE Write FFR.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 0 1 0 0 1 0 1 opc 1 0 1 1 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

Decode fields
Instruction Details
opc
00 SETFFR
01 UNALLOCATED
1x UNALLOCATED
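
Because almost every bit of the two FFR write encodings is fixed, they can be recognised with two simple comparisons. The sketch below is illustrative only; the function name `decode_sve_write_ffr` is hypothetical, and any word that is not WRFFR or SETFFR is reported as None without distinguishing UNALLOCATED encodings from other groups.

```python
def decode_sve_write_ffr(word):
    """Return ("WRFFR", Pn), ("SETFFR", None), or None for anything else."""
    # WRFFR: every bit is fixed except Pn in bits [8:5].
    if (word & 0xFFFFFE1F) == 0x25289000:
        return ("WRFFR", (word >> 5) & 0xF)
    # SETFFR: all 32 bits are fixed.
    if word == 0x252C9000:
        return ("SETFFR", None)
    return None
```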

SVE Integer Multiply-Add - Unpredicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01000100 0 0 op0 op1 op2

Decode fields
Instruction details
op0 op1 op2
0 000 SVE integer dot product (unpredicated)
0 != 000 UNALLOCATED
1 0xx UNALLOCATED
1 10x UNALLOCATED
1 110 UNALLOCATED
1 111 0 SVE mixed sign dot product
1 111 1 UNALLOCATED

SVE integer dot product (unpredicated)

These instructions are under SVE Integer Multiply-Add - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 0 0 0 0 U Zn Zda

Decode fields
Instruction Details
U
0 SDOT (vectors)
1 UDOT (vectors)
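
A minimal sketch, assuming the layout in the diagram above, of selecting between SDOT and UDOT on the `U` bit and extracting the register fields. The helper name is hypothetical, and the sketch does not check which `size` values are valid for these instructions.

```python
def decode_sve_dot_vectors(word):
    """Decode a 32-bit word as SVE integer dot product (unpredicated)."""
    # Fixed bits: [31:24]=01000100, [21]=0, [15:11]=00000.
    if (word & 0xFF20F800) != 0x44000000:
        return None
    u = (word >> 10) & 1
    return {
        "name": "UDOT (vectors)" if u else "SDOT (vectors)",
        "size": (word >> 22) & 0x3,   # bits [23:22]
        "Zm": (word >> 16) & 0x1F,    # bits [20:16]
        "Zn": (word >> 5) & 0x1F,     # bits [9:5]
        "Zda": word & 0x1F,           # bits [4:0]
    }
```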

SVE mixed sign dot product

These instructions are under SVE Integer Multiply-Add - Unpredicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 0 Zm 0 1 1 1 1 0 Zn Zda

Decode fields
Instruction Details Feature
size
0x UNALLOCATED -
10 USDOT (vectors) FEAT_I8MM
11 UNALLOCATED -

SVE Multiply - Indexed

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01000100 1 op0 op1

Decode fields
Instruction details
op0 op1
000 00 SVE integer dot product (indexed)
000 01 UNALLOCATED
000 10 UNALLOCATED
000 11 SVE mixed sign dot product (indexed)
!= 000 UNALLOCATED

SVE integer dot product (indexed)

These instructions are under SVE Multiply - Indexed.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 1 opc 0 0 0 0 0 U Zn Zda

Decode fields
Instruction Details
size U
0x UNALLOCATED
10 0 SDOT (indexed) — 32-bit
10 1 UDOT (indexed) — 32-bit
11 0 SDOT (indexed) — 64-bit
11 1 UDOT (indexed) — 64-bit

SVE mixed sign dot product (indexed)

These instructions are under SVE Multiply - Indexed.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 0 size 1 opc 0 0 0 1 1 U Zn Zda

Decode fields
Instruction Details Feature
size U
0x UNALLOCATED -
10 0 USDOT (indexed) FEAT_I8MM
10 1 SUDOT FEAT_I8MM
11 UNALLOCATED -

SVE Misc

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01000101 0 10 op0

Decode fields
Instruction details
op0
00xx UNALLOCATED
010x UNALLOCATED
0110 SVE integer matrix multiply accumulate
0111 UNALLOCATED
1xxx UNALLOCATED

SVE integer matrix multiply accumulate

These instructions are under SVE Misc.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 0 0 0 1 0 1 uns 0 Zm 1 0 0 1 1 0 Zn Zd

Decode fields
Instruction Details Feature
uns
00 SMMLA FEAT_I8MM
01 UNALLOCATED -
10 USMMLA FEAT_I8MM
11 UMMLA FEAT_I8MM
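
For illustration, the `uns` field lookup for the integer matrix multiply-accumulate class can be sketched as a small table. The function name is hypothetical, and the code is a reading aid rather than reference pseudocode; all three allocated encodings require FEAT_I8MM according to the table.

```python
_MMLA_BY_UNS = {0b00: "SMMLA", 0b10: "USMMLA", 0b11: "UMMLA"}

def decode_sve_int_mmla(word):
    """Return the mnemonic, "UNALLOCATED", or None if outside this class."""
    # Fixed bits: [31:24]=01000101, [21]=0, [15:10]=100110.
    if (word & 0xFF20FC00) != 0x45009800:
        return None
    uns = (word >> 22) & 0x3   # bits [23:22]
    return _MMLA_BY_UNS.get(uns, "UNALLOCATED")
```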

SVE floating-point convert precision odd elements

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 opc 0 0 1 0 opc2 1 0 1 Pg Zn Zd

Decode fields
Instruction Details Feature
opc opc2
0x UNALLOCATED -
10 0x UNALLOCATED -
10 10 BFCVTNT FEAT_BF16
10 11 UNALLOCATED -
11 UNALLOCATED -

SVE floating-point multiply-add (indexed)

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 1 opc 0 0 0 0 0 op Zn Zda

Decode fields
Instruction Details
size op
0x 0 FMLA (indexed) — half-precision
0x 1 FMLS (indexed) — half-precision
10 0 FMLA (indexed) — single-precision
10 1 FMLS (indexed) — single-precision
11 0 FMLA (indexed) — double-precision
11 1 FMLS (indexed) — double-precision
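
The `size` and `op` columns above can be read as two independent selectors: `size` picks the precision and `op` picks FMLA or FMLS. The following sketch shows that reading; the helper name is hypothetical and the contents of the `opc` field (index and register bits) are not decoded.

```python
_PRECISION_BY_SIZE = ("half-precision", "half-precision",
                      "single-precision", "double-precision")

def decode_sve_fmla_indexed(word):
    """Return (mnemonic, precision), or None if the fixed bits do not match."""
    # Fixed bits: [31:24]=01100100, [21]=1, [15:11]=00000.
    if (word & 0xFF20F800) != 0x64200000:
        return None
    size = (word >> 22) & 0x3   # bits [23:22]; 0x selects half-precision
    op = (word >> 10) & 1       # bit [10]
    return ("FMLS (indexed)" if op else "FMLA (indexed)",
            _PRECISION_BY_SIZE[size])
```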

SVE floating-point complex multiply-add (indexed)

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 1 opc 0 0 0 1 rot Zn Zda

Decode fields
Instruction Details
size
0x UNALLOCATED
10 FCMLA (indexed) — half-precision
11 FCMLA (indexed) — single-precision

SVE floating-point multiply (indexed)

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 size 1 opc 0 0 1 0 0 0 Zn Zd

Decode fields
Instruction Details
size
0x FMUL (indexed) — half-precision
10 FMUL (indexed) — single-precision
11 FMUL (indexed) — double-precision

SVE Floating Point Widening Multiply-Add - Indexed

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100100 op0 1 01 op1 0 op2

Decode fields
Instruction details
op0 op1 op2
0 0 00 SVE BFloat16 floating-point dot product (indexed)
0 0 != 00 UNALLOCATED
0 1 UNALLOCATED
1 SVE floating-point multiply-add long (indexed)

SVE BFloat16 floating-point dot product (indexed)

These instructions are under SVE Floating Point Widening Multiply-Add - Indexed.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 op 1 i2 Zm 0 1 0 0 0 0 Zn Zda

Decode fields
Instruction Details Feature
op
0 UNALLOCATED -
1 BFDOT (indexed) FEAT_BF16

SVE floating-point multiply-add long (indexed)

These instructions are under SVE Floating Point Widening Multiply-Add - Indexed.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 o2 1 i3h Zm 0 1 op 0 i3l T Zn Zda

Decode fields
Instruction Details Feature
o2 op T
0 UNALLOCATED -
1 0 0 BFMLALB (indexed) FEAT_BF16
1 0 1 BFMLALT (indexed) FEAT_BF16
1 1 UNALLOCATED -

SVE Floating Point Widening Multiply-Add

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100100 op0 1 10 op1 00 op2

Decode fields
Instruction details
op0 op1 op2
0 0 0 SVE BFloat16 floating-point dot product
0 0 1 UNALLOCATED
0 1 UNALLOCATED
1 SVE floating-point multiply-add long

SVE BFloat16 floating-point dot product

These instructions are under SVE Floating Point Widening Multiply-Add.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 0 op 1 Zm 1 0 0 0 0 0 Zn Zda

Decode fields
Instruction Details Feature
op
0 UNALLOCATED -
1 BFDOT (vectors) FEAT_BF16

SVE floating-point multiply-add long

These instructions are under SVE Floating Point Widening Multiply-Add.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 1 o2 1 Zm 1 0 op 0 0 T Zn Zda

Decode fields
Instruction Details Feature
o2 op T
0 UNALLOCATED -
1 0 0 BFMLALB (vectors) FEAT_BF16
1 0 1 BFMLALT (vectors) FEAT_BF16
1 1 UNALLOCATED -

SVE floating point matrix multiply accumulate

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 0 opc 1 Zm 1 1 1 0 0 1 Zn Zda

Decode fields
Instruction Details Feature
opc
00 UNALLOCATED -
01 BFMMLA FEAT_BF16
10 FMMLA — 32-bit element FEAT_F32MM
11 FMMLA — 64-bit element FEAT_F64MM
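
As an illustrative sketch of the table above, the `opc` field can be mapped to both a mnemonic and the feature that gates it. The function name and the tuple layout are hypothetical; the feature names are the ones listed in the table.

```python
_FP_MMLA_BY_OPC = {
    0b01: ("BFMMLA", "FEAT_BF16"),
    0b10: ("FMMLA (32-bit element)", "FEAT_F32MM"),
    0b11: ("FMMLA (64-bit element)", "FEAT_F64MM"),
}

def decode_sve_fp_mmla(word):
    """Return (mnemonic, feature), ("UNALLOCATED", None), or None."""
    # Fixed bits: [31:24]=01100100, [21]=1, [15:10]=111001.
    if (word & 0xFF20FC00) != 0x6420E400:
        return None
    opc = (word >> 22) & 0x3   # bits [23:22]
    return _FP_MMLA_BY_OPC.get(opc, ("UNALLOCATED", None))
```

A decoder built this way would still need to check that the returned feature is implemented before treating the encoding as valid.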

SVE floating-point compare vectors

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm op 1 o2 Pg Zn o3 Pd

Decode fields
Instruction Details
op o2 o3
0 0 0 FCM<cc> (vectors) — FCMGE

0 0 1 FCM<cc> (vectors) — FCMGT
0 1 0 FCM<cc> (vectors) — FCMEQ
0 1 1 FCM<cc> (vectors) — FCMNE
1 0 0 FCM<cc> (vectors) — FCMUO
1 0 1 FAC<cc> — FACGE
1 1 0 UNALLOCATED
1 1 1 FAC<cc> — FACGT
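
A short sketch of the compare table above: the three selector bits `op`, `o2`, and `o3` are concatenated into an index into a list of mnemonics. The helper name is hypothetical, and the sketch returns only the mnemonic, not the operands.

```python
# Indexed by op:o2:o3; None marks the UNALLOCATED row (1 1 0).
_FP_COMPARE = ("FCMGE", "FCMGT", "FCMEQ", "FCMNE",
               "FCMUO", "FACGE", None, "FACGT")

def decode_sve_fp_compare_vectors(word):
    """Return the mnemonic, "UNALLOCATED", or None if outside this class."""
    # Fixed bits: [31:24]=01100101, [21]=0, [14]=1.
    if (word & 0xFF204000) != 0x65004000:
        return None
    op = (word >> 15) & 1
    o2 = (word >> 13) & 1
    o3 = (word >> 4) & 1
    name = _FP_COMPARE[(op << 2) | (o2 << 1) | o3]
    return name if name is not None else "UNALLOCATED"
```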

SVE floating-point arithmetic (unpredicated)

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 Zm 0 0 0 opc Zn Zd

Decode fields
Instruction Details
opc
000 FADD (vectors, unpredicated)
001 FSUB (vectors, unpredicated)
010 FMUL (vectors, unpredicated)
011 FTSMUL
10x UNALLOCATED
110 FRECPS
111 FRSQRTS

SVE Floating Point Arithmetic - Predicated

These instructions are under SVE encodings.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
01100101 0 op0 100 op1 op2

Decode fields
Instruction details
op0 op1    op2
0x                 SVE floating-point arithmetic (predicated)
10  000            FTMAD
10  != 000         UNALLOCATED
11         0000    SVE floating-point arithmetic with immediate (predicated)
11         != 0000 UNALLOCATED

SVE floating-point arithmetic (predicated)

These instructions are under SVE Floating Point Arithmetic - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 0 opc 1 0 0 Pg Zm Zdn

Decode fields
Instruction Details
opc
0000 FADD (vectors, predicated)
0001 FSUB (vectors, predicated)
0010 FMUL (vectors, predicated)
0011 FSUBR (vectors)
0100 FMAXNM (vectors)
0101 FMINNM (vectors)
0110 FMAX (vectors)
0111 FMIN (vectors)
1000 FABD
1001 FSCALE
1010 FMULX
1011 UNALLOCATED
1100 FDIVR
1101 FDIV
111x UNALLOCATED
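
The 4-bit `opc` table above lends itself to a direct lookup. The following is a sketch under the bit layout shown in the diagram for this class; the function name is hypothetical, and operand fields (`size`, `Pg`, `Zm`, `Zdn`) are left undecoded for brevity.

```python
_FP_ARITH_PRED = {
    0b0000: "FADD (vectors, predicated)",
    0b0001: "FSUB (vectors, predicated)",
    0b0010: "FMUL (vectors, predicated)",
    0b0011: "FSUBR (vectors)",
    0b0100: "FMAXNM (vectors)",
    0b0101: "FMINNM (vectors)",
    0b0110: "FMAX (vectors)",
    0b0111: "FMIN (vectors)",
    0b1000: "FABD",
    0b1001: "FSCALE",
    0b1010: "FMULX",
    0b1100: "FDIVR",
    0b1101: "FDIV",
}

def decode_sve_fp_arith_predicated(word):
    """Return the instruction name, "UNALLOCATED", or None if outside this class."""
    # Fixed bits: [31:24]=01100101, [21:20]=00, [15:13]=100.
    if (word & 0xFF30E000) != 0x65008000:
        return None
    opc = (word >> 16) & 0xF   # bits [19:16]
    return _FP_ARITH_PRED.get(opc, "UNALLOCATED")
```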

SVE floating-point arithmetic with immediate (predicated)

These instructions are under SVE Floating Point Arithmetic - Predicated.

31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
0 1 1 0 0 1 0 1 size 0 1 1 opc 1 0 0 Pg 0 0 0 0 i1 Zdn

Decode fields
Instruction Details
opc
000 FADD (immediate)
001 FSUB (immediate)
010 FMUL (immediate)
011 FSUBR (immediate)
100 FMAXNM (immediate)
101 FMINNM (immediate)
110 FMAX (immediate)
111 FMIN (immediate)

SVE Floating Point Unary Operations - Predicated

These instructions are under SVE encodings.
